Sunday, October 12, 2008

Give em what they want

I’ve worked in life science informatics for many years across a variety of companies. In the life sciences, things at the bench can change very quickly, which often means that existing systems and applications are unsuitable for capturing the data from the latest experiments or techniques. It’s always been a goal of mine to provide truly flexible systems, but I have to admit that I have never succeeded. Whether it is a rigid database or UI, the process workflow, or data that is completely different from anything we have seen before, things just don’t seem to work out. Even when we try to build a specific system for the scientists, it takes time to implement, it can be out of date before it is delivered, and it is often rejected by the users. Having said all that, the scientists always seem to manage on their own and are often happier with their own solutions than with those provided by the techies. This is largely down to Microsoft Excel, which has established itself as the firm favourite for data capture amongst scientists. It doesn’t seem to matter to them that everyone is doing their own thing, using different formats and storing files all over the network. They just seem to get by. I think there is a very good reason for this: Excel does everything they need it to do, and it does not restrict them to a fixed workflow or constrain them to a predefined UI.

However, mention Microsoft Excel to informatics people and you will get a very different response, usually consisting of expletives. In informatics, Excel is definitely not considered a data capture environment, and it is a pain to integrate, especially when the same data is captured in many formats. If and when informatics comes to add this data to newly devised systems, there is often a huge overhead in reformatting potentially hundreds of files into a standard that can be imported. It’s partly our own fault, as we should have worked with the scientists to define templates in the first place, at least as an interim solution, but for one reason or another (and I can think of many) that doesn’t happen.

I have seen how the Semantic Web helps with search and query and how it provides a flexible UI experience. However, I had struggled to see how it could help with a flexible UI for data entry, especially one that requires little training or understanding on the user’s part. That was until recently, when I saw a demo of Cambridge Semantics’ Anzo for Excel product from Lee Feigenbaum. It was really cool: simply map the data in your spreadsheet to an ontology and, as if by magic, it’s in the database. Not only that, another user could connect to the database from Excel and lay out the same data in a completely different way for their own purposes. A web page of the data could also be generated. Most impressive of all, the data in the spreadsheets and the web page was live, and the database could be updated through either.

It is easy to see how a product like Anzo could revolutionise data capture using simple tools like Excel. After all, it’s already on everyone’s desktop, no one needs training to use it, and our scientists have already shown us that it can work for them. The only issue might be getting our informatics people to embrace Excel as a key tool in the box rather than dismissing it out of prior prejudice.

Friday, October 3, 2008

How things come around

It amazes me how some things seem to work out, or come around, when you least expect them to, even when the answer isn't really important to you.
An example that happened to me recently started when I bought a new GPS (Zumo 550). Like most gadgets these days it's multifunctional: yes, it is a GPS, but it can also play music and games, hold pictures, and connect via Bluetooth to my phone and helmet intercom, providing directions, music and the ability to take and receive calls whilst on the move (don't worry, I won't do this). As with most boys' toys, it's great fun playing with it, especially when it's new. After a few experiments I decided to look on the web for point of interest (POI) databases that I could load. I found some, but I also found an interesting presentation from Ordnance Survey, a high-level sales pitch for their POI database. In the slides they show graphical overlays of ATMs with criminal activity, as well as spatial analysis overlays. This got me thinking, and I wondered how one might go about asking questions like "Show me all the criminal activity within a 1 mile radius of a particular ATM" in a query, and what systems you might have to put together to do it. Being a semantic person at heart, I could see that it wouldn't take much to convert POIs to RDF, which could obviously be held in a triple store. I know about Oracle's spatial cartridge for holding spatial data (well, it would, wouldn't it), but I couldn't see an easy way to query both the triple store (let's assume Oracle's) and the spatial data in a single SPARQL-like statement. This was confirmed (it just happened to come up in conversation) during a very pleasant evening in a pub with Dean Allemang a few days later.
The following week I was at ESTC2008 in Vienna, where I found myself meeting and talking with John Goodwin from Ordnance Survey's semantic research group. He's doing some really interesting stuff, and we had a lot in common, especially around introducing the Semantic Web into our companies. The next day we both attended a session given by Jans Aasman, where he demonstrated some of the spatial features in AllegroGraph and how they could be queried in conjunction with POIs. It was a great demo, and it showed me how to answer some of the questions I had posed to myself a few weeks earlier, even though it wasn't going to help in my day job.
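To make the "criminal activity within 1 mile of an ATM" question a little more concrete, here is a minimal sketch, assuming POIs have already been turned into RDF-style (subject, predicate, object) triples held in a plain Python list rather than a real triple store. It is not how Oracle or AllegroGraph do it; a spatially aware store would answer this inside the query itself. The coordinates use the W3C Basic Geo (WGS84) lat/long properties, while the example.org namespace, the ATM and CrimeIncident classes, and all the coordinates are hypothetical. Distance is computed with the haversine great-circle formula.

```python
import math

# Coordinates use the W3C Basic Geo (WGS84) vocabulary; the EX namespace and
# the ATM / CrimeIncident classes are hypothetical, for illustration only.
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/poi#"

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def poi_to_triples(poi_id, poi_type, lat, lon):
    """Turn one POI record into (subject, predicate, object) triples."""
    subject = EX + poi_id
    return [
        (subject, RDF_TYPE, EX + poi_type),
        (subject, GEO + "lat", lat),
        (subject, GEO + "long", lon),
    ]


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, a)))


def within_radius(triples, centre_id, poi_type, radius_miles):
    """Return the sorted URIs of POIs of poi_type within radius_miles of centre_id."""
    # Group triples by subject so each POI's type and coordinates sit together
    # (assumes one value per predicate per subject).
    props = {}
    for s, p, o in triples:
        props.setdefault(s, {})[p] = o
    centre = props[EX + centre_id]
    hits = []
    for subject, pv in props.items():
        if pv.get(RDF_TYPE) != EX + poi_type:
            continue
        d = haversine_miles(centre[GEO + "lat"], centre[GEO + "long"],
                            pv[GEO + "lat"], pv[GEO + "long"])
        if d <= radius_miles:
            hits.append(subject)
    return sorted(hits)


# Made-up coordinates: crime1 is about 0.35 miles north of the ATM,
# crime2 about 2.07 miles north.
triples = []
triples += poi_to_triples("atm1", "ATM", 51.5000, -0.1200)
triples += poi_to_triples("crime1", "CrimeIncident", 51.5050, -0.1200)
triples += poi_to_triples("crime2", "CrimeIncident", 51.5300, -0.1200)
print(within_radius(triples, "atm1", "CrimeIncident", 1.0))
# ['http://example.org/poi#crime1']
```

Scanning every POI like this is fine for a handful of points. The appeal of a spatially indexed triple store is that the type filter and the radius test can be combined in one query without a full scan.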

Thursday, October 2, 2008

Introductions

This is my first ever blog entry and at the moment I have no idea what I'm going to blog about, so in this first entry I'll just introduce myself.
I'm Phil Ashworth and I'm currently a principal scientist within the informatics computational research group at UCB. For a while now I've been looking into the Semantic Web and how it could help our company evolve in a few areas.

A few years back I stumbled across this thing called the Semantic Web, which then started to take over my life. I could see how it could be used to great advantage in almost every aspect of the business. I started to read avidly about RDF, RDFS, SPARQL, OWL and all the other things that led off from these. It was a very stimulating period, but what I didn't realise at the time was the complete state of confusion I was in about the true meaning of the Semantic Web. Hopefully I'm on the road to recovery now, and that has largely been due to people like Dean Allemang and Irene Polikoff, and to attending conferences and meeting people like Eric Neumann and Susie Stephens, all of which has helped to guide me back to reality. I'll try to introduce some aspects of my journey in future posts.

I'm currently reading "Semantic Web for the Working Ontologist" by Dean Allemang and Jim Hendler. I really wish this book had been around a couple of years ago, as it would have made everything so much clearer from the start. It's probably no surprise to Dean, but it's amazing how much I'm still learning from it.

For the most part my work has been behind closed doors; however, I am going to make a real effort to do something on the real web. It's not just this blog that I have created recently: I've gone a bit crazy and created accounts on Flickr and LinkedIn, a FOAF file and even a web page where I intend to put up some of the work I'm looking into on the Semantic Web for molecules.

It's all a work in progress so don't expect too much too soon.