Monday, February 1, 2010

FDA Electronic Orange Book in RDF

I was hoping to have posted this entry before Christmas, but I ran into a few issues.
Issue 1: I bought a Wii console and got a bit tied up in it. Issue 2: I was trying to find a suitable application in which to demo the data in this blog.
Due to the size of the data it is impossible to use Exhibit as a front end, so I have been trying to find out what other tools are out there. Unfortunately I’m still looking for suitable tools, so I’ve decided to publish the data files while I keep looking.
Preparing this data was an experience in learning some new tools and some new features of old favourites. Everything was completed using two products: KNIME for the original data manipulation and clean-up, and TopBraid Composer to generate the RDF and manipulate it into the appropriate graph structure. I’ve made quite a few decisions along the way, including some around URIs, but I won’t bore you with the details.
In the end I’m pretty pleased with the result, but frustrated that I can’t create a UI to show you. I even tried to work out a SPARQL query to create a sample of linked data which could be shown, but I failed in this task too.
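For what it’s worth, one shape such a sampling query could take is a CONSTRUCT wrapped around a LIMITed subquery, so that you get every triple about a small number of instances. This is only a sketch that builds the query text. The class URI is a made-up placeholder rather than a term from orangebook.owl, and subqueries need an engine that supports SPARQL 1.1.

```python
def sample_construct_query(class_uri: str, limit: int = 50) -> str:
    """Build a SPARQL 1.1 CONSTRUCT query that copies every triple
    about up to `limit` instances of `class_uri`."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "CONSTRUCT { ?s ?p ?o }\n"
        "WHERE {\n"
        # The subquery picks the sample of subjects first ...
        "  { SELECT ?s WHERE { ?s a <" + class_uri + "> } LIMIT " + str(limit) + " }\n"
        # ... and the outer pattern pulls in all of their triples.
        "  ?s ?p ?o .\n"
        "}"
    )


# Hypothetical class URI, for illustration only.
PRODUCT_CLASS = "http://example.org/orangebook#Product"
print(sample_construct_query(PRODUCT_CLASS, limit=25))
```

The resulting text can be pasted into any SPARQL 1.1 query tool pointed at the data. Without an ORDER BY, which instances end up in the sample is up to the engine.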
For those of you who don’t know, the FDA Electronic Orange Book contains details of drug products, active ingredients and some patent information. Having created the data I was pretty amazed at the results.
For example, it contains details of 24.5K products used to create 15.7K drugs, which are made up of just 1.8K active ingredients!
I’ve zipped the files into one folder, which contains orangebook.owl (holding the concepts and properties used) and FDAEOB.owl (holding the RDF data).
I’ve also been doing some experiments and found that it is pretty easy to link ingredients, products and drug entries to instances in the Linked Open Data cloud using DBpedia and other sources. I’m going to give this some more thought and publish an RDF file containing these links. I’m hoping this will increase the utility of the RDF Orange Book data, giving people some initial jump-off points to other sources so they can combine yet more information. This is the whole point of the Semantic Web for me.
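To give a flavour of what such a links file might contain, here is a minimal sketch that turns ingredient names into candidate owl:sameAs triples in N-Triples form. The ingredient namespace and the sample IDs are invented for illustration, and I’m assuming upper-case labels. The DBpedia URIs are only guesses derived from the label, so each one would need checking before publishing.

```python
from urllib.parse import quote

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DBPEDIA = "http://dbpedia.org/resource/"
# Hypothetical namespace for the Orange Book ingredient instances.
EOB = "http://example.org/fdaeob/ingredient/"


def dbpedia_uri(label: str) -> str:
    """Guess a DBpedia resource URI from a label: first letter
    capitalised, the rest lower-cased, spaces turned into underscores."""
    name = label.strip().lower().replace(" ", "_")
    name = name[:1].upper() + name[1:]
    return DBPEDIA + quote(name, safe="_(),")


def same_as_triple(local_id: str, label: str) -> str:
    """Return one N-Triples line linking a local ingredient to DBpedia."""
    return "<%s%s> <%s> <%s> ." % (
        EOB, quote(local_id), OWL_SAME_AS, dbpedia_uri(label))


# Sample (id, label) pairs, made up for the example.
for local_id, label in [("ASPIRIN", "ASPIRIN"), ("FOLIC_ACID", "FOLIC ACID")]:
    print(same_as_triple(local_id, label))
```

Keeping the links in a separate file like this means the main data stays untouched, and anyone who disagrees with a particular match can simply leave the links file out.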
You can grab the files here
