Tools for (data) mashups & remixes
So here we go again with all that bioinformatics data locked in web portals built upon a LAMP (Linux / Apache / MySQL / PHP) stack. Some of the databases do indeed offer APIs, but how about those that do not? Screen scrapers come to the rescue, but if you are lazy, or busy, or efficient, and you don’t want to write screen scrapers, maybe you should try one of the tools below and see how it works. These are a few tools on my radar to experiment with, for easily scraping data from web pages and seeing how well the results mash up.
Google Gadgets: XML for defining their interface, JavaScript for the programmatic actions, and HTML for presenting the data. A screen scraper in JavaScript? Yes, but you still have to write it.
Piggy Bank: has its own scrapers, and new ones can be created in a snap using Solvent. The plus with Piggy Bank is that it saves everything as RDF in a local repository, and a server / repository can be installed where users share their collected data (there’s a nice peer-reviewed paper on Piggy Bank).
Greasemonkey: a Firefox extension that runs JavaScript and alters the way your HTML is presented. One good example is a script for sending references from an NCBI/PubMed search directly to Connotea.
Last but not least, my favorite one (well, the one I have experimented with the most; the previous ones only came up lately): Yahoo! Pipes. A module / workflow type of tool for adding-mixing-removing-modifying-parsing-… data from various types of APIs, with no code writing at all (I have to admit, though, that connecting the right modules together in the right way, and fiddling with them until you get the right output, takes its own time as well).
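For comparison, this is roughly what the "write it yourself" route looks like. It is only a minimal sketch, assuming the data you want sits in a plain HTML table; it uses nothing but Python's standard library. The names TableScraper, scrape_table and SAMPLE_HTML are made up for illustration, and the sample page is invented, not taken from any real database portal.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table cells found in an HTML string as a list of rows."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical page fragment, standing in for a real portal's result page.
SAMPLE_HTML = """
<table>
  <tr><th>Gene</th><th>Organism</th></tr>
  <tr><td>BRCA1</td><td>Homo sapiens</td></tr>
  <tr><td>lacZ</td><td><i>Escherichia coli</i></td></tr>
</table>
"""

if __name__ == "__main__":
    for row in scrape_table(SAMPLE_HTML):
        print("\t".join(row))
```

Even this toy version has to know the page layout in advance, and it breaks as soon as the portal changes its markup, which is exactly the maintenance burden the tools above try to take off your hands.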
More to come as I experiment with these tools...



[...] technologies applied in data-intensive Life Science research, has a take this week involving “Tools for (data) mashups and remixes“. These are a few tools on my radar, to experiment with for trying to easily scrape data [...]
Nice post on screen scrapers, simple and to the point. I use Python for simple HTML screen scrapers, but for larger projects I used the extractingdata.com screen scraper, which worked great; they build custom screen scrapers and data extraction programs.