OK, this is a quick one… I've just spent an hour fixing my LinkedIn profile. One very nice feature of this service is that it can import your contacts from Gmail, do a string match against its database, and, if some of your email contacts are already on LinkedIn, send them requests to join your professional network.
I only now realize how easy it would have been if I had all the email from my company going to Gmail: I could easily leverage the professional network tracking capabilities of the LinkedIn service, staying connected with all the people I exchange email with for business, without any effort on my part (and besides, I wouldn't have to deal with Outlook madness daily, but could do some Gmail Fu instead).
All this might seem self-evident, but read between the lines: as many of the people I've met over the past couple of years (both online and in real space) and I have been saying to each other, when data stays locked behind firewalls it is siloed and cannot gain value by being connected.
So put everything on the biggest existing cloud out there, the internet, and let it be connected by the duct tape called Web 2.0. A devil's advocate might bring up the issue of privacy, which is a significant one: for example, many of my email contacts might not want their addresses parsed by LinkedIn, because we don't know what LinkedIn does with them…
Now I am eager to see if I can evade IT and forward all my email to my Gmail account…
Imagine being able to write code to process a large dataset and then, with the click of a button, running that code on a compute cluster, without worrying about setting up job submission to a grid, resource allocation, compilation, and so on.
Now stop imagining, because this is reality today. The way it happens is via your NetBeans IDE (Integrated Development Environment) and Amazon's Elastic MapReduce. Leveraging this Amazon web service's ability to provision Hadoop (the open-source implementation of Google's MapReduce parallel programming framework) compute clusters of any size on the fly, Karmasphere offers an amazing NetBeans plugin.
After installing the plugin, developers can add their Amazon credentials within the IDE, write their code, and perform parallel computing on as big a cluster as they desire (or, to be realistic, as big as their card's credit limit allows). The plugin takes care of communicating with the Amazon Elastic MapReduce API and submitting the code for execution, while Amazon sets up the cluster in minutes and pulls the desired data from S3 storage. A sketch of what such an API call looks like follows below.
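To give a feel for what is happening under the hood, here is a rough sketch of my own (not Karmasphere's code) of submitting a Hadoop streaming job flow to the Elastic MapReduce API using the boto Python library; the bucket names, script paths and instance settings are placeholders.

```python
# A rough sketch: submit a Hadoop streaming job flow to Amazon Elastic
# MapReduce with the boto library. Bucket names and paths are placeholders.
from boto.emr.connection import EmrConnection
from boto.emr.step import StreamingStep

# Connect with your Amazon credentials (placeholders here)
conn = EmrConnection('<aws access key>', '<aws secret key>')

# A streaming step: mapper/reducer scripts and data live on S3
step = StreamingStep(
    name='My streaming step',
    mapper='s3n://my-bucket/scripts/mapper.py',
    reducer='s3n://my-bucket/scripts/reducer.py',
    input='s3n://my-bucket/input/',
    output='s3n://my-bucket/output/')

# Ask Elastic MapReduce to spin up a Hadoop cluster and run the step
jobid = conn.run_jobflow(
    name='My job flow',
    log_uri='s3n://my-bucket/logs/',
    steps=[step],
    num_instances=4,
    master_instance_type='m1.small',
    slave_instance_type='m1.small')

# Poll the state of the job flow
print(conn.describe_jobflow(jobid).state)
```

The point of the plugin, of course, is that you never have to write even this much: the IDE handles the credentials, the upload to S3 and the job flow submission for you.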
The most impressive thing for me in this story is the abstraction of all the layers that, a few years back, required an expert (think MPI and C) to perform high-performance computing. The second is the democratization of access to large computing resources. Certainly, big corporations have the infrastructure and software in place to let business analysts and researchers write and execute code on big clusters without worrying about the details of cluster setup. But what about those outside those corporations? And what about those who don't want to code in C or C++?
Through Hadoop's Streaming option, though, anyone who can write a Perl script can have it operate on a large dataset on a compute cluster, simply by following the highly abstract MapReduce programming model (a minimal example is sketched below). With a laptop, an internet connection and a credit card, a bioinformatics researcher anywhere in the world can write code and execute it without worrying about resource constraints, only about his or her application logic.
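The post mentions Perl, but any language that reads from stdin and writes to stdout will do. As a purely illustrative sketch, here is the classic word-count job written as a Hadoop Streaming mapper and reducer in Python:

```python
#!/usr/bin/env python
# mapper.py -- read raw text lines from stdin, emit "word<TAB>1" pairs
import sys

for line in sys.stdin:
    for word in line.split():
        print("%s\t%d" % (word, 1))
```

```python
#!/usr/bin/env python
# reducer.py -- Hadoop sorts the mapper output by key, so all counts for a
# given word arrive consecutively; sum them and emit one total per word
import sys

current_word, current_count = None, 0

for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word, current_count = word, int(count)

if current_word is not None:
    print("%s\t%d" % (current_word, current_count))
```

On a cluster these would be launched with the standard streaming jar, along the lines of `hadoop jar hadoop-streaming.jar -input <in> -output <out> -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py`; with Elastic MapReduce the same two scripts are simply referenced from S3, as in the job-flow sketch above.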
Given that I am a bioinformatics engineer who sleeps, eats and drinks cloud computing, the following article caught my attention right away:
It is about GenomeQuest, a company that offers services covering everything that follows the sequencing of a genome (assembly, annotation, etc.). The most interesting part of the article is that this company has an in-house cloud computing platform.
The best point in the article, though, is what follows from that: the infrastructure choice of an in-house cloud in turn allows them to scale up by outsourcing to Amazon Web Services when business needs increase…
Now that is scalability, and the power of the cloud and economies of scale in Bio-IT. Read The Big Switch by Nicholas Carr, and you will be thinking like me about the traditional data center vs. the cloud: evolve or go the way of the Dodo.