scientist vs developer (in bioinformatics)
The debate stated so boldly in the title above is not something I’m going to solve in this post. But it is something that has made me think a lot lately. Partly this is for personal reasons: I’m in the process of wrapping up a PhD and trying to evaluate what I have learned over the years (and what I want to do with my life next)…
So, to get right to the point: I see a distinction in the bioinformatics field between scientists, who go after testing hypotheses and producing new knowledge, and developers, who are basically engineers that build, break and hack the bytes that make computers do interesting stuff. What I want to emphasize, though, is the catch for those who lie in the middle, and there are quite a few of them in this field.
Many were the times that I fell (and I still fall) into this “middle” trap myself, something I realize as I look at my past and present work. Let me explain what this trap is with a simple example: suppose you want to use a piece of software that visualizes some data. Suppose also that this software was not made by Microshoot (you know which one I mean), but is rather the product of a small lab, presumably the result of a graduate student’s PhD project. Let’s say the software is written well enough (i.e. a real program, not a collection of scripts). Still, since it’s one grad student’s creation and not Microshoot’s, it will take you some time to get it working (see also the related post on this blog).
And where will that time go? You’ll probably need to do some *nix hacking to get the thing working (even if it comes with make/configure files, you might still need to set up some libraries, etc.). Add some more time on top of that if you have to convert your data into the format the software expects (often the case in bioinformatics, with its Babel tower of data formats). All in all, the setup process will take 50% of the time you decided to allocate for trying your data with that software (and that 50% might be on any scale, from a day to weeks, depending on the scale of your endeavors). The remaining 50% will go to the scientific discovery / hypothesis testing part. A -scientist-, on the other hand (I don’t want to use quotes because it might seem I’m being ironic, but I still want to emphasize the word), would go for something that works right out of the box, so that he/she can spend 100% of the time testing hypotheses with the data (that’s why people use Spotfire for analysis of microarray data).
The problem with the 50-50 approach is that you end up dead in the middle. Why is this not good? Well, personally I enjoy building stuff and would opt for spending 100% of the time developing the thing and forgetting about the hypothesis testing. Or, if I didn’t like development, I’d immerse myself in reading all the papers related to the hypothesis I want to test, curate that knowledge in my head, and go 100% there (spending some good bucks on Spotfire, too).
I know many of you will tell me that you use free, open source software developed by small labs, hack it, and make it do your job. But think about how much time you spend on that. The time spent is often significant, yet it mostly goes into setting the software up on your machine, or into a small hack that only scratches the surface of the years of development it took (probably a student) to build the software. Does the time invested in setting it up teach you anything? Does it help you learn your system? Maybe, but once you have learned how to set up software on a *nix machine, I don’t think repeating the process with every program you want to experiment with offers anything more.
I’ll close here, but before that I’ll underline the main point of this post: I believe everything you do must give you something in return, something that makes you know more and be more experienced after you have done it. This may sound like seeing things in black and white, but wouldn’t it be better to spend your full time either building the software or transparently using it for your biological discoveries, so that in the end you become a strong developer or a strong scientist? I see a lot of the middle way in bioinformatics, and maybe this is connected to the fact that the bulk of the papers published in the field are of average quality. Do we need to separate the developers from the scientists? Do we need journals that publish bioinformatics software and rigorously review its quality, and journals that publish strong theories developed from large bioinformatics datasets by scientists who know their specialty? Do we need scientists who are also software developers? The answer is yours.



Great observations on software in bioinformatics. I think this is a natural evolution of a new field of research, especially one that is dependent on technology. I liken the current state of the field to molecular biology in *its* infancy – another tech-heavy scientific arena.
For example, the first time that I ran a yeast two-hybrid experiment, I had to request the plasmids from the original manuscript’s author, make the constructs myself using PCR, make all of the solutions, tweak the hybridization conditions, etc. Needless to say, this took a lot of work and time. After a few years, though, pharma and biotech companies standardized the method and started selling kits. Now I can start from nothing and be running Y2H within a week. (OK, maybe not quite that fast, but you get my point.) I can still get all of the materials independently, and on the cheap, but as you point out, the trade-off is person-hours.
In any case, I don’t see this as a troubling development in bioinformatics, but as a familiar turn of events along the way to maturity.
This is also related to being lured to the dark side of bioinformatics (as described by Mike at BioinformaticsZen).
I agree that the biggest bang for your buck comes from specializing in either application development or scientific research. However, in order to give the developers adequate feedback to build good and useful apps, it certainly helps to have folks who can speak both languages. I have a background in science but have learned a few programming tools and cobbled together functional prototypes that much more skilled software engineers can refactor and improve.
Since I am a bench scientist, I do have actual experiments to do and biology papers to publish. Nonetheless, I feel the draw to the dark side of bioinformatics…it is way too easy to sit at your desk and noodle away your time analyzing data with every tool at hand. Part of what I hope some of the ‘web 2.0’ or social networking tools can do for biological researchers is give them more frequent feedback from collaborators. This might keep us from going too far down a non-productive track.
[...] month, an article debating Scientist vs Developer in Bioinformatics was posted on Web 2.0 and Semantic Web for Bioinformaticians. I would fall into the first category [...]
[...] scientist vs developer, in bioinformatics (Web 2.0 and Semantic Web for Bioinformatics) [...]