software quality in bioinformatics…
I’d like to duplicate here a long comment I made in response to nsaunder’s post on the bioinformatics software he developed as part of his research. The comment ended up being a long spill of thoughts from personal experience about how software is developed in academic bioinformatics. So here’s how I think this approach affects the quality of the developed software…
Bad software design, caused by the rush to get things ready soon, is evident all over the bioinformatics field. I have experienced it personally, and in addition to your “publish or perish” reason, with which I completely agree, I add one more: many (probably the greater part) of the P.I.s on bioinformatics grants are biologists. This translates to a boss who knows little or nothing about implementing software, has no notion of software usability, and does not realize that a clean implementation needs time, testing, user feedback etc. We also have to assign a portion of the responsibility to the funding agencies, which again put mainly biologists on the committees, who again have no idea about software; they set a timeline by which the funded research has to be out, but no standards for how the software should be implemented. This, in my humble opinion, is the whole reason why we see so much of the development in bioinformatics being replicated effort. A little bit of a corporate approach, where the software is a product and if it’s bad it will not sell, would not hurt, I think. Having interoperable software in bioinformatics is a whole ‘nother story: even commercial packages do not work well with each other, because the development is closed behind each company’s walls. But the difference with commercial software is the quality of the code written, primarily thanks to procedures that test it, but also because there is not a single grad student who has to build the whole thing, take classes, write a thesis and, oh, not to forget, please the boss with the biological insights that he / she gets after using the software…



As someone who has spent time as a product manager in the life science software industry and has been exposed to academic software, I have to agree. Here are some problems with academic software (my issues with commercial software are a different post):
1. Most of the time (although this is getting a lot better now) code quality is really bad
2. Too much “reinventing the wheel”. How many software packages that do the same thing do you need? In commercial software if you do that you’re in a market that’s going to be very unattractive (although I still think too many people duplicate)
3. Very poor QC … this even includes things like snippets of code that shouldn’t be there (leads to legal nightmares if you ever in-license academic code).
4. Most of the time software development essentially stops when it is good enough (I know a lot of people who care about good software models and want to work in an academic research environment who find that absolutely frustrating)
The folks in commercial software, myself included, look to academia for some of the reasons you’ve just mentioned. As an employee of a commercial software vendor, I don’t have time to attend lectures, go to journal clubs, etc. We generally focus our attention on the bioinformatics being done in the academic world to get an idea of where our products should be going. You’d be hard pressed to find a pharmaceutical company or biotech that doesn’t have at least some open source or grad-school-related algorithm or software package in day-to-day use. Innovation is driven by the publish or perish paradigm. (At least, that’s what I’m hoping is happening.) As for efforts being duplicated: so what? If multiple people get to similar answers writing different code, it only helps solidify that the underlying hypotheses are correct.
Thank you Deepak and Andreas for the good comments.
Concerning the innovation that happens in academia, I agree with you Andreas, because that’s where people really explore ideas without the pressure of getting a product to market by a deadline (well, there’s the pressure to produce a publication). But I disagree that duplicated effort is good. For the purpose of testing an algorithm, I think that rather than re-implementing it, it’s better if the software is easy to use by people other than the original developer, so they can apply it to different datasets and validate it.
Just think what would happen if the different groups in bioinformatics made clean software, maybe even based on standards (Java?). Researchers would then get the code made by the different groups, plug the modules together and build on top of them. What currently happens is that you have one algorithm implemented in C++, another in Java, a third in Perl; you want to do a computational experiment using all three of them in a row, so you end up parsing the output of one program and preparing it as the input of the next. And in case you get a good idea for a new algorithm that combines some of the steps from the different algorithms? Not easy to test with all the different implementations…
Take the Linux operating system as an example. When you want to build a new application, you have all the code base you need for window display, network connections etc. in modules that work easily with each other, so you just focus on gluing them together and building on top the functionality you need in your new software. That happens because the code base follows standards, and that’s why this operating system is complete and keeps growing…
I am a full supporter of innovation in academia; what I am arguing for is giving people the time to think about, curate and research the software they are creating. Since bioinformatics, unlike other sciences, is heavily weighted towards software creation rather than knowledge generated from discoveries and written up in a journal article, it has to take a different approach in my opinion. Now it’s all just about getting the grant, running the wet lab part to produce the data, and getting the graduate student to quickly come up with a piece of software to analyze the data. No documentation, no standards, no extensive testing of the software beyond getting a good biological conclusion from the data. And the software the student rigged together might indeed contain a good algorithmic idea, but it’ll be difficult for other people to build upon it.
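To make the “plug the modules together” idea concrete, here is a minimal sketch, in Python rather than Java purely for brevity. Everything in it is hypothetical and only for illustration: the `Record` data model, the three toy steps and the `run_pipeline` helper do not come from any real bioinformatics package. The only point is that when every tool consumes and produces the same agreed record type, chaining them needs no parsing of one program’s output into another’s input.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass(frozen=True)
class Record:
    """Hypothetical shared data model that every tool agrees on."""
    name: str
    sequence: str
    annotations: tuple = ()


# A step is any callable mapping a stream of records to a stream of records.
Step = Callable[[Iterable[Record]], Iterator[Record]]


def min_length(threshold: int) -> Step:
    """Build a step that drops sequences shorter than `threshold`."""
    def step(records: Iterable[Record]) -> Iterator[Record]:
        for r in records:
            if len(r.sequence) >= threshold:
                yield r
    return step


def reverse_complement(records: Iterable[Record]) -> Iterator[Record]:
    """Replace each DNA sequence with its reverse complement."""
    table = str.maketrans("ACGT", "TGCA")
    for r in records:
        yield Record(r.name, r.sequence.translate(table)[::-1], r.annotations)


def annotate_gc(records: Iterable[Record]) -> Iterator[Record]:
    """Attach the GC fraction (rounded to 2 decimals) as an annotation."""
    for r in records:
        n = len(r.sequence)
        gc = sum(base in "GC" for base in r.sequence) / n if n else 0.0
        yield Record(r.name, r.sequence,
                     r.annotations + (("gc", round(gc, 2)),))


def run_pipeline(records: Iterable[Record], steps: List[Step]) -> List[Record]:
    """Chain the steps in order; no intermediate files or ad hoc parsers."""
    stream: Iterable[Record] = records
    for step in steps:
        stream = step(stream)
    return list(stream)


records = [Record("a", "ACGTGG"), Record("b", "AT")]
result = run_pipeline(records, [min_length(3), reverse_complement, annotate_gc])
```

Testing a new idea that combines steps from different groups then comes down to reordering or swapping entries in the list passed to `run_pipeline`, assuming of course that the groups had agreed on the shared record type in the first place.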
To close, and to reiterate the idea of my original post, the problem comes from the people holding the reins (and the money) in academic bioinformatics research: people from funding agencies and P.I.s who are biologists, have no idea about software, and do not care what will happen to it after the funding cycle ends (there are always good exceptions, like the NCI’s caBIG: https://cabig.nci.nih.gov/). The only measure of success for funded research is publications with good biological insights, and the software gets a little paragraph in the Materials & Methods section of the publication.
This situation will hopefully start changing once the current generation of bioinformatics students become P.I.s and committee members in funding agencies, hoping also that they have seen the problems with the current paradigm. My personal wish is also for a bioinformatics journal that publishes articles on the software per se, with knowledgeable reviewers who scrutinize the quality of the software as far as inter-operability is concerned… (yes, N.A.R. does this currently, but the reviewers part is missing)
Thanks for the comment; first-time commenters are moderated, which is why it didn’t appear immediately.
It’s good to know that others share my frustrations! Basic problem is: the software need only be capable of generating results to go in the paper. After that, nobody cares about quality, readability, maintenance or portability. I don’t know what the answer is. Well I do – dedicated software developers in every biology department – but we won’t see that any time soon.