a shell logger for bioinformatics
The world of Bioinformatics can be a scary place for a Biologist. A lot of the software is hard to install, frustrating to use, and
impossible to debug. If one finds oneself needing help with a tool, it is often easier to ask
other struggling users on websites like
rather than trying to contact the original developers directly.
Sure, the tools are a pain - but is it really a problem?
Yes, actually, it's a big problem.
It creates a culture of trust, which on the surface might seem nice and lovely, but has serious consequences for the quality of our research. Biologists trust that analysis.exe is doing what they think it should, whilst the developers of analysis.exe trust that Biologists will only ever use their tool as intended. That's a lot of trust - and not a lot of accountability. As mistakes can so easily go undetected in silico, is it really any surprise that results cannot be reproduced when even the scientists publishing the results struggle to understand what is happening to their data?
Enter the Bioinformatician
Whilst both Computer Science and Biology can exist as huge fields of research in their own right, it is
becoming more and more obvious that the biggest scientific advances of our lifetimes
will most likely come from the intersection of the two. How should we integrate these two wildly different specialities?
Although Computer Scientists and Biologists rarely understand each other, both would agree it is the job of the Bioinformatician to bridge this gap. However, if you ask either what exactly Bioinformaticians should be doing, you will get subtly different answers... To the Biologist, a Bioinformatician needs to run the programs required for their analysis. To the Computer Scientist, a Bioinformatician needs to identify biological questions and present them as computational problems for them to solve. To the aspiring Bioinformatician, this can sound a lot like "know everything" - from the chemistry that provides the data, to the algorithms that manipulate it.
And many Bioinformaticians take up this challenge; maybe because one has to be eternally optimistic to be a Scientist, or maybe because the alternative of getting Computer Scientists to talk directly to Biologists seems, by comparison, a lot harder. Either way, the result is a mess - with experienced Biologists handing over their precious data without a clear understanding of what is going to happen to it (because they don't run the software themselves), and Computer Scientists without a clear understanding of the biological problem writing whole programs (not just algorithms, but also the interface, the documentation, the output, etc.).
A talented Bioinformatician will try to sort all of this out by running the programs themselves, juggling incompatibility issues between data and programs - and whilst they may have some success, they are not acting as a bridge!
At AC.GT, we like to think of Bioinformatics as building bridges - big, beautiful, functional bridges that make people want to travel and explore 'the other side' as easily and efficiently as possible. We want Computer Scientists to write code that Biologists love to use, and we want Biologists to present problems that Computer Scientists love to solve! But neither will ever happen if Computer Scientists keep writing software that Biologists will never use...
So where do we start?
The heart of the problem lies in the code itself, and how that code is developed. To fully understand why, let's take a
look at some of the biggest intellectual bridges humans have ever built: web browsers.
Web browsers connect people with tasks & problems to services & solutions, much like bioinformaticians should connect biological tasks to computational solutions. Although the browser cannot create webpages on its own, it can translate very complicated web code into a pleasing interface for the user, whatever their screen resolution, operating system, or hardware. If the website tries to do something the user probably doesn't want, say, show a load of popups, the browser will block them from ever reaching the user. Vice versa, if the user tries to submit their bank account details to a website without first setting up a properly secured SSL connection, they will get a warning that explains, at different levels of detail, what the technical problems are and how to fix them. And unlike a lot of existing bioinformatic software, the best browsers also try to stay out of both the user's and the website developer's way. Why can't bioinformatic software be like this?
Well, when we talk about bioinformatic software here, we are talking about the big projects designed specifically with the intention of publication - whether directly in a journal like Bioinformatics, or indirectly as part of an analysis in Nature or Cell. When software is written for publication, it is not subject to the same sorts of pressures and influences that typical non-scientific software is. Where usage is a key metric of success outside science, within science the spotlight is very much on novelty. What new functionality does the program provide? What new analysis will it do for me?
To highlight how totally bizarre it is to value novelty over usage, let's imagine a world where regular software had to appeal to the same sensibilities as bioinformatic software to become accepted. Imagine you were responsible for publishing the web browser Chrome, as academic software, back in 2008.
[Figure: Browser market share in August of 2008 - w3counter.com]
"Can Chrome view websites that existing state-of-the-art browsers like Internet Explorer cannot?"
No (they all must support the same web standards)
"Will Chrome load webpages significantly faster than other browsers?"
No (relative to the amount of time downloading the page in 2008)
"Will Chrome work on some computers that other browsers cannot?"
No (IE only requires Windows as a "dependency")
"Are many people already using Chrome?"
No (just a few Google employees too Hipster for even Firefox...)
Even if you managed to get the software published, you would have a hard time convincing anyone in Academia to use it...
"I'm sorry, but in my field we always use Internet Explorer. Reviewers wouldn't like it."
"Really? Another browser? I just switched to Safari! Why can't you guys come to a consensus on what I should be using..."
So what, if anything, can we learn from all this?
At AC.GT, we would say that novelty is a poor metric of success. So long as the driving force behind bioinformatic software development is publication, and publication requires novelty, the user experience and thus usefulness of a tool will always come second to quirks and gimmicks like novel file formats, esoteric algorithms, and inexplicably complicated user interfaces.
If you want to build bridges, you do not make it a requirement that every bridge looks different, or is demonstrably better than the last. The only requirement for building a bridge is that it is well built, and in the right place.
Interested in writing code for AC.GT? Click here!