EBS 2001 to 2015

The Enterprise Biology Software Project takes a somewhat unconventional approach to biology.  Instead of trying to explore biology with existing technologies, it first identifies a problem and then assembles a new technology to solve it.  In short, many complex problems can be solved with software when we play by the rules of biology.  Finding the rules, of course, becomes the most challenging part of the game.   

In effect, the enterprise software serves as a new enabling technology for the biology community.  The software packages have included short stories, courses, databases, appendices, puzzles, software tools, blueprints, rules, yearly reports, and an evolving strategy for innovation, discovery, and productivity.  It is a work in progress. 


Data City – A short story (2001)  The story explains how we can invent the future of biology by using our imagination to build a data city.  Thus far, the story includes four chapters.  Chapter 1 introduces Data City and explains how we plan to use it.  Chapter 2 takes us into the quantitative core of biology by peeling away layers of complexity.  It then explains how to design the buildings of the city.  Chapter 3 uses these buildings to explore the underlying principles of biology.  Chapter 4 relates these principles to laws of nature.  In effect, the story shares an extremely well hidden secret with the reader.  It shows that the biology literature contains a vast reservoir of new data sitting just beyond our reach.  The secret of the short story is that it explains how to extend our technology far enough to capture some of this prodigious wealth.


Course 1: Human Biology (2001)  The course is configured for undergraduate biology students.  However, it can be readily reconfigured into other biology courses, such as high school biology, advanced histology (for graduate, medical, and dental students), and human pathology.  The HuBio course includes thirty-eight lectures, assignments, exercises, simulators, and quizzes.  Here technology is used extensively to create a new environment for students – one that offers experiences quite unlike those they currently enjoy with either textbooks or lectures.  


Course 2: Mathematics - Stereology (2001)  Fundamental to an enterprise approach is the importance of being connected – well connected.  The mathematics course looks at research data in biology with the view of connecting them across a hierarchy of size – extending from individual molecules to complete organisms.  The course identifies data collected with unbiased methods as the best for making these connections and explains how to recognize and collect such data.


Course 3: Technology - Enterprise Biology (2001)  The course begins by defining enterprise biology as the conjunction of three models: qualitative, quantitative, and relational.  The point of the course is to define a mathematical framework for the biology literature, one that can serve as a springboard into the unknown.  Using this new framework, the student quickly sees how discovery depends decisively on creating new data from old.  The course continues from this observation with illustrations of how an enterprise approach to biology can be applied to challenging problems, such as explaining gene function.   


Appendix (2001)  The appendix includes directions, tools, and data entry screens for managing the software.  In effect, it offers a realistic view of what it takes to build and maintain an enterprise system in an academic setting.  It also demonstrates the effectiveness of an enterprise approach by creating a host of new opportunities.  Each release of the Enterprise Biology Software package includes new and upgraded programs.  


BIOLOGYtabs 2002  BIOLOGYtabs 2002 includes selected EBS programs that have research data stored therein.  As such, they do not require the client-server configuration of the original package.  New features include abstracts online, methods for minimizing bias, strategies for unfolding complexity, and methods for generating organs from seed values with biological algorithms.


BIOLOGYtabs 2003  BIOLOGYtabs 2003 includes selected EBS programs that have research data stored therein.  As such, they do not require the client-server configuration of the original package.  New features include design codes (simple and complex), ladder equations, change equations, and a strategy for dealing with complexity by moving to a higher dimension.  



BIOLOGYtabs 2004  BIOLOGYtabs 2004 includes selected EBS programs that have research data stored therein.  New features include four equation libraries (repertoire, analogy, drill-down, and ladder), graphs (citations and methods), rule-based connections, structural networks, and new strategies for finding mathematical order in biology.  



BIOLOGYtabs 2005  BIOLOGYtabs 2005 includes selected EBS programs that have research data stored therein.  New features include a decimal equation library, puzzles (counting molecules, unfolding and refolding the hippocampus, and building a universal biology database), and examples of interpreting the results of an experiment within the framework of a data-driven biology.  



Puzzle 1 (2005): COUNTING MOLECULES - for students in molecular, cellular, and systems biology...  The program introduces the student to the process of (1) designing experiments as equations, (2) running experiments and interpreting results - with and without complexity, and (3) evaluating research publications.  The program will be of special interest to anyone reporting research results as optical densities, concentrations, or stereological densities - it shows how to increase the reliability of these data types. 


Puzzle 2 (2005): HIPPOCAMPUS - unfolded & refolded.  The program introduces the process of (1) writing equations for organs - using published data, (2) predicting the structure of a hippocampus from a single value (a volume or cell count), and (3) identifying unique phenotypic patterns in species with similar genotypes, namely the human, monkey, mouse, rat, and shrew.  The program will be of special interest to investigators wishing to unravel complexity - across species - as a way of exploring the relationship of the genome to phenotypic expression.      
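
The idea of predicting a structure from a single value can be sketched in a few lines of Python.  This is a minimal illustration, not the program's actual method: it assumes that an organ equation can be reduced to a set of fixed ratios between parts, and the part names and ratio values below are hypothetical placeholders rather than published data.

```python
def predict_structure(ratios, seed_part, seed_value):
    """Scale a set of part ratios so that seed_part equals seed_value.

    ratios: relative sizes of the parts (hypothetical, unitless).
    seed_part: the one part whose value is known.
    seed_value: the known value (a volume or a cell count).
    """
    if seed_part not in ratios:
        raise KeyError(f"unknown part: {seed_part}")
    scale = seed_value / ratios[seed_part]
    return {part: ratio * scale for part, ratio in ratios.items()}


# Placeholder ratios for three parts of an organ (not real data).
ratios = {"part_a": 1.0, "part_b": 2.5, "part_c": 0.5}

# Knowing only that part_b measures 10.0, estimate the other parts.
predicted = predict_structure(ratios, "part_b", 10.0)
print(predicted)
```

Under this assumption, a different set of ratios per species would yield a different predicted pattern from the same seed value.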

Puzzle 3 (2005): UNIVERSAL BIOLOGY DATABASE 1.0  The program introduces a modern digital library for the basic and clinical sciences.  The data of published research papers are stored in a relational database, standardized to a universal format, hardened by minimizing bias and variability, integrated across disciplines, transformed into equations, and equipped with a user-friendly interface.  It offers the investigator a working model of a data-driven biology, one designed specifically for exploring biology in novel ways.                      


UNIVERSAL BIOLOGY DATABASE 2.0 (2006)  The database was upgraded to include both control and experimental data, connected to the original stereology literature database, and refitted with a "query by example" interface. In turn, the new database was used to translate research papers into stacks of equations, to summarize biological rules for assembling parts into larger structures, and to explain the process of reverse engineering.            


UNIVERSAL BIOLOGY DATABASE 3.0 (2007)  The software package was updated by adding new data and tools - including a concentration trap, cluster analysis module, and Rule Book. The solution to Puzzle 4, which began with a careful look at semiquantitative data, led to hybrid hierarchy equations, gold standards, and a strategy for capturing the data of molecules and genes.  In effect, we can now access, integrate, and interpret data from all sixteen levels of the biology hierarchy - seamlessly.  One finding was most surprising.  Biology turns out to be unique among the basic sciences in that its data are inextricably bound to their locations.  Such a relationship defines a key element of biological complexity.  This means that the interpretation of research data requires a consideration of two elemental factors - a numerical quantity and a location – both of which are embedded in the equation of the experiment.  In other words, biological complexity becomes manageable when our research data become fully quantitative and integrated.  


UNIVERSAL BIOLOGY DATABASE 4.0  (2008)  The software package challenges the reader by increasing the difficulty of the puzzles to the level of an information science.  At this level, we play biology’s game according to biology’s rules.  The process is surprisingly straightforward.  Find out what biology does and then do exactly the same thing.  When biology changes its phenotype, mirror the changes with a phenotype of our own.  When biology behaves quantitatively, we behave quantitatively.  And the reward for our efforts is?  We can design a quantitative model for systems biology - one that can be readily translated into software and shared with contributing authors.    


SYSTEMS BIOLOGY TWO 1.0 (2009)  The software package includes an Information Infrastructure and its first offspring, Systems Biology Two (SB2).  Within the information infrastructure, research publications become translated into tables of digitized data that, when allowed to interact and connect, emerge as a robust platform for exploring biological complexity.  In addition to creating new opportunities for diagnosis and prediction, we also increase our chances of detecting biological changes - routinely - by at least an order of magnitude.  The report explores ways in which this new information infrastructure can serve as an engine for innovation, discovery, and productivity. 


ORGANISM CODES (2010)  The software package includes an Information Infrastructure with its second offspring, a collection of Organism Codes.  These codes, which are based on data triplets, offer an unrestricted view of phenotypes defining themselves quantitatively in terms of nodes and connections.  The codes show us exactly how phenotypes change and predict the existence of template codes for phenotypes.  The report examines the relationship of phenotype to complexity and considers how this association impacts all segments of the biology enterprise. 


MATHEMATICAL MAPPING (2011)  The software package includes an Information Infrastructure with its third offspring, Mathematical Mapping.  These maps, which are based on data triplets, allow us to extract rules of structural design from biological parts and to interpret large and complex data sets with equations and cutting-edge graphics.  The report examines the relationship of triplets to complexity and considers how this association advances the development of theory structure in biology.  An example, taken from the literature, explains how we can unravel a complex disease (schizophrenia) by extracting the rules and using them to play the complexity game.


MATHEMATICAL MARKERS (2012)  The software package includes an Information Infrastructure with its fourth offspring, Mathematical Markers.  Such markers, which are based on data triplets, allow us to extract rules of design from biological parts and connections and to transform reductionist data of the literature into complex data sets.  In turn, these new data sets - including more than 700,000 mathematical markers - can be assembled into specific configurations that provide simple solutions to otherwise extremely complex biological problems.  Examples of such solutions offered in the report include generalizing clinical diagnosis and defining new theory structure.


COMPLEXITY GAMES (2013)  A complexity game begins by constructing a playing field, finding the rules that apply, and then making moves - in the form of questions.  The report includes the following moves.  Move 1: If mathematical markers can diagnose disorders in living brains, can similar markers from postmortem brains do the same?  The answer is no because the corresponding parts and connections of living and postmortem brains differ.  Move 2: Why?  In postmortem brains, parts tend to shrink or swell differently in control and experimental settings.  This unnatural complexity makes our data chaotic.  Move 3: Can we manage this chaos with correction factors?  Yes.  Move 4: Can we now diagnose disorders with postmortem brains?  Apparently yes.      
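
Move 3 can be illustrated with a small Python sketch.  It assumes, purely for illustration, that a correction factor is a single multiplier applied to a measured volume to undo shrinkage or swelling; the report's actual correction factors may take a different form, and every number below is a hypothetical placeholder.

```python
def apply_correction(measured, factor):
    """Return a measured value rescaled by a correction factor.

    factor > 1 compensates for shrinkage, factor < 1 for swelling
    (an assumed convention, not taken from the report).
    """
    if factor <= 0:
        raise ValueError("correction factor must be positive")
    return measured * factor


# Hypothetical postmortem volumes of the same part in two settings.
control_measured = 80.0
experimental_measured = 60.0

# Hypothetical factors: the experimental tissue shrank more.
control_corrected = apply_correction(control_measured, 1.25)
experimental_corrected = apply_correction(experimental_measured, 1.5)

# Ratio of experimental to control, before and after correction.
raw_ratio = experimental_measured / control_measured
corrected_ratio = experimental_corrected / control_corrected
print(raw_ratio, corrected_ratio)
```

In this toy case the uncorrected ratio overstates the difference between the two settings, which is the kind of distortion the correction is meant to remove.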


BIG DATA (2014)  By translating data of the human brain (Internet Brain Volume Database, Kennedy et al., 2012) into mathematical markers, we can assemble an objective, data-driven approach to clinical diagnosis.  The exercise, however, triggered several challenges.  Since disorders of the brain display a modular format wherein different disorders often share identical modules, extracting a unique and diagnostic set of markers for each disorder involved the removal of many false positives.  Moreover, with more than 15,000,000 mathematical markers in play, extensive automation was needed to manage the big data sets.  The product of this effort included two diagnosis databases capable of delivering the correct diagnosis 100% of the time.  Such an outcome illustrates the promise of applying a new theory structure to the task of capturing complex phenotypes from patient data.      


DISORDER ... ORDER (2015)  Order is fundamental to disorder.  By generating mathematical markers for 21 disorders of the brain, selecting those that occur as duplicates, and then mixing them all together, disorders with the closest affinities can be identified.  Such an exercise indicates that abnormal markers are highly conserved in that the same markers occur routinely in different disorders.  Moreover, certain markers display a remarkable persistence, as evidenced by their reproducibility across a broad spectrum of disorders.  The results begin to suggest that the complexity of disorders expressed by the human brain extends far beyond even the most optimistic of predictions.
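
The selection of duplicate markers and the search for closest affinities can be sketched as follows.  This is a minimal illustration under stated assumptions: each disorder is represented as a set of marker labels, a "duplicate" is taken to mean a marker found in more than one disorder, and affinity is taken to be the number of such markers a pair of disorders shares.  The disorder names and marker labels are hypothetical placeholders, not data from the report.

```python
from collections import Counter
from itertools import combinations


def duplicated_markers(markers_by_disorder):
    """Return the markers that occur in more than one disorder."""
    tally = Counter(
        marker
        for markers in markers_by_disorder.values()
        for marker in markers
    )
    return {marker for marker, count in tally.items() if count > 1}


def rank_affinities(markers_by_disorder):
    """Rank disorder pairs by the number of duplicated markers they share."""
    duplicates = duplicated_markers(markers_by_disorder)
    pairs = []
    for a, b in combinations(sorted(markers_by_disorder), 2):
        shared = markers_by_disorder[a] & markers_by_disorder[b] & duplicates
        pairs.append((len(shared), a, b))
    # Most shared markers first; ties broken alphabetically.
    return sorted(pairs, key=lambda p: (-p[0], p[1], p[2]))


# Placeholder marker sets for three disorders (not real data).
markers_by_disorder = {
    "disorder_1": {"m1", "m2", "m3"},
    "disorder_2": {"m2", "m3", "m4"},
    "disorder_3": {"m3", "m5"},
}

ranking = rank_affinities(markers_by_disorder)
for shared_count, a, b in ranking:
    print(a, b, shared_count)
```

Here the pair at the top of the ranking is the one with the closest affinity under the assumed measure; a marker such as "m3", present in every set, corresponds to the persistent markers described above.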