Reverse Engineering

For the vast majority of biological systems, we lack precise knowledge regarding the structure of their molecular interaction networks. Even where network structures are known, we typically lack the parameters required to model such systems reliably. In a Bayesian framework, we can integrate prior information using rigorous statistical inferential procedures. We are developing and applying tools for the inference of networks and dynamical systems from biological high-throughput data.

Approximate Bayesian Computation for Biological Systems

Obtaining reliable parameter estimates for dynamical models of biological systems is fraught with difficulties: data are notoriously noisy and sparse, and have often been collected under varying conditions. Even if we have a good idea as to the structure of the mechanistic model that generated the data, estimating the corresponding parameters is far from trivial. Conventional approaches for fitting models to such data, for example non-linear optimization routines, routinely fail to capture this complexity because they return point estimates and therefore underestimate, often grossly, the uncertainty in the fitted parameters.

Bayesian approaches yield the full posterior probability distribution over the parameters (given the data), which allows us to appreciate the reliability of parameter estimates. At the same time, analysis of the posterior distribution also provides information regarding parameter sensitivity and model robustness. In practice, however, the posterior distribution is often intractable, especially for stochastic systems, because the likelihood cannot be evaluated. In such cases, approximate Bayesian computation (ABC) can provide a practical alternative: it replaces likelihood evaluation with simulation from the model and comparison of the simulated data with the observed data. We have developed a sequential Monte Carlo implementation of ABC (ABC-SMC), which can be applied to parameter inference and, more importantly, to model selection in systems biology.
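The basic idea behind ABC can be illustrated with the simplest variant, plain rejection sampling, rather than the ABC-SMC scheme itself. The sketch below is a minimal illustration under stated assumptions: the exponential decay model, the uniform prior on the rate constant, the noise level, the tolerance `epsilon` and all function names are hypothetical choices made for this example, not part of any published implementation.

```python
import math
import random


def simulate(k, times, rng, x0=10.0, noise_sd=0.3):
    """Noisy observations of exponential decay, dx/dt = -k * x (toy model)."""
    return [x0 * math.exp(-k * t) + rng.gauss(0.0, noise_sd) for t in times]


def distance(a, b):
    """Euclidean distance between two equally long trajectories."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def abc_rejection(observed, times, n_accept, epsilon, rng):
    """Keep prior draws whose simulated data lie within epsilon of the data."""
    accepted = []
    while len(accepted) < n_accept:
        k = rng.uniform(0.0, 2.0)            # assumed prior: k ~ Uniform(0, 2)
        simulated = simulate(k, times, rng)  # simulate instead of evaluating a likelihood
        if distance(simulated, observed) <= epsilon:
            accepted.append(k)
    return accepted


rng = random.Random(1)                        # fixed seed for reproducibility
times = [0.5 * i for i in range(10)]
observed = simulate(0.7, times, rng)          # synthetic "data" with true k = 0.7
posterior = abc_rejection(observed, times, n_accept=200, epsilon=2.0, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
```

The accepted values in `posterior` approximate the posterior distribution of the rate constant, so their spread gives a direct impression of the uncertainty that a single point estimate would hide. Sequential schemes such as ABC-SMC improve on this by moving a population of parameter values through a series of decreasing tolerances instead of sampling from the prior at every step.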

We are continuing to develop ABC-SMC further, and are applying it to an increasing number of biological systems. These range from bacterial stress response mechanisms to the signalling and regulatory processes that underlie human disease. Access to the full (if approximate) posterior probability distribution over the parameters also allows us to analyse recurring issues in reverse engineering, such as identifiability, sensitivity and “sloppiness” of models and model parameters, in a more consistent and global manner than would be possible if only point estimates were available.

Network Inference

In general, we lack good starting network models for dynamic analysis. In order to deal with such situations, we are working with two complementary approaches that allow us to infer the structure of molecular interaction networks from either high-throughput functional data or from evolutionary comparisons.

Dynamical Bayesian Networks (DBNs) allow us to infer regulatory interactions among genes from transcriptomic data. We are not only interested in the structure of these networks, but also in how they change in response to external and physiological cues or over the life-cycle of an organism. In order to make best use of sparse and frequently disparate data, we are developing a version of the DBN formalism that integrates time-course gene expression data with other, time-independent data. Although these different experimental set-ups often probe only subtly different aspects of molecular systems, a combined analysis can give more highly resolved insights into the workings of transcriptional networks and gene regulatory processes. We are using these tools in order to understand regulatory processes in Escherichia coli and a range of Neisseria species.
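A first-order DBN assumes that the expression of a gene at one time point depends on the state of its regulators at the previous time point. The sketch below does not implement a DBN; it uses a much cruder stand-in, the Pearson correlation between a candidate regulator at time t and a target at time t + 1, only to show how time-course data can suggest the direction of an edge. The gene names and the toy expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def lagged_edge_scores(series):
    """Score each ordered pair (regulator, target) by the lag-1 correlation."""
    scores = {}
    for regulator, x in series.items():
        for target, y in series.items():
            if regulator != target:
                # regulator at times 0..T-2 against target at times 1..T-1
                scores[(regulator, target)] = pearson(x[:-1], y[1:])
    return scores


# Toy time courses: geneB repeats geneA with a delay of one time step.
series = {
    "geneA": [0, 1, 0, 1, 1, 0, 1, 0, 0, 1],
    "geneB": [0, 0, 1, 0, 1, 1, 0, 1, 0, 0],
}
scores = lagged_edge_scores(series)
```

In this toy example the score for the pair ("geneA", "geneB") is higher than for the reverse direction, which is the kind of asymmetry a DBN exploits. A full DBN analysis would instead score whole network structures within a probabilistic model and could incorporate the additional, time-independent data mentioned above.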

Evolutionary Inference Methods use the fact that all life on Earth has a single common ancestor. Coupled with a suitable inferential framework, comparative analysis allows us to predict molecular interactions in our target species based on observations made in different species. We are using this approach, coupled to extensive data-integration efforts, to predict protein-protein interactions, especially in the pathogenic fungus Candida glabrata, and to understand signalling and stress response processes in medically and industrially important bacteria and in other human pathogens.
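The simplest form of such a comparative prediction transfers an interaction observed in a source species to the target species whenever both partners have orthologues there. The sketch below shows only this deterministic transfer step, not a full inferential framework, which would also weigh the evidence for each prediction. The protein identifiers, the orthologue table and the function name are hypothetical.

```python
def transfer_interactions(source_interactions, orthologs):
    """Predict target-species interactions from source-species interactions.

    source_interactions: iterable of (protein_a, protein_b) pairs in the source species.
    orthologs: dict mapping a source protein to a list of target-species orthologues.
    Returns a set of unordered pairs (frozensets) of target-species proteins.
    """
    predicted = set()
    for a, b in source_interactions:
        # Proteins without a known orthologue contribute no predictions.
        for target_a in orthologs.get(a, ()):
            for target_b in orthologs.get(b, ()):
                predicted.add(frozenset((target_a, target_b)))
    return predicted


# Hypothetical example: srcB has two orthologues, srcC has none.
source_interactions = [("srcA", "srcB"), ("srcB", "srcC")]
orthologs = {"srcA": ["tgt1"], "srcB": ["tgt2", "tgt3"]}
predicted = transfer_interactions(source_interactions, orthologs)
```

Here the interaction between srcA and srcB yields two predicted pairs in the target species, while the interaction involving srcC is not transferred because srcC has no orthologue in the table. Using unordered pairs avoids counting the same interaction twice.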