Statistical Bioinformatics

Data Integration

Modern systems biology data are plentiful but often still sparse and plagued by high levels of noise. Until recently, these data had been studied largely in isolation from one another. In reality, however, protein interaction networks, metabolic networks and transcriptional regulation networks are intricately interwoven and need to be considered as such.

Thus, in order to understand the function of biological systems, it is important to combine these different data types. To do so consistently and coherently is a major statistical challenge, and the need for data integration arises in all aspects of our work. We maintain extensive data resources that capture published data on the model organisms studied in the Theoretical Systems Biology Group, and we continually revise them.

Data integration is particularly important at the system level and in the evolutionary problems considered by the group. Here, both the data and the intrinsic system dynamics are highly variable, and the variance frequently “overwhelms” the mean behaviour. We are actively engaged in developing and applying error models to handle noise and uncertainty in physical interaction data.

Complex Networks

Most network data collected to date are incomplete in the sense that not all nodes and edges have been observed, or can be observed given present experimental procedures. It is possible to show that the properties of incomplete networks will generally differ from those of the complete networks, sometimes quite considerably. A statistical perspective, however, allows us to relate properties of such noisy and incomplete networks to those of the “true” network. Such situations also arise in the social sciences, engineering and physical sciences.
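The effect of incompleteness can be illustrated with a small simulation. The sketch below is not the group's method; it is a minimal example under stated assumptions: the "true" network is taken to be a simple random graph in which every pair of nodes is linked with the same probability, and incompleteness is modelled by observing each node independently with a fixed probability and keeping only the edges between observed nodes. All function names and parameter values are hypothetical and chosen purely for illustration.

```python
import random
from itertools import combinations


def random_graph(n, edge_prob, rng):
    """Return an undirected random graph as a set of edges (u, v) with u < v.

    Assumption: each pair of nodes is linked independently with
    probability edge_prob (a simple stand-in for the "true" network).
    """
    return {(u, v) for u, v in combinations(range(n), 2)
            if rng.random() < edge_prob}


def induced_subgraph(edges, kept_nodes):
    """Keep only the edges whose two endpoints were both observed."""
    return {(u, v) for u, v in edges if u in kept_nodes and v in kept_nodes}


def mean_degree(edges, n_nodes):
    """Average number of edges per node (each edge has two endpoints)."""
    return 2 * len(edges) / n_nodes if n_nodes else 0.0


rng = random.Random(0)  # fixed seed so that the run is reproducible

# Illustrative parameter values (assumptions, not taken from real data).
n, edge_prob, sampling_fraction = 400, 0.05, 0.5

full_edges = random_graph(n, edge_prob, rng)

# Incomplete observation: each node is seen with probability sampling_fraction.
kept = {v for v in range(n) if rng.random() < sampling_fraction}
sub_edges = induced_subgraph(full_edges, kept)

full_mean = mean_degree(full_edges, n)
sub_mean = mean_degree(sub_edges, len(kept))

print(f"mean degree, complete network: {full_mean:.2f}")
print(f"mean degree, observed network: {sub_mean:.2f}")
print(f"ratio observed / complete:     {sub_mean / full_mean:.2f}")
```

Under this particular sampling scheme the mean degree of the observed network shrinks by roughly the sampling fraction, so a known sampling fraction lets us correct the naive estimate. Other properties, and other sampling schemes, need not behave so simply.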

Related to this problem is the frequently recurring need to compare networks. These can be networks collected either at different times or under different circumstances. This is an area of longstanding and continuing interest in the group.