|
|
Research |
| |
| |
| Systems biology |
Systems biology aims to model the complexity of living organisms at a systems level. Its hallmark is the focus on interactions and the idea that the entities of a system need to be understood in the context of each other rather than individually. Only then can phenotypes arising from networks of interacting genes (so-called emergent properties) be understood. We have devised an approach to network inference that incorporates interactions between regulators, including synergistic and competitive relationships, by evaluating increasingly complex regulatory mechanisms. The approach integrates promoter information and gene expression profiles to reverse-engineer regulatory networks. We are also developing methods for incorporating proteomics and metabolomics data, and for comparing networks across species (comparative regulomics). |
|
 |
Figure description. The transcriptional network of Populus leaves. Regulators (transcription factors) are red diamonds, while transcriptional modules are blue circles. |
| |
Machine learning |
The machine learning approach to experimental biology is iterative: experiments provide representative examples, and computational models provide experimentalists with new, testable hypotheses. There are two conceptually different approaches to the computational step. The most common is the nearest-neighbor approach, in which, for example, protein function is transferred from the closest sequence for which such information is available. The second approach induces a general model from the available examples and uses this model for prediction. One advantage of the latter approach is that similarities can be found among many otherwise dissimilar examples, and these patterns can be used to predict distant homologues. Another is that the models can be inspected and interpreted, thereby providing insight into the biological system. |
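The contrast between the two approaches can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the motif names, the function labels, the Jaccard similarity and the one-condition model are hypothetical stand-ins, not the methods or data used in our work.

```python
from collections import Counter

# Hypothetical toy data: (set of motifs found in a sequence, function label).
EXAMPLES = [
    ({"m1", "a", "b"}, "kinase"),
    ({"m1", "c", "d"}, "kinase"),
    ({"m6", "e", "f"}, "transporter"),
    ({"m6", "g", "h"}, "transporter"),
]

def jaccard(a, b):
    """Similarity of two motif sets: shared motifs / all motifs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def nearest_neighbour_predict(query, examples):
    """Approach 1: transfer the label of the single most similar example."""
    _, label = max(examples, key=lambda ex: jaccard(query, ex[0]))
    return label

def induce_model(examples):
    """Approach 2: induce a one-condition model (IF motif THEN label ELSE default)."""
    best = None
    for motif in sorted(set().union(*(feats for feats, _ in examples))):
        covered = Counter(label for feats, label in examples if motif in feats)
        label, hits = covered.most_common(1)[0]
        score = hits - (sum(covered.values()) - hits)  # correct minus incorrect
        if best is None or score > best[0]:
            best = (score, motif, label)
    _, motif, label = best
    uncovered = Counter(lab for feats, lab in examples if motif not in feats)
    default = uncovered.most_common(1)[0][0] if uncovered else label
    return {"motif": motif, "label": label, "default": default}

def model_predict(model, query):
    """Apply the induced model to a new set of motifs."""
    return model["label"] if model["motif"] in query else model["default"]

query = {"m1", "e", "f"}  # shares most of its motifs with one transporter example
model = induce_model(EXAMPLES)
print(nearest_neighbour_predict(query, EXAMPLES))  # label of the closest example
print(model, model_predict(model, query))          # inspectable model and its prediction
```

In this toy case the two approaches disagree: the query is closest overall to a transporter example, while the induced model keys on the single motif shared by all kinase examples. The model is also a small, readable object, which is the sense in which induced models can be inspected.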
| |
| Gene regulation |
One of the major challenges in molecular biology is to dissect the regulatory circuitry of cells. We have shown that the combinatorial nature of gene regulation in yeast can be described by IF-THEN rules that associate sets of binding sites (IF-part) with particular gene expression profiles (THEN-part). We then demonstrated the biological significance of these regulatory mechanisms by showing that they could explain information on gene function (Gene Ontology annotations) and experimental binding (ChIP-chip data). We have also shown that the biological significance of the discovered regulatory mechanisms is enhanced by considering gene expression similarity limited to phases of the cell cycle. However, instead of relying on expression clustering (common to almost all published studies in this field), we can obtain much more specific regulatory hypotheses by using available knowledge about the studied process, in this case the cell cycle. By dividing genes into classes according to their periodic expression in three different synchronization experiments, we identified the different mechanisms operating in the different environments and even proposed a structure for how these synchronization environments are regulated.
Rule examples:
IF RAP1 AND SWI5 AND MCM1' THEN expression similar to RPL18A
IF MBP1-STRE' AND SWI6-MCB THEN Periodic(110)
The approach is firmly based on a formal mathematical framework that describes the regulatory logic in terms of IF-THEN rules. The proposed regulatory mechanisms are completely transparent and can be inspected and understood by experts. |
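To make the rule format concrete, the following is a minimal sketch of how an IF-THEN rule could be represented and scored against a gene collection. The `Rule` class, the `evaluate` function, the gene names and the support/accuracy measures are assumptions introduced for illustration; the gene data are made up, and only the binding-site names and the THEN-part are taken from the first rule example above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    sites: frozenset  # IF-part: binding sites that must all be present
    outcome: str      # THEN-part: expression class

    def matches(self, promoter_sites):
        """A rule fires when every site in its IF-part occurs in the promoter."""
        return self.sites <= promoter_sites

def evaluate(rule, genes):
    """Return (support, accuracy): genes covered, and the fraction predicted correctly."""
    covered = [g for g in genes.values() if rule.matches(g["sites"])]
    if not covered:
        return 0, 0.0
    correct = sum(1 for g in covered if g["expression"] == rule.outcome)
    return len(covered), correct / len(covered)

# Hypothetical genes with made-up promoter content and expression classes.
GENES = {
    "gene1": {"sites": {"RAP1", "SWI5", "MCM1'"}, "expression": "similar to RPL18A"},
    "gene2": {"sites": {"RAP1", "SWI5", "MCM1'", "siteX"}, "expression": "similar to RPL18A"},
    "gene3": {"sites": {"RAP1", "SWI5"}, "expression": "other"},
    "gene4": {"sites": {"RAP1", "SWI5", "MCM1'"}, "expression": "other"},
}

rule = Rule(frozenset({"RAP1", "SWI5", "MCM1'"}), "similar to RPL18A")
support, accuracy = evaluate(rule, GENES)
print(support, accuracy)
```

Because a rule is just a set of required sites and a conclusion, an expert can read it directly and check which genes it covers, which is what makes this kind of model transparent.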
| |
| Protein structure prediction |
 |
We have introduced the concept of local descriptors of protein structure to characterize local neighborhoods of amino acids, including short- and long-range interactions. We have built a library of recurring local descriptors and shown that this library is general enough to allow assembly of unseen protein structures. Thus the method identifies, in a systematic way, the local building blocks that are common to many proteins with otherwise unrelated global structures (folds). The descriptor building-block approach has many possible applications, and we have found successful practical use specifically in the prediction of protein-ligand interactions, fold recognition, residue-residue contact prediction and prediction of function from structure.

|

|

|
| Local descriptor | SCOP | Segment 1 | Segment 2 | Segment 3 |
|---|---|---|---|---|
| 1f5ma_#167 | d.110.2.1 | 108-112 KETQI | 133-137 IVVPII | 161-169 VDKEFLEKLA |
| 1gr8a_#407 | c.117.1.1 | 164-168 GTAAI | 373-378 VNVPVL | 401-410 ATAWFLEDAL |
| 1i50a_#1268 | e.29.1.1 | 1148-1152 IASEI | 1193-1197 LRLELD | 1262-1270 KIENTMLENI |
| 1mc0a1#360 | d.110.2.1 | 296-300 KQCIQ | 324-328 LCVPVI | 354-362 EDEHVIQHCF |
| 1mt5a_#569 | c.117.1.1 | 250-254 GICGL | 505-509 GVVPVT | 567-571 RFMREVEQLM |
|
Figure description. The local descriptor denoted 1gr8a_#407 (i.e. the local neighborhood around amino acid number 407 in protein domain 1gr8a_). The left figure shows the local descriptor 1gr8a_#407 (red) in the structure domain 1gr8a_, while the middle figure shows a close-up of the same local descriptor. It consists of three fragments that are close to each other in space but not along the amino acid sequence. The right figure shows the structural alignment of similar local descriptors in other ASTRAL domains. The corresponding sequence alignment is shown below the figures. |
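The idea of a neighborhood made of fragments that are close in space but distant in sequence can be sketched as follows. This is a simplified illustration under stated assumptions, not the published construction: the function name `local_descriptor`, the single coordinate per residue, the 6.5 distance cutoff and the flank of two residues are all hypothetical choices, and the coordinates are a synthetic hairpin rather than a real structure.

```python
import math

def local_descriptor(coords, center, radius=6.5, flank=2):
    """Return the sequence segments forming the neighborhood of residue `center`.

    coords maps residue number -> (x, y, z), one representative point per residue.
    radius and flank are illustrative parameters, not values from the source.
    """
    origin = coords[center]
    close = [r for r, xyz in coords.items() if math.dist(origin, xyz) <= radius]

    # Extend every spatially close residue into a short backbone fragment.
    members = set()
    for r in close:
        for k in range(r - flank, r + flank + 1):
            if k in coords:
                members.add(k)

    # Merge consecutive residue numbers into (start, end) segments.
    segments = []
    for r in sorted(members):
        if segments and r == segments[-1][1] + 1:
            segments[-1][1] = r
        else:
            segments.append([r, r])
    return [tuple(s) for s in segments]

# Synthetic hairpin: two antiparallel strands 5 units apart and a distant loop.
coords = {}
for r in range(1, 10):    # strand A along the x axis
    coords[r] = (3.8 * (r - 1), 0.0, 0.0)
for r in range(10, 20):   # loop placed far away from both strands
    coords[r] = (3.8 * (r - 1), 30.0, 0.0)
for r in range(20, 29):   # strand B running back alongside strand A
    coords[r] = (3.8 * (28 - r), 5.0, 0.0)

print(local_descriptor(coords, center=5))
```

For the synthetic hairpin, the neighborhood of residue 5 consists of one segment around the residue itself and a second segment from the opposite strand, more than ten positions away along the sequence.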
|
| |
| Enzyme-ligand interactions |
Experimentally assigning functions and mapping interactions over whole proteomes is infeasible. For example, the number of possible interactions between pairs of proteins and small organic compounds (ligands) may be well over 10^400. We have introduced a computational approach to modeling and predicting molecular interactions in which proteins are represented using local descriptors of protein structure. A local descriptor is a discrete structural entity encompassing the complete local neighborhood around an amino acid. A library of commonly occurring local descriptors has been created, describing the binding pockets of protein-ligand complexes regardless of sequence similarity or global structural similarity. The local descriptors in this library can be matched to previously unseen protein structures without knowledge of the binding site of those proteins. In principle, this means that models can be generated that span the entire enzyme-ligand space, containing proteins that vary greatly in terms of sequence, structure and function. We have employed a comprehensive training and test set consisting of enzymes from all the major enzyme classes (EC) and shown that our approach is indeed capable of predicting binding affinity values for these enzyme-ligand complexes.
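One simple way to picture a descriptor-based protein representation is a binary vector over the library, with one position per descriptor. The sketch below assumes this encoding purely for illustration; the text does not specify how matches are encoded or how ligands are represented. The descriptor identifiers are taken from this page, while the protein names and their matches are hypothetical placeholders, not results.

```python
# Descriptor identifiers come from the text; protein names and matches are
# hypothetical placeholders used only to show the data layout.
LIBRARY = ["1a0la_#160", "1f5ma_#167", "1gr8a_#407"]

MATCHES = {
    "enzyme_A": {"1a0la_#160", "1gr8a_#407"},
    "enzyme_B": {"1a0la_#160"},
    "enzyme_C": set(),
}

def descriptor_vector(matched, library=LIBRARY):
    """Binary vector: 1 if the library descriptor matches the structure, else 0."""
    return [1 if d in matched else 0 for d in library]

def shared_descriptors(a, b, matches=MATCHES):
    """Library descriptors matched by both proteins, regardless of their folds."""
    return sorted(matches[a] & matches[b])

features = {name: descriptor_vector(m) for name, m in MATCHES.items()}
for name, vector in features.items():
    print(name, vector)
print(shared_descriptors("enzyme_A", "enzyme_B"))
```

Vectors of this kind put proteins with different sequences and folds into one common feature space, so a single model can in principle be trained across all of them.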
| (A) Descriptor 1a0la_#160 |
Figure description: (A) The protein substructure 1a0la_#160 (i.e. the backbone fragments centered on amino acid 160 in protein domain 1a0la_) (green). (B,C) The matches of the descriptor in (A) to enzymes 1tnl (red) and 1rgk (blue). The figure illustrates how the interaction of two protein-ligand complexes of proteins with different global structure (i.e. folds) can be described by the same local substructure. |
 |
| (B) Match: 1tnl#160 |
(C) Match: 1rgk#77 |
 |
 |
|
| |