The Pharmaceutical Bioinformatics lab works on several projects dealing with the discovery of new drugs, the analysis of the mechanisms of known compounds and the biosynthesis of natural compounds. Selected projects are described below.

Automated recognition of functional compound-protein relationships in literature

PubMed is a database containing millions of references to biomedical publications. Searching this continuously growing body of literature for protein-compound interactions can be a difficult and time-consuming task.

We apply text mining and machine learning techniques to identify functional compound-protein relationships in all titles and abstracts of the PubMed database. (Publication in preparation.)
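A typical first step in this kind of pipeline is to find candidate compound-protein pairs that co-occur in a sentence. The sketch below does this by dictionary matching and keeps only sentences that contain a relation trigger word. It is a minimal illustration, not the lab's method: the dictionaries `COMPOUNDS`, `PROTEINS` and `TRIGGERS` are hypothetical toy lists, and in a real system a trained machine learning classifier would replace the trigger-word rule.

```python
import re

# Hypothetical toy dictionaries; a real pipeline would use curated compound and protein lexicons.
COMPOUNDS = {"aspirin", "imatinib"}
PROTEINS = {"cox-1", "bcr-abl"}
TRIGGERS = {"inhibits", "binds", "activates", "blocks"}


def candidate_relations(text):
    """Return (compound, protein, trigger) triples that co-occur in one sentence."""
    results = []
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = re.findall(r"[a-z0-9\-]+", sentence.lower())
        compounds = [t for t in tokens if t in COMPOUNDS]
        proteins = [t for t in tokens if t in PROTEINS]
        triggers = [t for t in tokens if t in TRIGGERS]
        if not triggers:
            continue  # co-occurrence without a relation word is discarded
        for compound in compounds:
            for protein in proteins:
                results.append((compound, protein, triggers[0]))
    return results


abstract = "Aspirin irreversibly inhibits COX-1. Imatinib was well tolerated."
print(candidate_relations(abstract))  # [('aspirin', 'cox-1', 'inhibits')]
```

Because matching is token-based, multi-word names such as "acetylsalicylic acid" would be missed; handling synonyms and multi-word entities is one reason dedicated named-entity recognition is usually added.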

DNA Methylation

Methylation of cytosines within CpG dinucleotides is a common epigenetic DNA modification and may arrest cells in a pathogenic state in complex disorders, e.g. cancer or rheumatoid arthritis. CpGs frequently occur in clusters, called CpG islands (CPIs), which are present in the promoter regions of nearly 70% of human genes. The Illumina HumanMethylation450 BeadChip platform provides genome-wide coverage of 485,577 CpGs. Analysis of these CpGs reveals a correlation between changes in DNA methylation and gene expression, although not all sites have the same impact.
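A widely used definition of a CpG island is the classic Gardiner-Garden and Frommer criteria: a stretch of at least 200 bp with a GC content above 50% and an observed-to-expected CpG ratio above 0.6, where the expected CpG count is (number of C × number of G) / sequence length. The sketch below checks a single sequence window against these thresholds. It is only an illustration with hypothetical function names; genome-wide island calling additionally slides or extends windows along the chromosome.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # CG dinucleotides cannot overlap, so str.count is exact
    gc_content = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def is_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply length, GC-content and CpG observed/expected thresholds to one window."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_content > min_gc and obs_exp > min_obs_exp


print(is_cpg_island("CG" * 100))    # True
print(is_cpg_island("CCAGG" * 40))  # False: GC-rich but CpG-depleted
```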

The methylation state of a single CpG or of a whole CPI may influence the expression of the corresponding gene through the binding of methyl-CpG-binding domain (MBD) proteins and other methylation-dependent proteins. To identify CpGs that influence gene expression, as well as common methylation patterns, we use several approaches, including network analysis and machine learning techniques. (Related article: Heßelbach et al., 2017.)
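On the HumanMethylation450 array, methylation at each CpG is usually summarized as a beta value, β = max(M, 0) / (max(M, 0) + max(U, 0) + 100), where M and U are the methylated and unmethylated signal intensities. As a deliberately simple stand-in for the network analysis and machine learning approaches, the sketch below ranks CpGs by the Pearson correlation between their beta values and the expression of the associated gene across samples. The CpG identifiers and numbers are hypothetical, and a real analysis would also need normalization and multiple-testing correction.

```python
import math


def beta_value(meth, unmeth, offset=100.0):
    """Illumina-style beta value from methylated/unmethylated intensities."""
    m, u = max(meth, 0.0), max(unmeth, 0.0)
    return m / (m + u + offset)


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_cpgs(betas_by_cpg, expression):
    """Sort CpGs by the absolute correlation of their beta values with gene expression."""
    scores = {cpg: pearson(betas, expression) for cpg, betas in betas_by_cpg.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical data: four samples, two CpGs annotated to the same gene.
expression = [1.0, 2.0, 3.0, 4.0]
betas = {"cg_a": [0.9, 0.7, 0.5, 0.3], "cg_b": [0.5, 0.4, 0.6, 0.5]}
print(rank_cpgs(betas, expression)[0][0])  # cg_a (strong negative correlation)
```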

Epigenetic drug discovery

Epigenetic mechanisms are essential for normal cellular development and the maintenance of cellular homeostasis. In the past few years, it has become well established that epigenetic aberrations play an important role in a wide range of human diseases, including cancer. Unlike genetic mutations, epigenetic modifications are reversible, which makes them attractive targets for therapy. As a consequence, the last decade has seen the emergence of small-molecule inhibitors against new epigenetic targets implicated in various diseases, particularly cancer.

We use in silico methods to design inhibitors for a broad array of potential targets, including epigenetic targets (e.g. bromodomains, chromodomains and HDACs), methyltransferases and cofactor-binding proteins. These methods include molecular docking and virtual screening, ligand-based approaches such as pharmacophore modeling and QSAR, network analysis and MD simulations. (Publication in preparation.)
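Among ligand-based approaches, similarity searching is one of the simplest forms of virtual screening: library compounds are ranked by the Tanimoto coefficient |A ∩ B| / |A ∪ B| between their binary fingerprints and that of a known active. In the sketch below, fingerprints are hypothetical sets of "on" bit indices; in practice they would be computed with a cheminformatics toolkit, which is outside this sketch. The 0.5 threshold is an arbitrary illustrative choice.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def similarity_screen(query_fp, library, threshold=0.5):
    """Return (name, similarity) for library entries at or above the threshold, best first."""
    hits = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: h[1], reverse=True)


# Hypothetical fingerprints for a known active and a tiny compound library.
query = {1, 2, 3, 4}
library = {"cpd_1": {1, 2, 3, 4}, "cpd_2": {1, 2, 3, 5}, "cpd_3": {7, 8}}
print(similarity_screen(query, library))  # [('cpd_1', 1.0), ('cpd_2', 0.6)]
```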

The diagram displays the superposition of the KAc binding sites of p300 (green) and CBP (yellow) in complex with XDM-CBP (Hügle et al., 2017).

Genome-based secondary metabolite prediction

The secondary metabolism of bacteria, fungi and plants yields a vast number of bioactive substances. The constantly increasing amount of published genomic data makes it possible to identify the corresponding gene clusters efficiently by genome mining. Conversely, the gene clusters encoding many natural products with resolved structures have not yet been identified. Elucidating the structure of the actual secondary metabolite is still challenging, especially because post-assembly modifications cannot yet be predicted.

To address this, we designed SeMPI, a web server that provides a Secondary Metabolite Prediction and Identification pipeline for natural products synthesized by modular type I polyketide synthases (Zierep et al., 2017). Further extensions of SeMPI require improving state-of-the-art prediction algorithms as well as implementing new algorithms adapted to the metabolite class in focus.
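Structure prediction for modular type I PKSs typically rests on the colinearity rule. Each extension module adds one extender unit, and the acyltransferase (AT) specificity determines the side chain: malonyl-CoA adds no branch, methylmalonyl-CoA adds a methyl group. The reductive domains present (KR, DH, ER) then determine the oxidation state of the β-carbon. The sketch below applies only this rule to a hypothetical three-module assembly line; the labels `"mal"` and `"mmal"` are assumed shorthand for the two AT substrates. It is not SeMPI's implementation, and it deliberately ignores inactive domains, stereochemistry, chain release and the post-assembly modifications mentioned above.

```python
# Reductive domain sets, checked from most to least reduced (colinearity rule).
BETA_CARBON_STATES = [
    ({"KR", "DH", "ER"}, "methylene"),  # fully reduced
    ({"KR", "DH"}, "double bond"),      # reduced and dehydrated
    ({"KR"}, "hydroxyl"),               # ketoreduction only
    (set(), "keto"),                    # no reductive processing
]

# Side chain contributed by each assumed AT substrate label.
AT_SIDE_CHAIN = {"mal": "H", "mmal": "CH3"}


def predict_backbone(modules):
    """Map (AT specificity, reductive domains) per module to (side chain, beta-carbon state)."""
    backbone = []
    for at_specificity, domains in modules:
        # First (most reduced) state whose required domains are all present.
        state = next(label for required, label in BETA_CARBON_STATES if required <= set(domains))
        backbone.append((AT_SIDE_CHAIN[at_specificity], state))
    return backbone


# Hypothetical three-module PKS.
modules = [("mmal", {"KR"}), ("mal", {"KR", "DH", "ER"}), ("mmal", set())]
print(predict_backbone(modules))
# [('CH3', 'hydroxyl'), ('H', 'methylene'), ('CH3', 'keto')]
```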

At the core of every secondary metabolite prediction approach lies the accurate functional classification of the proteins responsible for the metabolite's synthesis. To facilitate this task, we designed a pipeline that combines all crucial steps for efficient protein classification. It allows related sequences to be collected and annotated. These can then be used to benchmark suitable machine learning algorithms in parallel, such as profile hidden Markov models, position-specific scoring matrices and optimized decision trees, each with careful parameter optimization. The most efficient classification system can then be incorporated into the prediction software. The advantages of this set-up are the straightforward evaluation of newly created classification algorithms and an update option that keeps the training data up to date. This, in turn, makes it possible to build prediction rules for various kinds of gene cluster products, such as iterative type I polyketides and nonribosomal peptides.
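Of the classifiers listed, a position-specific scoring matrix is the easiest to show compactly. The sketch below builds a log-odds PSSM from an ungapped alignment of equal-length sequences and assigns a query to the class whose PSSM scores it highest. The pseudocount, the uniform background and the class names and sequences are all simplifying or hypothetical assumptions; profile hidden Markov models (for example as built with HMMER) additionally model insertions and deletions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Log2-odds PSSM from equal-length aligned sequences, against a uniform background."""
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(len(aligned[0])):
        column = [seq[pos] for seq in aligned]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2((column.count(aa) + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, query):
    """Sum of per-position log-odds; unknown residues contribute 0 (a simplification)."""
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, query))


def classify(query, pssms):
    """Return the class label whose PSSM gives the query the highest score."""
    return max(pssms, key=lambda label: score(pssms[label], query))


# Hypothetical training alignments for two protein classes.
pssms = {
    "class_A": build_pssm(["ACDE", "ACDE", "ACDQ"]),
    "class_B": build_pssm(["WYVT", "WYVT", "WFVT"]),
}
print(classify("ACDE", pssms))  # class_A
```

Benchmarking would then compare such classifiers on held-out sequences, for example by cross-validated accuracy, before the best one is adopted.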

The feasibility of structure-based classification is also currently being evaluated by incorporating secondary structure assignment tools.
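Secondary structure assignment tools such as DSSP label each residue with one of eight states. One common way to turn this into classification features, sketched below under that assumption, is to collapse the states to helix (H, G, I → H), strand (E, B → E) and coil (everything else → C) and use the per-protein composition as a feature vector. The function names are hypothetical, and other reductions of the eight states are also in use.

```python
# One common 8-to-3 reduction of DSSP codes; anything not listed is treated as coil.
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(dssp_string):
    """Collapse per-residue DSSP codes to H (helix), E (strand) or C (coil)."""
    return "".join(DSSP_TO_3.get(code, "C") for code in dssp_string)


def ss_composition(dssp_string):
    """Fractions of helix, strand and coil residues, usable as classifier features."""
    reduced = reduce_dssp(dssp_string)
    n = len(reduced)
    return {state: (reduced.count(state) / n if n else 0.0) for state in "HEC"}


print(reduce_dssp("HHHGGEEBTS -"))  # HHHHHEEECCCC
```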

Genome-scale metabolic modelling and Flux Balance Analysis

Streptomyces is a genus of ubiquitous soil bacteria known for the rich diversity of biologically active secondary metabolites it produces. The production of these compounds is often coupled to complex phenotypic changes, including morphological differentiation and a reorganization of the organism's metabolic machinery. Despite these comprehensive changes, the yield of the produced substances is typically low.

To explore these intricate regulatory mechanisms and their effects, we employ genome-scale metabolic models. These models are reconstructed from genomic sequence data and include all metabolic processes the organism is capable of. They can be used to systematically organize and interpret data from all kinds of ‘omics’ experiments. With their help, the effects of transcriptional changes or differing protein levels can be deduced through detailed metabolic simulations. (Publication in preparation.)
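Such a model can be represented as a set of reactions, each mapping metabolites to stoichiometric coefficients (negative for consumed, positive for produced). Stacking these gives the stoichiometric matrix S, and simulations such as flux balance analysis assume a steady state in which S·v = 0 for the flux vector v, i.e. no internal metabolite accumulates or is depleted. The sketch below builds S for a hypothetical three-reaction network and checks this condition; genome-scale models contain far more reactions and are usually handled with dedicated modelling software.

```python
def stoichiometric_matrix(reactions):
    """Build S (metabolites x reactions) from {reaction: {metabolite: coefficient}}."""
    metabolites = sorted({met for stoich in reactions.values() for met in stoich})
    reaction_ids = list(reactions)
    S = [[reactions[rxn].get(met, 0) for rxn in reaction_ids] for met in metabolites]
    return metabolites, reaction_ids, S


def is_steady_state(S, fluxes, tol=1e-9):
    """True if S . v = 0 within tolerance, i.e. no metabolite accumulates or depletes."""
    return all(abs(sum(s * v for s, v in zip(row, fluxes))) <= tol for row in S)


# Hypothetical network: uptake of A, conversion A -> B, drain of B into biomass.
reactions = {
    "uptake": {"A": 1},
    "convert": {"A": -1, "B": 1},
    "biomass": {"B": -1},
}
metabolites, reaction_ids, S = stoichiometric_matrix(reactions)
print(S)                              # [[1, -1, 0], [0, 1, -1]]
print(is_steady_state(S, [5, 5, 5]))  # True
print(is_steady_state(S, [5, 3, 3]))  # False: A accumulates
```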

Additionally, we use the insights gained to develop novel and efficient metabolic engineering strategies. The models make it possible to predict the consequences of genetic modifications at a global level, allowing in vitro experiments to be rationally preselected.
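Flux balance analysis makes such predictions by solving a linear program: maximize an objective flux c·v (for example product or biomass formation) subject to S·v = 0 and flux bounds. A gene knockout is simulated by forcing the flux of the reaction it encodes to zero. To stay dependency-free, the sketch below solves the LP with a small dense tableau simplex using Bland's rule and assumes all reactions are irreversible (lower bound 0); the four-reaction network is hypothetical. Genome-scale work would instead use an LP solver through dedicated tooling such as COBRApy.

```python
def fba(S, upper_bounds, objective, eps=1e-9):
    """Maximize objective . v s.t. S v = 0 and 0 <= v <= upper_bounds (tableau simplex, Bland's rule)."""
    n = len(upper_bounds)
    # Rows: S v <= 0 and -S v <= 0 (together S v = 0), then v_j <= ub_j.
    rows = [list(map(float, r)) for r in S] + [[-float(a) for a in r] for r in S]
    rows += [[1.0 if k == j else 0.0 for k in range(n)] for j in range(n)]
    rhs = [0.0] * (2 * len(S)) + [float(u) for u in upper_bounds]
    m = len(rows)
    # Tableau columns: n flux variables, m slack variables, then the right-hand side.
    T = [rows[i] + [1.0 if k == i else 0.0 for k in range(m)] + [rhs[i]] for i in range(m)]
    z = [-float(c) for c in objective] + [0.0] * (m + 1)  # objective row; z[-1] holds the value
    basis = [n + i for i in range(m)]  # all-slack basis: v = 0 is feasible
    while True:
        # Bland's rule: enter the lowest-index column with a negative reduced cost.
        col = next((j for j in range(n + m) if z[j] < -eps), None)
        if col is None:
            break  # optimal
        # Ratio test; ties broken by the lowest basic variable index.
        candidates = [(T[i][-1] / T[i][col], basis[i], i) for i in range(m) if T[i][col] > eps]
        if not candidates:
            raise ValueError("LP is unbounded")
        _, _, r = min(candidates)
        pivot = T[r][col]
        T[r] = [x / pivot for x in T[r]]
        for i in range(m):
            if i != r and T[i][col] != 0.0:
                f = T[i][col]
                T[i] = [a - f * b for a, b in zip(T[i], T[r])]
        f = z[col]
        z = [a - f * b for a, b in zip(z, T[r])]
        basis[r] = col
    fluxes = [0.0] * n
    for i, var in enumerate(basis):
        if var < n:
            fluxes[var] = T[i][-1]
    return z[-1], fluxes


# Hypothetical network (metabolites A, B): uptake -> A, A -> B, A -> byproduct, B -> product.
S = [[1, -1, -1, 0],
     [0, 1, 0, -1]]
upper_bounds = [10, 1000, 1000, 1000]
objective = [0, 0, 0, 1]  # maximize flux into the product

wild_type, _ = fba(S, upper_bounds, objective)
knockout_bounds = list(upper_bounds)
knockout_bounds[1] = 0  # simulate deleting the gene for A -> B
mutant, _ = fba(S, knockout_bounds, objective)
print(wild_type, mutant)  # 10.0 0.0
```

Bland's rule is used because the equality constraints make the LP highly degenerate, and this rule guarantees the simplex method cannot cycle.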

With these combined efforts, we aim to improve access to rare and therefore often expensive biogenic drugs.

Molecular Dynamics (MD) simulation of therapeutically relevant protein targets

We aim to understand the structural dynamics and signaling cascade mechanisms of biological targets with therapeutic potential. We study their microscale atomistic dynamics through distributed MD simulations on the BinAC high-performance computing cluster. Currently, we are working on kinases and epigenetic drug targets. (Publication in preparation.)
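At the heart of any MD engine is a time integrator for Newton's equations of motion. The sketch below shows the velocity Verlet scheme (production codes use this or the closely related leap-frog scheme) for a single particle in a one-dimensional harmonic potential, a toy stand-in for a bonded interaction. Units, parameters and function names are illustrative assumptions; real simulations integrate systems of many thousands of atoms under a full force field.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate one particle in 1D with velocity Verlet; returns final position and velocity."""
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass  # half-step velocity update
        x = x + dt * v_half               # full-step position update
        f = force(x)                      # forces at the new position
        v = v_half + 0.5 * dt * f / mass  # complete the velocity update
    return x, v


# Toy harmonic "bond": F = -k x with k = 1 and mass = 1, so one period is 2*pi.
k, mass, dt = 1.0, 1.0, 0.01
n_steps = round(2 * math.pi / dt)  # roughly one oscillation period
x, v = velocity_verlet(1.0, 0.0, lambda pos: -k * pos, mass, dt, n_steps)
energy = 0.5 * mass * v ** 2 + 0.5 * k * x ** 2
print(abs(energy - 0.5) < 1e-3)  # True: total energy is well conserved
```

Velocity Verlet is time-reversible and symplectic, which is why its energy error stays bounded over long runs instead of drifting, an important property for long MD trajectories.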