Bioinformatics Master Practical SS20

Master Practical

Fabian Theis, Tel.: +49 89 2891-7961
Carsten Marr, Tel.: +49 89 3187-2158
Matthias Heinig, Tel.: +49 89 3187-4217


Room: HMGU, Institute of Computational Biology, Building 58a
Date & Time: Block course, 6 weeks starting July 29, with a weekly meeting with the research group

Prerequisites: Bachelor's degree in mathematics, bioinformatics, statistics, or a related field.
Language: English


NOTE: Preliminary course descriptions

Project I: Model inference from protein time-courses in hematopoietic stem cells

Supervisor: Carsten Marr

Abstract: Stochasticity of gene expression becomes apparent when we study the dynamics of single cells rather than a population of cells. In fact, fluctuations in gene expression reveal more information about the underlying mechanisms of transcription and translation than one could obtain from population averages.

In this project we will develop a particle filtering algorithm to infer the parameters of stochastic models from single cell time course data. In particular, we will apply this method to time-lapse microscopy data of two transcription factors in murine blood stem cells. These transcription factors play a major role in stem cell differentiation.
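
To give a rough idea of the technique, here is a minimal sketch of a bootstrap particle filter that estimates the likelihood of a production rate from a noisy time course. It is an illustration under stated assumptions, not the method used in the project: the production/degradation model, its chemical Langevin approximation, the Gaussian measurement noise, and all function names and parameter values are hypothetical choices made for this example.

```python
import math
import random


def step(x, k, g, dt, rng):
    """One Euler-Maruyama step of a chemical Langevin approximation
    for production at rate k and degradation at rate g * x (assumed model)."""
    drift = (k - g * x) * dt
    noise_sd = math.sqrt((k + g * x) * dt)
    # Protein levels cannot be negative, so clip at zero.
    return max(0.0, x + drift + noise_sd * rng.gauss(0.0, 1.0))


def simulate(k, g, x0, n_steps, dt, obs_sd, rng):
    """Simulate one noisy time course (stand-in for real measurements)."""
    x, ys = x0, []
    for _ in range(n_steps):
        x = step(x, k, g, dt, rng)
        ys.append(x + rng.gauss(0.0, obs_sd))
    return ys


def log_likelihood(ys, k, g, x0, dt, obs_sd, n_particles, rng):
    """Bootstrap particle filter estimate of log p(ys | k, g)."""
    particles = [x0] * n_particles
    total = 0.0
    log_norm = math.log(obs_sd * math.sqrt(2.0 * math.pi))
    for y in ys:
        # Propagate every particle through the stochastic model.
        particles = [step(x, k, g, dt, rng) for x in particles]
        # Weight particles by the Gaussian observation density (log scale).
        log_w = [-0.5 * ((y - x) / obs_sd) ** 2 - log_norm for x in particles]
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        # Accumulate the log of the average weight (subtracting m avoids underflow).
        total += m + math.log(sum(w) / n_particles)
        # Resample particles in proportion to their weights.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    data = simulate(k=10.0, g=0.1, x0=100.0, n_steps=60, dt=1.0, obs_sd=5.0, rng=rng)
    for k in (2.0, 5.0, 10.0, 20.0):
        ll = log_likelihood(data, k, 0.1, 100.0, 1.0, 5.0, 200, rng)
        print(f"k = {k:5.1f}  estimated log-likelihood = {ll:.1f}")
```

Evaluating the estimated log-likelihood over a grid of candidate rates, as in the usage example, is the simplest way to turn the filter into a parameter inference procedure; more refined schemes treat the parameters themselves as part of the particle state.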

During the project, you will learn about stochastic dynamics, the associated inference methods, and models of stem cell fate decisions. Requirements are good programming skills and a solid background in statistics/machine learning.

Meetings: Mondays


  1. Kaern, M., Elston, T. C., Blake, W. J. & Collins, J. J. Stochasticity in gene expression: from theories to phenotypes. Nat. Rev. Genet. 6, 451–464 (2005).
  2. Zechner, C., Pelet, S., Peter, M. & Koeppl, H. Recursive Bayesian Estimation of Stochastic Rate Constants from Heterogeneous Cell Populations. 50th IEEE Conf. Decis. Control (2011).

Project II: Inference of transcriptional regulators from single cell open chromatin and gene expression data

Supervisor: Matthias Heinig

Abstract: The identity of a cell is mirrored in its global gene expression profile. Transcription is regulated by transcription factors that recognize specific sequences in cis regulatory elements residing in regions of open chromatin. Recent technological developments allow for the measurement of genome-wide transcriptional profiles as well as open chromatin regions in single cells.

In this project we will develop an approach to infer which transcription factors define cellular identity by integrating gene expression, open chromatin, and known sequence motifs of transcription factors.

During this project you will learn to analyze single cell RNA-seq and single cell ATAC-seq data, to identify transcription factor binding sites from ATAC-seq footprints and to integrate these data in a statistical model.
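
As a small illustration of how known sequence motifs can be matched against open chromatin regions, the following sketch scans a DNA sequence with a position weight matrix and reports the best-scoring window. It is a toy example under stated assumptions: the 4-bp frequency matrix, the uniform background, and the function names are hypothetical, and real analyses would use curated motif databases and footprinting tools rather than this code.

```python
import math

# Hypothetical position frequency matrix for a 4-bp motif (illustrative only).
PFM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform base composition


def log_odds(window):
    """Log2-odds score of one window (same length as the motif, bases A/C/G/T only)."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(PFM, window))


def best_hit(seq):
    """Return (score, start) of the best-scoring window, or None if seq is too short."""
    width = len(PFM)
    if len(seq) < width:
        return None
    scores = [(log_odds(seq[i:i + width]), i) for i in range(len(seq) - width + 1)]
    return max(scores)


if __name__ == "__main__":
    # Stand-in for the sequence of one accessible region.
    region = "TTAGCTGG"
    score, start = best_hit(region)
    print(f"best match at position {start} with log-odds score {score:.2f}")
```

Per-region motif scores of this kind could then serve as features that link accessible regions to the expression of nearby genes in a statistical model.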

Meetings: Mondays


  1. Schmidt, F., Gasparoni, N., Gasparoni, G., Gianmoena, K., Cadenas, C., Polansky, J. K., et al. (2017). Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction. Nucleic Acids Research, 45(1), 54–66.
  2. Gusmao, E. G., Allhoff, M., Zenke, M., & Costa, I. G. (2016). Analysis of computational footprinting methods for DNase sequencing experiments. Nature Methods, 13(4), 303–309.
  3. Cusanovich, D. A., Hill, A. J., Aghamirzaie, D., Daza, R. M., Pliner, H. A., Berletch, J. B., et al. (2018). A Single-Cell Atlas of In Vivo Mammalian Chromatin Accessibility. Cell, 174(5), 1309–1324.e18.

Project III: Exploring the cellular heterogeneity of the mouse hippocampus under stress

Supervisor: Fabian Theis

Abstract: Droplet-based single cell RNA-sequencing has enabled us to profile the cellular heterogeneity of tissues, organs, and even whole organisms. The largest of these so-called “atlases” is a 1.3 million cell dataset of the mouse brain. The next step in single-cell profiling is to understand how these atlases change under perturbations. Together with collaborators at the MPI of Psychiatry, we are generating a dataset of over 120,000 cells to profile how the mouse hippocampus responds to stress under various conditions.

In this project we will begin to analyse the first perturbation atlas of the mouse brain. Using our popular analysis platform, scanpy, we will investigate how hippocampal cellular diversity is affected by stress and neuronal receptor knockout. This project will involve using and adapting various machine learning methods for cellular embedding and clustering, computing gene expression signatures, comparing cellular compositions between conditions, and inferring developmental and other trajectories of brain cell types. We will further adapt machine learning methods for trajectory inference to describe continuous cellular phenotypes in the brain.
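
One of the steps mentioned above, comparing cellular compositions between conditions, can be sketched in a few lines of plain Python. This is a minimal illustration, not part of scanpy or of the project code: the condition names, cell type labels, and function names are made up for the example, and the cell annotations are assumed to be available already as (condition, cell type) pairs.

```python
from collections import Counter, defaultdict


def composition(cells):
    """Map each condition to the fraction of cells per cell type.

    cells: iterable of (condition, cell_type) pairs, one per cell.
    """
    counts = defaultdict(Counter)
    for condition, cell_type in cells:
        counts[condition][cell_type] += 1
    return {
        cond: {ct: n / sum(c.values()) for ct, n in c.items()}
        for cond, c in counts.items()
    }


def composition_shift(fractions, reference, perturbed):
    """Difference in cell type fractions (perturbed minus reference)."""
    types = set(fractions[reference]) | set(fractions[perturbed])
    return {
        ct: fractions[perturbed].get(ct, 0.0) - fractions[reference].get(ct, 0.0)
        for ct in sorted(types)
    }


if __name__ == "__main__":
    # Toy annotations standing in for clustered single-cell data.
    cells = (
        [("control", "neuron")] * 6 + [("control", "astrocyte")] * 4
        + [("stress", "neuron")] * 5 + [("stress", "astrocyte")] * 5
    )
    fractions = composition(cells)
    for cell_type, delta in composition_shift(fractions, "control", "stress").items():
        print(f"{cell_type}: {delta:+.2f}")
```

Such differences in proportions are only descriptive; deciding whether a shift is meaningful requires replicate samples and an appropriate statistical test.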

During this project you will learn how single-cell RNA-seq data is pre-processed, analysed, and interpreted to obtain biological insights. You will familiarize yourself with how machine learning is applied in a biological context and you will work with analysis methods that represent the current state-of-the-art in the single-cell field. Experience in Python and a basic understanding of statistics are required for this project.

Meetings: Thursdays


  1. Schaum, N., Karkanias, J., Neff, N., et al., Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris, Nature (2018) 562(7727) 367-372
  2. Wolf, F.A., Angerer, P., Theis, F.J., SCANPY: large-scale single-cell gene expression data analysis, Genome Biology (2018) 19(1) 15
  3. Haghverdi, L., Buettner, M., Wolf, F.A., Buettner, F., Theis, F.J., Diffusion pseudotime robustly reconstructs lineage branching, Nature Methods (2016) 13(10) 845-848