Multivariate Statistics

Course description


Prerequisites:

Programming skills in R (e.g. from the course Introduction to R) and basic knowledge of statistics (e.g. from the course Introduction to Statistics). Some practice with ggplot2 is also welcome; it can be gained in the course Graphics with R (not mandatory).

Course overview:

Participants will learn when and how to apply unsupervised learning methods such as PCA, MDS, t-SNE or UMAP for dimension reduction, and k-means, hierarchical clustering or hybrid approaches for clustering. The course also covers two supervised learning methods: principal component regression and partial least squares regression. The course content will help participants understand the theoretical basis of a multivariate analysis. All topics are accompanied by hands-on exercises using the statistical software R. Participants are welcome to ask as many questions as they like about analyses of their own datasets.


This course on multivariate statistics covers two different topics:

  • Dimension reduction methods. This first chapter focuses mainly on principal component analysis (PCA): what is "under the hood", how many principal components to retain, and how to visualize and interpret the results. A short overview of other unsupervised multivariate methods (e.g. for categorical variables, for data structured into groups, and multidimensional scaling (MDS)) is also part of the lecture. This chapter also includes a part on dimension reduction for omics data (t-SNE, UMAP). Finally, it covers two supervised learning methods: principal component regression and partial least squares (PLS) regression.
  • Cluster analysis. This second chapter focuses on the two most frequently used clustering methods: k-means and hierarchical clustering (HC). It describes the different dissimilarity and distance measures that can be used to define clusters. A short part also illustrates how to combine both algorithms (k-means and HC) into hybrid algorithms. Finally, this chapter covers the R commands for producing heatmaps together with the result of a clustering algorithm.


Each day consists of blocks that first cover the theory behind the methods and their application in R, followed by hands-on exercises with best-practice solutions.


  • This course will be offered either on campus (in person) or online. The dates of the online and on-campus courses are listed in the table here.
  • For on-campus courses, please check the restrictions and hygiene rules described here*.
  • For online courses, we use the software Zoom. For further information, please check the description here*.

Duration: 3 Days

Language: English


  • Material for the course can be found here*.
  • Please install the necessary R packages prior to the course. The packages are listed in "Material_Multivariate_Statistics.html", which is part of the linked ZIP folder.
  • Please be aware that the materials will be updated shortly before the next course.

Dates and Application:


 * Links marked with * are only available for Helmholtz Munich staff.