Multivariate Statistics 1

Principal component analysis and clustering algorithms

Course description

Requirements:

Programming skills in R (e.g., from the course Introduction to R) and basic knowledge of statistics (e.g., from the course Introduction to Statistics). Some practice with ggplot2 is welcome but not mandatory; it can be acquired in the course Graphics with R.

Course overview:

Participants will learn when and how to apply unsupervised learning methods: PCA for dimension reduction, and k-means, hierarchical clustering, and hybrid approaches for clustering. The course also covers rotation techniques applied after dimension reduction, as well as mixture models, heatmaps, and other clustering methods (DBSCAN, Louvain). The content helps participants understand the theoretical basis of a multivariate analysis. All topics are accompanied by hands-on exercises using the statistical software R. Participants are welcome to ask questions about the analysis of their own data sets.

Topics:

This course on multivariate statistics covers two different topics:

  • Dimension reduction methods: This first chapter focuses on principal component analysis (PCA): what happens "under the hood", how many principal components to retain, and how to visualize and interpret the results. The lecture also gives a brief overview of rotation techniques and other unsupervised multivariate methods (e.g., for categorical variables or for data structured into groups).
  • Cluster analysis: This second chapter describes the measures of dissimilarity and distance that can be used to define clusters. It focuses on the two most frequently used clustering methods, k-means and hierarchical clustering, and on their combination into hybrid algorithms. The chapter also covers the theory and application of mixture models, as well as the R commands for producing heatmaps combined with the result of a clustering algorithm. Finally, two further clustering methods, DBSCAN and the Louvain method for community detection, are introduced at the end of the lecture.

Methods:

Each day consists of blocks that first cover the theory behind the methods and then their application in R. Theoretical lessons are followed by hands-on examples with best-practice solutions.

Format:

  • Duration: 3 days
  • Language: English
  • This course will be offered either on campus (in person) or online.
  • Online courses are held via Zoom.

Dates and Application:

  • Courses provided for Helmholtz Munich:
    • You can check the current dates and whether the courses are already fully booked here*. Registration usually opens 8 weeks before the course.
    • Please read the corresponding FAQ* before applying via the forms of the HR Development department*.
  • Courses provided for HIDA:
    • You can check the current dates and whether the courses are already fully booked here.
    • Registration for these courses is possible only via the provided homepage.

 * Links marked with * are only available for Helmholtz Munich staff.