Multivariate Statistics 1
Principal component analysis and clustering algorithms
Course description
Requirements:
Programming skills in R (e.g. from the course Introduction to R) and basic knowledge of statistics (e.g. from the course Introduction to Statistics). Some practice with ggplot2 is also welcome but not mandatory; it can be acquired in the course Graphics with R.
Course overview:
Participants will learn when and how to apply unsupervised learning methods: PCA for dimension reduction, and k-means, hierarchical clustering, or hybrid approaches for clustering. The course also covers rotation techniques applied after dimension reduction, as well as mixture models and heatmaps (clustering). The content will help participants understand the theoretical basis of a multivariate analysis. All topics are accompanied by hands-on exercises using the statistical software R. Participants are welcome to ask questions about the analysis of their own datasets.
Topics:
This course on multivariate statistics covers two different topics:
- Dimension reduction methods. This first chapter focuses on principal component analysis (PCA): what is "under the hood", how many principal components to choose, and how to visualize and interpret the results. The lecture also gives a short overview of rotation techniques and other unsupervised multivariate methods (e.g. for categorical variables or data structured into groups).
- Cluster analysis. This second chapter focuses on the two most frequently used clustering methods: k-means and hierarchical clustering (HC). It describes the different dissimilarity and distance measures that can be used to define clusters. A short part also illustrates how to combine both algorithms (k-means and HC) into hybrid algorithms. Finally, this chapter covers the theory and application of mixture models, as well as the R commands for producing heatmaps together with the result of a clustering algorithm.
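To give a flavour of the two topics, the following is a minimal sketch in base R and is not taken from the course materials. The built-in `USArrests` dataset, the choice of three clusters, Ward linkage, and the particular hybrid scheme (starting k-means from the hierarchical cluster means) are all assumptions made purely for illustration; the course may use different data, packages, and settings.

```r
# Illustrative sketch only (assumed data and settings, not course material).
# Uses base R ('stats') and the built-in USArrests dataset.
data(USArrests)

# --- Dimension reduction: PCA on centered and scaled variables ---
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Scores of the observations on the first two components
scores <- pca$x[, 1:2]

# --- Cluster analysis on the standardized data ---
x <- scale(USArrests)

# k-means with k = 3 (arbitrary choice), several random starts
set.seed(1)
km <- kmeans(x, centers = 3, nstart = 25)

# Hierarchical clustering: Euclidean distances, Ward linkage
hc <- hclust(dist(x, method = "euclidean"), method = "ward.D2")
hc_groups <- cutree(hc, k = 3)

# One possible hybrid: start k-means from the HC cluster means
init_centers <- rowsum(x, hc_groups) / as.vector(table(hc_groups))
km_hybrid <- kmeans(x, centers = init_centers)

# Compare the k-means and hierarchical partitions
print(table(kmeans = km$cluster, hclust = hc_groups))
```

Calling `plot(hc)` draws the dendrogram, and `heatmap(x)` from base R shows a heatmap with row and column dendrograms; both are omitted above to keep the sketch free of graphics output.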
Methods:
Each day consists of blocks that first cover the theory behind the methods and then their application in R. Theoretical lessons are followed by hands-on examples with best-practice solutions.
Format:
- Duration: 3 Days
- Language: English
- This course is offered either on campus (in person) or online.
- Online courses are held via Zoom.
Materials:
- Material for the course can be found here*.
- Please install the necessary R packages prior to the course. The packages are listed in "Materials_Multivariate_Statistics_1.html", which is part of the linked ZIP folder.
- Please be aware that the materials will be updated shortly before the next course.
Dates and Application:
- Courses provided for Helmholtz Munich:
- You can check the current dates and whether the courses are already fully booked here*. The course registration will usually open 8 weeks prior to the course.
- Please read the corresponding FAQ* before applying via the forms of the HR Development department*.
- NEWS: As long as the internal registration homepage CaMS is not working, please contact Elmar Spiegel for registrations.
- Courses provided for HIDA:
* Links marked with * are only available for Helmholtz Munich staff.