Data Mining Lab

Organisation

Credits	4SWS/6ECTS
Exam	Lab Exercises + oral
Maximum number of students	20
Room	s104
Date	Monday, 10:00h-13:15h

Announcements

First lesson in SS 24: 25.03.2024
This lecture is held by Johannes Theodoridis and Manuel Eberhardinger

Data Mining Lab: Contents

In this course, every student group implements 6 different data mining and pattern recognition applications. A group consists of at most 3 students. Each application should be implemented within one session. The applications to be implemented are described in the subsections below.

For each of the 6 lab exercises:

A Jupyter notebook is provided, which contains the task description and questions.
Students have to prepare before the exercise date. To focus the preparation, the Jupyter notebook of each exercise contains a list of preparation questions. A random selection of these questions will be asked at the start of each exercise.
The tasks formulated in the Jupyter notebook must be implemented in the code cells. Moreover, the questions must be answered in the Jupyter notebook.
Important: Even though it is not always explicitly stated, the obtained results must be discussed scientifically: try to explain the results, document what you find interesting, propose improvements, and so on. This discussion must also be included in the Jupyter notebook.
The completed Jupyter notebooks (as described in the previous items, including the answers to the preparation questions!) must be submitted to the lecturer. Each notebook is due immediately before the start of the next lab exercise. The Jupyter notebook (.ipynb), its .html representation, and a link to download the entire project must be submitted.
Unexcused absence yields a submark of 4.7.

Supervised Learning & Data Visualization

Vehicle Data Analysis

This exercise uses a comprehensive dataset of 25,000 vehicles. Based on this dataset, we implement:

an Explorative Data Analysis (EDA) to understand the data
an entire Machine Learning process, from data access to model evaluation
a classifier to predict productgroup from input features
a regression model to predict CO2 emissions from input features
Hyperparameter-Tuning
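
The classification and tuning steps listed above can be sketched in plain Python. This is only an illustration under stated assumptions: the data is synthetic, the two features (engine power, weight) and the class names are hypothetical, and k-nearest-neighbours is used as a stand-in classifier because the course page does not name a specific model. The point it shows is the split into training, validation and test data, with the hyperparameter k chosen on the validation split only.

```python
import random
from collections import Counter


def make_toy_vehicles(n, seed=0):
    """Generate synthetic (features, label) pairs; not the lab dataset."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.choice(["compact", "suv"])  # hypothetical product groups
        if label == "compact":
            x = (rng.gauss(70, 15), rng.gauss(1.1, 0.15))   # power in kW, weight in t
        else:
            x = (rng.gauss(140, 25), rng.gauss(1.9, 0.25))
        data.append((x, label))
    return data


def standardize(train, others):
    """Scale features with mean and std estimated on the training split only."""
    dims = range(len(train[0][0]))
    means = [sum(x[d] for x, _ in train) / len(train) for d in dims]
    stds = [(sum((x[d] - means[d]) ** 2 for x, _ in train) / len(train)) ** 0.5
            for d in dims]

    def scale(split):
        return [(tuple((x[d] - means[d]) / stds[d] for d in dims), y)
                for x, y in split]

    return scale(train), [scale(split) for split in others]


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(
        train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x))
    )[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def accuracy(train, split, k):
    return sum(knn_predict(train, x, k) == y for x, y in split) / len(split)


data = make_toy_vehicles(300)
train, valid, test = data[:180], data[180:240], data[240:]
train, (valid, test) = standardize(train, [valid, test])

# Hyperparameter tuning: choose k on the validation split, never on the test split.
scores = {k: accuracy(train, valid, k) for k in (1, 3, 5, 9, 15)}
best_k = max(scores, key=scores.get)

# Final evaluation on data that was not used for fitting or tuning.
test_accuracy = accuracy(train, test, best_k)
print(best_k, round(test_accuracy, 3))
```

The same structure carries over to the regression task: only the model and the evaluation metric change, while the separation of tuning and final evaluation stays the same.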

Unsupervised Learning & REST APIs

Clustering of Pokemons

In this exercise, the Pokemon API is queried for Pokemon data, which is then processed and analyzed using various clustering algorithms. The following topics will be discussed or implemented:

Learn how to query APIs with Python
Querying Pokemon features via the API (Data Collection)
Creating features for Pokemons (Feature Extraction + Data Preprocessing)
Implement and analyze different clustering algorithms
Selection of the most meaningful features (Feature Selection)
Clustering of similar Pokemons
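
The data collection, feature extraction and clustering steps above might look roughly like the following sketch. It assumes that "the Pokemon API" refers to the public PokeAPI and its pokemon endpoint; the helper names, the choice of three stats as features and the use of plain k-means are illustrative assumptions, not the lab's prescribed solution. To keep the example runnable without network access, the clustering is demonstrated on hand-made records that only mimic the fields used here.

```python
import json
import random
import urllib.request

# Assumption: the public PokeAPI endpoint; the lab may use a different one.
API_URL = "https://pokeapi.co/api/v2/pokemon/{name}"


def fetch_pokemon(name):
    """Download the JSON record for one Pokemon (needs network access)."""
    request = urllib.request.Request(
        API_URL.format(name=name),
        headers={"User-Agent": "data-mining-lab-sketch"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def extract_features(payload, stat_names=("hp", "attack", "defense")):
    """Flatten the nested 'stats' list into a numeric feature vector."""
    stats = {s["stat"]["name"]: s["base_stat"] for s in payload["stats"]}
    return tuple(float(stats[name]) for name in stat_names)


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def toy_payload(name, hp, attack, defense):
    """Hand-made record with only the fields used above (not real API data)."""
    values = {"hp": hp, "attack": attack, "defense": defense}
    return {
        "name": name,
        "stats": [{"base_stat": v, "stat": {"name": n}} for n, v in values.items()],
    }


payloads = [
    toy_payload("toy-small-1", 45, 49, 49),
    toy_payload("toy-small-2", 40, 45, 50),
    toy_payload("toy-large-1", 100, 120, 110),
    toy_payload("toy-large-2", 105, 125, 100),
]
features = [extract_features(p) for p in payloads]
labels = kmeans(features, k=2)
print(list(zip((p["name"] for p in payloads), labels)))
```

With network access, `extract_features(fetch_pokemon("pikachu"))` would replace the toy records. In a real analysis the features would usually be scaled before clustering, since k-means is sensitive to differing value ranges.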

Collaborative Recommender Systems

Movie Recommendations

Recommender systems are applied in e-commerce to generate customized recommendations. Well-known examples are the Amazon.com recommendations, which are either distributed by e-mail or presented on the Amazon web page after login. To generate these recommendations, the products which have already been purchased or reviewed by the user are taken into account. In this exercise, the currently most popular algorithms (Collaborative Filtering) for generating recommendations are implemented, tested and analysed.
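
As a minimal sketch of the idea, the following example implements user-based collaborative filtering with Pearson correlation as the similarity measure. Which variant and which similarity measure the lab actually uses is not stated here, so both are assumptions; the users, movie titles and ratings are invented for illustration.

```python
from math import sqrt

# Hypothetical ratings on a 1 to 5 scale.
ratings = {
    "alice": {"Movie A": 5.0, "Movie B": 3.0, "Movie C": 4.0},
    "bob": {"Movie A": 4.0, "Movie B": 2.0, "Movie C": 3.5, "Movie D": 5.0},
    "carol": {"Movie A": 1.0, "Movie B": 5.0, "Movie D": 2.0},
}


def pearson(prefs, a, b):
    """Pearson correlation of two users over the items both have rated."""
    shared = [item for item in prefs[a] if item in prefs[b]]
    n = len(shared)
    if n < 2:
        return 0.0
    xs = [prefs[a][item] for item in shared]
    ys = [prefs[b][item] for item in shared]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def recommend(prefs, user):
    """Rank unseen items by the similarity-weighted mean rating of other users."""
    totals, sim_sums = {}, {}
    for other in prefs:
        if other == user:
            continue
        sim = pearson(prefs, user, other)
        if sim <= 0:
            continue  # ignore users with no or opposite taste
        for item, rating in prefs[other].items():
            if item not in prefs[user]:
                totals[item] = totals.get(item, 0.0) + sim * rating
                sim_sums[item] = sim_sums.get(item, 0.0) + sim
    ranked = [(totals[item] / sim_sums[item], item) for item in totals]
    return sorted(ranked, reverse=True)


recommendations = recommend(ratings, "alice")
print(recommendations)
```

In this toy data, alice correlates positively with bob and negatively with carol, so only bob's rating contributes to the prediction for the one movie alice has not yet rated.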

Naive Bayes

Document Classification

A Naive Bayes classifier is implemented for document classification. It is shown how this algorithm can be applied for the classification of different RSS web feeds.
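
A compact sketch of such a classifier is shown below. It assumes a multinomial word-count model with add-one (Laplace) smoothing and a very simple tokenizer; the class and method names are hypothetical and the example headlines are invented stand-ins for RSS feed entries.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocabulary = set()

    def train(self, text, label):
        words = tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def log_score(self, text, label):
        """log P(label) + sum over words of log P(word | label)."""
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        for word in tokenize(text):
            if word not in self.vocabulary:
                continue  # words never seen in training carry no information
            count = self.word_counts[label][word]
            # Add-one smoothing avoids log(0) for words unseen in this class.
            score += math.log((count + 1) / (total_words + len(self.vocabulary)))
        return score

    def classify(self, text):
        return max(self.doc_counts, key=lambda label: self.log_score(text, label))


classifier = NaiveBayes()
# Invented example headlines, not taken from real feeds.
classifier.train("New smartphone processor released", "tech")
classifier.train("Software update fixes security bug", "tech")
classifier.train("Team wins football match", "sport")
classifier.train("Striker scores in cup final", "sport")

print(classifier.classify("processor security update"))
print(classifier.classify("football cup final"))
```

Summing log-probabilities instead of multiplying probabilities avoids numerical underflow for longer documents.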

Principal Component Analysis & Eigenfaces

Face Recognition

In this exercise, a program for face recognition is implemented. For a given set of training images (biometric face photos), Principal Component Analysis (PCA) is applied to calculate the space of eigenfaces. A photo to be recognized is then transformed into the space of eigenfaces, and the closest training photo is determined.
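
The three steps described above (compute eigenfaces, project, find the nearest training photo) can be sketched with NumPy as follows. This is an illustration under assumptions: the "photos" are random arrays standing in for flattened grayscale images, the eigenfaces are obtained via a singular value decomposition of the mean-centered data, and Euclidean distance is used to find the closest training image. The function names are hypothetical.

```python
import numpy as np


def fit_eigenfaces(train, n_components):
    """train: array of shape (n_images, n_pixels) with flattened images."""
    mean_face = train.mean(axis=0)
    centered = train - mean_face
    # The rows of vt are orthonormal principal directions, i.e. the eigenfaces.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_face, vt[:n_components]


def project(images, mean_face, eigenfaces):
    """Coordinates of one or more images in the eigenface space."""
    return (images - mean_face) @ eigenfaces.T


def recognize(query, train_coords, mean_face, eigenfaces):
    """Index of the training image closest to the query in eigenface space."""
    coords = project(query, mean_face, eigenfaces)
    distances = np.linalg.norm(train_coords - coords, axis=1)
    return int(np.argmin(distances))


rng = np.random.default_rng(0)
train = rng.random((6, 32 * 32))  # random stand-ins for 6 face photos of 32x32 pixels
mean_face, eigenfaces = fit_eigenfaces(train, n_components=4)
train_coords = project(train, mean_face, eigenfaces)

# A slightly noisy copy of training image 2 serves as the photo to recognize.
query = train[2] + 0.01 * rng.standard_normal(32 * 32)
match = recognize(query, train_coords, mean_face, eigenfaces)
print(match)
```

Using the SVD of the centered data avoids building the full pixel-by-pixel covariance matrix, which would be very large for realistic image sizes.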

Convolutional Neural Networks with Keras

Traffic Sign Recognition

In this exercise, a Convolutional Neural Network (CNN) for the recognition of German traffic signs must be implemented using TensorFlow and Keras.

Dates and Documents

All notebooks and resources can be downloaded from Ilias Data Mining. To run the Jupyter notebooks, Python and Jupyter must be installed. It is strongly recommended to install the Anaconda Python distribution. This distribution contains not only Python and Jupyter but also nearly all packages required in this lab.

Date	Title
Week 1	Introduction, Organizational aspects
Week 2	Registration, Python Introduction, Environment Setup
Week 3-4	Supervised Learning & Data Visualization
Week 5-6	Unsupervised Learning & REST APIs
Week 7-8	Collaborative Recommender Systems
Week 8	Naive Bayes
Week 9	Principal Component Analysis & Eigenfaces
Week 10	Convolutional Neural Networks with Keras

Literature

Programming Collective Intelligence: Building Smart Web 2.0 Applications by Toby Segaran
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems) (22 June 2005) by I. H. Witten, Eibe Frank
Natural Language Processing with Python by Steven Bird, Ewan Klein, Edward Loper