Time Monday, 8.15h-9.45h
Room S105
Credits 2SWS/3ECTS for 113446b + 4SWS/6ECTS for 113446a = 6SWS/9ECTS
Exam t.b.d.

Note: This lecture and the lab exercise Data Mining Lab together constitute the module 113446 Data Mining. It is not possible to attend and receive credit for only one of these two parts. However, it is not necessary to attend them in the same term. Each part is graded independently, and the module grade is a weighted average of both grades. The weights are:
- 3/5 for the lab exercise Data Mining and Pattern Recognition
- 2/5 for the NLP lecture.


  • First lesson in term WS 18/19: Monday, 15.10.2018

Natural Language Processing

Natural Language Processing (NLP) deals with techniques that enable computers to understand the meaning of text written in a natural language. NLP thus constitutes an essential part of Human Computer Interaction (HCI). As a science, NLP can be considered the field where Computer Science, Artificial Intelligence, Machine Learning and Linguistics overlap.

NLP enables applications such as intelligent search engines, dialog systems, question-answering systems, machine translation, document classification, and sentiment analysis or opinion mining.

This lecture teaches the basic techniques of NLP. However, it provides not only the theory but also the implementation of the relevant NLP procedures.
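As a small taste of the kind of implementation covered, the sketch below segments raw text into sentences and word tokens using regular expressions (a topic of the preprocessing lecture). It is not part of the course materials; the deliberately simple splitting rules are an assumption for illustration, and real segmenters must also handle abbreviations, numbers and other edge cases.

```python
import re

TEXT = "NLP enables search engines. It also powers machine translation!"

def segment_sentences(text):
    """Split at sentence-final punctuation followed by whitespace.
    A deliberately naive rule: it breaks on abbreviations like 'e.g.'."""
    return re.split(r"(?<=[.!?])\s+", text.strip())

def tokenize(sentence):
    """Extract word tokens: runs of letters, digits or apostrophes."""
    return re.findall(r"[A-Za-z0-9']+", sentence)

sentences = segment_sentences(TEXT)
print(sentences)
print([tokenize(s) for s in sentences])
```

Libraries such as NLTK (used in the Bird/Klein/Loper book listed below) provide more robust tokenizers, but the regular-expression approach makes the underlying mechanics visible.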

Structure, Contents, Documents

Note: Some of the links below are not yet active, since the corresponding documents are currently being updated. The current state of all Jupyter notebooks can be downloaded from NLP Jupyter Notebooks.

Lecture: Contents (Document Links)
1. Introduction: Definition, applications and challenges of NLP; structure of this course
2. Access Text and Preprocessing: Accessing text from text files, web sites (HTML), RSS feeds, APIs and corpora; segmentation into words and sentences; regular expressions
3. Morphology: Normalisation, stemming, lemmatisation, word similarity, correction
4. POS-Tagging and Chunking: Part-of-speech, tagsets, tagging
5. Vector-Space Document Models: Bag-of-words, tf-idf, similarity measures, Latent Semantic Indexing, information retrieval (links: Gensim Document Model, LSI Model, RSS Topic Extraction)
6. Text Classification I: Naive Bayes classifier, smoothing
7. Language Models: N-gram language models, training and evaluation
8. Distributional Semantics I: Count-based language models: HAL, Random Indexing; neural network language models: Word2Vec, GloVe, … (link: Generate CBOW from Wikidump)
9. Distributional Semantics II: DSMs for sentences, paragraphs and text pieces: compositional models, Doc2Vec, Skip-Thought, …
10. Text Classification II: Neural-network-based classifiers (link: CNN Text Classification)

Further Links:

  • Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (2009) by Daniel Jurafsky and James H. Martin
  • Foundations of Statistical Natural Language Processing (1999) by Christopher D. Manning and Hinrich Schütze
  • Natural Language Processing with Python (2009) by Steven Bird, Ewan Klein, Edward Loper