Organisation

Time: Tuesday, 10:00-11:30
Room: i003
Credits: 2 SWS / 3 ECTS
Exam: written exam

Announcements

  • First lesson in term WS 22/23: Tuesday, 11.10.2022
  • This lecture is held by Daniel Grießhaber and Marcel Heisler

Natural Language Processing

Natural Language Processing (NLP) deals with techniques that enable computers to understand the meaning of text written in a natural language. NLP thus constitutes an essential part of Human-Computer Interaction (HCI). As a science, NLP can be considered the field where Computer Science, Artificial Intelligence, Machine Learning and Linguistics overlap.

NLP enables applications such as intelligent search engines, dialog systems, question-answering systems, machine translation, document classification, and sentiment analysis (opinion mining).

This lecture teaches the basic techniques of NLP. It covers not only the theory but also the implementation of the relevant NLP procedures.
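To give a flavor of the kind of implementation work the lecture combines with the theory, here is a minimal, self-contained preprocessing sketch in plain Python (an illustration only, not course material; the function name and regex are our own):

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens using a simple regex
    (letters incl. German umlauts); a crude stand-in for a real
    word segmenter such as those covered in the lecture."""
    return re.findall(r"[a-zäöüß]+", text.lower())

tokens = tokenize("NLP enables applications like machine translation.")
print(tokens)
# ['nlp', 'enables', 'applications', 'like', 'machine', 'translation']
```

Real NLP pipelines would of course use a proper tokenizer (e.g. from NLTK or spaCy) that handles punctuation, abbreviations and numbers correctly.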

New Content (as of October 2021)

Old Content (before SS 2021)

Note: Some of the links below are not yet active, since the corresponding documents are currently being updated. The current state of all Jupyter notebooks can be downloaded from NLP Jupyter Notebooks.

Lecture Contents

1. Introduction: Definition, applications and challenges of NLP; structure of this course
2. Access Text and Preprocess: Accessing text from text files, web sites (HTML), RSS feeds, APIs and corpora; segmentation into words and sentences; regular expressions
3. Morphology: Normalisation, stemming, lemmatisation, word similarity, correction
4. POS-Tagging and Chunking: Part-of-speech, tagsets, tagging
5. Vector-Space Document Models: Bag-of-words, tf-idf, similarity measures, Latent Semantic Indexing, information retrieval (notebooks: Gensim Document Model, LSI Model, RSS Topic Extraction)
6. Text Classification I: Naive Bayes classifier, smoothing
7. Language Models: N-gram language models, training and evaluation
8. Information Extraction: Chunking, named entity recognition
9. Distributional Semantics I: Count-based language models (HAL, Random Indexing); neural network language models (Word2Vec, GloVe, …) (notebooks: Word Embeddings, Generate CBOW from Wikidump)
10. Distributional Semantics II: DSMs for sentences, paragraphs and text pieces: compositional models, Doc2Vec, Skip-Thought, …
11. Recurrent Neural Networks: RNN, LSTM, GRU, attention, seq2seq, HAN (notebooks: RNN Theory, Hierarchical Attention Networks, seq2seq Model for Translation)
12. Text Classification II: Neural-network-based classifiers (notebook: CNN Text Classification)
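As a taste of topic 5, the bag-of-words tf-idf weighting can be sketched in a few lines of plain Python (an illustrative sketch with our own function names, not the course's Gensim-based notebook code):

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute tf-idf weights for a list of tokenized documents:
    term frequency (normalised count in the document) times
    inverse document frequency log(N / df)."""
    n = len(docs)
    # document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] / len(doc) * math.log(n / df[t]) for t in tf})
    return weights

docs = [["cat", "sat", "mat"], ["cat", "cat", "dog"], ["dog", "barks"]]
w = tf_idf(docs)
# "sat" occurs in only one document, so it outweighs the more common "cat":
print(w[0]["sat"] > w[0]["cat"])  # True
```

Note that a term occurring in every document receives weight 0, which captures the intuition that ubiquitous words carry little discriminative information; production systems typically use a smoothed idf variant instead.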

Literature

  • Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (2009) by Dan Jurafsky and James H. Martin
  • Foundations of Statistical Natural Language Processing (MIT Press, 1999) by Christopher D. Manning and Hinrich Schütze
  • Natural Language Processing with Python (O'Reilly, 2009) by Steven Bird, Ewan Klein and Edward Loper
  • Neural Network Methods in Natural Language Processing (Morgan & Claypool, 2017) by Yoav Goldberg
  • Natural Language Processing with PyTorch (O'Reilly, 2019) by D. Rao and B. McMahan
  • Natural Language Processing in Action (Manning, 2019) by H. Lane, H. Hapke and C. Howard