Lecture Natural Language Processing

Organisation

Time	Tuesday, 10:00h-11:30h
Room	i003
Credits	2SWS/3ECTS
Exam	written exam

Announcements

First lesson in term WS 22-23: Tuesday, 11.10.2022
This lecture is given by Daniel Grießhaber and Marcel Heisler

Natural Language Processing

Natural Language Processing (NLP) deals with techniques that enable computers to understand the meaning of text written in a natural language. NLP is therefore an essential part of Human-Computer Interaction (HCI). As a science, NLP can be regarded as the field where Computer Science, Artificial Intelligence, Machine Learning and Linguistics overlap.

NLP enables applications such as intelligent search engines, dialog systems, question-answering systems, machine translation, document classification, sentiment analysis and opinion mining.

This lecture teaches the basic techniques of NLP. It covers not only the theory but also the implementation of the relevant NLP procedures.

New Content (as of October 2021)

Jupyter Book as .html: https://griesshaber.pages.mi.hdm-stuttgart.de/nlp/
GitLab repo of the Jupyter Book and all sources: https://gitlab.mi.hdm-stuttgart.de/griesshaber/nlp
Link to Checker Quests

Old Content (before SS 2021)

Note: Some of the links below are not yet active, since the corresponding documents are currently being updated. The current state of all Jupyter notebooks can be downloaded from NLP Jupyter Notebooks.

Lecture	Contents	Document Links
1. Introduction	Definition, Application and Challenges of NLP, Structure of this course
2. Access Text and Preprocess	Access text from text files, web sites (HTML), RSS feeds, APIs, corpora; segmentation into words and sentences; regular expressions
3. Morphology	Normalisation, Stemming, Lemmatisation, Word Similarity, Correction
4. POS-Tagging and Chunking	Part-Of-Speech, Tagsets, Tagging
5. Vector-Space Document Models	Bag-of-Words, tf-idf, similarity measures, Latent Semantic Indexing, Information Retrieval	Gensim Document Model, LSI Model, RSS Topic Extraction
6. Text Classification I	Naive Bayes Classifier, Smoothing
7. Language Models	N-gram language models, training and evaluation
8. Information Extraction	Chunking, Named Entity Recognition
9. Distributional Semantics I	Count-based language models: HAL, Random Indexing. Neural Network Language Models: Word2Vec, GloVe, …	Word Embeddings, Generate CBOW from Wikidump
10. Distributional Semantics II	DSMs for sentences, paragraphs and text pieces: Compositional Models, Doc2Vec, Skip-Thought, …
11. Recurrent Neural Networks	RNN, LSTM, GRU, Attention, seq2seq, HAN	RNN Theory, Hierarchical Attention Networks, seq2seq Model for Translation
12. Text Classification II	Neural Network based Classifiers	CNN Text classification

Literature

Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (2009) by Dan Jurafsky, James H. Martin
Foundations of Statistical Natural Language Processing (1999) by Christopher D. Manning, Hinrich Schütze
Natural Language Processing with Python (2009) by Steven Bird, Ewan Klein, Edward Loper
Neural Network Methods in Natural Language Processing (Morgan and Claypool Publishers, 2017) by Yoav Goldberg
Natural Language Processing with PyTorch (O'Reilly 2019) by D. Rao, B. McMahan
Natural Language Processing in Action (Manning 2019) by H. Lane, H. Hapke, C. Howard