- First lesson in the winter term WS 21-22: Wednesday, 06.10.2021
Natural Language Processing
Natural Language Processing (NLP) deals with techniques that enable computers to understand the meaning of text written in a natural language. NLP is therefore an essential part of Human-Computer Interaction (HCI). As a science, NLP can be considered the field where Computer Science, Artificial Intelligence, Machine Learning and Linguistics overlap.
NLP enables applications like intelligent search engines, dialog systems, question-answering systems, machine translation, document classification, and sentiment analysis (opinion mining).
This lecture teaches the basic techniques of NLP. It covers not only the theory but also the implementation of the relevant NLP procedures.
New Content (as of October 2021)
- Jupyter Book as HTML: https://hannibunny.github.io/nlpbook/intro.html
- GitHub repo of the Jupyter Book and all sources: https://github.com/hannibunny/nlpbook
- Link to Checker Quests
Old Content (before SS 2021)
Note: Some of the links below are not yet active, since the corresponding documents are currently being updated. The current state of all Jupyter notebooks can be downloaded from NLP Jupyter Notebooks.
| Topic | Contents | Notebooks |
|---|---|---|
| 1. Introduction | Definition, applications and challenges of NLP; structure of this course | |
| 2. Access Text and Preprocess | Access text from text files, websites (HTML), RSS feeds, APIs and corpora; segmentation into words and sentences; regular expressions | |
| 3. Morphology | Normalisation, stemming, lemmatisation, word similarity, correction | |
| 4. POS-Tagging and Chunking | Part-of-speech, tagsets, tagging | |
| 5. Vector-Space Document Models | Bag-of-words, tf-idf, similarity measures, Latent Semantic Indexing, information retrieval | Gensim document model, LSI model, RSS topic extraction |
| 6. Text Classification I | Naive Bayes classifier, smoothing | |
| 7. Language Models | N-gram language models, training and evaluation | |
| 8. Information Extraction | Chunking, named entity recognition | |
| 9. Distributional Semantics I | Count-based language models: HAL, Random Indexing; neural network language models: Word2Vec, GloVe, … | Word embeddings, generate CBOW from Wikipedia dump |
| 10. Distributional Semantics II | DSMs for sentences, paragraphs and text pieces: compositional models, Doc2Vec, Skip-Thought, … | |
| 11. Recurrent Neural Networks | RNN, LSTM, GRU, attention, seq2seq, HAN | RNN theory, hierarchical attention networks, seq2seq model for translation |
| 12. Text Classification II | Neural-network-based classifiers | CNN text classification |
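To give a flavour of the implementation side of the course, the following short sketches illustrate a few of the techniques listed in the table above. First, a minimal preprocessing sketch for topics 2-4 (tokenisation, POS tagging, stemming, lemmatisation) using NLTK; the sample sentence is made up, and the exact names of the downloadable NLTK resources may differ slightly between NLTK versions.

```python
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

# One-time downloads of the required NLTK resources
# (resource names may differ slightly across NLTK versions)
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")
nltk.download("wordnet")

text = "The cats were sitting on the mats."

# Segmentation into words (topic 2)
tokens = nltk.word_tokenize(text)

# Part-of-speech tagging (topic 4)
print(nltk.pos_tag(tokens))

# Stemming vs. lemmatisation (topic 3)
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
print([stemmer.stem(t) for t in tokens])        # e.g. "sitting" -> "sit"
print([lemmatizer.lemmatize(t) for t in tokens])
```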
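For topic 5, a minimal sketch of a bag-of-words, tf-idf and Latent Semantic Indexing pipeline with Gensim, the library named in the notebook column; the three tiny documents are invented for illustration.

```python
from gensim import corpora, models

# Toy corpus of already tokenised documents (made-up example data)
docs = [["human", "computer", "interaction"],
        ["machine", "learning", "for", "language"],
        ["human", "language", "processing"]]

# Bag-of-words representation: each document becomes (word_id, count) pairs
dictionary = corpora.Dictionary(docs)
bow = [dictionary.doc2bow(d) for d in docs]

# tf-idf weighting on top of the raw bag-of-words counts
tfidf = models.TfidfModel(bow)

# Latent Semantic Indexing with two latent topics
lsi = models.LsiModel(tfidf[bow], id2word=dictionary, num_topics=2)
print(lsi.print_topics())
```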
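For topic 6, a sketch of a Naive Bayes text classifier using scikit-learn; the training documents and labels are invented, and `alpha=1.0` corresponds to the add-one (Laplace) smoothing mentioned in the table.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny invented training set (sentiment labels)
docs = ["a great and moving film", "simply wonderful",
        "boring and far too long", "a terrible movie"]
labels = ["pos", "pos", "neg", "neg"]

# Bag-of-words features + multinomial Naive Bayes;
# alpha=1.0 is add-one (Laplace) smoothing
clf = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
clf.fit(docs, labels)

print(clf.predict(["a wonderful film"]))  # expected: ['pos']
```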
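For topic 7, a sketch of training and querying a smoothed n-gram language model with NLTK's `lm` module; the two training sentences are made up, and a real experiment would of course use a much larger corpus.

```python
from nltk.lm.preprocessing import padded_everygram_pipeline
from nltk.lm import Laplace

# Toy training corpus: tokenised sentences (made-up data)
corpus = [["natural", "language", "processing"],
          ["language", "models", "predict", "the", "next", "word"]]

# Prepare padded bigrams and the vocabulary, then train the model
train_ngrams, vocab = padded_everygram_pipeline(2, corpus)
lm = Laplace(2)          # bigram model with add-one smoothing
lm.fit(train_ngrams, vocab)

# Smoothed conditional probability P("language" | "natural")
print(lm.score("language", ["natural"]))
```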
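Finally, for topic 9, a sketch of training CBOW word embeddings with Gensim's Word2Vec; the toy sentences stand in for the Wikipedia dump mentioned in the notebook column, and the parameter names follow Gensim 4.x (older versions use `size` instead of `vector_size`).

```python
from gensim.models import Word2Vec

# Toy corpus of tokenised sentences; in the course a Wikipedia
# dump would be used instead (made-up data here)
sentences = [["natural", "language", "processing"],
             ["word", "embeddings", "capture", "meaning"],
             ["language", "models", "use", "embeddings"]]

# sg=0 selects the CBOW architecture (sg=1 would be skip-gram)
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=0)

print(model.wv["language"][:5])           # first 5 dimensions of one vector
print(model.wv.most_similar("language"))  # nearest neighbours in vector space
```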