|Credits||2SWS/3ECTS for 113446b + 4SWS/6ECTS for 113446a = 6SWS/9ECTS|
Note: This lecture and the lab exercise Data Mining Lab together constitute the module 113446 Data Mining. It is not possible to attend and receive credit for only one of the two courses. However, they do not have to be attended in the same term. Each of the two courses is graded independently, and the module grade is the weighted average of both grades. The weights are:
- 3/5 for the lab exercise data mining and pattern recognition
- 2/5 for the NLP lecture
- First lesson in term WS 18/19: Monday, 15.10.2018
Natural Language Processing
Natural Language Processing (NLP) deals with techniques that enable computers to understand the meaning of text written in a natural language. NLP is therefore an essential part of Human Computer Interaction (HCI). As a science, NLP can be regarded as the field in which Computer Science, Artificial Intelligence, Machine Learning and Linguistics overlap.
NLP enables applications such as intelligent search engines, dialog systems, question-answering systems, machine translation, document classification, sentiment analysis and opinion mining.
This lecture teaches the basic techniques of NLP. It covers not only the theory but also the implementation of the relevant NLP procedures.
Structure, Contents, Documents
Note: Some of the links below are not yet active, since the corresponding documents are currently being updated. The current state of all Jupyter notebooks can be downloaded from NLP Jupyter Notebooks.
|1. Introduction||Definition, applications and challenges of NLP, structure of this course|
|2. Access Text and Preprocess||Accessing text from text files, websites (HTML), RSS feeds, APIs and corpora, segmentation into words and sentences, regular expressions|
|3. Morphology||Normalisation, stemming, lemmatisation, word similarity, correction|
|4. POS-Tagging and Chunking||Part-Of-Speech, Tagsets, Tagging|
|5. Vector-Space Document Models||Bag-of-Words, tf-idf, similarity measures, Latent Semantic Indexing, Information Retrieval||Gensim Document Model, LSI Model, RSS Topic Extraction|
|6. Text Classification I||Naive Bayes Classifier, Smoothing|
|7. Language Models||N-gram language models, training and evaluation|
|8. Distributional Semantics I||Count-based language models: HAL, Random Indexing. Neural network language models: Word2Vec, GloVe, …||Generate CBOW from Wikidump|
|9. Distributional Semantics II||DSMs for sentences, paragraphs and text passages: Compositional Models, Doc2Vec, Skip-Thought, …|
|10. Text Classification II||Neural Network based Classifiers||CNN Text classification|
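To give an impression of the implementation side of the course, the following is a minimal sketch of the bag-of-words, tf-idf and similarity-measure topics listed under lecture 5. It is not taken from the course notebooks: the whitespace tokenizer, the unsmoothed weighting tf · log(N/df), the function names and the three toy documents are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict: term -> weight) per document.

    Assumed weighting: raw term count times log(N / document frequency).
    """
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # bag-of-words counts for this document
        vectors.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus, for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "stock markets fell sharply today",
]
vectors = tfidf_vectors(docs)
sim_01 = cosine(vectors[0], vectors[1])  # documents sharing several terms
sim_02 = cosine(vectors[0], vectors[2])  # documents sharing no terms
print(sim_01, sim_02)
```

The first two documents share several terms and therefore receive a positive similarity, while the third shares none with the first and receives a similarity of zero. Libraries such as Gensim provide their own document models, typically with different tokenisation and weighting variants.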