|Credits||4 SWS / 5 ECTS|
- First lesson of term SS 21: 16.03.2021
Important notice for the summer term 2021: The course will initially be offered as a synchronous distance learning course during the SARS-CoV-2-related restrictions in term SS 21. I will post the corresponding Zoom link to the group of online registered users in the Persönlicher Stundenplan before 16.03.2021. The timetable according to Starplan will apply. If and when we can meet again in lecture halls, the room indicated in Starplan will apply.
The goal of computer vision is to enable machines to see and understand data from images and videos. The central computer vision task for achieving this goal is object recognition. Due to the immense increase in image and video data, captured by digital cameras and made available on the internet, intelligent systems that monitor, find, filter and automatically organize visual data are urgently needed. In recent years, Deep Learning has revolutionized object recognition applications.
This lecture provides comprehensive insight into state-of-the-art object recognition methods and algorithms and presents modern applications in which these techniques are implemented. Well-established methods for image processing, filtering, feature extraction and machine learning are covered, as well as the most recent and best-performing Deep Learning architectures.
To give a better picture of visual object recognition, a few applications are listed here:
- Face recognition
- Driver assistance systems and autonomous driving
- Optical inspection
- Video surveillance systems, Tracking
- Document forgery detection
- Content-based image search (CBIR), automatic image clustering (Photo smartphone apps)
- Video-data mining
- Automatic image annotation and captioning
- Background subtraction
- Autostitching to create panorama views (several apps in the app stores)
- Vision based interfaces, e.g. Kinect
- Medical imaging and neuroimaging, e.g. cancer detection
- Pose Estimation
- Style Transfer
- Automatic Image Generation and Modification
- Super Resolution
Object recognition is commonly divided into
- Recognition of specific objects (identification): e.g. finding a particular person or face, a particular building or a particular traffic sign
- Recognition of object categories: Here the task is to find and locate instances of a given category in an image, e.g. find faces, pedestrians or cars. This category can be further subdivided into image classification, object localisation, object recognition and semantic segmentation
Machine Learning algorithms are applied for both of these recognition categories. The choice of a suitable ML algorithm for a given task is important. However, the modelling and description of visual features is even more important. Here, features are commonly divided into
- Global features, e.g. the pixel values of the entire image, color histograms or multidimensional receptive field histograms. Global features can be applied directly or after a transformation to a subspace, e.g. by applying Principal Component Analysis or Linear Discriminant Analysis
- Local features: In contrast to global features, local features do not encode the appearance of the entire image in a single descriptor. Instead, a local feature describes only a small region around a keypoint in the image. Keypoints are e.g. edges. Usually a large number of local features can be extracted from a single image.
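The difference between the two feature types can be illustrated with a minimal sketch in plain Python. It is not taken from the course material: the function names `global_histogram`, `histogram_intersection` and `local_patches` are hypothetical, the image is assumed to be a grayscale list of rows with intensities in 0..255, and the keypoints are assumed to be given rather than detected.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]  # assumed format: rows of grayscale intensities in 0..255


def global_histogram(image: Image, bins: int = 4) -> List[float]:
    """Global feature: one normalized intensity histogram for the entire image."""
    counts = [0] * bins
    total = 0
    for row in image:
        for value in row:
            # Map the intensity 0..255 to a bin index 0..bins-1.
            counts[min(value * bins // 256, bins - 1)] += 1
            total += 1
    return [count / total for count in counts]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity of two normalized histograms (1.0 means identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def local_patches(
    image: Image, keypoints: Sequence[Tuple[int, int]], radius: int = 1
) -> List[List[int]]:
    """Local features: one flattened pixel patch per keypoint (row, col).

    Keypoints whose patch would leave the image are skipped.
    """
    height, width = len(image), len(image[0])
    descriptors = []
    for r, c in keypoints:
        if radius <= r < height - radius and radius <= c < width - radius:
            patch = [
                image[i][j]
                for i in range(r - radius, r + radius + 1)
                for j in range(c - radius, c + radius + 1)
            ]
            descriptors.append(patch)
    return descriptors


if __name__ == "__main__":
    # Toy image: dark left half, bright right half.
    image = [[0, 0, 255, 255] for _ in range(4)]
    print(global_histogram(image))                  # one descriptor per image
    print(local_patches(image, [(1, 1), (2, 2)]))   # one descriptor per keypoint
```

The global descriptor has a fixed length regardless of image content, while the number of local descriptors grows with the number of keypoints. Real systems would replace the raw patches with more robust descriptors.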
Structure, Contents, Documents
In the last weeks of the term, student groups have to prepare selected topics in object recognition and present them in 60-minute lectures. This presentation constitutes the exam. Admission to the exam requires completion and submission of the 4 assignments.
GitLab repo for this course (contains all slides, notebooks and the Jupyter Book (HTML version of the Jupyter notebooks)). Note that the repo will be extended and modified during the term. To stay up to date, pull it regularly.
Today, Machine Learning is an essential part of Object Recognition. Therefore, it is best to attend the Machine Learning lecture before the Object Recognition lecture. If this is not possible, you may self-study the basics of Machine Learning, Neural Networks and Deep Learning using these videos:
|Introduction||Course Structure, Motivation, Definitions, Applications|
|Image Processing Basics||Filtering, Noise Suppression, Pyramids and Scale, Template Matching, Edge Detection||Access and Display Images, Basic Filter Operations, Low Pass Filter, Convolution Filtering video|
|Global Features||Pixel Intensities, Color Histograms, Multidimensional Receptive Field Histograms, Probabilistic Recognition|
|Subspace Features||PCA, LDA, Face Recognition with Eigenfaces and Fisherfaces|
|Histogram of Oriented Gradients||HoG Feature Descriptors, Pedestrian Detection and Tracking||HoG features [.ipynb]; Pedestrian Detection [.ipynb]|
|Local Features||Harris-Förstner Corner Detection, SIFT Features||Harris-Förstner [.ipynb]; SIFT Features [.ipynb]|
|Specific Object Recognition with local features||Efficient Similarity Search, Indexing Features with Visual Vocabularies, Geometric Verification|
|Window-based Object Detection||Viola-Jones Face Detection; Pedestrian Detection|
|Generic Object Recognition||Clustering of Local Features, Visual Words, Spatial Pyramid Matching, Sparse Coding|
|Deep Neural Networks for Object Recognition||Convolutional Neural Networks, AlexNet, OverFeat, VGGNet, ResNet, Semantic Segmentation, Deconvolution, Unpooling||CNN|
|Object Detection||R-CNN, SPPnet, Fast R-CNN|
|CNN-based 2D Multiperson Pose Estimation||Pose Estimation Notebook|
|Segmentation||Hierarchical Clustering, Mean-Shift Clustering|
|Tracking||Simple Tracking Strategies, Background Subtraction, Kalman-Filter|