|Credits||4 SWS / 5 ECTS|
- First lesson of term SS 23: 15.03.2023
The goal of computer vision is to enable machines to see and understand data from images and videos. To achieve this goal, the central computer vision task is object recognition. Due to the immense growth of image and video data, produced by digital cameras and made available on the internet, intelligent systems that monitor, find, filter and automatically organize visual data are urgently needed. In recent years, Deep Learning has revolutionized object recognition applications.
This lecture provides comprehensive insight into state-of-the-art object recognition methods and algorithms and presents modern applications in which these techniques are implemented. Well-established methods for image processing, filtering, feature extraction and machine learning are covered, as well as the most recent and best-performing Deep Learning architectures.
To give a better picture of visual object recognition, a few applications are listed here:
- Face recognition
- Driver assistance systems and autonomous driving
- Optical inspection
- Video surveillance systems, Tracking
- Document forgery detection
- Content-based image search (CBIR), automatic image clustering (Photo smartphone apps)
- Video-data mining
- Automatic image annotation and captioning
- Background subtraction
- Autostitching to create panorama views (several apps in the app stores)
- Vision based interfaces, e.g. Kinect
- Medical imaging and neuroimaging, e.g. cancer detection
- Pose Estimation
- Style Transfer
- Automatic Image Generation and Modification
- Super Resolution
Object recognition is commonly divided into:
- Recognition of specific objects (identification): e.g. finding a particular person or face, a particular building or a particular traffic sign.
- Recognition of object categories: Here the task is to find and locate instances of a given category in an image, e.g. find faces, pedestrians or cars. This category can be further subdivided into image classification, object localisation, object recognition and semantic segmentation.
Machine Learning algorithms are applied for both of these recognition categories. The choice of a suitable ML algorithm for a given task is important. However, even more important is the modelling and description of visual features. Features are commonly divided into:
- Global features, e.g. the pixel values of the entire image, color histograms or multidimensional receptive field histograms. Global features can be applied directly or after a transformation to a subspace, e.g. by applying Principal Component Analysis or Linear Discriminant Analysis.
- Local features: In contrast to global features, local features do not encode the appearance of the entire image in a single descriptor. Instead, a local feature describes only a small region around a keypoint in the image. Keypoints are e.g. edges. Usually a large number of local features can be extracted from a single image.
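The difference between the two feature types can be illustrated with a minimal NumPy sketch. It is not taken from the lecture materials: the function names `color_histogram`, `pca_project` and `local_patches`, the random test images, the hand-picked keypoint positions and the use of raw pixel patches as local descriptors are all assumptions made for illustration. A real pipeline would use a keypoint detector and a descriptor such as SIFT.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Global feature: concatenated per-channel histogram, L1-normalised."""
    channels = []
    for c in range(image.shape[2]):
        hist, _ = np.histogram(image[:, :, c], bins=bins, range=(0, 256))
        channels.append(hist)
    feature = np.concatenate(channels).astype(np.float64)
    return feature / feature.sum()

def pca_project(features, n_components=2):
    """Project row-vector features onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def local_patches(gray, keypoints, radius=4):
    """Local features: one flattened pixel patch per keypoint (row, col)."""
    patches = []
    for r, c in keypoints:
        patch = gray[r - radius:r + radius + 1, c - radius:c + radius + 1]
        patches.append(patch.ravel())
    return np.array(patches)

# Hypothetical data: five random 32x32 RGB images stand in for real photos.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 32, 32, 3), dtype=np.uint8)

# One global descriptor per image, optionally reduced to a subspace with PCA.
global_features = np.array([color_histogram(img) for img in images])
subspace_features = pca_project(global_features, n_components=2)

# Several local descriptors for a single image (keypoints chosen by hand here).
gray = images[0].mean(axis=2)
keypoints = [(8, 8), (16, 20), (24, 12)]
local_features = local_patches(gray, keypoints)

print(global_features.shape)    # one 24-dimensional vector per image
print(subspace_features.shape)  # the same images in a 2-dimensional subspace
print(local_features.shape)     # one 81-dimensional vector per keypoint
```

Each image yields exactly one global descriptor, whereas the number of local descriptors grows with the number of keypoints found in the image.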
The entire lecture consists of two phases:
- The first phase (approx. 10 weeks) is a conventional lecture
- In the last 2 weeks students present their work on selected topics related to Object Recognition
The student presentations constitute the exam. The prerequisite for admission to the exam is the completion and submission of the 4 assignments.
Jupyterbook of lecture contents: Click on the card Object Recognition on page https://lectures.mi.hdm-stuttgart.de and ask the instructor for the credentials.
Today, Machine Learning is an essential part of Object Recognition. Therefore, it is best to attend the Machine Learning lecture before the Object Recognition lecture. If this is not possible, you may self-study the basics of Machine Learning, Neural Networks and Deep Learning with these videos:
The table below contains old lecture materials, which are not relevant for SS 23.
|Introduction||Course Structure, Motivation, Definitions, Applications|
|Image Processing Basics||Filtering, Noise Suppression, Pyramids and Scale, Template Matching, Edge Detection||Access and Display Images, Basic Filter Operations, Low Pass Filter, Convolution Filtering video|
|Global Features||Pixel Intensities, Color Histograms, Multidimensional Receptive Field Histograms, Probabilistic Recognition|
|Subspace Features||PCA, LDA, Face Recognition with Eigenfaces and Fisherfaces|
|Histogram of Oriented Gradients||HoG feature descriptors, pedestrian detection and tracking||HoG features [.ipynb]; Pedestrian Detection [.ipynb]|
|Local Features||Harris-Förstner Corner detection, SIFT-Features||Harris Förstner [.ipynb]; SIFT Features [.ipynb]|
|Specific Object Recognition with local features||Efficient Similarity Search, Indexing Features with Visual Vocabularies, Geometric Verification|
|Window based object detection||Viola-Jones Face Detection; Pedestrian Detection|
|Generic Object Recognition||Clustering of Local Features, Visual Words, Spatial Pyramid Matching, Sparse Coding|
|Deep Neural Networks for Object Recognition||Convolutional Neural Networks, AlexNet, OverFeat, VGGNet, ResNet, Semantic Segmentation, Deconvolution, Unpooling||CNN|
|Object Detection||R-CNN, SPPnet, Fast R-CNN|
|CNN-based 2D Multiperson Pose Estimation||Pose Estimation Notebook|
|Segmentation||Hierarchical Clustering, Mean-Shift Clustering|
|Tracking||Simple Tracking Strategies, Background Subtraction, Kalman-Filter|