|Credits||4 SWS / 5 ECTS|
- First lesson of term SS 20: 18.03.2020
The goal of computer vision is to enable machines to see and understand data from images and videos. The central computer vision task in pursuing this goal is object recognition. Due to the immense increase in image and video data, produced by digital cameras and made available on the internet, intelligent systems to monitor, find, filter and automatically organize visual data are urgently needed. In recent years, Deep Learning has revolutionized object recognition applications.
This lecture provides a comprehensive insight into state-of-the-art object recognition methods and algorithms and presents modern applications in which these techniques are implemented. Well-established methods for image processing, filtering, feature extraction and machine learning are covered, as well as the most recent and best-performing Deep Learning architectures.
To give a clearer picture of visual object recognition, a few applications are listed here:
- Digital cameras integrate face detection and automatically focus on the detected face. There are even cameras that recognize smiles.
- Face recognition is implemented e.g. in access control systems or on social media platforms to identify people.
- Driver assistance systems implement e.g. pedestrian detection, lane detection, etc.
- Autonomous cars and robots are equipped with cameras, but they don’t understand their environment without recognition algorithms.
- Applications like Google Goggles recognize landmarks, text, artwork and so on and provide information about the recognized objects.
- Video surveillance systems
- Content-based image search
- Video-data mining
- Automatic image annotation and captioning
- Background subtraction, e.g. in MS Office 2010
- Autostitching to create panorama views (several apps in the app stores)
- Vision based interfaces, e.g. Kinect
- Medical imaging and neuroimaging
Object recognition is commonly divided into
- Recognition of specific objects: e.g. finding a particular person or face, a particular building or a particular traffic sign.
- Recognition of object categories: Here the task is to find and locate instances of a given category in an image, e.g. find faces, pedestrians or cars. This task, in which a single category is given and its instances must be found, is also called object detection. A more challenging task is to recognize instances of multiple categories, e.g. when a robot must identify all objects in its environment.
Machine Learning algorithms are applied for both of these recognition categories. The choice of a suitable ML algorithm for a given task is important. However, even more important is the modelling and description of visual features. Visual features are commonly divided into
- Global features, e.g. the pixel values of the entire image, color histograms or multidimensional receptive field histograms. Global features can be applied directly or after a transformation to a subspace, e.g. by applying Principal Component Analysis or Linear Discriminant Analysis.
- Local features: In contrast to global features, local features do not encode the appearance of the entire image in a single descriptor. Instead, a local feature describes only a small region around a keypoint in the image. Typical keypoints are corners or other distinctive image structures. Usually a large number of local features can be extracted from a single image.
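As a minimal illustration of a global feature, the following sketch computes a joint RGB color histogram over an entire image and compares two images by histogram intersection. It is not taken from the course material: the function names `color_histogram` and `histogram_intersection`, the parameter `bins_per_channel` and the representation of an image as a flat list of `(r, g, b)` tuples are assumptions chosen to keep the example free of dependencies.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Global feature: normalized joint RGB histogram of a whole image.

    `pixels` is an iterable of (r, g, b) tuples with values in 0..255
    (a hypothetical, simplified image representation).
    """
    counts = [0] * bins_per_channel ** 3
    total = 0
    for r, g, b in pixels:
        # Map each 8-bit channel value to a bin index in 0..bins_per_channel-1.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        counts[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1
        total += 1
    if total == 0:
        return [0.0] * len(counts)
    # Normalize so that images of different sizes remain comparable.
    return [c / total for c in counts]


def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms: 1 = identical, 0 = disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Three tiny synthetic "images", given as flat pixel lists.
mostly_red = [(250, 10, 10)] * 90 + [(10, 10, 250)] * 10
also_red = [(230, 20, 30)] * 80 + [(20, 20, 240)] * 20
all_green = [(10, 240, 10)] * 100

h_red = color_histogram(mostly_red)
h_red2 = color_histogram(also_red)
h_green = color_histogram(all_green)

print("red vs. red:  ", histogram_intersection(h_red, h_red2))
print("red vs. green:", histogram_intersection(h_red, h_green))
```

The descriptor summarizes the whole image in a single vector and discards all spatial layout, which is exactly what distinguishes it from a local feature computed around a keypoint.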
Structure, Contents, Documents
|Introduction||Course Structure, Motivation, Definitions, Applications|
|Image Processing Basics||Filtering, Noise Suppression, Pyramids and Scale, Template Matching, Edge Detection||Access and Display Images, Basic Filter Operations, Low Pass Filter|
|Global Features||Pixel Intensities, Color Histograms, Multidimensional Receptive Field Histograms, Probabilistic Recognition|
|Subspace Features||PCA, LDA, Face Recognition with Eigenfaces and Fisherfaces|
|Histogram of Oriented Gradients||HoG feature descriptors, pedestrian detection and tracking||HoG features [.html], HoG features [.ipynb]; Pedestrian Detection [.html]; Pedestrian Detection [.ipynb]|
|Local Features||Harris-Förstner Corner detection, SIFT-Features||Harris Förstner [.html], Harris Förstner [.ipynb]; SIFT Features [.html]; SIFT Features [.ipynb]|
|Specific Object Recognition with local features||Efficient Similarity Search, Indexing Features with Visual Vocabularies, Geometric Verification|
|Window-based object detection||Viola-Jones Face Detection; Pedestrian Detection|
|Generic Object Recognition||Clustering of Local Features, Visual Words, Spatial Pyramid Matching, Sparse Coding|
|Deep Neural Networks for Object Recognition||Convolutional Neural Networks, AlexNet, OverFeat, VGGNet, ResNet, Semantic Segmentation, Deconvolution, Unpooling||CNN|
|Object Detection||R-CNN, SPPnet, Fast R-CNN|
|CNN-based 2D Multiperson Pose Estimation||Pose Estimation Notebook|
|Segmentation||Hierarchical Clustering, Mean-Shift Clustering|
|Tracking||Simple Tracking Strategies, Background Subtraction, Kalman-Filter|