Time only SS
Date Wednesday, 08.15h-11.30h
Room 136
Credits 4 SWS / 5 ECTS
Exam Presentation


  • First lesson of term SS 23: 15.03.2023

Object Recognition

The goal of computer vision is to enable machines to see and understand the content of images and videos. Object recognition is the central computer vision task on the way to this goal. Due to the immense growth of image and video data, produced by digital cameras and made available on the internet, intelligent systems that monitor, find, filter and automatically organize visual data are urgently needed. In recent years, Deep Learning has revolutionized object recognition applications.

This lecture provides a comprehensive insight into state-of-the-art object recognition methods and algorithms and presents modern applications in which these techniques are implemented. Well-established methods for image processing, filtering, feature extraction and machine learning are covered, as well as the most recent and best-performing Deep Learning architectures.

To give a better picture of visual object recognition, a few of its applications are listed here:

  • Face recognition
  • Driver assistance systems and autonomous driving
  • Optical inspection
  • Video surveillance systems, Tracking
  • Document forgery detection
  • Content-based image retrieval (CBIR), automatic image clustering (smartphone photo apps)
  • Video-data mining
  • Automatic image annotation and captioning
  • Background subtraction
  • Autostitching to create panorama views (several apps in the app stores)
  • Vision based interfaces, e.g. Kinect
  • Medical imaging and neuroimaging, e.g. cancer detection
  • Pose Estimation
  • Style Transfer
  • Automatic Image Generation and Modification
  • Super Resolution

Object recognition is commonly divided into

  • Recognition of specific objects (identification): e.g. finding a particular person or face, a particular building or a particular traffic sign
  • Recognition of object categories: Here the task is to find and locate instances of a given category in an image, e.g. find faces, pedestrians or cars. This category can be further subdivided into image classification, object localisation, object recognition and semantic segmentation

Machine Learning algorithms are applied to both of these recognition categories. Choosing a suitable ML algorithm for a given task is important. However, the modelling and description of visual features is even more important. Visual features are commonly divided into

  • Global features, e.g. the pixel values of the entire image, color histograms or multidimensional receptive field histograms. Global features can be applied directly or after a transformation to a subspace, e.g. by applying Principal Component Analysis or Linear Discriminant Analysis
  • Local features: In contrast to global features, local features do not encode the appearance of the entire image in a single descriptor. Instead, a local feature describes only a small region around a keypoint in the image. Keypoints are e.g. edges. Usually a large number of local features can be extracted from a single image.
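
As a rough illustration of this distinction, the following minimal sketch (not taken from the lecture materials) computes one global descriptor for a whole image and one local descriptor per keypoint. The function names, the bin count, the patch radius, the random test image and the keypoint positions are all assumptions chosen for illustration. Real local features such as SIFT use far more elaborate descriptors than raw pixel patches.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Global feature: per-channel histograms of the whole image, concatenated."""
    channels = []
    for c in range(image.shape[2]):
        hist, _ = np.histogram(image[:, :, c], bins=bins, range=(0, 256))
        channels.append(hist)
    feature = np.concatenate(channels).astype(np.float64)
    return feature / feature.sum()  # normalize so the entries sum to 1

def local_patches(image, keypoints, radius=2):
    """Local features: flattened pixel patch around each (row, col) keypoint."""
    descriptors = []
    for r, c in keypoints:
        patch = image[r - radius:r + radius + 1, c - radius:c + radius + 1]
        descriptors.append(patch.reshape(-1))
    return np.stack(descriptors)

# Hypothetical input: a random RGB image and hand-picked keypoints
# (chosen far enough from the border that every patch fits).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
keypoints = [(5, 5), (10, 20), (25, 12)]

global_feature = color_histogram(image)           # one descriptor per image
local_features = local_patches(image, keypoints)  # one descriptor per keypoint

print(global_feature.shape)  # (24,)
print(local_features.shape)  # (3, 75)
```

The global descriptor has a fixed length regardless of image content, whereas the number of local descriptors grows with the number of keypoints found in the image.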


The entire lecture consists of two phases:

  1. The first phase (approx. 10 weeks) is a conventional lecture
  2. In the last 2 weeks students present their work on selected topics related to Object Recognition

The student presentations constitute the exam. Prerequisite for admission to the exam is the completion and submission of the 4 assignments.

Lecture Contents

Today, Machine Learning is an essential part of Object Recognition. It is therefore best to attend the Machine Learning lecture before the Object Recognition lecture. If this is not possible, you may self-study the basics of Machine Learning, Neural Networks and Deep Learning using these videos:

The table below contains old lecture materials. Not relevant for SS 23.

Lecture | Contents | Documents and links
Introduction | Course Structure, Motivation, Definitions, Applications |
Image Processing Basics | Filtering, Noise Suppression, Pyramids and Scale, Template Matching, Edge Detection | Access and Display Images, Basic Filter Operations, Low Pass Filter, Convolution Filtering video
Global Features | Pixel Intensities, Color Histograms, Multidimensional Receptive Field Histograms, Probabilistic Recognition |
Subspace Features | PCA, LDA, Face Recognition with Eigenfaces and Fisherfaces |
Histogram of Oriented Gradients | HoG feature descriptors, pedestrian detection and tracking | HoG features [.ipynb]; Pedestrian Detection [.ipynb]
Local Features | Harris-Förstner Corner Detection, SIFT Features | Harris-Förstner [.ipynb]; SIFT Features [.ipynb]
Specific Object Recognition with local features | Efficient Similarity Search, Indexing Features with Visual Vocabularies, Geometric Verification |
Window based object detection | Viola-Jones Face Detection; Pedestrian Detection |
Generic Object Recognition | Clustering of Local Features, Visual Words, Spatial Pyramid Matching, Sparse Coding |
Deep Neural Networks for Object Recognition | Convolutional Neural Networks, AlexNet, OverFeat, VGGNet, ResNet, Semantic Segmentation, Deconvolution, Unpooling | CNN
Object Detection | R-CNN, SPPnet, Fast R-CNN |
CNN-based 2D Multiperson Pose Estimation | | Pose Estimation Notebook
Segmentation | Hierarchical Clustering, Mean-Shift Clustering |
Tracking | Simple Tracking Strategies, Background Subtraction, Kalman Filter |