Methods Based on Random Finite Sets for Object Tracking in Computer Vision and Robotics

Name

Nicolai Wojke

Status

Completed

Doctorate completed

11 July 2018

Primary supervisor

Prof. Dr.-Ing. Dietrich Paulus

This thesis addresses the automated identification and localization of a time-varying number of objects in a stream of sensor data. The problem is challenging due to its combinatorial nature: if the number of objects is unknown, the number of possible object trajectories grows exponentially with the number of observations. Random finite sets are a relatively new theory that has been developed to arrive at principled and efficient approximations. The theory is built around set-valued random variables that contain an unknown number of elements, which appear in arbitrary order and are themselves random. While extensively studied in theory, random finite sets have not yet become a leading paradigm in practical computer vision and robotics applications.

This thesis explores random finite sets in visual tracking applications. The first method developed in this thesis combines set-valued recursive filtering with global optimization. The problem is approached in a min-cost flow network formulation, which has become a standard inference framework for multiple object tracking due to its efficiency and optimality. A main limitation of this formulation is its restriction to unary and pairwise cost terms, which makes the integration of higher-order motion models challenging. The method developed in this thesis addresses this limitation by applying a Probability Hypothesis Density filter. The Probability Hypothesis Density filter was the first practically implemented state estimator based on random finite sets. It circumvents the combinatorial nature of data association by propagating an object density measure that can be computed efficiently, without maintaining explicit trajectory hypotheses. In this work, the filter recursion is used to augment measurements with an additional hidden kinematic state, which is then used to construct more informed flow network cost terms, e.g., based on linear motion models. The method is evaluated on public benchmarks, where a considerable improvement is achieved compared to network flow formulations that are based on static features alone, such as distance between detections and appearance similarity.

A second part of this thesis focuses on the related task of detecting and tracking a single robot operator in crowded environments. Unlike in the conventional multiple object tracking scenario, the tracked individual can leave the scene and reappear after a longer period of absence. Therefore, a re-identification component is required that picks up the track on re-entrance. Based on random finite sets, the Bernoulli filter is an optimal Bayes filter that provides a natural representation for this type of problem. In this work, it is shown how the Bernoulli filter can be combined with a Probability Hypothesis Density filter to track operator and non-operators simultaneously. The method is evaluated on a publicly available multiple object tracking dataset as well as on custom sequences that are specific to the targeted application. Experiments show reliable tracking in crowded scenes and robust re-identification after long-term occlusion.

Finally, a third part of this thesis focuses on appearance modeling as an essential aspect of any method that is applied to visual object tracking scenarios. To this end, a feature representation that is robust to pose variations and changing lighting conditions is learned offline, before the actual tracking application. This thesis proposes a joint classification and metric learning objective where a deep convolutional neural network is trained to identify the individuals in the training set. At test time, the final classification layer can be stripped from the network and appearance similarity can be queried using cosine distance in representation space. This framework represents an alternative to direct metric learning objectives, which have required sophisticated pair or triplet sampling strategies in the past. The method is evaluated on two large-scale person re-identification datasets, where competitive results are achieved overall. In particular, the proposed method generalizes better to the test set than a network trained with the well-established triplet loss.
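
The test-time appearance query mentioned in the abstract, cosine distance between feature vectors in representation space, can be illustrated with a minimal sketch. This is not the thesis implementation: the function names `cosine_distance` and `nearest_identity`, the gallery layout (a dictionary from identity label to stored feature vectors), and the toy two-dimensional vectors are all assumptions made for illustration. In practice the vectors would be the network's output after the classification layer has been removed.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cos(angle) between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def nearest_identity(query, gallery):
    """Find the gallery identity closest to the query feature.

    `gallery` is a hypothetical structure: a dict mapping an identity
    label to a list of previously stored feature vectors.
    Returns a (label, distance) tuple, or None for an empty gallery.
    """
    best = None
    for label, features in gallery.items():
        for feature in features:
            d = cosine_distance(query, feature)
            if best is None or d < best[1]:
                best = (label, d)
    return best


# Toy usage with made-up 2-D features (assumption, for illustration only).
gallery = {"person_a": [[1.0, 0.0]], "person_b": [[0.0, 1.0]]}
match = nearest_identity([0.9, 0.1], gallery)
```

Because cosine distance depends only on the angle between vectors, rescaling a feature vector does not change the result, so the query compares feature directions rather than magnitudes.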
