Computer Vision | Kutz Research Group

Foreground/Background Separation in Video

Sparse Classification of Features

Optimal Sensors and Pixel Measurements for Classification

Pre-processing Strategies for Optimal Gesture Recognition

Selecting Optimal Gestures from Extensive Lexicon

Decision and Classification Protocols for Gesture Recognition

Multi-Resolution Analysis for Target Detection

Foreground/Background Separation in Video Streams

Background/foreground separation is typically an integral step in detecting, identifying, tracking, and recognizing objects in video sequences. Most modern computer vision applications demand algorithms that run in real time and are robust enough to handle diverse, complicated, and cluttered backgrounds. Competitive methods must also be flexible enough to accommodate changes in a scene, for instance illumination changes over the course of a day or a change in the location where the application is deployed. We have recently introduced the method of dynamic mode decomposition (DMD) for robustly separating video frames into background (low-rank) and foreground (sparse) components. DMD terms with Fourier frequencies near the origin (zero modes) are interpreted as the background (low-rank) portions of the given video frames, and the terms with Fourier frequencies bounded away from the origin are their sparse counterparts. An approximate low-rank/sparse separation similar to that of robust principal component analysis is achieved at the computational cost of just one singular value decomposition and one linear equation solve. The method has shown promise for streaming video applications, and we have continued to develop DMD methods around this objective.
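The separation described above can be illustrated with a minimal NumPy sketch. It is not the group's implementation: the function name `dmd_background`, the frequency tolerance `tol`, and the layout of `frames` (one flattened frame per column) are assumptions made for illustration. The sketch uses one singular value decomposition to build the DMD modes and one least-squares solve for the mode amplitudes, then treats modes whose continuous-time frequency lies near the origin as background.

```python
import numpy as np

def dmd_background(frames, rank=None, dt=1.0, tol=1e-2):
    """Split frames (n_pixels x n_frames, real) into background and foreground.

    Hypothetical helper: name, arguments, and defaults are illustrative only.
    """
    X1, X2 = frames[:, :-1], frames[:, 1:]           # time-shifted snapshot pairs
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    if rank is None:                                 # drop numerically zero directions
        rank = int(np.sum(s > 1e-10 * s[0]))
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]

    B = X2 @ Vh.conj().T / s                         # X2 V S^{-1} (columns scaled by 1/s)
    lam, W = np.linalg.eig(U.conj().T @ B)           # eigenpairs of the reduced operator
    Phi = B @ W                                      # DMD modes
    omega = np.log(lam) / dt                         # continuous-time frequencies

    # Mode amplitudes from the first frame: the single linear solve.
    b = np.linalg.lstsq(Phi, frames[:, 0].astype(complex), rcond=None)[0]

    t = np.arange(frames.shape[1]) * dt
    bg = np.abs(omega) < tol                         # modes near the origin = background
    dynamics = b[bg, None] * np.exp(omega[bg, None] * t)
    low_rank = (Phi[:, bg] @ dynamics).real          # background estimate
    sparse = frames - low_rank                       # foreground estimate
    return low_rank, sparse
```

Calling `low_rank, sparse = dmd_background(frames)` returns two arrays of the same shape as `frames`. Taking the real part of the background term is a simplification chosen here; how to handle the complex-valued reconstruction and negative residuals in practice is a design decision this sketch does not address.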

Multi-Resolution Matrix Decompositions for Target Tracking

Our recent development of the multi-resolution dynamic mode decomposition (mrDMD) is inspired by the observation that slow and fast modes can be separated in applications such as foreground/background subtraction in video feeds. The mrDMD recursively removes low-frequency, or slowly varying, content from a given collection of snapshots. In applications such as target tracking, the recursive decomposition is capable of separating objects moving at different speeds in a video feed. Our current efforts are aimed at making the method more robust for accurately tracking targets.
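The recursion described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the published algorithm: the helper names `slow_part` and `mrdmd`, the `cycles` threshold that decides which modes count as slow, and the rule of splitting each window in half are all choices made for this example. At each level the slow DMD content of the current window is reconstructed and subtracted, and the residual is passed to the two half-windows.

```python
import numpy as np

def slow_part(X, rank, cutoff):
    """Reconstruct the slowly varying DMD content of snapshots X (hypothetical helper)."""
    X1, X2 = X[:, :-1], X[:, 1:]
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    r = min(rank, int(np.sum(s > 1e-10 * s[0]))) if s[0] > 0 else 0
    if r == 0:
        return np.zeros(X.shape)
    U, s, Vh = U[:, :r], s[:r], Vh[:r]
    B = X2 @ Vh.conj().T / s                      # X2 V S^{-1}
    lam, W = np.linalg.eig(U.conj().T @ B)
    Phi = B @ W                                   # DMD modes
    with np.errstate(divide="ignore"):
        omega = np.log(lam)                       # frequency per frame
    slow = np.abs(omega) <= cutoff                # keep only slow modes
    if not slow.any():
        return np.zeros(X.shape)
    b = np.linalg.lstsq(Phi, X[:, 0].astype(complex), rcond=None)[0]
    t = np.arange(X.shape[1])
    return (Phi[:, slow] @ (b[slow, None] * np.exp(omega[slow, None] * t))).real

def mrdmd(X, levels, rank=4, cycles=2.0, t0=0, out=None):
    """Return a list of (start, stop, slow_component) tuples, coarsest level first."""
    if out is None:
        out = []
    n_frames = X.shape[1]
    # Assumption: a mode is "slow" if it completes fewer than `cycles`
    # oscillations over the current window.
    cutoff = 2.0 * np.pi * cycles / n_frames
    slow = slow_part(X, rank, cutoff)
    out.append((t0, t0 + n_frames, slow))
    if levels > 1 and n_frames >= 8:
        residual = X - slow                       # remove slow content, then recurse
        half = n_frames // 2
        mrdmd(residual[:, :half], levels - 1, rank, cycles, t0, out)
        mrdmd(residual[:, half:], levels - 1, rank, cycles, t0 + half, out)
    return out
```

Each tuple records the frame window and the slow component found there, so objects moving at different speeds would be expected to appear at different levels of the returned list. The half-splitting and the cutoff rule are the main tuning choices in this sketch.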

Streaming Methods for Matrix Decompositions and Video

High-dimensional streaming data is difficult to assimilate unless principled strategies are put into practice. This is especially true for HD video feeds, which generate enormous amounts of data in a very short amount of time. Analysis of such HD-quality data requires algorithms capable of on-the-fly analysis of the video content. We have been developing streaming architectures that can be implemented on a GPU computing platform. Methods centered around target detection and foreground/background separation are especially relevant.

Gesture Recognition

Research in computer vision continues to be of great technological importance given its potentially vast impact on a wide range of applications in recognition software and human-computer interaction. Computer vision broadly includes mathematical methods and algorithms for acquiring, processing, analyzing, and understanding images, which are often high-dimensional, in order to produce accurate decisions and classifications about what is observed. One increasingly important branch of the computer vision field concerns gesture recognition, where computers are trained to recognize hand signals, facial expressions, and/or eye movements in order to better interface and interact with humans. Many software applications that require gesture detection need only a few gestures to control the software or program, chosen from among the nearly endless number of gestures that a person can articulate. Thus, given a large lexicon of gestures and an application that requires only a small subset, one would like to know which gestures are best to choose for the given application. The solution to this problem is also aided by intelligent pre-processing of the video. Our work integrates these concepts to produce better gesture recognition protocols.

Recent Select Publications

J. N. Kutz, X. Fu, S. Brunton and J. Grosek, Dynamic Mode Decomposition for Robust PCA with Applications to Foreground/Background Subtraction in Video Streams and Multi-Resolution Analysis, in CRC Handbook on Robust Low-Rank and Sparse Matrix Decomposition: Applications in Image and Video Processing, T. Bouwmans, Ed. (CRC Press, 2015).
J. Grosek and J. N. Kutz, Dynamic mode decomposition for real-time background/foreground separation in video, arXiv:1404.7592.
J. Grosek and J. N. Kutz, Selecting a small set of optimal gestures from an extensive lexicon, International Journal of Computer Applications 119 (2015) 1-8.
J. Grosek, P. Shi and J. N. Kutz, Enhanced Gesture Recognition Performance through Improved Pre-Processing, International Journal of Computer Applications 62 (2013) 1-8.