Action recognition aims to classify videos of human actions by modeling the video sequence. Existing methods fall into two categories: 3D CNN-based methods achieve satisfactory results but incur a very high computational cost, while 2D methods are efficient but perform relatively poorly. The key challenge is therefore to design algorithms that are both effective and efficient at capturing representative features from videos.
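To make the cost gap concrete, here is a rough back-of-the-envelope sketch that counts multiply-accumulate operations (MACs) for a single convolution layer applied to a short clip. The layer sizes, the clip length, and the function names are illustrative assumptions, not values taken from any particular model; the sketch also assumes stride 1 and "same" padding, so the output has the same size as the input.

```python
def conv2d_macs_per_clip(c_in, c_out, k, h, w, t):
    """MACs for a k x k 2D convolution applied independently to each of t frames."""
    per_frame = c_in * c_out * k * k * h * w
    return per_frame * t


def conv3d_macs_per_clip(c_in, c_out, k, k_t, h, w, t):
    """MACs for a k_t x k x k 3D convolution over a clip of t frames."""
    return c_in * c_out * k_t * k * k * t * h * w


# Hypothetical layer: 64 -> 64 channels, 3x3 spatial kernel, 56x56 feature map,
# 8-frame clip, temporal kernel size 3 for the 3D case.
macs_2d = conv2d_macs_per_clip(c_in=64, c_out=64, k=3, h=56, w=56, t=8)
macs_3d = conv3d_macs_per_clip(c_in=64, c_out=64, k=3, k_t=3, h=56, w=56, t=8)

print(f"2D conv, per clip: {macs_2d:,} MACs")
print(f"3D conv, per clip: {macs_3d:,} MACs")
print(f"3D / 2D ratio: {macs_3d / macs_2d:.1f}")
```

Under these assumptions the 3D layer costs k_t times as much as the 2D layer, and its weight count grows by the same factor. The 2D layer, on the other hand, processes each frame independently and so captures no temporal information unless something else is added to model it.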
VSLab has been working on human action recognition for years and has proposed a variety of methods, including linear dynamical systems (LDS), tree-pattern graph matching, and graph convolutional networks.