LAC: Latent Action Composition for Skeleton-based Action Segmentation
(ICCV'2023)

Di Yang1     Yaohui Wang1†     Antitza Dantcheva1     Quan Kong3     Lorenzo Garattoni2
Gianpiero Francesca2     François Brémond1

1Inria,  Université Côte d'Azur    2Toyota Motor Europe    3Woven by Toyota

†Corresponding author

Abstract

In this work, we propose Latent Action Composition (LAC), a novel self-supervised framework that learns from synthesized composable motions for skeleton-based action segmentation. LAC includes a novel generation module for synthesizing new sequences. Specifically, we design a linear latent space in the generator to represent primitive motion. New composed motions can be synthesized by simply performing arithmetic operations on the latent representations of multiple input skeleton sequences. LAC leverages these synthesized sequences, which have large diversity and complexity, to learn visual representations of skeletons in both sequence and frame spaces via contrastive learning. The resulting visual encoder has high expressive power and can be effectively transferred to action segmentation tasks by end-to-end fine-tuning, without the need for additional temporal models. We conduct a study focusing on transfer learning and show that representations learned by LAC pre-training outperform the state of the art by a large margin on the TSU, Charades, and PKU-MMD datasets.
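To make the idea of composing motions by latent arithmetic concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a fixed random orthonormal basis in place of the learned generator, and all names (`basis`, `encode`, `decode`, `compose`) and sizes are hypothetical. It only illustrates that, in a linear latent space, adding the latent codes of two skeleton sequences and decoding the sum yields a single composed sequence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not taken from the paper).
T, J, D = 16, 17, 8   # frames, joints, latent dimension
F = J * 3             # flattened per-frame features (x, y, z per joint)

# Hypothetical stand-in for a learned linear latent space:
# a random orthonormal basis with D directions in feature space.
basis, _ = np.linalg.qr(rng.standard_normal((F, D)))  # shape (F, D)

def encode(seq):
    """Project each frame of a (T, F) sequence onto the basis -> (T, D)."""
    return seq @ basis

def decode(latent):
    """Map (T, D) latent codes back to (T, F) skeleton features."""
    return latent @ basis.T

def compose(seq_a, seq_b):
    """Compose two motions by adding their latent codes frame by frame."""
    return decode(encode(seq_a) + encode(seq_b))

# Two random arrays stand in for real skeleton sequences.
seq_a = rng.standard_normal((T, F))
seq_b = rng.standard_normal((T, F))
composed = compose(seq_a, seq_b)
print(composed.shape)
```

In the framework described above, the generator is learned and the composed sequences then serve as training data for contrastive learning; the fixed linear projection here only demonstrates the arithmetic step.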

[Paper]      [Code]      [Bibtex]