A deep learning framework for gym-gesture recognition using the combination of transformer and 3D pose estimation
Hung Le Viet, Han Le Hoang Ngoc, Khoa Tran Dinh Minh, Son Than Van Hong
In recent years, gym training has become an important part of social life, and many people hire personal trainers to monitor their form and improve their fitness more effectively. To support this trend, we propose a deep learning framework that recognizes 22 types of gym gestures, aimed at developing AI-powered personal trainer applications. The framework identifies gym gestures from video in two stages. The first stage is a feature extraction module that uses a 3D pose estimation model to extract skeleton data. The second stage is a fusion module that classifies this skeleton data. With a Transformer-based fusion module, the framework achieves 91.07% accuracy and an 88.94% F1-score. The method is both fast and effective, showing significant potential for real-world applications.
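The data flow of the second stage can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the pose model outputs 17 joints per frame (the abstract does not state the joint count), uses a single untrained self-attention layer with random weights in place of the Transformer-based fusion module, and substitutes random coordinates for real 3D pose estimates. The names `classify`, `init_params`, and `fake_skeleton` are hypothetical. Only the 22-class output comes from the abstract.

```python
import math
import random

NUM_CLASSES = 22   # from the abstract: 22 gym gesture types
NUM_JOINTS = 17    # assumption: the joint count is not given in the abstract
D_MODEL = 16       # assumption: small embedding width chosen for illustration


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(v):
    """Numerically stable softmax over a list of floats."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    """Random (untrained) weight matrix, scaled by 1/sqrt(rows)."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def init_params(seed=0):
    """Hypothetical parameter set; a real model would learn these."""
    rng = random.Random(seed)
    return {
        "embed": rand_matrix(NUM_JOINTS * 3, D_MODEL, rng),
        "wq": rand_matrix(D_MODEL, D_MODEL, rng),
        "wk": rand_matrix(D_MODEL, D_MODEL, rng),
        "wv": rand_matrix(D_MODEL, D_MODEL, rng),
        "head": rand_matrix(D_MODEL, NUM_CLASSES, rng),
    }


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over frame tokens."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)


def classify(skeleton_seq, params):
    """Map a (frames x joints x 3) skeleton sequence to class probabilities."""
    # One token per frame: flatten the (x, y, z) coordinates of every joint.
    tokens = [[c for joint in frame for c in joint] for frame in skeleton_seq]
    x = matmul(tokens, params["embed"])
    attended = self_attention(x, params["wq"], params["wk"], params["wv"])
    # Residual connection, then mean pooling over the time axis.
    x = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x, attended)]
    pooled = [sum(col) / len(x) for col in zip(*x)]
    logits = matmul([pooled], params["head"])[0]
    return softmax(logits)


def fake_skeleton(num_frames, seed=1):
    """Stand-in for stage one: random 3D joints instead of real pose estimates."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1.0, 1.0) for _ in range(3)]
             for _ in range(NUM_JOINTS)]
            for _ in range(num_frames)]


probs = classify(fake_skeleton(num_frames=8), init_params())
predicted_class = max(range(NUM_CLASSES), key=lambda i: probs[i])
```

Because the weights are random, `predicted_class` carries no meaning here. The sketch only shows the shape of the computation: per-frame skeleton tokens go in, attention mixes information across frames, and a probability distribution over the 22 gesture classes comes out.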
CYBERNETICS AND PHYSICS, VOL. 13, NO. 2, 2024, 161–167
DOI: 10.35470/2226-4116-2024-13-2-161-167