LatentHuman: Shape-and-Pose Disentangled Latent Representation
for Human Bodies

3DV 2021

Sandro Lombardi1, Bangbang Yang2, Tianxing Fan2, Hujun Bao2, Guofeng Zhang2, Marc Pollefeys1,3, Zhaopeng Cui2

1ETH Zurich    2State Key Lab of CAD & CG, Zhejiang University    3Microsoft



3D representation and reconstruction of human bodies have been studied for a long time in computer vision. Traditional methods rely mostly on parametric statistical linear models, limiting the space of possible bodies to linear combinations. Only recently have some approaches tried to leverage neural implicit representations for human body modeling; while demonstrating impressive results, they are either limited in representation capability or not physically meaningful and controllable. In this work, we propose a novel neural implicit representation for the human body, which is fully differentiable and optimizable with disentangled shape and pose latent spaces. Contrary to prior work, our representation is designed based on the kinematic model, which makes the representation controllable for tasks like pose animation, while simultaneously allowing the optimization of shape and pose for tasks like 3D fitting and pose tracking. Our model can be trained and fine-tuned directly on non-watertight raw data with well-designed losses. Experiments demonstrate improved 3D reconstruction performance over state-of-the-art approaches and show the applicability of our method to shape interpolation, model fitting, pose tracking, and motion retargeting.

Framework Overview


To learn an SDF function per body part, we use a piecewise deformable model conditioned on a shape code $z_{s}$, a per-joint feature $z_{p}^{b}$ describing pose-dependent deformations, and canonical query points $x^{i}$. The bone transformations needed to obtain $z_{p}^{b}$ are computed from SMPL pose joint rotations and the skeleton joints in canonical space (i.e., T-pose). For the former we adopt VPoser, while for the latter we introduce our novel VJointer module. The per-joint SDF predictions are combined with a SoftMin function to obtain the SDF values of the final mesh.
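The SoftMin combination of per-part predictions can be sketched as a differentiable soft minimum over the K per-joint SDF values. The temperature parameter below is an illustrative assumption, not a value from the paper:

```python
import numpy as np

def softmin_combine(part_sdfs, temperature=0.01):
    """Blend per-part SDF predictions into a single SDF value.

    part_sdfs: (K,) or (N, K) array of SDF values, one per body part.
    A soft minimum approximates min() differentiably, so gradients can
    flow to every part during training. `temperature` controls how
    sharply the smallest value dominates (hypothetical default).
    """
    s = np.asarray(part_sdfs, dtype=np.float64)
    # Subtract the per-row minimum before exponentiating for stability.
    shifted = s - s.min(axis=-1, keepdims=True)
    w = np.exp(-shifted / temperature)
    w = w / w.sum(axis=-1, keepdims=True)
    # Convex combination weighted towards the closest part.
    return (w * s).sum(axis=-1)
```

With a small temperature the result approaches the hard minimum of the per-part distances, while remaining smooth with respect to every part's prediction.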

2-Minute Introduction


Shape Swapping and Pose Animation

By exchanging shape latent codes, we can swap between different shape identities. By exchanging pose parameters, we can also animate one subject with novel poses taken from others.

Pose Tracking and Motion Retargeting

Given a sparse point cloud (shown at the top), we can optimize over the pose latent space and retarget the tracked motion to cartoon characters (bottom right).
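As a minimal sketch of this fitting scheme, the loop below optimizes a low-dimensional "pose code" so that sparse observed points fall on the zero level set of an SDF. A sphere SDF stands in for the learned neural decoder, and the function names and hyperparameters are illustrative assumptions, not the paper's:

```python
import numpy as np

def sdf(points, pose_code, radius=1.0):
    """Toy stand-in for the learned SDF decoder: a unit sphere whose
    centre is controlled by a 3-D pose code."""
    return np.linalg.norm(points - pose_code, axis=-1) - radius

def fit_pose(points, steps=200, lr=0.1):
    """Gradient descent on mean squared SDF residuals so the observed
    points end up on the zero level set."""
    z = np.zeros(3)
    for _ in range(steps):
        d = sdf(points, z)                      # residual SDF at observations
        dirs = points - z
        dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
        # d/dz of mean(d^2), using d(||p - z||)/dz = -(p - z)/||p - z||.
        grad = (2.0 * d[:, None] * -dirs).mean(axis=0)
        z -= lr * grad
    return z

# Sparse "scan": 50 points on a unit sphere centred at (0.5, -0.2, 0.3).
rng = np.random.default_rng(0)
v = rng.normal(size=(50, 3))
v /= np.linalg.norm(v, axis=-1, keepdims=True)
target = np.array([0.5, -0.2, 0.3])
z_hat = fit_pose(v + target)
```

In the actual system the same idea applies with the neural SDF and automatic differentiation in place of the hand-derived gradient above.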

Fine-tuning on Raw Scans

Thanks to our non-rigid geometric supervision with self-supervised losses, we can fine-tune LatentHuman on the DFaust raw scan dataset or even on the CAPE clothed human dataset.
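Such self-supervised geometric supervision can be sketched with the IGR-style terms commonly used for fitting implicit surfaces to raw scans: an on-surface term, a normal-alignment term, and an eikonal regularizer. The exact terms and weights below are assumptions for illustration, not the paper's definitions:

```python
import numpy as np

def finetune_losses(sdf_vals, sdf_grads, scan_normals, off_grads,
                    w_normal=1.0, w_eikonal=0.1):
    """Illustrative self-supervised losses for a raw, non-watertight scan.

    sdf_vals    : (N,) predicted SDF at scan points (should be 0 on surface)
    sdf_grads   : (N, 3) SDF gradients at the scan points
    scan_normals: (N, 3) unit sensor normals at the scan points
    off_grads   : (M, 3) SDF gradients at random off-surface samples
    The weights are placeholders, not tuned values.
    """
    # Scan points should lie on the zero level set.
    surface = np.abs(sdf_vals).mean()
    # SDF gradients should align with the observed normals.
    g = sdf_grads / np.linalg.norm(sdf_grads, axis=-1, keepdims=True)
    normal = (1.0 - (g * scan_normals).sum(axis=-1)).mean()
    # Eikonal term: push |grad SDF| towards 1 everywhere.
    eikonal = ((np.linalg.norm(off_grads, axis=-1) - 1.0) ** 2).mean()
    return surface + w_normal * normal + w_eikonal * eikonal
```

A perfect fit (zero SDF on the scan, gradients matching the normals, unit gradient norm off-surface) drives all three terms to zero.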

Representation Comparison


We compare our method with NASA and LEAP on the AMASS-DFaust and AMASS-MoVi datasets. Our LatentHuman preserves details better while avoiding blend-skinning artifacts, e.g., when non-adjacent body parts come close together, such as the hands moving near the head, or in complex poses like cross-legged sitting.

10-Minute Talk


@inproceedings{lombardi2021latenthuman,
    Title = {LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human Bodies},
    Author = {Lombardi, Sandro and Yang, Bangbang and Fan, Tianxing and Bao, Hujun and Zhang, Guofeng and Pollefeys, Marc and Cui, Zhaopeng},
    Booktitle = {International Conference on 3D Vision (3DV)},
    Year = {2021}
}