LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human Bodies


3DV 2021

Sandro Lombardi¹, Bangbang Yang², Tianxing Fan², Hujun Bao², Guofeng Zhang², Marc Pollefeys¹,³, Zhaopeng Cui²

¹ETH Zurich ²State Key Lab of CAD & CG, Zhejiang University ³Microsoft

Paper

Code (Coming Soon)

Supplementary

Abstract

3D representation and reconstruction of human bodies have been studied for a long time in computer vision. Traditional methods rely mostly on parametric statistical linear models, which limit the space of possible bodies to linear combinations. Only recently have some approaches tried to leverage neural implicit representations for human body modeling; while they demonstrate impressive results, they are either limited in representation capability or not physically meaningful and controllable. In this work, we propose a novel neural implicit representation for the human body that is fully differentiable and optimizable, with disentangled shape and pose latent spaces. Contrary to prior work, our representation is designed based on the kinematic model, which makes it controllable for tasks like pose animation while also allowing the optimization of shape and pose for tasks like 3D fitting and pose tracking. Our model can be trained and fine-tuned directly on non-watertight raw data with well-designed losses. Experiments demonstrate improved 3D reconstruction performance over state-of-the-art (SoTA) approaches and show the applicability of our method to shape interpolation, model fitting, pose tracking, and motion retargeting.

Framework Overview

To learn an SDF for each body part, we use a piecewise deformable model conditioned on a shape code $z_{s}$, a per-joint feature $z_{p}^{b}$ describing pose-dependent deformations, and canonical query points $x^{i}$. The bone transformations needed to obtain $z_{p}^{b}$ can be computed from the SMPL pose joint rotations and the skeleton joints in canonical space (i.e., the T-pose). For the former, we adopt VPoser; for the latter, we introduce our novel VJointer module. The per-joint SDF predictions are combined with a SoftMin function to obtain the SDF values of the final mesh.
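The two geometric steps in this overview can be sketched in NumPy under stated assumptions. `forward_kinematics` is a standard SMPL-style kinematic chain that turns local joint rotations (the kind of output VPoser provides) and canonical T-pose joint locations (the kind of output VJointer provides) into per-bone rigid transforms; `softmin_combine` merges per-part SDF values using softmin weights. Both function names, the `temperature` parameter and its default, and the weighted-sum form of SoftMin are illustrative assumptions, and how LatentHuman turns the bone transforms into $z_{p}^{b}$ is not shown here.

```python
import numpy as np


def forward_kinematics(rotations, joints, parents):
    """Per-bone rigid transforms via an SMPL-style kinematic chain.

    rotations: (J, 3, 3) local joint rotations.
    joints:    (J, 3) joint locations in canonical space (T-pose).
    parents:   length-J list; parents[0] == -1 marks the root, and every
               parent index is assumed to precede its child.
    Returns (J, 4, 4) transforms mapping canonical points to posed points.
    """
    num_joints = len(parents)
    world = np.zeros((num_joints, 4, 4))
    for j in range(num_joints):
        local = np.eye(4)
        local[:3, :3] = rotations[j]
        if parents[j] < 0:
            local[:3, 3] = joints[j]
            world[j] = local
        else:
            # Offset from the parent joint in the rest pose.
            local[:3, 3] = joints[j] - joints[parents[j]]
            world[j] = world[parents[j]] @ local
    # Remove the rotated rest-pose joint location so each bone transform
    # acts on canonical coordinates: x -> R_j (x - j_rest) + t_j.
    bones = world.copy()
    for j in range(num_joints):
        bones[j, :3, 3] -= world[j, :3, :3] @ joints[j]
    return bones


def softmin_combine(part_sdfs, temperature=0.01):
    """Blend per-part SDFs of shape (N, B) into one SDF value per point.

    Weights are softmax(-d / temperature) over parts (hypothetical
    parameterization); as temperature -> 0 the result approaches the
    hard minimum over parts.
    """
    shifted = part_sdfs - part_sdfs.min(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(-shifted / temperature)
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights * part_sdfs).sum(axis=1)
```

A soft rather than hard minimum keeps the combined SDF differentiable with respect to every part, which matters when shape and pose are optimized by gradient descent.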

Applications

Shape Swapping and Pose Animation

By exchanging shape latent codes, we can swap between different shape identities. By exchanging pose parameters, we can also animate one subject with novel poses taken from other subjects.

Pose Tracking and Motion Retargeting

Given a sparse point cloud (shown at the top), we optimize over the pose latent space and retarget the tracked motion to cartoon characters (bottom right).
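Tracking by optimizing over a latent space can be illustrated with a toy problem: gradient descent on the parameters of an SDF so that the observed points fall on its zero level set. In this hedged sketch, a sphere's center and radius stand in for LatentHuman's pose latent code and an analytic sphere SDF stands in for the learned network; the names `sphere_sdf`, `fit_latent`, `lr`, and `steps` are hypothetical, and the actual tracking objective may include further terms.

```python
import numpy as np


def sphere_sdf(points, center, radius):
    """Toy stand-in for a learned SDF: signed distance to a sphere."""
    return np.linalg.norm(points - center, axis=1) - radius


def fit_latent(points, center, radius, lr=0.1, steps=1000):
    """Minimize mean(sdf(points)**2) over (center, radius) by gradient descent."""
    center = np.asarray(center, dtype=float).copy()
    radius = float(radius)
    for _ in range(steps):
        diff = points - center
        dist = np.linalg.norm(diff, axis=1)
        d = dist - radius  # same values as sphere_sdf(points, center, radius)
        # Analytic gradients of the mean squared SDF at the observed points.
        grad_center = np.mean(-2.0 * d[:, None] * diff / dist[:, None], axis=0)
        grad_radius = np.mean(-2.0 * d)
        center -= lr * grad_center
        radius -= lr * grad_radius
    return center, radius


# Usage: noise-free samples on a sphere play the role of the sparse point cloud.
rng = np.random.default_rng(0)
dirs = rng.normal(size=(500, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
true_center = np.array([0.3, -0.2, 0.5])
observed = true_center + 0.8 * dirs
center, radius = fit_latent(observed, np.zeros(3), 0.5)
print(center, radius, np.abs(sphere_sdf(observed, center, radius)).max())
```

With a learned network, the analytic gradients would be replaced by automatic differentiation through the decoder, but the loop has the same shape: evaluate the SDF at the observations, penalize its deviation from zero, and update only the latent variables.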

Fine-tuning on Raw Scans

Thanks to our non-rigid geometric supervision with self-supervised losses, we can fine-tune LatentHuman on the DFaust raw scan dataset or even on the CAPE clothed human dataset.
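Raw scans are not watertight, so reliable inside/outside labels are unavailable. A common remedy (used, e.g., by IGR) is to supervise an SDF only where it is known to vanish and to regularize its gradient norm elsewhere. The sketch below computes such a surface term and an eikonal term for any batched SDF callable, using central differences in place of automatic differentiation. It is an illustrative assumption, not LatentHuman's exact set of losses; `raw_scan_losses` and `numerical_grad` are hypothetical names.

```python
import numpy as np


def numerical_grad(sdf, x, h=1e-4):
    """Central-difference gradient of a batched SDF: (N, D) -> (N, D)."""
    grads = np.zeros_like(x)
    for k in range(x.shape[1]):
        step = np.zeros(x.shape[1])
        step[k] = h
        grads[:, k] = (sdf(x + step) - sdf(x - step)) / (2.0 * h)
    return grads


def raw_scan_losses(sdf, scan_points, space_points):
    """Surface term on scan points and eikonal term on free-space samples."""
    # Scan points lie on the surface, so the SDF should be zero there.
    surface = np.mean(np.abs(sdf(scan_points)))
    # A true signed distance function has unit gradient norm almost everywhere.
    grad_norm = np.linalg.norm(numerical_grad(sdf, space_points), axis=1)
    eikonal = np.mean((grad_norm - 1.0) ** 2)
    return surface, eikonal
```

Neither term needs a closed mesh: the surface term only uses the scanned points themselves, and the eikonal term only constrains the gradient at sampled locations.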

Representation Comparison

AMASS/DFaust

AMASS/MoVi

We compare our method with NASA and LEAP on the AMASS-DFaust and AMASS-MoVi datasets. LatentHuman preserves details better and avoids blend skinning artifacts, e.g., when non-adjacent body parts come close together (such as the hands moving near the head) or in complex poses like cross-legged sitting.

Citation

@inproceedings{lombardi2021latenthuman,
    Title = {LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human Bodies},
    Author = {Lombardi, Sandro and Yang, Bangbang and Fan, Tianxing and Bao, Hujun and Zhang, Guofeng and Pollefeys, Marc and Cui, Zhaopeng},
    Year = {2021},
    Booktitle = {International Conference on 3D Vision (3DV)},
}