Generative Tweening

LONG-TERM INBETWEENING OF 3D HUMAN MOTIONS

Yi Zhou1    Jingwan Lu1    Connelly Barnes1   Jimei Yang1   Sitao Xiang2    Hao Li3

1Adobe Research    2University of Southern California   3Pinscreen


The ability to generate complex and realistic human body animations at scale, while following specific artistic constraints, has been a fundamental goal for the game and animation industry for decades. To this end, we introduce the problem of long-term inbetweening, which involves automatically synthesizing complex motions over a long time interval given very sparse keyframes provided by users. We identify a number of challenges related to this problem, including maintaining biomechanical and keyframe constraints, preserving natural motions, and designing the entire motion sequence holistically while considering all constraints. To solve these problems, we introduce a biomechanically constrained generative adversarial network that performs long-term inbetweening of human motions, conditioned on keyframe constraints. Trained with 79 classes of captured motion data, our network performs robustly on a variety of highly complex motion styles.

Architecture

Our network uses a novel two-stage approach: it first predicts local motion in the form of joint angles, and then predicts global motion, i.e., the global path that the character follows. It also uses a Range-Constrained Forward Kinematics (RC-FK) layer to predict the joint rotations as Euler angles. The RC-FK layer handles noise in the motion dataset and avoids rotation discontinuities such as gimbal lock.
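As an illustration, here is a minimal PyTorch sketch of what a range-constrained forward kinematics layer could look like; the skeleton topology, joint limits, and all names are assumptions for the example, not the authors' implementation. Raw network outputs are squashed with tanh into per-joint Euler-angle ranges, converted to rotation matrices, and accumulated along the kinematic chain to produce 3D joint positions.

```python
# A minimal sketch (not the authors' code) of a range-constrained forward
# kinematics layer in PyTorch. The skeleton topology, joint limits, and
# tensor shapes are illustrative assumptions.
import torch
import torch.nn as nn


def euler_to_matrix(angles):
    """Convert Euler angles (..., 3), applied in X-Y-Z order, to rotation matrices (..., 3, 3)."""
    x, y, z = angles[..., 0], angles[..., 1], angles[..., 2]
    cx, sx = torch.cos(x), torch.sin(x)
    cy, sy = torch.cos(y), torch.sin(y)
    cz, sz = torch.cos(z), torch.sin(z)
    one, zero = torch.ones_like(x), torch.zeros_like(x)
    rx = torch.stack([one, zero, zero, zero, cx, -sx, zero, sx, cx], -1).reshape(*x.shape, 3, 3)
    ry = torch.stack([cy, zero, sy, zero, one, zero, -sy, zero, cy], -1).reshape(*x.shape, 3, 3)
    rz = torch.stack([cz, -sz, zero, sz, cz, zero, zero, zero, one], -1).reshape(*x.shape, 3, 3)
    return rz @ ry @ rx


class RangeConstrainedFK(nn.Module):
    """Squash raw network outputs into per-joint Euler-angle limits, then run
    forward kinematics to obtain 3D joint positions."""

    def __init__(self, parents, offsets, angle_min, angle_max):
        super().__init__()
        self.parents = parents                         # parent index per joint, -1 for the root
        self.register_buffer("offsets", offsets)       # (J, 3) bone offsets in the rest pose
        self.register_buffer("angle_min", angle_min)   # (J, 3) lower joint limits in radians
        self.register_buffer("angle_max", angle_max)   # (J, 3) upper joint limits in radians

    def forward(self, raw):                            # raw: (B, J, 3) unbounded network outputs
        # tanh keeps every predicted rotation inside its allowed range, so the
        # layer never produces out-of-range or discontinuous joint angles.
        mid = 0.5 * (self.angle_max + self.angle_min)
        half = 0.5 * (self.angle_max - self.angle_min)
        angles = mid + half * torch.tanh(raw)
        local_rot = euler_to_matrix(angles)            # (B, J, 3, 3) local joint rotations

        B, J = raw.shape[:2]
        global_rot = [None] * J
        positions = [None] * J
        for j in range(J):                             # accumulate transforms down the kinematic chain
            p = self.parents[j]
            if p < 0:                                  # root joint sits at the local origin
                global_rot[j] = local_rot[:, j]
                positions[j] = torch.zeros(B, 3, device=raw.device)
            else:
                global_rot[j] = global_rot[p] @ local_rot[:, j]
                positions[j] = positions[p] + torch.matmul(global_rot[p], self.offsets[j])
        return torch.stack(positions, dim=1)           # (B, J, 3) joint positions
```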

Flexible Input Keyframe Formats

Our model can be trained to take different types of input keyframes (shown in pink) and generate 3D motions (shown in blue) in between the keyframes. The supported formats include:

1. 3D whole body skeleton

2. 3D partial body joints

3. Spots on the ground

4. 2D skeletons (X-Y plane)

5. 2D sketches


Style and Variation Control

Since many possible motions can typically satisfy the given user constraints, we also enable our network to generate a variety of outputs with a scheme that we call Motion DNA. This approach allows the user to influence the output content by feeding seed motions (DNA) to the network.

Representative poses: randomly or manually picked poses that represent a type of motion.

Motion DNA: a latent code generated from a set of representative poses that indicates the target style of motion.
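To make this concrete, the snippet below sketches one way such a code could be formed; the encoder architecture, pooling choice, and dimensions are assumptions for illustration, not the paper's exact design. Each representative pose is encoded independently and the per-pose codes are pooled into a single latent vector that conditions the generator along with the keyframes.

```python
# A hedged sketch (assumed names and dimensions, not the authors' implementation)
# of turning a set of representative poses into a single Motion DNA latent code.
import torch
import torch.nn as nn


class MotionDNAEncoder(nn.Module):
    def __init__(self, pose_dim=72, latent_dim=128):
        super().__init__()
        # Shared per-pose encoder; the order of the representative poses should not matter.
        self.pose_encoder = nn.Sequential(
            nn.Linear(pose_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, poses):                 # poses: (B, N, pose_dim) representative poses
        codes = self.pose_encoder(poses)      # (B, N, latent_dim) per-pose codes
        return codes.mean(dim=1)              # (B, latent_dim) pooled Motion DNA code


# Usage: the pooled code is broadcast along the time axis and concatenated with
# the keyframe conditioning before being fed to the motion generator.
encoder = MotionDNAEncoder()
dna = encoder(torch.randn(2, 8, 72))          # two Motion DNA codes from 8 poses each
```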

1. Different Motion DNA codes produce different types of motions.

2. Similar Motion DNA codes produce different motions of similar types.

3. Synthesized motions can exhibit different magnitudes of variation.

Big variations

Small variations

4. Different motions given the same keyframe spots.