Lightweight 3D Dense Face Alignment? New Chinese Model is ‘Fast, Accurate and Stable’

  • Synced
  • Published: 2020-09-28

In recent years, 3D face reconstruction and face alignment tasks have gradually been combined into one task: 3D dense face alignment, which is the reconstruction of a face’s 3D geometric structure with pose information. In a new paper, a group of researchers from the Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beihang University and Westlake University propose a novel regression framework in pursuit of “fast, accurate and stable 3D dense face alignment simultaneously.”

3D dense face alignment can power face-related tasks such as facial recognition, animation, facial tracking, attribute classification and image restoration. The researchers say existing methods mainly concentrate on accuracy, often at the cost of speed, which limits the scope of practical applications.

Aiming to strike a balance among speed, accuracy, and stability, they propose a meta-joint optimization strategy to dynamically regress a small set of 3D Morphable Model (3DMM) parameters, which greatly enhances speed and accuracy simultaneously. They then present a virtual synthesis method to further improve stability on videos.

Recent studies fall mainly into two categories, the researchers explain: 3DMM parameter regression and dense vertex regression. Dense vertex regression methods directly regress the coordinates of all the 3D points (usually more than 20,000) through a fully convolutional network. However, the resolution of the reconstructed faces depends on the size of the feature map, and these methods need heavy networks that are slow and memory-consuming.

Compared with dense vertices, 3DMM parameters have low dimensionality and low redundancy, which the researchers regard as better suited to regression with a lightweight network. The regression is still challenging, however, because different 3DMM parameters influence the reconstructed 3D face differently, so parameters must be re-weighted according to their importance during training.
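To make the dimensionality argument concrete, here is a minimal sketch of how a linear 3DMM can turn a short parameter vector into a dense set of vertices. It is an illustration under stated assumptions, not the authors' code: the function name `reconstruct_vertices`, the parameter count of 50, and the random mean shape and basis are all hypothetical stand-ins for a real face model, and pose is left out.

```python
import numpy as np


def reconstruct_vertices(mean_shape, basis, params):
    """Linear 3DMM sketch: flat vertices = mean + basis @ params, reshaped to (N, 3)."""
    flat = mean_shape + basis @ params
    return flat.reshape(-1, 3)


rng = np.random.default_rng(0)
num_vertices = 20000  # roughly the vertex count mentioned in the article
num_params = 50       # assumed size of the parameter vector (illustrative only)

# Random stand-ins for a real mean face and basis (hypothetical data).
mean_shape = rng.normal(size=3 * num_vertices)
basis = rng.normal(size=(3 * num_vertices, num_params))
params = rng.normal(size=num_params)

vertices = reconstruct_vertices(mean_shape, basis, params)
print(vertices.shape)
print(params.size, "parameters versus", vertices.size, "vertex coordinates")
```

In this toy setup a network would only need to predict `num_params` numbers per face, while a dense vertex regressor would have to output every coordinate in `vertices` directly.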

The proposed architecture comprises four parts: a lightweight backbone such as MobileNet for predicting 3DMM parameters, a meta-joint optimization of two loss terms, fWPDC and VDC, a landmark-regression regularization, and a short-video synthesis for training.

To handle the optimization problem of the parameter regression framework, the researchers exploit two loss terms, Vertex Distance Cost (VDC) and Weighted Parameter Distance Cost (WPDC). They propose a fast WPDC (fWPDC), along with a meta-joint optimization that combines the advantages of fWPDC and VDC.
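The difference between a vertex-space loss and a weighted parameter-space loss can be sketched on a toy linear model. Everything below is an assumption made for illustration: the helper names, the tiny random basis, and in particular the weighting rule, which scores each parameter by how far the vertices move when only that parameter is wrong. It does not reproduce the paper's fWPDC or its meta-joint optimization.

```python
import numpy as np


def reconstruct(mean_shape, basis, params):
    """Toy linear face model: flat vertex vector from parameters (no pose)."""
    return mean_shape + basis @ params


def vdc(mean_shape, basis, pred, gt):
    """Vertex distance cost: squared distance between the two reconstructions."""
    diff = reconstruct(mean_shape, basis, pred) - reconstruct(mean_shape, basis, gt)
    return float(np.sum(diff ** 2))


def parameter_weights(mean_shape, basis, pred, gt):
    """Assumed weighting: vertex displacement caused by each wrong parameter alone."""
    target = reconstruct(mean_shape, basis, gt)
    weights = np.empty_like(gt)
    for i in range(gt.size):
        mixed = gt.copy()
        mixed[i] = pred[i]  # only parameter i takes the predicted value
        weights[i] = np.linalg.norm(reconstruct(mean_shape, basis, mixed) - target)
    return weights


def weighted_pdc(pred, gt, weights):
    """Weighted parameter distance cost: per-parameter squared error times weight."""
    return float(np.sum(weights * (pred - gt) ** 2))


rng = np.random.default_rng(0)
# Hypothetical basis whose columns differ in scale, so parameters matter unequally.
basis = rng.normal(size=(300, 4)) * np.array([10.0, 1.0, 0.1, 0.01])
mean_shape = rng.normal(size=300)
gt = rng.normal(size=4)
pred = gt + 0.5  # every parameter is off by the same amount

weights = parameter_weights(mean_shape, basis, pred, gt)
print("weights:", weights)
print("VDC:", vdc(mean_shape, basis, pred, gt))
print("weighted PDC:", weighted_pdc(pred, gt, weights))
```

Although every parameter has the same error here, the weights differ by orders of magnitude, which is the kind of imbalance that motivates re-weighting parameters by importance.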

An overview of the 3D Dense Face Alignment method

The experimental results show that the proposed short-video-synthesis method significantly improves stability on videos. The model runs at over 50 fps on a single CPU core and outperforms previous state-of-the-art heavy models on accuracy and stability.

“Our promising results pave the way for real-time 3D dense face alignment in practical use and the proposed methods may improve the environment by reducing the amount of carbon dioxide released by the huge amounts of energy consumed by GPUs,” the researchers conclude.