Deep Marching Tetrahedra: a Hybrid Representation for High-Resolution 3D Shape Synthesis#

  • Authors: Tianchang Shen, Jun Gao, Kangxue Yin, Ming-Yu Liu, Sanja Fidler

  • Affiliations: NVIDIA, University of Toronto, Vector Institute

  • NeurIPS 2021

  • Links: PDF, Project Page

Summary#

Key Ideas#

3D Representation#

A shape is represented with a signed distance function (SDF) encoded on a deformable tetrahedral grid [1]. The grid fully tetrahedralizes a unit cube, and each cell is a tetrahedron with four vertices and four faces. The grid vertices can also deform, so that the grid represents the geometry of the shape more efficiently. Using SDF values instead of occupancy values [1] provides more flexibility in representing the underlying surface.

Deformable tetrahedral mesh. The deformable tetrahedral mesh is denoted \((V_T, T)\), where \(V_T\) is the set of vertices and \(T = \{T_k\}_{k=1}^K\) is the set of tetrahedra. Each tetrahedron \(T_k\) is represented by its four vertices \(\{v_{ak}, v_{bk}, v_{ck}, v_{dk}\}\). The SDF at any point inside a tetrahedron is obtained by barycentric interpolation of the SDF values defined on its vertices.
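The interpolation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names `signed_volume`, `barycentric_weights`, and `interpolate_sdf` are hypothetical, and the weights are computed as ratios of signed sub-tetrahedron volumes.

```python
def signed_volume(a, b, c, d):
    """Six times the signed volume of tetrahedron (a, b, c, d)."""
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    ad = [d[i] - a[i] for i in range(3)]
    return (ab[0] * (ac[1] * ad[2] - ac[2] * ad[1])
            - ab[1] * (ac[0] * ad[2] - ac[2] * ad[0])
            + ab[2] * (ac[0] * ad[1] - ac[1] * ad[0]))


def barycentric_weights(p, tet):
    """Barycentric weights of point p with respect to the four tet vertices."""
    a, b, c, d = tet
    total = signed_volume(a, b, c, d)
    # Replace one vertex at a time by p; the volume ratio is that vertex's weight.
    return (
        signed_volume(p, b, c, d) / total,
        signed_volume(a, p, c, d) / total,
        signed_volume(a, b, p, d) / total,
        signed_volume(a, b, c, p) / total,
    )


def interpolate_sdf(p, tet, sdf):
    """Linearly interpolate per-vertex SDF values at point p inside the tet."""
    weights = barycentric_weights(p, tet)
    return sum(w * s for w, s in zip(weights, sdf))
```

Because the interpolation is linear inside each tetrahedron, any SDF that is itself linear in position is reproduced exactly, and the zero level set inside a cell is a planar patch.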

Volume subdivision. Shapes are represented in a coarse-to-fine manner for efficiency. Surface tetrahedra are defined as those whose vertices have different SDF signs. We subdivide the surface tetrahedra, as well as their immediate neighbors, by adding a midpoint to each edge.

../_images/fig-01.png
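A rough sketch of the volume subdivision step is given below. Several choices here are assumptions made for illustration and are not taken from the notes above: the 1-to-8 split pattern (four corner tetrahedra plus four from the inner octahedron), assigning each midpoint the average of its two endpoint SDF values, and the function names. For brevity the sketch subdivides only the surface tetrahedra and leaves out their immediate neighbors.

```python
def is_surface_tet(tet, sdf):
    """A tet is a surface tet if its vertex SDF values have different signs."""
    signs = {sdf[i] > 0 for i in tet}
    return len(signs) == 2


def subdivide(vertices, sdf, tets):
    """Split every surface tet into 8 children by inserting edge midpoints."""
    vertices, sdf = list(vertices), list(sdf)
    midpoint_cache = {}  # shared so that adjacent tets reuse the same midpoint

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in midpoint_cache:
            vi, vj = vertices[i], vertices[j]
            vertices.append(tuple((a + b) / 2 for a, b in zip(vi, vj)))
            sdf.append((sdf[i] + sdf[j]) / 2)  # assumed: average of endpoints
            midpoint_cache[key] = len(vertices) - 1
        return midpoint_cache[key]

    new_tets = []
    for tet in tets:
        if not is_surface_tet(tet, sdf):
            new_tets.append(tuple(tet))
            continue
        a, b, c, d = tet
        ab, ac, ad = midpoint(a, b), midpoint(a, c), midpoint(a, d)
        bc, bd, cd = midpoint(b, c), midpoint(b, d), midpoint(c, d)
        new_tets += [
            # four corner tets, one per original vertex
            (a, ab, ac, ad), (b, ab, bc, bd), (c, ac, bc, cd), (d, ad, bd, cd),
            # inner octahedron, split around the diagonal ac-bd
            (ac, bd, ab, ad), (ac, bd, ad, cd), (ac, bd, cd, bc), (ac, bd, bc, ab),
        ]
    return vertices, sdf, new_tets
```

Only cells near the surface are refined, so the number of tetrahedra grows with the surface area rather than with the volume of the grid.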

Marching tetrahedra for converting between an implicit and explicit representation. We use Marching Tetrahedra (MT) [2] to convert the SDF into an explicit triangular mesh. Based on the signs of the SDF values at its vertices, MT determines the surface topology inside each tetrahedron. Once the surface topology is identified, the vertex locations of the iso-surface are computed at the zero crossings of the linear interpolation along the tetrahedron’s edges.

../_images/fig-02.png
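The per-tetrahedron logic of Marching Tetrahedra can be sketched as follows. This is a simplified illustration with hypothetical function names: it handles the one-versus-three and two-versus-two sign configurations, treats an SDF value of exactly zero as outside, and does not enforce a consistent triangle winding.

```python
def zero_crossing(va, vb, sa, sb):
    """Point on edge (va, vb) where the linearly interpolated SDF is zero."""
    t = sa / (sa - sb)  # sa and sb have different signs, so sa != sb
    return tuple(a + t * (b - a) for a, b in zip(va, vb))


def march_tet(verts, sdf):
    """Return the iso-surface triangles inside one tetrahedron."""
    inside = [i for i in range(4) if sdf[i] < 0]
    outside = [i for i in range(4) if sdf[i] >= 0]
    if not inside or not outside:
        return []  # all signs agree: the surface does not cross this tet

    def cross(i, j):
        return zero_crossing(verts[i], verts[j], sdf[i], sdf[j])

    if len(inside) == 1 or len(outside) == 1:
        # One vertex is separated from the other three: a single triangle.
        if len(inside) == 1:
            lone, rest = inside[0], outside
        else:
            lone, rest = outside[0], inside
        return [tuple(cross(lone, j) for j in rest)]

    # Two against two: the four crossings form a quad, split into two triangles.
    i0, i1 = inside
    o0, o1 = outside
    p00, p01, p11, p10 = cross(i0, o0), cross(i0, o1), cross(i1, o1), cross(i1, o0)
    return [(p00, p01, p11), (p00, p11, p10)]
```

Each output vertex is a smooth function of the two endpoint positions and SDF values, which is what lets gradients defined on the extracted mesh flow back to the grid.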

Surface subdivision. Having a surface mesh as output allows us to increase the representation power and the visual quality with a differentiable subdivision module. The authors follow the Loop Subdivision method [3,4], but instead of using a fixed set of parameters they make them learnable, including the vertex positions \(v_i'\) and the smoothness \(\alpha_i\).
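To give a feel for what a per-vertex smoothness parameter does, here is a small sketch of the vertex repositioning step of Loop subdivision. The parametrization is an assumption made for illustration: `alpha` is taken to be the total weight given to the one-ring neighbors (that is, \(n\beta\) in Loop's original rule for a vertex of valence \(n\)), and how DMTet actually defines \(\alpha_i\) is not specified in these notes. The function names are hypothetical, and the insertion of new edge vertices is not shown.

```python
import math


def loop_alpha(n):
    """Total neighbor weight n * beta from Loop's original rule for valence n."""
    return 5 / 8 - (3 / 8 + math.cos(2 * math.pi / n) / 4) ** 2


def smooth_vertex(v, neighbors, alpha=None):
    """Blend a vertex with the mean of its one-ring neighbors.

    With alpha=None the fixed Loop weight is used; a learned model could
    instead supply its own per-vertex alpha (assumed parametrization).
    """
    if alpha is None:
        alpha = loop_alpha(len(neighbors))
    mean = [sum(c) / len(neighbors) for c in zip(*neighbors)]
    return tuple((1 - alpha) * a + alpha * m for a, m in zip(v, mean))
```

A larger `alpha` pulls the vertex further toward its neighbors and gives a smoother surface, while a value near zero keeps sharp features in place.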

DMTet#

DMTet is a neural network that utilizes the proposed 3D representation and aims to output a high resolution 3D mesh \(M\) from input \(x\) (a point cloud or a coarse voxelized shape).

../_images/fig-03.png

3D generator. We first use PVCNN [5] to extract a feature volume \(F_\text{vol}(x)\) from a point cloud. Then we predict the SDF value for each vertex in the initial deformable tetrahedral grid using a fully-connected network \(s(v), f(v) = \text{MLP}(F_\text{vol}(v, x), v)\), where \(f(v)\) is a feature vector used for surface refinement. We then iteratively refine the surface and subdivide the tetrahedral grid: we identify the surface tetrahedra and use a graph convolutional network (GCN) to predict position offsets \(\Delta v_i\) and SDF residual values \(\Delta s(v_i)\) for each vertex \(i\) in the surface tetrahedra. After surface refinement, we perform the volume subdivision followed by an additional surface refinement step. Finally, we apply the learnable surface subdivision with GCN and Loop Subdivision.
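Read as residuals, the predicted offsets are added to the current vertex positions and SDF values. The update below is written out under that reading; the primed symbols are introduced here only for illustration:

\[
v_i' = v_i + \Delta v_i, \qquad s(v_i') = s(v_i) + \Delta s(v_i).
\]

Only vertices of surface tetrahedra are updated in this way, so the refinement cost scales with the surface rather than with the full grid.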

3D discriminator. The authors empirically found that applying a 3D CNN from DECOR-GAN [6] to the SDF is effective for capturing local details. We first select a high-curvature vertex \(v\) from the target mesh and compute the ground-truth and predicted SDF fields \(S_\text{real} \in \mathbb{R}^{N \times N \times N}\) and \(S_\text{pred} \in \mathbb{R}^{N \times N \times N}\). We feed \(S_\text{real}\) or \(S_\text{pred}\) into the discriminator, along with the feature vector \(F_\text{vol}(v, x)\).

Loss function.

  • Surface alignment loss. Point sets \(P_\text{gt}\) and \(P_\text{pred}\) are sampled from the ground-truth and predicted meshes, and we minimize the L2 Chamfer Distance \(\mathcal{L}_\text{cd}\) and the normal consistency loss \(\mathcal{L}_\text{normal}\) between them.

  • Adversarial loss. We use the adversarial loss \(\mathcal{L}_\text{D} + \mathcal{L}_\text{G}\) proposed in LSGAN [7].

  • Regularizations. An SDF loss \(\mathcal{L}_\text{SDF}\) is added to regularize the SDF values. An L2 regularization loss \(\mathcal{L}_\text{def}\) is also applied to the predicted vertex deformations to avoid artifacts.
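Some of the loss terms listed above can be sketched in plain Python. This is an illustrative sketch under stated assumptions rather than the authors' code: the Chamfer term sums unsquared Euclidean nearest-neighbor distances in both directions (other conventions use squared distances or means), the adversarial terms use the common LSGAN targets of 1 for real and 0 for generated samples, and the deformation term sums the Euclidean norms of the offsets. All function names are hypothetical, and the normal consistency and SDF terms are omitted.

```python
import math


def chamfer_l2(p_pred, p_gt):
    """Symmetric Chamfer distance between two point sets (brute force)."""
    def one_sided(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src)
    return one_sided(p_pred, p_gt) + one_sided(p_gt, p_pred)


def lsgan_losses(d_real, d_pred):
    """Least-squares GAN losses from discriminator scores.

    d_real: scores on ground-truth SDF patches, d_pred: scores on predicted ones.
    """
    loss_d = 0.5 * (sum((r - 1) ** 2 for r in d_real) / len(d_real)
                    + sum(f ** 2 for f in d_pred) / len(d_pred))
    loss_g = 0.5 * sum((f - 1) ** 2 for f in d_pred) / len(d_pred)
    return loss_d, loss_g


def deformation_reg(offsets):
    """Sum of Euclidean norms of the predicted vertex offsets."""
    return sum(math.hypot(*dv) for dv in offsets)
```

The brute-force nearest-neighbor search is quadratic in the number of points, so a practical training loop would typically rely on a batched GPU implementation instead.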

Technical Details#

Notes#

References#

[1] J. Gao, W. Chen, T. Xiang, C. Tsang, A. Jacobson, M. McGuire, and S. Fidler. Learning deformable tetrahedral meshes for 3d reconstruction. In NeurIPS, 2020.

[2] A. Doi and A. Koide. An efficient method of triangulating equi-valued surfaces by using tetrahedral cells. IEICE Transactions on Information and Systems, 1991.

[3] C. Loop. Smooth subdivision surfaces based on triangles. Master's thesis, University of Utah, 1987.

[4] H. Liu, V. Kim, S. Chaudhuri, N. Aigerman, and A. Jacobson. Neural subdivision. ACM Trans. Graph., 2020.

[5] Z. Liu, H. Tang, Y. Lin, and S. Han. Point-voxel cnn for efficient 3d deep learning. In NeurIPS, 2019.

[6] Z. Chen, V. Kim, M. Fisher, N. Aigerman, H. Zhang, and S. Chaudhuri. Decor-gan: 3d shape detailization by conditional refinement. In CVPR, 2021.

[7] X. Mao, Q. Li, H. Xie, R. Lau, Z. Wang, and S. Smolley. Least squares generative adversarial networks. In ICCV, 2017.