Multiview Neural Surface Reconstruction by Disentangling Geometry and Appearance#

  • Authors: Lior Yariv, Yoni Kasten, Dror Moran, Meirav Galun, Matan Atzmon, Ronen Basri, Yaron Lipman

  • Affiliations: Weizmann Institute of Science

  • NeurIPS 2020

  • Links: arXiv, Project Page, Code

Summary#

In this work the authors introduce a neural network architecture that simultaneously learns the unknown geometry, the camera parameters, and a neural renderer that approximates the light reflected from the surface towards the camera. Trained on real-world 2D images from the DTU MVS dataset, covering objects with different material properties and lighting conditions and starting from noisy camera initializations, the model produces state-of-the-art 3D surface reconstructions with high fidelity, resolution, and detail.

![](../_images/fig-022.png)

Key Ideas#

The goal is to reconstruct the geometry of an object from masked 2D images with possibly rough or noisy camera information. There are three unknowns:

  • geometry $\theta \in \mathbb{R}^m$

  • appearance $\gamma \in \mathbb{R}^n$

  • cameras $\tau \in \mathbb{R}^k$

![](../_images/fig-012.png)

The geometry is represented as the zero level set of an MLP $f$:

$$S_\theta = \{ x \in \mathbb{R}^3 \mid f(x; \theta) = 0 \}$$
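As a toy sketch (not the paper's network), the role of $f$ can be illustrated by substituting an analytic signed distance function for the MLP:

```python
import numpy as np

# Toy stand-in for the geometry MLP f(x; theta): the signed distance
# to a sphere of radius 1 (negative inside, positive outside).
def f(x, radius=1.0):
    return np.linalg.norm(x, axis=-1) - radius

# Points on the zero level set S_theta satisfy f(x; theta) = 0.
surface_point = np.array([1.0, 0.0, 0.0])   # f = 0, on the surface
inside_point = np.array([0.2, 0.0, 0.0])    # f < 0, inside the object
```

In the paper, $f$ is an MLP whose weights $\theta$ are learned, regularized (via the Eikonal term below) to behave approximately like a signed distance function.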

IDR forward model. Let $p$ be a pixel and $R_p(\tau) = \{ c_p + t v_p \mid t \ge 0 \}$ the ray through $p$, where $c_p$ is the camera center and $v_p$ the ray direction. Let $\hat{x}_p = \hat{x}_p(\theta, \tau)$ denote the first intersection of $R_p$ with the surface $S_\theta$. The rendered color of the pixel is given by

$$L_p(\theta, \gamma, \tau) = M(\hat{x}_p, \hat{n}_p, \hat{z}_p, v_p; \gamma)$$

where $\hat{n}_p$ is the surface normal and $\hat{z}_p$ a global geometry feature vector at $\hat{x}_p$.
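The first intersection $\hat{x}_p$ can be sketched with plain sphere tracing, assuming $f$ is a signed distance function; the paper combines a sphere-tracing variant with a root-finding step, so the helper below is only illustrative:

```python
import numpy as np

def f(x, radius=1.0):
    # Stand-in SDF for the geometry network f(x; theta).
    return np.linalg.norm(x, axis=-1) - radius

def first_intersection(c, v, f, t_max=10.0, eps=1e-6, max_steps=100):
    # Sphere tracing: an SDF value is a safe step size along the ray,
    # so march until f vanishes (hit) or the ray leaves the scene.
    t = 0.0
    for _ in range(max_steps):
        x = c + t * v
        d = f(x)
        if d < eps:
            return x        # hat{x}_p: first intersection with S_theta
        t += d
        if t > t_max:
            return None     # ray misses the surface
    return None

c = np.array([0.0, 0.0, -3.0])   # camera center c_p
v = np.array([0.0, 0.0, 1.0])    # unit ray direction v_p
x_hat = first_intersection(c, v, f)   # approx. [0, 0, -1]
```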

Approximation of the surface light field. The surface light field radiance $L$ is determined by two functions: the bidirectional reflectance distribution function (BRDF) and the light emitted in the scene.

  • The BRDF $B(x, n, w^o, w^i)$ describes the proportion of reflected radiance leaving the surface point $x$ with normal $n$ in direction $w^o$ with respect to the incoming radiance from direction $w^i$.

  • The light sources are described by a function $L^e(x, w^o)$ measuring the emitted radiance of light at point $x$ in direction $w^o$.

The overall rendering equation is given by

$$L(\hat{x}, w^o) = L^e(\hat{x}, w^o) + \int_\Omega B(\hat{x}, \hat{n}, w^i, w^o)\, L^i(\hat{x}, w^i)\, dw^i = M_0(\hat{x}, \hat{n}, v)$$

where $L^i$ is the incoming radiance and $M$ is a sufficiently large MLP approximating $M_0$. For $M$ to be able to represent the correct light reflected from a surface point $\hat{x}$, i.e., be $\mathcal{P}$-universal, it has to receive $v$ and $\hat{n}$ as arguments as well.
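A toy illustration of why $M$ must see the normal (and, for non-Lambertian materials, the view direction): even the simplest special case, a Lambertian surface under a single directional light, depends on $\hat{n}$ through a cosine term. All names and values here are illustrative, not the paper's renderer:

```python
import numpy as np

def M(x, n, v, albedo=np.array([0.8, 0.5, 0.3]),
      light_dir=np.array([0.0, 0.0, 1.0])):
    # Lambertian special case of the surface light field: radiance
    # depends on the normal n via a clamped cosine. A renderer that
    # never sees n could not represent even this simple case.
    return albedo * max(0.0, float(n @ light_dir))

x = np.zeros(3)
v = np.array([0.0, 0.0, -1.0])  # view direction (unused here: a
                                # Lambertian surface is view-independent)
facing = M(x, np.array([0.0, 0.0, 1.0]), v)   # normal toward light -> full albedo
grazing = M(x, np.array([1.0, 0.0, 0.0]), v)  # normal orthogonal -> black
```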

Masked rendering. Consider the indicator function $S_p(\theta, \tau)$ identifying whether a certain pixel $p$ is occupied by the rendered object. It is approximated by the differentiable

$$S_{p,\alpha}(\theta, \tau) = \mathrm{sigmoid}\Big( -\alpha \min_{t \ge 0} f(c_p + t v_p; \theta) \Big)$$

which converges to $S_p(\theta, \tau)$ as $\alpha \to \infty$.
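A small numerical sketch of $S_{p,\alpha}$, reusing a sphere SDF as a stand-in for $f$ and a brute-force minimum over sampled $t$ values (only illustrative; the paper handles the minimum along the ray differently):

```python
import numpy as np

def f(x, radius=1.0):
    # Stand-in SDF for the geometry network.
    return np.linalg.norm(x, axis=-1) - radius

def soft_mask(c, v, alpha, ts=np.linspace(0.0, 10.0, 1001)):
    # S_{p,alpha} = sigmoid(-alpha * min_{t>=0} f(c + t*v; theta)),
    # with the minimum approximated on a dense grid of t values.
    min_f = np.min(f(c[None, :] + ts[:, None] * v[None, :]))
    return 1.0 / (1.0 + np.exp(alpha * min_f))

c = np.array([0.0, 0.0, -3.0])
hit = soft_mask(c, np.array([0.0, 0.0, 1.0]), alpha=10.0)   # ray hits -> near 1
miss = soft_mask(c, np.array([1.0, 0.0, 0.0]), alpha=10.0)  # ray misses -> near 0
```

Larger $\alpha$ sharpens the sigmoid toward the hard 0/1 indicator.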

Loss. The loss is given by

$$\mathcal{L}(\theta, \gamma, \tau) = \mathcal{L}_{\mathrm{RGB}}(\theta, \gamma, \tau) + \rho\, \mathcal{L}_{\mathrm{mask}}(\theta, \tau) + \lambda\, \mathcal{L}_{\mathrm{E}}(\theta)$$

$$\mathcal{L}_{\mathrm{RGB}}(\theta, \gamma, \tau) = \frac{1}{|P|} \sum_{p \in P^{\mathrm{in}}} \big| I_p - L_p(\theta, \gamma, \tau) \big|$$

$$\mathcal{L}_{\mathrm{mask}}(\theta, \tau) = \frac{1}{\alpha |P|} \sum_{p \in P^{\mathrm{out}}} \mathrm{CE}\big( O_p, S_{p,\alpha}(\theta, \tau) \big)$$

$$\mathcal{L}_{\mathrm{E}}(\theta) = \mathbb{E}_x \big( \| \nabla_x f(x; \theta) \| - 1 \big)^2$$

where $I_p$ is the observed pixel color, $O_p \in \{0, 1\}$ is the object mask, $P^{\mathrm{in}} \subseteq P$ contains the pixels whose ray intersects the surface and whose mask is on, $P^{\mathrm{out}} = P \setminus P^{\mathrm{in}}$, $\mathrm{CE}$ is the cross-entropy, and $\mathcal{L}_{\mathrm{E}}$ is the Eikonal regularization term of Implicit Geometric Regularization (IGR) [1].
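The Eikonal term can be checked numerically: a true signed distance function satisfies $\|\nabla_x f\| = 1$ everywhere, so its loss is near zero. A sketch with finite differences standing in for autograd (the paper differentiates the MLP analytically; everything here is illustrative):

```python
import numpy as np

def f(x, radius=1.0):
    # Stand-in for f(x; theta): an exact SDF, so the Eikonal loss ~ 0.
    return np.linalg.norm(x, axis=-1) - radius

def grad_f(x, h=1e-4):
    # Central finite differences in place of automatic differentiation.
    e = np.eye(3)
    return np.array([(f(x + h * e[i]) - f(x - h * e[i])) / (2 * h)
                     for i in range(3)])

def eikonal_loss(points):
    # Monte Carlo estimate of E_x (||grad_x f(x; theta)|| - 1)^2.
    return float(np.mean([(np.linalg.norm(grad_f(x)) - 1.0) ** 2
                          for x in points]))

rng = np.random.default_rng(0)
pts = rng.uniform(-2.0, 2.0, size=(128, 3))
loss = eikonal_loss(pts)   # near zero for a true SDF
```

A network that is not distance-like would yield gradient norms far from 1, and the term pushes it back toward an SDF.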

Technical Details#

Notes#

References#

[1] A. Gropp, L. Yariv, N. Haim, M. Atzmon, Y. Lipman. Implicit Geometric Regularization for Learning Shapes. In ICML, 2020.