
3D Scene Reconstruction System for End-User Videos

“Do you have a moment you want to remember forever? Just take a video. Our AI-powered solution brings your moment into a virtual 3D world, permanently.”

Building on our experience participating in the NGR-CO3D challenge, we began developing an automatic, AI-driven 3D scene reconstruction system for end users. Here we describe the resulting prototype.

Inputs

Just take a video and upload it to us, like the one below:

Outputs

Our solution automatically processes end-user videos and then lets end users browse their scenes in 3D (click below):

Shown in order: novel-view RGB, depth, and surface normals.

One’s favorite car (test5; best viewed in full-screen mode), clothes (test0), and an object (test7).

After training our model, we can extract a 3D mesh via the marching-cubes algorithm (click below):

Extracted meshes: test5_mesh, chair_mesh, and mic_mesh. (To render these meshes, we used Blender’s EEVEE rendering engine.)
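For readers curious what the extraction step looks like in code, here is a minimal sketch: sample the trained model's density field on a regular grid, then run scikit-image's marching cubes over it. The `query_density` callable is a hypothetical stand-in for our network's density head, and the resolution, scene bound, and iso-threshold values are assumptions.

```python
import numpy as np
import trimesh
from skimage import measure

def extract_mesh(query_density, resolution=256, bound=1.0, threshold=25.0):
    """Run marching cubes on a density grid sampled from a trained model.

    query_density: hypothetical callable mapping (N, 3) points -> (N,) densities.
    """
    # Sample the density field on a regular grid covering [-bound, bound]^3.
    # (In practice, query in chunks to bound memory use.)
    xs = np.linspace(-bound, bound, resolution)
    pts = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1).reshape(-1, 3)
    sigma = query_density(pts).reshape(resolution, resolution, resolution)

    # Marching cubes extracts the iso-surface where density crosses the threshold.
    verts, faces, normals, _ = measure.marching_cubes(sigma, level=threshold)

    # Map voxel-index coordinates back to world coordinates.
    verts = verts / (resolution - 1) * 2.0 * bound - bound
    return trimesh.Trimesh(vertices=verts, faces=faces, vertex_normals=normals)
```

The resulting mesh can then be exported (e.g., to PLY) and rendered in Blender, as above.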

System Building Blocks

Our system consists of three major stages: 1) automatic dataset building, 2) training, and 3) evaluation.

  • Automatic Dataset Building
    – Our solution splits an input video into N frames and then runs an off-the-shelf Structure-from-Motion (SfM) package (COLMAP in our case) to estimate camera parameters, i.e., intrinsics (focal length, principal point, and distortion) and camera-to-world extrinsics (rotation and translation). A sketch of this stage follows the list.
    – Miscellaneous information is also acquired with pre-trained vision networks, e.g., semantic segmentation masks and face parsing masks.
  • Training
    – Our implicit neural network is trained on the dataset built in the previous stage; a volume-rendering sketch follows the list.
  • Evaluation
    – Given the trained implicit renderer, the ray sampler, and the training dataset, we compose a virtual camera trajectory for offline novel-view synthesis; a trajectory sketch also follows the list.
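To make the dataset-building stage more concrete, here is a rough sketch of driving it from Python, using ffmpeg to split frames and COLMAP via its pycolmap bindings for parameter estimation. The paths, the fixed frame rate, and the `build_dataset` helper are all illustrative assumptions, not our exact implementation.

```python
import pathlib
import subprocess

import pycolmap

def build_dataset(video_path: str, work_dir: str, fps: int = 2):
    """Split a video into frames and estimate camera parameters with COLMAP."""
    work = pathlib.Path(work_dir)
    frames = work / "frames"
    frames.mkdir(parents=True, exist_ok=True)

    # 1) Split the input video into frames (here at a fixed rate via ffmpeg).
    subprocess.run(
        ["ffmpeg", "-i", video_path, "-vf", f"fps={fps}", str(frames / "%05d.png")],
        check=True,
    )

    # 2) Run COLMAP's SfM pipeline: feature extraction, matching, mapping.
    database = work / "database.db"
    sparse = work / "sparse"
    sparse.mkdir(exist_ok=True)
    pycolmap.extract_features(str(database), str(frames))
    pycolmap.match_exhaustive(str(database))
    maps = pycolmap.incremental_mapping(str(database), str(frames), str(sparse))

    # The reconstruction holds per-image extrinsics (rotation, translation)
    # and per-camera intrinsics (focal length, principal point, distortion).
    return maps[0]
```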
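For the training stage, the heart of any NeRF-style implicit renderer is volume rendering: compositing per-sample colors and densities along each ray. Below is a minimal PyTorch sketch of that standard quadrature; the tensor shapes are assumptions, and this is the textbook NeRF formulation rather than our exact code. Training then minimizes a photometric loss between these composited colors and the ground-truth pixels.

```python
import torch

def volume_render(rgb, sigma, deltas):
    """Composite per-sample colors and densities along each ray (NeRF quadrature).

    rgb: (rays, samples, 3) colors; sigma: (rays, samples) densities;
    deltas: (rays, samples) distances between consecutive samples.
    """
    # Opacity of each ray segment, from its density and length.
    alpha = 1.0 - torch.exp(-sigma * deltas)
    # Transmittance: probability a ray reaches each sample unoccluded.
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=-1),
        dim=-1,
    )[:, :-1]
    weights = alpha * trans
    # Expected color per ray; depth and normals use the same weights.
    return (weights.unsqueeze(-1) * rgb).sum(dim=-2)
```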
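Finally, for the evaluation stage, a simple way to compose a virtual camera trajectory offline is a circular orbit that always looks at the scene center. The sketch below generates camera-to-world matrices for such a path; the radius, height, and OpenGL-style axis convention are assumptions.

```python
import numpy as np

def orbit_trajectory(n_views: int = 120, radius: float = 3.0, height: float = 0.5):
    """Generate camera-to-world matrices on a circular orbit around the origin."""
    poses = []
    for theta in np.linspace(0.0, 2.0 * np.pi, n_views, endpoint=False):
        eye = np.array([radius * np.cos(theta), radius * np.sin(theta), height])
        forward = -eye / np.linalg.norm(eye)  # look at the scene center
        right = np.cross(forward, np.array([0.0, 0.0, 1.0]))
        right /= np.linalg.norm(right)
        up = np.cross(right, forward)

        c2w = np.eye(4)
        # OpenGL-style camera axes: x = right, y = up, z = -forward.
        c2w[:3, 0], c2w[:3, 1], c2w[:3, 2] = right, up, -forward
        c2w[:3, 3] = eye
        poses.append(c2w)
    return np.stack(poses)
```

Each pose is then handed to the trained renderer and ray sampler to synthesize the novel-view RGB, depth, and surface-normal images shown above.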

Keywords:

Neural Rendering, Neural Radiance Field (NeRF), Visual Effect Studio, 3D Reconstruction
