Introduction to NeRF and 3D Gaussian Splatting

Created

2024/07/21 00:54

Overview

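So far we have mostly discussed 2D image models. Now let's expand into a new dimension!

One technique that has recently drawn a lot of attention is reconstructing a 3D representation from 2D images captured from many different angles. We think its applications are nearly endless. For example, an online store could use it to offer more realistic previews of products and clothing (down to the texture), and it could be used when designing game scenes!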

Input: images captured from various angles

Result: the reconstructed 3D image

Even the texture detail is visible

YouTube: Fashion tech startup Metown - 3D clothing scanning AI demo video

Unreal Engine Gaussian Splatting

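Background

Let's now go over the terminology and models behind this fascinating technique. We will extend our scope to 3D space. Imagine it for a moment! How does 3D space differ from 2D? Each point in 3D space has its own color and density.

For example, the shoe above shows different shadows and highlights depending on the angle we view it from. This affects the snapshot of the object that we currently see. The color of a given point depends on how far away we view it from and from which direction.

How can a model predict all of this?

Our ML model needs to learn these relationships and generalize them.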

NeRF

Overview of 3D computer graphics models

NeRF (Neural Radiance Field - paper linked)

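• NeRF mainly uses density and color information, and it learns from many rays of light. A ray is essentially a unit that captures the observer's position (distance), the viewing direction, and the color and density along the ray's path.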

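The volume rendering integral that the terms below refer to is C(r) = ∫ T(t) σ(r(t)) c(r(t), d) dt, integrated from tn to tf.

One thing to note here: t in the integral does not represent time. It is the distance along the ray.

tf is the far end point of the ray, and

tn is the near starting point of the ray.

r(t): position vector of the point on the ray at parameter t, r(t) = o + t·d, which combines the ray origin o and the direction d.

d: direction of the ray

σ(r(t)): density at that point on the ray (how much light is absorbed at that point)

c(r(t), d): color at the point r(t) when viewed along direction d

T(t) = exp(-∫σ(r(s))ds), integrated from tn to t: transmittance, the fraction of light that survives the density accumulated from tn to t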

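To capture both the local and the global contribution of light, we need the transmittance, which summarizes the density accumulated along the ray up to a point (the global part), together with the volume density at that specific point (the local part).

C(r): we now integrate the quantities above from tn to tf. In essence, this accumulates the contribution of the color c along the ray, weighted by the transmittance and the density at each point on the ray. This lets us combine all of the information in a single value: position (distance), direction, color, and density.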
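To make the integral concrete, here is a minimal NumPy sketch of how C(r) can be approximated numerically by splitting the ray into segments. It is an illustration under our own assumptions, not code from the NeRF authors: the function name render_ray and the toy densities and colors are hypothetical.

```python
import numpy as np

def render_ray(sigmas, colors, t_vals):
    """Approximate C(r) along one ray by summing over segments.

    sigmas: (N,) densities sigma(r(t_i)) for each segment
    colors: (N, 3) RGB colors c(r(t_i), d) for each segment
    t_vals: (N + 1,) segment boundaries from t_n to t_f
    """
    deltas = np.diff(t_vals)                     # length of each segment
    alphas = 1.0 - np.exp(-sigmas * deltas)      # opacity of each segment
    # T_i = exp(-sum_{j<i} sigma_j * delta_j): light that survives up to segment i
    accumulated = np.concatenate([[0.0], np.cumsum(sigmas * deltas)[:-1]])
    transmittance = np.exp(-accumulated)
    weights = transmittance * alphas             # contribution of each segment
    return weights @ colors, weights

# Toy ray: empty space, a dense red slab, a blue slab hidden behind it, empty space.
t_vals = np.linspace(2.0, 6.0, 5)                # t_n = 2, t_f = 6, four segments
sigmas = np.array([0.0, 50.0, 50.0, 0.0])
colors = np.array([[0, 0, 0], [1, 0, 0], [0, 0, 1], [0, 0, 0]], dtype=float)
pixel, weights = render_ray(sigmas, colors, t_vals)
print(pixel)  # almost pure red: the red slab blocks the blue one
```

The transmittance is what hides the blue slab: by the time the ray reaches it, almost no light is left to contribute.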

NeRF의 한계

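However, NeRF has the following limitations:

High computational cost / high memory usage

Rendering a single pixel means evaluating the network at many sample points along a ray, as the integral above suggests, and there are a lot of parameters to learn during training. These problems become more severe as we scale to larger scenes.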

Gaussian Splatting

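Gaussian Splatting was developed to address these limitations. Its basic mechanism resembles the Diffusion models discussed in the previous post in that it estimates a distribution. Gaussian Splatting involves the following:

3D Gaussian distribution

• A probability distribution that describes how data points are spread in three dimensions, centered at a mean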

The authors report the following results, which address the problems above:

State-of-the-art quality (on par with Mip-NeRF 360)

Real-time rendering (100 fps or more)

Fast training (under 1 hour)

Other terms to remember

“The second component of our method is optimization of the properties of the 3D Gaussians – 3D position, opacity 𝛼, anisotropic covariance, and spherical harmonic (SH) coefficients”

- page 2

Isotropic:

Uniform in all directions, with the same spread along every axis.

Image Reference

Anisotropic:

Varies with direction (a different variance along each axis).

This “anisotropic covariance” lets each Gaussian stretch and orient itself to fit the local geometry, which helps the image come out well.
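To make the isotropic/anisotropic distinction concrete, the sketch below evaluates an unnormalized 3D Gaussian, exp(-0.5 (x - μ)^T Σ^-1 (x - μ)), for two different covariance matrices. The function name gaussian_3d and the numbers are our own illustrative choices, not values from the paper.

```python
import numpy as np

def gaussian_3d(x, mean, cov):
    """Unnormalized 3D Gaussian: exp(-0.5 * (x - mean)^T cov^{-1} (x - mean))."""
    diff = np.asarray(x, dtype=float) - mean
    return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)))

mean = np.array([0.0, 0.0, 0.0])
isotropic = np.eye(3) * 0.25                  # same spread in every direction
anisotropic = np.diag([1.0, 0.25, 0.01])      # wide along x, very thin along z

at_center = gaussian_3d(mean, mean, isotropic)
iso_x = gaussian_3d([1.0, 0.0, 0.0], mean, isotropic)
iso_z = gaussian_3d([0.0, 0.0, 1.0], mean, isotropic)
aniso_x = gaussian_3d([1.0, 0.0, 0.0], mean, anisotropic)
aniso_z = gaussian_3d([0.0, 0.0, 1.0], mean, anisotropic)
```

With the isotropic covariance, a point one unit away gets the same value no matter which axis we move along. With the anisotropic one, the Gaussian still has noticeable influence one unit away along x but practically none along z, which is how a single Gaussian can act like a thin, oriented patch of surface.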

Aspect	NeRF	3D GS
Representation	Neural network	3D Gaussians
Rendering	Integrating along rays	Blending projected Gaussians
Parameters	Model weights	Position (mean), covariance, opacity, and SH coefficients of each Gaussian
Efficiency	Slower	Faster
Unit/Samples	Samples along rays	Individual Gaussians

Algorithm overview:

Let's now look more closely at the important parts of the algorithm.

SfM (Structure from Motion):

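SfM reconstructs a 3D scene from multi-view 2D images using feature detection, feature matching, and camera pose estimation.

Image Reference: MathWorks

Given multiple images of a scene taken from different viewpoints, SfM identifies and matches corresponding features across the images. It then uses this information to estimate the camera poses.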

For Further Reading (CMSC426): https://cmsc426.github.io/sfm/

SfM points:

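This is the initial set of 3D points from SfM that represents the scene, and it serves as the starting point. The point positions give M (the Gaussian means), and S (covariances), C (colors), and A (opacities) are initialized from there.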

Rasterize (Algorithm 2):

Let's now take a closer look at Algorithm 2, which is called as part of Algorithm 1.

Algorithm 2: Rasterize(M, S, C, A, V)

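• Cull Gaussian: keep only the Gaussians observed from the given view V (view frustum)

• Frustum Culling: check whether each Gaussian lies inside the view frustum

• Screen Space Transformation: project from 3D coordinates (Gaussian coordinates) to 2D coordinates

→ compute the corresponding new covariance S’: Σ' = J W Σ W^T J^T

J: Jacobian of the affine approximation of the projective transformation (Camera → Image)

W: Viewing Transformation (World → Camera)

Σ: covariance of the input Gaussian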

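Applying the transformations above takes us through the following chain:

World → Camera → Image

Given a linear transformation y = Ax, the covariance transforms as Cov(y) = A Cov(x) A^T.

The equation above simply computes the new covariance matrix that reflects this transformation, with A = JW.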
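The World → Camera → Image chain can be sketched in a few lines of NumPy. This is a simplified illustration, not the official CUDA rasterizer: we assume a pinhole camera with focal lengths fx and fy, and the function name project_covariance is our own.

```python
import numpy as np

def project_covariance(cov_world, W, t_cam, fx, fy):
    """Return the 2x2 image-space covariance J W Sigma W^T J^T.

    cov_world: (3, 3) covariance of the Gaussian in world space (Sigma)
    W: (3, 3) rotation part of the world-to-camera transform
    t_cam: (3,) Gaussian center in camera coordinates (z > 0, in front of the camera)
    fx, fy: focal lengths in pixels (assumed pinhole camera)
    """
    tx, ty, tz = t_cam
    # Jacobian of the pinhole projection (u, v) = (fx * x / z, fy * y / z)
    J = np.array([
        [fx / tz, 0.0, -fx * tx / tz**2],
        [0.0, fy / tz, -fy * ty / tz**2],
    ])
    A = J @ W                      # combined linear map, as in y = A x
    return A @ cov_world @ A.T     # Cov(y) = A Cov(x) A^T

cov_world = np.diag([0.04, 0.01, 0.09])
W = np.eye(3)                      # camera axes aligned with the world axes
cov_2d = project_covariance(cov_world, W, t_cam=np.array([0.0, 0.0, 2.0]),
                            fx=500.0, fy=500.0)
```

For a Gaussian on the optical axis at depth 2, J reduces to a uniform scale of fx / z = 250, so the 2D covariance is just the x and y variances multiplied by 250 squared. The depth variance drops out, which matches the intuition that a splat is the "shadow" of the 3D ellipsoid on the image plane.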

However,

“An obvious approach would be to directly optimize the covariance matrix Σ to obtain 3D Gaussians that represent the radiance field. However, covariance matrices have physical meaning only when they are positive semi-definite. For our optimization of all our parameters, we use gradient descent that cannot be easily constrained to produce such valid matrices, and update steps and gradients can very easily create invalid covariance matrices.”

Because of this limitation:

“The covariance matrix Σ of a 3D Gaussian is analogous to describing the configuration of an ellipsoid. Given a scaling matrix 𝑆 and rotation matrix 𝑅, we can find the corresponding Σ”:

This means that:

Σ = R S S^T R^T
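The point of this factorization is that any scale and rotation produce a valid covariance, so gradient descent can update them freely. The sketch below builds Σ from a 3D scale vector and a quaternion, which is how we understand the paper to store them. The helper names are our own, and the input values are arbitrary.

```python
import numpy as np

def quaternion_to_rotation(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def build_covariance(scale, quaternion):
    """Sigma = R S S^T R^T, symmetric positive semi-definite by construction."""
    R = quaternion_to_rotation(quaternion)
    S = np.diag(scale)
    M = R @ S
    return M @ M.T

# The quaternion does not need to be unit length; it is normalized inside.
cov = build_covariance(scale=[0.5, 0.1, 0.02], quaternion=[0.9, 0.1, -0.3, 0.2])
eigenvalues = np.linalg.eigvalsh(cov)   # squared scales, in ascending order
```

Whatever values the optimizer picks for the scale and the quaternion, the eigenvalues of Σ are the squared scales, so they can never become negative. That is exactly the property that direct optimization of Σ could not guarantee.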

The screen is then split into tiles (a grid), so rendering uses tile-based rasterization instead of the per-pixel splatting of earlier work. This is why the authors state:

“We introduce a new approach that combines the best of both worlds: our 3D Gaussian representation allows optimization with state-of-the-art (SOTA) visual quality and competitive training times, while our tile-based splatting solution ensures real-time rendering at SOTA quality for 1080p resolution on several previously published datasets”

The splats are then sorted with radix sort:

“Sorting. Our design is based on the assumption of a high load of small splats, and we optimize for this by sorting splats once for each frame using radix sort at the beginning”

Finally, the sorted splats are blended to produce the final 2D rasterized image.
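The sort-then-blend step can be illustrated for a single pixel. This is a CPU sketch under our own assumptions: np.argsort stands in for the GPU radix sort, the early-exit threshold is an arbitrary choice, and blend_pixel is a hypothetical name rather than a function from the official code.

```python
import numpy as np

def blend_pixel(depths, alphas, colors, background=(0.0, 0.0, 0.0)):
    """Composite the splats covering one pixel, front to back.

    depths: (N,) camera-space depth of each splat
    alphas: (N,) opacity of each splat at this pixel, in [0, 1]
    colors: (N, 3) RGB color of each splat
    """
    order = np.argsort(depths, kind="stable")   # stand-in for the GPU radix sort
    pixel = np.zeros(3)
    transmittance = 1.0                         # fraction of light not yet absorbed
    for i in order:
        pixel += transmittance * alphas[i] * colors[i]
        transmittance *= 1.0 - alphas[i]
        if transmittance < 1e-4:                # pixel is effectively opaque; stop early
            break
    return pixel + transmittance * np.asarray(background)

depths = np.array([3.0, 1.0, 2.0])              # deliberately out of order
alphas = np.array([1.0, 0.5, 0.5])
colors = np.array([[0.0, 0.0, 1.0],             # blue, farthest
                   [1.0, 0.0, 0.0],             # red, nearest
                   [0.0, 1.0, 0.0]])            # green, in between
pixel = blend_pixel(depths, alphas, colors)
print(pixel)  # [0.5  0.25 0.25]
```

The nearest splat (red) contributes half of the pixel, the next (green) takes half of what remains, and the opaque blue splat fills in the rest. Note how similar this is to the NeRF ray integral: both accumulate color weighted by opacity and the remaining transmittance, only here the samples are sorted Gaussians instead of points along a ray.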

Going back to (Algorithm 1):

Now that we have the rasterized output, we compare the rasterized image just described with the reference (ground-truth) image.

Loss

The loss combines L1 and L(D-SSIM):

L1: focuses on per-pixel similarity.

D-SSIM: focuses on structural similarity.
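A rough NumPy sketch of how the two terms are combined is shown here. It is deliberately simplified: global_ssim computes SSIM over the whole image as a single window, whereas real implementations use a sliding window, and the default weight of 0.2 is the value we recall from the paper, so treat it as an assumption. All function names are our own.

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute per-pixel difference."""
    return float(np.mean(np.abs(pred - target)))

def global_ssim(pred, target, c1=0.01**2, c2=0.03**2):
    """SSIM over the whole image as one window (a simplification)."""
    mu_p, mu_t = pred.mean(), target.mean()
    var_p, var_t = pred.var(), target.var()
    cov = np.mean((pred - mu_p) * (target - mu_t))
    numerator = (2 * mu_p * mu_t + c1) * (2 * cov + c2)
    denominator = (mu_p**2 + mu_t**2 + c1) * (var_p + var_t + c2)
    return float(numerator / denominator)

def combined_loss(pred, target, lambda_dssim=0.2):
    """(1 - lambda) * L1 + lambda * (1 - SSIM)."""
    l1_term = (1.0 - lambda_dssim) * l1_loss(pred, target)
    dssim_term = lambda_dssim * (1.0 - global_ssim(pred, target))
    return l1_term + dssim_term

rng = np.random.default_rng(0)
gt_image = rng.random((3, 16, 16))              # stand-in for a ground-truth image
noisy = np.clip(gt_image + 0.1 * rng.standard_normal(gt_image.shape), 0.0, 1.0)
loss_same = combined_loss(gt_image, gt_image)   # identical images
loss_noisy = combined_loss(noisy, gt_image)     # degraded rendering
```

An exact match gives a loss of zero because the L1 term vanishes and SSIM equals 1. Any degradation raises both terms.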

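In the refinement iteration:

If a Gaussian is too transparent (opacity α < epsilon):

There is no need to keep it, so RemoveGaussian() is called.

Otherwise, more parameters are allocated where they are needed (densification).

CloneGaussian

Handles regions with missing geometric features (under-reconstruction) by cloning a Gaussian so that more area is covered.

SplitGaussian

Handles the case where a single Gaussian covers a large area and cannot represent fine detail (over-reconstruction) by splitting it into smaller ones.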
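The decision logic above can be summarized as a small function. This is only our reading of the refinement step, written per Gaussian for clarity. The function name, the gradient threshold, and the scale threshold are hypothetical values picked for illustration. Only the 0.005 opacity cutoff mirrors the value passed to densify_and_prune in the official train.py.

```python
def refinement_action(opacity, grad_norm, scale,
                      min_opacity=0.005, grad_threshold=0.0002, scale_threshold=0.01):
    """Decide what to do with one Gaussian during a refinement iteration.

    opacity: current opacity (alpha) of the Gaussian
    grad_norm: magnitude of the view-space position gradient (assumed signal
               that the region is poorly reconstructed)
    scale: size of the Gaussian (for example, its largest scale component)
    """
    if opacity < min_opacity:
        return "remove"              # nearly transparent, contributes nothing
    if grad_norm > grad_threshold:
        # The region needs more capacity: split big Gaussians, clone small ones.
        return "split" if scale > scale_threshold else "clone"
    return "keep"

actions = [
    refinement_action(opacity=0.001, grad_norm=0.0010, scale=0.500),
    refinement_action(opacity=0.500, grad_norm=0.0010, scale=0.500),
    refinement_action(opacity=0.500, grad_norm=0.0010, scale=0.001),
    refinement_action(opacity=0.500, grad_norm=0.0000, scale=0.500),
]
print(actions)  # ['remove', 'split', 'clone', 'keep']
```

The official implementation performs these checks on all Gaussians at once with tensor masks, but the branching is easier to follow one Gaussian at a time.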

Summary

Rasterize the 3D components to 2D → Compute the Loss / Differentiate → Update the Gaussian parameters

Code Snippet

Let's look at train.py (link) from the official implementation.

def training(dataset, opt, pipe, testing_iterations, saving_iterations, checkpoint_iterations, checkpoint, debug_from):
    first_iter = 0
    tb_writer = prepare_output_and_logger(dataset)
    # Create the Gaussian model and the scene that holds the cameras and initial points
    gaussians = GaussianModel(dataset.sh_degree)
    scene = Scene(dataset, gaussians)
    # Set up the optimizer for the Gaussian parameters
    gaussians.training_setup(opt)
    # Optionally resume from a saved checkpoint
    if checkpoint:
        (model_params, first_iter) = torch.load(checkpoint)
        gaussians.restore(model_params, opt)
...

Rasterize the 3D components to 2D

render_pkg = render(viewpoint_cam, gaussians, pipe, bg)
image = render_pkg["render"]
viewspace_point_tensor = render_pkg["viewspace_points"]
visibility_filter = render_pkg["visibility_filter"]
radii = render_pkg["radii"]

Compute the Loss / Differentiate

We can see both the L1 loss and L(D-SSIM) here.

gt_image = viewpoint_cam.original_image.cuda()
Ll1 = l1_loss(image, gt_image)
loss = (1.0 - opt.lambda_dssim) * Ll1 + opt.lambda_dssim * (1.0 - ssim(image, gt_image))
loss.backward()

Update Gaussian Parameters

if iteration < opt.iterations:
    gaussians.optimizer.step()
    gaussians.optimizer.zero_grad(set_to_none=True)

Refinement Iteration

We can see that pruning is based on opacity: the 0.005 passed to densify_and_prune is the minimum-opacity threshold.

# Densification
if iteration < opt.densify_until_iter:
    # Keep track of max radii in image-space for pruning
    gaussians.max_radii2D[visibility_filter] = torch.max(gaussians.max_radii2D[visibility_filter], radii[visibility_filter])
    gaussians.add_densification_stats(viewspace_point_tensor, visibility_filter)

    # Periodically clone/split Gaussians and prune the nearly transparent ones
    if iteration > opt.densify_from_iter and iteration % opt.densification_interval == 0:
        size_threshold = 20 if iteration > opt.opacity_reset_interval else None
        gaussians.densify_and_prune(opt.densify_grad_threshold, 0.005, scene.cameras_extent, size_threshold)

    # Periodically reset opacities
    if iteration % opt.opacity_reset_interval == 0 or (dataset.white_background and iteration == opt.densify_from_iter):
        gaussians.reset_opacity()

These are really exciting techniques! We hope this post was at least a little helpful!

Written By: Terry (Taehan) Kim,

Reviewed by Metown Corp. (Sangbin Jeon, PhD Researcher)