Exploring emerging technologies for visual storytelling

Freely navigate through a 3D environment in the browser

The aim of this experiment was to investigate whether an open space, captured with a drone and converted into a Gaussian splat scene, can be explored freely. The scene should be viewable in all modern browsers without additional plugins or downloads, and the photographer should not need any additional hardware or investment.

Building blocks

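| # | Hardware & software / service | Cost | Level (1-5) |
|---|-------------------------------|------|-------------|
| 1 | DJI Mini 3 Pro drone | €1000 | 2 |
| 2 | Luma Labs | Free (for now) | 2 |
| 3 | jQuery | Free | 4 |
| 4 | Three.js | Free | 5 |
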
Step 01: Taking the pictures
Take as many images as possible, from as many positions as possible. This can be done with a drone, but equally well with an ordinary camera. Always take the photos in landscape (horizontal) orientation and make sure the exposure is balanced.
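One way to make "as many positions as possible" concrete with a drone is to fly several circular orbits around the subject at different heights, with the camera always facing the centre. The sketch below is hypothetical: `planOrbits`, its parameters and the local coordinate frame (metres, x east, y north, z up) are illustrative assumptions, not a feature of any DJI app or SDK.

```javascript
// Hypothetical capture planner (not a DJI API): several circular orbits
// around a subject at the origin, each at its own altitude, with the camera
// yaw pointing back at the centre. Units are metres in an assumed local
// frame: x east, y north, z up.
function planOrbits({ radius, altitudes, shotsPerOrbit }) {
  const shots = [];
  for (const z of altitudes) {
    for (let i = 0; i < shotsPerOrbit; i++) {
      const angle = (2 * Math.PI * i) / shotsPerOrbit;
      const x = radius * Math.cos(angle);
      const y = radius * Math.sin(angle);
      // Yaw in radians, counter-clockwise from +x, pointing at the origin.
      const yaw = Math.atan2(-y, -x);
      // Gimbal pitch in radians (negative = down), aimed at the subject's base.
      const pitch = -Math.atan2(z, radius);
      shots.push({ x, y, z, yaw, pitch });
    }
  }
  return shots;
}

// Example: three orbits of 36 shots each (one shot every 10 degrees).
const plan = planOrbits({ radius: 30, altitudes: [10, 20, 35], shotsPerOrbit: 36 });
console.log(plan.length); // 108
```

Orbits at several heights give each surface point views from many directions and elevations, which is what the reconstruction steps below rely on.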
Step 02: Create the Neural Radiance Field

A Neural Radiance Field (NeRF) is a deep learning technique that reconstructs a three-dimensional (3D) representation of a scene from a sparse set of two-dimensional (2D) images. It excels at generating photorealistic views from new viewpoints by learning the scene's geometry together with its view-dependent appearance, which captures the combined effect of lighting and reflectance.

The NeRF algorithm represents a scene as a radiance field parameterized by a deep neural network (DNN). Given a spatial location (x, y, z) and a viewing direction expressed as two spherical angles (θ, φ), the network predicts the volume density at that point and the radiance emitted in that direction. To render an image, many points are sampled along each camera ray, and classical volume rendering combines their predicted colours and densities into the colour of each pixel.
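The rendering step can be sketched for a single ray. This minimal sketch follows the discrete volume-rendering sum from the original NeRF paper, C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i, where sigma_i is the density, c_i the colour, delta_i the distance to the next sample, and T_i = exp(-sum_{j<i} sigma_j * delta_j) the light that survives to sample i. The densities and colours are supplied by hand, standing in for the network's predictions; `renderRay` and the sample format are hypothetical.

```javascript
// Discrete volume rendering along one camera ray (NeRF quadrature):
//   C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i
//   T_i = exp(-sum_{j<i} sigma_j * delta_j)
// In a real NeRF, sigma and rgb come from the network; here they are given.
function renderRay(samples) {
  let transmittance = 1.0; // fraction of light not yet absorbed (T_i)
  const color = [0, 0, 0];
  for (const { sigma, rgb, delta } of samples) {
    const alpha = 1 - Math.exp(-sigma * delta); // opacity of this ray segment
    const weight = transmittance * alpha;       // contribution of this sample
    for (let k = 0; k < 3; k++) color[k] += weight * rgb[k];
    transmittance *= 1 - alpha;                 // multiply by exp(-sigma * delta)
  }
  return { color, transmittance };
}

// Example: empty space, then a dense red surface, then a green one behind it.
const ray = [
  { sigma: 0.0, rgb: [0, 0, 1], delta: 0.5 },
  { sigma: 50.0, rgb: [1, 0, 0], delta: 0.5 },
  { sigma: 50.0, rgb: [0, 1, 0], delta: 0.5 },
];
const result = renderRay(ray);
console.log(result.color); // close to [1, 0, 0]: the red surface hides the green one
```

Because every step is differentiable, the same sum lets training compare rendered pixels with the photos and adjust the network.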

To train a NeRF model, you need images of the scene taken from different angles, together with their camera poses. These can be ordinary 2D photos from any camera, as long as they overlap enough for Structure from Motion (SfM) to recover each camera's position and orientation.
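What SfM recovers for each photo is a pose (rotation and translation) plus the camera intrinsics, which together tell us where any 3D point lands in that image. A minimal pinhole-camera sketch of that relationship, assuming the world-to-camera convention used by OpenCV and COLMAP (x_cam = R * X + t, camera looking down +z); `project` and the example numbers are hypothetical.

```javascript
// Project a 3D world point into an image with a pinhole camera model.
// Assumed convention (OpenCV/COLMAP style): the pose is world-to-camera,
// x_cam = R * X + t, and the camera looks down its own +z axis.
function project(point, pose, intrinsics) {
  const { R, t } = pose;                 // R: 3x3 array of rows, t: [tx, ty, tz]
  const { fx, fy, cx, cy } = intrinsics; // focal lengths and principal point, in pixels
  const cam = [0, 1, 2].map(
    (r) => R[r][0] * point[0] + R[r][1] * point[1] + R[r][2] * point[2] + t[r]
  );
  if (cam[2] <= 0) return null;          // point is behind the camera: not visible
  return [fx * (cam[0] / cam[2]) + cx, fy * (cam[1] / cam[2]) + cy];
}

// Example: no rotation, camera 5 units in front of the origin along its z axis.
const pose = { R: [[1, 0, 0], [0, 1, 0], [0, 0, 1]], t: [0, 0, 5] };
const K = { fx: 1000, fy: 1000, cx: 960, cy: 540 };
console.log(project([1, 0, 0], pose, K)); // ≈ [1160, 540]
```

SfM solves the inverse problem: it searches for the poses (and 3D points) that make these projections agree with matching features across all the photos, which is why sufficient overlap matters.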

The result is a detailed and accurate 3D representation of the scene. The next step starts from a point cloud: a set of data points in space, each giving the coordinates of a point on the surface of an object in the scene. In the original Gaussian splatting method, this is the sparse point cloud that the SfM step produces alongside the camera poses.
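In code, a point cloud is often just a flat array of coordinates. The hypothetical helper below assumes points stored as a `Float32Array` of x, y, z triples and computes the centroid and a bounding-sphere radius, values a viewer could use to centre its camera controls on the scene.

```javascript
// A point cloud stored as a flat Float32Array: [x0, y0, z0, x1, y1, z1, ...].
// Hypothetical helper: returns the number of points, their centroid, and the
// largest distance from the centroid to any point (a bounding-sphere radius).
function describePointCloud(positions) {
  const n = positions.length / 3;
  if (!Number.isInteger(n) || n === 0) throw new Error("expected x, y, z triples");
  const centroid = [0, 0, 0];
  for (let i = 0; i < n; i++) {
    for (let k = 0; k < 3; k++) centroid[k] += positions[3 * i + k] / n;
  }
  let radius = 0;
  for (let i = 0; i < n; i++) {
    const d = Math.hypot(
      positions[3 * i] - centroid[0],
      positions[3 * i + 1] - centroid[1],
      positions[3 * i + 2] - centroid[2]
    );
    radius = Math.max(radius, d);
  }
  return { count: n, centroid, radius };
}

// Example: four corners of a cube centred on the origin.
const cloud = new Float32Array([1, 1, 1, -1, -1, -1, 1, -1, 1, -1, 1, -1]);
console.log(describePointCloud(cloud)); // count 4, centroid [0, 0, 0], radius ≈ 1.732 (√3)
```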

Step 03: Convert to Gaussian splats

This point cloud is then converted into Gaussian functions (Gaussian splats), each starting with an initial position, covariance (shape and orientation), colour and opacity. Instead of querying a neural network at every sample point along every ray, the Gaussian parameters are optimised directly by stochastic gradient descent until renderings match the input images. This saves computation, because no effort is spent on empty space, and it enables fast, realistic rendering: the Gaussians are projected onto the screen and blended together to form the final image.
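A covariance matrix has to stay valid (symmetric and positive semi-definite) while it is being optimised. The original 3D Gaussian Splatting paper handles this by storing a per-axis scale and a rotation quaternion per splat and building Sigma = R * S * S^T * R^T, with S the diagonal scale matrix. A minimal sketch of that construction; the function names and the (w, x, y, z) quaternion order are assumptions of this example.

```javascript
// Rotation matrix from a quaternion. The (w, x, y, z) order is an assumption.
function quaternionToMatrix([w, x, y, z]) {
  const n = Math.hypot(w, x, y, z); // normalise so R is a pure rotation
  w /= n; x /= n; y /= n; z /= n;
  return [
    [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
    [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
    [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
  ];
}

// Covariance of a 3D Gaussian: Sigma = R * S * S^T * R^T, S = diag(scale).
function covarianceFromScaleRotation(scale, quaternion) {
  const R = quaternionToMatrix(quaternion);
  // M = R * S: column j of R is stretched by the j-th axis length.
  const M = R.map((row) => row.map((v, j) => v * scale[j]));
  // Sigma = M * M^T, which is symmetric positive semi-definite by construction.
  return M.map((rowI) =>
    M.map((rowJ) => rowI.reduce((sum, v, k) => sum + v * rowJ[k], 0))
  );
}

// Example: a flat splat (long in x, thin in z) rotated 90 degrees about z.
const h = Math.SQRT1_2;
const sigma = covarianceFromScaleRotation([2, 1, 0.1], [h, 0, 0, h]);
console.log(sigma);
// ≈ [[1, 0, 0], [0, 4, 0], [0, 0, 0.01]]: the long axis now points along y
```

The optimiser can then adjust scale and rotation freely, and every intermediate state still describes a proper ellipsoid.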

How to splat?

Currently, there are several ways to compute the NeRF and the Gaussian splat. You can do this on your own computer if it has sufficient computing power (the official Gaussian splatting code requires an NVIDIA GPU with CUDA compute capability 7.0 or higher), or use free but experimental services such as Polycam or Luma.
The resulting file can be a .ply, .splat or .ksplat, since there is currently no official standard file format. Some conversion between these formats is possible at https://splat-converter.glitch.me/
The official Gaussian splatting implementation, for running the process locally, is available at https://github.com/graphdeco-inria/gaussian-splatting
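Part of what a converter does is change how each splat's values are stored. The sketch below packs one Gaussian into a 32-byte record and reads it back. It assumes two conventions that are not formally standardised: the reference implementation's .ply properties (log scales, opacity stored as a logit, base colour stored as the degree-0 spherical-harmonic coefficients `f_dc`), and the 32-byte .splat layout popularised by the antimatter15 web viewer (3 float32 position, 3 float32 scale, 4 uint8 RGBA, 4 uint8 quaternion). `packSplat` and `readSplat` are hypothetical names, and higher-order spherical-harmonic terms are simply dropped.

```javascript
// Degree-0 spherical-harmonic constant, 1 / (2 * sqrt(pi)).
const SH_C0 = 0.28209479177387814;
const clampByte = (v) => Math.max(0, Math.min(255, Math.round(v)));
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// Pack one Gaussian, given with .ply-style parameters, into an assumed
// 32-byte little-endian .splat record.
function packSplat({ position, logScale, fDc, opacityLogit, rotation }) {
  const buffer = new ArrayBuffer(32);
  const view = new DataView(buffer);
  // Bytes 0-11: position; bytes 12-23: linear scale (exp undoes the log).
  for (let k = 0; k < 3; k++) {
    view.setFloat32(4 * k, position[k], true);
    view.setFloat32(12 + 4 * k, Math.exp(logScale[k]), true);
  }
  // Bytes 24-26: RGB from the SH DC term; byte 27: alpha from the logit.
  for (let k = 0; k < 3; k++) view.setUint8(24 + k, clampByte((0.5 + SH_C0 * fDc[k]) * 255));
  view.setUint8(27, clampByte(sigmoid(opacityLogit) * 255));
  // Bytes 28-31: normalised quaternion mapped from [-1, 1] to [0, 255].
  const n = Math.hypot(...rotation);
  for (let k = 0; k < 4; k++) view.setUint8(28 + k, clampByte((rotation[k] / n) * 128 + 128));
  return buffer;
}

// Read the i-th record from a buffer of concatenated 32-byte splats.
function readSplat(buffer, i) {
  const view = new DataView(buffer, 32 * i, 32);
  const f = (offset) => view.getFloat32(offset, true);
  return {
    position: [f(0), f(4), f(8)],
    scale: [f(12), f(16), f(20)],
    rgba: [24, 25, 26, 27].map((o) => view.getUint8(o)),
    rotation: [28, 29, 30, 31].map((o) => (view.getUint8(o) - 128) / 128),
  };
}

const record = packSplat({
  position: [1.5, -2, 0.25],
  logScale: [0, Math.log(0.5), Math.log(2)],
  fDc: [0, 0, 0],
  opacityLogit: 0,
  rotation: [1, 0, 0, 0],
});
console.log(readSplat(record, 0)); // rotation w reads back as 0.9921875: 8-bit quantisation
```

The compact record is one reason .splat files are smaller than .ply files, at the cost of precision and of the view-dependent colour terms.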
Step 04: Implement on a webpage
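```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <title>Demo</title>
    <meta charset="utf-8" />
    <!-- Full-screen viewport; pinch-zoom is disabled so touch gestures reach the 3D viewer -->
    <meta name="viewport"
      content="width=device-width, initial-scale=1, minimum-scale=1, maximum-scale=1, user-scalable=no" />
  </head>
  <!-- data-splat-file names the splat file to display (presumably read by js/main.js) -->
  <body style="background: black; overflow: hidden; margin: 0;" data-splat-file="GS_file.splat">
    <!-- Import map: lets js/main.js import "three" and "three/addons/..." from a CDN without a build step -->
    <script type="importmap">
      {
        "imports": {
          "three": "https://unpkg.com/three@0.157.0/build/three.module.js",
          "three/addons/": "https://unpkg.com/three@0.157.0/examples/jsm/"
        }
      }
    </script>
    <!-- The viewer code itself (not shown here) -->
    <script type="module" src="js/main.js"></script>
  </body>
</html>
```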
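Thanks to the open 3D graphics standard WebGL and the JavaScript library Three.js, splat files can be viewed directly in the browser, without plugins.

A collection of browser-based Gaussian splatting viewers is available at https://github.com/akbartus/Gaussian-Splatting-WebViewers/tree/main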
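Whatever viewer is used, drawing splats comes down to two steps each frame: sort the splats by depth from the camera, then blend them in that order using their opacities, the same front-to-back accumulation used in volume rendering. The sketch below shows both on plain numbers, with no WebGL involved. `sortByDepth`, `compositePixel` and the per-splat fields are hypothetical; a real viewer runs this for millions of splats (often sorting off the main thread) and scales each splat's alpha by its projected 2D Gaussian falloff at the pixel.

```javascript
// Sort splats by the depth of their centres along the viewing direction.
// camPos and viewDir (a unit vector) are assumed inputs of this sketch.
function sortByDepth(splats, camPos, viewDir) {
  const depth = (s) =>
    (s.center[0] - camPos[0]) * viewDir[0] +
    (s.center[1] - camPos[1]) * viewDir[1] +
    (s.center[2] - camPos[2]) * viewDir[2];
  return splats
    .map((s) => ({ ...s, depth: depth(s) }))
    .filter((s) => s.depth > 0)         // drop splats behind the camera
    .sort((a, b) => a.depth - b.depth); // nearest first
}

// Front-to-back alpha blending of already-sorted splats at one pixel:
//   C = sum_i c_i * a_i * prod_{j<i} (1 - a_j)
function compositePixel(sortedSplats, background = [0, 0, 0]) {
  let transmittance = 1;
  const color = [0, 0, 0];
  for (const { rgb, alpha } of sortedSplats) {
    for (let k = 0; k < 3; k++) color[k] += transmittance * alpha * rgb[k];
    transmittance *= 1 - alpha;
    if (transmittance < 1e-4) break; // stop once the pixel is (nearly) opaque
  }
  // Whatever light is left lets the background show through.
  return color.map((c, k) => c + transmittance * background[k]);
}

const splats = [
  { center: [0, 0, 5], rgb: [0, 0, 1], alpha: 1.0 },  // blue, far
  { center: [0, 0, 2], rgb: [1, 0, 0], alpha: 0.5 },  // red, near, half transparent
  { center: [0, 0, -1], rgb: [0, 1, 0], alpha: 1.0 }, // green, behind the camera
];
const sorted = sortByDepth(splats, [0, 0, 0], [0, 0, 1]);
console.log(sorted.map((s) => s.depth)); // depths 2 and 5
console.log(compositePixel(sorted));     // → [0.5, 0, 0.5]: half red, half blue
```

Sorting matters because alpha blending is order-dependent: drawing the blue splat first would hide the half-transparent red one in front of it.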

"The photographer is the Swiss Army knife of the visual content industry, versatile and indispensable, crafting moments into timeless stories."