Neural rendering is the ability to generate a photorealistic model in space, just like this one, from pictures of the object, person, or scene of interest. In this case, you'd have a handful of pictures of this sculpture and ask the machine to figure out what the object in these pictures should look like in space. You are basically asking a machine to understand physics and shapes from images alone. This is quite easy for us since we have only ever known the real world and its depth, but it's a whole other challenge for a machine that only sees pixels. It's great that the generated model looks accurate with realistic shapes, but what about how it blends into the new scene? And what if the lighting conditions vary in the pictures taken, and the generated model looks different depending on the angle you look at it? It would immediately seem weird and unrealistic to us. These are the challenges Snapchat and the University of Southern California attacked in this new research.
Watch to learn more:
References:
►Read the full article:
►Kuang, Z., Olszewski, K., Chai, M., Huang, Z., Achlioptas, P. and Tulyakov, S., 2022. NeROIC: Neural Rendering of Objects from Online Image Collections.
►Project link with great video demo:
►Code:
►My Newsletter (A new AI application explained weekly, straight to your inbox!):
Video Transcript
Neural rendering. Neural rendering is the ability to generate a photorealistic model in space, just like this one, from pictures of the object, person, or scene of interest. In this case, you'd have a handful of pictures of this sculpture and ask the machine to understand what the object in these pictures should look like in space. You are basically asking a machine to understand physics and shapes out of images. This is quite easy for us since we only know the real world and depth, but it's a whole other challenge for a machine that only sees pixels.

Then you might ask: why do we even want to do this? I'd say that the answer is pretty obvious to me. There are many cool applications, from having an app that could simply take a few pictures of an object and perfectly synthesize the 3D model, to putting it in images, 3D scenes, or even video games. This is really promising, but for these models to be realistic, lighting is another challenge that comes with these applications. It's great that the generated model looks accurate with realistic shapes, but what about how it blends into the new scene? And what if the lighting conditions vary in the pictures taken, and the generated model looks different depending on the angle you look at it? This will automatically seem weird and unrealistic to us. These are the challenges Snapchat and the University of Southern California attacked in this new research.

But first, a word from this episode's sponsor: Weights & Biases. Weights & Biases allows you to easily keep track of the input hyperparameters, output metrics, and any insights that you and your team have, with only a handful of lines added to your code. One aspect that's great for speeding up your experiments is Sweeps. Sweeps automate hyperparameter optimization and explore the space of all possible models without any effort on your end. It will simply run all tests, tweaking the parameters and reporting the effect of all parameters in clear graphs and reports you can share with your team to explain your final results easily. I love to do my best trying to make research look simple and clear for you all, and this is a big reason why I love Weights & Biases: they are doing the same thing with their platform, making your research look simple and reproducible. I'd love for you to check them out with the first link below, because they are helping me continue making these videos and growing this channel.

Now, let's see how these researchers tackle the lighting and realism challenges that come with creating a virtual object out of images. The technique builds upon neural radiance fields, which are largely used for reconstruction, with many models such as NeRF that we already covered on the channel. Typically, neural radiance fields need images taken in the same ideal conditions, but this is not what we want here. Their approach starts with NeRF, and as I said, I already covered it on my channel, so I won't cover it again, but feel free to take a break and watch the video to better understand how NeRF works. In short, NeRF is a neural network that is trained to infer the color, opacity, and radiance of each pixel using the images as inputs, and to guess the missing pixels for the small parts of the objects that aren't present in the images.
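To make that more concrete, here is a minimal sketch of the core NeRF idea, not the authors' implementation: a small MLP (hypothetical layer sizes, and without NeRF's positional encoding for brevity) maps a 3D position and a viewing direction to a color and a density, and a volume-rendering step accumulates those samples along each camera ray into a pixel color.

```python
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    """Minimal NeRF-style network (illustrative sizes, not the paper's)."""
    def __init__(self, pos_dim=3, dir_dim=3, hidden=128):
        super().__init__()
        # Maps a 3D point to a feature vector used for density and color.
        self.trunk = nn.Sequential(
            nn.Linear(pos_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)
        # Color also depends on the viewing direction (view-dependent effects).
        self.color_head = nn.Sequential(
            nn.Linear(hidden + dir_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def forward(self, xyz, view_dir):
        h = self.trunk(xyz)
        sigma = torch.relu(self.density_head(h))             # density >= 0
        rgb = self.color_head(torch.cat([h, view_dir], -1))  # color in [0, 1]
        return rgb, sigma

def render_ray(rgb, sigma, deltas):
    """Standard NeRF compositing: weight each sample's color by how much
    light survives to it along the ray, then sum into one pixel color."""
    alpha = 1.0 - torch.exp(-sigma.squeeze(-1) * deltas)     # per-sample opacity
    survive = torch.cat([torch.ones_like(alpha[:1]), 1.0 - alpha + 1e-10])[:-1]
    weights = alpha * torch.cumprod(survive, dim=0)          # transmittance * opacity
    return (weights.unsqueeze(-1) * rgb).sum(dim=0)          # final RGB

# Toy usage: 64 samples along one ray, evenly spaced 0.03 units apart.
pts, dirs = torch.rand(64, 3), torch.randn(64, 3)
dirs = dirs / dirs.norm(dim=-1, keepdim=True)
rgb, sigma = TinyNeRF()(pts, dirs)
pixel = render_ray(rgb, sigma, torch.full((64,), 0.03))
```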
But this approach doesn't work for large missing parts or different lighting conditions, as it can only interpolate from the input images. Here, we need something more to extrapolate from them and make assumptions about what should appear here and there, or how these pixels should look under this lighting or that one. Many approaches build upon NeRF to fix this, but they always require more inputs from the user, which is not what we want and is hard to have in many cases, especially when we want to build a good dataset to train our model on. In short, these models do not really understand the object, nor the environment the object is in, so we always come back to the lighting problem.

Here, the goal is to use this architecture on online images, or in other words, images with varying lighting, cameras, environments, and poses, something NeRF can hardly do with realism. The only few things they will need, other than the images of the object themselves, are a rough foreground segmentation and an estimation of the camera parameters, which can both be obtained with other models available. The foreground estimation is basically just a mask that tells you where the object of interest is in your image, like this.

What they did differently is that they separate the rendering of the object from the environment lighting in the input images. They focus on two things, which are done in two stages. First is the object's shape, or its geometry, which is the part that is most similar to NeRF, here called the geometry network. It will take the input images, segmentation mask, and camera parameter estimation we discussed, build a radiance field, and find a first guess of the density and colors of each pixel, as in NeRF, but adapted to varying lighting conditions in the input images. This difference relies on the two branches you see here, splitting the static content from the varying parameters like camera or shadows. This will allow us to teach our model how to correctly isolate the static content from other unwanted parameters like lighting.

But we are not finished here. We will estimate the surface normals from this learned density field, which will act as our textures, or in other words, it will take the results we just produced and find how our object will react to light. It will find unbiased material properties of the object at this stage, or at least an estimation of them, using a 3D convolution with a Sobel kernel. It's basically a filter that we apply in three dimensions to find all edges and how sharp they are, which can look like this on a two-dimensional image, and this on a three-dimensional rendering, giving us essential information about the different textures and shapes of the object.
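To illustrate the idea, here is a generic sketch of this step, not the paper's code: applying a Sobel-style derivative filter along each axis of a density volume with SciPy gives the density gradient, and the negated, normalized gradient points out of the surface, giving us surface normals.

```python
import numpy as np
from scipy import ndimage

def normals_from_density(density):
    """Estimate surface normals from a 3D density grid via Sobel filtering.

    A Sobel filter is a small derivative kernel; applying it along each
    axis yields the density gradient, whose negated, normalized direction
    points out of the surface. This is a common trick, analogous in spirit
    to the paper's 3D convolution, not their exact implementation.
    """
    gx = ndimage.sobel(density, axis=0)
    gy = ndimage.sobel(density, axis=1)
    gz = ndimage.sobel(density, axis=2)
    grad = np.stack([gx, gy, gz], axis=-1)
    norm = np.linalg.norm(grad, axis=-1, keepdims=True)
    return -grad / np.maximum(norm, 1e-8)   # unit normals, pointing outward

# Toy example: a solid sphere of density inside a 64^3 grid.
coords = np.mgrid[:64, :64, :64] - 32
density = (np.sqrt((coords ** 2).sum(axis=0)) < 20).astype(np.float32)
normals = normals_from_density(density)
print(normals.shape)   # (64, 64, 64, 3): one normal per voxel
```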
The next stage is where they fix the rough geometry and optimize the normals we just produced, using the rendering network, which is very similar to the first geometry network. Here again, there are two branches: one for the material and another for the lighting. They use spherical harmonics to represent the lighting model and optimize its coefficients during training, as they explain in the paper with more information if you are interested. Spherical harmonics are used here to represent a group of basis functions defined on the sphere's surface. We can find on Wikipedia that each function defined on the surface of a sphere can be written as a sum of these spherical harmonics. This technique is often used for calculating the lighting on 3D models. It produces highly realistic shading and shadowing with comparatively little overhead. In short, it will simply reduce the number of parameters to estimate while keeping the same amount of information. So instead of learning how to render the appropriate lighting for the whole object from scratch, the model will instead learn the correct coefficients to use in the spherical harmonics that estimate the lighting coming out of the surface at each pixel, simplifying the problem to a few parameters. The other branch will be trained to improve the surface normals of the object following the same trick, using the standard Phong BRDF, which will model the object's material properties based on a few parameters to find. Finally, the outputs of the two branches, so the final rendering and lighting, will be merged to find the final color of each pixel. This disentanglement of light and materials is why they are able to apply any lighting to the object and have it react realistically. Remember, this is done using only a couple of images from the internet, which could all have different lighting conditions. This is amazing, and voila! This is how this new paper from Kuang and collaborators at Snapchat created NeROIC, a neural rendering model for objects from online images.

I hope you enjoyed this short overview of the paper. All the references are linked below, as well as a link to the official project and their code. Let me know what you think of the explanation, the technique, and how you would use it in the real world. If you are still here and enjoyed the video, please don't forget to leave a like and subscribe to the channel; it both means a lot and helps a lot. Thank you for watching!
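For the curious, here is a small self-contained sketch of the spherical-harmonics trick described above; this is my own illustration with made-up coefficient values, not the authors' code. Lighting is compressed into nine coefficients per color channel (bands 0 to 2 of the real SH basis), and shading a point reduces to a dot product between those coefficients and the basis evaluated at the surface normal.

```python
import numpy as np

def sh_basis(n):
    """Evaluate the 9 real spherical-harmonic basis functions (bands 0-2)
    at unit direction(s) n of shape (..., 3), using the standard constants."""
    x, y, z = n[..., 0], n[..., 1], n[..., 2]
    return np.stack([
        0.282095 * np.ones_like(x),     # l=0
        0.488603 * y,                   # l=1, m=-1
        0.488603 * z,                   # l=1, m=0
        0.488603 * x,                   # l=1, m=1
        1.092548 * x * y,               # l=2, m=-2
        1.092548 * y * z,               # l=2, m=-1
        0.315392 * (3.0 * z**2 - 1.0),  # l=2, m=0
        1.092548 * x * z,               # l=2, m=1
        0.546274 * (x**2 - y**2),       # l=2, m=2
    ], axis=-1)

def shade(normals, coeffs):
    """Radiance at each surface point: dot product between the SH basis at
    the normal and the learned lighting coefficients, shape (9, 3) for RGB."""
    return sh_basis(normals) @ coeffs

# Hypothetical coefficients standing in for what training would learn.
coeffs = np.random.default_rng(0).normal(scale=0.2, size=(9, 3))
n = np.array([0.0, 0.0, 1.0])           # a surface point facing straight up
print(shade(n, coeffs))                 # RGB radiance at that point
```

Because only these 9×3 numbers are optimized during training, the lighting model stays cheap while still capturing soft, low-frequency illumination, which is exactly the "few parameters instead of learning lighting from scratch" trade-off described above.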