Have you ever imagined being able to take a picture and just magically dive into it, as if it were a door to another world? Well, whether you thought about this or not, some people did, and thanks to them, it is now possible with AI! This is just one step away from teleportation and being able to be there physically. Maybe one day AI will help with that and fix an actual problem too! I'm just kidding, this is really cool, and I'm glad some people are working on it. This is InfiniteNature... Zero! It is called this way because it is a follow-up on a paper I previously covered called InfiniteNature. What's the difference? Quality! Learn more in the video...
References
►Read the full article:
►Li, Z., Wang, Q., Snavely, N. and Kanazawa, A., 2022. InfiniteNature-Zero: Learning Perpetual View Generation of Natural Scenes from Single Images. In European Conference on Computer Vision (pp. 515-534). Springer, Cham.
►Code and project website:
►My Newsletter (a new AI application explained weekly, delivered to your inbox!):
Video Transcript
Have you ever imagined being able to take a picture and just magically dive into it, as if it were a door to another world? Well, whether you thought about this or not, some people did, and thanks to them, it's now possible with AI. This is just one step away from teleportation and being able to be there physically. Maybe one day AI will help with that and fix an actual problem too! I'm just kidding, this is really cool, and I'm glad some people are working on it.

This is InfiniteNature-Zero. It's called this way because it's a follow-up on a paper I previously covered called InfiniteNature. What's the difference? Quality! Just look at that: it's so much better after only one paper; it's incredible. You can actually feel like you are diving into the picture, and it only requires one input picture. How cool is that? The only thing even cooler is how it works. Let's dive into it!

But first, allow me 10 seconds of your time for a sponsor of this video: myself. Yes, only 10 seconds; I don't think I deserve more compared to the amazing companies that usually sponsor my work. If you like the videos, first, I think you should subscribe to the channel, but I also think you will love my two newsletters: the daily one, where I share research papers and news, and the weekly one, where I share these videos and very interesting discussions related to these papers and AI ethics. You should probably follow me on Twitter as well, at What's AI, if you'd like to stay up to date with the news and papers in the field. Tons are coming out with the CVPR deadlines that just passed, and you don't want to miss out on those.

So how does InfiniteNature-Zero work? It all starts with a single image you send as input. Yes, a single image. It doesn't require a video or multiple views or anything else. This is different from their previous paper, which I also covered, where they needed videos to help the model understand natural scenes during training, which is also why they call this model InfiniteNature-Zero: because it requires zero videos.

Here, their work is divided into three methods used during training in order to get those results. To start, the model randomly samples two virtual camera trajectories, which will tell you where you are going in the image. Why two? Because the first is necessary to generate a new view, telling you where to fly into the image to generate a second image. This is the actual trajectory you will be taking. The second virtual trajectory is used during training to dive in and return to the original image, to teach the model geometry-aware view refinement during view generation in a self-supervised way, as we teach it to get back to an image we already have in our training dataset. They refer to this approach as a cyclic virtual camera trajectory, as the starting and ending views are the same: our input image. They do that by going to a virtual, or fake, sampled viewpoint and returning to the original view afterward, just to teach the reconstruction part to the model. The viewpoints are sampled using an algorithm called the auto-pilot algorithm, to find the sky and not fly into rocks or the ground, as nobody would like to do that.

Then, during training, we use a GAN-like approach, using a discriminator to measure how much the newly generated view looks like a real image, represented with L adversarial, or L_adv. So yes, GANs aren't dead yet! This is a very cool application of them for guiding the training when you don't have any ground truth, for example when you don't have infinite images, as in this case. Basically, they use another model, a discriminator trained on our training dataset, that can tell whether an image seems to be part of it or not. So, based on its answer, you can improve the generation to make it look like an image from our dataset, which supposedly looks realistic. We also measure the difference between our regenerated initial image and the original one to help the model iteratively get better at reconstructing it, represented by L_rec here. And we simply repeat this process multiple times to generate our novel frames and create these kinds of videos.

There's one last thing to tweak before getting those amazing results. They saw that with their approach, the sky, due to its infinite nature compared to the ground, changes way too quickly. To fix that, they use another segmentation model to find the sky automatically in the generated images and fix it using an intelligent blending system between the generated sky and the sky from our initial image, so that it doesn't change too quickly and unrealistically. After training with this two-step process and sky refinement, InfiniteNature-Zero allows you to have stable long-range trajectories for natural scenes, as well as to accurately generate novel views that are geometrically coherent.

And voilà! This is how you can take a picture and dive into it as if you were a bird. I invite you to read their paper for more details on their method and its limitations, especially regarding how they managed to train their model in such a clever way, as I omitted some technical details making this possible, for simplicity. By the way, the code is available and linked below if you'd like to try it. Let me know if you do, and send me the results; I'd love to see them. Thank you for watching, and I hope you've enjoyed this video. I will see you next week with another amazing paper!