
DreamFusion: An AI that Generates 3D Models from Text

by Louis Bouchard, October 16th, 2022

Too Long; Didn't Read

DreamFusion is a new Google Research model that can understand a sentence enough to generate a 3D model of it. Think of it as DALL·E or Stable Diffusion, but in 3D. The results aren't perfect yet, but the progress the field has made over the past year is just incredible, and what's even more fascinating is how the model works. Let's dive into it...

We've seen models before that were able to take a sentence and generate images from it. We've also seen other approaches that manipulate the generated images by learning specific concepts, like an object or a particular style. Last week, Meta published the Make-A-Video model that I covered, which allows you to generate a short video, also from a text sentence. The results aren't perfect yet, but the progress we've made in the field over this past year is just incredible. This week we take another step forward. Here's DreamFusion, a new Google Research model that can understand a sentence enough to generate a 3D model of it. You can see this as a DALL·E or Stable Diffusion, but in 3D. How cool is that?! We can't really make it much cooler. But what's even more fascinating is how it works. Let's dive into it...

References

►Read the full article: 
►Poole, B., Jain, A., Barron, J.T. and Mildenhall, B., 2022. DreamFusion: Text-to-3D using 2D Diffusion. arXiv preprint arXiv:2209.14988.
►Project website: 
►My newsletter (a new AI application explained weekly, delivered to your inbox!):

Video Transcript

We've seen models able to take a sentence and generate images, then other approaches to manipulate the generated images by learning specific concepts like an object or a particular style. Last week, Meta published the Make-A-Video model that I covered, which allows you to generate a short video, also from a text sentence. The results aren't perfect yet, but the progress we've made in the field since last year is just incredible. This week we take another step forward. Here's DreamFusion, a new Google Research model that can understand a sentence enough to generate a 3D model out of it. You can see this as a DALL·E or Stable Diffusion, but in 3D. How cool is that?! We can't make it much cooler, but what's even more fascinating is how it works. Let's dive into it. But first, give me a few seconds to talk about a related subject, computer vision. You'll want to hear this if you are in this field as well.

For this video, I'm partnering with Encord, the online learning platform for computer vision. Data is one of the most important parts of creating an innovative computer vision model. That's why the Encord platform has been built from the ground up to make the creation of training data and the testing of machine learning models quicker than it's ever been. Encord does this in two ways. First, it makes it easier to manage, annotate, and evaluate training data through a range of collaborative annotation tools and automation features. Secondly, Encord offers access to its QA workflows, APIs, and SDK so you can create your own active learning pipelines, speeding up model development. And by using Encord, you don't need to waste time building your own annotation tools, letting you focus on getting the right data into your models. If that sounds interesting, please click the first link below to get a free 28-day trial of Encord, exclusive to our community.

If you've been following my work, DreamFusion is quite simple: it basically uses two models I already covered, NeRFs and one of the text-to-image models. In their case it's the Imagen model, but anything like Stable Diffusion or DALL·E would do. As you know if you've been a good student and watched the previous videos, NeRFs are a kind of model used to render 3D scenes by generating a neural radiance field out of one or more images of an object. But then, how can you generate a 3D render from text if the NeRF model only works with images? Well, we use Imagen, the other AI, to generate image variations from the one it takes. And why do we do that instead of directly generating 3D models from text? Because that would require huge datasets of 3D data along with their associated captions for our model to be trained on, which would be very difficult to get. Instead, we use a pre-trained text-to-image model, with much less complex data to gather, and we adapt it to 3D. So it doesn't require any 3D data to be trained on, only a pre-existing AI for generating images. It's really cool how we can reuse powerful technologies for new tasks like this by interpreting the problem differently.

So, if we start from the beginning, we have a NeRF model. As I explained in previous videos, this type of model takes images to predict the pixels in each novel view, creating a 3D model by learning from image pairs of the same object with different viewpoints.
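As a refresher on that building block, here is a minimal sketch of a radiance field network, assuming PyTorch; the class, layer sizes, and names are illustrative, not DreamFusion's actual architecture. It maps a 3D point and a viewing direction to a color and a density, which a volume renderer then integrates along each camera ray to produce pixels:

```python
import torch
import torch.nn as nn

class TinyRadianceField(nn.Module):
    """Illustrative NeRF-style network: (x, y, z, view direction) -> (RGB, density).
    A real NeRF adds positional encoding and a much deeper network."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)   # one non-negative density per point
        self.color_head = nn.Sequential(           # color also depends on the view direction
            nn.Linear(hidden + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),    # RGB in [0, 1]
        )

    def forward(self, xyz: torch.Tensor, view_dir: torch.Tensor):
        h = self.trunk(xyz)
        sigma = torch.relu(self.density_head(h))
        rgb = self.color_head(torch.cat([h, view_dir], dim=-1))
        return rgb, sigma
```

Classically, this network is fit to photos of one object; DreamFusion's trick, described next, is to fit it without any photos at all.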
In our case, we do not start with images directly; we start with the text and sample a random view orientation we want to generate an image for. Basically, we are trying to create a 3D model by generating images from all possible angles a camera could cover looking around the object, guessing the pixel colors, densities, light reflections, etc., everything needed to make it look realistic. Thus, we start with a caption and add a small tweak to it depending on the random camera viewpoint we want to generate. For example, we may want to generate a front view, so we would append "front view" to the caption. On the other side, we use the same angle and camera parameters for our initial, not-yet-trained NeRF model to predict the first rendering. Then, we generate an image version guided by our caption and the initial rendering with added noise, using Imagen, our pre-trained text-to-image model, which I further explain in my Imagen video if you are curious to see how it does that. So our Imagen model will be guided by the text input as well as the current rendering of the object with added noise. Here, we add noise because this is what the Imagen module can take as input; it needs to be part of a noise distribution it understands. We use the model to generate a higher-quality image, add the image used to generate it, and remove the noise we manually added, using this result to guide and improve our NeRF model for the next step. We do all that to better understand where in the image the NeRF model should focus its attention to produce better results at the next step, and we repeat that until the 3D model is satisfying enough. You can then export this model to a mesh and use it in a scene of your choice. And before some of you ask: no, you don't have to retrain the image generator model. As they say so well in the paper, it just acts as "a frozen critic that predicts image-space edits." And voilà! This is how DreamFusion generates 3D renderings from text inputs.

If you'd like to have a deeper understanding of the approach, have a look at my videos covering NeRFs and Imagen. I also invite you to read their paper for more details on this specific method. Thank you for watching the whole video, and I will see you next week with another amazing paper!
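To summarize the loop above in code, here is a minimal PyTorch sketch of one optimization step in the spirit of DreamFusion. The helpers `render_fn` (the differentiable NeRF renderer) and `predict_noise_fn` (the frozen diffusion model's noise prediction) are assumed stand-ins for components the paper describes, and the noise schedule is a toy one, so treat this as an illustration rather than the authors' implementation:

```python
import math
import random
import torch

def view_prompt(caption: str, azimuth_deg: float) -> str:
    """Append a view hint to the caption, a simplified version of
    DreamFusion's view-dependent prompting."""
    if azimuth_deg < 60 or azimuth_deg > 300:
        return caption + ", front view"
    if 120 < azimuth_deg < 240:
        return caption + ", back view"
    return caption + ", side view"

def dreamfusion_style_step(render_fn, predict_noise_fn, caption, optimizer):
    """One training step: render, noise, ask the frozen critic, backpropagate.

    render_fn(azimuth_deg) -> image tensor rendered by the current NeRF
        (differentiable with respect to the NeRF weights).
    predict_noise_fn(noisy_image, t, prompt) -> the frozen text-to-image
        model's noise estimate. Both are assumed helpers, not real APIs.
    """
    # 1. Sample a random camera angle and render the current NeRF from it.
    azimuth = random.uniform(0.0, 360.0)
    image = render_fn(azimuth)
    prompt = view_prompt(caption, azimuth)

    # 2. Add noise at a random diffusion timestep: the frozen model only
    #    understands inputs drawn from its own noise distribution.
    t = random.randint(20, 980)
    alpha_bar = math.cos(0.5 * math.pi * t / 1000) ** 2   # toy noise schedule
    noise = torch.randn_like(image)
    noisy = math.sqrt(alpha_bar) * image + math.sqrt(1.0 - alpha_bar) * noise

    # 3. The diffusion model acts as a frozen critic: no gradients flow into it.
    with torch.no_grad():
        predicted_noise = predict_noise_fn(noisy, t, prompt)

    # 4. The gap between predicted and injected noise says how the rendering
    #    should change; push that signal back into the NeRF weights.
    image.backward(gradient=(predicted_noise - noise))
    optimizer.step()
    optimizer.zero_grad()
```

Repeating this step over thousands of random viewpoints is what gradually turns a randomly initialized NeRF into a coherent 3D model of the caption.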