
An Intro to eDiffi: NVIDIA's New SOTA Image Synthesis Model

by Louis Bouchard, November 5th, 2022

Too Long; Didn't Read

eDiffi, NVIDIA's most recent model, generates better-looking and more accurate images than all previous approaches like DALL-E 2 or Stable Diffusion. eDiffi better understands the text you send and is more customizable, adding a feature we saw in a previous paper from NVIDIA: the painter tool. Learn more in the video...

References

► Read the full article:
► Balaji, Y. et al., 2022, "eDiffi: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers"
► Project page:
► My Newsletter (a new AI application explained weekly to your email!):

Video Transcript

eDiffi is the new state-of-the-art approach for image synthesis. It generates better-looking and more accurate images than all previous approaches like DALL-E 2 or Stable Diffusion. eDiffi better understands the text you send and is more customizable, adding a new feature we saw in a previous paper from NVIDIA: the painter tool. As they say, you can paint with words. In short, this means you can enter a few subjects and paint in the image what should appear here and there, allowing you to create much more customized images compared to a random generation following a prompt. This is the next level, enabling you to pretty much get the exact image you have in mind by simply drawing a horrible quick sketch, something even I can do.

As I mentioned, the results are not only state-of-the-art and better looking than Stable Diffusion, but they are also way more controllable. Of course, it's a different use case, as it needs a bit more work and a clearer idea in mind for creating such a draft, but it's definitely very exciting and interesting. It's also why I wanted to cover it on my channel, since it's not merely a better model but also a different approach with much more control over the output. The tool isn't available yet, unfortunately, but I sure hope it will be soon. By the way, you should definitely subscribe to the channel and follow me on Twitter at @whatsai if you like this kind of video and would like to have access to easily digestible news on this heavily complicated field.

Another way they allow you to have more control in this new model is by using the same feature we saw, but differently. Indeed, the model generates images guided by a sentence, but it can also be influenced using a quick sketch, so it basically takes an image and a text as inputs. This means you can do other stuff, as it understands images. Here, they leverage this capability by developing a style transfer approach where you can influence the style of the image generation process by giving an image with a particular style along with your text input. This is super cool, and just look at the results: they speak for themselves. It's incredible, beating both state-of-the-art style transfer models and image synthesis models with a single approach.

Now the question is: how could NVIDIA develop a model that creates better-looking images, enables more control over both the style and the image structure, and better understands and represents what you actually want in your text? Well, they change the typical diffusion architecture in two ways. First, they encode the text using two different approaches that I already covered on the channel, which we refer to as the CLIP and T5 encoders. This means they use pre-trained models to take text and create various embeddings focusing on different features, as they are trained and behave differently, and embeddings are just representations maximizing what the sentence actually means for the algorithm, or the machine, to understand. Regarding the input image, they just use the CLIP embeddings as well, basically encoding the image so that the model can understand it, which you can learn more about in my other videos covering generative models, as they are pretty much all built on CLIP. This is what allows them to have more control over the output, as well as process text and images rather than only text.
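To make this dual text-encoder idea a bit more concrete, here is a minimal sketch of how one could compute both a CLIP and a T5 embedding for the same prompt with the Hugging Face transformers library. The checkpoint names, shapes, and the way the two embeddings are packaged together are assumptions for illustration only, not NVIDIA's actual code (eDiffi uses its own, much larger encoders).

```python
# Illustrative sketch only (assumed checkpoints, not eDiffi's implementation):
# encode one prompt with two complementary pre-trained text encoders.
import torch
from transformers import CLIPTextModel, CLIPTokenizer, T5EncoderModel, T5Tokenizer

prompt = "a corgi wearing a red scarf, oil painting"

# CLIP text encoder: trained jointly with images, so its embeddings are
# grounded in visual semantics.
clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
clip_enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
clip_inputs = clip_tok(prompt, padding="max_length", truncation=True, return_tensors="pt")
with torch.no_grad():
    clip_emb = clip_enc(**clip_inputs).last_hidden_state   # (1, 77, 768)

# T5 text encoder: text-only pre-training, so it captures finer-grained
# language structure. (eDiffi uses a much larger T5 variant; "t5-large" is
# just a lighter stand-in here.)
t5_tok = T5Tokenizer.from_pretrained("t5-large")
t5_enc = T5EncoderModel.from_pretrained("t5-large")
t5_inputs = t5_tok(prompt, return_tensors="pt")
with torch.no_grad():
    t5_emb = t5_enc(**t5_inputs).last_hidden_state          # (1, seq_len, 1024)

# The denoising network would receive both embeddings as conditioning
# (e.g. through cross-attention), drawing on whichever is more informative.
conditioning = {"clip_text": clip_emb, "t5_text": t5_emb}
```

The point is simply that the two representations are complementary, so the model can lean on either one, which is part of why the generations follow the prompt more faithfully.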
The second modification is using a cascade of diffusion models instead of reusing the same one iteratively, as we usually do with diffusion-based models. Here, they use models trained for specific parts of the generative process, meaning that each model does not have to be as general as the regular diffusion denoiser. Since each model has to focus on a specific part of the process, it can be much better at it. They use this approach because they observed that the denoising model seemed to use the text embeddings a lot more to orient its generation towards the beginning of the process, and then used them less and less to focus on output quality and fidelity. This naturally brings the hypothesis that reusing the same denoising model throughout the whole process might not be the best idea, since it automatically focuses on different tasks, and we know that a generalist is far from the expert level at all tasks. So why not use a few experts instead of one generalist to get much better results? This is what they did, and it is why they call them denoising experts, and it is the main reason for the improved performance in quality and faithfulness (a minimal illustrative sketch follows after the transcript). The rest of the architecture is pretty similar to other approaches, upscaling the final results with other models to get a high-definition final image.

The image and video synthesis fields are just getting crazy nowadays, and we are seeing impressive results coming out every week. I am super excited for the next releases, and I love to see different approaches, with both innovative ways of tackling the problem and also going for different use cases. As a great person once said: what a time to be alive! I hope you liked this quick overview of the approach, a bit more high-level than what I usually do, as it takes most parts I already covered in numerous videos and changes them to act differently. I invite you to watch my Stable Diffusion video to learn a bit more about the diffusion approach itself, and to read NVIDIA's paper to learn more about this specific approach and its implementation. I will see you next week with another amazing paper!
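To illustrate the "denoising experts" idea from the transcript above, here is a small, hypothetical sketch of routing each sampling step to a denoiser specialized for one slice of the noise schedule. The class, the timestep boundaries, and the dummy denoiser are illustrative assumptions, not eDiffi's implementation.

```python
# Hypothetical sketch of an "ensemble of expert denoisers": instead of one
# network handling every step, each expert covers its own part of the
# denoising process. Boundaries and modules below are illustrative only.
import torch
import torch.nn as nn

class ExpertEnsemble(nn.Module):
    def __init__(self, experts):
        # experts: list of (t_min, t_max, denoiser) covering disjoint timestep ranges
        super().__init__()
        self.ranges = [(t_min, t_max) for t_min, t_max, _ in experts]
        self.experts = nn.ModuleList([m for _, _, m in experts])

    def forward(self, x_t, t, cond):
        # Route this step to the expert whose range contains timestep t.
        for (t_min, t_max), expert in zip(self.ranges, self.experts):
            if t_min <= t < t_max:
                return expert(x_t, t, cond)
        raise ValueError(f"no expert covers timestep {t}")

class DummyDenoiser(nn.Module):
    """Stand-in for a real denoising U-Net, just to keep the sketch runnable."""
    def forward(self, x_t, t, cond):
        return x_t  # a real model would predict the noise to remove

# Early, high-noise steps rely heavily on the text conditioning to lay out the
# image; later steps mostly refine visual quality, so each phase gets its own
# specialized expert.
ensemble = ExpertEnsemble([
    (500, 1000, DummyDenoiser()),  # expert for the early, text-driven phase
    (0, 500, DummyDenoiser()),     # expert for the late, detail-refining phase
])

x = torch.randn(1, 3, 64, 64)
out = ensemble(x, t=750, cond=None)  # routed to the high-noise expert
```

Because each expert only ever handles its own part of the denoising process, it can specialize, which, as described above, is the main reason given for the improved quality and faithfulness.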