Panoptic scene graph generation, or PSG, is a new task that aims to generate a more comprehensive graph representation of an image or scene, based on panoptic segmentation rather than bounding boxes. It can be used to understand images and generate sentences describing what's happening. This may be the most challenging task for an AI! Learn more in the video...
References
►Read the full article:
►Yang, J., Ang, Y.Z., Guo, Z., Zhou, K., Zhang, W. and Liu, Z., 2022. Panoptic Scene Graph Generation. arXiv preprint arXiv:2207.11247.
►Code:
►Project page (PSG dataset):
►Try it:
►My Newsletter (a new AI application explained weekly, straight to your inbox!):
Video Transcript
You can use AI to identify what's in an image, like finding out whether there's a cat or not in this scene. If there's one, you can use another AI to find where it is in the image, and you can find it very precisely. These tasks are called image classification, object detection and, finally, instance segmentation. Then you can build cool applications to extract your cat from an image and put it into a fun gift card or a meme.
But what if you want an application that understands the scene in the image, not only being able to identify whether there's an object and where it is, but what's happening? You don't want to identify whether there's a customer in your shop, but you might want to identify whether the customer in question is stealing from you. Whether using such surveillance is ethically correct or not is a whole other question you also need to consider. Still, suppose we focus on finding out what's happening in a scene or a particular image. In that case, you'd want to use a task called scene graph generation, where objects are detected using bounding boxes, as shown previously with object detection, which are then used to create a graph with each object's relationship to each other object. It will basically try to understand what's happening from all the principal objects of the scene. It works quite well and finds the main characteristics of the image, but there's a big problem: it relies on the accuracy of the bounding boxes and completely disregards the background, which is often crucial in understanding what's happening, or at least in giving a more realistic summary.
Instead, you might want to use this new task called panoptic scene graph generation, or PSG. PSG is a new task aiming to generate a more comprehensive graph representation of an image or scene based on panoptic segmentation rather than bounding boxes, something much more precise, taking
into account all pixels of an image, as we saw. And the creators of this task didn't only invent it: they also created a dataset as well as a baseline model to test your results against, which is really cool. This task has a lot of potential, as understanding what is happening in an image is incredibly useful, and complex for machines even though humans do it automatically. It brings some sort of needed intelligence to the machines, making the difference between a cool, funny app like Snapchat and a product you'd use to save time or fulfill a need, like understanding when your cat wants to play and using a robot to play with it automatically so it isn't bored all the time.
Understanding a scene is really cool, but how can a machine do that? Well, you need two things: a dataset and a powerful model. We know that we already have the dataset, since they built it for us. Now the second thing: how to learn from this dataset, which means how to build this AI model, and what should it do? There are multiple ways to approach this problem, and I invite you to read their paper to find out more, but here's one way to do it.
Before getting into it, give me a few seconds to be my own sponsor and talk about our community. Since you are watching this video, I know you will love it, as it was basically created for you. Of course, we have the YouTube community, which you should definitely join by clicking the little subscribe button and commenting below. For instance, I'd love to know what you think about this task and whether it's interesting or not to the AI community. I also wanted to share our Discord community, Learn AI Together. It's a place to connect with fellow AI enthusiasts of any skill level, find people to learn with, find people to work with, ask your questions, or even find interesting job offers. We are organizing a lot of very cool events and Q&As
like the one we are currently running with the MineRL organizers from DeepMind and OpenAI. The link is in the description below, and I'd love to see you join and exchange with us there.
As we said, the model needs to find the class for each pixel of the image, meaning that it has to identify every pixel of the image. The first stage of the model will be responsible for this. It will be a model called Panoptic FPN, already trained to classify each pixel. Such a model is already available online and quite powerful. It will take an image and return what we call a mask, with each pixel matched to an existing object like a ball, human or grass in this case. You now have the segmentation, and you know what's in the image and where. If you are not familiar with how such a model works, I invite you to watch one of the videos I made covering similar approaches, like this one.
The next step is to find out what's happening with those objects. Here, you already know it's a man playing soccer on the field, but the machine actually has no idea. The only thing it knows is that there is a man, a ball and a field, with a lot of confidence, but it doesn't understand anything and cannot connect the dots as we do with ease. We need a second model trained just to take those objects and figure out why they are in the same picture. This is the scene graph generation step, where a model will learn how to match a dictionary of words and concepts covering multiple possible object relations to objects in a scene, using the information extracted from the first stage, learning how to structure the objects with each other object. And voilà, you end up with a clear graph that you can use to build sentences covering what's happening in your image. You can now use this approach in your next application and add a few IQ points to it, getting it closer to something intelligent.
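To make the two stages a bit more concrete, here is a minimal Python sketch of the data that flows through such a pipeline. It is only an illustration under stated assumptions, not the authors' implementation: the tiny mask, the `Segment` and `Relation` types, the predicate strings and the helper functions are all hypothetical, and the relations are hand-written stand-ins for what a trained second-stage model would predict.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    seg_id: int   # id used inside the mask
    label: str    # class name, e.g. "man"


@dataclass(frozen=True)
class Relation:
    subject_id: int
    predicate: str
    object_id: int


# Stage 1 output (hypothetical): a panoptic mask in which EVERY pixel,
# including the background, carries the id of exactly one segment.
MASK = [
    [2, 2, 2, 2],
    [2, 0, 0, 2],
    [2, 0, 0, 2],
    [2, 0, 1, 2],
]
SEGMENTS = [Segment(0, "man"), Segment(1, "ball"), Segment(2, "field")]

# Stage 2 output (hypothetical, hand-written): relations between segments.
RELATIONS = [
    Relation(0, "playing with", 1),
    Relation(0, "standing on", 2),
]


def segment_areas(mask):
    """Count how many pixels belong to each segment id."""
    areas = Counter()
    for row in mask:
        areas.update(row)
    return dict(areas)


def to_triples(segments, relations):
    """Turn id-based relations into (subject, predicate, object) labels."""
    labels = {s.seg_id: s.label for s in segments}
    return [
        (labels[r.subject_id], r.predicate, labels[r.object_id])
        for r in relations
    ]


def to_sentences(triples):
    """Build one simple sentence per edge of the scene graph."""
    return [f"The {s} is {p} the {o}." for s, p, o in triples]


if __name__ == "__main__":
    print(segment_areas(MASK))
    for sentence in to_sentences(to_triples(SEGMENTS, RELATIONS)):
        print(sentence)
```

Note that the field, which a bounding-box approach would typically ignore as background, is an ordinary node in this graph and can take part in relations like any other segment.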
If you'd like to learn more about this new task, I strongly invite you to read the paper linked below. Thank you for watching until the end, and I will see you next week with another amazing paper!