70-Page Report on the COCO Dataset and Object Detection [Part 2]

by Shreya Amin, June 15th, 2022

Too Long; Didn't Read

We (ReasoNets) are building a dataset-first marketplace focusing on the end-to-end machine learning pipeline. Buyers will be able to find and use the datasets and assets they need, and sellers will be able to earn money while building those datasets and assets. Our goal: incentivize efficient and effective data usage, processing, and actions. Sign up for updates! Take a look at the report to quickly find common data resources and/or assets for dataset=COCO, task=object detection. We're open to suggestions, questions, and criticism - let's start a conversation.

We’re building a data-first marketplace, one where data and assets can be shared and traded. The marketplace will contain everything this report contains (and much more, for many more datasets). The report was created to help readers quickly find common resources and/or assets for a given dataset and a specific task, in this case dataset=COCO, task=object detection. I’m open to suggestions, questions, and criticism. Please email me or message me to start a conversation. I have broken up the report into the following blogs:

Part 1: COCO Summary Card. Each link will take you to the longer report where you can learn more. The next three parts each cover a specific section of the report.
Part 2 (this one): About COCO and examples and tutorials (companies / platforms / articles / more), including tools and platforms used to work with COCO (or object detection tasks): FiftyOne, DataTorch, Know Your Data (KYD), OpenCV, OpenVINO, CVAT, Roboflow, SuperAnnotate, OpenMMLab, Coral, Amazon, Facebook, Google, Microsoft, NVIDIA, Weights and Biases, Other (PyImageSearch, Immersive Limit, Tensorflow, Viso.ai)
Part 3: Process - This part is about the tools and platforms that can be used for the different phases of data preparation or data processing involved in vision, object detection, and specifically COCO-related tasks. It will also discuss synthetic data and data quality.
Part 4: Models - This part is about a quick introduction to some pre-trained models and some corresponding readings.

If you have feedback please review this link () and email me at [email protected]. Looking forward to starting a conversation.

About

Who: Microsoft

Year released: The first version of MS COCO dataset was released in 2014.

License: Creative Commons Attribution 4.0 License.

Links

Website:
Github:
Paper:
API: : This package provides Matlab, Python, and Lua APIs that assist in loading, parsing, and visualizing the annotations in COCO. The Matlab and Python APIs are complete; the Lua API provides only basic functionality.

Description

COCO is a large-scale object detection, segmentation, and captioning dataset. COCO has several features:

Object segmentation
Recognition in context
Superpixel stuff segmentation
330K images (>200K labeled)
1.5 million object instances
80 object categories
91 stuff categories
5 captions per image
250,000 people with keypoints

List of the COCO Object Classes: The COCO dataset includes 80 object classes, which are also the classes that models pre-trained on COCO can detect. Click to see the representation of these objects in the dataset.

The first version of MS COCO dataset was released in 2014. It contains 164,000 images split into training (83,000), validation (41,000) and test (41,000) sets. In 2015 an additional test set of 81,000 images was released, including all the previous test images and 40,000 new images. Based on community feedback, in 2017 the training/validation split was changed from 83K/41K to 118K/5K. The new split uses the same images and annotations. The 2017 test set is a subset of 41K images of the 2015 test set. Additionally, the 2017 release contains a new unannotated dataset of 123K images.

Structure and format

The “COCO format” is the following JSON structure, which also includes labels and metadata:

Info: Provides a high-level description and versioning information about your dataset.
Licenses: Provides a list of image licenses with unique IDs to be specified by your images. It specifies the copyright to use the image.
Images: Provides a list of images and relevant metadata.
Categories: Provides a list of classification categories and supercategories of objects that are present in an image, each with a unique ID. (Note if you want to use a model pretrained on COCO out of the box, then you’d need to follow the COCO classes/categories).
Annotations: Provides annotations each with a unique ID and the image ID it relates to. This contains the metadata about the categories related to an object, such as the location, size, and object category.
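To make the five sections above concrete, here is a minimal sketch of a COCO-style annotation file built as a Python dictionary. All values (file names, IDs, boxes) are made up for illustration, and the helper annotations_by_image is a hypothetical name, not part of the official COCO API. It assumes the object detection variant of the format, where each annotation stores a bbox as [x_min, y_min, width, height] in pixels.

```python
import json
from collections import defaultdict

# A tiny hand-written dataset in the COCO layout (all values are illustrative).
coco = {
    "info": {"description": "Toy dataset", "version": "1.0", "year": 2022},
    "licenses": [{"id": 1, "name": "CC BY 4.0"}],
    "images": [
        {"id": 1, "file_name": "000001.jpg", "width": 640, "height": 480, "license": 1},
        {"id": 2, "file_name": "000002.jpg", "width": 320, "height": 240, "license": 1},
    ],
    "categories": [
        {"id": 1, "name": "person", "supercategory": "person"},
        {"id": 18, "name": "dog", "supercategory": "animal"},
    ],
    "annotations": [
        # bbox is [x_min, y_min, width, height] in pixels
        {"id": 101, "image_id": 1, "category_id": 1,
         "bbox": [100.0, 50.0, 200.0, 300.0], "area": 60000.0, "iscrowd": 0},
        {"id": 102, "image_id": 1, "category_id": 18,
         "bbox": [320.0, 200.0, 120.0, 90.0], "area": 10800.0, "iscrowd": 0},
        {"id": 103, "image_id": 2, "category_id": 18,
         "bbox": [10.0, 20.0, 100.0, 80.0], "area": 8000.0, "iscrowd": 0},
    ],
}


def annotations_by_image(dataset):
    """Group (category name, bbox) pairs by the image ID they belong to."""
    names = {cat["id"]: cat["name"] for cat in dataset["categories"]}
    grouped = defaultdict(list)
    for ann in dataset["annotations"]:
        grouped[ann["image_id"]].append((names[ann["category_id"]], ann["bbox"]))
    return dict(grouped)


if __name__ == "__main__":
    # The structure is plain JSON, so it can be written to or read from disk as-is.
    print(json.dumps(annotations_by_image(coco), indent=2))
```

The annotations refer to images and categories by ID only, which is why most tools build a lookup like this before doing anything else.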

Tasks

This dataset is used to set benchmarks for the following tasks: object detection, panoptic semantic segmentation, keypoint detection, dense pose estimation.

Object Detection: Objects are annotated with a bounding box and class label

Panoptic Semantic Segmentation: The boundaries of objects are labeled with a mask, and object classes are labeled with a class label

Keypoint Detection: This task involves simultaneously detecting people and localizing their keypoints.

DensePose: Involves mapping all human pixels of an RGB image to the 3D surface of the human body.

In this document, we will mainly focus on object detection. Please read and for quick tutorials on object detection (more detailed tutorials are available throughout the document.)

Evaluation metrics

Average Precision (AP)

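The following 12 metrics are used for characterizing the performance of an object detector on COCO:

AP averaged over IoU thresholds 0.50:0.05:0.95 (the primary challenge metric)
AP at IoU=0.50
AP at IoU=0.75
AP for small objects (area < 32² pixels)
AP for medium objects (32² < area < 96² pixels)
AP for large objects (area > 96² pixels)
AR given 1 detection per image
AR given 10 detections per image
AR given 100 detections per image
AR for small objects
AR for medium objects
AR for large objects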

Mean Average Precision (MAP) metric

Here is a quick (but very good) for both object detection and COCO.
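The AP metrics above rest on Intersection over Union (IoU) between a predicted box and a ground-truth box. The following is a minimal sketch of that building block, not the official COCO evaluation code. It assumes boxes in the COCO [x_min, y_min, width, height] layout; the function names are hypothetical. COCO's primary AP averages over the ten IoU thresholds 0.50, 0.55, ..., 0.95, so the sketch also shows which of those thresholds a single prediction would satisfy.

```python
def iou_xywh(box_a, box_b):
    """Intersection over Union for two boxes given as [x_min, y_min, width, height]."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap region (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds that COCO's primary AP metric averages over.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred_box, gt_box):
    """Return the IoU thresholds at which the prediction counts as a match."""
    score = iou_xywh(pred_box, gt_box)
    return [t for t in IOU_THRESHOLDS if score >= t]


if __name__ == "__main__":
    ground_truth = [0, 0, 100, 100]
    prediction = [0, 0, 100, 80]  # covers 80% of the ground-truth box
    print(iou_xywh(prediction, ground_truth))
    print(thresholds_passed(prediction, ground_truth))
```

A full evaluation additionally ranks detections by confidence, matches each ground-truth box at most once, and integrates the precision-recall curve per class; the pycocotools package implements that complete procedure.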

Examples and tutorials (companies / platforms / articles / more)

[More to be added later]

FiftyOne

About

COCO dataset can be found here: Datasets can be found here: Models can be found here: :

FiftyOne provides the building blocks for optimizing your dataset analysis pipeline. You can visualize complex labels, evaluate your models, explore scenarios of interest, identify failure modes, find annotation mistakes, and much more. It is tightly integrated with CVAT for annotation and label refinement.

The COCO team has partnered with the open-source tool FiftyOne to make it easier to download, visualize, and evaluate the COCO dataset. It facilitates visualization and access to COCO data resources and serves as an evaluation tool for model analysis on COCO. Here’s the .

The FiftyOne tool has three components: the Python library, the App, and the Brain.

FiftyOne Library: FiftyOne’s core library provides “a structured yet dynamic representation to explore your datasets”. It allows you to efficiently query and manipulate your dataset by adding custom tags, model predictions, and more.
FiftyOne App: The FiftyOne App is a graphical user interface that makes it easy to explore and rapidly gain intuition into your datasets. It allows you to visualize labels like bounding boxes and segmentations overlaid on the samples; sort, query, and slice the dataset into any subset of interest; and more.
FiftyOne Brain: The FiftyOne Brain is “a library of machine learning-powered capabilities that provide insights into your datasets and recommend ways to modify your datasets that will lead to measurably better performance of your models.” This is a closed-source solution.

Tutorials

: This notebook provides a brief walkthrough of FiftyOne, highlighting features that help build datasets and computer vision models.
: This post introduces to visualize and facilitate access to dataset resources and . You can a) download specific subsets of COCO, b) visualize the data and labels, and c) evaluate your models on COCO easily and in a few lines of code. [Detailed breakdown of tutorials in the ]

DataTorch

About

“Easily collaborate on custom computer vision datasets.”

DataTorch has an open source () collaborative data annotation tool where you can plug in any cloud storage, annotate files with your team, and export in COCO and other formats. You can also work online on the . DataTorch is a developer tool for building computer vision models. DataTorch revolves around the management of projects, which encapsulate all of the data, people, and work related to a particular model.

Tutorials

: Get started annotating a dataset and exporting it in COCO format right away.
: “Analyzing visual environments is a major objective of computer vision; it includes detecting what items are there, localizing them in 2D and 3D, identifying their properties, and describing their relationships. As a result, the dataset could be used to train item recognition and classification methods. COCO is frequently used to test the efficiency of real-time object recognition techniques. Modern neural networking modules can understand the COCO dataset’s structure. Contemporary AI-driven alternatives are not quite skillful in creating complete precision in findings that lead to a fact that the COCO dataset is a substantial reference point for CV to train, test, polish, and refine models for faster scaling of the annotation pipeline. The COCO standard specifies how your annotations and picture metadata are saved on disc at a substantial stage. Furthermore, the COCO dataset is an addition to transfer learning, in which the material utilized for one model is utilized to start another.”
contains a tutorial on building a computer vision dataset using DataTorch.

Know Your Data (KYD)

About

COCO dataset can be found here: Datasets can be found here:

KYD allows users to explore a dataset using information that wasn’t originally in it. “The tool annotates the existing data using machine learning models like , , and general (e.g. sharpness and brightness).”

You cannot run Know Your Data on your own data yet. For now, Know Your Data works for image-based datasets supported by the TensorFlow Datasets API. Here are the and links.

Tutorials

: “Know Your Data helps researchers, engineers, product teams, and decision makers understand datasets with the goal of improving data quality, and helping mitigate fairness and bias issues.”
: KYD allows you to explore fairness and bias issues by comparing features. You can see how labels correlate with protected entities.
: “We demonstrate some of the functionality of a dataset exploration tool, Know Your Data (KYD), recently introduced at Google I/O, using the COCO Captions dataset as a case study. Using this tool, we find a range of gender and age biases in COCO Captions — biases that can be traced to both dataset collection and annotation practices. KYD is a dataset analysis tool that complements the growing suite of responsible AI tools being developed across Google and the broader research community. Currently, KYD only supports analysis of a small set of image datasets, but we’re working hard to make the tool accessible beyond this set.”

OpenCV

About

OpenCV (Open Source Computer Vision Library) is a library of programming functions mainly aimed at real-time computer vision. It is a software toolkit for processing real-time image and video, as well as providing analytics, and machine learning capabilities. It was originally created in 2000 by Intel. Github: . According to , “using OpenCV developers can access many advanced computer vision algorithms used for image and video processing in 2D and 3D as part of their programs. The algorithms are otherwise only found in high-end image and video processing software.”

OpenCV provides for working on computer vision problems that are supported on the current popular deep learning frameworks: TensorFlow, Keras, and PyTorch. OpenCV’s trained models can be executed on CPUs or NVIDIA or Intel GPUs. OpenVINO (see below) optimizes running OpenCV capabilities on Intel hardware.

OpenCV has also launched both a) hardware devices called the OpenCV AI Kit (OAK) () and b) . “OAK is a modular, open-source ecosystem composed of MIT-licensed hardware, software, and AI training — that allows you to embed the super-power of spatial AI plus accelerated computer vision functions into your product. OAK provides in a single, cohesive solution what would otherwise require cobbling together disparate hardware and software components.”

The marketplace is called and was built with OAK in mind.

Tutorials

: This series of posts will help you get started with OpenCV — the most popular computer vision library in the world. Also, check out and .
: additional courses
: “In this OpenCV Weekly Webinar, Roboflow CEO Joseph Nelson joins OpenCV CEO Satya Mallick to discuss the fundamentals of deploying computer vision models, including common pitfalls and best practices. That includes deploying to the a web hosted API, to the edge, and even in-browser for live webcam use.”
: “We’re excited to show you some of the new site’s [modelplace.ai] features, and how to build a simple but elegant product using OAK and the OpenCV AI Marketplace.”
Roboflow has created tutorial content on using OAK, including and .

OpenVINO

About

OpenVINO (Open Visual Inference and Neural Network Optimization) is an open-source toolkit for optimizing and deploying AI inference across various Intel-specific hardware devices.

OpenVINO optimizes running OpenCV (see above) capabilities on Intel hardware. Here’s the .

Models can be found here:

Tutorials

: OpenVINO series from

CVAT

About

The CVAT tool is part of the OpenVINO toolkit and was originally designed to accelerate the process of annotating videos and images for use in training computer vision algorithms.

Tutorials

: By Roboflow (see below): “We walkthrough how to use the Computer Vision Annotation Tool (CVAT), a free tool for labeling images open sourced by Intel, as well as labeling best practices. Learn how to creating bounding boxes and prepare your computer vision dataset from scratch.”

Roboflow

About

COCO dataset can be found here: Datasets can be found here: Models can be found here:

“The Roboflow Model Library contains pre-configured model architectures for easily training computer vision models. Just add the link from your Roboflow dataset and you’re ready to go! We even include the code to export to common inference formats like TFLite, ONNX, and CoreML.” Roboflow empowers developers to build their own computer vision applications, no matter their skillset or experience. We provide all of the tools needed to convert raw images into a custom trained computer vision model and deploy it for use in applications. Roboflow supports object detection and classification models. Here’s the .

Tutorials

Check out for lots of interesting videos and tutorials.
: “In this video, we take a deep dive into the Microsoft Common Objects in Context Dataset (COCO). We show a COCO object detector live, COCO benchmark results, COCO example images, COCO class distribution, and more!”
: “Build your own image datasets automatically with Python.”
: “COCO format is not anywhere near universal and so you may find yourself needing to convert it to another format for a model (or export to COCO JSON from another format if you happen to be using a model that supports it). Roboflow is the universal tool for computer vision format conversion and can seamlessly input and output files in COCO JSON format.” The COCO dataset comes in a special format called COCO JSON.
: This video covers each step of the process of building a working computer vision model.
OpenCV related (see above): OpenCV has launched hardware devices called the OpenCV AI Kit (OAK). Roboflow has created tutorial content on using OAK, including and .
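As a small illustration of the kind of format conversion mentioned in the tutorials above, here is a sketch that converts a single COCO bounding box into the normalized center-based layout used by YOLO-style label files, and back. This is not Roboflow's implementation; the function names are hypothetical, and it assumes the COCO box is [x_min, y_min, width, height] in pixels while the target layout is (x_center, y_center, width, height) expressed as fractions of the image size.

```python
def coco_bbox_to_yolo(bbox, image_width, image_height):
    """Convert [x_min, y_min, width, height] in pixels to normalized (cx, cy, w, h)."""
    x_min, y_min, width, height = bbox
    return (
        (x_min + width / 2) / image_width,
        (y_min + height / 2) / image_height,
        width / image_width,
        height / image_height,
    )


def yolo_bbox_to_coco(bbox, image_width, image_height):
    """Convert normalized (cx, cy, w, h) back to [x_min, y_min, width, height] in pixels."""
    cx, cy, w, h = bbox
    return [
        (cx - w / 2) * image_width,
        (cy - h / 2) * image_height,
        w * image_width,
        h * image_height,
    ]


if __name__ == "__main__":
    # A box covering the central quarter of a 640x480 image.
    coco_box = [160, 120, 320, 240]
    yolo_box = coco_bbox_to_yolo(coco_box, 640, 480)
    print(yolo_box)
    print(yolo_bbox_to_coco(yolo_box, 640, 480))
```

A complete converter would also remap category IDs, since COCO category IDs are not contiguous while most training formats expect class indices starting at zero.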

SuperAnnotate

About

COCO dataset can be found here: Datasets can be found here:

SuperAnnotate is an end-to-end platform to annotate, version, and manage ground truth data. Here’s the .

SuperAnnotate has Computer Vision Datasets, which provides an “easily accessible way of exploring public datasets using SuperAnnotate’s data curation platform.” From there you can explore the COCO dataset. The allows access to the platform without a web browser.

Tutorials

: This tutorial covers how to use the Python SDK.
: This tutorial covers the SuperAnnotate Desktop app and Python SDK. “SuperAnnotate platform provides end to end service for automating computer vision projects, starting from data engineering(generating high-quality training data) to model creation(training using neural networks). Allows project management through team creation and share via an API through Python SDK to measure progress. SuperAnnotate works with pixel-accurate annotations.”

OpenMMLab

About

Datasets can be found here:

OpenMMLab is an open-source algorithm platform for computer vision.

Released more than 20 high-quality projects and toolboxes in various research areas such as image classification, object detection, semantic segmentation, and action recognition.
Made public more than 300 algorithms and 2,300 checkpoints.
Github link: (open-source) [see ]

MMDetection is an open source object detection toolbox based on PyTorch. It is a part of the OpenMMLab project. It consists of:

Training recipes for object detection and instance segmentation.
360+ pre-trained models to use for fine-tuning (or training afresh).
Dataset support for popular vision datasets such as COCO, Cityscapes, LVIS and PASCAL VOC.

Major features of the toolbox

You can construct a customized object detection framework by combining different modules.
The toolbox directly supports popular and contemporary detection frameworks, e.g. Faster RCNN, Mask RCNN, RetinaNet, etc.
All basic bbox and mask operations run on GPUs.
The toolbox stems from the codebase developed by the MMDet team, who won the COCO Detection Challenge in 2018.

Tutorials

Google Colab:
: In MMDetection, OpenMMLab recommends converting the data into COCO format and doing the conversion offline. The tutorial shows how you only need to modify the config’s data annotation paths and classes after the conversion of your data.
: “MMDetection is a Python toolbox built as a codebase exclusively for object detection and instance segmentation tasks. It is built in a modular way with PyTorch implementation. There are numerous methods available for object detection and instance segmentation collected from various well-acclaimed models. It enables quick training and inference with quality. On the other hand, the toolbox contains weights for more than 200 pre-trained networks, making the toolbox an instant solution in the object detection domain.”
Since MMDetection is a toolbox containing many pre-built models and each model has its own architecture, this toolbox defines a general architecture that can adapt to any model. This general architecture comprises the following parts: Backbone, Neck, DenseHead (AnchorHead/AnchorFreeHead), RoIExtractor, RoIHead (BBoxHead/MaskHead)
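The point above about only changing annotation paths and classes after converting to COCO format can be sketched as follows. This is an illustration modeled on the MMDetection 2.x config style (plain Python dictionaries with a CocoDataset type, ann_file, img_prefix, and classes); key names may differ in other versions, so check the documentation for the release you use. The helper coco_style_data_config and the directory layout are hypothetical.

```python
def coco_style_data_config(classes, data_root):
    """Build a data config dict in the style of an MMDetection 2.x config file.

    Assumes a hypothetical layout:
      <data_root>/annotations/<split>.json  (COCO-format annotations)
      <data_root>/<split>/                  (images for that split)
    """
    def split_config(split_name):
        return dict(
            type="CocoDataset",            # reuse the built-in COCO dataset loader
            classes=tuple(classes),        # custom class names replace the 80 COCO classes
            ann_file=f"{data_root}/annotations/{split_name}.json",
            img_prefix=f"{data_root}/{split_name}/",
        )

    return dict(
        train=split_config("train"),
        val=split_config("val"),
        test=split_config("test"),
    )


if __name__ == "__main__":
    data = coco_style_data_config(["cat", "dog"], "data/pets")
    for split_name, cfg in data.items():
        print(split_name, cfg)
```

In a real config this dictionary would typically sit alongside a base config that defines the model, and the number of classes in the model's detection head would need to match the length of the classes tuple.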

Coral

About

Models can be found here: .

Coral is a complete toolkit to build products with local AI. “Our on-device inferencing capabilities allow you to build products that are efficient, private, fast and offline.” Coral has trained TensorFlow models for the Edge TPU for image classification, object detection, semantic segmentation, pose estimation, speech recognition.

Tutorials

: “This page provides several that are compiled for the Edge TPU, to run them, plus information about how to with TensorFlow.”
: “this tutorial shows you how to retrain a MobileNet V1 SSD model so that it detects two pets: Abyssinian cats and American Bulldogs (from the Oxford-IIIT Pets Dataset), using TensorFlow r1.15.”

[if I have time, I’ll add more hardware-specific and local / offline object detection or COCO-specific tutorials.]

Amazon

: “COCO is a format for specifying large-scale object detection, segmentation, and captioning datasets. This Python shows you how to transform a COCO object detection format dataset into an Amazon Rekognition Custom Labels . This section also includes information that you can use to write your own code.”
: “In this post, we discuss Detectron2, an object detection and segmentation framework released by Facebook AI Research (FAIR), and its implementation on to solve a dense object detection task for retail. This post includes an associated sample notebook, which you can run to demonstrate all the features discussed in this post. For more information, see the .”
: “In this post, we use to build, train, and deploy an ML model for object detection and use (Amazon A2I) to build and render a custom worker template that allows reviewers to identify or review objects found in an image. You can also use for object detection to identify objects from a predefined set of classes, or use to train your custom model to detect objects and scenes in images that are specific to your business needs, simply by bringing your own data.”
: “For a computer vision project, I need to apply an object detection model on a large set of images. This blog post describes how this can be done in Amazon SageMaker using Batch Transform Jobs with the TensorFlow object detection model API.”

Facebook

How to use detectron2 : “The purpose of this guide is to show how to easily implement a pretrained Detectron2 model, able to recognize objects represented by the classes from the COCO (Common Object in COntext) dataset.”
: Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
: “we are releasing , an important new approach to object detection and panoptic segmentation. DETR completely changes the architecture compared with previous object detection systems. It is the first object detection framework to successfully integrate Transformers as a central building block in the detection pipeline.”
See “DETR” in section.
: “Common Objects in 3D (CO3D) is a dataset designed for learning category-specific 3D reconstruction and new-view synthesis using multi-view images of common object categories. The dataset has been introduced in our . The CO3D dataset contains a total of 1.5 million frames from nearly 19,000 videos capturing objects from 50 MS-COCO categories. As such, it surpasses alternatives in terms of both the number of categories and objects.”

Google

: “This topic describes how to prepare the COCO dataset for models that run on Cloud TPU.”
: “In this tutorial, you train an image object detection model without writing any code. You submit the COCO dataset to AI Platform Training for training, and then you deploy the model on AI Platform Training to get predictions. The resulting model classifies common objects within images of complex everyday scenes.”
: Detect and classify multiple objects including the location of each object within the image. Learn more about object detection with and .

Microsoft

: “In this article, you’ll learn how to export the data labels from an Azure Machine Learning data labeling project and load them into popular formats such as, a pandas dataframe for data exploration.”
: “In this article, you learn how to prepare image data for training computer vision models with .”
: “In this tutorial, you learn how to train an object detection model using Azure Machine Learning automated ML with the Azure Machine Learning CLI extension v2 or the Azure Machine Learning Python SDK v2 (preview). This object detection model identifies whether the image contains objects, such as a can, carton, milk bottle, or water bottle.”
: “The Microsoft cloud includes a collection of services that help you create advanced AI solutions. This course will teach you how to build an object detection solution with Azure Custom Vision.”

NVIDIA

: “ lets you take your own custom dataset and fine-tune it with one of the many popular network architectures to produce a task-specific model…With TAO Toolkit, you can achieve state-of-the-art accuracy using public datasets while maintaining high inference throughput for deployment. This post shows you how to train object detection and image classification models using TAO Toolkit to achieve the same accuracy as in the literature and open-sourced implementations. We trained on public datasets such as, , and as a comparison with published results in the literature or open-source community. This post discusses the complete workflow to reach state-of-the-art accuracy on several popular model architectures.”
The NVIDIA Train, Adapt, and Optimize (TAO) Toolkit:
: “In this post, we show you how we used the TAO Toolkit quantized-aware training and model pruning to accomplish this, and how to replicate the results yourself. We show you how to create an airplane detector, but you should be able to fine-tune the model for various satellite detection scenarios of your own.”
: “This post covers what you need to get up to speed using NVIDIA GPUs to run high performance object detection pipelines quickly and efficiently.”
: “Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.”

Weights and Biases

: “Today we’ll try out a couple of different models by comparing their performance on a custom dataset. One of the most important steps is to visualize the performance metrics on the go to get a good idea of what’s working and what’s not. We’ll use Weights and Biases (WandB) to log and visualize performance metrics.”

Other / Independent

Immersive Limit: : “A detailed walkthrough of the COCO Dataset JSON Format, specifically for object detection (instance segmentations).”
PyImageSearch: : “In this tutorial, you will learn how to perform object detection with pre-trained networks using PyTorch. Utilizing pre-trained object detection networks, you can detect and recognize 90 common objects that your computer vision application will “see” in everyday life.” [And so many more for object detection: ]
Tensorflow: : In this post, we are going to develop an end-to-end solution using TensorFlow to train a custom object-detection model in Python, then put it into production, and run real-time inferences in the browser through TensorFlow.js.
Viso.ai: : “Everything you need to know about the popular Microsoft COCO dataset that is widely used for machine learning Projects. We will cover what you can do with MS COCO and what makes it different from alternatives such as Google’s OID (Open Images Dataset).”