Since the 1980s, human-machine interactions, and human-in-the-loop (HITL) scenarios in particular, have been systematically studied. It was often predicted that as automation increased, less human-machine interaction would be needed over time. Yet human input is still relied upon for most common forms of AI/ML training, and often more human insight is required than ever before.
This brings us to a question:
As AI/ML technology continues to progress, what will the trajectory of human-machine interaction be over time and how might it differ from the status quo?
As AI/ML evolves and the baseline accuracy of models improves, the type of human interaction required will shift from creating generalized ground truth from scratch to reviewing the worst-performing ML predictions, so that models can be improved and fine-tuned iteratively and cost-effectively.
Deep learning algorithms thrive on labeled data and can be improved progressively if more training data is added over time. For example, a common use case is to annotate boundaries of buildings in satellite images of cities to create models that generate accurate street maps for navigation applications.
Incorrect, biased, or subjective labels are prone to generating inconsistencies in the maps. Human review of every element of such ML-generated maps would be painstaking, if not impossible, so the best approach is to analyze the ML predictions programmatically, focus on the self-reported regions of low confidence, prioritize those regions for human review and editing, and then feed the corrected results back in as new training data.
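This prioritization step can be sketched in a few lines. The snippet below is an illustrative sketch only, assuming a classifier that exposes per-prediction probabilities through a scikit-learn-style `predict_proba` method; the `select_for_review` helper, the confidence threshold, and the review budget are hypothetical choices, not part of any specific workflow described above.

```python
import numpy as np

def select_for_review(model, unlabeled_items, confidence_threshold=0.6, budget=100):
    """Return indices of the least confident predictions for human review."""
    # Per-class probabilities for each unlabeled item (scikit-learn-style API).
    probs = model.predict_proba(unlabeled_items)
    # Confidence of the top predicted class for each item.
    confidence = probs.max(axis=1)
    # Sort from least to most confident and keep only items below the threshold.
    ranked = np.argsort(confidence)
    low_confidence = [int(i) for i in ranked if confidence[i] < confidence_threshold]
    # Hand at most `budget` items to human annotators in this iteration.
    return low_confidence[:budget]
```

The items selected this way are corrected by human annotators, appended to the training set, and the model is retrained, closing the loop described above.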
This iterative process still relies on human input, but the work will increasingly require more subject matter expertise and consensus on which answer is considered “most correct.”
A recent report by Cognilytica noted that data preparation tasks such as aggregating, labeling, and cleansing represent over 80% of the time consumed in most AI/ML projects. It is estimated that the market for third-party data labeling solutions was $150M in 2018 and will grow to over $1B by 2023 [1].
Labeling accuracy has become a primary concern: the industry has shifted from simple bounding boxes and speech transcription to pixel-perfect image segmentation and millisecond-level time slices in audio analysis.
In pathology, for example, detecting diseased cells in a tissue slide requires exceptional accuracy, because the diagnosis, and thereby the patient’s plan of care, depends on deriving the correct answer. The stakes are extremely high, so the boundaries of diseased cells need to be labeled as accurately as possible. In the case of autonomous vehicles, identifying objects and activity in millisecond-level time slices is now the norm.
When a car from a neighboring lane moves into the same lane as an autonomous vehicle, the reaction must be immediate while taking other factors into account, such as the location and speed of every other vehicle in the immediate vicinity. Human input on situations that require judgment in the face of potentially disastrous outcomes is no longer theoretical; it has become the next frontier in data annotation.
As ML models approach 100% accuracy, establishing ground truth intrinsically becomes more subjective, requiring ever higher levels of subject matter expertise and labeling precision. Voting mechanisms that distill the collective wisdom of expert-level human annotators are now used routinely.
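As a minimal sketch of such a voting mechanism, the illustrative snippet below aggregates independent labels from several experts with a simple majority rule and flags low-agreement items for escalation; the `consensus_label` helper and the 50% agreement cutoff are assumptions for illustration, not a description of any particular annotation platform.

```python
from collections import Counter

def consensus_label(expert_labels):
    """Aggregate independent expert labels for one item by majority vote."""
    counts = Counter(expert_labels)
    label, votes = counts.most_common(1)[0]
    agreement = votes / len(expert_labels)
    # Items without a clear majority are escalated to a senior reviewer.
    return label, agreement, agreement <= 0.5

# Example: three experts label the same tissue region.
label, agreement, escalate = consensus_label(["diseased", "diseased", "healthy"])
# label == "diseased", agreement ~= 0.67, escalate == False
```

In practice, weighted variants that account for each annotator’s historical accuracy are also common, but the principle of resolving disagreement before a label enters the training set is the same.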
In 2019, a study presented at the Open Data Science Conference (ODSC) compared the performance of full-time data labelers with crowdsourced workers on a simple transcription task. The crowdsourced workers made 10 times more errors than the professional annotators, and a similar trend was observed for tasks such as sentiment analysis and extracting information from unstructured text. The study highlights that hiring a professionally managed workforce is often the optimal overall solution when both the accuracy of results and cost are taken into account.
We anticipate that the “commodity” data labeling currently offered by crowdsourcing and business process outsourcing organizations around the world will soon be displaced by smaller teams of annotation specialists with deep subject matter expertise. By extension, this shift will require more expensive labor, strict quality controls, specialized toolsets, and workflow automation to optimize the process, rather than huge teams of low-cost labor.
It is evident that although many advances have been made since the 1980s, AI/ML is still a rapidly evolving field, and human-machine interaction to support model training will remain a critical input for the foreseeable future.
However, the nature of human-in-the-loop workflows and the expertise of the humans involved will continue to change dramatically as the annotation problems to be solved become increasingly complex and demanding.
[1] Source: Cognilytica