This post is about Machine Learning and data labeling. It takes you on a tour of some of the most interesting techniques, hidden right under our noses, that companies such as Google, LinkedIn or Facebook use to have users label their data. It is a tribute to the immense creativity of the people behind those practices, and a lesson for those who are swamped in data and struggling to make sense of it. It really is wonderful to watch how, through convenience for the user and rigorous statistical study, tech companies find ways to create value for the customer, the user and especially themselves.
How much would you accept to forgo Search or E-mail? (Source: The Economist)
It’s common knowledge that these companies make money with ads, but how exactly do they turn what we users do on their platforms into real money? To grasp this, let’s break a business down to its essentials, so we can identify its key aspects:
A business model in just a couple of lines.
Now, to illustrate, let’s compare the business of a traditional shoemaker with that of Facebook.
Brick & Mortar retail vs. the Attention Economy at a glance.
So there you have it. When it comes to free online services, we users are not the customers; we and our behavior are the product.
Typically we think of data as pictures, videos, chat conversations or tweets, but in reality that barely scratches the surface. The apps installed on our phones and the extensions in our browsers are capable of tracking our every action in real time. The number of seconds spent (presumably looking) at a certain screen on a phone, the number of times the word “awesome” or “mom” is used in a messaging app, entire search histories, comments, likes, shares, hearts, pokes (remember those?), hashtags, the pictures viewed and for how long: all of these are easily extracted from the user’s digital footprint and mashed together. We’re going to call this kind of data Behavioral Data, as opposed to the pictures, videos and conversations, which we’ll call Explicit Data.
Based on this definition we can take a new perspective on the data that tech companies collect about us: Behavioral Data describes how users use the tools available to them on the platform. Therefore, by programmatically altering those tools, service providers can design systematic ways to capture behavioral patterns that reflect how users think and how they act upon certain cues. By doing this, service providers can transform terabytes’ worth of user interaction data into actionable insights and automated content curation engines that increase engagement time and, ultimately, the bottom line. For the service providers it is a no-brainer to double down on A/B testing and on analytics that use Machine Learning to cluster users together based on behavior. In other words, with every click, users willingly tell the tech companies how they think and how their brains can be hacked.
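To make the A/B testing idea a bit more concrete, here is a minimal sketch of how two versions of an interface element could be compared. It assumes we only know how many users saw each version and how many of them clicked, and it uses a standard two-proportion z-test; the function name and all the numbers are made up for illustration, not taken from any real platform.

```python
import math


def two_proportion_z_test(clicks_a, users_a, clicks_b, users_b):
    """Compare the click rates of variants A and B.

    Returns (z, p_value), where p_value is the two-sided p-value
    under the null hypothesis that both variants share one click rate.
    """
    rate_a = clicks_a / users_a
    rate_b = clicks_b / users_b
    # Pooled click rate, assuming there is no real difference.
    pooled = (clicks_a + clicks_b) / (users_a + users_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 5,000 users per variant.
z, p_value = two_proportion_z_test(clicks_a=200, users_a=5000,
                                   clicks_b=260, users_b=5000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the change in the interface, rather than chance, moved user behavior. Real experimentation platforms are of course far more elaborate than this.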
Now that we’ve covered the basics of what type of data we’re talking about, and why it is relevant for companies to label it, let’s dive into the techniques for getting users to label it for you.
Every single traceable interaction between user and interface is a potential source of high quality, labeled data for ML.
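As a rough illustration of that idea, the sketch below turns a log of interactions into labeled training examples. Everything here is hypothetical: the `Interaction` fields, the `to_labeled_examples` helper and the five-second dwell threshold are assumptions chosen for the example, not a description of how any particular company does it.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    user_id: str
    item_id: str
    shown: bool           # the item was displayed to the user
    clicked: bool         # the user clicked on it
    dwell_seconds: float  # time spent on the item after clicking


def to_labeled_examples(events, min_dwell=5.0):
    """Turn impressions into (user_id, item_id, label) triples.

    label is 1 when the user clicked and stayed at least min_dwell
    seconds, and 0 otherwise. Items that were never shown are skipped,
    since the user had no chance to react to them.
    """
    examples = []
    for event in events:
        if not event.shown:
            continue
        engaged = event.clicked and event.dwell_seconds >= min_dwell
        examples.append((event.user_id, event.item_id, 1 if engaged else 0))
    return examples


# Hypothetical interaction log.
log = [
    Interaction("u1", "post_a", shown=True, clicked=True, dwell_seconds=42.0),
    Interaction("u1", "post_b", shown=True, clicked=False, dwell_seconds=0.0),
    Interaction("u2", "post_a", shown=True, clicked=True, dwell_seconds=1.5),
    Interaction("u2", "post_c", shown=False, clicked=False, dwell_seconds=0.0),
]
print(to_labeled_examples(log))
```

The user never filled in a form or pressed a "label" button; the labels fall out of ordinary behavior.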