visit
Running Facebox against lots of video files does require some extra tools and workflow. For example, Machine Box’s parent company Veritone has that sorts those kinds of things for production and scale.That will work well in certain circumstances, but you’re going to see a lot of false negatives (instances where the celebrity wasn’t detected in video) as well. How does one avoid this? Fortunately, the answer is also pretty simple. Use a photo FROM the video you want to recognize the person in. For example, we could try and use this well-lit, professional photo of Keanu Reeves from the internet to try and find him in all of our thousands of hours of Late Night with Jimmy Fallon clips, and we’ll certainly get some positive hits. But if we really want to dial-up the accuracy, we’ll also throw in this gem of a freeze frame of Keanu mid sentence.
Freeze frame from video we want to recognize him in When I ran a video from Jimmy Fallon’s late night through (another nifty tool) which was trained on the single, professional photo of Keanu Reeves above, it found him throughout the video, but a lot of the results had confidence scores around 50–65%. This was the only instance above 65%:
When I ran the same video again, after teaching Facebox a freeze frame from the video itself, we not only increased the confidence (up to 100% in some places), but we also found him in many more places in the video.
This gets even more important when you’re training with hundreds of different celebrities, and running the model against thousands of hours of video. Is it going to be 100% accurate every time? Of course not. But the amount of time you’ll save yourself hunting for clips of Keanu Reeves in your collection will be significantly reduced. By the way, do you know how to spot a deep fake? Look for the absence of blinking. Not having blinking is an artifact of only using training images of people with their eyes open. I’m not super excited to teach you how to make better deep fake videos, but it is an interesting case about training data and accuracy. There’s a great podcast on this subject on .
puts state of the art machine learning capabilities into Docker containers so developers like you can easily incorporate natural language processing, facial detection, object recognition, etc. into your own apps very quickly.
The boxes are built for scale, so when your app really takes off just add more boxes horizontally, to infinity and beyond. Oh, and it’s way cheaper than any of the cloud services (and they might be better)… and your data doesn’t leave your infrastructure.