Facebook announced this week that it is open-sourcing Detectron, the company’s platform for computer vision object detection, built on a deep learning framework. The company says that its motive for opening up the project is to accelerate computer vision research, and that teams within Facebook are using the platform for a variety of applications, including augmented reality.
In my recent article detailing the three biggest challenges facing augmented reality today, I noted that real-time object classification was one of the biggest hurdles:
…it’s a non-trivial problem to get computer-vision to understand ‘cup’ rather than just seeing a shape. This is why for years and years we’ve seen AR demos where people attach fiducial markers to objects in order to facilitate more nuanced tracking and interactions.
Why is it so hard? The first challenge here is classification. Cups come in thousands of shapes, sizes, colors, and textures. Some cups have special properties and are made for special purposes (like beakers), which means they are used for entirely different things in very different places and contexts.
Think about the challenge of writing an algorithm which could help a computer understand all of these concepts, just to be able to know a cup when it sees it. Think about the challenge of writing code to explain to the computer the difference between a cup and a bowl from sight alone.
I also talked about how ‘deep learning’ techniques, which involve ‘training’ a computer to interpret what it sees rather than programming detection by hand, are one potential answer to the problem of real-time object classification. This week Facebook open-sourced its own object detection algorithm, a move which could accelerate development of systems capable of the sort of real-time object classification that could make augmented reality truly useful.
Augmented reality that actually interacts with the world around us without being pre-programmed for specific environments needs at least a cursory understanding of what’s in our immediate vicinity. For example, if you’re wearing AR glasses and want to project the oven temperature above the oven, along with an AR list floating on your refrigerator to show what food you’re almost out of, your glasses need to know what an oven and a refrigerator look like, a tremendously challenging task given the wide range of ovens and refrigerators, and the places in which they reside.
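As a rough illustration of what that requirement looks like in software, here is a minimal Python sketch. It assumes some detector has already returned labeled bounding boxes with confidence scores for a camera frame. The `Detection` class, the `place_overlays` function, the 0.7 confidence threshold, and all of the example values are hypothetical and invented for this illustration; none of them come from Detectron or any real AR toolkit.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Detection:
    """One hypothetical detector guess: a label, a confidence, and a box."""
    label: str
    score: float                    # confidence in [0, 1]
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def place_overlays(
    detections: List[Detection],
    overlay_text: Dict[str, str],
    min_score: float = 0.7,  # assumed threshold, chosen only for illustration
) -> List[Tuple[str, Tuple[int, int]]]:
    """Return (text, anchor_point) pairs for confident, relevant detections.

    The anchor is the midpoint of the box's top edge, so the text can be
    drawn just above the detected object.
    """
    overlays = []
    for det in detections:
        # Skip weak guesses and objects we have nothing to say about.
        if det.score < min_score or det.label not in overlay_text:
            continue
        x_min, y_min, x_max, _ = det.box
        anchor = ((x_min + x_max) // 2, y_min)
        overlays.append((overlay_text[det.label], anchor))
    return overlays


# Made-up detections for a single camera frame.
frame_detections = [
    Detection("oven", 0.93, (100, 300, 400, 600)),
    Detection("refrigerator", 0.88, (500, 50, 800, 700)),
    Detection("cup", 0.41, (420, 280, 460, 330)),  # low confidence, skipped
]

# Made-up information the glasses want to show, keyed by object label.
overlay_text = {
    "oven": "Oven: 350 F",
    "refrigerator": "Running low: milk, eggs",
}

for text, anchor in place_overlays(frame_detections, overlay_text):
    print(f"{text} at {anchor}")
```

The hard part is everything this sketch takes for granted: producing those labels and boxes reliably, for any oven or refrigerator, in any kitchen.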
Facebook’s AI research team, among others, has been working on this problem of object detection by using deep learning to give computers the ability to reach conclusions about what objects are present in a scene. The company’s object detection algorithm, based on the Caffe2 deep learning framework, is called Detectron, and it’s now available on GitHub for anyone to experiment with. Facebook hopes that open-sourcing Detectron will enable computer vision researchers around the world to experiment with and continue to improve the state of the art.
“The goal of Detectron is to provide a high-quality, high-performance codebase for object detection research. It is designed to be flexible in order to support rapid implementation and evaluation of novel research,” the project’s GitHub page reads.
The algorithms examine video input and make guesses about which discrete objects make up the scene. Research projects like Detecting and Recognizing Human-Object Interactions (Gkioxari et al.) have used Detectron as a foundation for understanding human actions performed with objects in an environment, a step toward helping computers understand enough about what we’re doing to offer valuable information on the fly.
Detectron is also used internally by Facebook outside of AI research: “teams use this platform to train custom models for a variety of applications including augmented reality and community integrity,” the company wrote in the announcement of Detectron’s open-sourcing.
Exactly which teams would be using Detectron for augmented reality isn’t clear, but one obvious guess is Oculus, whose chief scientist, Michael Abrash, recently spoke at length about how and when augmented reality will transform our lives.