Google has released to researchers and developers its own mobile device-based hand tracking method using machine learning, something Google researchers call a “new approach to hand perception.”

First unveiled at CVPR 2019 back in June, Google’s on-device, real-time hand tracking method is now available for developers to explore—implemented in MediaPipe, an open source cross-platform framework for developers looking to build processing pipelines to handle perceptual data, like video and audio.

The approach is said to provide high-fidelity hand and finger tracking via machine learning, which can infer 21 3D ‘keypoints’ of a hand from just a single frame.

“Whereas current state-of-the-art approaches rely primarily on powerful desktop environments for inference, our method achieves real-time performance on a mobile phone, and even scales to multiple hands,” the researchers say in a blog post.


Google Research hopes its hand-tracking methods will spark in the community “creative use cases, stimulating new applications and new research avenues.”

The researchers explain that there are three primary systems at play in their hand tracking method: a palm detector model (called BlazePalm), a ‘hand landmark’ model that returns high-fidelity 3D hand keypoints, and a gesture recognizer that classifies keypoint configuration into a discrete set of gestures.
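
The three stages described above can be sketched as a simple per-frame pipeline. This is a minimal illustration rather than MediaPipe's actual API: `track_hands`, `HandResult`, and the three callables passed in are hypothetical names standing in for the palm detector, the hand landmark model, and the gesture recognizer, and the bounding box layout is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]
# Assumed box layout for illustration: (x, y, width, height).
Box = Tuple[float, float, float, float]

NUM_KEYPOINTS = 21  # the article states 21 3D keypoints per hand


@dataclass
class HandResult:
    """Output for one detected hand (hypothetical container)."""
    palm_box: Box
    keypoints: List[Point3D]
    gesture: Optional[str]


def track_hands(
    frame: object,
    detect_palms: Callable[[object], Sequence[Box]],
    predict_landmarks: Callable[[object, Box], List[Point3D]],
    classify_gesture: Callable[[List[Point3D]], Optional[str]],
) -> List[HandResult]:
    """Run the three stages in order on a single frame.

    Stage 1 finds palms, stage 2 regresses keypoints inside each palm
    region, stage 3 maps each keypoint set to a gesture label (or None).
    """
    results: List[HandResult] = []
    for box in detect_palms(frame):
        keypoints = predict_landmarks(frame, box)
        if len(keypoints) != NUM_KEYPOINTS:
            raise ValueError(
                f"expected {NUM_KEYPOINTS} keypoints, got {len(keypoints)}"
            )
        results.append(HandResult(box, keypoints, classify_gesture(keypoints)))
    return results
```

Passing the stages in as callables keeps the sketch independent of any particular model. Because the loop runs once per detected palm, the same structure covers the multiple-hand case the researchers mention.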

Here are a few salient bits, boiled down from the full blog post:

  • BlazePalm achieves an average precision of 95.7% in palm detection, the researchers claim.
  • The model learns a consistent internal hand pose representation and is robust even to partially visible hands and self-occlusions.
  • The existing pipeline supports counting gestures from multiple cultures, e.g. American, European, and Chinese, and various hand signs including “Thumb up”, closed fist, “OK”, “Rock”, and “Spiderman”.
  • Google is open sourcing its hand tracking and gesture recognition pipeline in the MediaPipe framework, accompanied by the relevant end-to-end usage scenario and source code.
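
To make the gesture step concrete, here is a minimal sketch of mapping 21 keypoints to a few of the static gestures listed above. It is not Google's classifier. It assumes a landmark order of wrist at index 0 followed by four points per finger from thumb to pinky, and it swaps in a simple heuristic that treats a finger as extended when its tip lies farther from the wrist than its middle joint. The gesture table and all names are illustrative.

```python
import math
from typing import Dict, FrozenSet, List, Optional, Tuple

Point3D = Tuple[float, float, float]

WRIST = 0
# (middle_joint_index, tip_index) per finger, under the assumed layout:
# 0 = wrist, 1-4 thumb, 5-8 index, 9-12 middle, 13-16 ring, 17-20 pinky.
FINGERS: Dict[str, Tuple[int, int]] = {
    "thumb": (2, 4),
    "index": (6, 8),
    "middle": (10, 12),
    "ring": (14, 16),
    "pinky": (18, 20),
}

# Illustrative lookup from the set of extended fingers to a label.
GESTURES: Dict[FrozenSet[str], str] = {
    frozenset(): "fist",
    frozenset({"thumb"}): "thumb up",
    frozenset({"index", "pinky"}): "rock",
    frozenset({"thumb", "index", "pinky"}): "spiderman",
}


def extended_fingers(keypoints: List[Point3D]) -> FrozenSet[str]:
    """Return the fingers whose tip is farther from the wrist than the joint."""
    if len(keypoints) != 21:
        raise ValueError(f"expected 21 keypoints, got {len(keypoints)}")
    wrist = keypoints[WRIST]
    return frozenset(
        name
        for name, (joint, tip) in FINGERS.items()
        if math.dist(wrist, keypoints[tip]) > math.dist(wrist, keypoints[joint])
    )


def classify_gesture(keypoints: List[Point3D]) -> Optional[str]:
    """Look up the extended-finger set; None means no known gesture."""
    return GESTURES.get(extended_fingers(keypoints))
```

A distance test like this ignores hand orientation, so it cannot tell a thumb pointing up from one pointing sideways. A production recognizer would need richer features, such as joint angles.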

In the future, the researchers say, Google Research plans to continue its hand tracking work with more robust and stable tracking, and also hopes to enlarge the number of gestures it can reliably detect. Moreover, they hope to support dynamic gestures, which could be a boon for machine learning-based sign language translation and fluid hand gesture controls.

Not only that, but having more reliable on-device hand tracking is a necessity for AR headsets moving forward; as long as headsets rely on outward-facing cameras to visualize the world, understanding that world will continue to be a problem for machine learning to address.

Well before the first modern XR products hit the market, Scott recognized the potential of the technology and set out to understand and document its growth. He has been professionally reporting on the space for nearly a decade as Editor at Road to VR, authoring more than 4,000 articles on the topic. Scott brings that seasoned insight to his reporting from major industry events across the globe.
  • starchaser28

    I would love to see a demo of this implemented on the Oculus Quest!

  • I’ve tried it on my phone. I am not super-impressed: the tracking is not reliable with artificial light and cluttered environment. The demo doesn’t work with multiple hands. Anyway, it’s a good job… when it works, the 3D tracking is very well performed.

  • Greyl

    Kinda makes the Index controllers look like a good solution in the here and now, but somewhat redundant as camera finger tracking improves.

    • George Stewart

      But how do you integrate haptics with camera based solutions? It’s exciting for sure, but it isn’t right for every application.

      • Ratm

        Current haptics are a bit trashy to begin with; only good wheels provide decent feedback.
        Gloves using this Google tech are probably the future.
        I wish we had that tracking even without the feedback; it looks good enough to even paint with.

      • Greyl

        I guess they could make a small and cheap Bluetooth PC peripheral you strap to your palms, to send haptic feedback. It could even emit IR light to be used to further improve tracking.

  • Eric Draven

    I can’t see how this is different from the Leap Motion device, leaving the “mobile” aside… I tried using Leap with my rift, and it was really cool, but haptics and movement (without a joystick) were the main issues for me, so I moved on to touch controllers

  • Sponge Bob

    wait a minute…

    How can they track in 3D with just one camera ?

    Impossible by definition

    LeapMotion uses sophisticated system of IR ray projections from 2 angles, not just one

    There is no 3D tracking with just one camera, Period.

    • Lachlan Sleight

      There are loads of ways to infer 3D information from a single camera feed. Multiple cameras are used to detect depth using stereo disparity, which is certainly more robust and accurate than many of the one-camera methods.

      How do you think Rift DK2, ARKit / ARCore, PS Move, Vuforia etc perform their 3D tracking? Those are all single-camera 3D tracking systems. They all achieve their tracking in different ways.

  • Sebastian Hasselrud

    Has anybody tried to implement blockchain NFTs in a VR concept?
    I heard Norwegian company Rasklå has tried this out with consumer loans