WebXR Marker Tracking Module

Draft Community Group Report

This version:
https://immersive-web.github.io/marker-tracking/
Issue Tracking:
GitHub
Editor:
(Google)
Participate:
File an issue (open issues)
Mailing list archive
W3C’s #immersive-web IRC
Unstable API

The API represented in this document is under development and may change at any time.

For additional context on the use of this API please reference the WebXR Marker Tracking Module Explainer.


Abstract

The Marker Tracking module expands the WebXR Device API with functionality to detect 2D images from a specified set and track their poses in the real world.

Status of this document

1. Introduction

This module describes a mechanism for detecting 2D images in the real world and tracking their poses (position and orientation). The application supplies a set of images to be tracked when requesting an XR session. The XR device determines if the images are suitable for tracking, and returns information about their real-world position and orientation as they are detected in the user’s environment.

Since detecting and tracking these images happens locally on the device, this functionality can be implemented without providing camera images to the application.

2. Model

2.1. Tracked Image

A tracked image corresponds to the information used by the XR device to track an image.

A tracked image has an associated integer image index which is the zero-based index of this image in the trackedImages sequence provided in the session request.

A tracked image has an associated image trackable status which corresponds to the XR device’s ability to track this image based on characteristics of the supplied image itself. This is a boolean value: it is false if the image is unsuitable for tracking and is guaranteed never to appear in image tracking results for this XRSession, and true otherwise.

A tracked image has an image was detected flag. This is initially false. It is set to true once the suitable for tracking algorithm returns true for a frame, and then remains true for the rest of the XRSession.

NOTE: A true value returned by the suitable for tracking algorithm indicates that the XR device has found this image during this session, its current view meets suitability requirements, and that the UA is going to provide tracking information for it from this point onward without further suitability checks.

A tracked image has an associated XRImageTrackingState image tracking state. This is "untracked" if the image is not being tracked at all, "tracked" if it is actively being tracked and visible, or "emulated" if its pose is extrapolated from a past tracked state.

NOTE: "tracked" typically means the image was recognized and is currently being actively tracked in 3D space, and is at least partially visible to a tracking camera. (This does not necessarily mean that it’s visible in the user’s viewport in case that differs from the tracking camera field of view.) "emulated" means that the image was recognized and tracked recently, but may currently be out of camera view or obscured, and the reported pose is based on assuming that the object remains at the same position and orientation as when it was last seen. This pose is likely to be adequate for a poster attached to a wall, but may be unhelpful for an image attached to a moving object.

NOTE: The "untracked" value for image tracking state is an internal detail of the algorithms described below. The getImageTrackingResults() API only returns information about images where the state is either "tracked" or "emulated".

A tracked image has an associated XRSpace image space that represents the tracking system’s 6DoF pose of the image in the user’s environment. The image space origin is the center point of the tracked image. The +x axis points toward the right edge of the image and +y toward the top of the image. The +z axis is orthogonal to the picture plane, pointing toward the viewer when the image’s front is in view.

The image space is used for getPose() calls according to the populate the pose algorithm with force emulation set to false. The tracking system MAY provide either actively tracked or statically known (emulated) poses.

NOTE: A getPose() call using the image space of an image with an image tracking state of "untracked" will report a null pose.

NOTE: For tracked images, the returned pose’s emulatedPosition value is always false. That attribute is intended to indicate a pose combining a known orientation with an unknown position, for example on a 3DoF headset, which does not fit this use case where both orientation and position are emulated. Instead, the tracking system treats previously seen but not actively tracked images as statically known, and uses the image tracking state "emulated" to distinguish them from actively tracked images with image tracking state "tracked".

A tracked image has an associated float measured width in meters corresponding to the physical width of the image as measured by the tracking system. It is zero if the device is unable to measure the image’s size.
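
The following non-normative sketch summarizes this per-image state in TypeScript-style notation. The record shape and field names are illustrative only (they are not exposed to applications), and the XRSpace type is assumed to come from WebXR type declarations.

// Illustrative per-image bookkeeping maintained by the UA; not exposed to script.
type ImageTrackingState = "untracked" | "tracked" | "emulated";

interface TrackedImageRecord {
  imageIndex: number;            // zero-based index into the trackedImages sequence
  imageTrackable: boolean;       // false if the image can never produce tracking results
  imageWasDetected: boolean;     // latched to true after the first suitable detection
  trackingState: ImageTrackingState;
  imageSpace?: XRSpace;          // 6DoF pose space with its origin at the image center
  measuredWidthInMeters: number; // zero if the device cannot measure the image's size
}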

3. WebXR Device API Integration

This module expands the definitions of XRSessionInit, XRSession, and XRFrame.

3.1. XRSessionInit

This module introduces the string marker-tracking as a new valid feature descriptor for use in the requiredFeatures or optionalFeatures sequences for immersive sessions.

A device is capable of supporting the marker tracking feature if the XR device is capable of detecting and tracking images in the real world.

NOTE: There is no guarantee that the specific images supplied for a session are trackable. For example, an image where all pixels have the same color would likely be untrackable due to lack of features, and some types of images such as synthetic markers may only be trackable on some implementations. However, a device should not claim to support the feature if it completely lacks tracking capability.

The XRSessionInit dictionary is expanded by adding a new trackedImages member that is used to set up tracked images. It is a sequence of XRTrackedImageInit values.

NOTE: trackedImages is an optional member of XRSessionInit, but the feature will effectively be inactive if it is not supplied. There is no default set of tracked images.

dictionary XRTrackedImageInit {
  required ImageBitmap image;
  required float widthInMeters;
};

partial dictionary XRSessionInit {
  sequence<XRTrackedImageInit> trackedImages;
};

Each trackedImages entry specifies an ImageBitmap and a corresponding widthInMeters value giving the expected physical width, in meters, of the real-world image being tracked. This width may be approximate but is required. If the actual width differs substantially from the provided width, the tracked image result MAY have an inaccurate reported position.

NOTE: When viewed from a fixed camera position, a half-sized image at half the distance looks identical to a full-sized image, and the tracking system can’t differentiate these cases without additional context about the environment. The UA may be able to detect the actual size when tracking the image from multiple angles and update the measured width based on this, but is not required to do so.

NOTE: The UA MAY emit local warnings such as developer console messages if it is unable to support the feature or if the supplied images are unsuitable for tracking.
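
For example (non-normative), a page could request a session with a single tracked image as sketched below. The <img> element id, the "immersive-ar" session mode, and the 0.2 m width are illustrative assumptions, and the casts reflect that this unstable module is not yet covered by standard TypeScript definitions.

// Create an ImageBitmap for the image to be tracked and request a session
// with the marker-tracking feature enabled.
async function startImageTrackingSession(): Promise<XRSession> {
  const markerElement = document.getElementById("marker") as HTMLImageElement;
  const markerBitmap = await createImageBitmap(markerElement);

  const xr = (navigator as any).xr;
  return xr.requestSession("immersive-ar", {
    requiredFeatures: ["marker-tracking"],
    trackedImages: [
      { image: markerBitmap, widthInMeters: 0.2 }, // expected physical width: 20 cm
    ],
  });
}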

In order to set up tracked images for a requestSession() session request, add the following steps to initialize the session for the new XRSession session, with requested image list set to the value of XRSessionInit's trackedImages member:

  1. Set session’s tracked images to an empty list.

  2. If marker-tracking is not contained in session’s set of granted features, abort these steps.

  3. If requested image list is undefined or an empty list, abort these steps.

  4. For each requested image in requested image list:

    1. Set up any platform resources required to track requested image.

    2. Create a new tracked image image:

      1. Set image’s image index to the zero-based index of requested image in requested image list.

      2. Set image’s image trackable to either true or false depending on the platform’s expected ability to track this image.

      3. Set image’s image was detected flag to false.

      4. Set image’s image tracking state to "untracked".

      5. Set image’s measured width in meters to zero.

      6. If image trackable is true, set image’s image space to a device XRSpace associated with this image.

    3. Append image to tracked images.
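
The following non-normative sketch illustrates these initialization steps, reusing the illustrative TrackedImageRecord shape from § 2.1; platformCanTrack and platformCreateImageSpace are hypothetical placeholders for device-specific behavior.

// Hypothetical device hooks, declared only for illustration.
declare function platformCanTrack(image: ImageBitmap): boolean;
declare function platformCreateImageSpace(index: number): XRSpace;

function initializeTrackedImages(
  requestedImageList: { image: ImageBitmap; widthInMeters: number }[] | undefined,
  grantedFeatures: ReadonlySet<string>,
): TrackedImageRecord[] {
  const trackedImages: TrackedImageRecord[] = [];
  if (!grantedFeatures.has("marker-tracking")) return trackedImages;
  if (!requestedImageList || requestedImageList.length === 0) return trackedImages;

  requestedImageList.forEach((requested, index) => {
    const trackable = platformCanTrack(requested.image);
    trackedImages.push({
      imageIndex: index,
      imageTrackable: trackable,
      imageWasDetected: false,
      trackingState: "untracked",
      measuredWidthInMeters: 0,
      imageSpace: trackable ? platformCreateImageSpace(index) : undefined,
    });
  });
  return trackedImages;
}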

3.2. XRSession

Each XRSession has a list of tracked images which is populated from the trackedImages sequence supplied in the session request, as described above.

When a valid XRSession has been established with the marker-tracking feature active, the getImageTrackability() method can be used to obtain image trackability. It returns a promise that resolves with information about the expected ability to use the provided images for tracking.

enum XRImageTrackability {
  "untrackable",
  "trackable",
};

partial interface XRSession {
  Promise<FrozenArray<XRImageTrackability>> getImageTrackability();
};

In order to obtain image trackability for an XRSession session, the user agent MUST run the following steps:

  1. Let promise be a new Promise in the relevant realm of this XRSession.

  2. Run the following steps in parallel:

    1. Let image trackabilities be a new empty list.

    2. For each tracked image image in session’s list of tracked images:

      1. Obtain an XRImageTrackability trackability from the XR device that represents the trackability of image.

      2. Append trackability to image trackabilities.

    3. Queue a task to resolve promise with the value image trackabilities.

  3. Return promise.

The XRImageTrackability enum value "untrackable" means that the image is not usable for tracking, for example due to having insufficient distinctive feature points, and this image MUST NOT appear in tracking results for this session. The value "trackable" means that the image is potentially detectable.

NOTE: Future versions of this API may define additional more granular values with quality estimates for trackable images. Applications should treat a value other than "untrackable" as representing a potentially trackable image.
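
For example (non-normative), an application could warn about unusable images right after session creation; the cast reflects that this unstable API is not included in standard type definitions.

// Warn about any supplied images that can never be tracked in this session.
async function warnAboutUntrackableImages(session: XRSession): Promise<void> {
  // Resolves with one entry per image, in the order given in trackedImages.
  const trackabilities: string[] = await (session as any).getImageTrackability();
  trackabilities.forEach((trackability, index) => {
    if (trackability === "untrackable") {
      console.warn(`Tracked image ${index} is not usable for tracking.`);
    }
  });
}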

3.3. XRFrame

When marker tracking is active, add the update tracked images algorithm to the XRSession's list of frame updates.

In order to update tracked images for an XRFrame frame in an XRSession session, the user agent MUST run the following steps:

  1. For each tracked image image in session’s list of tracked images, using the current device tracking state of image for frame:

    If the XR device has no tracking information for image:

    Set image’s image tracking state to "untracked".

    Otherwise:
    1. If image’s image was detected flag is false, and if the suitable for tracking algorithm for image returns false, continue to the next entry.

    2. Set image’s image was detected flag to true.

    3. Set image’s image tracking state to "tracked" if the image is actively being tracked, or to "emulated" if the position is inferred based on previous observations.

    4. Set image’s image space based on the XR device’s estimate of the image’s pose.

    5. Set image’s measured width in meters based on the XR device’s estimated physical width if available, or set to zero if there is no available estimate.
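
A non-normative sketch of these per-frame updates follows, again using the illustrative TrackedImageRecord shape from § 2.1; DeviceTrackingInfo, deviceQueryTracking, and isSuitableForTracking are hypothetical stand-ins for device tracking data and the suitable for tracking check in § 4.2.1.

// Hypothetical per-frame tracking data reported by the device for one image.
interface DeviceTrackingInfo {
  activelyTracked: boolean;       // true if currently visible to a tracking camera
  space: XRSpace;                 // current pose estimate for the image
  measuredWidthInMeters?: number; // optional physical size estimate
}
declare function deviceQueryTracking(index: number, frame: XRFrame): DeviceTrackingInfo | null;
declare function isSuitableForTracking(image: TrackedImageRecord, frame: XRFrame): boolean;

function updateTrackedImages(trackedImages: TrackedImageRecord[], frame: XRFrame): void {
  for (const image of trackedImages) {
    const info = deviceQueryTracking(image.imageIndex, frame);
    if (!info) {
      image.trackingState = "untracked";
      continue;
    }
    // Initial detection is gated by the suitability check; once detected, the
    // image keeps producing results without further suitability checks.
    if (!image.imageWasDetected && !isSuitableForTracking(image, frame)) continue;
    image.imageWasDetected = true;
    image.trackingState = info.activelyTracked ? "tracked" : "emulated";
    image.imageSpace = info.space;
    image.measuredWidthInMeters = info.measuredWidthInMeters ?? 0;
  }
}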

Applications can use the XRFrame's getImageTrackingResults() method to obtain image tracking results about the current state of tracked images in that frame.

enum XRImageTrackingState {
  "untracked",
  "tracked",
  "emulated",
};

[SecureContext, Exposed=Window]
interface XRImageTrackingResult {
  [SameObject] readonly attribute XRSpace imageSpace;
  readonly attribute unsigned long index;
  readonly attribute XRImageTrackingState trackingState;
  readonly attribute float measuredWidthInMeters;
};

partial interface XRFrame {
  FrozenArray<XRImageTrackingResult> getImageTrackingResults();
};

In order to obtain image tracking results for an XRFrame frame, the user agent MUST run the following steps:

  1. Let session be frame’s session object.

  2. Let results be an empty list.

  3. For each tracked image image in session’s list of tracked images:

    1. If image’s image tracking state is "untracked", continue to the next entry.

    2. Let result be a new XRImageTrackingResult.

    3. Set result’s imageSpace to image’s image space.

    4. Set result’s index to image’s image index.

    5. Set result’s trackingState to image’s image tracking state.

    6. Set result’s measuredWidthInMeters to image’s measured width in meters.

    7. Append result to results.

  4. Return a new FrozenArray containing the elements of results.

NOTE: The image tracking results only contain information about images with an image tracking state of "tracked" or "emulated"; "untracked" images are omitted from the returned array. Applications can use the index value to associate each result with the underlying image.

NOTE: Each tracked image can appear at most once in the tracking results. If multiple copies of the same image exist in the user’s environment, the device can choose an arbitrary instance to report a pose, and this choice can change for future XRFrames.
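
For example (non-normative), an application could consume these results in its frame callback as sketched below, assuming a reference space refSpace obtained earlier and a hypothetical drawMarkerOverlay rendering helper; the cast reflects the unstable API.

declare const refSpace: XRReferenceSpace; // obtained earlier via requestReferenceSpace()
declare function drawMarkerOverlay(index: number, transform: XRRigidTransform, isLive: boolean): void; // hypothetical

function onXRFrame(time: number, frame: XRFrame): void {
  frame.session.requestAnimationFrame(onXRFrame);

  const results = (frame as any).getImageTrackingResults();
  for (const result of results) {
    // Pose of the image's center relative to the chosen reference space.
    const pose = frame.getPose(result.imageSpace, refSpace);
    if (!pose) continue; // pose relative to refSpace not determinable this frame

    const isLive = result.trackingState === "tracked";
    // "tracked" poses are live; "emulated" poses reuse the last-seen pose and
    // may be stale if the marker has moved.
    drawMarkerOverlay(result.index, pose.transform, isLive);
  }
}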

4. Security, Privacy, and Comfort Considerations

4.1. Sensitive Information

In the context of image tracking, sensitive image information includes, but is not limited to, information about the existence and position of specific real-world images in the user’s environment.

NOTE: For example, a hostile application might try to detect large-denomination bank notes or other valuable items, specific book covers of titles that may be banned or restricted in certain jurisdictions, or other information that the user may be unwilling to share.

NOTE: The goal of this API is to provide a reasonable amount of protection against disclosing such information, providing a tradeoff where it provides useful functionality with reduced risk compared to full camera access by the application. If an application were to ask for and receive full camera access, it could scan for all of this sensitive information, and there would be no way for the UA to mitigate the risks.

4.2. Protected functionality

The sensitive image information exposed by the API has the following threat profiles and necessary protections:

4.2.1. Presence of images

In order to check if a tracked image image is suitable for tracking for an XRFrame frame, the user agent MUST run the following steps:

  1. If image is considered unsuitable for tracking due to device or UA limitations, return false.

  2. If image is not currently being actively tracked by the XR device in frame, return false.

  3. If image’s current pose indicates it’s outside the user’s central field of view, return false.

  4. If image’s current pose indicates that the image’s angular size is too small for it to be prominently visible, return false.

  5. Return true.

The goal of this algorithm is that the image MUST fill a substantial fraction of the user’s central field of view (for a head-mounted device) or camera view (for a handheld screen-based device) to be initially detected, and MUST have an angular area large enough to indicate that the user is actively focusing on the image and is aware of it.

The UA MUST NOT initiate tracking for distant or small images, or images that only appear in peripheral vision.

NOTE: The UA’s detailed criteria for initiating tracking are left to the UA’s discretion and depend on the device and tracking system.

NOTE: For example, a smartphone AR system may require that the image fills at least 25% of the camera frame’s area for initial detection.

NOTE: This limitation only applies to initial detection. Once an image has been determined to be suitable for tracking, the UA is free to continue reporting poses for that image even if it is distant or partially occluded.
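
The following non-normative sketch shows one way a UA could implement the angular-size portion of this check (steps 3 and 4); the inputs and the threshold are purely illustrative and are left entirely to the implementation.

// Does the image subtend a large enough horizontal angle to count as
// prominently visible? All values here are illustrative only.
function angularSizeIsSufficient(
  physicalWidthMeters: number,        // physical width of the candidate image
  distanceMeters: number,             // estimated distance from the tracking camera
  cameraHorizontalFovRadians: number, // field of view of the tracking camera
  minimumFovFraction = 0.25,          // example threshold, not mandated by this module
): boolean {
  const angularWidth = 2 * Math.atan(physicalWidthMeters / (2 * distanceMeters));
  return angularWidth / cameraHorizontalFovRadians >= minimumFovFraction;
}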

5. Acknowledgements

The following individuals have contributed to the design of the WebXR Marker Tracking specification:

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Conformant Algorithms

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps can be implemented in any manner, so long as the end result is equivalent. In particular, the algorithms defined in this specification are intended to be easy to understand and are not intended to be performant. Implementers are encouraged to optimize.

References

Normative References

[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[INFRA]
Anne van Kesteren; Domenic Denicola. Infra Standard. Living Standard. URL: https://infra.spec.whatwg.org/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[WebIDL]
Boris Zbarsky. Web IDL. 15 December 2016. ED. URL: https://heycam.github.io/webidl/
[WEBXR]
Brandon Jones; Manish Goregaokar; Nell Waliczek. WebXR Device API. 24 July 2020. WD. URL: https://www.w3.org/TR/webxr/

IDL Index

dictionary XRTrackedImageInit {
  required ImageBitmap image;
  required float widthInMeters;
};

partial dictionary XRSessionInit {
  sequence<XRTrackedImageInit> trackedImages;
};

enum XRImageTrackability {
  "untrackable",
  "trackable",
};

partial interface XRSession {
  Promise<FrozenArray<XRImageTrackability>> getImageTrackability();
};

enum XRImageTrackingState {
  "untracked",
  "tracked",
  "emulated",
};

[SecureContext, Exposed=Window]
interface XRImageTrackingResult {
  [SameObject] readonly attribute XRSpace imageSpace;
  readonly attribute unsigned long index;
  readonly attribute XRImageTrackingState trackingState;
  readonly attribute float measuredWidthInMeters;
};

partial interface XRFrame {
  FrozenArray<XRImageTrackingResult> getImageTrackingResults();
};