WebXR Device API

Editor’s Draft

This version:
https://immersive-web.github.io/webxr/
Issue Tracking:
GitHub
Inline In Spec
Editors:
(Google)
(Microsoft)
Participate:
File an issue (open issues)
Mailing list archive
W3C’s #webvr IRC

Abstract

This specification describes support for accessing virtual reality (VR) and augmented reality (AR) devices, including sensors and head-mounted displays, on the Web.

UNSTABLE API

The version of the WebXR Device API represented in this document is incomplete and may change at any time.

While this specification is under development, some concepts may be better represented by the WebXR Device API Explainer.

1. Introduction

Hardware that enables Virtual Reality (VR) and Augmented Reality (AR) applications requires high-precision, low-latency interfaces to deliver an acceptable experience. Other interfaces, such as the RelativeOrientationSensor and AbsoluteOrientationSensor, can be repurposed to surface input from these devices to polyfill the WebXR Device API. The WebXR Device API provides purpose-built interfaces to VR/AR hardware to allow developers to build compelling, comfortable immersive experiences.

2. Terminology

This document uses the acronym XR throughout to refer to the spectrum of hardware, applications, and techniques used for Virtual Reality, Augmented Reality, and other related technologies. Examples include, but are not limited to:

  - Head-mounted displays, whether they are opaque, transparent, or utilize video passthrough.

  - Mobile devices with positional tracking.

  - Fixed displays with head tracking capabilities.

The important commonality between them is that they offer some degree of spatial tracking with which to simulate a view of virtual content.

Terms like "XR Device", "XR Application", etc. are generally understood to apply to any of the above. Portions of this document that only apply to a subset of these devices will indicate so as appropriate.

The terms 3DoF and 6DoF are used throughout this document to describe the tracking capabilities of XR devices. A 3DoF (three degrees of freedom) device tracks orientation only, while a 6DoF (six degrees of freedom) device tracks both orientation and position.

3. Security, Privacy, and Comfort Considerations

The WebXR Device API provides powerful new features which bring with them several unique privacy, security, and comfort risks that user agents must take steps to mitigate.

3.1. Gaze Tracking

While the API does not yet expose eye tracking capabilities, a lot can be inferred about where the user is looking by tracking the orientation of their head. This is especially true of XR devices that have limited input capabilities, such as Google Cardboard, which frequently require users to control a "gaze cursor" with their head orientation. This means that it may be possible for a malicious page to infer what a user is typing on a virtual keyboard or how they are interacting with a virtual UI based solely on monitoring their head movements. For example: if not prevented from doing so, a page could estimate what URL a user is entering into the user agent’s URL bar.

To prevent this risk the user agent MUST blur all sessions when the user is interacting with sensitive, trusted UI such as URL bars or system dialogs. Additionally, to prevent a malicious page from being able to monitor input on other pages the user agent MUST blur all sessions on non-focused pages.

3.2. Trusted Environment

If the virtual environment does not consistently track the user’s head motion with low latency and at a high frame rate, the user may become disoriented or physically ill. Since it is impossible to force pages to produce consistently performant and correct content, the user agent MUST provide a tracked, trusted environment and an XR Compositor which runs asynchronously from page content. The compositor is responsible for compositing the trusted and untrusted content. If content is not performant, does not submit frames, or terminates unexpectedly, the user agent should be able to continue presenting a responsive, trusted UI.

Additionally, page content has the ability to make users uncomfortable in ways not related to performance. Badly applied tracking, strobing colors, and content intended to offend, frighten, or intimidate are examples of content which may cause the user to want to quickly exit the XR experience. Removing the XR device in these cases may not always be a fast or practical option. To accommodate this the user agent SHOULD provide users with an action, such as pressing a reserved hardware button or performing a gesture, that escapes out of WebXR content and displays the user agent’s trusted UI.

When navigating between pages in XR the user agent should display trusted UI elements informing the user of the security information of the destination site that is normally presented by the 2D UI, such as the URL and encryption status.

3.3. Context Isolation

The trusted UI must be drawn by an independent rendering context whose state is isolated from any rendering contexts used by the page. (For example, any WebGL rendering contexts.) This is to prevent the page from corrupting the state of the trusted UI’s context, which may prevent it from properly rendering a tracked environment. It also prevents the possibility of the page being able to capture imagery from the trusted UI, which could lead to private information being leaked.

Also, to prevent CORS-related vulnerabilities each page will see a new instance of objects returned by the API, such as XRDevice and XRSession. Attributes such as the context set by one page must not be readable by another. Similarly, methods invoked on the API MUST NOT cause an observable state change on other pages. For example: No method will be exposed that enables a system-level orientation reset, as this could be called repeatedly by a malicious page to prevent other pages from tracking properly. The user agent MUST, however, respect system-level orientation resets triggered by a user gesture or system menu.

3.4. Fingerprinting

Given that the API describes hardware available to the user and its capabilities it will inevitably provide additional surface area for fingerprinting. While it’s impossible to completely avoid this, steps can be taken to mitigate the issue. This spec limits reporting of available hardware to only a single device at a time, which prevents using the rare cases of multiple headsets being connected as a fingerprinting signal. Also, the devices that are reported have no string identifiers and expose very little information about the device’s capabilities until an XRSession is created, which may only be triggered via user activation in the most sensitive case.

Discuss use of sensor activity as a possible fingerprinting vector.

4. Device Enumeration

4.1. XR

[SecureContext, Exposed=Window] interface XR : EventTarget {
  // Methods
  Promise<XRDevice?> requestDevice();

  // Events
  attribute EventHandler ondevicechange;
};

[SecureContext]
partial interface Navigator {
  [SameObject] readonly attribute XR xr;
};

The xr object is the entry point to the API, used to query for XRDevices available to the user agent. It has a list of XR devices, which MUST be initially empty, and a default device which MUST be initially null.

The user agent MUST be able to enumerate XR devices attached to the system, at which time each available device is placed in the list of XR devices. Subsequent algorithms requesting enumeration MAY reuse the cached list of XR devices. Enumerating the devices should not initialize device tracking. After the first enumeration the user agent SHOULD begin monitoring device connection and disconnection, adding connected devices to the list of XR devices and removing disconnected devices.

Each time the list of XR devices changes the user agent should select a default XR device by running the following steps:

  1. Let oldDefaultDevice be the default device.

  2. If the list of XR devices is empty, set the default device to null.

  3. If the list of XR devices contains one device, set the default device to that device.

  4. If the list of XR devices contains multiple devices, set the default device to a device of the user agent’s choosing.

  5. If this is not the first time devices have been enumerated and oldDefaultDevice does not equal the default device, queue a task that fires a simple event named devicechange on the XR object.

NOTE: The user agent is allowed to use any criteria it wishes to select a default XR device when the list of XR devices contains multiple devices. For example, the user agent may always select the first item in the list, or provide settings UI that allows users to manage device priority. Ideally the algorithm used to select the default device is stable and will result in the same device being selected across multiple browsing sessions.

The page can request a device by calling the requestDevice() method on the xr object. When invoked it MUST return a new Promise promise and run the following steps in parallel:

  1. Enumerate XR devices.

  2. If the list of XR devices is empty, reject promise with a NotFoundError and abort these steps.

  3. Select a default XR device from the list of XR devices.

  4. Resolve promise with the default device.

Calling requestDevice() MUST NOT trigger device-selection UI as this would cause many sites to display XR-specific dialogs early in the document lifecycle without user activation.

Take permissions into account when calling requestDevice.

The ondevicechange attribute is an Event handler IDL attribute for the devicechange event type.

The following code finds an available XRDevice.
navigator.xr.requestDevice().then(device => {
  // Resolves if an XRDevice is available.
  onXRAvailable(device);
}).catch(error => {
  // An error occurred while requesting an XRDevice or none is available.
  console.error('Unable to retrieve an XR device: ', error);
});
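
The following code re-queries the device when availability changes, reusing the onXRAvailable callback from the previous example. (This is a non-normative sketch.)
navigator.xr.addEventListener('devicechange', () => {
  navigator.xr.requestDevice().then(device => {
    onXRAvailable(device);
  }).catch(() => {
    // No XR devices are available any longer.
    onXRAvailable(null);
  });
});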

4.2. XRDevice

[SecureContext, Exposed=Window] interface XRDevice {
  // Methods
  Promise<void> supportsSession(optional XRSessionCreationOptions options);
  Promise<XRSession> requestSession(optional XRSessionCreationOptions options);
};

An XRDevice represents a physical unit of XR hardware that can present imagery to the user somehow. On desktop devices this may take the form of a headset peripheral; on mobile devices it may represent the device itself in conjunction with a viewer harness. It may also represent devices without the ability to present content in stereo but with advanced (6DoF) tracking capabilities.

Each XRDevice has a supports immersive value, which is a boolean which MUST be set to true if the device can support immersive sessions and false if it cannot.

Each XRDevice has an immersive session, which MUST be initially null, and a list of non-immersive sessions, which MUST be initially empty.

When the supportsSession(options) method is invoked, it MUST return a new Promise promise and run the following steps in parallel:

  1. Let device be the target XRDevice object.

  2. If the options are not supported by the device device, reject promise with null.

  3. Else resolve promise with null.
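
The following non-normative sketch uses supportsSession() to test for immersive support before offering an entry point into XR; enterVRButton is an assumed UI element.
xrDevice.supportsSession({ immersive: true }).then(() => {
  // Immersive sessions are supported, so the entry UI can be enabled.
  enterVRButton.disabled = false;
}).catch(() => {
  // Immersive sessions are not supported by this device.
  enterVRButton.style.display = 'none';
});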

When the requestSession(options) method is invoked, the user agent MUST return a new Promise promise and run the following steps in parallel:

  1. Let device be the target XRDevice object.

  2. If the options are not supported by the device device, reject promise with a NotSupportedError and abort these steps.

  3. Let immersive be the immersive attribute of the options argument.

  4. If immersive is true and device’s immersive session is not null, reject promise with an InvalidStateError and abort these steps.

  5. If immersive is true and the algorithm is not triggered by user activation, reject promise with a SecurityError and abort these steps.

  6. Let session be a new XRSession.

  7. Initialize the session session with the session description given by options.

  8. If immersive is true set the device’s immersive session to session.

  9. Else append session to device’s list of non-immersive sessions.

  10. Resolve promise with session.

The following code attempts to retrieve an immersive XRSession.
let xrSession;

xrDevice.requestSession({ immersive: true }).then((session) => {
  xrSession = session;
});

5. Session

5.1. XRSessionCreationOptions

dictionary XRSessionCreationOptions {
  boolean immersive = false;
  XRPresentationContext outputContext;
};

The XRSessionCreationOptions dictionary provides a session description, indicating the desired properties of a session to be returned from requestSession().

To determine if an XRSessionCreationOptions options is supported by the device device run the following steps:

  1. Let immersive be options.immersive.

  2. If immersive is true and device’s supports immersive boolean is false return false.

  3. If immersive is false and options.outputContext is null, return false.

  4. Return true.

Document restrictions and capabilities of an immersive session

5.2. XRSession

enum XREnvironmentBlendMode {
  "opaque",
  "additive",
  "alpha-blend",
};

[SecureContext, Exposed=Window] interface XRSession : EventTarget {
  // Attributes
  readonly attribute XRDevice device;
  readonly attribute boolean immersive;
  readonly attribute XRPresentationContext outputContext;
  readonly attribute XREnvironmentBlendMode environmentBlendMode;

  attribute double depthNear;
  attribute double depthFar;
  attribute XRLayer baseLayer;

  // Methods
  Promise<XRFrameOfReference> requestFrameOfReference(XRFrameOfReferenceType type, optional XRFrameOfReferenceOptions options);

  FrozenArray<XRInputSource> getInputSources();

  long requestAnimationFrame(XRFrameRequestCallback callback);
  void cancelAnimationFrame(long handle);

  Promise<void> end();

  // Events
  attribute EventHandler onblur;
  attribute EventHandler onfocus;
  attribute EventHandler onresetpose;
  attribute EventHandler onend;
  attribute EventHandler onselect;
  attribute EventHandler onselectstart;
  attribute EventHandler onselectend;
};

Any interaction with XR hardware outside of enumeration is done via an XRSession object, which can only be retrieved by calling requestSession() on an XRDevice. Once a session has been successfully acquired it can be used to poll the device pose, query information about the user’s environment, and present imagery to the user.

The user agent, when possible, SHOULD NOT initialize device tracking or rendering capabilities until an XRSession has been acquired. This is to prevent unwanted side effects of engaging the XR systems when they’re not actively being used, such as increased battery usage or related utility applications launching, when first navigating to a page that only wants to test for the presence of XR hardware in order to advertise XR features. Not all XR platforms offer ways to detect the hardware’s presence without initializing tracking, however, so this is only a strong recommendation.

When an XRSession is created, the user agent MUST initialize the session by running the following steps:

  1. Let session be the newly created XRSession object.

  2. Let device be the XRDevice object that requested session’s creation.

  3. Let options be the XRSessionCreationOptions passed to requestSession().

  4. Initialize session’s device to device.

  5. Initialize session’s immersive to options’ immersive value.

  6. Initialize session’s outputContext to options’ outputContext value.

  7. Initialize session’s depthNear to 0.1.

  8. Initialize session’s depthFar to 1000.0.

  9. Initialize session’s baseLayer to null.

  10. If no other features of the user agent have done so already, perform the necessary platform-specific steps to initialize the device’s tracking and rendering capabilities.

A number of different circumstances may shut down the session, which is permanent and irreversible. Once a session has been shut down the only way to access the XRDevice's tracking or rendering capabilities again is to request a new session. Each XRSession has an ended boolean, initially set to false, that indicates if it has been shut down.

When an XRSession is shut down the following steps are run:

  1. Let session be the target XRSession object.

  2. Let device be session’s device.

  3. Set session’s ended value to true.

  4. If device’s immersive session is equal to session, set device’s immersive session to null.

  5. If device’s list of non-immersive sessions contains session, remove it from the list.

  6. If no other features of the user agent are actively using them, perform the necessary platform-specific steps to shut down the device’s tracking and rendering capabilities.

The end() method provides a way to manually shut down a session. When invoked, it MUST return a new Promise promise and run the following steps in parallel:

  1. Shut down the target XRSession object.

  2. Resolve promise.
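
The following non-normative sketch ends a session in response to user input; exitButton is an assumed UI element.
exitButton.addEventListener('click', () => {
  xrSession.end().then(() => {
    // The session has been shut down and can no longer be used for
    // tracking or rendering.
  });
});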

Each XRSession has an environment blending mode value, which is an enum which MUST be set to whichever of the following values best matches the behavior of imagery rendered by the session in relation to the user’s surrounding environment.

The environmentBlendMode attribute returns the XRSession's environment blending mode.

NOTE: Most Virtual Reality devices exhibit opaque blending behavior. Augmented Reality devices that use transparent optical elements frequently exhibit additive blending behavior, and Augmented Reality devices that use passthrough cameras frequently exhibit alpha-blend blending behavior.

requestFrameOfReference(type, options)

The onblur attribute is an Event handler IDL attribute for the blur event type.

The onfocus attribute is an Event handler IDL attribute for the focus event type.

The onresetpose attribute is an Event handler IDL attribute for the resetpose event type.

The onend attribute is an Event handler IDL attribute for the end event type.

The onselectstart attribute is an Event handler IDL attribute for the selectstart event type.

The onselectend attribute is an Event handler IDL attribute for the selectend event type.

The onselect attribute is an Event handler IDL attribute for the select event type.

Example of acquiring a session here.
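
As a non-normative sketch of session acquisition, the following code requests an immersive session from within a click handler, since immersive sessions may only be requested with user activation; enterVRButton is an assumed UI element.
let xrSession = null;

enterVRButton.addEventListener('click', () => {
  // requestSession() must be called during user activation for
  // immersive sessions.
  xrDevice.requestSession({ immersive: true }).then((session) => {
    xrSession = session;
  });
});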

Document what happens when we end the session

Document effects when we blur the session

Document how to poll the device pose

5.3. Animation Frames

callback XRFrameRequestCallback = void (DOMHighResTimeStamp time, XRFrame frame);

Each XRFrameRequestCallback object has a cancelled boolean initially set to false.

Each XRSession has a list of animation frame callbacks, which is initially empty, and an animation frame callback identifier, which is a number initially set to zero. Each XRSession also has a processing frame boolean, which is initially set to false.

When the requestAnimationFrame(callback) method is invoked, the user agent MUST run the following steps:

  1. Let session be the target XRSession object.

  2. Increment session’s animation frame callback identifier by one.

  3. Append callback to session’s list of animation frame callbacks, associated with session’s animation frame callback identifier’s current value.

  4. Return session’s animation frame callback identifier’s current value.

When the cancelAnimationFrame(handle) method is invoked, the user agent MUST run the following steps:

  1. Let session be the target XRSession object.

  2. Find the entry in session’s list of animation frame callbacks that is associated with the value handle.

  3. If there is such an entry, set its cancelled boolean to true and remove it from session’s list of animation frame callbacks.

When the user agent is to run the animation frame callbacks for an XRSession session with a timestamp now and an XRFrame frame, it MUST run the following steps:

  1. Let callbacks be a list of the entries in session’s list of animation frame callbacks, in the order in which they were added to the list.

  2. Set session’s list of animation frame callbacks to the empty list.

  3. Set session’s processing frame to true.

  4. For each entry in callbacks, in order:

    1. If the entry’s cancelled boolean is true, continue to the next entry.

    2. Invoke the Web IDL callback function, passing now and frame as the arguments.

    3. If an exception is thrown, report the exception.

  5. Set session’s processing frame to false.
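
The following non-normative sketch shows the typical frame loop structure: the callback re-registers itself at the start of each frame, mirroring window.requestAnimationFrame(). A fuller example that polls the device pose is given in §9.2.
function onXRFrame(time, frame) {
  // Request the next frame first so that an exception thrown below
  // cannot silently stall the loop.
  frame.session.requestAnimationFrame(onXRFrame);

  // Per-frame pose queries and rendering go here.
}

xrSession.requestAnimationFrame(onXRFrame);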

5.4. The XR Compositor

This needs to be broken up a bit more and more clearly describe things such as the frame lifecycle.

The user agent MUST maintain an XR Compositor which handles presentation to the XRDevice and frame timing. The compositor MUST use an independent rendering context whose state is isolated from that of any WebGL contexts used as XRWebGLLayer sources to prevent the page from corrupting the compositor state or reading back content from other pages. The compositor MUST also run in a separate thread or process to decouple performance of the page from the ability to present new imagery to the user at the appropriate framerate.

The XR Compositor has a list of layer images, which is initially empty.

6. Frame Loop

6.1. XRFrame

[SecureContext, Exposed=Window] interface XRFrame {
  readonly attribute XRSession session;
  readonly attribute FrozenArray<XRView> views;

  XRDevicePose? getDevicePose(XRCoordinateSystem coordinateSystem);
  XRInputPose? getInputPose(XRInputSource inputSource, XRCoordinateSystem coordinateSystem);
};

An XRFrame provides all the values needed to render a single frame of an XR scene to the XRDevice's display. Applications can only acquire an XRFrame by calling requestAnimationFrame() on an XRSession with an XRFrameRequestCallback. When the callback is called it will be passed an XRFrame.

session

views

getDevicePose(coordinateSystem)

getInputPose(inputSource, coordinateSystem)

7. Coordinate Systems

7.1. XRCoordinateSystem

[SecureContext, Exposed=Window] interface XRCoordinateSystem : EventTarget {
  Float32Array? getTransformTo(XRCoordinateSystem other);
};

getTransformTo(other)

7.2. XRFrameOfReference

enum XRFrameOfReferenceType {
  "head-model",
  "eye-level",
  "stage",
};

dictionary XRFrameOfReferenceOptions {
  boolean disableStageEmulation = false;
  double stageEmulationHeight = 0.0;
};

[SecureContext, Exposed=Window] interface XRFrameOfReference : XRCoordinateSystem {
  readonly attribute XRStageBounds? bounds;
  readonly attribute double emulatedHeight;

  attribute EventHandler onboundschange;
};

When an XRFrameOfReference is created with a "stage" XRFrameOfReferenceType it describes a space known as a "Stage". A stage is a bounded, floor-relative play space that the user can be expected to safely be able to move within. Other XR platforms sometimes refer to this concept as "room scale" or "standing space".

Note: A stage is not intended to describe multi-room spaces, areas with uneven floor levels, or very large open areas. Future iterations of this specification may add more detailed support for tracking in those scenarios.

The origin of the stage MUST be at floor level, such that Y equals 0 at the floor, is negative below the floor level, and is positive above the floor. The origin on the X and Z axes is determined in a platform-specific manner, but in general if the user is in an enclosed space it’s ideal if the X and Z axes originate at or near the center of the room.

The stage bounds are described by an XRFrameOfReference's bounds attribute. If the frame of reference is not a stage or the stage bounds cannot be determined the bounds attribute MUST be null.

Note: When the bounds for a stage are null the user should not be required to physically move around their environment in order to interact with content. The device may still support 6DoF tracking, but it can’t be assumed that the user will know where they are relative to their environment, and as such content that encourages movement beyond leaning is discouraged.

emulatedHeight

The onboundschange attribute is an Event handler IDL attribute for the boundschange event type.
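
The following non-normative sketch requests a stage frame of reference and falls back to an eye-level one, assuming, as in other drafts of this API, that the request rejects when the desired type cannot be provided.
let xrFrameOfRef = null;

xrSession.requestFrameOfReference("stage").catch(() => {
  // No stage is available; fall back to an eye-level frame of reference.
  return xrSession.requestFrameOfReference("eye-level");
}).then((frameOfRef) => {
  xrFrameOfRef = frameOfRef;
});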

7.3. XRStageBounds

[SecureContext, Exposed=Window] interface XRStageBounds {
  readonly attribute FrozenArray<XRStageBoundsPoint> geometry;
};

[SecureContext, Exposed=Window] interface XRStageBoundsPoint {
  readonly attribute double x;
  readonly attribute double z;
};

The XRStageBounds interface describes a border around a stage, known as the stage bounds, within which the user can be expected to safely move.

The polygonal boundary is given by the geometry array of XRStageBoundsPoints, which represents a loop of points at the edges of the safe space. The points MUST be given in a clockwise order as viewed from above, looking towards the negative end of the Y axis. The bounds originate at the floor (Y == 0) and extend infinitely high. The shape it describes MAY not be convex. The values reported are relative to the stage origin, but MAY not contain it.

The x and z attributes of an XRStageBoundsPoint describe the offset from the stage origin along the X and Z axes respectively of the point, given in meters.

Note: Content should not require the user to move beyond the stage bounds; however, if their physical surroundings allow for it, it is possible for the user to ignore the bounds resulting in position values outside of the polygon they describe. This is not an error condition and should be handled gracefully by page content.

Note: Content generally should not provide a visualization of the stage bounds, as it’s the user agent’s responsibility to ensure that safety critical information is provided to the user.

8. Views

8.1. XRView

enum XREye {
  "left",
  "right"
};

[SecureContext, Exposed=Window] interface XRView {
  readonly attribute XREye eye;
  readonly attribute Float32Array projectionMatrix;
};

An XRView describes a single view into an XR scene. Each view corresponds to a display or portion of a display used by an XR device to present imagery to the user. They are used to retrieve all the information necessary to render content that is well aligned to the view's physical output properties, including the field of view, eye offset, and other optical properties.

Many HMDs will request that content render two views, one for the left eye and one for the right, while most magic window devices will only request one view. However, no guarantee is made about the number of views any XR device uses, nor is the number of views required to be constant for the duration of an XRSession. For example: A magic window device may request two views if it is capable of stereo output, but may revert to requesting a single view for performance reasons if the stereo output mode is turned off. Similarly, HMDs may request more than two views to facilitate a wide field of view or displays of different pixel density. Views may cover overlapping regions of the user’s vision and the order is not guaranteed.

The eye attribute describes which eye this view is expected to be shown to. This attribute’s primary purpose is to ensure that prerendered stereo content can present the correct portion of the content to the correct eye. If the view does not have an intrinsically associated eye (the display is monoscopic, for example) this attribute MUST be set to "left".

The projectionMatrix attribute provides a matrix describing the projection to be used for the view’s rendering. It is strongly recommended that applications use this matrix without modification. Failure to use the provided projection matrices when rendering may cause the presented frame to be distorted or badly aligned, resulting in varying degrees of user discomfort.

8.2. XRViewport

[SecureContext, Exposed=Window] interface XRViewport {
  readonly attribute long x;
  readonly attribute long y;
  readonly attribute long width;
  readonly attribute long height;
};

An XRViewport object describes a viewport, or rectangular region, of a graphics surface. The x and y attributes define an offset from the surface origin and the width and height attributes define the rectangular dimensions of the viewport.

The exact interpretation of the viewport values depends on the conventions of the graphics API the viewport is associated with. For example, WebGL viewports follow the convention of gl.viewport(), with x and y giving the lower left corner of the rectangle.

The following code sets the WebGL viewport for an XRWebGLLayer's framebuffer using an XRViewport retrieved from an XRView.
xrSession.requestAnimationFrame((time, xrFrame) => {
  let xrView = xrFrame.views[0];
  let xrViewport = xrWebGLLayer.getViewport(xrView);

  gl.bindFramebuffer(gl.FRAMEBUFFER, xrWebGLLayer.framebuffer);
  gl.viewport(xrViewport.x, xrViewport.y, xrViewport.width, xrViewport.height);

  // WebGL draw calls will now be rendered into the appropriate viewport.
});

9. Pose

9.1. Matrices

WebXR provides various transforms in the form of matrices. WebXR matrices are always 4x4 and given as 16 element Float32Arrays in column major order. They may be passed directly to WebGL’s uniformMatrix4fv function, used to create an equivalent DOMMatrix, or used with a variety of third party math libraries.

Translations specified by WebXR matrices are always given in meters.
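
For example, a view's projectionMatrix may be uploaded to WebGL without modification. In this non-normative sketch projectionMatrixLocation is an assumed uniform location.
// WebXR matrices are 16-element column-major Float32Arrays, the layout
// uniformMatrix4fv expects, so the transpose argument is false.
gl.uniformMatrix4fv(projectionMatrixLocation, false, xrView.projectionMatrix);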

9.2. XRDevicePose

[SecureContext, Exposed=Window] interface XRDevicePose {
  readonly attribute Float32Array poseModelMatrix;

  Float32Array getViewMatrix(XRView view);
};

An XRDevicePose describes the position and orientation of an XRDevice relative to the XRCoordinateSystem it was queried with. It also describes the view and projection matrices that should be used by the application to render a frame of an XR scene.

poseModelMatrix

The getViewMatrix(view) method returns a matrix describing the view transform to be used when rendering the passed XRView. The returned matrix is the inverse of the model matrix of the associated viewpoint.
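
The following non-normative sketch polls the device pose each frame and renders every view; xrFrameOfRef and drawScene are assumed to be provided by the application.
function onXRFrame(time, frame) {
  frame.session.requestAnimationFrame(onXRFrame);

  let pose = frame.getDevicePose(xrFrameOfRef);
  if (pose) {
    for (let view of frame.views) {
      // Render the scene once per view with the provided matrices.
      drawScene(view.projectionMatrix, pose.getViewMatrix(view));
    }
  }
}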

10. Input

10.1. XRInputSource

enum XRHandedness {
  "",
  "left",
  "right"
};

enum XRTargetRayMode {
  "gazing",
  "pointing",
  "tapping"
};

interface XRInputSource {
  readonly attribute XRHandedness handedness;
  readonly attribute XRTargetRayMode targetRayMode;
};

Each XRInputSource SHOULD define a primary action. The primary action is a platform-specific action that, when engaged, produces selectstart, selectend, and select events. Examples of possible primary actions are pressing a trigger, touchpad, or button, speaking a command, or making a hand gesture. If the platform guidelines define a recommended primary input then it should be used as the primary action, otherwise the user agent is free to select one.

10.2. XRInputPose

interface XRInputPose {
  readonly attribute boolean emulatedPosition;
  readonly attribute Float32Array targetRayMatrix;
  readonly attribute Float32Array? gripMatrix;
};
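
The following non-normative sketch responds to a completed primary action by querying where the input source's target ray was pointing; xrFrameOfRef and handleSelection are assumed to be provided by the application.
xrSession.addEventListener("select", (event) => {
  let inputPose = event.frame.getInputPose(event.inputSource, xrFrameOfRef);
  if (inputPose) {
    // The target ray matrix can be used to hit test against the scene.
    handleSelection(inputPose.targetRayMatrix);
  }
});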

11. Layers

11.1. XRLayer

[SecureContext, Exposed=Window] interface XRLayer {};

An XRLayer defines a source of bitmap images and a description of how the image is to be rendered in the XRDevice. Initially only one type of layer, the XRWebGLLayer, is defined but future revisions of the spec may extend the available layer types.

11.2. XRWebGLLayer

Each XRSession MUST identify a native WebGL framebuffer resolution, which is the pixel resolution of a WebGL framebuffer required to match the physical pixel resolution of the XRDevice.

The native WebGL framebuffer resolution is determined by running the following steps:

  1. Let session be the target XRSession.

  2. If session’s immersive value is true, set the native WebGL framebuffer resolution to the resolution required to have a 1:1 ratio between the pixels of a framebuffer large enough to contain all of the session’s XRViews and the physical screen pixels in the area of the display under the highest magnification and abort these steps. If no method exists to determine the native resolution as described, the recommended WebGL framebuffer resolution MAY be used.

  3. If session’s immersive value is false, set the native WebGL framebuffer resolution to the size of the session’s outputContext's canvas in physical display pixels and reevaluate these steps every time the size of the canvas changes.

Additionally, the XRSession MUST identify a recommended WebGL framebuffer resolution, which represents a best estimate of the WebGL framebuffer resolution large enough to contain all of the session’s XRViews that provides an average application a good balance between performance and quality. It MAY be smaller than, larger than, or equal to the native WebGL framebuffer resolution.

NOTE: The user agent is free to use any method of its choosing to estimate the recommended WebGL framebuffer resolution. If there are platform-specific methods for querying a recommended size it is recommended that they be used, but not required.

typedef (WebGLRenderingContext or
         WebGL2RenderingContext) XRWebGLRenderingContext;

dictionary XRWebGLLayerInit {
  boolean antialias = true;
  boolean depth = false;
  boolean stencil = false;
  boolean alpha = true;
  boolean multiview = false;
  double framebufferScaleFactor = 1.0;
};

[SecureContext, Exposed=Window, Constructor(XRSession session,
             XRWebGLRenderingContext context,
             optional XRWebGLLayerInit layerInit)]
interface XRWebGLLayer : XRLayer {
  // Attributes
  readonly attribute XRWebGLRenderingContext context;

  readonly attribute boolean antialias;
  readonly attribute boolean depth;
  readonly attribute boolean stencil;
  readonly attribute boolean alpha;
  readonly attribute boolean multiview;

  readonly attribute WebGLFramebuffer framebuffer;
  readonly attribute unsigned long framebufferWidth;
  readonly attribute unsigned long framebufferHeight;

  // Methods
  XRViewport? getViewport(XRView view);
  void requestViewportScaling(double viewportScaleFactor);

  // Static Methods
  static double getNativeFramebufferScaleFactor(XRSession session);
};

The XRWebGLLayer(session, context, layerInit) constructor MUST perform the following steps when invoked:

  1. Let layer be a new XRWebGLLayer.

  2. If session’s ended value is true, throw an InvalidStateError and abort these steps.

  3. If context is lost, throw an InvalidStateError and abort these steps.

  4. If context’s compatible XR device does not equal session’s device, throw an InvalidStateError and abort these steps.

  5. Initialize layer’s context to context.

  6. Initialize layer’s antialias to layerInit’s antialias value.

  7. Initialize layer’s depth to layerInit’s depth value.

  8. Initialize layer’s stencil to layerInit’s stencil value.

  9. Initialize layer’s alpha to layerInit’s alpha value.

  10. If context supports multiview and layerInit’s multiview value is true, initialize layer’s multiview to true.

  11. Else initialize layer’s multiview to false.

  12. Initialize layer’s framebuffer to a new opaque WebGLFramebuffer created with context.

  13. Initialize the layer’s swap chain.

  14. If layer’s swap chain was unable to be created for any reason, throw an OperationError and abort these steps.

  15. Return layer.

The framebufferWidth and framebufferHeight attributes return the width and height of the swap chain's backbuffer, respectively.

getViewport() queries the XRViewport the given XRView should use when rendering to the layer.

The getViewport(view) method, when invoked, MUST run the following steps:

  1. Let view be the target XRView.

  2. If layer was created with a different XRSession than the one that produced view, return null.

  3. ???

requestViewportScaling(viewportScaleFactor)

The getNativeFramebufferScaleFactor(session) method, when invoked, MUST run the following steps:

  1. Let session be the target XRSession.

  2. If session’s ended value is true, return 0.0 and abort these steps.

  3. Return the value that the session’s recommended WebGL framebuffer resolution must be multiplied by to yield the session’s native WebGL framebuffer resolution.

Document what it means when a context supports multiview.

Document what an opaque framebuffer is.

Document the creation of a swap chain.

Need an example snippet of setting up and using an XRWebGLLayer.
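
As a non-normative sketch, the following code creates an XRWebGLLayer at the device's native resolution and assigns it as the session's base layer; gl is assumed to be a WebGL context compatible with the session's device.
let scaleFactor = XRWebGLLayer.getNativeFramebufferScaleFactor(xrSession);

xrSession.baseLayer = new XRWebGLLayer(xrSession, gl, {
  // Request the native resolution rather than the recommended (default)
  // resolution.
  framebufferScaleFactor: scaleFactor
});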

11.3. WebGL Context Compatibility

partial dictionary WebGLContextAttributes {
    XRDevice compatibleXRDevice = null;
};

partial interface mixin WebGLRenderingContextBase {
    Promise<void> setCompatibleXRDevice(XRDevice device);
};

When a user agent implements this specification it MUST set a compatible XR device, initially set to null, on every WebGLRenderingContextBase.

In order for a WebGL context to be used as a source for XR imagery it must be created on a compatible graphics adapter for the XRDevice. What is considered a compatible graphics adapter is platform dependent, but is understood to mean that the graphics adapter can supply imagery to the XRDevice without undue latency. If a WebGL context was not already created on the compatible graphics adapter, it typically must be re-created on the adapter in question before it can be used with an XRWebGLLayer. Once the compatible XR device is set, the context can be used with layers for any XRSession requested from that XRDevice.

Note: On an XR platform with a single GPU, it can safely be assumed that the GPU is compatible with the XRDevices advertised by the platform, and thus any hardware accelerated WebGL contexts are compatible as well. On PCs with both an integrated and discrete GPU the discrete GPU is often considered the compatible graphics adapter since it is generally a higher performance chip. On desktop PCs with multiple graphics adapters installed, the one with the XRDevice physically connected to it is likely to be considered the compatible graphics adapter.

The compatible XR device can be set either at context creation time or after context creation, potentially incurring a context loss. To set the compatible XR device at context creation time, an XRDevice is supplied as the compatibleXRDevice context creation attribute when requesting a WebGL context.

When the getContext() method is invoked with a WebGLContextAttributes dictionary that contains a non-null compatibleXRDevice, run the following steps:

  1. Let attributes be the WebGLContextAttributes passed to the function.

  2. Let device be attributes’ compatibleXRDevice.

  3. Create the WebGL context as usual, ensuring it is created on a compatible graphics adapter for device.

  4. Let context be the newly created WebGL context.

  5. Set context’s compatible XR device to device.

  6. Return context.

The following code creates a WebGL context that is compatible with an XRDevice and then uses it to create an XRWebGLLayer.
function onXRSessionStarted(xrSession) {
  let glCanvas = document.createElement("canvas");
  let gl = glCanvas.getContext("webgl", { compatibleXRDevice: xrSession.device });

  loadWebGLResources();

  xrSession.baseLayer = new XRWebGLLayer(xrSession, gl);
}

To set the compatible XR device after the context has been created, the setCompatibleXRDevice() method is used.

When the setCompatibleXRDevice(device) method is invoked, the user agent MUST return a new Promise promise and run the following steps in parallel:

  1. Let context be the target WebGLRenderingContextBase object.

  2. If context’s WebGL context lost flag is set, reject promise with an InvalidStateError and abort these steps.

  3. If context’s compatible XR device is device, resolve promise and abort these steps.

  4. If context was created on a compatible graphics adapter for device:

    1. Set context’s compatible XR device to device.

    2. Resolve promise and abort these steps.

  5. Queue a task to perform the following steps:

    1. Force context to be lost and handle the context loss as described by the WebGL specification.

    2. If the canceled flag of the "webglcontextlost" event fired in the previous step was not set, reject promise with an AbortError and abort these steps.

    3. Restore the context on a compatible graphics adapter for device.

    4. Set context’s compatible XR device to device.

    5. Resolve promise.

Additionally, when any WebGL context is lost run the following steps prior to firing the "webglcontextlost" event:

  1. Set the context’s compatible XR device to null.

The following code creates an XRWebGLLayer from a pre-existing WebGL context.
let glCanvas = document.createElement("canvas");
let gl = glCanvas.getContext("webgl");

loadWebGLResources();

glCanvas.addEventListener("webglcontextlost", (event) => {
  // Indicates that the WebGL context can be restored.
  event.canceled = true;
});

glCanvas.addEventListener("webglcontextrestored", (event) => {
  // WebGL resources need to be re-created after a context loss.
  loadWebGLResources();
});

function onXRSessionStarted(xrSession) {
  // Make sure the canvas context we want to use is compatible with the device.
  // May trigger a context loss.
  return gl.setCompatibleXRDevice(xrSession.device).then(() => {
    xrSession.baseLayer = new XRWebGLLayer(xrSession, gl);
  });
}

12. Canvas Rendering Context

12.1. XRPresentationContext

[SecureContext, Exposed=Window] interface XRPresentationContext {
  readonly attribute HTMLCanvasElement canvas;
};

canvas
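
How an XRPresentationContext is obtained is not yet defined by this document. The following non-normative sketch follows the WebXR Device API Explainer, which acquires one from a canvas via a dedicated "xrpresent" context identifier; that identifier is an assumption here, not something this specification defines.
let outputCanvas = document.createElement("canvas");
document.body.appendChild(outputCanvas);

// "xrpresent" is the context identifier used by the explainer; it is an
// assumption of this sketch rather than a requirement of this document.
let xrContext = outputCanvas.getContext("xrpresent");

xrDevice.requestSession({ outputContext: xrContext }).then((session) => {
  // The session's imagery is displayed on outputCanvas.
});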

13. Events

13.1. XRSessionEvent

[SecureContext, Exposed=Window, Constructor(DOMString type, XRSessionEventInit eventInitDict)]
interface XRSessionEvent : Event {
  readonly attribute XRSession session;
};

dictionary XRSessionEventInit : EventInit {
  required XRSession session;
};

session The XRSession associated with this event.

13.2. XRInputSourceEvent

[SecureContext, Exposed=Window, Constructor(DOMString type, XRInputSourceEventInit eventInitDict)]
interface XRInputSourceEvent : Event {
  readonly attribute XRFrame frame;
  readonly attribute XRInputSource inputSource;
};

dictionary XRInputSourceEventInit : EventInit {
  required XRFrame frame;
  required XRInputSource inputSource;
};

frame An XRFrame that corresponds with the time that the event took place. This frame’s views array MUST be empty.

inputSource The XRInputSource that generated this event.

13.3. XRCoordinateSystemEvent

[SecureContext, Exposed=Window, Constructor(DOMString type, XRCoordinateSystemEventInit eventInitDict)]
interface XRCoordinateSystemEvent : Event {
  readonly attribute XRCoordinateSystem coordinateSystem;
};

dictionary XRCoordinateSystemEventInit : EventInit {
  required XRCoordinateSystem coordinateSystem;
};

coordinateSystem The XRCoordinateSystem associated with this event.

13.4. Event Types

The user agent MUST provide the following new events. Registration for and firing of the events must follow the usual behavior of DOM4 Events.

The user agent MAY fire a devicechange event on the XR object to indicate that the availability of XRDevices has changed. The event MUST be of type Event.

A user agent MAY dispatch a blur event on an XRSession to indicate that presentation to the XRSession by the page has been suspended by the user agent, OS, or XR hardware. While an XRSession is blurred it remains active but it may have its frame production throttled. This is to prevent tracking while the user interacts with potentially sensitive UI. For example: The user agent SHOULD blur the presenting application when the user is typing a URL into the browser with a virtual keyboard, otherwise the presenting page may be able to guess the URL the user is entering by tracking their head motions. The event MUST be of type XRSessionEvent.

A user agent MAY dispatch a focus event on an XRSession to indicate that presentation to the XRSession by the page has resumed after being suspended. The event MUST be of type XRSessionEvent.

A user agent MUST dispatch a resetpose event on an XRSession when the system resets the XRDevice's position or orientation. The event MUST be of type XRSessionEvent.

A user agent MUST dispatch an end event on an XRSession when the session ends, either by the application or the user agent. The event MUST be of type XRSessionEvent.

A user agent MUST dispatch a selectstart event on an XRSession when one of its XRInputSources begins its primary action. The event MUST be of type XRInputSourceEvent.

A user agent MUST dispatch a selectend event on an XRSession when one of its XRInputSources ends its primary action or when an XRInputSource that has begun a primary action is disconnected. The event MUST be of type XRInputSourceEvent.

A user agent MUST dispatch a select event on an XRSession when one of its XRInputSources has fully completed a primary action. The event MUST be of type XRInputSourceEvent.

A user agent MUST dispatch a boundschange event on an XRFrameOfReference when the stage bounds change. This includes changes to the geometry point array or the bounds attribute changing to or from null. The event MUST be of type XRCoordinateSystemEvent.

14. Integrations

14.1. Feature Policy

This specification defines a feature that controls whether the xr attribute is exposed on the Navigator object.

The feature name for this feature is "xr".

The default allowlist for this feature is ["self"].

15. Acknowledgements

The following individuals have contributed to the design of the WebXR Device API specification:

And a special thanks to Vladimir Vukicevic (Unity) for kick-starting this whole adventure!

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.


References

Normative References

[DOM]
Anne van Kesteren. DOM Standard. Living Standard. URL: https://dom.spec.whatwg.org/
[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[PROMISES-GUIDE]
Domenic Denicola. Writing Promise-Using Specifications. 16 February 2016. Finding of the W3C TAG. URL: https://www.w3.org/2001/tag/doc/promises-guide
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[WebIDL]
Cameron McCormack; Boris Zbarsky; Tobie Langel. Web IDL. 15 December 2016. ED. URL: https://heycam.github.io/webidl/

Informative References

[ORIENTATION-SENSOR]
Mikhail Pozdnyakov; et al. Orientation Sensor. 20 March 2018. CR. URL: https://www.w3.org/TR/orientation-sensor/

IDL Index

[SecureContext, Exposed=Window] interface XR : EventTarget {
  // Methods
  Promise<XRDevice?> requestDevice();

  // Events
  attribute EventHandler ondevicechange;
};

[SecureContext]
partial interface Navigator {
  [SameObject] readonly attribute XR xr;
};

[SecureContext, Exposed=Window] interface XRDevice {
  // Methods
  Promise<void> supportsSession(optional XRSessionCreationOptions options);
  Promise<XRSession> requestSession(optional XRSessionCreationOptions options);
};

dictionary XRSessionCreationOptions {
  boolean immersive = false;
  XRPresentationContext outputContext;
};

enum XREnvironmentBlendMode {
  "opaque",
  "additive",
  "alpha-blend",
};

[SecureContext, Exposed=Window] interface XRSession : EventTarget {
  // Attributes
  readonly attribute XRDevice device;
  readonly attribute boolean immersive;
  readonly attribute XRPresentationContext outputContext;
  readonly attribute XREnvironmentBlendMode environmentBlendMode;

  attribute double depthNear;
  attribute double depthFar;
  attribute XRLayer baseLayer;

  // Methods
  Promise<XRFrameOfReference> requestFrameOfReference(XRFrameOfReferenceType type, optional XRFrameOfReferenceOptions options);

  FrozenArray<XRInputSource> getInputSources();

  long requestAnimationFrame(XRFrameRequestCallback callback);
  void cancelAnimationFrame(long handle);

  Promise<void> end();

  // Events
  attribute EventHandler onblur;
  attribute EventHandler onfocus;
  attribute EventHandler onresetpose;
  attribute EventHandler onend;
  attribute EventHandler onselect;
  attribute EventHandler onselectstart;
  attribute EventHandler onselectend;
};

callback XRFrameRequestCallback = void (DOMHighResTimeStamp time, XRFrame frame);

[SecureContext, Exposed=Window] interface XRFrame {
  readonly attribute XRSession session;
  readonly attribute FrozenArray<XRView> views;

  XRDevicePose? getDevicePose(XRCoordinateSystem coordinateSystem);
  XRInputPose? getInputPose(XRInputSource inputSource, XRCoordinateSystem coordinateSystem);
};

[SecureContext, Exposed=Window] interface XRCoordinateSystem : EventTarget {
  Float32Array? getTransformTo(XRCoordinateSystem other);
};

enum XRFrameOfReferenceType {
  "head-model",
  "eye-level",
  "stage",
};

dictionary XRFrameOfReferenceOptions {
  boolean disableStageEmulation = false;
  double stageEmulationHeight = 0.0;
};

[SecureContext, Exposed=Window] interface XRFrameOfReference : XRCoordinateSystem {
  readonly attribute XRStageBounds? bounds;
  readonly attribute double emulatedHeight;

  attribute EventHandler onboundschange;
};

[SecureContext, Exposed=Window] interface XRStageBounds {
  readonly attribute FrozenArray<XRStageBoundsPoint> geometry;
};

[SecureContext, Exposed=Window] interface XRStageBoundsPoint {
  readonly attribute double x;
  readonly attribute double z;
};

enum XREye {
  "left",
  "right"
};

[SecureContext, Exposed=Window] interface XRView {
  readonly attribute XREye eye;
  readonly attribute Float32Array projectionMatrix;
};

[SecureContext, Exposed=Window] interface XRViewport {
  readonly attribute long x;
  readonly attribute long y;
  readonly attribute long width;
  readonly attribute long height;
};

[SecureContext, Exposed=Window] interface XRDevicePose {
  readonly attribute Float32Array poseModelMatrix;

  Float32Array getViewMatrix(XRView view);
};

enum XRHandedness {
  "",
  "left",
  "right"
};

enum XRTargetRayMode {
  "gazing",
  "pointing",
  "tapping"
};

interface XRInputSource {
  readonly attribute XRHandedness handedness;
  readonly attribute XRTargetRayMode targetRayMode;
};

interface XRInputPose {
  readonly attribute boolean emulatedPosition;
  readonly attribute Float32Array targetRayMatrix;
  readonly attribute Float32Array? gripMatrix;
};

[SecureContext, Exposed=Window] interface XRLayer {};

typedef (WebGLRenderingContext or
         WebGL2RenderingContext) XRWebGLRenderingContext;

dictionary XRWebGLLayerInit {
  boolean antialias = true;
  boolean depth = false;
  boolean stencil = false;
  boolean alpha = true;
  boolean multiview = false;
  double framebufferScaleFactor = 1.0;
};

[SecureContext, Exposed=Window, Constructor(XRSession session,
             XRWebGLRenderingContext context,
             optional XRWebGLLayerInit layerInit)]
interface XRWebGLLayer : XRLayer {
  // Attributes
  readonly attribute XRWebGLRenderingContext context;

  readonly attribute boolean antialias;
  readonly attribute boolean depth;
  readonly attribute boolean stencil;
  readonly attribute boolean alpha;
  readonly attribute boolean multiview;

  readonly attribute WebGLFramebuffer framebuffer;
  readonly attribute unsigned long framebufferWidth;
  readonly attribute unsigned long framebufferHeight;

  // Methods
  XRViewport? getViewport(XRView view);
  void requestViewportScaling(double viewportScaleFactor);

  // Static Methods
  static double getNativeFramebufferScaleFactor(XRSession session);
};

partial dictionary WebGLContextAttributes {
    XRDevice compatibleXRDevice = null;
};

partial interface mixin WebGLRenderingContextBase {
    Promise<void> setCompatibleXRDevice(XRDevice device);
};

[SecureContext, Exposed=Window] interface XRPresentationContext {
  readonly attribute HTMLCanvasElement canvas;
};

[SecureContext, Exposed=Window, Constructor(DOMString type, XRSessionEventInit eventInitDict)]
interface XRSessionEvent : Event {
  readonly attribute XRSession session;
};

dictionary XRSessionEventInit : EventInit {
  required XRSession session;
};

[SecureContext, Exposed=Window, Constructor(DOMString type, XRInputSourceEventInit eventInitDict)]
interface XRInputSourceEvent : Event {
  readonly attribute XRFrame frame;
  readonly attribute XRInputSource inputSource;
};

dictionary XRInputSourceEventInit : EventInit {
  required XRFrame frame;
  required XRInputSource inputSource;
};

[SecureContext, Exposed=Window, Constructor(DOMString type, XRCoordinateSystemEventInit eventInitDict)]
interface XRCoordinateSystemEvent : Event {
  readonly attribute XRCoordinateSystem coordinateSystem;
};

dictionary XRCoordinateSystemEventInit : EventInit {
  required XRCoordinateSystem coordinateSystem;
};

Issues Index

Discuss use of sensor activity as a possible fingerprinting vector.
Take permissions into account when calling requestDevice.
Document restrictions and capabilities of an immersive session
Example of acquiring a session here.
Document what happens when we end the session
Document effects when we blur the session
Document how to poll the device pose
This needs to be broken up a bit more and more clearly describe things such as the frame lifecycle.
Document what it means when a context supports multiview.
Document what an opaque framebuffer is.
Document the creation of a swap chain.
Need an example snippet of setting up and using an XRWebGLLayer.