UNSTABLE API
The version of the WebXR Device API represented in this document is incomplete and may change at any time.
While this specification is under development some concepts may be represented better by the WebXR Device API Explainer.
1. Introduction
Hardware that enables Virtual Reality (VR) and Augmented Reality (AR) applications requires high-precision, low-latency interfaces to deliver an acceptable experience. Other interfaces, such as the RelativeOrientationSensor and AbsoluteOrientationSensor, can be repurposed to surface input from these devices to polyfill the WebXR Device API. The WebXR Device API provides purpose-built interfaces to VR/AR hardware to allow developers to build compelling, comfortable immersive experiences.
2. Terminology
This document uses the acronym XR throughout to refer to the spectrum of hardware, applications, and techniques used for Virtual Reality, Augmented Reality, and other related technologies. Examples include, but are not limited to:
-
Head mounted displays, whether they are opaque, transparent, or utilize video passthrough
-
Mobile devices with positional tracking
-
Fixed displays with head tracking capabilities
The important commonality between them is that they offer some degree of spatial tracking with which to simulate a view of virtual content.
Terms like "XR Device", "XR Application", etc. are generally understood to apply to any of the above. Portions of this document that only apply to a subset of these devices will indicate so as appropriate.
The terms 3DoF and 6DoF are used throughout this document to describe the tracking capabilities of XR devices.
-
A 3DoF device, short for "Three Degrees of Freedom", is one that can only track rotational movement. This is common in devices which rely exclusively on accelerometer and gyroscope readings to provide tracking. 3DoF devices do not respond to translational movements from the user, though they may employ algorithms to estimate translational changes based on modeling of the neck or arms.
-
A 6DoF device, short for "Six Degrees of Freedom", is one that can track both rotation and translation, enabling precise 1:1 tracking in space. This typically requires some level of understanding of the user’s environment. That environmental understanding may be achieved via inside-out tracking, where sensors on the tracked device itself (such as cameras or depth sensors) are used to determine the device’s position, or outside-in tracking, where external devices placed in the user’s environment (like a camera or light-emitting device) provide a stable point of reference against which the XR device can determine its position.
3. Security, Privacy, and Comfort Considerations
The WebXR Device API provides powerful new features which bring with them several unique privacy, security, and comfort risks that user agents must take steps to mitigate.
3.1. Gaze Tracking
While the API does not yet expose eye tracking capabilities, a lot can be inferred about where the user is looking by tracking the orientation of their head. This is especially true of XR devices that have limited input capabilities, such as Google Cardboard, which frequently require users to control a "gaze cursor" with their head orientation. This means that it may be possible for a malicious page to infer what a user is typing on a virtual keyboard or how they are interacting with a virtual UI based solely on monitoring their head movements. For example: if not prevented from doing so a page could estimate what URL a user is entering into the user agent’s URL bar.
To prevent this risk the user agent MUST blur all sessions when the user is interacting with sensitive, trusted UI such as URL bars or system dialogs. Additionally, to prevent a malicious page from being able to monitor input on other pages the user agent MUST blur all sessions on non-focused pages.
3.2. Trusted Environment
If the virtual environment does not consistently track the user’s head motion with low latency and at a high frame rate the user may become disoriented or physically ill. Since it is impossible to force pages to produce consistently performant and correct content the user agent MUST provide a tracked, trusted environment and an XR Compositor which runs asynchronously from page content. The compositor is responsible for compositing the trusted and untrusted content. If content is not performant, does not submit frames, or terminates unexpectedly the user agent should be able to continue presenting a responsive, trusted UI.
Additionally, page content has the ability to make users uncomfortable in ways not related to performance. Badly applied tracking, strobing colors, and content intended to offend, frighten, or intimidate are examples of content which may cause the user to want to quickly exit the XR experience. Removing the XR device in these cases may not always be a fast or practical option. To accommodate this the user agent SHOULD provide users with an action, such as pressing a reserved hardware button or performing a gesture, that escapes out of WebXR content and displays the user agent’s trusted UI.
When navigating between pages in XR the user agent should display trusted UI elements informing the user of the security information of the site they are navigating to which is normally presented by the 2D UI, such as the URL and encryption status.
3.3. Context Isolation
The trusted UI must be drawn by an independent rendering context whose state is isolated from any rendering contexts used by the page. (For example, any WebGL rendering contexts.) This is to prevent the page from corrupting the state of the trusted UI’s context, which may prevent it from properly rendering a tracked environment. It also prevents the possibility of the page being able to capture imagery from the trusted UI, which could lead to private information being leaked.
Also, to prevent CORS-related vulnerabilities each page will see a new instance of objects returned by the API, such as XRDevice and XRSession. Attributes such as the context set by one page must not be able to be read by another. Similarly, methods invoked on the API MUST NOT cause an observable state change on other pages. For example: No method will be exposed that enables a system-level orientation reset, as this could be called repeatedly by a malicious page to prevent other pages from tracking properly. The user agent MUST, however, respect system-level orientation resets triggered by a user gesture or system menu.
3.4. Fingerprinting
Given that the API describes hardware available to the user and its capabilities it will inevitably provide additional surface area for fingerprinting. While it’s impossible to completely avoid this, steps can be taken to mitigate the issue. This spec limits reporting of available hardware to only a single device at a time, which prevents using the rare cases of multiple headsets being connected as a fingerprinting signal. Also, the devices that are reported have no string identifiers and expose very little information about the device’s capabilities until an XRSession is created, which may only be triggered via user activation in the most sensitive case.
Discuss use of sensor activity as a possible fingerprinting vector.
4. Device Enumeration
4.1. XR
[SecureContext, Exposed=Window]
interface XR : EventTarget {
  // Methods
  Promise<XRDevice?> requestDevice();

  // Events
  attribute EventHandler ondevicechange;
};

[SecureContext]
partial interface Navigator {
  [SameObject] readonly attribute XR xr;
};
The xr object is the entry point to the API, used to query for XRDevices available to the user agent. It has a list of XR devices, which MUST be initially empty, and a default device which MUST be initially null.
The user agent MUST be able to enumerate XR devices attached to the system, at which time each available device is placed in the list of XR devices. Subsequent algorithms requesting enumeration MAY reuse the cached list of XR devices. Enumerating the devices should not initialize device tracking. After the first enumeration the user agent SHOULD begin monitoring device connection and disconnection, adding connected devices to the list of XR devices and removing disconnected devices.
Each time the list of XR devices changes the user agent should select a default XR device by running the following steps:
-
Let oldDefaultDevice be the default device.
-
If the list of XR devices is empty, set the default device to null.
-
If the list of XR devices contains one device set the default device to that device.
-
If the list of XR devices contains multiple devices set the default device to a device of the user agent’s choosing.
-
If this is not the first time devices have been enumerated and oldDefaultDevice does not equal the default device queue a task that fires a simple event named devicechange on the XR object.
NOTE: The user agent is allowed to use any criteria it wishes to select a default XR device when the list of XR devices contains multiple devices. For example, the user agent may always select the first item in the list, or provide settings UI that allows users to manage device priority. Ideally the algorithm used to select the default device is stable and will result in the same device being selected across multiple browsing sessions.
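The default device selection steps above can be sketched as a small in-memory model. This is only an illustration of the bookkeeping, not how any user agent implements it: the XRRegistry class, the pickPreferred hook, and the events array (which stands in for queued devicechange tasks) are hypothetical names that do not appear in the API.

```javascript
// Hypothetical model of the "select a default XR device" steps.
// XRRegistry, pickPreferred, and events are illustrative names only.
class XRRegistry {
  constructor(pickPreferred = (devices) => devices[0]) {
    this.devices = [];           // list of XR devices, initially empty
    this.defaultDevice = null;   // default device, initially null
    this.enumeratedOnce = false; // becomes true after the first enumeration
    this.pickPreferred = pickPreferred;
    this.events = [];            // stands in for queued "devicechange" tasks
  }

  // Call whenever the list of XR devices changes.
  updateDevices(devices) {
    this.devices = [...devices];
    const oldDefaultDevice = this.defaultDevice;

    if (this.devices.length === 0) {
      this.defaultDevice = null;
    } else if (this.devices.length === 1) {
      this.defaultDevice = this.devices[0];
    } else {
      // Multiple devices: the choice is left to the user agent.
      this.defaultDevice = this.pickPreferred(this.devices);
    }

    // Notify only after the first enumeration, and only on a real change.
    if (this.enumeratedOnce && oldDefaultDevice !== this.defaultDevice) {
      this.events.push('devicechange');
    }
    this.enumeratedOnce = true;
  }
}
```

Picking the first list entry by default is one way to keep the selection stable across updates, as the note suggests; a settings-driven pickPreferred would be another.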
The page can request a device by calling the requestDevice() method on the xr object. When invoked it MUST return a new Promise promise and run the following steps in parallel:
-
If the list of XR devices is empty, reject promise with a NotFoundError and abort these steps.
-
Select a default XR device from the list of XR devices.
-
Resolve promise with the default device.
Calling requestDevice() MUST NOT trigger device-selection UI as this would cause many sites to display XR-specific dialogs early in the document lifecycle without user activation.
Take permissions into account when calling requestDevice.
The ondevicechange attribute is an Event handler IDL attribute for the devicechange event type.
The following code requests an XRDevice.
navigator.xr.requestDevice().then(device => {
  // Resolves if an XRDevice is available.
  onXRAvailable(device);
}).catch(error => {
  // An error occurred while requesting an XRDevice or none is available.
  console.error('Unable to retrieve an XR device: ', error);
});
4.2. XRDevice
[SecureContext, Exposed=Window]
interface XRDevice {
  // Methods
  Promise<void> supportsSession(optional XRSessionCreationOptions options);
  Promise<XRSession> requestSession(optional XRSessionCreationOptions options);
};
An XRDevice represents a physical unit of XR hardware that can present imagery to the user somehow. On desktop devices this may take the form of a headset peripheral; on mobile devices it may represent the device itself in conjunction with a viewer harness. It may also represent devices without the ability to present content in stereo but with advanced (6DoF) tracking capabilities.
Each XRDevice has a supports immersive value, which is a boolean which MUST be set to true if the device can support immersive sessions and false if it cannot.
Each XRDevice has an active immersive session, which MUST be initially null, and a list of non-immersive sessions, which MUST be initially empty.
When the supportsSession(options) method is invoked, it MUST return a new Promise promise and run the following steps in parallel:
-
Let device be the target XRDevice object.
-
If the options are not supported by the device device, reject promise with null.
-
Else resolve promise with null.
When the requestSession(options) method is invoked, the user agent MUST return a new Promise promise and run the following steps in parallel:
-
Let device be the target XRDevice object.
-
If the options are not supported by the device device, reject promise with a NotSupportedError and abort these steps.
-
Let immersive be the immersive attribute of the options argument.
-
If immersive is true and device’s active immersive session is not null, reject promise with an InvalidStateError and abort these steps.
-
If immersive is true and the algorithm is not triggered by user activation, reject promise with a SecurityError and abort these steps.
-
Let session be a new XRSession.
-
Initialize the session session with the session description given by options.
-
If immersive is true set the device’s active immersive session to session.
-
Else append session to device’s list of non-immersive sessions.
-
Resolve promise with session.
The following code requests an immersive XRSession.
let xrSession;

xrDevice.requestSession({ immersive: true }).then((session) => {
  xrSession = session;
});
5. Session
5.1. XRSessionCreationOptions
dictionary XRSessionCreationOptions {
  boolean immersive = false;
  XRPresentationContext outputContext;
};
The XRSessionCreationOptions dictionary provides a session description, indicating the desired properties of a session to be returned from requestSession().
To determine if an XRSessionCreationOptions options is supported by the device device run the following steps:
-
Let immersive be options.immersive.
-
If immersive is true and device’s supports immersive boolean is false return false.
-
If immersive is false and options.outputContext is null, return false.
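As a rough illustration, the check above can be written as a plain function. The device argument here is an ordinary object with a hypothetical supportsImmersive field standing in for the device's internal supports immersive value, and the final return true is an assumption, since the steps listed above only name the cases that return false.

```javascript
// Sketch of the "supported by the device" check. `device.supportsImmersive`
// is a hypothetical stand-in for the internal "supports immersive" value.
function isSessionSupported(device, options = {}) {
  const immersive = options.immersive ?? false;        // dictionary default
  const outputContext = options.outputContext ?? null; // absent means null

  // Immersive sessions need a device that supports them.
  if (immersive && !device.supportsImmersive) return false;

  // Non-immersive sessions need somewhere in the page to draw.
  if (!immersive && outputContext === null) return false;

  // Assumption: every other combination is treated as supported.
  return true;
}
```

Under these assumptions a default-constructed options dictionary is unsupported, because it is non-immersive and has no outputContext.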
A session is considered to be an immersive session if its output is displayed to the user in a way that makes the user feel the content is present in the same space with them, shown at the proper scale. Sessions are considered non-immersive (sometimes referred to as inline) if their output is displayed as an element in an HTML document.
NOTE: Content shown as part of an HTML document is always considered non-immersive even if headtracking is taken into account, the content is displayed in stereo, or the document is displayed in a headset.
Document restrictions and capabilities of an immersive session
5.2. XRSession
enum XREnvironmentBlendMode {
  "opaque",
  "additive",
  "alpha-blend",
};

[SecureContext, Exposed=Window]
interface XRSession : EventTarget {
  // Attributes
  readonly attribute XRDevice device;
  readonly attribute boolean immersive;
  readonly attribute XRPresentationContext outputContext;
  readonly attribute XREnvironmentBlendMode environmentBlendMode;

  attribute double depthNear;
  attribute double depthFar;
  attribute XRLayer baseLayer;

  // Methods
  Promise<XRFrameOfReference> requestFrameOfReference(XRFrameOfReferenceType type, optional XRFrameOfReferenceOptions options);

  FrozenArray<XRInputSource> getInputSources();

  long requestAnimationFrame(XRFrameRequestCallback callback);
  void cancelAnimationFrame(long handle);

  Promise<void> end();

  // Events
  attribute EventHandler onblur;
  attribute EventHandler onfocus;
  attribute EventHandler onresetpose;
  attribute EventHandler onend;
  attribute EventHandler onselect;
  attribute EventHandler onselectstart;
  attribute EventHandler onselectend;
};
Any interaction with XR hardware outside of enumeration is done via an XRSession object, which can only be retrieved by calling requestSession() on an XRDevice. Once a session has been successfully acquired it can be used to poll the device pose, query information about the user’s environment, and present imagery to the user.
The user agent, when possible, SHOULD NOT initialize device tracking or rendering capabilities until an XRSession has been acquired. This is to prevent unwanted side effects of engaging the XR systems when they’re not actively being used, such as increased battery usage or related utility applications appearing when first navigating to a page that only wants to test for the presence of XR hardware in order to advertise XR features. Not all XR platforms offer ways to detect the hardware’s presence without initializing tracking, however, so this is only a strong recommendation.
When an XRSession is created, the user agent MUST initialize the session by running the following steps:
-
Let session be the newly created XRSession object.
-
Let device be the XRDevice object that requested session’s creation.
-
Let options be the XRSessionCreationOptions passed to requestSession().
-
Initialize session’s device to device.
-
Initialize session’s outputContext to options’ outputContext value.
-
Initialize session’s depthNear to 0.1.
-
Initialize session’s depthFar to 1000.0.
-
Initialize session’s baseLayer to null.
-
If no other features of the user agent have done so already, perform the necessary platform-specific steps to initialize the device’s tracking and rendering capabilities.
A number of different circumstances may shut down the session, which is permanent and irreversible. Once a session has been shut down the only way to access the XRDevice's tracking or rendering capabilities again is to request a new session. Each XRSession has an ended boolean, initially set to false, that indicates if it has been shut down.
When an XRSession is shut down the following steps are run:
-
Let session be the target XRSession object.
-
Let device be session’s device.
-
Set session’s ended value to true.
-
If device’s active immersive session is equal to session, set device’s active immersive session to null.
-
If device’s list of non-immersive sessions contains session, remove it from the list.
-
If no other features of the user agent are actively using them, perform the necessary platform-specific steps to shut down the device’s tracking and rendering capabilities.
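The device-side bookkeeping described by requestSession() and by the shut down steps can be modeled synchronously as follows. This is a simplified sketch rather than the real API: ModelDevice, its fields, the userActivated flag, and the thrown Error objects (which stand in for rejected promises) are all hypothetical, and platform-specific initialization and teardown are left out.

```javascript
// Hypothetical in-memory model; class and field names are illustrative.
class ModelDevice {
  constructor(supportsImmersive) {
    this.supportsImmersive = supportsImmersive;
    this.activeImmersiveSession = null; // initially null
    this.nonImmersiveSessions = [];     // initially empty
  }

  // Returns a session object, or throws an Error whose `name` matches the
  // error the real method would reject its promise with.
  requestSession({ immersive = false, outputContext = null } = {}, userActivated = false) {
    const fail = (name) => { throw Object.assign(new Error(name), { name }); };
    const supported = immersive ? this.supportsImmersive : outputContext !== null;
    if (!supported) fail('NotSupportedError');
    if (immersive && this.activeImmersiveSession !== null) fail('InvalidStateError');
    if (immersive && !userActivated) fail('SecurityError');

    const session = { device: this, immersive, outputContext, ended: false };
    if (immersive) this.activeImmersiveSession = session;
    else this.nonImmersiveSessions.push(session);
    return session;
  }

  // Mirrors the shut down steps; a session that has ended stays ended.
  shutDown(session) {
    session.ended = true;
    if (this.activeImmersiveSession === session) this.activeImmersiveSession = null;
    const index = this.nonImmersiveSessions.indexOf(session);
    if (index !== -1) this.nonImmersiveSessions.splice(index, 1);
  }
}
```

In this model a second immersive request fails with InvalidStateError until the first session is shut down, which matches the single active immersive session per device described above.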
The end() method provides a way to manually shut down a session. When invoked, it MUST return a new Promise promise and run the following steps in parallel:
When the requestFrameOfReference(type, options) method is invoked, the user agent MUST return a new Promise promise and run the following steps in parallel:
-
Let session be the target XRSession object.
-
Let frameOfRef be a new XRFrameOfReference.
-
Initialize the frame of reference frameOfRef with session, type, and options.
-
Resolve promise with frameOfRef.
When the getInputSources() method is invoked, the user agent MUST run the following steps:
-
Return the current list of active input sources.
Each XRSession has an environment blending mode value, which is an enum which MUST be set to whichever of the following values best matches the behavior of imagery rendered by the session in relation to the user’s surrounding environment.
-
A blend mode of opaque indicates that the user’s surrounding environment is not visible at all. Alpha values in the baseLayer will be ignored, with the compositor treating all alpha values as 1.0.
-
A blend mode of additive indicates that the user’s surrounding environment is visible and the baseLayer will be shown additively against it. Alpha values in the baseLayer will be ignored, with the compositor treating all alpha values as 1.0. When this blend mode is in use black pixels will appear fully transparent, and there is no way to make a pixel appear fully opaque.
-
A blend mode of alpha-blend indicates that the user’s surrounding environment is visible and the baseLayer will be blended with it according to the alpha values of each pixel. Pixels with an alpha value of 1.0 will be fully opaque and pixels with an alpha value of 0.0 will be fully transparent.
The environmentBlendMode attribute returns the XRSession's environment blending mode.
NOTE: Most Virtual Reality devices exhibit opaque blending behavior. Augmented Reality devices that use transparent optical elements frequently exhibit additive blending behavior, and Augmented Reality devices that use passthrough cameras frequently exhibit alpha-blend blending behavior.
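To make the three blend modes concrete, the following sketch computes what a single pixel might look like to the user under each mode. It is a simplified model, not compositor behavior defined by this document: the compositePixel function is hypothetical, colors are assumed to be in the range 0 to 1 with non-premultiplied alpha, and the environment argument is a stand-in for the light reaching the user from their surroundings.

```javascript
// Sketch only. `layer` is [r, g, b, a] and `environment` is [r, g, b],
// all in the range 0..1 with non-premultiplied alpha (an assumption).
function compositePixel(mode, layer, environment) {
  const [r, g, b, a] = layer;
  const clamp = (v) => Math.min(1, Math.max(0, v));
  switch (mode) {
    case 'opaque':
      // Environment is not visible; alpha is treated as 1.0.
      return [r, g, b];
    case 'additive':
      // Alpha is treated as 1.0; layer light adds to the environment,
      // so black contributes nothing and appears transparent.
      return [r, g, b].map((c, i) => clamp(c + environment[i]));
    case 'alpha-blend':
      // Per-pixel alpha decides how much of the environment shows through.
      return [r, g, b].map((c, i) => c * a + environment[i] * (1 - a));
    default:
      throw new TypeError(`Unknown blend mode: ${mode}`);
  }
}
```

One practical consequence is that content relying on black backgrounds or dark shadows will look different on additive displays, so applications may want to check environmentBlendMode before choosing a visual style.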
The onblur attribute is an Event handler IDL attribute for the blur event type.
The onfocus attribute is an Event handler IDL attribute for the focus event type.
The onresetpose attribute is an Event handler IDL attribute for the resetpose event type.
The onend attribute is an Event handler IDL attribute for the end event type.
The onselectstart attribute is an Event handler IDL attribute for the selectstart event type.
The onselectend attribute is an Event handler IDL attribute for the selectend event type.
The onselect attribute is an Event handler IDL attribute for the select event type.
Example of acquiring a session here.
Document what happens when we end the session.
Document effects when we blur the session.
Document how to poll the device pose.
Document how the list of active input sources is maintained.
5.3. Animation Frames
callback XRFrameRequestCallback = void (DOMHighResTimeStamp time, XRFrame frame);
Each XRFrameRequestCallback object has a cancelled boolean initially set to false.
Each XRSession has a list of animation frame callbacks, which is initially empty, and an animation frame callback identifier, which is a number that is initially zero. Each XRSession also has a processing frame boolean, which is initially set to false.
When the requestAnimationFrame(callback) method is invoked, the user agent MUST run the following steps:
-
Let session be the target XRSession object.
-
Increment session’s animation frame callback identifier by one.
-
Append callback to session’s list of animation frame callbacks, associated with session’s animation frame callback identifier’s current value.
-
Return session’s animation frame callback identifier’s current value.
When the cancelAnimationFrame(handle) method is invoked, the user agent MUST run the following steps:
-
Let session be the target XRSession object.
-
Find the entry in session’s list of animation frame callbacks that is associated with the value handle.
-
If there is such an entry, set its cancelled boolean to true and remove it from session’s list of animation frame callbacks.
When the user agent is to run the animation frame callbacks for an XRSession session with a timestamp now and an XRFrame frame, it MUST run the following steps:
-
Let callbacks be a list of the entries in session’s list of animation frame callbacks, in the order in which they were added to the list.
-
Set session’s list of animation frame callbacks to the empty list.
-
Set session’s processing frame to true.
-
For each entry in callbacks, in order:
  -
  If the entry’s cancelled boolean is true, continue to the next entry.
  -
  Invoke the Web IDL callback function, passing now and frame as the arguments.
  -
  If an exception is thrown, report the exception.
-
Set session’s processing frame to false.
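The three algorithms in this section can be sketched together as one small class. This is a hypothetical model for illustration only: FrameCallbackQueue is not part of the API, the cancelled flag is stored on a per-entry record rather than on the callback object, and console.error stands in for reporting the exception.

```javascript
// Hypothetical model of a session's animation frame callback bookkeeping.
class FrameCallbackQueue {
  constructor() {
    this.entries = [];            // list of animation frame callbacks
    this.identifier = 0;          // animation frame callback identifier
    this.processingFrame = false; // processing frame boolean
  }

  requestAnimationFrame(callback) {
    this.identifier += 1;
    this.entries.push({ handle: this.identifier, callback, cancelled: false });
    return this.identifier;
  }

  cancelAnimationFrame(handle) {
    const index = this.entries.findIndex((entry) => entry.handle === handle);
    if (index !== -1) {
      this.entries[index].cancelled = true;
      this.entries.splice(index, 1);
    }
  }

  // Runs the callbacks queued so far with a timestamp and a frame.
  run(now, frame) {
    const callbacks = this.entries;
    this.entries = []; // callbacks requested from here on wait for the next run
    this.processingFrame = true;
    for (const entry of callbacks) {
      if (entry.cancelled) continue;
      try {
        entry.callback(now, frame);
      } catch (error) {
        console.error(error); // stand-in for "report the exception"
      }
    }
    this.processingFrame = false;
  }
}
```

Because the list is emptied before any callback is invoked, a callback that calls requestAnimationFrame() again (the usual render loop pattern) is scheduled for the following frame rather than re-run within the current one.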
5.4. The XR Compositor
This needs to be broken up a bit more and more clearly describe things such as the frame lifecycle.
The user agent MUST maintain an XR Compositor which handles presentation to the XRDevice and frame timing. The compositor MUST use an independent rendering context whose state is isolated from that of any WebGL contexts used as XRWebGLLayer sources to prevent the page from corrupting the compositor state or reading back content from other pages. The compositor MUST also run in a separate thread or process to decouple performance of the page from the ability to present new imagery to the user at the appropriate framerate.
The XR Compositor has a list of layer images, which is initially empty.
6. Frame Loop
6.1. XRFrame
[SecureContext, Exposed=Window]
interface XRFrame {
  readonly attribute XRSession session;
  readonly attribute FrozenArray<XRView> views;

  XRDevicePose? getDevicePose(XRCoordinateSystem coordinateSystem);
  XRInputPose? getInputPose(XRInputSource inputSource, XRCoordinateSystem coordinateSystem);
};
An XRFrame provides all the values needed to render a single frame of an XR scene to the XRDevice's display. Applications can only acquire an XRFrame by calling requestAnimationFrame() on an XRSession with an XRFrameRequestCallback. When the callback is called it will be passed an XRFrame.
The session attribute returns the XRSession that produced the XRFrame.
views
getDevicePose(coordinateSystem)
getInputPose(inputSource, coordinateSystem)
7. Coordinate Systems
7.1. XRCoordinateSystem
[SecureContext, Exposed=Window]
interface XRCoordinateSystem : EventTarget {
  Float32Array? getTransformTo(XRCoordinateSystem other);
};
When the getTransformTo(other) method is invoked, the user agent MUST run the following steps:
-
Let current be the target XRCoordinateSystem object.
-
If a known transform exists from the coordinate system described by current to the coordinate system described by other, return it as a matrix.
-
Else return null.
7.2. XRFrameOfReference
enum XRFrameOfReferenceType {
  "head-model",
  "eye-level",
  "stage",
};

dictionary XRFrameOfReferenceOptions {
  boolean disableStageEmulation = false;
  double stageEmulationHeight = 0.0;
};

[SecureContext, Exposed=Window]
interface XRFrameOfReference : XRCoordinateSystem {
  readonly attribute XRStageBounds? bounds;
  readonly attribute double emulatedHeight;

  attribute EventHandler onboundschange;
};
An XRFrameOfReference describes an XRCoordinateSystem that is generally expected to remain static for the duration of the XRSession, with the most common exception being mid-session reconfiguration by the user. An XRFrameOfReference is created by calling requestFrameOfReference(). Every XRFrameOfReference describes a coordinate system where the Y axis MUST be aligned with gravity, with positive Y being "Up". Negative Z is considered "Forward", and positive X is considered "Right".
Each XRFrameOfReference has a session, which is the XRSession that created it, and a frame of reference type, which is one of the XRFrameOfReferenceType values, set to the type passed to requestFrameOfReference(). The behavior of the XRFrameOfReference differs depending on its frame of reference type as follows:
eye-level
An XRFrameOfReference with a frame of reference type of "eye-level" describes a coordinate system with an origin that corresponds to the first device pose acquired by the XRSession after the "eye-level" frame of reference is created, with the device yaw (rotation around the Y axis) at that point defining the coordinate system’s forward direction.
head-model
An XRFrameOfReference with a frame of reference type of "head-model" describes a coordinate system identical to an eye-level frame of reference, but where the device is always located at the origin. Any translation is provided by a software estimate of the device’s position based solely on the device orientation and the assumption that the device is being worn as a headset. The translation estimate SHOULD be based on modeling how the human head moves on average when an individual is only moving their neck.
If the device is capable of reporting accurate translation, it MUST be discarded when using this frame of reference type in favor of an orientation-based estimate. If the translation reported by the device is already based on head modeling the device-provided translation MAY be preserved.
NOTE: A "head-model" frame of reference type is useful when displaying content that cannot be viewed from arbitrary positions, such as panoramic images or videos.
stage
An XRFrameOfReference with a frame of reference type of "stage" describes a space known as a stage. A stage is a bounded, floor-relative play space that the user can be expected to safely be able to move within. Other XR platforms sometimes refer to this concept as "room scale" or "standing space".
Note: A stage is not intended to describe multi-room spaces, areas with uneven floor levels, or very large open areas. Future iterations of this specification may add more detailed support for tracking in those scenarios.
The origin of the stage MUST be at floor level, such that Y equals 0 at the floor, is negative below the floor level, and is positive above the floor. The origin on the X and Z axes is determined in a platform-specific manner, but in general if the user is in an enclosed space it’s ideal if the X and Z axes originate at or near the center of the room.
The stage bounds are described by an XRFrameOfReference's bounds attribute. If the frame of reference is not a stage or the stage bounds cannot be determined the bounds attribute MUST be null.
Note: When the bounds for a stage are null the user should not be required to physically move around their environment in order to interact with content. The device may still support 6DoF tracking, but it can’t be assumed that the user will know where they are relative to their environment and as such content that encourages movement beyond leaning is discouraged.
The onboundschange attribute is an Event handler IDL attribute for the boundschange event type.
The emulatedHeight attribute is initially set to 0.0. This value indicates the offset in meters along the Y axis that will be applied to poses acquired with this frame of reference.
When an XRFrameOfReference is created, the user agent MUST initialize the frame of reference by running the following steps:
-
Let frameOfRef be the newly created XRFrameOfReference object.
-
Let session be the XRSession object that requested frameOfRef’s creation.
-
Let type be the XRFrameOfReferenceType passed to requestFrameOfReference().
-
Let options be the XRFrameOfReferenceOptions passed to requestFrameOfReference().
-
Initialize frameOfRef’s session to session.
-
Initialize frameOfRef’s frame of reference type to type.
-
If type is not "stage", abort these steps.
-
// Initialize stage-specific properties
7.3. XRStageBounds
[SecureContext, Exposed=Window]
interface XRStageBounds {
  readonly attribute FrozenArray<DOMPointReadOnly> geometry;
};
The XRStageBounds interface describes a border around a stage, known as the stage bounds, which the user can be expected to safely be able to move within.
The polygonal boundary is given by the geometry array of DOMPointReadOnlys, which represents a loop of points at the edges of the safe space. The points describe offsets from the stage origin in meters, and MUST be given in a clockwise order as viewed from above, looking towards the negative end of the Y axis. The y value of each point MUST be 0 and the w value of each point MUST be 1. The bounds can be considered to originate at the floor and extend infinitely high. The shape it describes MAY not be convex. The values reported are relative to the stage origin, but MAY not contain it.
Note: Content should not require the user to move beyond the stage bounds; however, if their physical surroundings allow for it, it is possible for the user to ignore the bounds resulting in position values outside of the polygon they describe. This is not an error condition and should be handled gracefully by page content.
Note: Content generally should not provide a visualization of the stage bounds, as it’s the user agent’s responsibility to ensure that safety critical information is provided to the user.
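Since the bounds polygon is not guaranteed to be convex and the user may step outside it, content that wants to know whether a position lies within the bounds needs a general point-in-polygon test. The following is a minimal sketch using the common ray casting approach on the X and Z coordinates; the isInsideStageBounds function is hypothetical, and the geometry argument only needs to be an array of objects with x and z values, such as the points of a geometry array.

```javascript
// Sketch: ray casting point-in-polygon test on the floor plane (X/Z).
// Works for convex and non-convex polygons, in either winding order.
function isInsideStageBounds(geometry, x, z) {
  let inside = false;
  for (let i = 0, j = geometry.length - 1; i < geometry.length; j = i++) {
    const a = geometry[i];
    const b = geometry[j];
    // Does the edge a-b straddle the horizontal line through z, and is the
    // crossing to the right of the query point?
    const crosses = (a.z > z) !== (b.z > z) &&
      x < ((b.x - a.x) * (z - a.z)) / (b.z - a.z) + a.x;
    if (crosses) inside = !inside;
  }
  return inside;
}
```

A result of false is not an error; as noted above, positions outside the polygon are possible and should be handled gracefully, for example by fading content rather than halting the experience.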
8. Views
8.1. XRView
enum XREye {
  "left",
  "right"
};

[SecureContext, Exposed=Window]
interface XRView {
  readonly attribute XREye eye;
  readonly attribute Float32Array projectionMatrix;
};
An XRView describes a single view into an XR scene. Each view corresponds to a display or portion of a display used by an XR device to present imagery to the user. They are used to retrieve all the information necessary to render content that is well aligned to the view's physical output properties, including the field of view, eye offset, and other optical properties. Views may cover overlapping regions of the user’s vision. No guarantee is made about the number of views any XR device uses or their order, nor is the number of views required to be constant for the duration of an XRSession.
NOTE: Many HMDs will request that content render two views, one for the left eye and one for the right, while most magic window devices will only request one view, but applications should never assume a specific view configuration. For example: A magic window device may request two views if it is capable of stereo output, but may revert to requesting a single view for performance reasons if the stereo output mode is turned off. Similarly, HMDs may request more than two views to facilitate a wide field of view or displays of different pixel density.
The eye attribute describes which eye this view is expected to be shown to. This attribute’s primary purpose is to ensure that prerendered stereo content can present the correct portion of the content to the correct eye. If the view does not have an intrinsically associated eye (the display is monoscopic, for example) this attribute MUST be set to "left".
The projectionMatrix attribute provides a matrix describing the projection to be used for the view’s rendering. It is strongly recommended that applications use this matrix without modification. Failure to use the provided projection matrices when rendering may cause the presented frame to be distorted or badly aligned, resulting in varying degrees of user discomfort.
8.2. XRViewport
[SecureContext, Exposed=Window] interface XRViewport {
  readonly attribute long x;
  readonly attribute long y;
  readonly attribute long width;
  readonly attribute long height;
};
An XRViewport object describes a viewport, or rectangular region, of a graphics surface. The x and y attributes define an offset from the surface origin and the width and height attributes define the rectangular dimensions of the viewport.
The exact interpretation of the viewport values depends on the conventions of the graphics API the viewport is associated with:
- When used with an XRWebGLLayer the x and y attributes specify the lower left corner of the viewport rectangle, in pixels, with the viewport rectangle extending width pixels to the right of x and height pixels above y. The values can be passed to the WebGL viewport function directly.
The following example renders into an XRWebGLLayer's framebuffer using an XRViewport retrieved from an XRView.
xrSession.requestAnimationFrame((time, xrFrame) => {
  let xrView = xrFrame.views[0];
  let xrViewport = xrWebGLLayer.getViewport(xrView);

  // bindFramebuffer takes the binding target as its first argument.
  gl.bindFramebuffer(gl.FRAMEBUFFER, xrWebGLLayer.framebuffer);
  gl.viewport(xrViewport.x, xrViewport.y, xrViewport.width, xrViewport.height);

  // WebGL draw calls will now be rendered into the appropriate viewport.
});
9. Pose
9.1. Matrices
WebXR provides various transforms in the form of matrices. WebXR matrices are always 4x4 and given as 16 element Float32Arrays in column major order. They may be passed directly to WebGL’s uniformMatrix4fv function, used to create an equivalent DOMMatrix, or used with a variety of third party math libraries.
Translations specified by WebXR matrices are always given in meters.
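The column major layout can be illustrated with a short sketch. The helper names below (matrixElement, getTranslation, transformPoint) are hypothetical and not part of this API; transformPoint assumes an affine matrix whose bottom row is [0, 0, 0, 1], so it skips the perspective divide.

```javascript
// Element at (row, column) of a 4x4 matrix stored as 16 floats in
// column major order: each column occupies four consecutive entries.
function matrixElement(m, row, column) {
  return m[column * 4 + row];
}

// The translation, in meters, is stored in the fourth column.
function getTranslation(m) {
  return { x: m[12], y: m[13], z: m[14] };
}

// Transforms the point [x, y, z] (with an implied w of 1) by m.
// Assumes m is affine, so no perspective divide is performed.
function transformPoint(m, point) {
  const [x, y, z] = point;
  return [
    m[0] * x + m[4] * y + m[8] * z + m[12],
    m[1] * x + m[5] * y + m[9] * z + m[13],
    m[2] * x + m[6] * y + m[10] * z + m[14],
  ];
}

// Example matrix: a translation of 1.5 meters along the positive Y axis.
const exampleTranslationMatrix = new Float32Array([
  1, 0, 0, 0,
  0, 1, 0, 0,
  0, 0, 1, 0,
  0, 1.5, 0, 1,
]);
```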
9.2. XRDevicePose
[SecureContext, Exposed=Window] interface XRDevicePose {
  readonly attribute Float32Array poseModelMatrix;

  Float32Array getViewMatrix(XRView view);
};
An XRDevicePose describes the position and orientation of an XRDevice relative to the XRCoordinateSystem it was queried with. It also describes the view and projection matrices that should be used by the application to render a frame of an XR scene.
The poseModelMatrix is a matrix describing the transform from the origin of the XRCoordinateSystem to the XRDevice.
NOTE: The poseModelMatrix can be used to position graphical representations of the XRDevice for spectator views of the scene or multi-user interaction.
The getViewMatrix(view) method returns a matrix describing the view transform to be used when rendering the passed XRView. The returned matrix represents the inverse of the model matrix of the associated viewpoint.
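To illustrate the relationship between a model matrix and its view matrix, the sketch below inverts a column major matrix under the assumption that it is a rigid transform (rotation and translation only, no scale). The function name invertRigidTransform is hypothetical; applications normally just use the matrix returned by getViewMatrix(view).

```javascript
// Inverts a rigid transform stored as a column major 4x4 matrix.
// For a rotation R and translation t, the inverse is R^T and -(R^T * t).
function invertRigidTransform(m) {
  const out = new Float32Array(16);

  // Transpose the upper-left 3x3 rotation block.
  for (let row = 0; row < 3; row++) {
    for (let column = 0; column < 3; column++) {
      out[column * 4 + row] = m[row * 4 + column];
    }
  }

  // Rotate the original translation by R^T and negate it.
  const tx = m[12];
  const ty = m[13];
  const tz = m[14];
  out[12] = -(out[0] * tx + out[4] * ty + out[8] * tz);
  out[13] = -(out[1] * tx + out[5] * ty + out[9] * tz);
  out[14] = -(out[2] * tx + out[6] * ty + out[10] * tz);
  out[15] = 1;

  return out;
}
```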
10. Input
10.1. XRInputSource
enum XRHandedness {
  "",
  "left",
  "right"
};

enum XRTargetRayMode {
  "gaze",
  "tracked-pointer",
  "screen"
};

[SecureContext, Exposed=Window] interface XRInputSource {
  readonly attribute XRHandedness handedness;
  readonly attribute XRTargetRayMode targetRayMode;
};
Each XRInputSource SHOULD define a primary action. The primary action is a platform-specific action that, when engaged, produces selectstart, selectend, and select events. Examples of possible primary actions are pressing a trigger, touchpad, or button, speaking a command, or making a hand gesture. If the platform guidelines define a recommended primary input then it should be used as the primary action, otherwise the user agent is free to select one.
The handedness attribute describes which hand the input source is associated with, if any. Input sources with no natural handedness (such as headset-mounted controls or standard gamepads) or for which the handedness is not currently known MUST set this attribute to the empty string.
The targetRayMode attribute describes the method used to produce the target ray, and indicates how the application should present the target ray to the user if desired.
- gaze indicates the target ray will originate at the user’s head and follow the direction they are looking (this is commonly referred to as a "gaze input" device).
- tracked-pointer indicates that the target ray originates from either a handheld device or other hand-tracking mechanism and represents that the user is using their hands or the held device for pointing.
- screen indicates that the input source was an interaction with the canvas element associated with a non-immersive session’s output context, such as a mouse click or touch event.
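One way an application might act on targetRayMode is sketched below. The function name targetRayVisuals and the specific choices (cursor only for gaze, ray plus cursor for tracked pointers, nothing for screen input) are an assumed presentation policy, not a requirement of this specification.

```javascript
// Hypothetical policy: decides which visual aids to draw for an input source
// based on its targetRayMode.
function targetRayVisuals(inputSource) {
  switch (inputSource.targetRayMode) {
    case "gaze":
      // The ray starts at the user's head, so only a cursor at the point
      // being looked at is drawn.
      return { drawRay: false, drawCursor: true };
    case "tracked-pointer":
      // A visible ray helps the user see where the hand or controller points.
      return { drawRay: true, drawCursor: true };
    case "screen":
      // The user already sees where they clicked or touched.
      return { drawRay: false, drawCursor: false };
    default:
      // Unknown modes: draw nothing rather than guess.
      return { drawRay: false, drawCursor: false };
  }
}
```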
10.2. XRRay
[SecureContext, Exposed=Window] interface XRRay {
  readonly attribute DOMPointReadOnly origin;
  readonly attribute DOMPointReadOnly direction;
  readonly attribute Float32Array transformMatrix;
};
An XRRay describes a geometric ray. The origin attribute defines the 3-dimensional point in space that the ray originates from. The origin's w attribute MUST be 1.0. The direction attribute defines the ray’s 3-dimensional directional vector. The direction's w attribute MUST be 0.0 and the vector MUST be normalized to have a length of 1.0.
The transformMatrix is a matrix which represents the transform from a ray originating at [0, 0, 0] and extending down the negative Z axis to the ray described by the XRRay's origin and direction.
NOTE: The XRRay's transformMatrix can be used to easily position graphical representations of the ray when rendering.
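The relationship between origin, direction, and transformMatrix can be sketched as follows. This is an illustrative construction only: the function names are hypothetical, direction is assumed to be normalized, and the roll of the resulting basis around the ray is an arbitrary choice that a user agent may make differently.

```javascript
function cross(a, b) {
  return [
    a[1] * b[2] - a[2] * b[1],
    a[2] * b[0] - a[0] * b[2],
    a[0] * b[1] - a[1] * b[0],
  ];
}

function normalize(v) {
  const length = Math.hypot(v[0], v[1], v[2]);
  return [v[0] / length, v[1] / length, v[2] / length];
}

// Builds a column major matrix that maps [0, 0, 0] to origin and the
// negative Z axis to direction.
function rayTransformMatrix(origin, direction) {
  // The negative Z axis must map to direction, so +Z maps to its opposite.
  const z = [-direction.x, -direction.y, -direction.z];
  // Pick a helper axis that is not parallel to z.
  const up = Math.abs(z[1]) < 0.999 ? [0, 1, 0] : [1, 0, 0];
  const x = normalize(cross(up, z));
  const y = cross(z, x);
  return new Float32Array([
    x[0], x[1], x[2], 0,
    y[0], y[1], y[2], 0,
    z[0], z[1], z[2], 0,
    origin.x, origin.y, origin.z, 1,
  ]);
}
```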
10.3. XRInputPose
[SecureContext, Exposed=Window] interface XRInputPose {
  readonly attribute boolean emulatedPosition;
  readonly attribute XRRay targetRay;
  readonly attribute Float32Array? gripMatrix;
};
targetRay
gripMatrix
The emulatedPosition attribute indicates the accuracy of the origin of the targetRay and translation component of the gripMatrix. emulatedPosition MUST be set to true if positional values are software estimations, such as those provided by a neck or arm model. emulatedPosition MUST be set to false if the positional values are based on sensor readings.
11. Layers
11.1. XRLayer
[SecureContext, Exposed=Window] interface XRLayer {};
An XRLayer defines a source of bitmap images and a description of how the image is to be rendered in the XRDevice. Initially only one type of layer, the XRWebGLLayer, is defined but future revisions of the spec may extend the available layer types.
11.2. XRWebGLLayer
Each XRSession MUST identify a native WebGL framebuffer resolution, which is the pixel resolution of a WebGL framebuffer required to match the physical pixel resolution of the XRDevice.
The native WebGL framebuffer resolution is determined by running the following steps:
- Let session be the target XRSession.
- If session’s immersive value is true, set the native WebGL framebuffer resolution to the resolution required to have a 1:1 ratio between the pixels of a framebuffer large enough to contain all of the session’s XRViews and the physical screen pixels in the area of the display under the highest magnification and abort these steps. If no method exists to determine the native resolution as described, the recommended WebGL framebuffer resolution MAY be used.
- If session’s immersive value is false, set the native WebGL framebuffer resolution to the size of the session’s outputContext's canvas in physical display pixels and reevaluate these steps every time the size of the canvas changes.
Additionally, the XRSession MUST identify a recommended WebGL framebuffer resolution, which represents a best estimate of the WebGL framebuffer resolution large enough to contain all of the session’s XRViews that provides an average application a good balance between performance and quality. It MAY be smaller than, larger than, or equal to the native WebGL framebuffer resolution.
NOTE: The user agent is free to use any method of its choosing to estimate the recommended WebGL framebuffer resolution. If there are platform-specific methods for querying a recommended size it is recommended that they be used, but not required.
typedef (WebGLRenderingContext or
         WebGL2RenderingContext) XRWebGLRenderingContext;

dictionary XRWebGLLayerInit {
  boolean antialias = true;
  boolean depth = true;
  boolean stencil = false;
  boolean alpha = true;
  boolean multiview = false;
  double framebufferScaleFactor = 1.0;
};

[SecureContext, Exposed=Window, Constructor(XRSession session,
             XRWebGLRenderingContext context,
             optional XRWebGLLayerInit layerInit)]
interface XRWebGLLayer : XRLayer {
  // Attributes
  readonly attribute XRWebGLRenderingContext context;

  readonly attribute boolean antialias;
  readonly attribute boolean depth;
  readonly attribute boolean stencil;
  readonly attribute boolean alpha;
  readonly attribute boolean multiview;

  readonly attribute WebGLFramebuffer framebuffer;
  readonly attribute unsigned long framebufferWidth;
  readonly attribute unsigned long framebufferHeight;

  // Methods
  XRViewport? getViewport(XRView view);
  void requestViewportScaling(double viewportScaleFactor);

  // Static Methods
  static double getNativeFramebufferScaleFactor(XRSession session);
};
The XRWebGLLayer(session, context, layerInit) constructor MUST perform the following steps when invoked:
- Let layer be a new XRWebGLLayer.
- If session’s ended value is true, throw an InvalidStateError and abort these steps.
- If context is lost, throw an InvalidStateError and abort these steps.
- If context’s compatible XR device does not equal session’s device, throw an InvalidStateError and abort these steps.
- Initialize layer’s context to context.
- Initialize layer’s antialias to layerInit’s antialias value.
- If context supports multiview and layerInit’s multiview value is true, initialize layer’s multiview to true.
- Else initialize layer’s multiview to false.
- Initialize layer’s framebuffer to a new opaque WebGLFramebuffer created with context.
- Initialize the layer’s swap chain.
- If layer’s swap chain was unable to be created for any reason, throw an OperationError and abort these steps.
- Return layer.
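The validation portion of the constructor steps can be mirrored in script as a rough sketch. Everything here operates on plain stand-in objects: the names createLayerSketch, namedError, compatibleXRDevice (as a readable property), and supportsMultiview are assumptions for illustration, and framebuffer and swap chain creation are omitted because they are not yet documented.

```javascript
// Creates an error carrying a DOMException-style name.
function namedError(name, message) {
  const error = new Error(message);
  error.name = name;
  return error;
}

// Sketch of the constructor's checks using stand-in session and context
// objects. Not an implementation of XRWebGLLayer.
function createLayerSketch(session, context, layerInit = {}) {
  const layer = {};

  if (session.ended) {
    throw namedError("InvalidStateError", "The session has ended.");
  }
  if (context.isContextLost()) {
    throw namedError("InvalidStateError", "The WebGL context is lost.");
  }
  if (context.compatibleXRDevice !== session.device) {
    throw namedError("InvalidStateError", "The context is not compatible with the session's device.");
  }

  layer.context = context;
  // antialias defaults to true in XRWebGLLayerInit.
  layer.antialias = layerInit.antialias !== undefined ? layerInit.antialias : true;
  // multiview is only enabled when both requested and supported.
  layer.multiview = Boolean(context.supportsMultiview && layerInit.multiview);

  // Framebuffer and swap chain initialization are intentionally left out.
  return layer;
}
```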
The framebufferWidth and framebufferHeight attributes return the width and height of the swap chain's backbuffer, respectively.
getViewport() queries the XRViewport the given XRView should use when rendering to the layer.
The getViewport(view) method, when invoked, MUST run the following steps:
requestViewportScaling(viewportScaleFactor)
The getNativeFramebufferScaleFactor(session) method, when invoked, MUST run the following steps:
- Let session be the target XRSession.
- If session’s ended value is true, return 0.0 and abort these steps.
- Return the value that the session’s recommended WebGL framebuffer resolution must be multiplied by to yield the session’s native WebGL framebuffer resolution.
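The arithmetic behind these steps can be sketched with a stand-in session object. The nativeResolution and recommendedResolution properties are hypothetical (the API does not expose the resolutions directly), and the sketch assumes both resolutions share an aspect ratio so that the width ratio is representative.

```javascript
// Sketch of the scale factor computation on a stand-in session object.
function nativeFramebufferScaleFactorSketch(session) {
  if (session.ended) {
    return 0.0;
  }
  // recommended * factor = native, so factor = native / recommended.
  return session.nativeResolution.width / session.recommendedResolution.width;
}
```

For instance, a recommended width of 2000 pixels and a native width of 3000 pixels gives a factor of 1.5.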
Document what it means when a context supports multiview.
Document what an opaque framebuffer is.
Document the creation of a swap chain.
Need an example snippet of setting up and using an XRWebGLLayer.
11.3. WebGL Context Compatibility
partial dictionary WebGLContextAttributes {
  XRDevice compatibleXRDevice = null;
};

partial interface mixin WebGLRenderingContextBase {
  Promise<void> setCompatibleXRDevice(XRDevice device);
};
When a user agent implements this specification it MUST set a compatible XR device, initially set to null, on every WebGLRenderingContextBase.
In order for a WebGL context to be used as a source for XR imagery it must be created on a compatible graphics adapter for the XRDevice. What is considered a compatible graphics adapter is platform dependent, but is understood to mean that the graphics adapter can supply imagery to the XRDevice without undue latency. If a WebGL context was not already created on the compatible graphics adapter, it typically must be re-created on the adapter in question before it can be used with an XRWebGLLayer. Once the compatible XR device is set, the context can be used with layers for any XRSession requested from that XRDevice.
Note: On an XR platform with a single GPU, it can safely be assumed that the GPU is compatible with the XRDevices advertised by the platform, and thus any hardware accelerated WebGL contexts are compatible as well. On PCs with both an integrated and discrete GPU the discrete GPU is often considered the compatible graphics adapter since it is generally a higher performance chip. On desktop PCs with multiple graphics adapters installed, the one with the XRDevice physically connected to it is likely to be considered the compatible graphics adapter.
The compatible XR device can be set either at context creation time or after context creation, potentially incurring a context loss. To set the compatible XR device at context creation time, an XRDevice is supplied to the compatibleXRDevice context creation attribute dictionary when requesting a WebGL context.
When the HTMLCanvasElement's getContext() method is invoked with a WebGLContextAttributes dictionary that contains a non-null compatibleXRDevice, run the following steps:
- Let attributes be the WebGLContextAttributes passed to the function.
- Let device be the attributes’ compatibleXRDevice.
- Create the WebGL context as usual, ensuring it is created on a compatible graphics adapter for device.
- Let context be the newly created WebGL context.
- Set context’s compatible XR device to device.
- Return context.
The following example creates a WebGL context that is compatible with an XRDevice and then uses it to create an XRWebGLLayer.
function onXRSessionStarted(xrSession) {
  let glCanvas = document.createElement("canvas");
  let gl = glCanvas.getContext("webgl", { compatibleXRDevice: xrSession.device });

  loadWebGLResources();

  xrSession.baseLayer = new XRWebGLLayer(xrSession, gl);
}
To set the compatible XR device after the context has been created, the setCompatibleXRDevice() method is used.
When the setCompatibleXRDevice(device) method is invoked, the user agent MUST return a new Promise promise and run the following steps in parallel:
- Let context be the target WebGLRenderingContextBase object.
- If context’s WebGL context lost flag is set, reject promise with an InvalidStateError and abort these steps.
- If context’s compatible XR device is device, resolve promise and abort these steps.
- If context was created on a compatible graphics adapter for device:
  - Set context’s compatible XR device to device.
  - Resolve promise and abort these steps.
- Queue a task to perform the following steps:
  - Force context to be lost and handle the context loss as described by the WebGL specification.
  - If the canceled flag of the "webglcontextlost" event fired in the previous step was not set, reject promise with an AbortError and abort these steps.
  - Restore the context on a compatible graphics adapter for device.
  - Set context’s compatible XR device to device.
  - Resolve promise.
Additionally, when any WebGL context is lost run the following steps prior to firing the "webglcontextlost" event:
- Set the context’s compatible XR device to null.
The following example creates an XRWebGLLayer from a pre-existing WebGL context.
let glCanvas = document.createElement("canvas");
let gl = glCanvas.getContext("webgl");

loadWebGLResources();

glCanvas.addEventListener("webglcontextlost", (event) => {
  // Indicates that the WebGL context can be restored.
  // Calling preventDefault() sets the event's canceled flag.
  event.preventDefault();
});

glCanvas.addEventListener("webglcontextrestored", (event) => {
  // WebGL resources need to be re-created after a context loss.
  loadWebGLResources();
});

function onXRSessionStarted(xrSession) {
  // Make sure the canvas context we want to use is compatible with the device.
  // May trigger a context loss.
  return gl.setCompatibleXRDevice(xrSession.device).then(() => {
    xrSession.baseLayer = new XRWebGLLayer(xrSession, gl);
  });
}
12. Canvas Rendering Context
12.1. XRPresentationContext
[SecureContext, Exposed=Window] interface XRPresentationContext {
  readonly attribute HTMLCanvasElement canvas;
};
canvas
13. Events
13.1. XRSessionEvent
[SecureContext, Exposed=Window, Constructor(DOMString type, XRSessionEventInit eventInitDict)]
interface XRSessionEvent : Event {
  readonly attribute XRSession session;
};

dictionary XRSessionEventInit : EventInit {
  required XRSession session;
};
session The XRSession associated with this event.
13.2. XRInputSourceEvent
[SecureContext, Exposed=Window, Constructor(DOMString type, XRInputSourceEventInit eventInitDict)]
interface XRInputSourceEvent : Event {
  readonly attribute XRFrame frame;
  readonly attribute XRInputSource inputSource;
};

dictionary XRInputSourceEventInit : EventInit {
  required XRFrame frame;
  required XRInputSource inputSource;
};
frame An XRFrame that corresponds with the time that the event took place. This frame’s views array MUST be empty.
inputSource The XRInputSource that generated this event.
13.3. XRCoordinateSystemEvent
[SecureContext, Exposed=Window, Constructor(DOMString type, XRCoordinateSystemEventInit eventInitDict)]
interface XRCoordinateSystemEvent : Event {
  readonly attribute XRCoordinateSystem coordinateSystem;
};

dictionary XRCoordinateSystemEventInit : EventInit {
  required XRCoordinateSystem coordinateSystem;
};
coordinateSystem The XRCoordinateSystem associated with this event.
13.4. Event Types
The user agent MUST provide the following new events. Registration for and firing of the events must follow the usual behavior of DOM4 Events.
The user agent MAY fire a devicechange event on the XR object to indicate that the availability of XRDevices has been changed. The event MUST be of type Event.
A user agent MAY dispatch a blur event on an XRSession to indicate that presentation to the XRSession by the page has been suspended by the user agent, OS, or XR hardware. While an XRSession is blurred it remains active but it may have its frame production throttled. This is to prevent tracking while the user interacts with potentially sensitive UI. For example: The user agent SHOULD blur the presenting application when the user is typing a URL into the browser with a virtual keyboard, otherwise the presenting page may be able to guess the URL the user is entering by tracking their head motions. The event MUST be of type XRSessionEvent.
A user agent MAY dispatch a focus event on an XRSession to indicate that presentation to the XRSession by the page has resumed after being suspended. The event MUST be of type XRSessionEvent.
A user agent MUST dispatch a resetpose event on an XRSession when the system resets the XRDevice's position or orientation. The event MUST be of type XRSessionEvent.
A user agent MUST dispatch an end event on an XRSession when the session ends, whether it was ended by the application or by the user agent. The event MUST be of type XRSessionEvent.
A user agent MUST dispatch a selectstart event on an XRSession when one of its XRInputSources begins its primary action. The event MUST be of type XRInputSourceEvent.
A user agent MUST dispatch a selectend event on an XRSession when one of its XRInputSources ends its primary action or when an XRInputSource that has begun a primary action is disconnected. The event MUST be of type XRInputSourceEvent.
A user agent MUST dispatch a select event on an XRSession when one of its XRInputSources has fully completed a primary action. The event MUST be of type XRInputSourceEvent.
A user agent MUST dispatch a boundschange event on an XRFrameOfReference when the stage bounds change. This includes changes to the geometry point array or the bounds attribute changing to or from null. The event MUST be of type XRCoordinateSystemEvent.
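As a brief sketch of how a page might listen for the primary action events described above, the helper below registers listeners on a session and forwards each event to a callback. The function name watchPrimaryAction, the onChange callback, and the phase labels are hypothetical; only the event names and the inputSource attribute come from this specification.

```javascript
// Registers listeners for the three primary action events on a session and
// reports each one through the supplied callback.
function watchPrimaryAction(session, onChange) {
  const listeners = {
    selectstart: (event) => onChange("started", event.inputSource),
    selectend: (event) => onChange("ended", event.inputSource),
    select: (event) => onChange("completed", event.inputSource),
  };

  for (const [type, listener] of Object.entries(listeners)) {
    session.addEventListener(type, listener);
  }

  // Returns a function that removes the listeners again.
  return () => {
    for (const [type, listener] of Object.entries(listeners)) {
      session.removeEventListener(type, listener);
    }
  };
}
```

A page would typically call the returned function when the session's end event fires.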
14. Integrations
14.1. Feature Policy
This specification defines a feature that controls whether the xr attribute is exposed on the Navigator object.
The feature name for this feature is "xr".
The default allowlist for this feature is ["self"].
15. Acknowledgements
The following individuals have contributed to the design of the WebXR Device API specification:
- Sebastian Sylvan (Formerly Microsoft)
And a special thanks to Vladimir Vukicevic (Unity) for kick-starting this whole adventure!
