<model>
element
Copyright
©
2022
the Contributors to the
<model>
element
Specification, published by the
Immersive Web Community Group under the
W3C Community Contributor License Agreement (CLA). A human-readable
summary
is available.
The model
element allows embedding 3D graphical content into an
[HTML] document. The HTMLModelElement
interface then provides a
means to interface with the embedded resource.
This specification was published by the Immersive Web Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.
This is a work in progress.
GitHub Issues are preferred for discussion of this specification.
Add an example that shows how to enable interactivity.
Having an example that shows how to support multiple formats via <source>
would be great.
It would be great to show how one can provide fallback content for user agents that don't support model. This could include showing a <video>
or <picture>
instead.
It would be great to have some examples that show how to maximize the accessibility of <model>
.
interactive
attribute: interactive content.
src
attribute: transparent, a picture
or
img
, or a media element descendant.
src
attribute: Zero or more source elements,
then transparent, optionally intermixed with script-supporting
elements.
autoplay
— Hint that the resource can be started
automatically when the page is loaded
interactive
— Allows the user to interact with the model
crossorigin
— How the element handles crossorigin requests
height
— Vertical dimension
loading
— Used when determining loading deferral
loop
— Whether to loop the media resource
muted
— Whether to mute the media resource by default
poster
— Poster frame to show while the resource is loading
src
— Address of the resource
width
— Horizontal dimension
HTMLModelElement
interface provides a means to interface with
the embedded resource.
The model element is used for embedding 3D models into a document.
Content may be provided inside the model element. User agents
should not show this content to the user; it is intended for web
browsers which do not support model, to be shown as fallback content.
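One way a page might choose between the model element and its own fallback is to check for the interface on the global object. This is a minimal sketch, assuming a supporting user agent exposes HTMLModelElement on Window as the IDL in this document states; the helper name supportsModel and the two stand-in globals are illustrative only.

```javascript
// Returns true when the given global object exposes the HTMLModelElement
// interface. `globalObject` is a parameter so the check can be exercised
// outside a browser; in a page it would be `window`.
function supportsModel(globalObject) {
  return typeof globalObject.HTMLModelElement === "function";
}

// Illustrative stand-ins for a supporting and a non-supporting user agent.
const supportingGlobal = { HTMLModelElement: function HTMLModelElement() {} };
const legacyGlobal = {};

console.log(supportsModel(supportingGlobal)); // true
console.log(supportsModel(legacyGlobal)); // false
```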
HTML defines an algorithm to determine the [=poster frame=], but it's <video>
specific. Should we adapt it to support <model>
or specify our own?
We also need to consider what happens if the animation resets and the model is paused: does the poster show again? (i.e., do we follow video's behavior?)
Some <model>
's resources can be significant in size. As such, it might be good to support the loading
attribute to allow these resources to be lazy-loaded.
The poster attribute gives the URL of an image file that the
user agent can show while 3D content is unavailable. The attribute, if
present, must contain a valid non-empty URL
potentially surrounded by spaces.
WebIDL
[Exposed=Window]
interface HTMLModelElement : HTMLElement {
};
Is it necessary that every method and attribute returns a promise?
As the model can be interactive, it's foreseeable that users might place/rotate a model in such a way that they are unsure which direction the model is facing. It might be nice to provide some way of resetting the camera to its initial position.
This is a "nice to have" (i.e., probably not critical) because a developer could take a snapshot of the camera's position when the model loads. However, that's kinda annoying because then they need to keep their own accounting of the starting position of each model.
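The workaround described above, where the page records each model's starting camera itself, could look roughly like this. The IDL in this document defines no camera API, so getCamera() and setCamera() are hypothetical method names, and fakeModel is a stand-in object used only to show the bookkeeping.

```javascript
// Remembers the initial camera of each model without keeping the
// element alive longer than the page does.
const initialCameras = new WeakMap();

// Call once when the model has loaded. `getCamera` is a hypothetical method.
function rememberInitialCamera(model) {
  initialCameras.set(model, { ...model.getCamera() });
}

// Restores the remembered camera; returns false if none was recorded.
function resetCamera(model) {
  const camera = initialCameras.get(model);
  if (!camera) return false;
  model.setCamera({ ...camera }); // `setCamera` is hypothetical too
  return true;
}

// Stand-in for a model element, for illustration only.
const fakeModel = {
  camera: { pitch: 0, yaw: 0, scale: 1 },
  getCamera() { return this.camera; },
  setCamera(camera) { this.camera = camera; },
};

rememberInitialCamera(fakeModel);
fakeModel.setCamera({ pitch: 10, yaw: 135, scale: 2 }); // user rotates the model
resetCamera(fakeModel);
console.log(fakeModel.camera.yaw); // 0
```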
source
element's parent is a model
element
The <source>
element behaves differently depending on who the parent is. For instance, when the parent is <picture>
, the srcset
attribute comes into play. We need to look at the attributes of <source>
and figure out what they mean when used in the context of <model>
.
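If model were to follow the pattern media elements use, a user agent would walk the source children in order and take the first one whose type it can handle. The sketch below only illustrates that idea under that assumption; pickSource, the plain-object sources, the file names, and the MIME types listed are illustrative, and nothing here is defined by this specification.

```javascript
// Returns the first source whose `type` is in `supportedTypes`, or null.
// A source without a `type` is treated as a candidate worth trying.
function pickSource(sources, supportedTypes) {
  for (const source of sources) {
    if (!source.type || supportedTypes.has(source.type)) return source;
  }
  return null;
}

// Plain objects standing in for <source> children, in document order.
const sources = [
  { src: "chair.usdz", type: "model/vnd.usdz+zip" },
  { src: "chair.glb", type: "model/gltf-binary" },
];

const picked = pickSource(sources, new Set(["model/gltf-binary"]));
console.log(picked.src); // "chair.glb"
```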
glTF is not a run-time format. It does not define what an application should do with a model once it is loaded and rendered. It does provide some capabilities that a run-time engine may use to enhance the user experience. glTF currently does not store any interactivity information. Currently that is solely a run-time determination. The run-time determines what parts (if any) of the model may be active and the behavior based on any trigger.
Like interactivity, animation is not built into glTF. glTF files may contain animation parameters that specify the type of animation (e.g., morph, skin & bones, etc.) and the associated parameters needed to perform the animation. There is nothing in the glTF specification that defines how one animation interacts with another. For example, a human model may include walk, jump, and drop animations; but it is unlikely that they should all be played at the same time.
Any HTML element that wishes to handle animation as stored in a glTF file needs to understand how the content creator intended the animation to play.
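To make the ambiguity concrete: a glTF 2.0 asset lists its animations in a top-level animations array, each with an optional name, but carries no indication of which one should play. The sketch below reads those names from already-parsed glTF JSON; the sample asset and the helper name listAnimationNames are made up for illustration.

```javascript
// Lists animation names from parsed glTF JSON, falling back to an
// index-based label because `name` is optional in glTF 2.0.
function listAnimationNames(gltf) {
  return (gltf.animations ?? []).map(
    (animation, index) => animation.name ?? `animation ${index}`
  );
}

// Trimmed, made-up asset: only the fields this sketch reads.
const humanModel = {
  asset: { version: "2.0" },
  animations: [{ name: "walk" }, { name: "jump" }, { name: "drop" }],
};

console.log(listAnimationNames(humanModel)); // [ 'walk', 'jump', 'drop' ]
```

The file offers a host element three candidates and no rule for choosing among them, so that choice has to come from the author or the user agent.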
When exiting an AR experience it's sometimes useful to pass data from the model out back to the page. On iOS [1], for instance, a "message" event is sent:
This then allows a web page to take over and perform some action. In the case above, it triggers Apple Pay through (presumably) the Payment Request API.
Obviously, the "message" with the custom .data
"_apple_ar_quicklook_button_tapped..."
is not something we would want to standardize, but it might be good to consider some kind of user activation action resulting from the format itself causing the scene to exit with some action. The .data
could be an IDL object
(or something better) that could be used to handle the action (e.g., buy a thing).
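A page-side handler for such an exit message might be shaped as follows. This is a sketch, not the iOS API surface: the event is modeled as any object with a data property, the prefix check mirrors the elided string quoted above, and startCheckout is a hypothetical callback.

```javascript
// Visible part of the custom payload quoted above; the full string is
// elided there, so this sketch only matches on the prefix.
const QUICK_LOOK_TAP_PREFIX = "_apple_ar_quicklook_button_tapped";

// Runs `startCheckout` (hypothetical) when the event carries the tap payload.
// Returns true if the event was handled.
function handleExitMessage(event, startCheckout) {
  if (typeof event.data !== "string") return false;
  if (!event.data.startsWith(QUICK_LOOK_TAP_PREFIX)) return false;
  startCheckout();
  return true;
}

// In a page this would be wired up roughly as:
//   link.addEventListener("message", (e) => handleExitMessage(e, startCheckout));
let checkoutStarted = false;
handleExitMessage({ data: QUICK_LOOK_TAP_PREFIX }, () => { checkoutStarted = true; });
console.log(checkoutStarted); // true
```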
The proposal introduces an interactive
attribute, which would allow a user to interact with the model. We need to specify what that means to some degree, or at least some expectations.
As with other media elements (again #13), having "controls" for media specific things can be extremely helpful for accessibility (and just generally helpful for developers not needing to deal with things like the fullscreen API).
It would be nice to consider adding support for controls
and then leaving it mostly to the UA as to what those controls are... we could figure out a standard set of things, like <video>
provides.
I agree that it is very good to make it easy for people to display 3D content in a web page. I completely disagree with the methods and processes described in this proposal to make it an HTML element. HTML elements need to be fully defined so that they can be similarly implemented across browsers and reflect what people would see in applications outside of browsers. The process of rendering a high-quality model requires proper handling and rendering of the model's geometry, appearance, animation, and interaction.
My knowledge is in glTF (and glTF binary) so these comments may or may not reflect on the capabilities of USDZ. I will address the topics as separate issues: Appearance and Animation / interactivity; with respect to 3D models in glTF format. Static geometry is pretty straight-forward and not subject to much interpretation.
The really difficult part is appearance. The document states that "it is impractical to define a pixel accurate rendering..." for models. However, this is really important. Khronos has done extensive work in the 3D Commerce Working Group towards pixel accurate rendering across multiple 3D viewers (https://www.khronos.org/3dcommerce/certification/). The accuracy was demanded by retailers so their products would appear visually identical across different web sites. There were so many factors that mattered in producing acceptable renderings that include lighting, rendering calculations (including equation approximations), conversion from GPU to display, and tone mapping.
The component that caused the most issues and difficulties is lighting. A model built for physically-based rendering looks best in a complex lighting environment. This is usually done with image based lighting, but punctual plus area lights will also work. The statement that "A future version ... will describe the lighting model and environment .... Both items will require community collaboration and some consensus." makes the process sound much easier than Khronos found it to be.
Some issues that came from the Certification work. Note that the Certification program did not solve all of these in the initial release.
It may be possible to construct an initial release without resolving all of these items.
The Oculus browser is displayed on a curved surface.
How do we envision the display of multiple models? Would we allow them to bump into each other or would there be clipping?
There are cases where it may not be desirable to have a shadow being cast, because the shadow can be larger than the viewport in which an object is being shown. Also, as the "camera" rotates, the shadow can end up looking weird and exposing the boundary of the <model>
element.
For example:
It might be good to disable the shadow entirely? Or some other means for developers to express where the light source should be to cast the shadow where they might want.
In order for models to be rendered appropriately within the context of web document, it might be useful to give developers some means of controlling the IBL.
For example, it might not make sense to use a sunny environment light for models sitting on a predominantly dark document (and vice versa).
There may be some overlap here with light/dark mode in CSS... but not sure.
The proposal includes the ability to change the orientation/camera's view of the model. However, it's unclear how that will interact with requestAnimationFrame()
. That is, do we leave it to developers to control how those changes are interpolated? or do we just leave it to the user agent to perform the change in position.
There are naturally pros and cons to each approach, such as controlling the speed of the translation/transition and the smoothness curve of the animation. Or, what if the developer wants to just flip the object without any animation?
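If interpolation were left to developers, the page-side code could be as small as an easing function plus a per-frame update. This is a sketch under that assumption: yaw stands in for whatever camera property the API ends up exposing, and yawAt and easeInOut are illustrative names.

```javascript
// Smoothstep easing: maps t in [0, 1] to an eased value in [0, 1].
function easeInOut(t) {
  return t * t * (3 - 2 * t);
}

// Yaw (degrees) at `elapsedMs` into a transition lasting `durationMs`.
// A duration of 0 jumps straight to the target, i.e. no animation.
function yawAt(fromYaw, toYaw, elapsedMs, durationMs, ease = easeInOut) {
  if (durationMs <= 0) return toYaw;
  const t = Math.min(Math.max(elapsedMs / durationMs, 0), 1);
  return fromYaw + (toYaw - fromYaw) * ease(t);
}

console.log(yawAt(0, 180, 250, 500)); // 90 (halfway through)
console.log(yawAt(0, 180, 0, 0)); // 180 (flip without animation)
```

In a page, yawAt would be called from a requestAnimationFrame() callback with the time elapsed since the transition began; passing a different ease function changes the smoothness curve.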
What's the default CSS style for a model
element? Should it have a border around it? what about background color? etc.
Whether a model element is exposing a user
interface is not expected to affect the size of the rendering;
controls are expected to be overlaid above the page content without
causing any layout changes, and may disappear when the user does not
need them.
When a model element represents a poster frame, the poster frame is
expected to be rendered at the largest size that maintains the aspect
ratio of that poster frame without being taller or wider than the
model element itself, and is expected to be centered in the
model element.
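The sizing rule above is the familiar "contain" fit. Worked through in code, with fitPoster as an illustrative helper name and all dimensions assumed to be in CSS pixels:

```javascript
// Largest size that keeps the poster's aspect ratio without exceeding the
// element's box, plus the offsets that center it inside that box.
function fitPoster(posterWidth, posterHeight, boxWidth, boxHeight) {
  const scale = Math.min(boxWidth / posterWidth, boxHeight / posterHeight);
  const width = posterWidth * scale;
  const height = posterHeight * scale;
  return {
    width,
    height,
    x: (boxWidth - width) / 2,
    y: (boxHeight - height) / 2,
  };
}

// A 1600x900 poster in a 400x400 model element.
console.log(fitPoster(1600, 900, 400, 400));
// { width: 400, height: 225, x: 0, y: 87.5 }
```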
Like other media and resources, model could emit events. An obvious one being a network error if a resource can't load. We should figure out what these events might be, and if model
is a media element (#13), then we should see what applies in general for media elements.
Need to investigate what formats are suitable for model
. We might need some kind of evaluation matrix. Model can support multiple formats out of the box, but it might be good to evaluate what is best for users and developers and why.
The <model>
element shares a lot of similarities with the <audio>
and <video>
elements, yet it's distinct in some ways (we need to tease these out). It's similar in being potentially temporal multimedia content (i.e., it has audio, it potentially animates over time). We need to figure out if model is sufficiently different to warrant being its own element class, or if it can reuse much of "media element"'s infrastructure.
Additional integrations into HTML:
Need same behavior as audio
and video
when including into a p
element with no end tag.
Need to specify that model is an appropriate child of <figure>
.
I was looking at Viewing Augmented Reality Assets in Safari for iOS and it describes an "ar" link relationship that applies to anchors. I don't think that's been standardized though.
On iOS, adding the "ar" link relation adds an AR overlay button inside the context of the anchor:
Can be seen here (on an iOS device):
https://jsfiddle.net/61g0m2ky/1/
I wonder if we should standardize that too?
The formats that model supports can fetch a lot of other resources. We probably need a new fetch destination ("model").
We need to investigate what the privacy implications are of each model format we will recommend. The model formats themselves can fetch resources, so we need to put a privacy and security framework around what schemes they can fetch (https only, for instance). We also need to say what all the fetch policies are. Need to investigate if the formats provide any guidance here, or if they leave it up to the implementation. If they do, we need to specify it (i.e., don't send cookies, don't leak the referrer, etc.).
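As an illustration of the kind of policy being discussed, the sketch below accepts only https URLs for resources a model file references and builds Fetch options that omit credentials and the referrer. Whether these are the right defaults is exactly the open question; subresourceRequest is a hypothetical helper and the URLs are examples, while credentials and referrerPolicy are standard fetch() options.

```javascript
// Returns { url, init } for a subresource fetch, or null if the URL is
// rejected by the https-only rule this sketch assumes.
function subresourceRequest(rawUrl, baseUrl) {
  let url;
  try {
    url = new URL(rawUrl, baseUrl);
  } catch {
    return null; // not a parseable URL
  }
  if (url.protocol !== "https:") return null;
  return {
    url: url.href,
    // Don't send cookies, don't leak the referrer.
    init: { credentials: "omit", referrerPolicy: "no-referrer" },
  };
}

const allowed = subresourceRequest(
  "textures/wood.png",
  "https://example.com/models/chair.gltf"
);
console.log(allowed.url); // "https://example.com/models/textures/wood.png"
console.log(subresourceRequest("http://example.com/a.png", "https://example.com/")); // null
```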
Need to clarify that 3D resources can fetch resources, and as such need to be subject to the document's Content Security Policy (probably "media-src"). However, we need to clarify what this means in relation to, say, "img-src", for example... as models can load png/jpg textures.
Given the close relationship to media elements, and given the reliance on <source>
elements, we could just say that media-src
applies to <model>
too.
Need to describe that each format will come with its own security considerations (and link to the appropriate security considerations in their respective specs).
We need to figure out how to make <model>
accessible on a number of different fronts:
Usually, this would be provided by the embedded format... however, it appears that both glTF and USDZ are quite limited when it comes to accessibility.
As such, it may be that we need to leverage what we can from HTML + ARIA to overcome the shortcomings of these formats. We have quite a bit of precedent (e.g., from the humble, yet limited, alt
attribute, to how <canvas>
can be made accessible, to the potential inclusion of <track>
elements, and so on).
We need to define what the ARIA semantics are and what is exposed (application probably). We need to coordinate with the accessibility folks + get this added to the HTML Accessibility API Mappings.
| [wai-aria-1.2] | No corresponding role |
|---|---|
| MSAA + IAccessible2 | Not mapped |
| UIA | Not mapped |
| ATK | Not mapped |
| AX | Not mapped |
| Comments | |
We need to check if there are any relevant MIME parameters for model/*
content (if any).
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key word MAY in this document is to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, it appears in all capitals, as shown here.
This section is non-normative.
The following are some significant changes that were made since the initial proposal:
autoplay §2.1
controls §2.3
crossorigin §2.4
height §2.5
HTMLModelElement interface §3.
interactive §2.2
loading §2.6
loop §2.7
model §2.
muted §2.8
poster §2.9
src §2.10
width §2.11