It should work like this. This is based on the API in the HTML5 specification.
- A media element (i.e. video element) populates a list of text tracks.
- For each valid track element create a new track.
- For new tracks: "The text track list of cues is initially empty. It is dynamically modified when the referenced file is parsed."
- When a track is created, changed, or its parent is changed to a new media element, then start the track processing model for that track element.
- If the track element is valid, then the fetching algorithm loads the resource indicated by the URL. If the resource's type is a valid text track format, then the data is passed to the parser, whose output is a list of cues.
- [ISSUE 1] How the cues are added to the list of cues is not perfectly clear, but it appears the parser is passed the list of cues as its "output" parameter and appends cues to it incrementally. I say this because the output is a list, not a stream, so the parser cannot return a stream of cues for the browser to add incrementally. After double-checking, I don't believe our interpreter does this, but I may be wrong.
- From WebVTT Parsing Rules:
"A WebVTT parser, given an input byte stream and a text track list of cues output"
- From HTML5 Track Processing Model:
"... the resource's data must be passed to the appropriate parser (e.g. the WebVTT parser) as it is received, with the text track list of cues being used for that parser's output."
- [ISSUE 2] According to rule 47, "Cue text processing", when the parser creates a cue, it sets the text track cue text to the raw cue text, without processing it. The text track cue text should just be a string, initially empty. Currently it is a list of nodes.
- From WebVTT Parsing Algorithm:
"47. Cue text processing: Let the text track cue text of cue be cue text, and let the rules for its interpretation be the WebVTT cue text parsing rules, the WebVTT cue text rendering rules, and the WebVTT cue text DOM construction rules."
- [ISSUE 3] The unprocessed text track cue text is processed only when it is to be displayed during the playback loop for cues. What should happen is that the cue is associated with the parser's rendering rules and invokes them when it needs to render.
- From HTML5 Playback Loop for Cues:
"18. Run the rules for updating the text track rendering of each of the text tracks in affected tracks that are showing. For example, for text tracks based on WebVTT, the rules for updating the display of WebVTT text tracks."
- From TextTrackCue getCueAsHTML Method:
"The getCueAsHTML() method must convert the text track cue text to a DocumentFragment for the script's document of the entry script, using the appropriate rules for doing so. For example, for WebVTT, those rules are the WebVTT cue text parsing rules and the WebVTT cue text DOM construction rules."
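To make ISSUE 1 and ISSUE 2 concrete, here is a rough TypeScript sketch of a parser that is handed the output list up front, appends each cue as soon as it is complete, and leaves the cue text as an unprocessed string. The names (Cue, IncrementalCueParser, parseTimestamp, flush) are made up for illustration, and the line handling is a drastic simplification, not the WebVTT parsing algorithm from the spec.

```typescript
// Simplified cue: the text stays a raw, unprocessed string (ISSUE 2).
interface Cue {
  startTime: number; // seconds
  endTime: number; // seconds
  text: string;
}

// Parse "HH:MM:SS.mmm" or "MM:SS.mmm" into seconds (no validation).
function parseTimestamp(ts: string): number {
  const parts = ts.trim().split(":").map(Number);
  return parts.reduce((total, part) => total * 60 + part, 0);
}

// Hypothetical incremental parser: it receives the output list once and
// pushes each cue onto it as soon as that cue is complete (ISSUE 1).
class IncrementalCueParser {
  private buffer = "";
  private pending: Cue | null = null;
  private readonly output: Cue[];

  constructor(output: Cue[]) {
    this.output = output;
  }

  // Feed data as it is received; only complete lines are consumed.
  parse(chunk: string): void {
    this.buffer += chunk;
    let newline = this.buffer.indexOf("\n");
    while (newline !== -1) {
      const line = this.buffer.slice(0, newline).replace(/\r$/, "");
      this.buffer = this.buffer.slice(newline + 1);
      this.handleLine(line);
      newline = this.buffer.indexOf("\n");
    }
  }

  // Signal end of input so that a trailing cue is not lost.
  flush(): void {
    if (this.buffer.length > 0) {
      this.handleLine(this.buffer);
      this.buffer = "";
    }
    this.emitPending();
  }

  private handleLine(line: string): void {
    if (line.includes("-->")) {
      this.emitPending();
      const [start, rest] = line.split("-->");
      // Cue settings after the end timestamp are ignored in this sketch.
      const end = rest.trim().split(/\s+/)[0];
      this.pending = {
        startTime: parseTimestamp(start),
        endTime: parseTimestamp(end),
        text: "",
      };
    } else if (line === "") {
      this.emitPending(); // a blank line ends the current cue
    } else if (this.pending) {
      this.pending.text += (this.pending.text ? "\n" : "") + line;
    }
    // Header and cue identifier lines are skipped in this sketch.
  }

  private emitPending(): void {
    if (this.pending) {
      this.output.push(this.pending);
      this.pending = null;
    }
  }
}
```

With this shape the caller owns the list and can observe it growing between calls to parse(), which is how I read "as it is received" in the track processing model.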
This is useful for dealing with order. I haven't checked if the parser sorts. It shouldn't.
The default render rules for dynamically created TextTrackCues are the WebVTT render rules.
4. Let cue's text track cue text be the value of the text argument, and let the rules for its interpretation be the WebVTT cue text parsing rules, the WebVTT cue text rendering rules, and the WebVTT cue text DOM construction rules.
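A sketch of what ISSUE 3 and the constructor step above could look like together: the cue keeps its text as a plain string plus a reference to its rules for interpretation, and only runs those rules when asked for a rendering. Everything here is hypothetical (CueNode, CueRules, webvttRules, LazyCue, getCueAsNodes); CueNode stands in for a DocumentFragment so the example runs outside a browser, and the toy rules handle only b, i and u tags rather than the real WebVTT cue text parsing rules.

```typescript
// Minimal stand-ins for DOM nodes so the sketch runs without a browser.
interface TextNode {
  kind: "text";
  value: string;
}
interface ElementNode {
  kind: "element";
  tag: string;
  children: CueNode[];
}
type CueNode = TextNode | ElementNode;

// The "rules for interpretation" a cue is associated with (assumed shape).
interface CueRules {
  parseCueText(text: string): CueNode[];
}

// Toy stand-in for the WebVTT cue text rules: understands only nested
// <b>, <i> and <u> tags and treats everything else as plain text.
const webvttRules: CueRules = {
  parseCueText(text: string): CueNode[] {
    const root: CueNode[] = [];
    const stack: CueNode[][] = [root];
    const tokens = text.split(/(<\/?[biu]>)/).filter((t) => t.length > 0);
    for (const token of tokens) {
      const open = /^<([biu])>$/.exec(token);
      if (open) {
        const element: ElementNode = { kind: "element", tag: open[1], children: [] };
        stack[stack.length - 1].push(element);
        stack.push(element.children);
      } else if (/^<\/[biu]>$/.test(token)) {
        if (stack.length > 1) stack.pop(); // ignore stray closing tags
      } else {
        stack[stack.length - 1].push({ kind: "text", value: token });
      }
    }
    return root;
  },
};

// The cue stores the raw string; the rules default to the WebVTT ones,
// as for dynamically created cues.
class LazyCue {
  startTime: number;
  endTime: number;
  text: string;
  private readonly rules: CueRules;

  constructor(startTime: number, endTime: number, text: string, rules: CueRules = webvttRules) {
    this.startTime = startTime;
    this.endTime = endTime;
    this.text = text;
    this.rules = rules;
  }

  // Rough counterpart of getCueAsHTML(): the text is interpreted on demand.
  getCueAsNodes(): CueNode[] {
    return this.rules.parseCueText(this.text);
  }
}
```

Because nothing is cached, changing cue.text and calling getCueAsNodes() again picks up the new text, which is the behaviour I would expect if the text is only processed at display time.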
Cue ordering is defined as follows. I wouldn't worry about it: the ordering is done during the playback loop for cues, in steps 1 and 13.
The text track cues of a media element's text tracks are ordered relative to each other in the text track cue order, which is determined as follows: first group the cues by their text track, with the groups being sorted in the same order as their text tracks appear in the media element's list of text tracks; then, within each group, cues must be sorted by their start time, earliest first; then, any cues with the same start time must be sorted by their end time, latest first; and finally, any cues with identical end times must be sorted in the order they were last added to their respective text track list of cues, oldest first (so e.g. for cues from a WebVTT file, that would initially be the order in which the cues were listed in the file).
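The text track cue order quoted above boils down to a four-key comparison. Here is a small TypeScript sketch of it; OrderedCue, trackIndex and addedOrder are names I made up, with addedOrder assumed to be a counter that increases each time a cue is added to its track's list of cues.

```typescript
interface OrderedCue {
  id: string;
  trackIndex: number; // position of the cue's track in the list of text tracks
  startTime: number;
  endTime: number;
  addedOrder: number; // assumed counter: lower means added earlier
}

// Comparator following the quoted rule: track order, then start time
// (earliest first), then end time (latest first), then insertion order
// (oldest first).
function compareCues(a: OrderedCue, b: OrderedCue): number {
  return (
    a.trackIndex - b.trackIndex ||
    a.startTime - b.startTime ||
    b.endTime - a.endTime ||
    a.addedOrder - b.addedOrder
  );
}

// Sort a copy so the track's own list keeps its insertion order.
function inCueOrder(cues: OrderedCue[]): OrderedCue[] {
  return [...cues].sort(compareCues);
}
```

Sorting a copy at the point of use keeps this out of the parser, which fits the idea that the ordering belongs to the playback loop.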