Thursday, 28 March 2013

WebVTT reftests

A reftest is a pass-or-fail test that checks whether two web pages look identical. It does this by producing two bitmap images and verifying that they are identical. The same appearance can be produced using different methods, such as an image and a specific frame of video. Reftests are useful for ensuring that something is rendered correctly by comparing it to what it ought to look like. Specifically, this is done by creating two web pages, a test page and a reference page, and comparing them.
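
As a rough illustration of the comparison step, the sketch below compares two RGBA pixel buffers byte by byte. The function name bitmapsMatch and the plain-object bitmap shape are assumptions made for this example; the actual reftest harness does its own snapshotting and comparison.

```javascript
// Hypothetical helper: a bitmap here is { width, height, data },
// where data holds RGBA bytes (4 per pixel), similar to canvas ImageData.
function bitmapsMatch(a, b) {
  if (a.width !== b.width || a.height !== b.height) {
    return false;
  }
  if (a.data.length !== b.data.length) {
    return false;
  }
  for (let i = 0; i < a.data.length; i++) {
    if (a.data[i] !== b.data[i]) {
      return false; // a single differing byte fails the comparison
    }
  }
  return true;
}

// Two 1x2 bitmaps: a grey pixel above a white pixel.
const testBitmap = {
  width: 1, height: 2,
  data: Uint8ClampedArray.from([128, 128, 128, 255, 255, 255, 255, 255]),
};
const refBitmap = {
  width: 1, height: 2,
  data: Uint8ClampedArray.from([128, 128, 128, 255, 255, 255, 255, 255]),
};
// Same size, but the second pixel is black instead of white.
const wrongBitmap = {
  width: 1, height: 2,
  data: Uint8ClampedArray.from([128, 128, 128, 255, 0, 0, 0, 255]),
};

console.log(bitmapsMatch(testBitmap, refBitmap));   // true
console.log(bitmapsMatch(testBitmap, wrongBitmap)); // false
```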

Further information


This is tremendously important for WebVTT because it is the only way to ensure conformance to the specification, and therefore consistency between implementations. This means that a WebVTT file will look the same anywhere it is used, which lets content developers use advanced features. WebVTT is a web standard after all.

The goal of web standards is to make implementations interoperable. (W3C) This means "the situation in which all implementations process particular content in a reliable and identical or equivalent way." (W3C) Historically, when format implementations were not consistent with each other, developers used only a limited subset of the format's features, maintained multiple versions for each implementation, or used a third-party solution. Recall the browser wars and the push for browsers to comply with web standards. (Web Standards Project)

Reftest Format

A reftest consists of three files:
  • Test page with feature to be tested
  • Reference page that shows how the feature should look
  • A reftest.list file that lists the assertions


The following test should pass and serves to demonstrate the basic format.



== spaces1.html spaces2.html


There are two main types of assertions: == passes when the two pages render identically, and != passes when they render differently.
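
A simplified sketch of how such a line could be read is below. The functions parseReftestLine and assertionPasses are hypothetical and handle only the two-operator form shown above; real reftest.list files support more syntax than this.

```javascript
// Hypothetical parser for one manifest line of the form
// "<operator> <test page> <reference page>".
function parseReftestLine(line) {
  const parts = line.trim().split(/\s+/);
  if (parts.length !== 3 || (parts[0] !== "==" && parts[0] !== "!=")) {
    throw new Error("Unsupported reftest line: " + line);
  }
  return {
    expectMatch: parts[0] === "==", // true for ==, false for !=
    testPage: parts[1],
    referencePage: parts[2],
  };
}

// An assertion passes when the comparison result agrees with the operator.
function assertionPasses(entry, bitmapsAreIdentical) {
  return entry.expectMatch === bitmapsAreIdentical;
}

const entry = parseReftestLine("== spaces1.html spaces2.html");
console.log(entry.testPage, entry.referencePage); // spaces1.html spaces2.html
console.log(assertionPasses(entry, true));        // true
```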

The two pages will be processed as soon as they finish loading. Sometimes the test needs to be delayed for asynchronous content. This can be accomplished by adding the class "reftest-wait" to the HTML element and removing it at the appropriate time. This is required for WebVTT tests because the video and text tracks are loaded asynchronously.

Basic WebVTT Test

Test page:

<html class="reftest-wait">
 <meta charset="UTF-8">
 <style>
  #testVideo {
   position: absolute;
   left: 0px;
   top: 0px;
   width: 640px;
   height: 480px;
   margin: 0px;
   padding: 0px;
  }
 </style>
 <video id="testVideo">
  <source src="grey320x240.ogv" type="video/ogg">
  <track src="basic.vtt">
 </video>
 <script src="testScript.js"></script>
</html>

Reference page:

<html class="reftest-wait">
 <meta charset="UTF-8">
 <style>
  #testVideo {
   position: absolute;
   left: 0px;
   top: 0px;
   width: 640px;
   height: 480px;
   margin: 0px;
   padding: 0px;
  }
  #testDiv {
   position: absolute;
   left: 0px;
   top: 250px;
   width: 640px;
   margin: 0px;
   padding: 0px;
  }
 </style>
 <video id="testVideo">
  <source src="grey320x240.ogv" type="video/ogg">
 </video>
 <div id="testDiv">WebVTT Test</div>

 <script src="testScript.js"></script>
</html>

testScript.js:

/*
   Make sure video is loaded,
   and that it is always at the same frame.
*/
var testVideo = document.getElementById("testVideo");

// Need to play to load video
testVideo.onplay = function() {
 // Stop video and seek to 5 seconds
 testVideo.onpause = function() {
  // When video is loaded, perform test
  testVideo.oncanplaythrough = function() {
   document.documentElement.className = "";
  };
  testVideo.currentTime = 5;
 };
 testVideo.pause();
};
testVideo.play();


The style is there to make sure margins, padding, and positions are not a factor. We are not testing those, and by knowing the exact properties of the video we can create precisely what the caption should look like. It is important to test only one thing in each test.

In order to make sure the test is correct, the bitmap must be created on exactly the same frame in the video. The following steps are performed to ensure that.

  • The video must be loaded, and it is not loaded until it has been instructed to play.
  • The test must be performed on the same frame, so the video must be paused on that frame.
  • To put the video on the correct frame for the test, currentTime must be set to a predetermined time.
    • currentTime is a decimal number in seconds
    • 1/24 of a second represents one frame in a 24 frames per second (fps) video
  • It may take time for the video to load the frame, so the test must wait until the video is loaded.
    • A loading overlay may be shown while the video is loading. It must be gone before the test is performed.
  • Remove the "reftest-wait" class to perform the test.
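
The relationship between frames and currentTime can be sketched as below. The helper names frameToSeconds and secondsToFrame are made up for this example, and the arithmetic assumes a constant frame rate with the first frame at time 0.

```javascript
// Hypothetical helper: time in seconds at which a zero-based frame starts.
function frameToSeconds(frameIndex, fps) {
  if (!(fps > 0) || frameIndex < 0) {
    throw new RangeError("fps must be positive and frameIndex non-negative");
  }
  return frameIndex / fps;
}

// Hypothetical helper: zero-based frame that contains the given time.
function secondsToFrame(seconds, fps) {
  if (!(fps > 0) || seconds < 0) {
    throw new RangeError("fps must be positive and seconds non-negative");
  }
  return Math.floor(seconds * fps);
}

// Seeking to 5 seconds in a 24 fps video lands on frame 120.
console.log(secondsToFrame(5, 24));   // 120
console.log(frameToSeconds(120, 24)); // 5
```

Seeking to a fixed time such as 5 seconds means both the test page and the reference page snapshot the same frame.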


I created the video for the WebVTT tests. To do this I created a completely grey image. I chose grey so that white and black will be easily seen and not blend into the background. I used VirtualDub to create a raw video AVI using an AviSynth file with the ImageSource function and a silent audio file I created. I then converted the AVI to the Theora video format (OGV). Theora was chosen because it is a free and open format that works with HTML5 video in most browsers, including Mozilla Firefox and Google Chrome.

File size is 30 kilobytes.

Created by Kyle Barnhart (me)
Released to public domain under Unlicense

Running Tests

Tests are easily run in a Mozilla build. Just use "./mach reftest" to run all tests. You can also specify a particular set of tests. For WebVTT I used the following.

./mach reftest layout/reftests/webvtt/reftest.list

This also shows the standard location for reftests in mozilla-central.

After you have built the code, running the tests takes a few seconds or more, depending on the number of tests and the time it takes to clear the "reftest-wait" class in each test that must be delayed.

Saturday, 9 March 2013

WebVTT Parser Rules

In some recent discussion the nature of the specification has come up again. Since this has never been dealt with (I was told to ignore the issue and leave it), an update seems in order.

There is a new bug that has been posted where the following is stated.
"Also add a notice at syntax specification that implementers need to read the parsing section."
- Bug 21230 - [WebVTT] specify extension points in syntax spec

This comes from a discussion on the mailing list. I'll add the following relevant sections.

[Discussion of conflict between syntax and parser rules in WebVTT specification]

No, there is no conflict. The first one is the current spec, the second is the requirement on how to parse it so that the current spec can be extended in the future.

- Silvia Pfeiffer

Ralph Giles wrote:
"If you do want to do something application-specific here, at least try to follow the syntax rules implied by Silvia's draft. Then when you have some implementation experience, we should try to spec what those rules actually are."

Please don't do that, either. Don't put anything in there at all until we're sure of what the format should be.

- Glenn Maynard

Just to clarify (at least to my understanding): The first is the file format, and the second is the parser.  Both are part of the spec.  The file format tells you what's valid--what authors should be writing.  The parser tells you how to deal with every possible input, which includes error handling (inputs that don't follow the file format) and--as you said--forwards-compatibility.

I've seen a couple people confused by this, leading to people trying to write implementations by looking at the file format, which is bad.  Browser vendors understand it, since that's how the HTML spec works, but since non-browser people without experience with that spec may be implementing this (more than most other parts of the web), maybe there should be a brief note at the top of the file format section explaining this.  ("If you're implementing WebVTT, you're in the wrong place!")

- Glenn Maynard

Agreed, we need to clarify this. I want to make sure to specify extension points in the syntax specification to make sure that implementers are made aware of this.

- Silvia Pfeiffer

Tuesday, 5 March 2013

WebVTT Mozilla Reftest

I'm working on what needs to be done for testing the WebVTT rendering rules. The best way to go looks like using reftest. Mozilla MDN has a page on it. Full documentation for Mozilla is here.

It is very straightforward. Make two HTML pages that are identical except for the thing you want to test. For example, webvtt-basic.html will have a <track> in a <video>. webvtt-basic-ref.html will have <div> in <video>. The results must look exactly the same to pass the test.

Reftest works by taking a screenshot of the window immediately after the document is loaded and comparing it with the reference. However, since this is a video that might take a second to load, the test adds the class reftest-wait and removes it when the time is right with document.documentElement.className = "".

Here is a basic test.
<!DOCTYPE html>
<html class="reftest-wait">
<video id="testVideo" autoplay>
    <source src="test.ogv" type="video/ogg">
    <track src="test.vtt">
</video>
<script>
    document.getElementById('testVideo').onplay = function(e) {
        document.documentElement.className = "";
    };
</script>
</html>

I found a nice public domain video to use.

WebVTT Rendering Issues

I've been working on the rendering stage of WebVTT. I've been trying to work out how WebVTT Cues are converted into DOM elements. I've found a problem with our interpreter.

It should work like this. This is based on the API in the HTML5 specification.
  1. A media element (i.e. video element) populates a list of text tracks.

  2. For each valid track element create a new track.

  3. For new tracks: "The text track list of cues is initially empty. It is dynamically modified when the referenced file is parsed."

  4. When a track is created, changed, or its parent is changed to a new media element, then start the track processing model for that track element.

  5. If the track element is valid, then the fetching algorithm loads the resource indicated by the URL. If the resource's type is a valid text track format, then the data is passed to the parser, whose output is a list of cues.

    • [ISSUE 1] How the cues are added to the list of cues is not perfectly clear, but it appears the parser is passed a list of cues as the parameter "output" and adds cues incrementally. I say this because the output is not a stream, just a list, so it cannot return a stream of cues which the browser would add incrementally. After double checking I don't believe our interpreter does this, but I may be wrong.

      • From WebVTT Parsing Rules:

        "A WebVTT parser, given an input byte stream and a text track list of cues output"

      • From HTML5 Track Processing Model:

        "... the resource's data must be passed to the appropriate parser (e.g. the WebVTT parser) as it is received, with the text track list of cues being used for that parser's output."

    • [ISSUE 2] According to rule 47, "Cue text processing", when the parser creates a cue, it sets the text track cue's text to the text loaded without processing the text. The text track cue text should just be a string, initially empty. Currently it is a list of nodes.

      • From WebVTT Parsing Algorithm:

        "47. Cue text processing: Let the text track cue text of cue be cue text, and let the rules for its interpretation be the WebVTT cue text parsing rules, the WebVTT cue text rendering rules, and the WebVTT cue text DOM construction rules."

  6. When a cue is displayed then the render rules are run.

    • [ISSUE 3] The unprocessed text track cue text is processed only when it is to be displayed during the playback loop for cues. What should happen is that the cue is associated with the parser algorithm for rendering and calls that when it needs to render.

      • From HTML5 Playback Loop for Cues:

        "18. Run the rules for updating the text track rendering of each of the text tracks in affected tracks that are showing. For example, for text tracks based on WebVTT, the rules for updating the display of WebVTT text tracks."

      • From TextTrackCue getCueAsHTML Method:

        "The getCueAsHTML() method must convert the text track cue text to a DocumentFragment for the script's document of the entry script, using the appropriate rules for doing so. For example, for WebVTT, those rules are the WebVTT cue text parsing rules and the WebVTT cue text DOM construction rules."

This is useful for dealing with order. I haven't checked if the parser sorts. It shouldn't.
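
The flow described in the numbered steps above can be sketched as follows. This is not our parser's code: SketchCue, parseInto, and the trivial toNodes function are hypothetical stand-ins, and the input is an array of already-split cue records rather than a byte stream. The point is only the shape: the parser appends cues to a caller-supplied output list as it goes, each cue keeps its text as an unprocessed string, and the text is interpreted only when it is asked for.

```javascript
// Hypothetical cue: holds raw text plus the rules used to interpret it.
class SketchCue {
  constructor(startTime, endTime, text, interpret) {
    this.startTime = startTime;
    this.endTime = endTime;
    this.text = text;           // raw, unprocessed cue text (a string)
    this.interpret = interpret; // interpretation rules associated with the cue
  }

  // Stand-in for getCueAsHTML(): interpret the raw text on demand.
  getCueAsNodes() {
    return this.interpret(this.text);
  }
}

// Stand-in for the cue text parsing rules; it only counts calls and
// wraps the text in a single node.
let interpretCalls = 0;
function toNodes(text) {
  interpretCalls++;
  return [{ type: "text", value: text }];
}

// Stand-in parser: appends each cue to the output list as it is produced.
function parseInto(records, output) {
  for (const record of records) {
    output.push(new SketchCue(record.start, record.end, record.text, toNodes));
  }
}

const cueList = [];
parseInto(
  [
    { start: 0, end: 2, text: "Hello <b>world</b>" },
    { start: 2, end: 4, text: "Second cue" },
  ],
  cueList
);

console.log(cueList.length, interpretCalls); // 2 0
const nodes = cueList[0].getCueAsNodes();
console.log(nodes[0].value, interpretCalls); // Hello <b>world</b> 1
```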

The default render rules for dynamically created TextTrackCues are the WebVTT render rules.

4. Let cue's text track cue text be the value of the text argument, and let the rules for its interpretation be the WebVTT cue text parsing rules, the WebVTT cue text rendering rules, and the WebVTT cue text DOM construction rules.

This is how cues are ordered. I wouldn't worry about it. The ordering is done during the playback loop for cues via steps 1 and 13.

The text track cues of a media element's text tracks are ordered relative to each other in the text track cue order, which is determined as follows: first group the cues by their text track, with the groups being sorted in the same order as their text tracks appear in the media element's list of text tracks; then, within each group, cues must be sorted by their start time, earliest first; then, any cues with the same start time must be sorted by their end time, latest first; and finally, any cues with identical end times must be sorted in the order they were last added to their respective text track list of cues, oldest first (so e.g. for cues from a WebVTT file, that would initially be the order in which the cues were listed in the file).
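
The ordering described in that paragraph can be written as a comparator. This is a sketch over plain objects; the property names trackIndex and addedOrder are assumptions for the example and are not part of the TextTrackCue interface.

```javascript
// Each cue here is a plain object:
// { id, trackIndex, startTime, endTime, addedOrder }.
function compareCues(a, b) {
  return (
    a.trackIndex - b.trackIndex || // group by track, in track-list order
    a.startTime - b.startTime ||   // earliest start time first
    b.endTime - a.endTime ||       // same start: latest end time first
    a.addedOrder - b.addedOrder    // same end: oldest addition first
  );
}

const unsorted = [
  { id: "A", trackIndex: 0, startTime: 5, endTime: 10, addedOrder: 0 },
  { id: "B", trackIndex: 0, startTime: 0, endTime: 4, addedOrder: 1 },
  { id: "C", trackIndex: 0, startTime: 0, endTime: 8, addedOrder: 2 },
  { id: "D", trackIndex: 1, startTime: 0, endTime: 1, addedOrder: 0 },
  { id: "E", trackIndex: 0, startTime: 0, endTime: 8, addedOrder: 3 },
];

// Copy before sorting so the original list is left untouched.
const ordered = [...unsorted].sort(compareCues);
console.log(ordered.map((cue) => cue.id).join(",")); // C,E,B,A,D
```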

HTML Track Status

I've been looking at how the track gets implemented and how WebVTT cues are loaded. This is where things are concerning the HTML element track.



interface HTMLMediaElement
readonly attribute TextTrackList textTracks
TextTrack addTextTrack(DOMString kind, optional DOMString label, optional DOMString language)

Timed Text Track Model
This describes how HTMLMediaElement deals with text tracks. It's quite extensive.

interface HTMLTrackElement
readonly attribute TextTrack track

interface TextTrackList
getter TextTrack (unsigned long index)

interface TextTrack
readonly attribute TextTrackCueList? cues
readonly attribute TextTrackCueList? activeCues
void addCue(TextTrackCue cue)
void removeCue(TextTrackCue cue)

interface TextTrackCueList
getter TextTrackCue (unsigned long index)
TextTrackCue? getCueById(DOMString id)

interface TextTrackCue
readonly attribute TextTrack? track
DocumentFragment getCueAsHTML()

The getCueAsHTML() method must convert the text track cue text to a DocumentFragment for the script's document of the entry script, using the appropriate rules for doing so. For example, for WebVTT, those rules are the WebVTT cue text parsing rules and the WebVTT cue text DOM construction rules.


Coming Changes to Specification

There are big changes coming to the track element and WebVTT. Either the TextTrackCue interface is moving to the WebVTT spec, or the parsing, DOM construction, and rendering rules will move to the HTML spec.

See this and replies:


Also, I'm working on rendering. Just to note, there are 5 sections to the WebVTT specification; these three are in addition to syntax and parsing.


WebVTT DOM Construction Rules:

WebVTT Rendering Rules:

WebVTT CSS Rules: