Friday, 19 April 2013

WebVTT Reftest Bug

This work is for bug 855633 in Mozilla's bug tracker.

I've written many reftests for WebVTT. Outstanding tests include positioning cues with right-to-left text; there appears to be a bug in the specification for that case. Overlapping cue tests still need to be written. Tests for the ruby, class, and voice tags are not done. In addition, there are no tests yet for new CSS features such as the ::cue pseudo-element.

I've had to make a number of basic test changes. The first is that the height of the first line in a cue should be found using the CSS property height instead of line-height. The line-height property returns "normal" on some browsers, but height always gives a numeric pixel value. Also, the specification asks for the height of the first line, not the line-height of the first line.

Another issue was that the height may not always be an integer. Decimal pixel values are valid (e.g. 27.6px), so parseFloat should be used instead of parseInt.
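
As a rough sketch of the difference, assuming the height arrives as a computed-style string such as "27.6px" (the helper name pixelsToNumber is made up for this example and is not part of the actual reftests):

```javascript
// Hypothetical helper: convert a computed CSS length such as "27.6px" to a number.
function pixelsToNumber(value) {
  const number = parseFloat(value);
  if (Number.isNaN(number)) {
    // Keywords such as "normal" (possible for line-height) are not numeric.
    throw new TypeError(`Not a numeric length: ${value}`);
  }
  return number;
}

const asInteger = parseInt("27.6px", 10); // drops the fractional part
const asFloat = pixelsToNumber("27.6px"); // keeps the fractional part
console.log(asInteger, asFloat);
```

In a reftest this could be called as pixelsToNumber(getComputedStyle(firstLine).height), where firstLine stands for whatever element holds the first line of the cue.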

The following tests have been written.

Test the default behaviour. No particular rule. (April 17, 2013)

Make sure cues do not render for the audio element. (April 17, 2013)

Make sure control interface and cue do not overlap. (April 17, 2013)

Ensure cues display in the correct order for multiple cues and tracks. (April 17, 2013)

Right to left writing direction test (April 17, 2013)

Writing mode vertical test (April 17, 2013)

Cue writing direction, alignment, and writing mode tests. (April 17, 2013)
The rules for direction, alignment, and mode depend on each other,
so tests for each individually would be identical.
There is a specification bug for positioning cues with right-to-left text. Those tests still need to be written.

Snap-to-line tests

Text wrapping tests

Cue text payload tag tests

Thursday, 18 April 2013

WebVTT Cue Display Order

I was working on the order in which cues are displayed for a WebVTT file and came across some interesting things. I wanted to test the following rendering rule. It is a little hard to work out, but the HTML5 specification is very specific about the order.

"8. For each track track in tracks, append to cues all the cues from track's list of cues that have their text track cue active flag set." (April, 17, 2013)

This means that cues from different tracks should be displayed the same way as cues from the same track.

"13. Sort the tasks in events in ascending time order (tasks with earlier times first).
Further sort tasks in events that have the same time by the relative text track cue order of the text track cues associated with these tasks." (April, 17, 2013)

The event time is the start time for entering cues, and the later of the start and end times for exiting cues.
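
A minimal sketch of that rule, assuming each cue is a plain object with startTime and endTime in seconds (the function name eventTime and the "enter"/"exit" strings are my own labels):

```javascript
// Hypothetical helper: the time used to sort an enter or exit event for a cue.
function eventTime(cue, kind) {
  if (kind === "enter") {
    return cue.startTime;
  }
  if (kind === "exit") {
    // The later of the two, which covers a cue whose end time precedes its start time.
    return Math.max(cue.startTime, cue.endTime);
  }
  throw new RangeError(`Unknown event kind: ${kind}`);
}

console.log(eventTime({ startTime: 2, endTime: 5 }, "enter"));
console.log(eventTime({ startTime: 2, endTime: 5 }, "exit"));
```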

"text track cue order, which is determined as follows: first group the cues by their text track, with the groups being sorted in the same order as their text tracks appear in the media element's list of text tracks; then, within each group, cues must be sorted by their start time, earliest first; then, any cues with the same start time must be sorted by their end time, latest first; and finally, any cues with identical end times must be sorted in the order they were last added to their respective text track list of cues, oldest first" (April, 17, 2013)

Therefore the correct cue order is:
  1. start time (ascending)
  2. track order (top to bottom)
  3. end time (descending)
  4. cue order (top to bottom)
There are a few important things to take away from this. First, the start time is more important than anything else, so cues from multiple tracks will be mixed together. Second, end times are descending. If the cue ending soonest were under the longer-lasting cue, the longer-lasting cue would drop when the other exits. That could make it look like a different cue, so it's better for the longer-lasting one to stay in the same place at the bottom.

With the starting time being most important, one would think the ending time would be second. Instead the track order is second. This at first seems odd, but because the tracks are likely for different purposes, separating them is useful. Start time trumps track order because cues could appear in between other cues instead of at the top.
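
The four-level order listed above can be sketched as a sort comparator. This is only an illustration of my reading of the rules, not code from the reftests. The trackIndex and cueIndex fields are hypothetical: they stand for the track's position in the media element's list of text tracks and the cue's position within its track.

```javascript
// Compare two cues by: start time (ascending), track order,
// end time (descending), then cue order within the track.
function compareCues(a, b) {
  return (
    a.startTime - b.startTime ||
    a.trackIndex - b.trackIndex ||
    b.endTime - a.endTime ||
    a.cueIndex - b.cueIndex
  );
}

const cues = [
  { id: "A", trackIndex: 0, cueIndex: 0, startTime: 0, endTime: 2 },
  { id: "B", trackIndex: 1, cueIndex: 0, startTime: 0, endTime: 5 },
  { id: "C", trackIndex: 0, cueIndex: 1, startTime: 0, endTime: 4 },
  { id: "D", trackIndex: 1, cueIndex: 1, startTime: 1, endTime: 3 },
  { id: "E", trackIndex: 0, cueIndex: 2, startTime: 1, endTime: 3 },
];

// Copy before sorting so the original list is left untouched.
const ordered = [...cues].sort(compareCues).map((cue) => cue.id);
console.log(ordered.join(", ")); // C, A, B, E, D
```

C comes before A because both start at 0 on the first track and C ends later. E comes before D because they share start and end times but E is on the earlier track.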

There are 12 reftests to test all possible cases with 1 track and 2 tracks.

Wednesday, 17 April 2013

WebVTT Cue Layout Reference

The purpose of this post is to describe how a WebVTT file should render. I will reference rules from the "rendering cues for video" section of the specification.

3. Let output be an empty list of absolutely positioned CSS block boxes.

This is a div element with the CSS property position set to absolute.

4. If the user agent is exposing a user interface for video, add to output one or more completely transparent positioned CSS block boxes that cover the same region as the user interface.

If the video has controls visible, add an empty div over the control area.

9. For each track track in tracks, append to cues all the cues from track's list of cues that have their text track cue active flag set.
10. If reset is false, then, for each text track cue cue in cues: if cue's text track cue display state has a set of CSS boxes, then add those boxes to output, and remove cue from cues.
10. 17. Add the CSS boxes in boxes to output.

For every cue to be displayed in all tracks, add a div. If the cue has already been rendered, use the existing div.

10. 12. - The children of the nodes must be wrapped in an anonymous box whose 'display' property has the value 'inline'. This is the WebVTT cue background box.

The cue contents should be rendered into a single <div style="display:inline"> inside the cue div.
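
Here is a small sketch of what that nesting could look like when generating reference markup. The function names are made up for illustration, the class names are the ones used in the reference CSS later in this post, and the sketch assumes plain-text cues with a simple id (no cue tags such as ruby or voice).

```javascript
// Escape characters that would otherwise be parsed as HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: outer absolutely positioned cue box,
// inner inline box that acts as the WebVTT cue background box.
function cueReferenceMarkup(id, text) {
  return (
    `<div class="cueBox" id="${id}">` +
    `<div class="cueBackgroundBox" style="display:inline">${escapeHtml(text)}</div>` +
    `</div>`
  );
}

console.log(cueReferenceMarkup("testCue", "Fish & chips"));
```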

10. 12. - Text runs must be wrapped according to the CSS line-wrapping rules, with the following additional constraints:
10. 12. - * Regardless of the value of the 'white-space' property, lines must be wrapped at the edge of their containing blocks, even if doing so requires splitting a word where there is no line breaking opportunity. (Thus, normally text wraps as needed, but if there is a particularly long word, it does not overflow as it normally would in CSS, it is instead forcibly wrapped at the box's edge.)
10. 12. - * Regardless of the value of the 'white-space' property, any line breaks inserted by the user agent for the purposes of line wrapping must be placed so as to minimize Δ across each run of consecutive lines between preserved newlines in the source. Δ for a set of lines is defined as the sum over each line of the absolute of the difference between the line's length and the mean line length of the set.

If a word is longer than the allowed width, break the word with a hyphen. Don't set the CSS property word-wrap to break-word; the break needs to be determined dynamically. After the number of lines is known and long words are broken, minimize the cue width without creating new lines.

For example

This is a really long sentence that needs to be displayed on
two lines.

should be

This is a really long sentence that
needs to be displayed on two lines.
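
The balancing step can be sketched as a small search. This is a simplified illustration under several assumptions: line length is measured in characters rather than rendered pixels, the text is a single run with no preserved newlines, and no word is longer than the limit (breaking long words is left out). The function names and the 60-character limit are my own choices.

```javascript
// Δ for a set of lines: sum of |line length - mean line length|.
function delta(lines) {
  const lengths = lines.map((line) => line.length);
  const mean = lengths.reduce((a, b) => a + b, 0) / lengths.length;
  return lengths.reduce((sum, len) => sum + Math.abs(len - mean), 0);
}

// Fewest lines needed when each line is filled greedily.
function greedyLineCount(words, maxChars) {
  let count = 1;
  let used = 0;
  for (const word of words) {
    if (word.length > maxChars) {
      throw new RangeError(`Word needs to be broken first: ${word}`);
    }
    const needed = used === 0 ? word.length : used + 1 + word.length;
    if (needed > maxChars) {
      count += 1;
      used = word.length;
    } else {
      used = needed;
    }
  }
  return count;
}

// Try every way to split the words into that many lines and keep the lowest Δ.
function balancedWrap(text, maxChars) {
  const words = text.split(/\s+/).filter(Boolean);
  if (words.length === 0) return [];
  const lineCount = greedyLineCount(words, maxChars);
  let best = null;
  let bestDelta = Infinity;

  function search(start, lines) {
    const remaining = lineCount - lines.length;
    if (remaining === 1) {
      const last = words.slice(start).join(" ");
      if (last.length > maxChars) return;
      const candidate = [...lines, last];
      const d = delta(candidate);
      if (d < bestDelta) {
        bestDelta = d;
        best = candidate;
      }
      return;
    }
    // Leave at least one word for each of the remaining lines.
    for (let end = start + 1; end <= words.length - (remaining - 1); end++) {
      const line = words.slice(start, end).join(" ");
      if (line.length > maxChars) break;
      search(end, [...lines, line]);
    }
  }

  search(0, []);
  return best;
}

const wrapped = balancedWrap(
  "This is a really long sentence that needs to be displayed on two lines.",
  60
);
console.log(wrapped.join("\n"));
```

With a 60-character limit the greedy fill gives the unbalanced two-line version, and the search settles on the split after "that", where both lines are 35 characters long and Δ is 0. The exhaustive search is fine for short cues but would need a dynamic-programming version for long text.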


The specification uses the CSS viewport units vw and vh. I cannot get these to work properly with video, so I've calculated the values based on the video.
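
The conversion I mean is just a percentage of the video box. A quick sketch, assuming a 640 by 480 video like the test video in this post (the function names are hypothetical):

```javascript
// Treat the video box as the viewport: 1vw is 1% of its width, 1vh is 1% of its height.
function vwToPx(vw, videoWidth) {
  return (vw * videoWidth) / 100;
}

function vhToPx(vh, videoHeight) {
  return (vh * videoHeight) / 100;
}

console.log(vhToPx(5, 480)); // font size used for cues: 24
console.log(vwToPx(100, 640)); // full-width cue box: 640
```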


This is what the layout should be for a basic WebVTT cue.

Test Video CSS

#testVideo {
 position: absolute;
 left: 0px;
 top: 0px;
 width: 640px;
 height: 480px;
}

Basic Cue Reference CSS

/* cue constants, same for every cue */
.cueBox {
 position: absolute;
 unicode-bidi: plaintext;
 font: 24px sans-serif;    /* 5vh = 24px */
 color: rgba(255,255,255,1);
 white-space: pre-line;
}
.cueBackgroundBox {
 background: rgba(0,0,0,0.8);
}

/* cue variables, depends on cue */
#testCue {
 direction: ltr;
 writing-mode: horizontal-tb;
 left: 0px;    /* 0vw = 0px */
 top: 0px;    /* 0vh = 0px */
 width: 640px;    /* 100vw = 640px */
 text-align: center;