Saturday, 27 October 2012

Visual Aid and Wiki for WebVTT

I stumbled across the W3C wiki page for WebVTT. It links to a lot of good material: a validator, a review on LeanBack Player, and six polyfills. One nice thing is a guide, with pictures, to what WebVTT should do. I'm going to post some of those pictures here.

http://www.w3.org/community/texttracks/wiki/Main_Page


Caption Model Draft


  • Caption content is displayed and grows in different directions.





  • Caption characters do not move as more characters are displayed.





  • Captions move in different directions depending on positioning.





  • The positioning of the rendering box changes depending on whether the caption is rendered vertically or horizontally, and whether it grows right or left.





  • Possible alignments for the text.


Update to WebVTT rules

On October 10th two changes were made to the WebVTT specifications. The first disallows the use of the string --> in the payload of a cue. The second allows the use of the CSS properties opacity and visibility. Both these changes have been applied to my previous post on the specifications.

W3C WebVTT Revision 1.43
W3C WebVTT Revision 1.44
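
A validator could enforce the first of these rules with a plain substring test. This is only a minimal sketch; the function name payload_is_valid is made up for illustration and does not come from the spec.

```python
def payload_is_valid(payload: str) -> bool:
    """Return False when the cue payload contains the forbidden "-->" string."""
    # Hypothetical helper: the spec only states the rule, not this function.
    return "-->" not in payload


ok = payload_is_valid("Hello, world")
bad = payload_is_valid("left --> right")
```

Here ok is True and bad is False, so a validator would reject the second payload.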

Monday, 15 October 2012

WebVTT Parser Pseudo Code

UPDATE 16/10/2012: After a little discussion on IRC I think I need to make one point clear. My mention of OO, use of the word "extends", or any other such phrase does not mean the parser needs to be implemented that way. I used those terms because I thought they best conveyed the meaning of the spec. Any part can be implemented in any way that produces essentially the same result. In fact, the data returned from the parser is only meant to be used by the render stage, which is also defined in the spec. It is possible that someone might use the parser in a different program that renders differently, but we shouldn't worry about that. We are only responsible for rendering WebVTT as specified in the spec. I wrote this to make it easier for people to implement the WebVTT spec, because I think pseudo code is easier to read and understand than the spec itself.



A number of people are working on building parsers for WebVTT. Since I'm quite familiar with the WebVTT spec, I figure I can help by making a rough outline of what one should look like in an object-oriented language.

The file is also available here.

Important Points

  • The input string in the main method of the parser (parse) must have the next character appended from the byte stream before the position can be advanced. If the next character in the byte stream has not yet been appended to the input string, the parser must wait until it is appended or the byte stream has finished.
  • All the arguments of the functions and methods that I wrote in pseudo code are passed by reference. This is important because position is often an argument and is changed inside the functions and methods.
  • This is basically a guide written as close to the WebVTT spec as I could manage. It is likely that a number of changes will need to be made.
  • I wrote this fairly quickly, so it is good to check what the spec says.
  • I varied between declaring variables at the start of a section and declaring them where they were first needed. It is better to do only one, so take your pick.
  • We will also want to add comments.
  • Most important, remember that the characters are Unicode. You may not be able to test characters the way I wrote it ("b" for the character b).
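
Many languages pass integers by value, so the by-reference position needs a little care. One possible approach is to wrap the input and the position in a small object that every helper mutates. The Cursor class and collect_line function below are assumptions for illustration, not part of the spec.

```python
from dataclasses import dataclass


@dataclass
class Cursor:
    """Hypothetical holder for the input string and the shared position."""
    text: str
    position: int = 0

    def at_end(self) -> bool:
        return self.position >= len(self.text)

    def peek(self) -> str:
        return self.text[self.position]


def collect_line(cursor: Cursor) -> str:
    """Collect characters up to, but not including, the next LINE FEED."""
    start = cursor.position
    while not cursor.at_end() and cursor.peek() != "\n":
        cursor.position += 1  # the caller sees this change
    return cursor.text[start:cursor.position]


cursor = Cursor("WEBVTT\n\n00:00.000 --> 00:01.000")
first_line = collect_line(cursor)
```

After the call, first_line holds "WEBVTT" and cursor.position has moved to the LINE FEED, which is the by-reference behaviour the pseudo code relies on. Python strings are already sequences of Unicode code points, so comparing against "\n" or "b" works here; other languages may need more care.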

Some Very Pseudo Code

The root method/function is parse().
I used "function" if it returns something and "method" if it does not. Although a method is really a function that belongs to a class, I wanted an easy way to distinguish between the two. There is no reason why you couldn't restructure this to use real class methods. Just remember that, as I wrote this, all arguments are passed by reference.

class TextTrackCue
    Not really sure how this integrates
    See http://dev.w3.org/html5/spec/media-elements.html#text-track-cue for details.

class Node

class InternalNode extends Node
    OrderedList children
    OrderedList classNames

class LeafNode extends Node

class ClassNode extends InternalNode

class ItalicsNode extends InternalNode

class BoldNode extends InternalNode

class UnderlineNode extends InternalNode

class RubyNode extends InternalNode

class RubyTextNode extends InternalNode

class VoiceNode extends InternalNode
    String voiceName

class TextNode extends LeafNode
    String text

class TimestampNode extends LeafNode
    Float timestampSeconds

class Token

class StringToken extends Token
    String value

class StartTagToken extends Token
    String tagName
    OrderedList classes
    String annotation

class EndTagToken extends Token
    String tagName

class TimestampTagToken extends Token
    Float value
Method parse (ByteStream byteStreamInput, OrderedList output)
   String input = convert asynchronous byteStreamInput to Unicode
   
   replace NULL characters with REPLACEMENT CHARACTER characters
   replace CARRIAGE RETURN LINE FEED (CRLF) character pairs with single LINE FEED
   replace CARRIAGE RETURN characters with LINE FEED characters
   
   Integer position = start of input
   
   If character at position is BYTE ORDER MARK
      advancePosition(input, position)
   End If
   
   String line
   Boolean alreadyCollectedLine = False
   
   line = collectLine(input, position)
   
   If line has less than 6 characters
      Throw Error
   End If
   
   If line has exactly 6 characters and is not "WEBVTT"
      Throw Error
   End If
   
   If line has more than 6 characters
   and (first 6 characters not "WEBVTT" or (7th character not SPACE or TAB))
      Throw Error 
   End If
   
   If position is past end of input
      return
   End If
   
   If character at position in input is LINE FEED
      advancePosition(input, position)
   End If
   
   # Header
   Do
      line = collectLine(input, position)
      
      If position is past end of input
         return
      End If
      
      If character at position in input is LINE FEED
         advancePosition(input, position)
      End If
      
      If line contains "-->"
         alreadyCollectedLine = True
         Exit Do Loop
      End If
   While line is not empty
   
   # Cue Loop
   Loop
      If alreadyCollectedLine is False
         While character in input at position is LINE FEED
            advancePosition(input, position)
         End While
         
         line = collectLine(input, position)
         
         If line is empty
            Exit Loop
         End If
      End If

      TextTrackCue cue = new TextTrackCue()
      
      cue.identifier = empty
      cue.pauseOnExit = False
      cue.writingDirection = horizontal
      cue.snapToLines = True
      cue.linePosition = auto
      cue.textPosition = 50
      cue.size = 100
      cue.alignment = middleAlignment
      cue.text = empty
      
      If line does not contain "-->"
         cue.identifier = line
         
         If position is past end of input
            Exit Loop
         End If
         
         If character at position in input is LINE FEED
            advancePosition(input, position)
         End If
         
         line = collectLine(input, position)
         
         If line is empty
            Continue Loop
         End If
      End If
      
      alreadyCollectedLine = False
      
      Try
         collectCueTimingsAndSettings(line, cue)
      Catch
         Boolean end = False
         
         # Bad cue loop
         Loop
            If position is past end of input
               end = true
               Exit Loop
            End If
            
            If character at position in input is LINE FEED
               advancePosition(input, position)
            End If
            
            line = collectLine(input, position)
            
            If line contains "-->"
               alreadyCollectedLine = True
               Exit Loop
            End If
            
            If line is empty
               Exit Loop
            End If
         End Loop
         
         If end is true
            Exit Loop
         End If
         
         Continue Loop
      End Try

      String cueText = empty
      
      # Cue text loop
      Loop
         If position is past end of input
            Exit Loop
         End If
         
         If character at position in input is LINE FEED
            advancePosition(input, position)
         End If
         
         line = collectLine(input, position)
         
         If line is empty
            Exit Loop
         End If
         
         If line contains "-->"
            alreadyCollectedLine = True
            Exit Loop
         End If
         
         If cueText is not empty
            cueText += LINE FEED
         End If
         
         cueText += line
      End Loop
      
      # Cue text processing
      cue.text = cueTextDomContruction(parseCueText(cueText))
      
      output append cue
   End Loop
End Method parse
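
The first few steps of parse (character replacement and the "WEBVTT" signature check) can be sketched in Python as follows. This is an illustration under the assumption that the whole input is already available as a string; the function names are made up, and the streaming behaviour is not modelled.

```python
def normalize_input(text: str) -> str:
    """Apply the three replacement steps from the start of parse."""
    text = text.replace("\x00", "\ufffd")  # NULL -> REPLACEMENT CHARACTER
    text = text.replace("\r\n", "\n")      # CRLF pairs -> single LINE FEED
    return text.replace("\r", "\n")        # lone CARRIAGE RETURN -> LINE FEED


def has_valid_signature(first_line: str) -> bool:
    """Check the first line: "WEBVTT", optionally followed by SPACE or TAB."""
    if first_line.startswith("\ufeff"):    # skip a leading BYTE ORDER MARK
        first_line = first_line[1:]
    if not first_line.startswith("WEBVTT"):
        return False
    return len(first_line) == 6 or first_line[6] in " \t"


cleaned = normalize_input("WEBVTT\r\n\r\n")
signature_ok = has_valid_signature(cleaned.split("\n")[0])
```

The three length checks in the pseudo code collapse into one startswith test plus a check on the seventh character.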
Method advancePosition(String input, Integer position)
   If position is at the end of input and bytestream has not ended
      Wait for bytestream to add characters to input
   End If
   
   If bytestream has ended and next position is past end of input
      position = past end of input
   Else
      position = location of next character in input
   End If
End Method advancePosition
Function String collectLine(String input, Integer position)
   String result = empty
   
   While position not past end of input and character in input at position not LINE FEED
      result += character in input at position
      advancePosition(input, position)
   End While
   
   return result
End Function collectLine
Method collectCueTimingsAndSettings(String input, TextTrackCue cue)
   String remainder
   Integer position
   
   position = start of input
   
   skipWhitespace(input, position)
   
   cue.startTime = collectTimestamp(input, position)
   
   skipWhitespace(input, position)
   
   If character at position in input is not "-"
      Throw Error
   Else
      position = location of next character in input
   End If
   
   If character at position in input is not "-"
      Throw Error
   Else
      position = location of next character in input
   End If
   
   If character at position in input is not ">"
      Throw Error
   Else
      position = location of next character in input
   End If
   
   skipWhitespace(input, position)
   
   cue.endTime = collectTimestamp(input, position)
   
   remainder = remainder of input starting at position
   
   parseSettings(remainder, cue)
End Method
# Defined in http://dev.w3.org/html5/spec/common-microsyntaxes.html#common-parser-idioms
Method skipWhitespace(String input, Integer position)
   While position not past end of input and character in input at position is SPACE or TAB or LINE FEED or FORM FEED or CARRIAGE RETURN
      position = location of next character in input
   End While
End Method
Method parseSettings(String input, TextTrackCue cue)
   OrderedList settings = input split on SPACE and TAB
   
   For Each String setting in settings
      If setting does not contain ":" or first or last character in setting is ":"
         Next setting
      End If
      
      String name = substring of setting between start of setting and first ":"
      
      String value = substring of setting between first ":" and end of setting
      
      Switch (name)
         Case "vertical"
            If value is "rl"
               cue.writingDirection = verticalGrowingLeft
            End If
            
            If value is "lr"
               cue.writingDirection = verticalGrowingRight
            End If
            
            Break
         
         Case "line"
            If value contains characters other than "-", "%", or "0" through "9"
               Break
            End If
            
            If value does not contain at least one character between "0" through "9"
               Break
            End If
            
            If any character in value other than the first is "-"
               Break
            End If
            
            If any character in value other than the last is "%"
               Break
            End If
            
            Integer number = parse substring of value excluding trailing "%" as a signed integer
            
            If last character in value is "%" and (number < 0 or number > 100)
               Break
            End If
            
            cue.linePosition = number
            
            If last character in value is "%"
               cue.snapToLines = False
            End If
            
            Break
            
         Case "position"
            If value contains characters other than "%" or "0" through "9"
               Break
            End If
            
            If value does not contain at least one character between "0" through "9"
               Break
            End If
            
            If any character in value other than the last is "%"
               Break
            End If
            
            If last character in value is not "%"
               Break
            End If
            
            Integer number = parse substring of value excluding trailing "%" as a signed integer
            
            If number < 0 or number > 100
               Break
            End If
            
            cue.textPosition = number
            
            Break
            
         Case "size"
            If value contains characters other than "%" or "0" through "9"
               Break
            End If
            
            If value does not contain at least one character between "0" through "9"
               Break
            End If
            
            If any character in value other than the last is "%"
               Break
            End If
            
            If last character in value is not "%"
               Break
            End If
            
            Integer number = parse substring of value excluding trailing "%" as a signed integer
            
            If number < 0 or number > 100
               Break
            End If
            
            cue.size = number
            
            Break
            
         Case "align"
            If value is "start"
               cue.alignment = startAlignment
            End If
            
            If value is "middle"
               cue.alignment = middleAlignment
            End If
            
            If value is "end"
               cue.alignment = endAlignment
            End If
            
            If value is "left"
               cue.alignment = leftAlignment
            End If
            
            If value is "right"
               cue.alignment = rightAlignment
            End If
            
            Break
      End Switch (name)
   End For Each setting
End Method parseSettings
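
For comparison, here is a compact Python sketch of the same settings logic. It returns a plain dictionary instead of mutating a TextTrackCue, and the key names are assumptions chosen for the example. As in the spec, a percentage value for "line" turns snap-to-lines off, and anything invalid is silently skipped.

```python
import re

PERCENT = re.compile(r"[0-9]+%")
LINE = re.compile(r"-?[0-9]+%?")
ALIGNMENTS = ("start", "middle", "end", "left", "right")


def parse_settings(text: str) -> dict:
    """Return only the settings that parsed; invalid ones are skipped."""
    cue = {}
    for setting in re.split(r"[ \t]+", text):
        if ":" not in setting or setting.startswith(":") or setting.endswith(":"):
            continue
        name, _, value = setting.partition(":")
        if name == "vertical" and value in ("rl", "lr"):
            cue["vertical"] = value
        elif name == "line" and LINE.fullmatch(value):
            is_percent = value.endswith("%")
            number = int(value.rstrip("%"))
            if is_percent and not 0 <= number <= 100:
                continue
            cue["line"] = number
            cue["snap_to_lines"] = not is_percent
        elif name in ("position", "size") and PERCENT.fullmatch(value):
            number = int(value[:-1])
            if number <= 100:
                cue[name] = number
        elif name == "align" and value in ALIGNMENTS:
            cue["align"] = value
    return cue


settings = parse_settings("vertical:rl line:-5 position:50% size:150% align:bogus junk")
```

In this example size:150%, align:bogus and junk are all dropped without raising an error, which is the permissive behaviour the parser is supposed to have.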
Function Float collectTimestamp(String input, Integer position)
   Enumerable SignificantUnits
      Minutes
      Hours
   End Enumerable
   
   Integer value1, value2, value3, value4
   String string
   SignificantUnits mostSignificantUnits
   
   mostSignificantUnits = Minutes
   
   If position is past end of input
      Throw Error
   End If
   
   If character at position in input is not "0" through "9"
      Throw Error
   End If
   
   string = collectDigits(input, position)
   
   value1 = parse string to integer
   
   If string not exactly two characters or value1 > 59 then
      mostSignificantUnits = Hours
   End If
   
   If position is past end of input or character in input at position is not ":"
      Throw Error
   Else
      position = location of next character in input
   End If
   
   string = collectDigits(input, position)
   
   If string not exactly two characters
      Throw Error
   End If
   
   value2 = parse string to integer
   
   If mostSignificantUnits = Hours
   or (position not past end of input and character at position in input is ":")
      If position is past end of input or character in input at position is not ":"
         Throw Error
      Else
         position = location of next character in input
      End If
      
      string = collectDigits(input, position)
      
      If string not exactly two characters
         Throw Error
      End If
      
      value3 = parse string to integer
   Else
      value3 = value2
      value2 = value1
      value1 = 0
   End If
   
   If position is past end of input or character in input at position is not "."
      Throw Error
   Else
      position = location of next character in input
   End If
   
   string = collectDigits(input, position)
   
   If string not exactly three characters
      Throw Error
   End If
   
   value4 = parse string to integer
   
   If value2 > 59 or value3 > 59
      Throw Error
   End If
   
   return value1 * 60 * 60 + value2 * 60 + value3 + value4 / 1000
End Function collectTimestamp
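
The timestamp rules above can also be expressed with a regular expression. The sketch below is a simplification: it parses a whole string rather than collecting from a shared position, and the function name is an assumption. Like the pseudo code, it accepts any number of hour digits and rejects minutes or seconds above 59.

```python
import re

# Optional hours, then two-digit minutes, two-digit seconds, three-digit milliseconds.
TIMESTAMP = re.compile(r"(?:([0-9]+):)?([0-9]{2}):([0-9]{2})\.([0-9]{3})")


def parse_timestamp(text: str) -> float:
    """Return the timestamp in seconds, or raise ValueError if it is malformed."""
    match = TIMESTAMP.fullmatch(text)
    if match is None:
        raise ValueError(f"bad timestamp: {text!r}")
    hours, minutes, seconds, millis = (int(group or 0) for group in match.groups())
    if minutes > 59 or seconds > 59:
        raise ValueError(f"bad timestamp: {text!r}")
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


short_form = parse_timestamp("00:01.500")
long_form = parse_timestamp("01:02:03.250")
```

Note the division by 1000 produces a float here; in a language with integer division, value4 / 1000 would need a floating point divisor.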
Function String collectDigits(String input, Integer position)
   String result = empty
   
   While position not past end of input and character in input at position is "0" through "9"
      result += character in input at position
      position = location of next character in input
   End While
   
   return result
End Function collectDigits
Function OrderedList parseCueText (String input)
   Integer position = start of input
   InternalNode root = new InternalNode
   InternalNode current = root

   Loop
      If position is past End of input
         return root.children
      End If
      
      Token token = cueTextTokenizer(input, position)
      
      Switch (typeof(token))
         Case StringToken
            current.children append new TextNode(text: token.value)
            Break
            
         Case StartTagToken
            Switch (token.tagName)
               Case "c"
                  ClassNode node = new ClassNode()
                  appendClassesToNode(node, token)
                  current.children append node
                  current = node
                  Break
               
               Case "i"
                  ItalicsNode node = new ItalicsNode()
                  appendClassesToNode(node, token)
                  current.children append node
                  current = node
                  Break
                  
               Case "b"
                  BoldNode node = new BoldNode()
                  appendClassesToNode(node, token)
                  current.children append node
                  current = node
                  Break
                  
               Case "u"
                  UnderlineNode node = new UnderlineNode()
                  appendClassesToNode(node, token)
                  current.children append node
                  current = node
                  Break
                  
               Case "ruby"
                  RubyNode node = new RubyNode()
                  appendClassesToNode(node, token)
                  current.children append node
                  current = node
                  Break
                  
               Case "rt"
                  If typeof(current) is RubyNode
                     RubyTextNode node = new RubyTextNode()
                     appendClassesToNode(node, token)
                     current.children append node
                     current = node
                  End If
                  Break
               
               Case "v"
                  VoiceNode node = new VoiceNode()
                  appendClassesToNode(node, token)
                  
                  If token.annotation is not null
                     node.voiceName = token.annotation
                  else
                     node.voiceName = empty
                  End If
                  
                  current.children append node
                  current = node
                  Break
            End Switch (token.tagName)
            
            Break
            
         Case EndTagToken
            If (token.tagName is "c" And typeof(current) is ClassNode)
            Or (token.tagName is "i" And typeof(current) is ItalicsNode)
            Or (token.tagName is "b" And typeof(current) is BoldNode)
            Or (token.tagName is "u" And typeof(current) is UnderlineNode)
            Or (token.tagName is "ruby" And typeof(current) is RubyNode)
            Or (token.tagName is "rt" And typeof(current) is RubyTextNode)
            Or (token.tagName is "v" And typeof(current) is VoiceNode)
               current = parent of current
            else If token.tagName is "ruby" And typeof(current) is RubyTextNode
               current = parent of parent of current
            End If
            
            Break
         
         Case TimestampTagToken
            
      End Switch (token)
   End loop
End Function parseCueText
Method appendClassesToNode(InternalNode node, Token token)
   for each className in token.classes
      If className not empty
         node.classNames append className
      End If
   End for
End Method appendClassesToNode
Function Token cueTextTokenizer(String input, Integer position)
   Enumerable TokenizerStates
      dataState
      escapeState
      tagState
      startTagState
      startTagClassState
      startTagAnnotationState
      EndTagState
      timestampTagState
   End Enumerable
   
   TokenizerStates tokenizerState = dataState
   String result = empty
   String buffer = empty
   OrderedList classes = empty
   Character c
   
   loop
      If position is past End of input
         c = End of file marker
      else
         c = character in input indicated by position
      End If
      
      Switch (tokenizerState)
         Case dataState
            Switch (c)
               Case "&"
                  buffer = c
                  tokenizerState = escapeState
                  Break
               
               Case "<"
                  If result is empty
                     tokenizerState = tagState
                  else
                     return new StringToken(result)
                  End If
                  Break
               
               Case End-OF-FILE MARKER
                  return new StringToken(result)
                  Break
               
               default
                  result += c
            End Switch (c)
            
            Break
         
         Case escapeState
            Switch (c)
               Case "&"
                  result += buffer
                  buffer = c
                  Break
               
               Case "0" to "9"
               Case "a" to "z"
               Case "A" to "Z"
                  buffer += c
                  Break
               
               Case ";"
                  Switch (buffer)
                     Case "&amp"
                        result += "&"
                        Break
                     
                     Case "&lt"
                        result += "<"
                        Break
                     
                     Case "&gt"
                        result += ">"
                        Break
                     
                     Case "&lrm"
                        result += LEFT-TO-RIGHT MARK
                        Break
                     
                     Case "&rlm"
                        result += RIGHT-TO-LEFT MARK
                        Break
                     
                     Case "&nbsp"
                        result += NO-Break SPACE
                        Break
                     
                     default
                        result += buffer + ";"
                  End Switch (buffer)
                  
                  tokenizerState = dataState
                  Break
               
               Case "<"
               Case End-OF-FILE MARKER
                  result += buffer
                  return new StringToken(value: result)
                  Break
               
               default
                  result += buffer
                  result += c
                  tokenizerState = dataState
            End Switch (c)
            
            Break
         
         Case tagState
            Switch (c)
               Case TAB
               Case LINE FEED
               Case FORM FEED
               Case SPACE
                  tokenizerState = startTagAnnotationState
                  Break
                  
               Case "."
                  tokenizerState = startTagClassState
                  Break
               
               Case "/"
                  tokenizerState = EndTagState
                  Break
               
               Case "0" to "9"
                  result = c
                  tokenizerState = timestampTagState
                  Break
                  
               Case ">"
                  position = location of next character in input
               
               Case End-OF-FILE MARKER
                  return new StartTagToken(tagName: empty)
                  Break
               
               default
                  result = c
                  tokenizerState = startTagState
            End Switch (c)
            
            Break
            
         Case startTagState
            Switch (c)
               Case TAB
               Case FORM FEED
               Case SPACE
                  tokenizerState = startTagAnnotationState
                  Break
                  
               Case LINE FEED
                  buffer = c
                  tokenizerState = startTagAnnotationState
                  Break
                  
               Case "."
                  tokenizerState = startTagClassState
                  Break
                  
               Case ">"
                  position = location of next character in input
               
               Case End-OF-FILE MARKER
                  return new StartTagToken(tagName: result)
                  Break
               
               default
                  result += c
            End Switch (c)
            
            Break
         
         Case startTagClassState
            Switch (c)
               Case TAB
               Case FORM FEED
               Case SPACE
                  classes append buffer
                  buffer = empty
                  tokenizerState = startTagAnnotationState
                  Break
                  
               Case LINE FEED
                  classes append buffer
                  buffer = c
                  tokenizerState = startTagAnnotationState
                  Break
                  
               Case "."
                  classes append buffer
                  buffer = empty
                  Break
                  
               Case ">"
                  position = location of next character in input
               
               Case End-OF-FILE MARKER
                  classes append buffer
                  return new StartTagToken(tagName:result, classes: classes)
                  Break
               
               default
                  buffer += c
            End Switch (c)
            
            Break
         
         Case startTagAnnotationState
            Switch (c)               
               Case ">"
                  position = location of next character in input
               
               Case End-OF-FILE MARKER
                  remove leading and trailing space characters from buffer
                  replace sequences of one or more consecutive space characters with a single SPACE
                  return new StartTagToken(tagName: result, classes: classes, annotation: buffer)
                  Break
               
               default
                  buffer += c
            End Switch (c)
            
            Break
         
         Case EndTagState
            Switch (c)
               Case ">"
                  position = location of next character in input
               
               Case End-OF-FILE MARKER
                  return new EndTagToken(tagName: result)
                  Break
               
               default
                  result += c
            End Switch (c)
            
            Break
         
         Case timestampTagState
            Switch (c)
               Case ">"
                  position = location of next character in input
               
               Case End-OF-FILE MARKER
                  return new TimestampTagToken(value: result)
                  Break
               
               default
                  result += c
            End Switch (c)
            
            Break   
      End Switch (tokenizerState)
      
      position = location of next character in input
   End loop
End Function cueTextTokenizer
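
The escape state is probably the easiest part of the tokenizer to get subtly wrong, so here is a small Python sketch of just that piece. It is an assumption-laden simplification: it handles only text with no tags (a "<" is treated as ordinary text), and the function name is made up for the example.

```python
ESCAPES = {
    "&amp": "&", "&lt": "<", "&gt": ">",
    "&lrm": "\u200e", "&rlm": "\u200f", "&nbsp": "\u00a0",
}


def unescape_cue_text(text: str) -> str:
    """Replace the six known escapes; leave anything unrecognised untouched."""
    result = []
    buffer = None  # None means data state, a string means escape state
    for c in text:
        if buffer is None:
            if c == "&":
                buffer = c
            else:
                result.append(c)
        elif c == "&":
            result.append(buffer)  # flush the old buffer, start a new escape
            buffer = c
        elif c.isascii() and c.isalnum():
            buffer += c
        elif c == ";":
            result.append(ESCAPES.get(buffer, buffer + ";"))
            buffer = None
        else:
            result.append(buffer + c)
            buffer = None
    if buffer is not None:  # end of input while still inside an escape
        result.append(buffer)
    return "".join(result)


decoded = unescape_cue_text("a &amp; b &lt;c&gt;")
```

Unknown escapes such as "&bogus;" pass through unchanged, which matches the default branch of the inner switch on buffer.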
Function Tree cueTextDomContruction(OrderedList nodes)
   # Unsure how to do this at this point
   # Refer to http://dev.w3.org/html5/webvtt/#webvtt-cue-text-dom-construction-rules
End Function cueTextDomContruction

Sunday, 14 October 2012

Validation and Parsing WebVTT

(11/28/2012) Bumping this because it is important.

Follow the parsing specifications!

I was just on IRC, where we were discussing issues regarding our JavaScript program and the test files we have created. I've also noted a number of differences between the parsing section of the WebVTT spec and the syntax section. I haven't really gone over the rendering section.

Validator

Validation is making sure that the file has exactly the correct syntax. No variation from the syntax stated in the WebVTT syntax section is allowed. Validation has no concern with the parsing rules. The programmers of the validator must figure out for themselves how to make sure the syntax of a WebVTT file is entirely correct. To do this they should have a set of validation tests.

In terms of what we have already done in our class, the JavaScript program is for validation, and the tests we have written are only for the validator.

On a side note, my post about WebVTT is accurate to syntax specifications for WebVTT.

Parsing

Parsing is a different matter. According to Wikipedia a parser "is one of the components in an interpreter or compiler that checks for correct syntax and builds a data structure". For WebVTT, though, the parser's job is not to validate the syntax of a file. It is generally more flexible and permissive regarding syntax. It will do its best to understand the file, and it may tolerate errors by using defaults or discarding the invalid sections. The WebVTT parser will discard cues which are not valid, but it won't typically throw an error if it finds something invalid. Think of how an HTML page will still display despite not passing the W3C HTML Validator.

Here are some syntactically incorrect things the parser will allow.
  • Tags (<b>) do not need to close. The tag will apply for the remainder of a cue's textual payload but it will not carry over to the next cue.
  • The header can actually contain newline characters, but not blank lines.
  • Invalid tags are just passed over.
  • Invalid cue settings are just passed over.
One thing that will throw an error is a bad timestamp (example: 0a:00:01.000).

The parser will require a different set of tests. It will also likely need thorough unit testing.

Recommendations

Based on the above, it is my opinion that the "js parser" should be renamed to "js validator". The tests that we currently have should live in a directory structure that makes clear they are validator tests. The development of the C parser should also be clearly marked as separate from the validator, since it is for parsing a WebVTT file into data objects. Finally, a place for parser tests should be created.

Wednesday, 10 October 2012

WebVTT Test Bugs

First, based on the work I just did, I've made two corrections to my WebVTT specification post.

I've gone over all the fail tests for WebVTT that we have made in our open source course DPS909. I've found many problems between the WebVTT syntax and parsing sections. I expect changes will have to be made to both parts.

Test errors

I only checked if the parser will pass or fail, not if things work as intended (they probably don’t). Most of the time malformed expressions are simply skipped.

webvtt / test / spec / bad / tc2003-no_bank_line_before_cue.test
Test will pass. Second cue will be treated as part of payload for first cue.

webvtt / test / spec / bad / tc2004-cue_id_cannot_be_standalone.test
Test will pass. The blank line after “1” will cause parser to simply skip the “1” line.

webvtt / test / spec / bad / tc3050-missing_spaces.test
webvtt / test / spec / bad / tc3051-missing_space_left.test
webvtt / test / spec / bad / tc3052-missing_space_right.test
Test result unknown. The syntax states one space character is required. The “skip whitespace” step allows for zero whitespace.
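The gap between the two sections can be sketched with a pair of regular expressions. This is only an illustration: the names STRICT_TIMING, LENIENT_TIMING and timing_verdicts are made up here, and the strict pattern assumes the syntax requires at least one space or tab on each side of the arrow.

```python
import re

# Timestamp pattern: optional hours, then mm:ss.ttt (assumed for illustration).
TS = r"(?:[0-9]{2,}:)?[0-5][0-9]:[0-5][0-9]\.[0-9]{3}"

# Syntax reading: whitespace is required on both sides of "-->".
STRICT_TIMING = re.compile(rf"({TS})[ \t]+-->[ \t]+({TS})")

# Parser reading: "skip whitespace" also accepts zero whitespace characters.
LENIENT_TIMING = re.compile(rf"[ \t]*({TS})[ \t]*-->[ \t]*({TS})")


def timing_verdicts(line):
    """Return (accepted_by_strict_syntax, accepted_by_lenient_parser)."""
    return (STRICT_TIMING.match(line) is not None,
            LENIENT_TIMING.match(line) is not None)
```

A line such as "00:00.000-->00:01.000" is rejected by the strict pattern but accepted by the lenient one, which is why the expected result of these tests depends on which section is followed.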

webvtt / test / spec / bad / tc3054-arrows_wrong_direction.test
Test will pass. “<--” line will be treated as additional header information. When parser hits end of line, it will end without error.

webvtt / test / spec / bad / tc3203-missing_time_cue_symbol.test
Test will pass. “00:00.000 00:00.001” line will be treated as additional header information. When parser hits end of line, it will end without error.

webvtt / test / spec / bad / tc3210-invalid_cue_spacing.test
Test result unknown. The syntax states one space character is required. The “skip whitespace” step allows for zero whitespace.

webvtt / test / spec / bad / tc3211-random_symbols_in_cue.test
webvtt / test / spec / bad / tc3053-nonnums_in_timestamp.test
These test the same thing. There should be tests for “00:00:00.00a”, “00:00:0a.000”, “00:0a:00.000”, “0a:00:00.000”, “00:00.00a”, “00:0a.000”, “0a:00.000”
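As a rough sketch of what those extra tests would exercise, here is a small timestamp checker. The function name parse_timestamp and the list BAD_TIMESTAMPS are hypothetical, and the pattern assumes the "mm:ss.ttt" and "hh:mm:ss.ttt" forms with minutes and seconds limited to 00-59.

```python
import re

# Assumed timestamp shape: optional hours (two or more digits), mm:ss.ttt.
_TIMESTAMP = re.compile(r"(?:([0-9]{2,}):)?([0-5][0-9]):([0-5][0-9])\.([0-9]{3})")


def parse_timestamp(text):
    """Return the timestamp in milliseconds, or None if it is malformed."""
    match = _TIMESTAMP.fullmatch(text)
    if match is None:
        return None
    hours, minutes, seconds, millis = match.groups()
    total_seconds = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


# One malformed timestamp per digit group, in both the long and short forms.
BAD_TIMESTAMPS = [
    "00:00:00.00a", "00:00:0a.000", "00:0a:00.000", "0a:00:00.000",
    "00:00.00a", "00:0a.000", "0a:00.000",
]
```

Each entry in BAD_TIMESTAMPS puts the stray letter in a different component, so a checker that only inspects one component would be caught by at least one of them.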

webvtt / test / spec / bad / tc4002-cue_settings_vertical_02_bad.test
Test will pass. “vertical:lr” is valid.

webvtt / test / spec / bad / tc4003-cue_settings_vertical_03_bad.test
webvtt / test / spec / bad / tc4005-cue_settings_vertical_05_bad.test
webvtt / test / spec / bad / tc4006-cue_settings_vertical_06_bad.test
webvtt / test / spec / bad / tc4007-cue_settings_vertical_07_bad.test
Test will pass. Parser skips settings without “:” or where “:” is first or last character in setting.

webvtt / test / spec / bad / tc4004-cue_settings_vertical_04_bad.test
Test will pass. Parser will ignore settings that do not match valid options.

webvtt / test / spec / bad / tc4008-cue_settings_line_01_bad.test
webvtt / test / spec / bad / tc4010-cue_settings_line_03_bad.test
Test will pass. Parser will ignore settings that do not match valid options.

webvtt / test / spec / bad / tc4009-cue_settings_line_02_bad.test
webvtt / test / spec / bad / tc4012-cue_settings_line_05_bad.test
webvtt / test / spec / bad / tc4013-cue_settings_line_06_bad.test
webvtt / test / spec / bad / tc4014-cue_settings_line_07_bad.test
webvtt / test / spec / bad / tc4015-cue_settings_line_08_bad.test
Test will pass. Parser skips settings without “:” or where “:” is first or last character in setting.

webvtt / test / spec / bad / tc4016-cue_settings_multi_01_bad.test
Test will pass. Parser does not check for duplicate settings. It will use the last setting; earlier ones get overridden.

webvtt / test / spec / bad / tc4017-cue_settings_multi_02_bad.test
Test will pass. Parser splits only on the first colon of each setting. The second half, which contains a colon, will be treated as the value, and since it isn’t a valid value the setting will be skipped.

webvtt / test / spec / bad / tc4019-cue_settings_delimiter_bad_02.test
Test will pass. Every character after the end time offset timestamp is used for settings, and the settings are first split on spaces, so having no space after the timestamp has no effect. “^line” is not a valid option, so the parser just skips it.
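The cue settings behaviour described in the notes above can be summarised in a short sketch. The function parse_settings and the VALID_VALUES table are assumptions made for illustration; in particular the value patterns are simplified and are not copied from the spec.

```python
import re

# Simplified value patterns, assumed for illustration only.
VALID_VALUES = {
    "vertical": re.compile(r"rl|lr"),
    "line": re.compile(r"-?[0-9]+%?"),
    "position": re.compile(r"[0-9]+%"),
    "size": re.compile(r"[0-9]+%"),
    "align": re.compile(r"start|middle|end"),
}


def parse_settings(text):
    """Parse the text after the end timestamp into a dict of settings."""
    settings = {}
    for token in text.split():
        # Skip tokens with no colon, or with the colon first or last.
        if ":" not in token or token.startswith(":") or token.endswith(":"):
            continue
        # Split on the first colon only; the rest is the value.
        name, value = token.split(":", 1)
        pattern = VALID_VALUES.get(name)
        if pattern is None or not pattern.fullmatch(value):
            continue  # unknown name or invalid value: ignored, no error
        settings[name] = value  # a later duplicate overrides an earlier one
    return settings
```

Nothing in this sketch ever raises an error, which matches the pattern in the notes: every malformed setting is skipped and the cue itself survives.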

webvtt / test / spec / bad / tc4503-cue_settings_align_bad_value.test
Test will pass. Parser will ignore settings that do not match valid options.

webvtt / test / spec / bad / tc4504-cue_settings_align_no_colon.test
webvtt / test / spec / bad / tc4505-cue_settings_align_no_setting.test
webvtt / test / spec / bad / tc4506-cue_settings_align_no_value.test
webvtt / test / spec / bad / tc4507-cue_settings_align_wrong_colon.test
webvtt / test / spec / bad / tc4603-cue_settings_position_no_colon.test
webvtt / test / spec / bad / tc4605-cue_settings_position_size_no_setting.test
webvtt / test / spec / bad / tc4606-cue_settings_position_wrong_colon.test
Test will pass. Parser skips settings without “:” or where “:” is first or last character in setting.

webvtt / test / spec / bad / tc4604-cue_settings_position_no_percent.test
Test will pass. Parser will ignore settings that do not match valid options.

webvtt / test / spec / bad / tc4703-cue_settings_size_no_colon.test
webvtt / test / spec / bad / tc4705-cue_settings_size_wrong_colon.test
Test will pass. Parser skips settings without “:” or where “:” is first or last character in setting.

webvtt / test / spec / bad / tc4704-cue_settings_size_no_percent.test
Test will pass. Parser will ignore settings that do not match valid options.

webvtt / test / spec / bad / tc5028-cue_text_format.test
Test will pass. Parser will consider “<i are</i>” a single tag with name “i” and annotation “are</i”. The annotation will be ignored and the remainder of the text in the payload will be in the WebVTT Italic Object (it will be italicized).

webvtt / test / spec / bad / tc5029-cue_text_format.test
Test will pass. Parser ignores tags that do not match valid names.

webvtt / test / spec / bad / tc5030-cue_text_format.test
Test will pass. Parser will return the tag name at either “>” or end of input (they are treated the same). Parser ignores tags that do not match valid names. Note that end tags have no annotations, so “i in New York City” will be considered the tag name.
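The tag cases above come down to how a single tag is read. Here is a minimal sketch, with a hypothetical function read_tag; it does not split class names off the tag name and is only meant to show where the name ends and the annotation begins.

```python
import re

# A start tag's name runs up to the first whitespace character.
_NAME = re.compile(r"[^ \t\r\n\f]*")


def read_tag(text, start):
    """Read the tag that begins at text[start] == "<".

    Returns (kind, name, annotation, next_index).
    """
    end = text.find(">", start)
    if end == -1:
        end = len(text)  # end of input is treated the same as ">"
    body = text[start + 1:end]
    next_index = min(end + 1, len(text))
    if body.startswith("/"):
        # End tags have no annotation: everything up to ">" is the name.
        return ("end", body[1:], "", next_index)
    name = _NAME.match(body).group(0)
    annotation = body[len(name):].strip()
    return ("start", name, annotation, next_index)
```

With this reading, “<i are</i>” is one start tag named “i”, because the first “>” the reader meets is the one that was meant to close “</i>”.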

webvtt / test / spec / bad / tc5031-cue_text_format.test
Test will pass. Parser does not require end tags. Parser creates a node object after a tag and makes the current object its parent. Parser returns the root object list when it reaches the end of the payload text.

webvtt / test / spec / bad / tc5032-cue_text_format.test
Test will pass. “>” outside of tag is treated normally (same as &gt;).

webvtt / test / spec / bad / tc5033-cue_text_format.test
Test will pass. Parser will return an empty string for the tag name if a start tag reaches the end of the payload input before “>”. Parser will ignore tags with names that don’t match valid names.

webvtt / test / spec / bad / tc5034-cue_text_format.test
Test will pass. Parser will ignore end tags where the previous tag is not the matching start tag. “</b>” will end its tag and the rest will remain italicized.
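The handling of missing and mismatched end tags is easier to see as a stack. The sketch below is an assumption-laden simplification: build_tree, TOKEN and VALID_TAGS are invented names, annotations and timestamp tags are dropped, and nodes are plain dicts rather than the objects the spec defines.

```python
import re

VALID_TAGS = {"c", "i", "b", "u", "v", "ruby", "rt"}

# Either a tag (a missing ">" at end of input is tolerated) or a run of text.
TOKEN = re.compile(r"<(/?)([^\s>]*)([^>]*)>?|[^<]+")


def build_tree(payload):
    """Return the list of root nodes for a cue payload."""
    root = {"name": None, "children": []}
    stack = [root]
    for match in TOKEN.finditer(payload):
        token = match.group(0)
        if not token.startswith("<"):
            stack[-1]["children"].append(token)
        elif match.group(1) == "/":
            # End tag: the name is everything up to ">". It only closes the
            # current node when the names match; otherwise it is ignored.
            name = match.group(2) + match.group(3)
            if len(stack) > 1 and stack[-1]["name"] == name:
                stack.pop()
        else:
            name = match.group(2)
            if name not in VALID_TAGS:
                continue  # unknown tag names are skipped
            if name == "rt" and stack[-1]["name"] != "ruby":
                continue  # rt is only kept directly inside ruby
            node = {"name": name, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
    # Unclosed tags need no special handling: the tree is already built.
    return root["children"]
```

For the payload "<i><b>one</i>two</b>three", the “</i>” is ignored because the current node is the bold one, “</b>” then closes it, and "three" ends up inside the italic node.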

webvtt / test / spec / bad / tc5035-disallow_annotation_italic.test
webvtt / test / spec / bad / tc5036-disallow_annotation_underline.test
webvtt / test / spec / bad / tc5037-disallow_annotation_bold.test
webvtt / test / spec / bad / tc5038-disallow_annotation_class.test
webvtt / test / spec / bad / tc5039-disallow_annotation_ruby.test
Test will pass. Parser allows annotations on all tags but will ignore annotations except those on voice tags (“<v>” tags).

webvtt / test / spec / bad / tc5040-disallow_annotation_time_stamp.test
Test will pass. Parser ignores timestamp tags with characters following the timestamp within the tag.

webvtt / test / spec / bad / tc5043-incorrect_cue_class.test
Test will pass. Parser ignores tags that do not match valid names.

webvtt / test / spec / bad / tc5057-incorrect_space_character_escape_nbp.test
webvtt / test / spec / bad / tc5058-incorrect_space_character_escape_nsp.test
webvtt / test / spec / bad / tc5059-incorrect_space_character_escape_bsp.test
webvtt / test / spec / bad / tc5060-incorrect_space_character_escape_bp.test
webvtt / test / spec / bad / tc5061-incorrect_space_character_escape_b.test
webvtt / test / spec / bad / tc5062-incorrect_space_character_escape_s.test
webvtt / test / spec / bad / tc5063-incorrect_space_character_escape_ns.test
webvtt / test / spec / bad / tc5064-incorrect_space_character_escape_np.test
webvtt / test / spec / bad / tc5065-incorrect_space_character_escape_sp.test
webvtt / test / spec / bad / tc5067-incorrect_space_character_escape_p.test
webvtt / test / spec / bad / tc5068-incorrect_space_character_escape_bs.test
webvtt / test / spec / bad / tc5069-incorrect_left_to_right_character_escape_lr.test
webvtt / test / spec / bad / tc5070-incorrect_left_to_right_character_escape_l.test
webvtt / test / spec / bad / tc5071-incorrect_left_to_right_character_escape_lm.test
webvtt / test / spec / bad / tc5072-incorrect_left_to_right_character_escape_rm.test
webvtt / test / spec / bad / tc5073-incorrect_left_to_right_character_escape_m.test
webvtt / test / spec / bad / tc5074-incorrect_left_to_right_character_escape_r.test
webvtt / test / spec / bad / tc5075-incorrect_right_to_left_character_escape_lm.test
webvtt / test / spec / bad / tc5077-incorrect_left_to_right_character_escape_rl.test
webvtt / test / spec / bad / tc5078-incorrect_right_to_left_character_escape_rl.test
webvtt / test / spec / bad / tc5079-incorrect_ampersand_without_escape.test
webvtt / test / spec / bad / tc5080-incorrect_ampersand_escape_a.test
webvtt / test / spec / bad / tc5081-incorrect_ampersand_escape_am.test
webvtt / test / spec / bad / tc5082-incorrect_ampersand_escape_mp.test
webvtt / test / spec / bad / tc5083-incorrect_ampersand_escape_p.test
webvtt / test / spec / bad / tc5084-incorrect_ampersand_escape_ap.test
webvtt / test / spec / bad / tc5085-incorrect_less_than_escape_l.test
webvtt / test / spec / bad / tc5086-incorrect_less_than_escape_t.test
webvtt / test / spec / bad / tc5087-incorrect_greater_than_escape_g.test
webvtt / test / spec / bad / tc5088-incorrect_space_character_escape_nbs.test
webvtt / test / spec / bad / tc5089-incorrect_space_character_escape_nb.test
webvtt / test / spec / bad / tc5090-incorrect_space_character_escape_n.test
Test will pass. Parser will treat an invalid escape sequence as normal text.
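A minimal sketch of that behaviour, assuming the six escapes named in the test files above (amp, lt, gt, lrm, rlm, nbsp) are the only valid ones. The names ESCAPES and decode_escapes are hypothetical.

```python
import re

# Escapes assumed valid for this sketch; anything else stays as written.
ESCAPES = {
    "&amp;": "&",
    "&lt;": "<",
    "&gt;": ">",
    "&lrm;": "\u200e",   # left-to-right mark
    "&rlm;": "\u200f",   # right-to-left mark
    "&nbsp;": "\u00a0",  # no-break space
}

_ESCAPE = re.compile(r"&[A-Za-z0-9]*;")


def decode_escapes(text):
    """Replace known escapes and leave unknown ones as literal text."""
    return _ESCAPE.sub(lambda m: ESCAPES.get(m.group(0), m.group(0)), text)
```

An input such as "&nbp;" or a bare "&" passes through unchanged, so none of the files listed above would make a parser built this way report an error.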
webvtt / test / spec / bad / tc5115-multi_component_multi_line_bad_crlf.test
webvtt / test / spec / bad / tc5116-multi_component_multi_line_bad_cr.test
webvtt / test / spec / bad / tc5117-multi_component_multi_line_bad_lf.test
webvtt / test / spec / bad / tc5118-multi_escape_multi_line_bad_crlf.test
webvtt / test / spec / bad / tc5119-multi_escape_multi_line_bad_cr.test
webvtt / test / spec / bad / tc5120-multi_escape_multi_line_bad_lf.test
webvtt / test / spec / bad / tc5121-component_and_escape_multi_line_bad_crlf.test
webvtt / test / spec / bad / tc5122-component_and_escape_multi_line_bad_cr.test
webvtt / test / spec / bad / tc5123-component_and_escape_multi_line_bad_lf.test
Test will pass. Parser discards malformed cues.

webvtt / test / spec / bad / tc5116-multi_component_multi_line_bad_cr.test
webvtt / test / spec / bad / tc5119-multi_escape_multi_line_bad_cr.test
webvtt / test / spec / bad / tc5122-component_and_escape_multi_line_bad_cr.test
webvtt / test / spec / bad / tc5123-component_and_escape_multi_line_bad_lf.test
Python comment is badly written. GitHub fails to display it properly.

webvtt / test / spec / bad / tc5301-voice_cue_no_annotation.test
Test will pass. Parser will use an empty string if there is no annotation.

webvtt / test / spec / bad / tc5306-ruby_missing_open_tag.test
Test will pass. Parser will skip rt tags not contained by a ruby tag. Parser will skip end tags where the previous tag was not the matching start tag.

webvtt / test / spec / bad / tc5307-ruby_missing_close_tag.test
Test will pass. Parser does not require end tags. Parser creates a node object after a tag and makes the current object its parent. Parser returns the root object list when it reaches the end of the payload text.

webvtt / test / spec / bad / tc5324-ruby_multiple_rt_optional_line_terminator.test
Test will pass. Cue will end before “</ruby>” and “</ruby>” will be considered a new cue, which will be discarded.

webvtt / test / spec / bad / tc5325-timestamp_less_than_previous.test
webvtt / test / spec / bad / tc5326-timestamp_less_than_start_timestamp.test
webvtt / test / spec / bad / tc5327-timestamp_greater_than_end_timestamp.test
Test result unknown. Parser does not check if timestamps in timestamp tags or cue timings are in the order specified in the syntax section of the spec. If the parser is not changed, then according to my reading of the HTML5 media spec, either the cues and timestamps are sorted into the correct order, or bad ones are discarded.
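If the ordering rules were enforced, the check behind these three tests might look like the sketch below. It is an assumption based on the test names, not on parser text: the function timestamp_order_errors is hypothetical, and times are plain millisecond integers.

```python
def timestamp_order_errors(start, end, inner):
    """Check timestamp tags inside one cue against the cue's own timings.

    start and end are the cue times; inner is the list of timestamp-tag
    values in the order they appear in the payload.
    """
    errors = []
    previous = None
    for value in inner:
        if value <= start:
            errors.append((value, "not after cue start"))
        elif value >= end:
            errors.append((value, "not before cue end"))
        elif previous is not None and value <= previous:
            errors.append((value, "not after previous timestamp"))
        previous = value
    return errors
```

A check like this belongs more naturally in a validator than in the parser, since the parser as described has no step that compares timestamps.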

webvtt / test / spec / bad / tc_1006_header-no-new-line.test
Test will pass. Parser allows newline in the header. A blank line (two newlines in a row) will end the header and start the cues. However, if the line contains the string “-->”, the parser will take that to be the timing line of the first cue.
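The header rule can be sketched as a function that splits the file's lines into header and cue data. The name split_header is hypothetical, and raising an error for a missing "WEBVTT" signature is my assumption, not something stated in these notes.

```python
def split_header(lines):
    """Return (header_lines, remaining_lines) for a list of file lines."""
    if not lines or not lines[0].startswith("WEBVTT"):
        raise ValueError("missing WEBVTT signature")  # assumed behaviour
    for index in range(1, len(lines)):
        line = lines[index]
        if "-->" in line:
            # Taken as the timing line of the first cue; the header ends here.
            return lines[1:index], lines[index:]
        if line == "":
            # A blank line ends the header and the cues start after it.
            return lines[1:index], lines[index + 1:]
    # No blank line and no "-->": everything after the first line is header.
    return lines[1:], []
```

This also shows why a line with the arrow reversed is harmless to the parser: it never contains "-->", so it is collected as header text and the input simply ends.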

webvtt / test / spec / bad / tc_1010_new-linestart.test
Test will pass. Parser does not require a newline, or anything, after “WEBVTT”.

webvtt / test / spec / bad / tc_1016_missing_new_line.test
Test will pass. The text payload can span more than one line, but cannot contain blank lines.

webvtt / test / spec / bad / tc_1017_line_break.test
webvtt / test / spec / bad / tc_1018_missing_cue_identifier.test
Test will pass. Malformed cues are discarded.

webvtt / test / spec / bad / tc_1019_missing_req_blank_line.test
Test will pass. Second line has “-->” string and will be taken as start of a cue, with that line being the timing line.