Sunday, 14 October 2012

Validation and Parsing WebVTT

(11/28/2012) Bumping this because it is important.

Follow the parsing specifications!

I was just on IRC and we were discussing issues with our JavaScript program and the test files we have created. I've also noticed a number of differences between the parsing section of the WebVTT spec and the syntax section. I haven't really gone over the rendering section.


Validation is making sure that the file has exactly the correct syntax. No variation from the syntax stated in the WebVTT syntax section is allowed. Validation has no concern with the parsing rules. The programmers of the validator must figure out for themselves how to make sure the syntax of a WebVTT file is entirely correct. To do this they should have a set of validation tests.
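
To make the strictness concrete, here is a minimal sketch of what a validator check could look like. It is not our JavaScript program: the names `validate`, `TIMESTAMP` and `TIMING_LINE` are made up for illustration, and the patterns are a simplified reading of the syntax section (signature line and cue timing lines only, LF line endings assumed).

```python
import re

# Simplified timestamp: optional hours (two or more digits), then mm:ss.ttt.
TIMESTAMP = r"(?:\d{2,}:)?[0-5]\d:[0-5]\d\.\d{3}"
# A timing line: start, "-->", end, then optional cue settings.
TIMING_LINE = re.compile(
    rf"^({TIMESTAMP})[ \t]+-->[ \t]+({TIMESTAMP})([ \t]+.*)?$"
)


def validate(text):
    """Return a list of (line_number, message), one per violation found.

    Hypothetical helper: a validator reports every deviation and
    never tries to recover from it.
    """
    errors = []
    lines = text.split("\n")
    # The first line must be the signature, optionally followed by a
    # space or tab and more text.
    if not re.match(r"^WEBVTT([ \t].*)?$", lines[0]):
        errors.append((1, "file must start with the WEBVTT signature"))
    # Any line containing "-->" is treated as a cue timing line and
    # must match the strict pattern exactly.
    for number, line in enumerate(lines[1:], start=2):
        if "-->" in line and not TIMING_LINE.match(line):
            errors.append((number, "malformed cue timing line"))
    return errors
```

An empty list means the file passed these two checks; anything else is a list of line numbers and complaints, which is the kind of output a set of validation tests would assert against.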

In terms of what we have already done in our class: the JavaScript program is for validation, and the tests we have written are only for the validator.

On a side note, my post about WebVTT is accurate to the syntax specification for WebVTT.


Parsing is a different matter. According to Wikipedia a parser "is one of the components in an interpreter or compiler that checks for correct syntax and builds a data structure". For WebVTT, though, its job is not to police the syntax of a file. It is generally more flexible and permissive regarding syntax. It will do its best to understand the file, and it may tolerate errors by using defaults or discarding the invalid sections. The WebVTT parser will discard cues which are not valid, but it won't typically throw an error if it finds something invalid. Think of it like how an HTML page will still display despite not passing the W3C HTML Validator.

Here are some syntactically incorrect things the parser will allow.
  • Tags (<b>) do not need to close. The tag will apply for the remainder of a cue's textual payload but it will not carry over to the next cue.
  • The header can actually contain newline characters, but not blank lines.
  • Invalid tags are just passed over.
  • Invalid cue settings are just passed over.
One thing that will throw an error is a bad timestamp (example: 0a:00:01.000).
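
The permissive behaviour can be sketched roughly like this. It is only an illustration, not the C parser and not the spec's algorithm: `parse`, `to_seconds` and `KNOWN_SETTINGS` are hypothetical names, LF line endings are assumed, tags in the payload are left as raw text, and a bad timestamp is recorded as a problem and the cue dropped (whether a real parser should abort instead is left open here).

```python
import re

TIMESTAMP = re.compile(r"^(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})$")
# Setting names assumed for this sketch; anything else is passed over.
KNOWN_SETTINGS = {"vertical", "line", "position", "size", "align"}


def to_seconds(stamp):
    """Convert a timestamp string to seconds, or return None if it is bad."""
    match = TIMESTAMP.match(stamp)
    if match is None:
        return None
    hours, minutes, seconds, millis = match.groups()
    return (int(hours or 0) * 3600 + int(minutes) * 60
            + int(seconds) + int(millis) / 1000)


def parse(text):
    """Return (cues, problems); invalid pieces are skipped, not fatal."""
    cues, problems = [], []
    # Blocks are separated by blank lines. The first block is the header,
    # which may span several lines as long as none of them is blank.
    for block in re.split(r"\n{2,}", text.strip())[1:]:
        lines = block.split("\n")
        if "-->" not in lines[0]:
            lines = lines[1:]  # first line was an optional cue identifier
        if not lines or "-->" not in lines[0]:
            continue  # not a cue, pass over it
        left, _, right = lines[0].partition("-->")
        right_parts = right.split()
        start = to_seconds(left.strip())
        end = to_seconds(right_parts[0]) if right_parts else None
        if start is None or end is None:
            problems.append(f"discarded cue with bad timing: {lines[0]}")
            continue
        settings = {}
        for item in right_parts[1:]:
            name, sep, value = item.partition(":")
            if sep and value and name in KNOWN_SETTINGS:
                settings[name] = value  # unknown or malformed ones are ignored
        cues.append({"start": start, "end": end, "settings": settings,
                     "text": "\n".join(lines[1:])})
    return cues, problems
```

Fed a file with a two-line header, one cue carrying an unclosed <b> and a made-up setting, and one cue starting at 0a:00:01.000, this returns the first cue with only its recognised settings and reports the second as a problem instead of giving up on the whole file.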

The parser will require a different set of tests. It will also likely need thorough unit testing.


Now, based on the above information, it is my opinion that the "js parser" should be renamed to "js validator". The set of tests that we currently have should be in a directory structure that makes clear that these are validator tests. The development of the C parser should also be clearly marked as separate from the validator, since it is for parsing a WebVTT file into data objects. And a place for the parser's tests should be created.
