Thursday, 31 January 2013

WebVTT Parser Specification IRC Chat

I when onto #whatwg on irc.freenode.org. It is the main chat room for WHATWG. Here is their IRC page. Please note the regulars.

Log: http://krijnhoetmer.nl/irc-logs/whatwg/20130131

  1. # [18:22] <KyleBarnhart> Hi. Is there someone of authority how can definitively answer a question regarding the implementation of the WebVTT text track specification?
  2. # [18:22] <KyleBarnhart> Specifically...
  3. # [18:24] <KyleBarnhart> I am part of a team who is trying to implement the standard. Whereas I take the view that the parser section of the specification is to be adhered to when writing an implementation, they take the view that the implementation is should be written to the syntax rules and the parser section can be ignored.
  4. # [18:25] <Ms2ger> The parser section is the only relevant one for an implementation
  5. # [18:26] <KyleBarnhart> It is one. Not a validator. It is parsing the format for the eventual use in Mozilla.
  6. # [18:26] <Ms2ger> Right
  7. # [18:26] <Ms2ger> Then you want to follow the parsing section
  8. # [18:27] <Ms2ger> I understand your first patches are due today?
  9. # [18:28] <KyleBarnhart> I believe.
  10. # [18:28] <KyleBarnhart> I'm not responsible for that. I'm handling testing at the moment.
  11. # [18:29] <Ms2ger> Great, we'll need quite a few of those :)
  12. # [18:34] <KyleBarnhart> Is it okay if the parser (and I mean in the final future version) differs in some significant ways from the specification? Such as when to and not to discard a cue, and accepting input that the specification would not allow.
  13. # [18:43] <Ms2ger> If the code is to end up in Gecko, it had better follow the specification to the letter
  14. # [18:44] <Hixie> KyleBarnhart: what do they think the spec is for, if not following?
  15. # [18:45] <MikeSmith> inspiration
  16. # [18:46] <Ms2ger> Suggestions

...

  1. # [18:56] <KyleBarnhart> Thank you very much for your answer. I'm going to go now and discuss this with our team.
  2. # [18:56] <Hixie> KyleBarnhart: please don't hesitate to ask any other questions
  3. # [18:57] <KyleBarnhart> Thank you :)
  4. # [18:57] <Velmont> KyleBarnhart: What's also good to know is that specs are never really set in stone.
  5. # [18:58] <Hixie> MikeSmith: when you have a moment and happen to be at bugzilla's admin page again, if you could add a "Needs Research" between "2022 Q4" and "Needs Impl Interest", that'd be great
  6. # [18:58] <Velmont> KyleBarnhart: It should follow what implementations do as well. So if there's something strange there it might be a bug in the spec.
  7. # [18:58] <Hixie> KyleBarnhart: yeah, what Velmont said. If there's a reason to implement something other than the spec, we should change the spec.
  8. # [18:58] <Hixie> KyleBarnhart: the reason to have a spec is to make sure everyone does the same thing, but so long as they all do the same thing, it doesn't really matter what that thing is exactly
  9. # [18:58] <Velmont> KyleBarnhart: Although there is already implementions of that one, so hopefully there shouldn't be too much :-)

WebVTT File Structure Unit Tests Review

As usual I've changed the comments to reflect the specification for the parser. Fortunately a few of the comments were correct and I didn't have to change them.

File Structure Unit Tests Review (filestructure_unittest.cpp)

  • BlankFile
    • Added false to the load function. The parser fails if the first line has less that 6 characters.
    • Since it should fail there should be an error. I've added "WEBVTT_MALFORMED_TAG" error check.
  • BlankFileWithBOM
    • Added false to the load function. The parser fails if the first line has less that 6 characters.
    • Since it should fail there should be an error. I've added "WEBVTT_MALFORMED_TAG" error check.
  • HeaderNoNewLine
    • Nothing wrong. First comment ever that I didn't have to change.
  • MultiCueNoNewlineBetweenCues
    • There should be two cues. Changed load function cue count check to 2.
  • BOMGarbageData
    • This throws an error. Added "WEBVTT_MALFORMED_TAG" error check.
  • BOMTabWebvtt
    • Tab before WEBVTT string throws an error. Added "WEBVTT_MALFORMED_TAG" error check.
  • MissingNewLineBetweenCues
    • Duplicate of "MultiCueNoNewlineBetweenCues" test.
    • Removed test.

WebVTT Cue Timings Unit Test Review

I should note this file had almost 3000 lines.

I've changed all the comments to reflect the parser rules since that is what the tests need to match. I've added comments for any syntax error. The comments should be separate since they are from different parts of the specifications and regarding different things.

Cue Timings Unit Test Review (cuetimes_unittest.cpp)

  • FromOneDigitMinute
    • The cue will be bad and discarded.
      • Changed cues returned from 1 to 0.
      • Removed timestamp assert.
  • FromThreeDigitMinute
    • The cue will be bad and discarded.
      • Changed cues returned from 1 to 0.
      • Removed timestamp assert.
  • FromOneDigitSecond
    • The cue will be bad and discarded.
      • Changed cues returned from 1 to 0.
      • Removed timestamp assert.
  • FromThreeDigitSecond
    • The cue will be bad and discarded.
      • Changed cues returned from 1 to 0.
      • Removed timestamp assert
  • FromTwoDigitMillisecond
    • The cue will be bad and discarded.
      • Changed cues returned from 1 to 0.
      • Removed timestamp assert
  • FromFourDigitMillisecond
    • The cue will be bad and discarded.
      • Changed cues returned from 1 to 0.
      • Removed timestamp assert
  • FromMinuteGreaterThan59
    • The cue will be bad and discarded.
      • Changed cues returned from 1 to 0.
      • Removed timestamp assert
  • FromSecondGreaterThan59
    • The cue will be bad and discarded.
      • Changed cues returned from 1 to 0.
      • Removed timestamp assert
  • FromMillisecondGreaterThan999
    • Technically this is the same test as "FromFourDigitMillisecond", but I left it in.
    • The cue will be bad and discarded.
      • Changed cues returned from 1 to 0.
      • Removed timestamp assert
  • FromBadDelimiterMinuteSecond
    • To for the deliminator between minute and second, the hour must not be 2 digits, otherwise it checked for the deliminator between second and millisecond.
      • Changed hour from 00 to 000 in "bad-delimiter-minute-second.vtt".
  • FromBadDelimiterHourMinute
    • Keep consistant with "FromBadDelimiterMinuteSecond" test.
      • Changed hour from 00 to 000 in "bad-delimiter-hour-minute.vtt".
  • FromBadDelimiterSecondMillisecond
    • Keep consistant with "FromBadDelimiterMinuteSecond" test.
      • Changed hour from 00 to 000 in "bad-delimiter-second-millisecond.vtt".
  • UntilMinuteWithinLowBoundary
    • Minutes in timestamp not consistent with similar tests. It was the same as in "UntilMinuteWithinHighBoundary".
      • Changed "minute-within-low-boundary.vtt"
      • Changed test to match vtt file.
  • UntilMinuteWithinHighBoundary
    • Minutes in timestamp not consistent with similar tests. It was the same as in "UntilMinuteWithinLowBoundary".
      • Changed "minute-within-high-boundary.vtt".
      • Changed test to match vtt file.
  • UntilSecondWithinLowBoundary
    • Seconds in timestamp not consistent with similar tests. It was the same as in "UntilSecondWithinHighBoundary".
      • Changed "second-within-low-boundary.vtt"
      • Changed test to match vtt file.
  • UntilSecondWithinHighBoundary
    • Seconds in timestamp not consistent with similar tests. It was the same as in "UntilSecondWithinLowBoundary".
      • Changed "second-within-high-boundary.vtt".
      • Changed test to match vtt file.
  • UntilMillisecondWithinLowBoundary
    • Milliseconds in timestamp not consistent with similar tests. It was the same as in "UntilMillisecondWithinHighBoundary".
      • Changed "millisecond-within-low-boundary.vtt"
      • Changed test to match vtt file.
  • UntilMillisecondWithinHighBoundary
    • Milliseconds in timestamp not consistent with similar tests. It was the same as in "UntilMillisecondWithinLowBoundary".
      • Changed "millisecond-within-high-boundary.vtt".
      • Changed test to match vtt file.
  • UntilOneDigitMinute
    • The cue will be bad and discarded.
      • Changed cues returned from 1 to 0.
      • Removed timestamp assert.
  • UntilThreeDigitMinute
    • The cue will be bad and discarded.
      • Changed cues returned from 1 to 0.
      • Removed timestamp assert.
  • UntilOneDigitSecond
    • The cue will be bad and discarded.
      • Changed cues returned from 1 to 0.
      • Removed timestamp assert.
  • UntilThreeDigitSecond
    • The cue will be bad and discarded.
      • Changed cues returned from 1 to 0.
      • Removed timestamp assert
  • UntilTwoDigitMillisecond
    • The cue will be bad and discarded.
      • Changed cues returned from 1 to 0.
      • Removed timestamp assert
  • UntilFourDigitMillisecond
    • The cue will be bad and discarded.
      • Changed cues returned from 1 to 0.
      • Removed timestamp assert
  • UntilMinuteGreaterThan59
    • The cue will be bad and discarded.
      • Changed cues returned from 1 to 0.
      • Removed timestamp assert
  • UntilSecondGreaterThan59
    • The cue will be bad and discarded.
      • Changed cues returned from 1 to 0.
      • Removed timestamp assert
  • UntilMillisecondGreaterThan999
    • Technically this is the same test as "FromFourDigitMillisecond", but I left it in.
    • The cue will be bad and discarded.
      • Changed cues returned from 1 to 0.
      • Removed timestamp assert
  • UntilBadDelimiterMinuteSecond
    • To for the deliminator between minute and second, the hour must not be 2 digits, otherwise it checked for the deliminator between second and millisecond.
      • Changed hour from 00 to 000 in "bad-delimiter-minute-second.vtt".
  • UntilBadDelimiterHourMinute
    • Keep consistant with "UntilBadDelimiterMinuteSecond" test.
      • Changed hour from 00 to 000 in "bad-delimiter-hour-minute.vtt".
  • UntilBadDelimiterSecondMillisecond
    • Keep consistant with "UntilBadDelimiterMinuteSecond" test.
      • Changed hour from 00 to 000 in "bad-delimiter-second-millisecond.vtt".
  • BareMinimum
    • Added timestamps asserts since the goal of these tests is to test timestamps.
  • BareMinimumWithContent
    • Added timestamps asserts since the goal of these tests is to test timestamps.
  • HighestValues
    • Added timestamps asserts since the goal of these tests is to test timestamps.
  • SpaceTabs
    • Duplicate test. Removed. See separator tests.
  • Tabs
    • Duplicate test. Removed. See separator tests.
  • Hours
    • Added timestamps asserts since the goal of these tests is to test timestamps.
  • MultiCues
    • Added timestamps asserts since the goal of these tests is to test timestamps.
  • NestedCues
    • Added timestamps asserts since the goal of these tests is to test timestamps.
  • MissingSpaces
    • Duplicate test. Removed. See separator tests.
  • MissingSpaceLeft
    • Duplicate test. Removed. See separator tests.
  • MissingSpaceRight
    • Duplicate test. Removed. See separator tests.
  • NoNumsInTimeStamp
    • Duplicate test. Removed. See from and until in timings tests (this file).
  • ArrowsWrongDirection
    • Duplicate test. Removed. See separator tests.
  • BadTimeStamps1
    • Duplicate test. Removed. See deliminator in from and until in timings tests (this file).
  • BadTimeStamps2
    • Duplicate test. Removed. See deliminator in from and until in timings tests (this file).
  • TimeStampHoursOmitted renamed to StartTimeStampHoursOmitted.
  • EndTimeStampHoursOmitted  added.

WebVTT Cue Identifier and Timings Unit Test Reivew

I went over the cue identifier tests and general timings tests again to updated the file for the new repo.

I noticed some important mistakes about treating the second cue differently than the first and I've removed those tests. The first and second cue are treated the same, except if there is no blank line between the first cue's identifier and the WEBVTT string, then the identifier is treated as the file header and the cue is processed as if it had no header.

I also fixed some comments to reflect that the first and second cues are treated the same.

I ran the tests. Many of the tests crash (beyond the 4 I've changed)  since all the DISABLEDs have been removed. So I did not do anything to point out which one crashes.

The following are changed other than removing the unnecessary tests I added.

Cue Identifier Unit Tests

Identifier Line Ending Unit Tests (cilineendings_unittest.cpp)

  • LongStringLF
    • Shortend strings. Long identifier test is seperate. We shouldn't tests two things.
    • Renamed to "TwoLinesLF"
  • LongStringCRLF
    • Shortend strings. Long identifier test is seperate. We shouldn't tests two things.
    • Rename to "TwoLinesCRLF"

Wednesday, 23 January 2013

WebVTT Cue Timings General Unit Test Review

I am splitting these reviews up for one per C++ file. This should make them easier to deal with.

Changed all comments to reflect rules for parsers.

Added checks to make sure the start and end times are correct.

Fixed spelling "Seperator" to "Separator"

Unused vtt files deleted:
  • cue-times/separator/missing_both_spaces_bad.vtt
  • cue-times/separator/missing_separator_bad.vtt

Cue Timings General (ctgenstructure_unittest.cpp)

Changes

  • CueTimingsSeperator1LeftMix
    • Changes test file to actual filename "cuetimings_separator_1_space_left_mix.vtt"
  • CueTimingsSeperator1RightMix
    • Changes test file to actual filename "cuetimings_separator_1_space_right_mix.vtt"
  • CueTimingsSeperatorNTab
    • Changes test file to actual filename "cuetimings_separator_n_tab.vtt"
  • CueTimingsSeperatorNLMix
    • Changes test file to actual filename "cuetimings_separator_n_spaces_left_mix.vtt"
  • CueTimingsSeperatorNRMix
    • Changes test file to actual filename "cuetimings_separator_n_spaces_right_mix.vtt"
  • CueTimingsSeperatorXMix
    • Changes test file to actual filename "cuetimings_separator_n_spaces_tabs_on_both_sides.vtt"

Added

  • CueTimingsSeparatorMissingLeft
    •  Parser does not require whitespace around separator.
    • Added file "cuetimings_separator_0_spaces_left.vtt"
  • CueTimingsSeperatorMissingRight
    •  Parser does not require whitespace around separator.
    • Added file "cuetimings_separator_0_spaces_right.vtt"
  • CueTimingsSeperatorMissingBoth
    •  Parser does not require whitespace around separator.
    • Added file "cuetimings_separator_0_spaces.vtt"
  • CueTimingsArrowMalformed
    • If the first or second line does not contain --> the cue is bad. The parser will discard bad cues.
    • Added file "cuetimings_arrow_bad.vtt"
  • CueTimingsArrowMissing
    • If the first or second line does not contain --> the cue is bad. The parser will discard bad cues.
    • Added file "cuetimings_arrow_missing.vtt"

Failing Test

[  FAILED  ] CueTimingsGeneralStructure.CueTimingsSeperator1LeftMix
[  FAILED  ] CueTimingsGeneralStructure.CueTimingsSeperator1RightMix
[  FAILED  ] CueTimingsGeneralStructure.CueTimingsSeperatorNSpace
[  FAILED  ] CueTimingsGeneralStructure.CueTimingsSeperatorNTab
[  FAILED  ] CueTimingsGeneralStructure.CueTimingsSeperatorNLMix
[  FAILED  ] CueTimingsGeneralStructure.CueTimingsSeperatorNRMix
[  FAILED  ] CueTimingsGeneralStructure.CueTimingsSeperatorXMix
[  FAILED  ] CueTimingsGeneralStructure.CueTimingsSeparatorMissingLeft
[  FAILED  ] CueTimingsGeneralStructure.CueTimingsSeparatorMissingRight
[  FAILED  ] CueTimingsGeneralStructure.CueTimingsSeperatorMissingBoth
[  FAILED  ] CueTimingsGeneralStructure.CueTimingsArrowMalformed
[  FAILED  ] CueTimingsGeneralStructure.CueTimingsArrowMissing