Missing texts and invalid characters #3

Dev-iL · 2017-08-22T07:53:37Z

Following this pull-request comment, please find attached two files (with renamed extensions) that can help reproduce and demonstrate the problem.

Some noticeable problems:

The 1st subtitle (line 9 in the .caption; line 3 in the .srt) gets a line break at ].
The 2nd subtitle (line 13 in the .caption; line 7 in the .srt) has a b=+ that shouldn't be there.
After subtitle 25 (line 103 in the .caption; line 109 in the .srt), the .srt file has some stray characters between the subtitles.
(...and more of the above).
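
To list every occurrence of such stray characters rather than spotting them by eye, a small scan of the generated .srt can help. The sketch below is only an illustration: `SrtDiagnostics` and `FindStrayChars` are hypothetical names, not part of the project, and the check treats anything outside printable ASCII (other than tabs and line endings) as suspicious, so it would also flag legitimate accented letters.

```csharp
using System;
using System.Collections.Generic;

public static class SrtDiagnostics
{
    // Hypothetical helper, not part of the project.
    // Returns (1-based line number, character code) for every character that is
    // neither printable ASCII (U+0020..U+007E) nor a tab. Line endings are ignored.
    public static List<(int Line, int Code)> FindStrayChars(string srtText)
    {
        var hits = new List<(int Line, int Code)>();
        string[] lines = srtText.Split('\n');
        for (int i = 0; i < lines.Length; i++)
        {
            // Drop the '\r' left over from "\r\n" line endings before scanning.
            foreach (char c in lines[i].TrimEnd('\r'))
            {
                bool printable = c >= '\u0020' && c <= '\u007E';
                if (!printable && c != '\t')
                {
                    hits.Add((i + 1, (int)c));
                }
            }
        }
        return hits;
    }
}
```

Calling `SrtDiagnostics.FindStrayChars(File.ReadAllText(path))` on the converted file gives the line numbers to compare against the original .caption file.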

Dev-iL · 2017-08-22T12:02:19Z

I have simplified the PrepareSrt logic somewhat, and it now seems to work properly. Before I submit a PR, here is my version:

        public string PrepareSrt()
        {
            const int METADATA_LINES = 7, CHARS_BEFORE_TIMESTAMP = 13, CHARS_AFTER_TIMESTAMP = 14;
            // Read the whole file into memory (needs System.IO and System.Text.RegularExpressions):
            string content = File.ReadAllText(filePath);

            // Discard the first lines, containing metadata used by Lynda desktop app to link subtitle to video:
            string output = RemoveFirstLines(content, METADATA_LINES);

            // Before every timestamp there is a constant number of characters (starting with [NUL][SOH] and ending with a newline):
            output = Regex.Replace(output, @"\u0000\u0001[\s\S]{" + CHARS_BEFORE_TIMESTAMP + "}[\r\n]*", "");
            
            // After every timestamp there is also a constant number of characters:
            output = Regex.Replace(output, @"(?<=\[\d\d:\d\d:\d\d\.\d\d\])[\s\S]{" + CHARS_AFTER_TIMESTAMP + "}", "");

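            // Clean up whatever remains outside U+0020..U+007F, keeping CR and LF.
            // Note: this also strips legitimate non-ASCII text such as accented letters.
            output = Regex.Replace(output, @"[^\u0020-\u007F\r\n]+", "");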

            return output;
        }
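
The snippet above calls `RemoveFirstLines`, which is not shown in this thread. A minimal sketch of what such a helper could look like follows; the class name `CaptionText` and the implementation are assumptions for illustration, not the project's actual code.

```csharp
using System;

public static class CaptionText
{
    // Hypothetical stand-in for the RemoveFirstLines helper referenced above.
    // Drops the first `count` lines of `text`. Lines are split on '\n', which
    // also covers "\r\n" endings because the '\r' comes before the '\n'.
    public static string RemoveFirstLines(string text, int count)
    {
        int index = 0;
        for (int i = 0; i < count; i++)
        {
            int newline = text.IndexOf('\n', index);
            if (newline < 0)
            {
                // The text has `count` lines or fewer: nothing is left.
                return string.Empty;
            }
            index = newline + 1;
        }
        return text.Substring(index);
    }
}
```

Scanning with `IndexOf` instead of splitting and re-joining leaves the remaining line endings exactly as they were in the file, which matters here because the later regular expressions count characters, newlines included.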

mdomnita · 2017-08-24T13:29:43Z

Thanks, @Dev-iL. I stopped working on this because my free Lynda subscription ended a while ago, and I don't get a free subscription from my new workplace. Maybe I will create a new account with another email address and check whether it still works.

Dev-iL mentioned this issue Aug 22, 2017

Addressing the extra characters issue #4

Merged
