Added subtitle conversion to decryptor #22

mdomnita · 2017-02-04T15:55:42Z

I added my captiontosrt code for the Windows version. The conversion is still quite basic and needs work, but subtitle file naming, file renaming, and conversion alongside the videos all work fine.
Only tested on Windows 10, Visual Studio 15.

mdomnita · 2017-08-24T13:33:04Z

Hello @bittaurus. Someone updated and corrected a few things in my fork of the CaptionToSrtConvertor. I will check it out, maybe compile it and send a link.
I will also make a new pull request to @h4ck-rOOt if it works fine.

Dev-iL · 2017-08-24T13:55:16Z

@mdomnita I can submit a PR as I already have this integrated. Perhaps also upload the binaries... I think your repo should be a submodule of this one, but I have no experience with that sort of thing.

mdomnita · 2017-08-25T16:40:11Z

Hey @Dev-iL. My repo was a separate project I did for myself that worked along with this one to solve #9. I then submitted a pull request to integrate it into the initial project. You can also submit a PR if you have a fork with this integrated or if you made the changes directly on a clone of this repo. If you manage to do it, thanks. If not, I will take some time over the weekend to get a Lynda demo account, test it on some courses, and then make a PR.

The .caption files consist of subtitle text and timestamps with some binary data interposed. When converted to text, this binary data produces random, mostly unprintable characters. Many can be filtered out, but printable characters are sometimes overlooked and end up in the subtitle. This problem can be overcome by taking a different approach. Study revealed that the subtitle text and timestamp data appear at predictable positions in the .caption file. This commit replaces the filter method with one that makes use of this principle, extracting only text and timestamps and leaving out any binary data.
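To illustrate the difference between the two approaches, here is a minimal Python sketch. It is not the code from the commit, and the record layout is an assumption made up for the example: a `[HH:MM:SS.cc]` timestamp, then a fixed `HEADER_LEN` of binary bytes, then text ending in CRLF. The real .caption layout is not described in this thread. The names `make_record`, `filter_printable` and `extract_by_position` are hypothetical.

```python
import re

# Hypothetical layout, for illustration only:
#   [HH:MM:SS.cc] + HEADER_LEN binary bytes + subtitle text + CRLF
TIMESTAMP = re.compile(rb"\[\d{2}:\d{2}:\d{2}\.\d{2}\]")
HEADER_LEN = 4  # assumed number of binary bytes between timestamp and text


def make_record(timestamp: bytes, header: bytes, text: bytes) -> bytes:
    """Build one synthetic record in the assumed layout."""
    assert len(header) == HEADER_LEN
    return timestamp + header + text + b"\r\n"


def filter_printable(blob: bytes) -> str:
    """Filter approach: keep printable ASCII and linebreaks, drop the rest."""
    kept = bytes(b for b in blob if 32 <= b < 127 or b in (10, 13))
    return kept.decode("ascii")


def extract_by_position(blob: bytes) -> list:
    """Positional approach: jump over the binary header after each timestamp."""
    cues = []
    for match in TIMESTAMP.finditer(blob):
        start = match.end() + HEADER_LEN
        end = blob.find(b"\r\n", start)
        if end == -1:
            end = len(blob)
        text = blob[start:end].decode("utf-8", errors="replace")
        cues.append((match.group(0).decode("ascii"), text))
    return cues


# Two synthetic records whose binary headers happen to contain the
# printable bytes "A" and "z".
blob = (
    make_record(b"[00:00:01.50]", b"\x00\x01A\x02", b"Hello there")
    + make_record(b"[00:00:03.20]", b"\x00\x7fz\x03", b"Welcome back")
)

print(filter_printable(blob))     # the stray "A" and "z" leak into the text
print(extract_by_position(blob))  # only timestamps and subtitle text
```

In this synthetic case the filter cannot tell the stray `A` and `z` apart from subtitle text, while the positional version never reads the header bytes at all.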

rosetinted · 2017-08-27T13:22:13Z

@bittaurus @Dev-iL @mdomnita

I came up with a solution for the subtitle problem. Looking good so far with the 16 courses I tested it on. I've submitted a pull request for mdomnita's repository, but while that's pending the code is available on my own fork. I've also uploaded a binary. Let me know if you find any flaws.

Dev-iL · 2017-08-27T13:36:48Z

@rosetinted Just wondering: did you compare this to the current output of https://github.com/mdomnita/LyndaCaptionToSrtConvertor ? Does your code perform better?

rosetinted · 2017-08-27T14:00:52Z

@Dev-iL

~~I haven't compared it~~, I just learned about it now. My solution replaces the regex filter with an extraction based on location, avoiding any binary data. That should make it immune to edge cases where binary characters look like part of the subtitle but actually aren't.

The regex code was a little too complicated for me to assess if your fixes to LyndaCaptionToSrtConvertor cover all possible problems with character filtering, but in that regard my solution seems more robust. No problems turned up during my testing, so if your code is good they should perform at least equally well.

Edit: I compared a set of 542 subtitle files. Here is an excerpt of the resulting diff. The complete result was quite a bit longer, but you get the idea. The regex filter still has some problems with extra or missing characters, whereas my method doesn't.
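For anyone who wants to repeat this kind of comparison, a rough Python sketch follows. It is not the procedure used for the 542 files above. It assumes each converter wrote its `.srt` output to its own folder under matching file names, and `diff_srt_dirs` is a hypothetical helper name. The two tiny files in the demo are synthetic.

```python
import difflib
import tempfile
from pathlib import Path


def diff_srt_dirs(dir_a, dir_b):
    """Return {file name: unified diff lines} for .srt files that differ."""
    dir_a, dir_b = Path(dir_a), Path(dir_b)
    differences = {}
    for file_a in sorted(dir_a.glob("*.srt")):
        file_b = dir_b / file_a.name
        if not file_b.exists():
            continue  # only compare files that both converters produced
        lines_a = file_a.read_text(encoding="utf-8").splitlines()
        lines_b = file_b.read_text(encoding="utf-8").splitlines()
        diff = list(
            difflib.unified_diff(
                lines_a,
                lines_b,
                fromfile=f"{dir_a.name}/{file_a.name}",
                tofile=f"{dir_b.name}/{file_a.name}",
                lineterm="",
            )
        )
        if diff:
            differences[file_a.name] = diff
    return differences


# Synthetic demo: one file differs by a stray character, one is identical.
with tempfile.TemporaryDirectory() as tmp:
    regex_dir = Path(tmp, "regex")
    positional_dir = Path(tmp, "positional")
    regex_dir.mkdir()
    positional_dir.mkdir()
    cue = "1\n00:00:01,500 --> 00:00:03,200\n{}\n"
    (regex_dir / "a.srt").write_text(cue.format("AHello there"), encoding="utf-8")
    (positional_dir / "a.srt").write_text(cue.format("Hello there"), encoding="utf-8")
    (regex_dir / "b.srt").write_text(cue.format("Same text"), encoding="utf-8")
    (positional_dir / "b.srt").write_text(cue.format("Same text"), encoding="utf-8")
    demo = diff_srt_dirs(regex_dir, positional_dir)

for name, lines in demo.items():
    print(name)
    print("\n".join(lines))
```

Files that only one converter produced are skipped here. Listing them separately would be a sensible extension, since empty or missing output is itself a conversion failure.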

Fix problem with extra characters in subtitle

Dev-iL · 2017-09-07T11:51:56Z

@rosetinted Looks like your solution is indeed more robust. Good job!

This fixes the following issues with the conversion process, which could lead to empty, incomplete or incorrect SRT files being produced:

1. Linebreaks other than CRLF are used.
2. Subtitles are separated by single instead of double linebreaks.
3. Linebreak characters can randomly pop up in binary data, at places where it becomes too complex to correct them.
4. Instead of the next subtitle, the first non-empty line below a subtitle contains a timestamp indicating its end time.
5. The "end time" subtitles are sometimes followed by junk in the text area, often a single null terminator or other nonprintable character.

These issues were solved, or were made easier to solve, by no longer separating subtitle blocks by linebreaks. Instead they are located by timestamp markers, which has proven more reliable.
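The idea of locating blocks by timestamp markers rather than by linebreaks can be sketched in Python as follows. This is an illustration under assumptions, not the committed code: the `[HH:MM:SS.cc]` marker format, the sample input and the names `clean` and `caption_text_to_srt` are all hypothetical. Each cue runs from one marker to the next, and a marker followed only by junk or whitespace serves purely as an end time.

```python
import re

# Assumed marker format, for illustration: [HH:MM:SS.cc] (centiseconds).
MARKER = re.compile(r"\[(\d{2}):(\d{2}):(\d{2})\.(\d{2})\]")


def to_srt_time(match):
    """Convert a marker match to the SRT form HH:MM:SS,mmm."""
    hours, minutes, seconds, centis = match.groups()
    return f"{hours}:{minutes}:{seconds},{int(centis) * 10:03d}"


def clean(text):
    """Normalise CRLF/CR/LF, drop nonprintable characters and empty lines."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    lines = (
        "".join(ch for ch in line if ch.isprintable()).strip()
        for line in text.split("\n")
    )
    return "\n".join(line for line in lines if line)


def caption_text_to_srt(raw):
    """Build SRT text from cues delimited by timestamp markers."""
    markers = list(MARKER.finditer(raw))
    cues = []
    for current, following in zip(markers, markers[1:]):
        text = clean(raw[current.end():following.start()])
        if text:  # markers with no text only mark the previous cue's end
            cues.append((to_srt_time(current), to_srt_time(following), text))
    blocks = [
        f"{index}\n{start} --> {end}\n{text}\n"
        for index, (start, end, text) in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)


# Synthetic input mixing LF, CR and CRLF, with a null byte after an end marker.
raw = (
    "[00:00:01.50]Hello there\n"
    "[00:00:03.20]\x00\r\n\r\n"
    "[00:00:04.00]Second line\rcontinues\n"
    "[00:00:06.75]\n"
)
srt = caption_text_to_srt(raw)
print(srt)
```

Because the blocks are cut at markers, the kind and number of linebreaks between them no longer matters. Linebreaks are only normalised inside the text of each cue.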


Improvements to conversion reliability

hopa102 · 2018-03-05T18:09:51Z

Where can I download the latest version of this file (the .exe file)?

rosetinted · 2018-03-08T19:12:09Z

@hopa102 You can find an executable here.

hopa102 · 2018-03-09T06:14:57Z

@rosetinted Thanks, but I'm looking for the LyndaCaptionToSrtConvertor .exe file.

rosetinted · 2018-03-09T19:29:28Z

@hopa102 I don't have a build for that right now, but the Lynda-Decryptor exe that I linked will also convert the subtitles.

Karambeigi · 2019-03-03T11:50:50Z

My dear friend,
Many thanks for your great program.
It works properly, particularly in the case of subtitle conversion.
Kind regards,

mdomnita added 2 commits February 4, 2017 17:49

added subtitle conversion to decyptor

6ff0245

added more checks t subtitle conversion to solve issues.

4513e81

This was referenced Aug 22, 2017

Missing texts and invalid characters mdomnita/LyndaCaptionToSrtConvertor#3

Open

How to convert the caption #32

Open

rosetinted mentioned this pull request Aug 27, 2017

Fix problem with extra characters in subtitle mdomnita/Lynda-Decryptor#1

Merged

mdomnita and others added 3 commits August 30, 2017 09:49

Merge pull request #1 from rosetinted/master

da0ef57

Fix problem with extra characters in subtitle

Fix error message when there is no subtitle to convert.

480a7fb

Fix issue where subtitles with LF linebreaks weren't being converted.

9ae1718

rosetinted and others added 6 commits September 10, 2017 17:00

added subtitle conversion to decyptor

dbe6b70

added more checks t subtitle conversion to solve issues.

8f8f7f3

Merge branch 'master' of https://github.com/mdomnita/Lynda-Decryptor

9f7fded

Merge pull request #2 from rosetinted/master

d406bd2

Improvements to conversion reliability

h4ck-rOOt merged commit 65ee16a into h4ck-rOOt:master Sep 27, 2017
