Added subtitle conversion to decryptor #22

Merged: 12 commits, Sep 27, 2017

Conversation

Contributor

@mdomnita mdomnita commented Feb 4, 2017

I added my CaptionToSrt code for the Windows version. Although the conversion is quite basic and still needs work, the subtitle file naming, the renaming of files, and the conversion alongside the other videos all work fine.
Only tested on Windows 10, Visual Studio 15.

@mdomnita
Contributor Author

Hello @bittaurus. Someone updated and corrected a few things in my fork of the CaptionToSrtConvertor. I will check it out, maybe compile it, and send a link.
I will also make a new pull request to @h4ck-rOOt if it works fine.

@Dev-iL

Dev-iL commented Aug 24, 2017

@mdomnita I can submit a PR as I already have this integrated. Perhaps I could also upload the binaries... I think your repo should be a submodule of this one, but I have no experience with that sort of thing.

@mdomnita
Contributor Author

Hey @Dev-iL. My repo was a separate project I did for myself that worked alongside this one to solve #9. I then submitted a pull request to integrate it into the original project. You can also submit a PR if you have a fork with this integrated or if you made the changes directly on a clone of this repo. If you manage to do it, thanks. If not, I will take some time over the weekend to get a Lynda demo account, test it on some courses, and then make a PR.

The .caption files consist of subtitle text and timestamps with some
binary data interposed. When converted to text, this binary data
produces random, mostly unprintable characters. Many can be filtered
out, but printable characters are sometimes overlooked and end up in the
subtitle. This problem can be overcome by taking a different approach.
Study revealed that the subtitle text and timestamp data appear at
predictable positions in the .caption file. This commit replaces the
filter method with one that makes use of this principle, extracting only
text and timestamps and leaving out any binary data.
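
To illustrate the difference between the two approaches, here is a minimal Python sketch. The real `.caption` layout is not described in this thread, so the record layout below (a `[HH:MM:SS.ff]` marker, a fixed 4-byte binary header, then null-terminated text) is purely hypothetical, as are the names `filter_printable`, `extract_by_position`, `HEADER_LEN` and `SAMPLE`. It is not the code from the commit, only an example of why reading known positions avoids stray printable bytes.

```python
import re

# HYPOTHETICAL record layout (an assumption for illustration only):
#   b"[HH:MM:SS.ff]" + 4 opaque binary bytes + text + b"\x00"
TIMESTAMP = re.compile(rb"\[\d{2}:\d{2}:\d{2}\.\d{2}\]")
HEADER_LEN = 4  # assumed size of the binary header after each timestamp

# Two made-up records; the binary headers happen to contain the
# printable bytes "A" and "z".
SAMPLE = (
    b"[00:00:01.50]\x01A\x00\x7fHello there\x00"
    b"[00:00:03.25]\x02z\x00\x10General Kenobi\x00"
)


def filter_printable(data: bytes) -> str:
    """Filter approach: decode everything, then drop nonprintable characters."""
    text = data.decode("latin-1")
    return "".join(ch for ch in text if ch.isprintable())


def extract_by_position(data: bytes) -> list[tuple[str, str]]:
    """Positional approach: read only the timestamp and text fields."""
    cues = []
    for match in TIMESTAMP.finditer(data):
        text_start = match.end() + HEADER_LEN      # skip the binary header
        text_end = data.find(b"\x00", text_start)  # text runs to the terminator
        if text_end == -1:
            text_end = len(data)
        stamp = match.group(0)[1:-1].decode("ascii")
        text = data[text_start:text_end].decode("utf-8", errors="replace")
        cues.append((stamp, text))
    return cues
```

With `SAMPLE`, `filter_printable` lets the stray `A` and `z` through and returns `[00:00:01.50]AHello there[00:00:03.25]zGeneral Kenobi`, while `extract_by_position` returns only the two clean timestamp and text pairs.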
@rosetinted
Contributor

@bittaurus @Dev-iL @mdomnita

I came up with a solution for the subtitle problem. Looking good so far with the 16 courses I tested it on. I've submitted a pull request for mdomnita's repository, but while that's pending the code is available on my own fork. I've also uploaded a binary. Let me know if you find any flaws.

@Dev-iL

Dev-iL commented Aug 27, 2017

@rosetinted Just wondering: did you compare this to the current output of https://github.com/mdomnita/LyndaCaptionToSrtConvertor? Does your code perform better?

@rosetinted
Contributor

rosetinted commented Aug 27, 2017

@Dev-iL

I haven't compared it; I only just learned about it. My solution replaces the regex filter with an extraction based on location, avoiding any binary data. That should make it immune to edge cases where binary characters look like part of the subtitle but actually aren't.

The regex code was a little too complicated for me to assess if your fixes to LyndaCaptionToSrtConvertor cover all possible problems with character filtering, but in that regard my solution seems more robust. No problems turned up during my testing, so if your code is good they should perform at least equally well.

Edit: I compared a set of 542 subtitle files. Here is an excerpt of the resulting diff. The complete result was quite a bit longer, but you get the idea. The regex filter still has some problems with extra or missing characters, whereas my method doesn't.

@Dev-iL

Dev-iL commented Sep 7, 2017

@rosetinted Looks like your solution is indeed more robust. Good job!

rosetinted and others added 6 commits September 10, 2017 17:00
This fixes the following issues with the conversion process, which
could lead to empty, incomplete or incorrect SRT files being produced.

1. Linebreaks other than CRLF are used.
2. Subtitles are separated by single instead of double linebreaks.
3. Linebreak characters can randomly pop up in binary data, at places
   where it becomes too complex to correct them.
4. Instead of the next subtitle, the first non-empty line below a
   subtitle contains a timestamp indicating its end time.
5. The "end time" subtitles are sometimes followed by junk in the text
   area. Often a single null terminator or other nonprintable character.

These issues were solved, or made easier to solve, by no longer
separating subtitle blocks by linebreaks. Instead, blocks are located by
timestamp markers, which has proven more reliable.
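
As a rough illustration of locating blocks by timestamp markers, the Python sketch below splits already-decoded caption text on markers, uses the following marker as the end time, skips marker blocks whose text area holds only junk, and writes CRLF-separated SRT blocks. The `[HH:MM:SS.ff]` marker shape and the names `MARKER`, `clean`, `captions_to_srt` and `SAMPLE` are assumptions made for this example, not the project's actual code. It covers items 1, 2, 4 and 5 in simplified form and does not attempt item 3.

```python
import re

# Assumed marker shape: [HH:MM:SS.ff] with hundredths of a second.
MARKER = re.compile(r"\[(\d{2}):(\d{2}):(\d{2})\.(\d{2})\]")

# Made-up input mixing LF, CR and CRLF, with "end time" markers that are
# followed only by a null character.
SAMPLE = (
    "[00:00:01.50]Hello there\n"
    "[00:00:03.25]\x00\r"
    "[00:00:04.00]General\nKenobi\r\n"
    "[00:00:06.10]\x00"
)


def to_srt_time(h: str, m: str, s: str, cs: str) -> str:
    """Convert marker fields to the SRT form HH:MM:SS,mmm."""
    return f"{h}:{m}:{s},{int(cs) * 10:03d}"


def clean(text: str) -> str:
    """Normalise linebreaks to CRLF and drop nonprintable characters."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    kept = "".join(ch for ch in text if ch == "\n" or ch.isprintable())
    lines = [line.strip() for line in kept.split("\n")]
    return "\r\n".join(line for line in lines if line)


def captions_to_srt(decoded: str) -> str:
    """Build SRT text from blocks delimited by timestamp markers."""
    marks = list(MARKER.finditer(decoded))
    blocks = []
    # The final marker is assumed to be an end-time marker only, so the
    # loop pairs each marker with the one that follows it.
    for mark, nxt in zip(marks, marks[1:]):
        text = clean(decoded[mark.end():nxt.start()])
        if not text:
            continue  # marker with an empty or junk-only text area
        start = to_srt_time(*mark.groups())
        end = to_srt_time(*nxt.groups())
        blocks.append(f"{len(blocks) + 1}\r\n{start} --> {end}\r\n{text}\r\n")
    # Joining with CRLF leaves one blank line between subtitle blocks.
    return "\r\n".join(blocks)
```

For `SAMPLE`, this yields two numbered blocks: `Hello there` from 00:00:01,500 to 00:00:03,250 and the two-line `General` / `Kenobi` from 00:00:04,000 to 00:00:06,100.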
Improvements to conversion reliability
@h4ck-rOOt h4ck-rOOt merged commit 65ee16a into h4ck-rOOt:master Sep 27, 2017
@hopa102

hopa102 commented Mar 5, 2018

Where can I download the latest version of this file (the .exe file)?

@rosetinted
Contributor

@hopa102 You can find an executable here.

@hopa102

hopa102 commented Mar 9, 2018

@rosetinted Thanks, but I'm looking for the LyndaCaptionToSrtConvertor .exe file.

@rosetinted
Contributor

@hopa102 I don't have a build for that right now, but the Lynda-Decryptor exe that I linked will also convert the subtitles.

@Karambeigi

Dear my friend,
Many thanks for your great program.
It works properly, particularly in the case of subtitle conversion.
Kind regards,
