added subtitle conversion to decryptor #22
Hello @bittaurus. Someone updated and corrected a few things in my fork of the CaptionToSrtConvertor. I will check it out, maybe compile it, and send a link.
@mdomnita I can submit a PR, as I already have this integrated. Perhaps I'll also upload the binaries... I think your repo should be a submodule of this one, but I have no experience with that sort of thing.
Hey @Dev-iL. My repo was a separate project I did for myself that worked alongside this one to solve #9. I then submitted a pull request to integrate it into the original project. You can also submit a PR if you have a fork with this integrated, or if you made the changes directly on a clone of this repo. If you manage to do it, thanks. If not, I will take some time over the weekend to get a Lynda demo account, test it on some courses, and then make a PR.
The .caption files consist of subtitle text and timestamps with some binary data interposed. When converted to text, this binary data produces random, mostly unprintable characters. Many of them can be filtered out, but printable ones are sometimes missed and end up in the subtitle. This problem can be avoided with a different approach: analysis showed that the subtitle text and timestamp data appear at predictable positions in the .caption file. This commit replaces the filter method with one based on this principle, extracting only the text and timestamps and skipping all binary data.
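To illustrate the difference between filtering and positional extraction, here is a minimal Python sketch. The record layout it parses (a tag byte, a 12-character timestamp, two ignored bytes, a length byte, then the text) is purely hypothetical, as are the names `HEADER` and `extract_records`; the actual .caption layout is not described in this thread. The point is only that bytes at known binary positions are skipped outright instead of being decoded and then filtered.

```python
import struct

# HYPOTHETICAL record layout, for illustration only (not the documented
# .caption format):
#   1 byte   record tag          (binary, skipped)
#   12 bytes timestamp           (ASCII "HH:MM:SS.mmm")
#   2 bytes  interposed data     (binary, skipped)
#   1 byte   text length N
#   N bytes  subtitle text       (UTF-8)
HEADER = struct.Struct("<B12s2sB")  # "<" = no padding, 16 bytes in total


def extract_records(data: bytes) -> list[tuple[str, str]]:
    """Return (timestamp, text) pairs read from fixed positions.

    Binary fields are never decoded, so junk bytes cannot leak into
    the subtitle text, even when they happen to be printable.
    """
    records = []
    pos = 0
    while pos + HEADER.size <= len(data):
        _tag, stamp, _junk, length = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        end = start + length
        if end > len(data):
            break  # truncated record: stop instead of reading past the end
        records.append((stamp.decode("ascii"), data[start:end].decode("utf-8")))
        pos = end
    return records


# Two records whose binary fields contain "~" (0x7E), a printable byte
# that a character filter could miss.
sample = (
    b"\x07" + b"00:00:01.500" + b"\xff~" + bytes([5]) + b"Hello"
    + b"\x07" + b"00:00:03.250" + b"~\x00" + bytes([5]) + b"world"
)
print(extract_records(sample))
# [('00:00:01.500', 'Hello'), ('00:00:03.250', 'world')]
```

The "~" bytes never reach the output because the parser only decodes the fields whose positions it knows to hold text.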
I came up with a solution for the subtitle problem. It looks good so far on the 16 courses I tested it on. I've submitted a pull request to mdomnita's repository, but while that's pending, the code is available on my own fork. I've also uploaded a binary. Let me know if you find any flaws.
@rosetinted Just wondering: did you compare this to the current output of https://github.com/mdomnita/LyndaCaptionToSrtConvertor? Does your code perform better?
The regex code was a little too complicated for me to assess whether your fixes to LyndaCaptionToSrtConvertor cover all possible character-filtering problems, but in that regard my solution seems more robust. No problems turned up during my testing, so if your code is good, they should perform at least equally well. Edit: I compared a set of 542 subtitle files. Here is an excerpt of the resulting diff. The complete result was quite a bit longer, but you get the idea. The regex filter still has some problems with extra or missing characters, whereas my method doesn't.
Fix problem with extra characters in subtitle
@rosetinted Looks like your solution is indeed more robust. Good job!
This fixes the following issues with the conversion process, which could cause empty, incomplete or incorrect SRT files to be produced:
1. Line breaks other than CRLF are used.
2. Subtitles are separated by single instead of double line breaks.
3. Line break characters can randomly appear in binary data, at places where they are too complex to correct.
4. Instead of the next subtitle, the first non-empty line below a subtitle contains a timestamp indicating its end time.
5. The "end time" subtitles are sometimes followed by junk in the text area, often a single null terminator or another nonprintable character.
These issues were solved, or made easier to solve, by no longer separating subtitle blocks by line breaks. Instead, blocks are located by timestamp markers, which has proven more reliable.
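A rough Python sketch of the timestamp-marker approach and the SRT output described above follows. It assumes, hypothetically, that the decoded stream marks each block with an `HH:MM:SS.mmm` timestamp and that the text between markers contains no junk other than stray line breaks and nonprintable characters. The function names and the 2-second end time for a final subtitle without an end marker are illustrative choices, not part of the original code.

```python
import re

# HYPOTHETICAL marker format; the real .caption timestamps may differ.
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2})\.(\d{3})")


def to_ms(match: re.Match) -> int:
    h, m, s, ms = (int(g) for g in match.groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def srt_time(ms: int) -> str:
    # SRT uses a comma before the milliseconds: HH:MM:SS,mmm
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


def clean(text: str) -> str:
    # Accept any line-break style (issue 1), drop nonprintable junk such as
    # a lone null terminator (issue 5) and re-join the lines with CRLF.
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    lines = ["".join(ch for ch in line if ch.isprintable()).strip() for line in lines]
    return "\r\n".join(line for line in lines if line)


def captions_to_srt(decoded: str) -> str:
    # Locate blocks by timestamp markers rather than by line breaks
    # (issue 3): each block's text runs up to the next marker.
    marks = list(TIMESTAMP.finditer(decoded))
    blocks = []
    for i, mark in enumerate(marks):
        end = marks[i + 1].start() if i + 1 < len(marks) else len(decoded)
        blocks.append((to_ms(mark), clean(decoded[mark.end():end])))

    entries = []
    for i, (start, text) in enumerate(blocks):
        if not text:
            continue  # an empty block only marks the previous subtitle's end (issue 4)
        # End time = timestamp of the following block; assumed 2 s fallback.
        stop = blocks[i + 1][0] if i + 1 < len(blocks) else start + 2000
        entries.append(
            f"{len(entries) + 1}\r\n{srt_time(start)} --> {srt_time(stop)}\r\n{text}\r\n"
        )
    # Joining with CRLF leaves a blank line between entries (issue 2).
    return "\r\n".join(entries)
```

Because block boundaries come only from the markers, a stray line break inside binary data cannot split or merge subtitles; it is simply normalized or discarded by `clean`.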
Improvements to conversion reliability
Where can I download the latest version of this file (the .exe)?
@rosetinted Thanks, but I'm looking for the LyndaCaptionToSrtConvertor .exe file.
@hopa102 I don't have a build for that right now, but the Lynda-Decryptor exe I linked will also convert the subtitles.
Dear friend,
I added my captiontosrt code to the Windows version. Although the conversion is quite basic and still needs work, the subtitle filename, the renaming of files and the conversion along with the other videos work fine.
Only tested on Windows 10, Visual Studio 15.