Implement seek operation for Theora video files #102360
I've removed a short, intentional silence after a seek. It gave the audio time to sync without having to start decoding audio earlier. It was usually unnoticeable, but I found some files where the silence lasted up to 1 s. Now audio plays from the very start without any silence.
While working on this, I thought I would explain how I've implemented the seek function, in case anyone is interested. My goals were performance and precision, while keeping the code as simple as possible.
Long technical explanation
The only official documentation I could find, apart from the API reference, was this wiki page, but it's pretty vague about the details. I found some more information and code that helped, but it was either too vague or too confusing for me to use directly, so I started my own implementation. Any code I'd used would have had to be heavily adapted anyway.
Ogg doesn't use indexes. The basic storage unit is a page, and pages carry time marks called granule positions (granulepos), the time unit of Ogg containers. For video streams, a granulepos can easily be decoded into a frame number, or into a key-frame number plus an inter-frame offset. The problem is that not every page has a granulepos, and when one does, it's the granulepos of the last packet that completes on that page. So finding the granule we need isn't as easy as it may seem: it involves scanning through several pages until we find the page it is in.
Something similar happens with audio streams, although their granulepos is calculated differently. Audio is easier because there are no key frames or anything similar. But in both cases, when scanning, we have to make sure we catch the page where the packet we want starts.
There's an added complication for video. Calculating the key-frame number and the inter-frame offset from a granulepos is easy, but the inverse operation isn't. The GOP size inside a stream can vary, although most streams don't change it. We only know the maximum GOP size for a stream, so it's impossible to calculate the granulepos from a frame number alone.
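To make the granulepos arithmetic concrete, here is a small sketch. The helper names are made up for illustration, and `shift` stands for the `keyframe_granule_shift` field of the Theora info header. The last function shows why the inverse is ambiguous: from a frame number alone, we only get a lower bound on where its key frame can be, because the inter-frame offset has to fit in `shift` bits.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch, not Godot's code: decoding a Theora granulepos.
// `shift` stands for keyframe_granule_shift from the Theora info header.
struct GranuleParts {
    int64_t keyframe; // upper bits: the key frame's number
    int64_t offset;   // lower `shift` bits: frames since that key frame
};

GranuleParts split_granulepos(int64_t granulepos, int shift) {
    assert(granulepos >= 0 && shift >= 0 && shift < 32);
    GranuleParts parts;
    parts.keyframe = granulepos >> shift;
    parts.offset = granulepos - (parts.keyframe << shift);
    return parts;
}

// 0-based index of the frame a granulepos refers to. Bitstreams of version
// 3.2.1 and later store a frame count instead of an index, hence the -1
// (libtheora's th_granule_frame() applies the same correction).
int64_t granulepos_to_frame(int64_t granulepos, int shift, bool count_based) {
    GranuleParts parts = split_granulepos(granulepos, shift);
    return parts.keyframe + parts.offset - (count_based ? 1 : 0);
}

// The inverse has no unique answer: the key frame behind `target_frame` could
// be any of the previous frames, because the GOP may vary. The only hard limit
// is that the offset must fit in `shift` bits, so the key frame is at most
// (1 << shift) - 1 frames back. That bound is a safe place to start searching.
int64_t earliest_possible_keyframe(int64_t target_frame, int shift) {
    assert(target_frame >= 0 && shift >= 0 && shift < 32);
    int64_t max_distance = (int64_t(1) << shift) - 1;
    return target_frame > max_distance ? target_frame - max_distance : 0;
}
```

For example, with `shift` = 6, a granulepos of 645 splits into key frame 10 and offset 5, because 645 = (10 << 6) + 5.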
The algorithm starts by guessing where in the file the key frame we're looking for might be, using the frame duration, the file size in bytes, and the stream length in seconds. This gives a pretty good approximation. It then scans for the first page, and if that page is already past the target time, it backtracks in the file. The first page whose granulepos is just before our calculated key-frame granulepos guess is where decoding starts. The initial decoding runs as fast as possible up to the seek time, and then normal decoding resumes. On my computer (Ryzen 3600), this algorithm gives very short seek times even with debug builds: immediate for some videos and one or two tenths of a second for others. I haven't tried it with optimized builds yet. Seek time depends on the GOP size: the larger it is, the more frames have to be decoded before the first frame is displayed.
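A rough sketch of the guess, scan, and backtrack steps described above, operating on an in-memory list of pages instead of a real file. `Page`, `guess_byte_offset`, `first_page_at`, `find_start_page` and the fixed backtracking `step` are hypothetical; the actual implementation reads pages through libogg, and its backtracking strategy may differ. Comparing raw granulepos values works here because, within a single Theora stream, they increase monotonically with frame order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory stand-in for the pages of a single-stream Ogg file.
struct Page {
    int64_t byte_offset; // where the page starts in the file
    int64_t granulepos;  // -1 when no packet completes on this page
};

// Step 1: proportional guess of the byte offset for `target_time`,
// assuming a roughly constant bitrate across the file.
int64_t guess_byte_offset(double target_time, double stream_length, int64_t file_size) {
    if (stream_length <= 0.0) {
        return 0;
    }
    double ratio = target_time / stream_length;
    ratio = ratio < 0.0 ? 0.0 : (ratio > 1.0 ? 1.0 : ratio);
    return static_cast<int64_t>(ratio * static_cast<double>(file_size));
}

// Index of the first page starting at or after `offset` (pages sorted by offset).
std::size_t first_page_at(const std::vector<Page> &pages, int64_t offset) {
    std::size_t i = 0;
    while (i < pages.size() && pages[i].byte_offset < offset) {
        ++i;
    }
    return i;
}

// Step 2: scan forward from the guess for the last page whose granulepos is
// still below `target_gp`. If the first timestamped page is already past it,
// back up by `step` bytes and retry. Returns the page to start decoding from.
std::size_t find_start_page(const std::vector<Page> &pages, int64_t guess,
                            int64_t target_gp, int64_t step) {
    assert(step > 0); // otherwise backtracking would never make progress
    int64_t offset = guess;
    for (;;) {
        std::size_t best = pages.size(); // "not found yet"
        for (std::size_t j = first_page_at(pages, offset); j < pages.size(); ++j) {
            if (pages[j].granulepos < 0) {
                continue; // no packet ends on this page, so no timestamp
            }
            if (pages[j].granulepos >= target_gp) {
                break; // reached or passed the target
            }
            best = j; // latest page known to end before the target
        }
        if (best != pages.size()) {
            return best;
        }
        if (offset <= 0) {
            return 0; // nothing earlier exists: decode from the beginning
        }
        offset = offset > step ? offset - step : 0; // overshot: backtrack
    }
}
```

A real file reader would also stop scanning early and bound the number of backtracking attempts; this version only illustrates the control flow.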
I've removed a workaround for a badly encoded video that I only found in a very old file predating Theora 1.0. It's not worth keeping.
Tested locally using https://commons.wikimedia.org/wiki/File:Big_Buck_Bunny_first_23_seconds_1080p.ogv and https://commons.wikimedia.org/wiki/File:Big_Buck_Bunny_medium.ogv; it works as expected. I made sure sound remains in sync after seeking back and forth.
I don't have an OGV file with more than 2 audio channels to test the downmixing functionality though.
This comment has a video with 7.1 audio. I've also tested with this video that has 5.1 audio, but it needs to be extracted and converted. EDIT: I've uploaded the 5.1 video to make it easier. You can check with
Co-authored-by: K. S. Ernest (iFire) Lee <fire@users.noreply.github.com>
Includes a fix for crackling sound when the audio buffer isn't big enough to hold a full Vorbis packet. It happened with videos that had 6-channel audio tracks.
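A minimal sketch of the general idea behind this kind of fix, assuming decoded samples are staged in a FIFO: whatever doesn't fit in the output buffer is carried over to the next mix call instead of being dropped. With 6 channels each frame takes six samples, so if the buffer size is fixed in samples, a packet overflows it much sooner than a stereo one would. `PendingAudio` is hypothetical and not the code in this PR; a real mixer would more likely use a ring buffer than erase from the front of a `std::vector`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical staging buffer: decoded packets go in whole, the mixer takes
// out only as many frames as its output buffer can hold, and the rest waits.
class PendingAudio {
public:
    explicit PendingAudio(std::size_t channels) : channels_(channels) {
        assert(channels_ > 0);
    }

    // Append one decoded packet (interleaved: frames * channels samples).
    void push_packet(const std::vector<float> &interleaved) {
        pending_.insert(pending_.end(), interleaved.begin(), interleaved.end());
    }

    // Move up to `max_frames` frames into `out`; returns the frames written.
    // Leftover samples stay queued instead of being discarded.
    std::size_t pop_frames(std::vector<float> &out, std::size_t max_frames) {
        std::size_t frames = std::min(pending_.size() / channels_, max_frames);
        auto end = pending_.begin() + static_cast<std::ptrdiff_t>(frames * channels_);
        out.assign(pending_.begin(), end);
        pending_.erase(pending_.begin(), end);
        return frames;
    }

    std::size_t frames_pending() const { return pending_.size() / channels_; }

private:
    std::size_t channels_;
    std::vector<float> pending_;
};
```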
Co-authored-by: Hugo Locurcio <hugo.locurcio@hugo.pro>
I've implemented the `set_stream_position` operation for Theora video files and, as a consequence, also `get_length`. It also tries to comply with the documentation by not changing the current frame when `stop` is called, and it plays seamless loops. I've made a simple video player project to help test video playback in Godot.
This is based on #101958. The bulk of the work is in the commit whose message matches this PR title, but there are improvements and fixes after it. Since I'm using this PR as my testing code base, I'll add any fixes or improvements here instead of creating a chain of PRs that each depend on the previous one. Please let me know if you need me to rearrange anything for easier review.
`get_stream_length()` for built-in supported video formats godot-proposals#10148. See also VideoPlayer `set_stream_position()` doesn't work (seeking is not implemented) #14430 and [3.x] Support WebM seeking operation #57744.
UPDATE: I've done more testing and added a few more improvements/fixes: