-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Release v2.10.2 #7401
Merged
Merged
Release v2.10.2 #7401
+853
−354
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Update develop after v2.10.1
#7368) Fix a typo. refer to docker-compose.yml
…o tasks (#7374) Currently, the following situation causes the export worker to consume more memory than necessary: * A user exports a project with images included, and * The project contains multiple tasks with video chunks. The main reason for this is that the `image_maker_per_task` dictionary in `CVATProjectDataExtractor.__init__` indirectly references a distinct `FrameProvider` for each task, which in turn, indirectly references an `av.containers.Container` object corresponding to the most recent chunk opened in that task. Initially, none of the chunks are opened, so the memory usage is low. But as Datumaro iterates over each frame, it eventually requests at least one image from each task of the project. This causes the corresponding `FrameProvider` to open a chunk file and keep a handle to it. The only way for a `FrameProvider` to close this chunk file is to open a new chunk when a frame from that chunk is requested. Therefore, each `FrameProvider` keeps at least one chunk open from the moment Datumaro requests the first frame from its task and _until the end of the export_. This manifests in the export worker's memory usage growing and growing as Datumaro goes from task to task. An open chunk consumes whatever Python objects represent it, but more importantly, any C-level buffers allocated by FFmpeg, which can be quite significant. In my testing, the size of the per-chunk memory was on the order of tens of megabytes. An open chunk also takes up a file descriptor. The fix for this is conceptually simple: ensure that only one `FrameProvider` object exists at a time. AFAICS, when a project is exported, all frames from a given task are grouped together, so we shouldn't need multiple tasks' chunks to be open at the same time anyway. I had to restructure the code to make this work, so I took the opportunity to fix a few other things, as well: * The code currently relies on garbage collection of PyAV's `Container` objects to free the resources used by them. Even though `VideoReader.__iter__` uses a `with` block to close the container, the `with` block can only do so if the container is iterated all the way to the end. This doesn't happen when `FrameProvider` uses it, since `FrameProvider` seeks to the needed frame and then stops iterating. I don't have evidence that this causes any issues at the moment, but Python does not guarantee that objects are GC'd promptly, so this could become another source of excessive memory usage. I added cleanup methods (`close`/`unload`/`__exit__`) at several layers of the code to ensure that each chunk is closed as soon as it is no longer needed. * I factored out and merged the code used to generate `dm.Image` objects when exporting projects and jobs/tasks. It's likely possible to merge even more code, but I don't want to expand the scope of the patch too much. * I fixed a seemingly useless optimization in the handling for 3D frames. Specifically, `CVATProjectDataExtractor` performs queries of the form: task.data.images.prefetch_related().get(id=i) The prefetch here looks useless, as only a single object is queried - it wouldn't be any less efficient to just let Django fetch the `Image`'s `related_files` when needed. I rewrote this code to prefetch `related_files` for all images in the task at once instead.
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #7401 +/- ##
==========================================
+ Coverage 83.28% 83.47% +0.18%
==========================================
Files 373 373
Lines 39615 39760 +145
Branches 3701 3720 +19
==========================================
+ Hits 32993 33188 +195
+ Misses 6622 6572 -50
|
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Changed
Fixed
Empty masks might be created with
polygon-minus
tool (Remove empty masks #7295)Empty masks might be created as a result of removing underlying pixels (Remove empty masks #7295)
Fixed excessive memory usage when exporting a project with multiple video tasks (Fix excessive memory usage when exporting projects with multiple video tasks #7374)
OpenCV tracker MIL works one frame behind (Fixed TrackerMIL one frame late #7399)