Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release v2.10.2 #7401

Merged
merged 13 commits into from
Jan 26, 2024
Merged

Release v2.10.2 #7401

merged 13 commits into from
Jan 26, 2024

Conversation

cvat-bot[bot]
Copy link
Contributor

@cvat-bot cvat-bot bot commented Jan 26, 2024

Changed

Fixed

cvat-bot bot and others added 13 commits January 18, 2024 11:10
…o tasks (#7374)

Currently, the following situation causes the export worker to consume
more memory than necessary:

* A user exports a project with images included, and
* The project contains multiple tasks with video chunks.

The main reason for this is that the `image_maker_per_task` dictionary
in `CVATProjectDataExtractor.__init__` indirectly references a distinct
`FrameProvider` for each task, which in turn, indirectly references an
`av.containers.Container` object corresponding to the most recent chunk
opened in that task.

Initially, none of the chunks are opened, so the memory usage is low.
But as Datumaro iterates over each frame, it eventually requests at
least one image from each task of the project. This causes the
corresponding `FrameProvider` to open a chunk file and keep a handle to
it. The only way for a `FrameProvider` to close this chunk file is to
open a new chunk when a frame
from that chunk is requested. Therefore, each `FrameProvider` keeps at
least one chunk open from the moment Datumaro requests the first frame
from its task and _until the end of the export_.

This manifests in the export worker's memory usage growing and growing
as Datumaro goes from task to task. An open chunk consumes whatever
Python objects represent it, but more importantly, any C-level buffers
allocated by FFmpeg, which can be quite significant. In my testing, the
size of the per-chunk memory was on the order of tens of megabytes. An
open chunk also takes up a file descriptor.

The fix for this is conceptually simple: ensure that only one
`FrameProvider` object exists at a time. AFAICS, when a project is
exported, all frames from a given task are grouped together, so we
shouldn't need multiple tasks' chunks to be open at the same time
anyway.

I had to restructure the code to make this work, so I took the
opportunity to fix a few other things, as well:

* The code currently relies on garbage collection of PyAV's `Container`
  objects to free the resources used by them. Even though
  `VideoReader.__iter__` uses a `with` block to close the container, the
  `with` block can only do so if the container is iterated all the way to
   the end. This doesn't happen when `FrameProvider` uses it, since
  `FrameProvider` seeks to the needed frame and then stops iterating.

  I don't have evidence that this causes any issues at the moment, but
  Python does not guarantee that objects are GC'd promptly, so this could
  become another source of excessive memory usage.

  I added cleanup methods (`close`/`unload`/`__exit__`) at several layers
  of the code to ensure that each chunk is closed as soon as it is no
  longer needed.

* I factored out and merged the code used to generate `dm.Image` objects
  when exporting projects and jobs/tasks. It's likely possible to merge
  even more code, but I don't want to expand the scope of the patch too
  much.

* I fixed a seemingly useless optimization in the handling for 3D
  frames. Specifically, `CVATProjectDataExtractor` performs queries of the
  form:

      task.data.images.prefetch_related().get(id=i)

  The prefetch here looks useless, as only a single object is queried - it
  wouldn't be any less efficient to just let Django fetch the `Image`'s
  `related_files` when needed.

  I rewrote this code to prefetch `related_files` for all images in the
  task at once instead.
Copy link

codecov bot commented Jan 26, 2024

Codecov Report

Merging #7401 (3733a07) into master (7a1a4b1) will increase coverage by 0.18%.
Report is 9 commits behind head on master.
The diff coverage is 92.04%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7401      +/-   ##
==========================================
+ Coverage   83.28%   83.47%   +0.18%     
==========================================
  Files         373      373              
  Lines       39615    39760     +145     
  Branches     3701     3720      +19     
==========================================
+ Hits        32993    33188     +195     
+ Misses       6622     6572      -50     
Components Coverage Δ
cvat-ui 79.36% <84.84%> (+0.27%) ⬆️
cvat-server 87.24% <96.92%> (+0.12%) ⬆️

@cvat-bot cvat-bot bot merged commit d427822 into master Jan 26, 2024
31 checks passed
@cvat-bot cvat-bot bot deleted the release-2.10.2 branch January 26, 2024 18:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants