Drop support of the TFRecord format #7416
Merged
Motivation and context
Usage statistics on app.cvat.ai show that this format is rarely used, with single-digit numbers of projects/tasks/jobs being exported or imported using this format. Moreover, TensorFlow's popularity appears to be shrinking, so I'm not expecting this format to make a comeback.
Meanwhile, supporting this format has a cost that has to be borne by everyone deploying CVAT, because it requires TensorFlow to be installed and loaded. This has various ill effects:
Loading time is increased. In my testing, even a command as trivial as manage.py --help is slowed down by 3.6 seconds. This may not seem like much, but the effect is compounded, because we have multiple processes (server + workers) all loading the same codebase. Plus, the container entrypoint may execute several Django commands.
Memory usage is increased. TensorFlow adds ~100MB of RAM usage per process with data alone, and the libraries add more (although it's hard to estimate the impact of library code, since it can be shared between processes in RAM).
Docker image size is increased by ~1.5GB (when unpacked). This is more than half of the current total size! Building time is increased as well.
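For anyone who wants to reproduce the loading-time effect on their own deployment, here is a minimal sketch of one way to estimate the import cost of a single module. It is not the method used to obtain the numbers above; the function name time_fresh_import and its parameters are hypothetical. It starts a fresh interpreter for each run, so that module caching in the current process does not hide the cost, and subtracts the time of an empty interpreter start.

```python
import subprocess
import sys
import time


def time_fresh_import(module: str, repeats: int = 3) -> float:
    """Estimate the seconds added by importing `module` in a fresh interpreter.

    Hypothetical helper for illustration; not part of CVAT.
    """

    def run(code: str) -> float:
        # Launch a new Python process so nothing is cached in sys.modules.
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", code], check=True)
        return time.perf_counter() - start

    # Best-of-N reduces noise from disk caches and scheduling.
    baseline = min(run("pass") for _ in range(repeats))
    with_import = min(run(f"import {module}") for _ in range(repeats))

    # Clamp at zero: for very cheap modules, noise can exceed the difference.
    return max(with_import - baseline, 0.0)
```

Calling time_fresh_import("tensorflow") in an environment where TensorFlow is installed gives a rough per-process figure; multiplying it by the number of server and worker processes gives a sense of the compounded cost described above.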
Overall, it seems that the drawbacks of keeping support for this format outweigh the benefits, so it's time to drop it.
How has this been tested?
Checklist
develop branch
[ ] I have linked related issues (see GitHub docs)
[ ] I have increased versions of npm packages if it is necessary (cvat-canvas, cvat-core, cvat-data and cvat-ui)
License
Feel free to contact the maintainers if that's a concern.