All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Add task_type property for dataset (open-edge-platform#1422)
- Fix ambiguous COCO format detector (open-edge-platform#1442)
- Changed supported Python version range (>=3.9, <=3.11) (open-edge-platform#1269)
- Support MMDetection COCO format (open-edge-platform#1213)
- Develop JsonSectionPageMapper in Rust API (open-edge-platform#1224)
- Add Filtering via User-Provided Python Functions (open-edge-platform#1230, open-edge-platform#1233)
- Remove support for the MacOS platform (open-edge-platform#1235)
- Support Kaggle image data (`KaggleImageCsvBase`, `KaggleImageTxtBase`, `KaggleImageMaskBase`, `KaggleVocBase`, `KaggleYoloBase`) (open-edge-platform#1240)
- Add `__getitem__()` for random accessing with O(1) time complexity (open-edge-platform#1247)
- Add Data-aware Anchor Generator (open-edge-platform#1251)
- Support bounding box import within Kaggle extractors and add `KaggleCocoBase` (open-edge-platform#1273)
- Optimize Python import to make CLI entrypoint faster (open-edge-platform#1182)
- Add ImageColorScale context manager (open-edge-platform#1194)
- Enhance visualizer to toggle plot title visibility (open-edge-platform#1228)
- Enhance Datumaro data format detect() to be memory-bounded and performant (open-edge-platform#1229)
- Change RoIImage and MosaicImage to have np.uint8 dtype as default (open-edge-platform#1245)
- Enable image backend and color channel format to be selectable (open-edge-platform#1246)
- Boost up `CityscapesBase` and `KaggleImageMaskBase` by dropping `np.unique` (open-edge-platform#1261)
- Enhance RISE algorithm for explainable AI (open-edge-platform#1263)
- Enhance explore unit test to use real dataset from ImageNet (open-edge-platform#1266)
- Fix each method of the comparator to be used separately (open-edge-platform#1290)
- Bump ONNX version to 1.16.0 (open-edge-platform#1376)
- Print the color channel format (RGB) for datum stats command (open-edge-platform#1389)
- Add ignore_index argument to Mask.as_class_mask() and Mask.as_instance_mask() (open-edge-platform#1409)
- Fix wrong example of Datumaro dataset creation in document (open-edge-platform#1195)
- Fix wrong command to install datumaro from github (open-edge-platform#1202, open-edge-platform#1207)
- Update document to correct wrong `datum project import` command and add filtering example to filter out items containing annotations. (open-edge-platform#1210)
- Fix label compare of distance method (open-edge-platform#1205)
- Fix Datumaro visualizer's import errors after introducing lazy import (open-edge-platform#1220)
- Fix broken link to supported formats in readme (open-edge-platform#1221)
- Fix Kinetics data format to have media data (open-edge-platform#1223)
- Handling undefined labels at the annotation statistics (open-edge-platform#1232)
- Add unit test for item rename (open-edge-platform#1237)
- Fix a bug in the previous behavior when importing nested datasets in the project (open-edge-platform#1243)
- Fix Kaggle importer when adding duplicated labels (open-edge-platform#1244)
- Fix input tensor shape in model interpreter for OpenVINO 2023.3 (open-edge-platform#1251)
- Add default value for target in prune cli (open-edge-platform#1253)
- Remove deprecated MediaManager (open-edge-platform#1262)
- Fix explore command without project (open-edge-platform#1271)
- Fix enable COCO to import only bboxes (open-edge-platform#1360)
- Fix resize transform for RleMask annotation (open-edge-platform#1361)
- Fix import YOLO variants from extractor when `urls` is not specified (open-edge-platform#1362)
- Add memory bounded datumaro data format detect to release 1.5.1 (open-edge-platform#1241)
- Bump version string to 1.5.2 (open-edge-platform#1249)
- Remove Protobuf version limitation (<4) (open-edge-platform#1248)
- Enhance Datumaro data format stream importer performance (open-edge-platform#1153)
- Change image default dtype from float32 to uint8 (open-edge-platform#1175)
- Add comparison level-up doc (open-edge-platform#1174)
- Add ImportError to catch GitPython import error (open-edge-platform#1174)
- Modify the draw function in the visualizer not to raise an error for unsupported annotation types. (open-edge-platform#1180)
- Correct explore path in the related document. (open-edge-platform#1176)
- Fix errata in the voc document. Color values in the labelmap.txt should be separated by commas, not colons. (open-edge-platform#1162)
- Fix hyperlink errors in the document (open-edge-platform#1159, open-edge-platform#1161)
- Fix memory unbounded Arrow data format export/import (open-edge-platform#1169)
- Update CVAT format doc to bypass warning (open-edge-platform#1183)
- Add SAMAutomaticMaskGeneration transform (open-edge-platform#1168)
- Add tabular data import/export (open-edge-platform#1089)
- Support video annotation import/export (open-edge-platform#1124)
- Add multiframework (PyTorch, Tensorflow) converter (open-edge-platform#1125)
- Add SAM OVMS and Triton server Docker image builders (open-edge-platform#1129)
- Add SAMBboxToInstanceMask transform (open-edge-platform#1133, open-edge-platform#1134)
- Add ConfigurableValidator (open-edge-platform#1142)
- Enhance `ClassificationValidator` for multi-label classification datasets with `label_groups` (open-edge-platform#1116)
- Replace Roboflow `xml.etree` with `defusedxml` (open-edge-platform#1117)
- Define `GroupType` with `IntEnum`, where `0` is `EXCLUSIVE` (open-edge-platform#1116)
- Add Rust API to optimize COCOPageMapper performance (open-edge-platform#1120)
- Support a dictionary input in addition to a single image input for the model launcher to support Segment Anything Model (open-edge-platform#1133)
- Remove deprecates announced to be removed in 1.5.0 (open-edge-platform#1140)
- Add multi-threading option to ModelTransform and SAMBboxToInstanceMask (open-edge-platform#1145, open-edge-platform#1149)
- Coco exporter can export annotations even if there is no media, except for mask annotations which require media info. (open-edge-platform#1147)(open-edge-platform#1158)
- Fix bugs for Tile transform (open-edge-platform#1123)
- Disable Roboflow Tfrecord format when Tensorflow is not installed (open-edge-platform#1130)
- Raise VcsAlreadyExists error if vcs directory exists (open-edge-platform#1138)
- Report errors for COCO (stream) and Datumaro importers (open-edge-platform#1110)
- Add documentation and notebook example for Prune API (open-edge-platform#1070)
- Changed supported Python version range (>=3.8, <=3.11) (open-edge-platform#1083)
- Migrate OpenVINO v2023.0.0 (open-edge-platform#1036)
- Add Roboflow data format support (COCO JSON, Pascal VOC XML, YOLOv5-PyTorch, YOLOv7-PyTorch, YOLOv8, YOLOv5 Oriented Bounding Boxes, Multiclass CSV, TFRecord, CreateML JSON) (open-edge-platform#1044)
- Add MissingAnnotationDetection transform (open-edge-platform#1049, open-edge-platform#1063, open-edge-platform#1064)
- Add OVMSLauncher (open-edge-platform#1056)
- Add Prune API (open-edge-platform#1058)
- Add TritonLauncher (open-edge-platform#1059)
- Migrate DVC v3.0.0 (open-edge-platform#1072)
- Stream dataset import/export (open-edge-platform#1077, open-edge-platform#1081, open-edge-platform#1082, open-edge-platform#1091, open-edge-platform#1093, open-edge-platform#1098, open-edge-platform#1102)
- Support mask annotations for CVAT data format (open-edge-platform#1078)
- Support list query for explorer (open-edge-platform#1087)
- Update contributing.md (open-edge-platform#1094)
- Update 3rd-party.txt for release 1.4.0 (open-edge-platform#1099)
- Give notice that the deprecation works will be done in datumaro==1.5.0 (open-edge-platform#1085)
- Unify COCO, Datumaro, VOC, YOLO importer/exporter progress reporter descriptions (open-edge-platform#1100)
- Enhance import performance for built-in plugins (open-edge-platform#1031)
- Change default dtype of load_image() to np.uint8 (open-edge-platform#1041)
- Add OTX ATSS detector model interpreter & refactor interfaces (open-edge-platform#1047)
- Refactor Launcher and ModelInterpreter (open-edge-platform#1055)
- Add CVAT data format document (open-edge-platform#1060)
- Reduce peak memory usage when importing COCO and Datumaro formats (open-edge-platform#1061)
- Enhance the error message for datum stats to be more user friendly (open-edge-platform#1069)
- Refactor dataset.py to separate DatasetStorage (open-edge-platform#1073)
- Create cache dir under only writable filesystem (open-edge-platform#1088)
- Fix: Dataset infos() can be broken if a transform not redefining infos() is stacked on the top (open-edge-platform#1101)
- Fix warnings in test_visualizer.py (open-edge-platform#1039)
- Fix LabelMe data format (open-edge-platform#1053)
- Prevent installing protobuf>=4 (open-edge-platform#1054)
- Fix UnionMerge (open-edge-platform#1086)
- Let CocoBase continue even if an InvalidAnnotationError is raised (open-edge-platform#1050)
- Install dvc version to 2.x (open-edge-platform#1048)
- Replace np.append() in Validator (open-edge-platform#1050)
- Fix Cityscapes format mis-detection (open-edge-platform#1029)
- Add CocoRoboflowImporter (open-edge-platform#976, open-edge-platform#1000)
- Add SynthiaSfImporter and SynthiaAlImporter (open-edge-platform#987)
- Add intermediate skill docs for filter (open-edge-platform#996)
- Add VocInstanceSegmentationImporter and VocInstanceSegmentationExporter (open-edge-platform#997)
- Add Segment Anything data format support (open-edge-platform#1005, open-edge-platform#1009)
- Add Correct transformation (open-edge-platform#1006)
- Implement ReindexAnnotations transform (open-edge-platform#1008)
- Add notebook examples for importing/exporting detection and segmentation data (open-edge-platform#1020, open-edge-platform#1023)
- Update CLI from diff to compare, add TableComparator (open-edge-platform#1012)
- Use autosummary for fully-automatic Python module docs generation (open-edge-platform#973)
- Enrich stack trace for better user experience when importing (open-edge-platform#992)
- Save and load hashkey for explorer (open-edge-platform#981) (open-edge-platform#1003)
- Add MOT and MOTS data format docs (open-edge-platform#999)
- Improve RemoveAnnotations to remove specific annotations with ids (open-edge-platform#1004)
- Add Jupyter notebook example of noisy label detection for detection tasks (open-edge-platform#1011)
- Fix Mapillary Vistas data format (open-edge-platform#977)
- Fix `bytes` property returning `None` if function is given to `data` (open-edge-platform#978)
- Fix Synthia-Rand data format (open-edge-platform#987)
- Fix `person_layout` categories and `action_classification` attributes in imported Pascal-VOC dataset (open-edge-platform#997)
- Drop a malformed transform from StackedTransform automatically (open-edge-platform#1001)
- Fix `Cityscapes` to drop `ImgsFine` directory (open-edge-platform#1023)
- Fix project level CVAT for images format import (open-edge-platform#980)
- Fix an info message when using the convert CLI command with no args.input_format (open-edge-platform#982)
- Fix media contents not returning bytes in arrow format (open-edge-platform#986)
- Add Skill Up section to documentation (open-edge-platform#920, open-edge-platform#933, open-edge-platform#935, open-edge-platform#945, open-edge-platform#949, open-edge-platform#953, open-edge-platform#959, open-edge-platform#960, open-edge-platform#967)
- Add LossDynamicsAnalyzer for noisy label detection (open-edge-platform#928)
- Add Apache Arrow format support (open-edge-platform#931, open-edge-platform#948)
- Add sort transform (open-edge-platform#931)
- Add multiprocessing to DatumaroBinaryBase (open-edge-platform#897)
- Refactor merge code (open-edge-platform#901, open-edge-platform#906)
- Refactor download CLI commands (open-edge-platform#909)
- Refactor CLI commands w/ and w/o project (open-edge-platform#910, open-edge-platform#952)
- Refactor Media to be initialized from explicit sources (open-edge-platform#911, open-edge-platform#921, open-edge-platform#944)
- Refactor hl_ops.py (open-edge-platform#912)
- Add tfds:uc_merced and tfds:eurosat download (open-edge-platform#914)
- Migrate documentation framework to Sphinx (open-edge-platform#917, open-edge-platform#922, open-edge-platform#947, open-edge-platform#954, open-edge-platform#958, open-edge-platform#961, open-edge-platform#962, open-edge-platform#963, open-edge-platform#964, open-edge-platform#965, open-edge-platform#969)
- Update merge tutorial for real life usecase (open-edge-platform#930)
- Abbreviate "detect-format" to "detect" for prettifying (open-edge-platform#951)
- Add UserWarning if an invalid media_type comes to image statistics computation (open-edge-platform#891)
- Fix negated `is_encrypted` (open-edge-platform#907)
- Save extra images of PointCloud when exporting to datumaro format (open-edge-platform#918)
- Fix log issue when importing celeba and align celeba dataset (open-edge-platform#919)
- Fix to not export absolute media path in Datumaro and DatumaroBinary formats (open-edge-platform#896)
- Change pypi_publish.yml to publish_sdist_to_pypi.yml (open-edge-platform#895)
- Add with_subset_dirs decorator (Add ImagenetWithSubsetDirsImporter) (open-edge-platform#816)
- Add CommonSemanticSegmentationWithSubsetDirsImporter (open-edge-platform#826)
- Add DatumaroBinary format (open-edge-platform#828, open-edge-platform#829, open-edge-platform#830, open-edge-platform#831, open-edge-platform#880, open-edge-platform#883)
- Add Explorer CLI documentation (open-edge-platform#838)
- Add version to dataset exported as datumaro format (open-edge-platform#842)
- Add Ava action data format support (open-edge-platform#847)
- Add Shift Analyzer (both covariate and label shifts) (open-edge-platform#855)
- Add YOLO Loose format (open-edge-platform#856)
- Add Ultralytics YOLO format (open-edge-platform#859)
- Refactor Datumaro format code and test code (open-edge-platform#824)
- Add publish to PyPI Github action (open-edge-platform#867)
- Add --no-media-encryption option (open-edge-platform#875)
- Fix image filenames and anomaly mask appearance in MVTec exporter (open-edge-platform#835)
- Fix CIFAR10 and 100 detect function (open-edge-platform#836)
- Fix celeba and align_celeba detect function (open-edge-platform#837)
- Choose the top priority detect format for all directory depths (open-edge-platform#839)
- Fix MVTec format detect function (open-edge-platform#843)
- Fix wrong `__len__()` of Subset when the item is removed (open-edge-platform#854)
- Fix mask visualization bug (open-edge-platform#860)
- Fix detect unit tests to test false negatives as well (open-edge-platform#868)
- Add Data Explorer (open-edge-platform#773)
- Add Ellipse annotation type (open-edge-platform#807)
- Add MVTec anomaly data support (open-edge-platform#810)
- Refactor existing tests (open-edge-platform#803)
- Raise ImportError on importing malformed COCO directory (open-edge-platform#812)
- Remove the duplicated and cyclical category context in documentation (open-edge-platform#822)
- Fix for importing CVAT image 1.1 data format exported to project level (open-edge-platform#795)
- Fix a problem on setting log-level via CLI (open-edge-platform#800)
- Fix code format with the latest black==23.1.0 (open-edge-platform#802)
- Fix Explain command cannot find the model (#721) (open-edge-platform#804)
- Fix a problem found on model remove CLI command (open-edge-platform#805)
- Add Tile transformation (open-edge-platform#790)
- Add Video keyframe extraction (open-edge-platform#791)
- Add TileTransform documentation and Jupyter notebook example (open-edge-platform#794)
- Add MergeTile transformation (open-edge-platform#796)
- Improved mask_to_rle performance (open-edge-platform#770)
- N/A
- N/A
- Fix MacOS CI failures (open-edge-platform#789)
- Fix auto-documentation for the data_format plugins (open-edge-platform#793)
- Add security.md file for the SDL (open-edge-platform#798)
- Support for exclusive of labels with LabelGroup (open-edge-platform#742)
- Jupyter samples
  - Introducing how to merge datasets (open-edge-platform#738)
  - Introducing how to visualize dataset (open-edge-platform#747)
  - Introducing how to filter dataset (open-edge-platform#748)
  - Introducing how to transform dataset (open-edge-platform#759)
- Visualization Python API
  - Bbox feature (open-edge-platform#744)
  - Label, Points, Polygon, PolyLine, and Caption visualization features (open-edge-platform#746)
  - Mask, SuperResolution, Depth visualization features (open-edge-platform#747)
- Documentation for Python API (open-edge-platform#753)
  - dataset handler, visualizer, filter descriptions (open-edge-platform#761)
- `__repr__` for Dataset (open-edge-platform#750)
- Support for exporting as CVAT video format (open-edge-platform#757)
- CodeCov coverage reporting feature to CI/CD (open-edge-platform#756)
- Jupyter notebook example rendering to documentation (open-edge-platform#758)
- An interface to manipulate 'infos' to store the dataset meta-info (open-edge-platform#767)
- 'bbox' annotation when importing a COCO dataset (open-edge-platform#772)
- Wrap title text according to its plot width (open-edge-platform#769)
- Get list of subsets and support only Image media type in visualizer (open-edge-platform#768)
- N/A
- N/A
- Correcting static type checking (open-edge-platform#743)
- Fixing a VOC dataset export when a label contains 'space' (open-edge-platform#771)
- N/A
- Support for custom media types, new `PointCloud` media type, `DatasetItem.media` and `.media_as(type)` members (open-edge-platform#539)
- [API] A way to request dataset and extractor media type with `media_type` (open-edge-platform#539)
- BraTS format (import-only) (.npy and .nii.gz), new `MultiframeImage` media type (open-edge-platform#628)
- Common Semantic Segmentation dataset format (import-only) (open-edge-platform#685)
- An option to disable `data/` prefix inclusion in YOLO export (open-edge-platform#689)
- New command `describe-downloads` to print information about downloadable datasets (open-edge-platform#678)
- Detection for Cityscapes format (open-edge-platform#680)
- Maximum recursion `--depth` parameter for `detect-dataset` CLI command (open-edge-platform#680)
- An option to save a single subset in the `download` command (open-edge-platform#697)
- Common Super Resolution dataset format (import-only) (open-edge-platform#700)
- Kinetics 400/600/700 dataset format (import-only) (open-edge-platform#706)
- NYU Depth Dataset V2 format (import-only) (open-edge-platform#712)
- `env.detect_dataset()` now returns a list of detected formats at all recursion levels instead of just the lowest one (open-edge-platform#680)
- Open Images: allowed to store annotations file in root path as well (open-edge-platform#680)
- Improved parsing error messages in COCO, VOC and YOLO formats (open-edge-platform#684, open-edge-platform#686, open-edge-platform#687)
- YOLO format now supports almost any subset names, except `backup`, `names` and `classes` (instead of just `train` and `valid`). The reserved names now raise an error on exporting. (open-edge-platform#688)
- `--save-images` is replaced with `--save-media` in CLI and converter API (open-edge-platform#539)
- [API] `image`, `point_cloud` and `related_images` of `DatasetItem` are replaced with `media` and `media_as(type)` members and c-tor parameters (open-edge-platform#539)
- N/A
- Detection for LFW format (open-edge-platform#680)
- Adding depth value of image when dataset is exported in VOC format (open-edge-platform#726)
- Adding to handle the numerical labels in task chains properly (open-edge-platform#726)
- Fixing the issue that annotations inside another annotation (polygon) are duplicated during import for VOC format (open-edge-platform#726)
- N/A
- Ability to import a video as frames with the `video_frames` format and to split a video into frames with the `datum util split_video` command (open-edge-platform#555)
- `--subset` parameter in the `image_dir` format (open-edge-platform#555)
- `MediaManager` API to control loaded media resources at runtime (open-edge-platform#555)
- Command to detect the format of a dataset (open-edge-platform#576)
- More comfortable access to library API via `import datumaro` (open-edge-platform#630)
- CLI command-like free functions (`export`, `transform`, ...) (open-edge-platform#630)
- Reading specific annotation files for train dataset in Cityscapes (open-edge-platform#632)
- Random sampling transforms (`random_sampler`, `label_random_sampler`) to create smaller datasets from bigger ones (open-edge-platform#636, open-edge-platform#640)
- API to report dataset import and export progress; API to report dataset import and export errors and take action (skip, fail) (supported in COCO, VOC and YOLO formats) (open-edge-platform#650)
- Support for downloading the ImageNetV2 and COCO datasets (open-edge-platform#653, open-edge-platform#659)
- A way for formats to signal that they don't support detection (open-edge-platform#665)
- Removal transforms to remove items/annotations/attributes from dataset (`remove_items`, `remove_annotations`, `remove_attributes`) (open-edge-platform#670)
- Allowed direct file paths in `datum import`. Such sources are imported like when the `rpath` parameter is specified, however, only the selected path is copied into the project (open-edge-platform#555)
- Improved `stats` performance, added new filtering parameters, image stats (`unique`, `repeated`) moved to the `dataset` section, removed `mean` and `std` from the `dataset` section (open-edge-platform#621)
- Allowed `Image` creation from just `size` info (open-edge-platform#634)
- Added image search in VOC XML-based subformats (open-edge-platform#634)
- Added image path equality checks in simple merge, when applicable (open-edge-platform#634)
- Supported saving box attributes when downloading the TFDS version of VOC (open-edge-platform#668)
- Switched to a `pyproject.toml`-based build (open-edge-platform#671)
- TBD
- Official support of Python 3.6 (due to its EOL) (open-edge-platform#617)
- Backward compatibility annotation symbols in `components.extractor` (open-edge-platform#630)
- Prohibited calling `add`, `import` and `export` commands without a project (open-edge-platform#555)
- Calling `make_dataset` on empty project tree now produces the error properly (open-edge-platform#555)
- Saving (overwriting) a dataset in a project when rpath is used (open-edge-platform#613)
- Output image extension preserving in the `Resize` transform (open-edge-platform#606)
- Memory overuse in the `Resize` transform (open-edge-platform#607)
- Invalid image pixels produced by the `Resize` transform (open-edge-platform#618)
- Numeric warnings that sometimes occurred in `stats` command (e.g. open-edge-platform#607) (open-edge-platform#621)
- Added missing item attribute merging in simple merge (open-edge-platform#634)
- Inability to disambiguate VOC from LabelMe in some cases (open-edge-platform#658)
- TBD
- Command to download public datasets (open-edge-platform#582)
- Extension autodetection in `ByteImage` (open-edge-platform#595)
- MPII Human Pose Dataset (import-only) (.mat and .json) (open-edge-platform#584)
- MARS format (import-only) (open-edge-platform#585)
- The `pycocotools` dependency lower bound is raised to `2.0.4`. (open-edge-platform#449)
- `smooth_line` from `datumaro.util.annotation_util` - the function is renamed to `approximate_line` and has updated interface (open-edge-platform#592)
- Python 3.6 support
- TBD
- Fails in multimerge when lines are not approximated and when there are no label categories (open-edge-platform#592)
- Cannot convert LabelMe dataset, that has no subsets (open-edge-platform#600)
- TBD
- Video reading API (open-edge-platform#521)
- Python API documentation (open-edge-platform#526)
- Mapillary Vistas dataset format (Import-only) (open-edge-platform#537)
- Datumaro can now be installed on Windows on Python 3.9 (open-edge-platform#547)
- Import for SYNTHIA dataset format (open-edge-platform#532)
- Support of `score` attribute in KITTI detection (open-edge-platform#571)
- Support for Accuracy Checker dataset meta files in formats (open-edge-platform#553, open-edge-platform#569, open-edge-platform#575)
- Import for VoTT dataset format (open-edge-platform#573)
- Image resizing transform (open-edge-platform#581)
- The following formats can now be detected unambiguously: `ade20k2017`, `ade20k2020`, `camvid`, `coco`, `cvat`, `datumaro`, `icdar_text_localization`, `icdar_text_segmentation`, `icdar_word_recognition`, `imagenet_txt`, `kitti_raw`, `label_me`, `lfw`, `mot_seq`, `open_images`, `vgg_face2`, `voc`, `widerface`, `yolo` (open-edge-platform#531, open-edge-platform#536, open-edge-platform#550, open-edge-platform#557, open-edge-platform#558)
- Allowed Pytest-native tests (open-edge-platform#563)
- Allowed export options in the `datum merge` command (open-edge-platform#545)
- Using `Image`, `ByteImage` from `datumaro.util.image` - these classes are moved to `datumaro.components.media` (open-edge-platform#538)
- Equality comparison support between `datumaro.components.media.Image` and `numpy.ndarray` (open-edge-platform#568)
- Bug #560: import issue with MOT dataset when using seqinfo.ini file (open-edge-platform#564)
- Empty lines in VOC subset lists are not ignored (open-edge-platform#587)
- TBD
- Import for CelebA dataset format. (open-edge-platform#484)
- File `people.txt` became optional in LFW (open-edge-platform#509)
- File `image_ids_and_rotation.csv` became optional in Open Images (open-edge-platform#509)
- Allowed underscores (`_`) in subset names in COCO (open-edge-platform#509)
- Allowed annotation files with arbitrary names in COCO (open-edge-platform#509)
- The `icdar_text_localization` format is no longer detected in every directory (open-edge-platform#531)
- Updated `pycocotools` version to 2.0.2 (open-edge-platform#534)
- TBD
- TBD
- Unhandled exception when a file is specified as the source for a COCO or MOTS dataset (open-edge-platform#530)
- Exporting dataset without `color` attribute into the `icdar_text_segmentation` format (open-edge-platform#556)
- TBD
- A new installation target: `pip install datumaro[default]`, which should be used by default. The simple `datumaro` is intended for library users. (open-edge-platform#238)
- Dataset and project versioning capabilities (Git-like) (open-edge-platform#238)
- "dataset revpath" concept in CLI, allowing to pass a dataset path with the dataset format in `diff`, `merge`, `explain` and `info` CLI commands (open-edge-platform#238)
- `import`, `remove`, `commit`, `checkout`, `log`, `status`, `info` CLI commands (open-edge-platform#238)
- `Coco*Extractor` classes now have an option to preserve label IDs from the original annotation file (open-edge-platform#453)
- `patch` CLI command to patch datasets (open-edge-platform#401)
- `ProjectLabels` transform to change dataset labels for merging etc. (open-edge-platform#401, open-edge-platform#478)
- Support for custom labels in the KITTI detection format (open-edge-platform#481)
- Type annotations and docs for Annotation classes (open-edge-platform#493)
- Options to control label loading behavior in `imagenet_txt` import (open-edge-platform#434, open-edge-platform#489)
- A project can contain and manage multiple datasets instead of a single one. CLI operations can be applied to the whole project, or to separate datasets. Datasets are modified inplace, by default (open-edge-platform#328)
- CLI help for builtin plugins doesn't require project (open-edge-platform#328)
- Annotation-related classes were moved into a new module, `datumaro.components.annotation` (open-edge-platform#439)
- Rollback utilities replaced with Scope utilities (open-edge-platform#444)
- The `Project` class from `datumaro.components` is changed completely (open-edge-platform#238)
- `diff` and `ediff` are joined into a single `diff` CLI command (open-edge-platform#238)
- Projects use new file layout, incompatible with old projects. An old project can be updated with `datum project migrate` (open-edge-platform#238)
- Inheriting `CliPlugin` is not required in plugin classes (open-edge-platform#238)
- `Importer`s do not create `Project`s anymore and just return a list of extractor configurations (open-edge-platform#238)
- TBD
- `import`, `project merge` CLI commands (open-edge-platform#238)
- Support for project hierarchies. A project cannot be a source anymore (open-edge-platform#238)
- Project cannot have independent internal dataset anymore. All the project data must be stored in the project data sources (open-edge-platform#238)
- `datumaro_project` format (open-edge-platform#238)
- Unused `path` field of `DatasetItem` (open-edge-platform#455)
- Deprecation warning in `open_images_format.py` (open-edge-platform#440)
- `lazy_image` returning unrelated data sometimes (open-edge-platform#409)
- Invalid call to `pycocotools.mask.iou` (open-edge-platform#450)
- Importing of Open Images datasets without image data (open-edge-platform#463)
- Return value type in `Dataset.is_modified` (open-edge-platform#401)
- Remapping of secondary categories in `RemapLabels` (open-edge-platform#401)
- VOC dataset patching for classification and segmentation tasks (open-edge-platform#478)
- Exported mask label ids in KITTI segmentation (open-edge-platform#481)
- Missing `label` for `Points` read in the LFW format (open-edge-platform#494)
- TBD
- The Open Images format now supports bounding box and segmentation mask annotations (open-edge-platform#352, open-edge-platform#388).
- Bounding boxes values decrement transform (open-edge-platform#366)
- Improved error reporting in `Dataset` (open-edge-platform#386)
- Support ADE20K format (import only) (open-edge-platform#400)
- Documentation website at https://openvinotoolkit.github.io/datumaro (open-edge-platform#420)
- Datumaro no longer depends on scikit-image (open-edge-platform#379)
- `Dataset` remembers export options on saving / exporting for the first time (open-edge-platform#386)
- TBD
- TBD
- Application of `remap_labels` to dataset categories of different length (open-edge-platform#314)
- Patching of datasets in formats (open-edge-platform#348)
- Improved Cityscapes export performance (open-edge-platform#367)
- Incorrect format of `*_labelIds.png` in Cityscapes export (open-edge-platform#325, open-edge-platform#342)
- Item id in ImageNet format (open-edge-platform#371)
- Double quotes for ICDAR Word Recognition (open-edge-platform#375)
- Wrong display of builtin formats in CLI (open-edge-platform#332)
- Non utf-8 encoding of annotation files in Market-1501 export (open-edge-platform#392)
- Import of ICDAR, PASCAL VOC and VGGFace2 images from subdirectories on Windows (open-edge-platform#392)
- Saving of images with Unicode paths on Windows (open-edge-platform#392)
- Calling `ProjectDataset.transform()` with a string argument (open-edge-platform#402)
- Attributes casting for CVAT format (open-edge-platform#403)
- Loading of custom project plugins (open-edge-platform#404)
- Reading, writing anno file and saving name of the subset for test subset (open-edge-platform#447)
- Fixed unsafe unpickling in CIFAR import (open-edge-platform#362)
- Support for import/export zip archives with images (open-edge-platform#273)
- Subformat importers for VOC and COCO (open-edge-platform#281)
- Support for KITTI dataset segmentation and detection format (open-edge-platform#282)
- Updated YOLO format user manual (open-edge-platform#295)
- `ItemTransform` class, which describes item-wise dataset `Transform`s (open-edge-platform#297)
- `keep-empty` export parameter in VOC format (open-edge-platform#297)
- A base class for dataset validation plugins (open-edge-platform#299)
- Partial support for the Open Images format; only images and image-level labels can be read/written (open-edge-platform#291, open-edge-platform#315).
- Support for Supervisely Point Cloud dataset format (open-edge-platform#245, open-edge-platform#353)
- Support for KITTI Raw / Velodyne Points dataset format (open-edge-platform#245)
- Support for CIFAR-100 and documentation for CIFAR-10/100 (open-edge-platform#301)
- Tensorflow AVX check is made optional in API and disabled by default (open-edge-platform#305)
- Extensions for images in ImageNet_txt are now mandatory (open-edge-platform#302)
- Several dependencies now have lower bounds (open-edge-platform#308)
- TBD
- TBD
- Incorrect image layout on saving and a problem with encoding on loading (open-edge-platform#284)
- An error when XPath filter is applied to the dataset or its subset (open-edge-platform#259)
- Tracking of `Dataset` changes done by transforms (open-edge-platform#297)
- Improved CLI startup time in several cases (open-edge-platform#306)
- Known issue: loading CIFAR can result in arbitrary code execution (open-edge-platform#327)
- Support for escaping in attribute values in LabelMe format (open-edge-platform#49)
- Support for Segmentation Splitting (open-edge-platform#223)
- Support for CIFAR-10/100 dataset format (open-edge-platform#225, open-edge-platform#243)
- Support for COCO panoptic and stuff format (open-edge-platform#210)
- Documentation file and integration tests for Pascal VOC format (open-edge-platform#228)
- Support for MNIST and MNIST in CSV dataset formats (open-edge-platform#234)
- Documentation file for COCO format (open-edge-platform#241)
- Documentation file and integration tests for YOLO format (open-edge-platform#246)
- Support for Cityscapes dataset format (open-edge-platform#249)
- Support for Validator configurable threshold (open-edge-platform#250)
- LabelMe format saves dataset items with their relative paths by subsets without changing names (open-edge-platform#200)
- Allowed arbitrary subset count and names in classification and detection splitters (open-edge-platform#207)
- Annotation-less dataset elements now participate in subset splitting (open-edge-platform#211)
- Classification task in LFW dataset format (open-edge-platform#222)
- Testing is now performed with pytest instead of unittest (open-edge-platform#248)
- TBD
- TBD
- Added support for auto-merging (joining) of datasets with no labels and having labels (open-edge-platform#200)
- Allowed explicit label removal in `remap_labels` transform (open-edge-platform#203)
- Image extension in CVAT format export (open-edge-platform#214)
- Added a label "face" for bounding boxes in Wider Face (open-edge-platform#215)
- Allowed adding "difficult", "truncated", "occluded" attributes when converting to Pascal VOC if these attributes are not present (open-edge-platform#216)
- Empty lines in YOLO annotations are ignored (open-edge-platform#221)
- Export in VOC format when no image info is available (open-edge-platform#239)
- Fixed saving attribute in WiderFace extractor (open-edge-platform#251)
- TBD
- TBD
- Added an option to allow undeclared annotation attributes in CVAT format export (open-edge-platform#192)
- COCO exports images in separate dirs by subsets. Added an option to control this (open-edge-platform#195)
- TBD
- TBD
- Instance masks of `background` class no longer introduce an instance (open-edge-platform#188)
- Added support for label attributes in Datumaro format (open-edge-platform#192)
- TBD
- OpenVINO plugin examples (open-edge-platform#159)
- Dataset validation for classification and detection datasets (open-edge-platform#160)
- Arbitrary image extensions in formats (import and export) (open-edge-platform#166)
- Ability to set a custom subset name for an imported dataset (open-edge-platform#166)
- CLI support for NDR (open-edge-platform#178)
- Common ICDAR format is split into 3 sub-formats (open-edge-platform#174)
- TBD
- TBD
- The ability to work with file names containing Cyrillic and spaces (open-edge-platform#148)
- Image reading and saving in ICDAR formats (open-edge-platform#174)
- Unnecessary image loading on dataset saving (open-edge-platform#176)
- Allowed spaces in ICDAR captions (open-edge-platform#182)
- Saving of masks in VOC when masks are not requested (open-edge-platform#184)
- TBD
- TBD
- TBD
- TBD
- TBD
- Images with no annotations are exported again in VOC formats (open-edge-platform#123)
- Inference result for only one output layer in OpenVINO launcher (open-edge-platform#125)
- TBD
- `Icdar13/15` dataset format (open-edge-platform#96)
- Laziness, source caching, tracking of changes and partial updating for `Dataset` (open-edge-platform#102)
- `Market-1501` dataset format (open-edge-platform#108)
- `LFW` dataset format (open-edge-platform#110)
- Support of polygons' and masks' confusion matrices and mismatching classes in `diff` command (open-edge-platform#117)
- Add near duplicate image removal plugin (open-edge-platform#113)
- Sampler Plugin that analyzes inference result from the given dataset and selects samples for annotation (open-edge-platform#115)
- OpenVINO model launcher is updated for OpenVINO r2021.1 (open-edge-platform#100)
- TBD
- TBD
- High memory consumption and low performance of mask import/export, #53 (open-edge-platform#101)
- Masks, covered by class 0 (background), should be exported with holes inside (open-edge-platform#104)
- `diff` command invocation problem with missing class methods (open-edge-platform#117)
- TBD
- `WiderFace` dataset format (open-edge-platform#65, open-edge-platform#90)
- Function to transform annotations to labels (open-edge-platform#66)
- Dataset splits for classification, detection and re-id tasks (open-edge-platform#68, open-edge-platform#81)
- `VGGFace2` dataset format (open-edge-platform#69, open-edge-platform#82)
- Unique image count statistic (open-edge-platform#87)
- Installation with pip by name `datumaro`
- `Dataset` class extended with new operations: `save`, `load`, `export`, `import_from`, `detect`, `run_model` (open-edge-platform#71)
- Allowed importing `Extractor`-only defined formats (in `Project.import_from`, `dataset.import_from` and CLI/`project import`) (open-edge-platform#71)
- `datum project ...` commands replaced with `datum ...` commands (open-edge-platform#84)
- Supported more image formats in `ImageNet` extractors (open-edge-platform#85)
- Allowed adding `Importer`-defined formats as project sources (`source add`) (open-edge-platform#86)
- Added max search depth in `ImageDir` format and importers (open-edge-platform#86)
- `datum project ...` CLI context (open-edge-platform#84)
- TBD
- Allow plugins inherited from `Extractor` (instead of only `SourceExtractor`) (open-edge-platform#70)
- Windows installation with `pip` for `pycocotools` (open-edge-platform#73)
- `YOLO` extractor path matching on Windows (open-edge-platform#73)
- Fixed inplace file copying when saving images (open-edge-platform#76)
- Fixed `labelmap` parameter type checking in `VOC` converter (open-edge-platform#76)
- Fixed model copying on addition in CLI (open-edge-platform#94)
- TBD
- `CamVid` dataset format (open-edge-platform#57)
- Ability to install `opencv-python-headless` dependency with `DATUMARO_HEADLESS=1` environment variable instead of `opencv-python` (open-edge-platform#62)
- Allow empty supercategory in COCO (open-edge-platform#54)
- Allow Pascal VOC to search in subdirectories (open-edge-platform#50)
- TBD
- TBD
- TBD
- TBD
- `ImageNet` and `ImageNetTxt` dataset formats (open-edge-platform#41)
- TBD
- TBD
- TBD
- Default `label-map` parameter value for VOC converter (open-edge-platform#34)
- Randomness of random split transform (open-edge-platform#38)
- `Transform.subsets()` method (open-edge-platform#38)
- Supported unknown image formats in TF Detection API converter (open-edge-platform#40)
- Supported empty attribute values in CVAT extractor (open-edge-platform#45)
- TBD
- `ByteImage` class to represent encoded images in memory and avoid recoding on save (open-edge-platform#27)
- Implementation of format plugins simplified (open-edge-platform#22)
- `default` is now a default subset name, instead of `None`. The values are interchangeable. (open-edge-platform#22)
- Improved performance of transforms (open-edge-platform#22)
- TBD
- `image/depth` value from VOC export (open-edge-platform#27)
- Zero division errors in dataset statistics (open-edge-platform#31)
- TBD
- `reindex` option in COCO and CVAT converters (open-edge-platform#18)
- Support for relative paths in LabelMe format (open-edge-platform#19)
- MOTS png mask format support (https://github.com/openvinotoolkit/datumaro/21)
- TBD
- TBD
- TBD
- TBD
- TBD
- Initial release
## [Unreleased]
### New features
- TBD
### Enhancements
- TBD
### Deprecated
- TBD
### Removed
- TBD
### Bug fixes
- TBD
### Security
- TBD