[OTX] Bugfix: multi GPU raise error when num_workers isn't set as 0. #1475

eunwoosh · 2023-01-02T08:43:25Z

Summary

Fix a bug where an error is raised during multi-GPU training when num_workers is non-zero.

Reason of the bug

When a process is spawned, the default multiprocessing start method is set to "spawn". This raises an error when a dataloader is used with num_workers > 0.
This is because the dataloader holds a DatasetItemEntity, which has a thread lock attribute, and a thread lock cannot be pickled.
I don't know the exact reason, but when a new process is forked, an unpicklable argument can be passed to it.
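The behaviour described above can be reproduced outside OTX with a minimal sketch. `ItemWithLock` is a hypothetical stand-in for `DatasetItemEntity`, and `can_pickle`, `read_value` and `run_in_child` are illustrative helpers rather than code from this PR. The sketch relies on a general property of Python's `multiprocessing`: the "spawn" start method pickles the arguments it sends to the child, while "fork" lets the child inherit the parent's memory, so nothing is pickled.

```python
import multiprocessing as mp
import pickle
import threading


class ItemWithLock:
    """Hypothetical stand-in for a dataset item that holds a thread lock."""

    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, which "spawn" relies on."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


def read_value(item, queue):
    # Runs in the child process and reports the value it sees.
    queue.put(item.value)


def run_in_child(item, start_method):
    """Start a child with the given start method and return what it read."""
    ctx = mp.get_context(start_method)
    queue = ctx.Queue()
    proc = ctx.Process(target=read_value, args=(item, queue))
    proc.start()  # with "spawn", the arguments are pickled at this point
    result = queue.get(timeout=30)
    proc.join()
    return result


if __name__ == "__main__":
    item = ItemWithLock(3)
    print("picklable:", can_pickle(item))
    if "fork" in mp.get_all_start_methods():
        print("value read in forked child:", run_in_child(item, "fork"))
```

On platforms that support it, the "fork" call should succeed even though the item is not picklable. Calling `run_in_child(item, "spawn")` with the same item is expected to fail while the arguments are being pickled, which matches the error reported here.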

otx/cli/utils/multi_gpu.py


* Update MPA submodule to origin/otx
* [OTX-MMCV] Public mmdetection (#1382) Enable model training and NNCF in mmdet (#1355)
* Enable detection training on latest mmcv/det
  - ATSS / SSD / YOLOX
  - NNCF support for ATSS
* fix: import errors
* feat: add monkey patch to mmdet modules
  - most of patches would be just wrapping for not tracing in nncf context
* feat: add trainable yolox
  - add trainable yolox
  - recursively search dataset cfg for nested dataset classes
* fix: change device to cpu when nncf tracing
* feat: add trainable ssd
* refactor: rearange nncf adapter
* feat: add trainable mask rcnn models
* refactor: move out common utils
* fix: ssd head bug
* feat: add lr scheduler for accuracy aware runner
* refactor: nncf module and monkey patch
* fix: proper clustering anchors for ssd
* fix: unable to trace the first module in NNCFNetwork
* fix: bring back ssd head structure
* feat: add train_step method to NNCFNetwork
* fix: mismatches
* fix: update pipeline for wrapper
* fix: add missing file
* Fix merge error
* Enable model training and NNCF in mmseg (#1400)
* refactor: remove redundant
* feat: enable mmseg training
* feat: add nncf related stuff
* fix: change lr config
* fix: align nncf target metric
* refactor: use mpa for training and inference
* test: enable tests
* fix: minor bug
* refactor: patcher
* fix: build consistent nncf graph
* fix: minor bug
* fix: remove unused backup
* fix: dealt with datacontainer
* [OTX-MMCLS] Enable NNCF (#1435)
* fix: use patcher
* feat: update mmcls version
* feat: enable NNCF for mmcls
* refactor: add build NNCF model functions
* fix: minor bug
* fix: typo
* fix: make sure importing nncf when enabled only
* fix: inherit from base super class of otx
* [OTX] Introduce mmdeploy to export cls/seg/det models (#1466)
* feat: export using mmdeploy
* fix: adapt mmdeploy exported model
* test: enable openvino export
* fix: patch depending on fn type
* feat: mmdeploy for classification model
* test: enable export and openvino performance test
* fix: change temporary requirements
* refactor: use builder
* fix: do not propagate logger
* fix: remove image channel format conversion
* fix: handle unlabeled data
* fix: run eval before optimizing nncf network
* feat: change confidence threshold after nncf optimization
* fix: remove redundant attribute
* fix: official released openvino version
* fix: remove redundants
* feat: public mm series libraries
* feat: otx refactoring and bug fix
* Revert "[OTX] Bugfix: multi GPU raise error when num_workers isn't set as 0. (#1475)" This reverts commit c076902.
* feat: enable multi-nodes distributed training
* fix: redundant parts
* Revert "[OTX] Evaluate a model before training starts (#1472)" This reverts commit f728295.
* feat: enable evaluation before and after training
* style: fix failed cases
* fix: disable sam optimizer for nncf task
* fix: add frezelayer hook for segmentation
* fix: deepcopy instead of shallowcopy
* fix: enable temporary disabled features
* fix: handle nncf state simply
* fix: remove submodule
* feat: proper test runner handler
* fix: add forcetrainmodehook
* fix: make sure model is evaluated before run
* fix: more merge conflicts
* fix: buffer line by line in userspace
* fix: patch torch, etc. only when nncf task is executed
* fix: restrict kornia version
* fix: restrict version
* fix: align data pipeline for supcon
* fix: unclutter things
* fix: ignore annoying leftover data.yaml

Co-authored-by: Songki Choi <songki.choi@intel.com>

bugfix: multi process method of spawned method is set as default

935743a

eunwoosh requested review from goodsong81, supersoob, harimkang, JihwanEom, sungmanc and jaegukhyun January 2, 2023 08:43

eunwoosh requested a review from a team as a code owner January 2, 2023 08:43

github-actions bot added the CLI Any changes in OTE CLI label Jan 2, 2023

eunwoosh changed the title ~~[OTX] bugfix: multi GPU raise error when num_workers isn't set as 0.~~ [OTX] Bugfix: multi GPU raise error when num_workers isn't set as 0. Jan 2, 2023

goodsong81 reviewed Jan 2, 2023


otx/cli/utils/multi_gpu.py (outdated, resolved)

make code more readable

036c3f2

harimkang approved these changes Jan 3, 2023


goodsong81 approved these changes Jan 3, 2023


eunwoosh merged commit c076902 into feature/otx Jan 3, 2023

eunwoosh deleted the es/multi_gpu_num_worker_fix branch January 3, 2023 04:47

cih9088 added a commit to cih9088/training_extensions that referenced this pull request Jan 5, 2023

Revert "[OTX] Bugfix: multi GPU raise error when num_workers isn't se…

1e6bcb7

…t as 0. (openvinotoolkit#1475)" This reverts commit c076902.

cih9088 added a commit to cih9088/training_extensions that referenced this pull request Jan 5, 2023

Revert "[OTX] Bugfix: multi GPU raise error when num_workers isn't se…

75e8a45

…t as 0. (openvinotoolkit#1475)" This reverts commit c076902.

cih9088 added a commit to cih9088/training_extensions that referenced this pull request Jan 5, 2023

Revert "[OTX] Bugfix: multi GPU raise error when num_workers isn't se…

13c797f

…t as 0. (openvinotoolkit#1475)" This reverts commit c076902.

cih9088 added a commit to cih9088/training_extensions that referenced this pull request Jan 6, 2023

Revert "[OTX] Bugfix: multi GPU raise error when num_workers isn't se…

088bdc7

…t as 0. (openvinotoolkit#1475)" This reverts commit c076902.

cih9088 added a commit to cih9088/training_extensions that referenced this pull request Jan 6, 2023

Revert "[OTX] Bugfix: multi GPU raise error when num_workers isn't se…

476ea4d

…t as 0. (openvinotoolkit#1475)" This reverts commit c076902.

cih9088 added a commit to cih9088/training_extensions that referenced this pull request Jan 6, 2023

Revert "[OTX] Bugfix: multi GPU raise error when num_workers isn't se…

45fb6af

…t as 0. (openvinotoolkit#1475)" This reverts commit c076902.

cih9088 added a commit to cih9088/training_extensions that referenced this pull request Jan 6, 2023

Revert "[OTX] Bugfix: multi GPU raise error when num_workers isn't se…

1d5c10d

…t as 0. (openvinotoolkit#1475)" This reverts commit c076902.

cih9088 added a commit to cih9088/training_extensions that referenced this pull request Jan 6, 2023

Revert "[OTX] Bugfix: multi GPU raise error when num_workers isn't se…

494db54

…t as 0. (openvinotoolkit#1475)" This reverts commit c076902.

cih9088 added a commit to cih9088/training_extensions that referenced this pull request Jan 6, 2023

Revert "[OTX] Bugfix: multi GPU raise error when num_workers isn't se…

93b7f73

…t as 0. (openvinotoolkit#1475)" This reverts commit c076902.

cih9088 added a commit to cih9088/training_extensions that referenced this pull request Jan 6, 2023

Revert "[OTX] Bugfix: multi GPU raise error when num_workers isn't se…

4617c6c

…t as 0. (openvinotoolkit#1475)" This reverts commit c076902.

cih9088 added a commit to cih9088/training_extensions that referenced this pull request Jan 6, 2023

Revert "[OTX] Bugfix: multi GPU raise error when num_workers isn't se…

74a20d2

…t as 0. (openvinotoolkit#1475)" This reverts commit c076902.

cih9088 added a commit to cih9088/training_extensions that referenced this pull request Jan 6, 2023

Revert "[OTX] Bugfix: multi GPU raise error when num_workers isn't se…

7465d26

…t as 0. (openvinotoolkit#1475)" This reverts commit c076902.

cih9088 added a commit to cih9088/training_extensions that referenced this pull request Jan 6, 2023

Revert "[OTX] Bugfix: multi GPU raise error when num_workers isn't se…

8a1f9e4

…t as 0. (openvinotoolkit#1475)" This reverts commit c076902.

cih9088 added a commit to cih9088/training_extensions that referenced this pull request Jan 9, 2023

Revert "[OTX] Bugfix: multi GPU raise error when num_workers isn't se…

3a64319

…t as 0. (openvinotoolkit#1475)" This reverts commit c076902.
