Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OTX] Replace forked mm libraries with public ones #1497

Merged

Conversation

cih9088
Copy link
Contributor

@cih9088 cih9088 commented Jan 9, 2023

I highly encourage you to look into this PR thoroughly as this PR impacts on all tasks in OTX.

Changes

  • replaced forked mm libraries with public ones
  • employed mmdeploy to export model to OpenVINO IR
  • enabled multi-node distributed training by extending [OTX] Enable multi-GPU training #1392
  • used existing hooks (AdaptiveTrainSchedulingHook, EvalHook) to evaluate model before and after training instead of createing new one [OTX] Evaluate a model before training starts #1472
  • refactored (align code base across cls/det/seg to future refactoring)
  • fixed bugs

Tests

  • all pre-commit tests and pytests are passed

Related PRs

goodsong81 and others added 17 commits January 9, 2023 10:13
Signed-off-by: Songki Choi <songki.choi@intel.com>
Enable model training and NNCF in mmdet (openvinotoolkit#1355)

* Enable detection training on latest mmcv/det
- ATSS / SSD / YOLOX
- NNCF support for ATSS

* fix: import errors

* feat: add monkey patch to mmdet modules
- most of patches would be just wrapping for not tracing in nncf context

* feat: add trainable yolox
- add trainable yolox
- recursively search dataset cfg for nested dataset classes

* fix: change device to cpu when nncf tracing

* feat: add trainable ssd

* refactor: rearange nncf adapter

* feat: add trainable mask rcnn models

* refactor: move out common utils

* fix: ssd head bug

* feat: add lr scheduler for accuracy aware runner

* refactor: nncf module and monkey patch

* fix: proper clustering anchors for ssd

* fix: unable to trace the first module in NNCFNetwork

* fix: bring back ssd head structure

* feat: add train_step method to NNCFNetwork

* fix: mismatches

* fix: update pipeline for wrapper

* fix: add missing file

* Fix merge error

Signed-off-by: Songki Choi <songki.choi@intel.com>
Co-authored-by: Inhyuk Cho <andy.inhyuk.jo@intel.com>
* refactor: remove redundant

* feat: enable mmseg training

* feat: add nncf related stuff

* fix: change lr config

* fix: align nncf target metric

* refactor: use mpa for training and inference

* test: enable tests

* fix: minor bug

* refactor: patcher

* fix: build consistent nncf graph

* fix: minor bug

* fix: remove unused backup

* fix: dealt with datacontainer
* fix: use patcher

* feat: update mmcls version

* feat: enable NNCF for mmcls

* refactor: add build NNCF model functions

* fix: minor bug

* fix: typo

* fix: make sure importing nncf when enabled only

* fix: inherit from base super class of otx
…t#1466)

* feat: export using mmdeploy

* fix: adapt mmdeploy exported model

* test: enable openvino export

* fix: patch depending on fn type

* feat: mmdeploy for classification model

* test: enable export and openvino performance test

* fix: change temporary requirements

* refactor: use builder

* fix: do not propagate logger

* fix: remove image channel format conversion

* fix: handle unlabeled data

* fix: run eval before optimizing nncf network

* feat: change confidence threshold after nncf optimization

* fix: remove redundant attribute

* fix: official released openvino version

* fix: remove redundants
@cih9088 cih9088 requested a review from a team as a code owner January 9, 2023 02:28
@cih9088 cih9088 force-pushed the feature/otx-public-mmcv branch from 52ea65a to 52e3b1d Compare January 9, 2023 03:03
@github-actions github-actions bot added CLI Any changes in OTE CLI DEPENDENCY Any changes in any dependencies (new dep or its version) should be produced via Change Request on PM ALGO Any changes in OTX Algo Tasks implementation TEST Any changes in tests labels Jan 9, 2023
@cih9088 cih9088 force-pushed the feature/otx-public-mmcv branch from 79e1baa to 3031515 Compare January 10, 2023 04:04
@cih9088 cih9088 marked this pull request as draft January 10, 2023 04:09
@cih9088 cih9088 force-pushed the feature/otx-public-mmcv branch 2 times, most recently from 1a75de3 to 2c21b40 Compare January 12, 2023 13:35
@cih9088 cih9088 force-pushed the feature/otx-public-mmcv branch 6 times, most recently from d781dcd to 33fe7ec Compare January 13, 2023 01:10
@cih9088 cih9088 force-pushed the feature/otx-public-mmcv branch from 734db24 to c821e5b Compare January 13, 2023 09:54
@cih9088
Copy link
Contributor Author

cih9088 commented Jan 13, 2023

@harimkang @goodsong81 @wonjuleee @sungmanc
As we agreed to merge this PR without review and let contributors fix bugs while they are developing on this PR, can we do it without CI result?
CI seems to have a problem with this PR. It gets stuck with no log and reaches to the time limit.

For a reference, here are test results in local with a clean python 3.8.5 environment installed with otx/algorithms/init_venv.sh.
(pre-commit, tests/units, tests/integration/cli, in order)
image

NNCF optimisation for YOLOX failed from time to time.

AssertionError: trained_performance[k]=0.9512195121951218, evaluated_performance[k]=0.9367088607594937

@JihwanEom
Copy link
Contributor

JihwanEom commented Jan 13, 2023

I agree, CI seems like not working properly due to multi-gpu training issues. (Fails after 30 minutes for each TC)

@sungmanc
Copy link
Contributor

@cih9088 , What I have experienced during the CI tests, FAIL cases could be shown in CI although all test cases were passed on the local test. So, checking with CI test is needed before merging the branch unless we are in a hurry.

So, I suggest waiting for the fixing of the CI.

Copy link
Contributor

@harimkang harimkang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an issue on the HPO side, but we decided to fix it in the next PR, not this PR.

Copy link
Contributor

@goodsong81 goodsong81 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cheers!

@goodsong81 goodsong81 merged commit 91fff1d into openvinotoolkit:feature/otx Jan 16, 2023
@sungmanc
Copy link
Contributor

@cih9088 , maybe we need to change the URL of repo to more public one. What do you think?

image

@goodsong81
Copy link
Contributor

@cih9088 , maybe we need to change the URL of repo to more public one. What do you think?

image

This PR #1393 might be helpful for reference.

@cih9088
Copy link
Contributor Author

cih9088 commented Jan 16, 2023

@sungmanc @goodsong81
Agreed. Let me change the repo to the one in #1393 and see how things work out.

@sovrasov
Copy link
Contributor

@sungmanc @goodsong81 Agreed. Let me change the repo to the one in #1393 and see how things work out.

You may need #1486 to use the updated NNCF commit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ALGO Any changes in OTX Algo Tasks implementation CLI Any changes in OTE CLI DEPENDENCY Any changes in any dependencies (new dep or its version) should be produced via Change Request on PM TEST Any changes in tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants