Initial commit #2670
Conversation
LGTM
What do you think about the dir structure below? (Gathering entities)
I believe users need to change more in model/entity than in model/module when they want to bring new models, so categorizing by the entity itself looks better to me. I know this could differ from person to person, so it's just a suggestion.
root/
  algo/                # Custom algo (e.g., hierarchical_cls_head)
  cli/                 # CLI entrypoints
  config/              # Default YAML config files
  core/
    config/            # Structured data type object for configurations
    entities/          ## OTX entities
      model/
        base.py
        detection.py
        ...
      data/
        base.py
        detection.py
        ...
    data/              # Data related things
      dataset/         # OTXDataset
        base.py
        detection.py
        ...
      transform_libs/  # To support transform libraries (e.g., MMCV)
      factory.py       # Factory to instantiate data related objects
      module.py        # OTXDataModule
    engine/            # PyTorchLightning engine
      train.py
      ...
    model/             # Model related things
      module/          # OTXLitModule
        base.py
        detection.py
        ...
    types/             # Enum definitions (e.g. OTXTaskType)
    utils/             # Utility functions
  recipe/              # Recipe YAML config for each model we support
    detection/         # (e.g., rtmdet_tiny)
      ...
  tools/               # Python runnable scripts for some TBD use cases
Overall, the new design looks good to me. In the next refactoring stage I propose moving "recipe" into the configs folder, because it is confusing to have configs in different places, and I would also rename it: "recipe" is not an appropriate name for model configs.
Also, we should think about the folder structure under the "algo" folder.
Next, I personally don't understand why we don't use Structured Configs for all YAML config groups from "src/configs". Overall, using these Structured Configs feels a bit complicated to me, since different parts of the configuration live in different places and use two different kinds of configs (YAML and Python dataclasses).
Last thing, creating types like "Transforms = Union[Callable, List[Callable]]", "class OTXBatchLossEntity(Dict[str, Tensor])", etc. seems to add unnecessary overhead.
And of course, we should make Engine a Python class.
Thank you for such hard work!
A recipe is not just a model config. A recipe is literally a "recipe" which includes everything needed to make a deep learning model: not just model configs, but also data transforms, the optimizer, the LR scheduler, callbacks, and so on. I made many comments about this to @eugene123tw, but I'm saying it again here. In our design, the recipe is what overrides the base configs. And I don't think "recipe" is weird in the sense I've been describing. I just googled it and found an article using the word in this way: http://karpathy.github.io/2019/04/25/recipe/ You also already have https://github.com/openvinotoolkit/training_extensions/tree/develop/src/otx/recipes/stages However, if this is really weird in your opinion, yes, we can rename it, but it cannot be molded into just a model config.
I've been talking about this many times, but again: structured config is important. Look at training_extensions/src/otx/v2/adapters/torch/mmengine/modules/utils/config_utils.py (lines 35 to 278 in b47eaae).
You can never tell which variables are actually used.
Let's imagine that you are in the middle of implementing this from scratch; do you think it is possible to keep track of them? On the other hand, look at this code: training_extensions/src/otx/core/data/module.py (lines 30 to 41 in 61c09ab).
I assume you have an IDE. If your IDE is Visual Studio Code, you can just move your cursor to training_extensions/src/otx/core/config/data.py (lines 24 to 28 in 61c09ab) and jump to the definition.
This adds a small learning curve at first, but I believe hard typing is better for the development cycle than soft typing. It is the same as you guys not feeling it cumbersome for
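To make the hard-typing point concrete, here is a minimal, self-contained sketch; the name SubsetCfg is hypothetical and not an actual OTX class:

```python
# Minimal sketch (hypothetical names) contrasting a soft-typed dict config with a
# structured dataclass config. With the dataclass, an IDE can jump straight to the
# definition, and a mistyped field name fails immediately instead of silently.
from dataclasses import dataclass

# Soft typing: nothing tells you which keys exist or what their types should be.
loose_cfg = {"batch_size": 8, "num_workers": 2}

@dataclass
class SubsetCfg:
    """Structured (hard-typed) configuration for one dataset subset."""
    batch_size: int = 8
    num_workers: int = 2

typed_cfg = SubsetCfg(batch_size=16)  # SubsetCfg(bacth_size=16) would raise TypeError
```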
We still need to make a decision about the pre-commit settings; for my part, I'm okay to merge as-is.
@eugene123tw, @kprokofi, @sungmanc, @jaegukhyun, @vinnamkim
I think I can enable ruff's "UP007" rule; I'll create a PR for this.
As for the rest, I think most of the rules related to docstrings are ignored, but we'll have to discuss this further. Let's come up with a docstring convention and fix it together.
@@ -0,0 +1,38 @@
root/
  algo/                # Custom algo (e.g., hierarchical_cls_head)
This could also be just `models`.
It's not just for models; see #2682.
"FIX002" # line-contains-todo | ||
"FIX002", # line-contains-todo | ||
|
||
"UP007" # Use `X | Y` for type annotations |
I disagree with this. We should use UP007, since it is the new suggested format.
Unless we have a really good reason that everyone agrees on, we should not change the rules here. These are there to improve the overall code quality.
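For context, this is the annotation style UP007 is about; an illustrative snippet, not project code:

```python
from __future__ import annotations  # lets the PEP 604 syntax work on Python < 3.10

from typing import List, Optional


def load_old(path: Optional[str] = None) -> Optional[List[str]]:  # style UP007 flags
    ...


def load_new(path: str | None = None) -> list[str] | None:  # style UP007 enforces
    ...
```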
src/otx/config/debug/default.yaml
This approach is common in the mmx ecosystem. However, as a developer, I find it hard to follow: from every other config file that refers to this default.yaml file, we need to go back to this file. Due to this limitation, the new mmx API even released a new mechanism just to navigate properly between config files.
@dataclass
class SubsetConfig:
    """DTO for dataset subset configuration."""
What is the real advantage of this DTO again? It seems that all these parameters are set manually here. Would it be an idea to consider a more dynamic approach here?
I'd like to ask you: what is the real disadvantage of the DTO? It is just a group of arguments. People feel uncomfortable if there are many arguments that are not organized, and I believe that is not just my personal feeling: https://docs.astral.sh/ruff/rules/too-many-arguments/.
Let's look at this example.
from dataclasses import dataclass

from jsonargparse import CLI


@dataclass
class TaskInfo:
    task_type: str | None = "good_task"


@dataclass
class MyConfig:
    """My engine config.

    Args:
        task_info: My task info
        data_root: Root path for dataset
        learning_rate: Learning rate for model optimizer
        test_after_training: If true, execute the test pipeline for the best validation fit model checkpoint
    """

    task_info: TaskInfo
    data_root: str
    learning_rate: float
    test_after_training: bool = True


class MyDataModule:
    def __init__(self, data_root: str):
        pass


class Engine:
    """My Engine class.

    Args:
        engine_cfg: configuration for my engine
    """

    def __init__(
        self,
        task_info: TaskInfo,
        data_root: str,
        learning_rate: float,
        test_after_training: bool,
    ):
        self.data_module = MyDataModule(data_root=data_root)

    @classmethod
    def from_config(cls, engine_cfg: MyConfig) -> "Engine":
        return cls(
            task_info=engine_cfg.task_info,
            data_root=engine_cfg.data_root,
            learning_rate=engine_cfg.learning_rate,
            test_after_training=engine_cfg.test_after_training,
        )


if __name__ == "__main__":
    CLI(Engine.from_config)
(otx-v2) vinnamki@vinnamki:~/otx/training_extensions$ python test.py --help
usage: test.py [-h] [--config CONFIG] [--print_config[=flags]] [--engine_cfg CONFIG] [--engine_cfg.task_info CONFIG] [--engine_cfg.task_info.task_type TASK_TYPE] --engine_cfg.data_root DATA_ROOT
--engine_cfg.learning_rate LEARNING_RATE [--engine_cfg.test_after_training {true,false}]
<bound method Engine.from_config of <class '__main__.Engine'>>
options:
-h, --help Show this help message and exit.
--config CONFIG Path to a configuration file.
--print_config[=flags]
Print the configuration after applying all other arguments and exit. The optional flags customizes the output and are one or more keywords separated by comma. The supported flags
are: comments, skip_default, skip_null.
My engine config:
--engine_cfg CONFIG Path to a configuration file.
--engine_cfg.data_root DATA_ROOT
Root path for dataset (required, type: str)
--engine_cfg.learning_rate LEARNING_RATE
Learning rate for model optimizer (required, type: float)
--engine_cfg.test_after_training {true,false}
If true, execute the test pipeline for the best validation fit model checkpoint (type: bool, default: True)
TaskInfo(task_type: str | None = 'good_task'):
--engine_cfg.task_info CONFIG
Path to a configuration file.
--engine_cfg.task_info.task_type TASK_TYPE
(type: str | None, default: good_task)
It just adds one degree of freedom: Engine can be created from the DTO, not only from the very long list of explicit arguments. I cannot understand why you think they are mutually exclusive.
In addition, there is a clear advantage during the development phase. You can see that Engine has a member object MyDataModule, and data_root has to be propagated from Engine to MyDataModule. As you can see, we would have to fix more than two places for a refactoring or a newly added variable. That slows down our early development phase.
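To make the "one extra degree of freedom" point concrete, here is a short usage sketch based on the toy classes above (the values are made up):

```python
# Both construction paths remain available; the DTO does not replace the explicit one.
cfg = MyConfig(task_info=TaskInfo(), data_root="/data/coco", learning_rate=1e-3)

engine_a = Engine.from_config(cfg)                       # built from the DTO
engine_b = Engine(TaskInfo(), "/data/coco", 1e-3, True)  # built from explicit arguments
```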
@dataclass
class TrainerConfig(DictConfig):
Lightning Trainer has more args defined; why do we only have these here? Also, the parameters of Lightning Trainer change frequently, which might cause maintenance issues here. Again, I'd prefer a more dynamic approach.
I believe it is our mission to provide complete training templates (the recipes named in this PR) for Intel devices (I guess you read Yannis's reply that Mark forwarded to you). Those templates should be reproducible and should enable our users to estimate the cost of solving their problem with our tools.
https://github.com/open-mmlab/mmdetection/blob/main/configs/rtmdet/README.md
This is a very good example of it from mmx. I personally don't like their code structure, and the learning curve for mm things is steep. However, as you know, this scene has so many reproduction problems, and computation is not free (it is even very costly for deep learning workloads, e.g. GPUs). I think the cost of learning mmx is smaller than the computation cost wasted on trial and error, and I think everyone who acts economically thinks so as well.
At the same time, the complete training templates can have parameters that differ from the PyTorch Lightning defaults (especially for Intel devices, since the major player in this scene is NVIDIA).
def __init__(
    self,
    dm_subset: DatasetSubset,
What does `dm` mean?
Ah, I found that it refers to Datumaro. Here is my next question: why do we need to refer to a Datumaro subset?
So, let's assume that I don't have a Datumaro background but would like to use OTX. How can I do that?
Also, let's assume that I'm using the API and would like to create an OTXDataset. In this case I will have to create the Subset stuff first, right?
If yes, is there an easier way to create the object with a one-liner?
This is the entrypoint I'm planning for the Python API.
class OTXDetectionDataset(OTXDataset[DetDataEntity]):
    """OTXDataset class for detection task."""

    def __init__(self, dm_subset: DatasetSubset, transforms: Transforms) -> None:
How about `subset`?

- def __init__(self, dm_subset: DatasetSubset, transforms: Transforms) -> None:
+ def __init__(self, subset: DatasetSubset, transforms: Transforms) -> None:
There are so many renaming comments here. Please see my other comments for other naming things too.
@dataclass
class OTXDataEntity:
Can you elaborate why this is needed?
Is this the new OTXDatasetItem from v1?
I feel we might need a more descriptive name for this dataclass; "data entity" is a bit of a broad term. Looking at the attributes of this class, I would say it could be called something like OtxDatasetItem, OtxImage, etc.
Please see my other comments.
@dataclass
class ImageInfo:
Does this only store the shape info, or are there plans to add other metadata as well? Depending on that, it might be renamed to one of these alternatives for clarity: ImageMetadata, ImageShapeInfo, ImageShapeMetadata?
Is this needed for the development only? Will it be kept after the release?
We are right in the middle of development now. Absolutely, it's temporary.
Can you clarify the purpose of this default YAML file? It's not clear why it is needed.
Please see src/otx/config/train.yaml. It refers to this file.
defaults:
  - default
I'm not sure I like this approach. I think this is one of the main limitations of the mmx ecosystem: when I look at a mmx config file, I find it extremely annoying to navigate around to find the base config stuff. I understand that this is to avoid duplication; however, there should be a trade-off between duplication and abstraction.
This will make config navigation so difficult that when readers need to see a parameter, they will have to navigate around to find what is inside this default stuff.
In fact, even the mmx developers agreed with this and introduced a new feature to make it a bit easier.
If overriding by groups causes more loss than gain, then of course we will give it up. However, we don't know yet how much the gain will be, since this project is at a very early stage. That decision can be made later.
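As a tiny illustration of the trade-off being discussed, here is how a base config plus an override composes; OmegaConf is used purely as an example and is an assumption about the tooling, not something stated in this thread:

```python
from omegaconf import OmegaConf

# base "default" group vs. a per-model override living in another file
base = OmegaConf.create({"batch_size": 8, "num_workers": 2})
override = OmegaConf.create({"batch_size": 16})

merged = OmegaConf.merge(base, override)
print(merged)  # {'batch_size': 16, 'num_workers': 2} -- the reader must check both files
```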
Are these files generated automatically or manually?
If the latter, is a developer supposed to update these files when there is a change in these configuration parameters?
Yes, if we want to keep our own default arguments for this, it is unavoidable, you know. I guess you might say something like "No, jsonargparse can do ...", but I don't think so. For example, if lightning.pytorch.callbacks.EarlyStopping has default arguments in its constructor, it can do the same with just

early_stopping:
  _target_: lightning.pytorch.callbacks.EarlyStopping

It will instantiate lightning.pytorch.callbacks.EarlyStopping() with its default arguments.
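Assuming the configs are composed with Hydra/OmegaConf (an assumption on my part, not something stated here), a `_target_` node like the one above can be materialized roughly like this:

```python
from hydra.utils import instantiate
from omegaconf import OmegaConf

cfg = OmegaConf.create(
    {
        "early_stopping": {
            "_target_": "lightning.pytorch.callbacks.EarlyStopping",
            "monitor": "val/loss",  # EarlyStopping requires a monitor key
        },
    },
)

# Builds EarlyStopping(monitor="val/loss") with all other arguments left at their defaults.
callback = instantiate(cfg.early_stopping)
```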
This is a future reference for everyone: please limit the use of abbreviations within the code base. Abbreviations might be clear to the writer, but not to the reader, and that hurts readability.
train:
  batch_size: 8
  num_workers: 2
  transform_lib_type: MMDET
What does transform_lib_type mean? You mean these transforms are coming from the mmdetection library?
> You mean these transforms are coming from the mmdetection library?

Yes.
What is the motivation for having debug configs?
It enables us to turn debug-related configurations on easily.
Can you clarify why this file is needed, and why it is empty?
Thanks, we can remove it.
Are we going to have more parameters here in the future? If not, I'm not sure it is a good idea to store a separate config file for two lines.
Saying it again: we are at a very early stage, and we currently cannot estimate how much this file will grow in the future. I'll absolutely restructure this later if needed. As you mentioned, refactoring should be done continuously.
@@ -0,0 +1,14 @@
defaults:
  - default
Instead of referring to the default config that has only two lines, can't we just add those parameters here?
Please see my other comments.
@@ -0,0 +1,5 @@
defaults:
  - default
Again, as a user/developer, I usually prefer to see the parameters explicitly here. With this approach, I'll need to refer to the default config file, which causes coupling.
Please see my other comments.
from dataclasses import dataclass
from typing import Any

from otx.core.types.transformer_libs import TransformLibType
I'm not sure TransformLibType is descriptive enough. Would it be possible to have a more descriptive name that reveals what it actually is?
Please see my other comments.
I'm still trying to understand the real value of this DTO. With this approach, we need to create one YAML file and one Python file for each of these, no?
Can you elaborate on the advantages of this approach?
Developers can look at how the configuration variables are propagated in the code in a structured way.
if TYPE_CHECKING:
    from datumaro import DatasetSubset

Transforms = Union[Callable, List[Callable]]
This is following the old annotation format. Can you use the new annotation style suggested by the PEP?
Sorry, it's my fault. @harimkang is preparing a fix for this.
    self,
    outputs: Any,  # noqa: ANN401
    inputs: T_OTXBatchDataEntity,
) -> Union[T_OTXBatchPredEntity, OTXBatchLossEntity]:
What would be the difference in OTXBatchLossEntity when we have a single image versus a batch? I mean, do we need a batch loss entity? Wouldn't having an OtxLoss do the job?
I wanted to keep the naming consistent between T_OTXBatchPredEntity and OTXBatchLossEntity.
def forward(
    self,
    inputs: T_OTXBatchDataEntity,
) -> Union[T_OTXBatchPredEntity, OTXBatchLossEntity]:
So if the above suggestions worked, this would be simplified to something like the following?

def forward(self, inputs: OtxBatch) -> OtxBatchPrediction | OtxBatchLoss
Or, if there is not much difference between OtxBatchLoss and OtxLoss, it could also be just OtxLoss.
Please see my above comments.
    compatible for OTX pipelines.
    """

    def __init__(self, config: DictConfig) -> None:
Again, as I mentioned above, the config dependency will really hurt API use.
In this case it is inevitable in order to build mmx models. It's just the way mmx works.
        return model

    def _customize_inputs(self, entity: DetBatchDataEntity) -> dict[str, Any]:
Since the name of the method is _customize_inputs, would it be an idea to rename entity to inputs?
Please see my above comments.
The previous OTXModel is also a module, right? It's inheriting nn.Module. Can you clarify why it is in a different module? It seems to me that the only difference is that the former is a TorchModel, while the latter is a LightningModel, no?
Model architectures can be different, but they share similar loss and metric computations task-wise. That's why I decided to inject OTXModel into OTXLitModule.