This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

fix(speedup): refactor the execution logic of functions in speedup #5107

Closed
wants to merge 9 commits into from

Conversation

@Louis-J Louis-J (Contributor) commented Sep 2, 2022

Description

In speedup, we use a non-general trick to handle the prim ops: we analyze the relationships between the prim ops, delete all of them, and finally connect the non-prim ops directly. We also assume that every intermediate variable is a tensor, a tuple of tensors, or a list of tensors. These assumptions cause many problems in various models.

So it is better to actually execute all the prim ops. In this PR, I remove the deletion of prim ops and execute them recursively. The assumption about variable types is also removed, so the behavior is now closer to the original execution.

However, this changes a lot of code and is untested for now. More work is needed.
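The "execute the prim ops for real, recursively" idea can be sketched with a toy graph. This is purely illustrative under assumed names: the `Node` class, `execute` function, and op handling below are hypothetical stand-ins, not the PR's actual code.

```python
# Illustrative sketch only (hypothetical names, not NNI code): instead of
# deleting prim ops and rewiring around them, evaluate each node's prim-op
# inputs recursively, so intermediate values keep arbitrary Python types.

class Node:
    def __init__(self, op, inputs=(), value=None):
        self.op = op              # a prim-op kind string, or a real callable
        self.inputs = list(inputs)
        self.value = value        # cached result once executed

def execute(node):
    """Recursively execute a node, evaluating its inputs first."""
    if node.value is not None:
        return node.value
    args = [execute(i) for i in node.inputs]
    if node.op == 'prim::TupleConstruct':
        node.value = tuple(args)          # prim op executed for real
    elif node.op == 'prim::ListConstruct':
        node.value = list(args)
    elif callable(node.op):               # a real module/function call
        node.value = node.op(*args)
    return node.value

# usage: a prim::TupleConstruct feeding a real op
leaf = Node('const', value=3)
tup = Node('prim::TupleConstruct', inputs=[leaf, Node('const', value=4)])
out = Node(lambda t: sum(t), inputs=[tup])
```

Because the tuple is actually constructed rather than elided, the downstream op receives the same Python type it would see in the original execution.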

#5097

Test Options

  • fast test
  • full test - HPO
  • full test - NAS
  • full test - compression

Checklist

  • test case
  • doc

How to test

@@ -105,4 +105,5 @@ In addition, for the convolutional layers that have more than one filter group,
``dependency-aware pruner`` will also try to prune the same number of channels for each filter group.
Overall, this pruner prunes the model according to the L1 norm of each filter and tries to meet the topological constraints (channel dependency, etc.) to improve the final speed gain after the speedup process.

Operations that will be recognized as having channel dependencies: add/sub/mul/div, addcmul/addcdiv, logical_and/or/xor
Contributor:

.?
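As background for the doc line above, the channel dependency created by elementwise ops such as add/sub/mul/div can be illustrated with a toy shape check. This is an illustration only, not NNI's dependency-analysis code; the function name is hypothetical.

```python
# Illustrative sketch (not NNI code): elementwise ops require operands with
# matching channel counts, so the conv layers producing those operands must
# keep the same pruned channels -- they form one dependency set.

def elementwise_shapes_compatible(shape_a, shape_b):
    """Shapes are (N, C, H, W); add/sub/mul/div etc. need matching C."""
    return shape_a[1] == shape_b[1]

# Two 64-channel feature maps can be added; if one conv is pruned to 30
# channels and the other to 32, the add breaks -- hence the dependency.
ok = elementwise_shapes_compatible((1, 64, 8, 8), (1, 64, 8, 8))
broken = elementwise_shapes_compatible((1, 30, 8, 8), (1, 32, 8, 8))
```

This is why the pruner must prune such dependent layers jointly rather than independently.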

@@ -725,8 +717,6 @@ def _build_graph(self):

# associate module name with their trace graph nodes
for node in graph.nodes():
if node.kind() == CONSTANT_KIND:
continue
Contributor:

When should we skip `CONSTANT_KIND` and when should we not? I found that you did not remove all of the `if node.kind() == CONSTANT_KIND:` checks in the code.
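One way to avoid scattering (and missing) the constant-kind check across traversal sites would be a single filtering helper. This is only a sketch of that suggestion with duck-typed fake nodes; `iter_real_nodes` is a hypothetical name, not part of the PR.

```python
# Hedged sketch: centralize the CONSTANT_KIND skip in one generator so every
# traversal site filters constants the same way (names are hypothetical).

CONSTANT_KIND = 'prim::Constant'

def iter_real_nodes(nodes):
    """Yield only non-constant nodes from a graph traversal."""
    for node in nodes:
        if node.kind() == CONSTANT_KIND:
            continue
        yield node

# usage with duck-typed stand-ins for torch._C.Node
class FakeNode:
    def __init__(self, kind):
        self._kind = kind
    def kind(self):
        return self._kind

nodes = [FakeNode('prim::Constant'), FakeNode('aten::add'), FakeNode('prim::Constant')]
kept = [n.kind() for n in iter_real_nodes(nodes)]
```

With such a helper, the decision of when constants are skipped lives in one place instead of being repeated at every loop.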

if node.type == 'module':
inputs_name = node.inputs
else:
inputs_name = [val_node.debugName() for val_node in node.key_node.inputs()]
Contributor:

What is `key_node`? Maybe we need a doc explaining the important attributes of a node.

elif isinstance(obj, dict):
return {k: recr_detacher(v) for k, v in obj.items()}
else:
return obj
Contributor:

Can `obj` be a customized data type that has a `detach` function? For example:

try:
    return obj.detach()
except AttributeError:
    return obj
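Combining the diff's container handling with the duck-typed `detach` suggested above gives a complete recursive detacher. This is a sketch of the combined idea, not the PR's final code; the behavior for custom detachable types is an assumption based on the review suggestion.

```python
# Hedged sketch: recursive detacher covering lists, tuples, dicts, tensors,
# and any custom object exposing detach() (per the review suggestion).

def recr_detacher(obj):
    if isinstance(obj, (list, tuple)):
        # preserve the container type while detaching each element
        return type(obj)(recr_detacher(v) for v in obj)
    if isinstance(obj, dict):
        return {k: recr_detacher(v) for k, v in obj.items()}
    try:
        return obj.detach()   # tensors and custom detachable types
    except AttributeError:
        return obj            # plain Python values pass through unchanged

# usage with a duck-typed stand-in for a detachable object
class FakeTensor:
    def detach(self):
        return 'detached'

result = recr_detacher({'a': [1, FakeTensor()], 'b': (2,)})
```

Note the try/except makes the check duck-typed, so any object with a `detach` method is handled without an explicit `isinstance(obj, torch.Tensor)` test.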

@@ -297,16 +318,15 @@ def update_indirect_sparsity(self, node):
debug_name = auto_infer.input_debugname[in_id]

last_output = self.internal_result[debug_name]
# if isinstance(last_output, torch.Tensor):
# TODO what if last output is tuple/list of tensor
Contributor:

So what if the last output is a tuple/list of tensors?
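One possible answer to the TODO is to recurse over nested containers so that each tensor leaf receives the same per-tensor treatment. This is an illustrative sketch under assumed names (`apply_to_tensors`, the duck-typed tensor check), not the PR's implementation.

```python
# Hedged sketch: apply a per-tensor update to every tensor-like leaf inside
# arbitrarily nested lists/tuples, leaving other values untouched.

def apply_to_tensors(obj, update_one):
    if isinstance(obj, (list, tuple)):
        return type(obj)(apply_to_tensors(v, update_one) for v in obj)
    if hasattr(obj, 'detach'):        # duck-typed "is a tensor" stand-in
        return update_one(obj)
    return obj

# usage with duck-typed fake tensors
class FakeTensor:
    def __init__(self, v):
        self.v = v
    def detach(self):
        return self

seen = []
apply_to_tensors(
    [FakeTensor(1), (FakeTensor(2), 'meta')],
    lambda t: seen.append(t.v) or t,   # record each tensor leaf visited
)
```

With this shape, the single-tensor case and the tuple/list case share one code path instead of needing the commented-out `isinstance` special case.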

@Louis-J Louis-J added v3.0 and removed v2.9.1 labels Oct 19, 2022
@liuzhe-lz liuzhe-lz added v3.1 and removed v3.0 labels Nov 23, 2022
@Louis-J Louis-J closed this Dec 6, 2022
@Louis-J Louis-J (Contributor, Author) commented Dec 6, 2022

I totally refactored the old speedup in https://github.com/Louis-J/nni/tree/refactor_speedup.

@Louis-J Louis-J (Contributor, Author) commented Dec 6, 2022

Speedup v2 is now runnable, so it is better to reassess whether refactoring the old speedup is still suitable.

@Lijiaoa Lijiaoa (Contributor) commented Feb 16, 2023

#5143
