| title | metaTitle | metaDescription | index |
| --- | --- | --- | --- |
| Creating recipes with SparseML Recipe Template | Creating Sparsification Recipes with SparseML | Creating Sparsification Recipes with SparseML | 1000 |
This page explains how to create recipes using the SparseML Recipe Template local API.
SparseML Recipe Template is a simple API for generating sparsification recipes on the fly. In the most basic use case, it takes in information about the desired pruning and quantization techniques and produces a relevant recipe. Optionally, a path to a `torch.jit`-loadable model can also be specified, in which case the produced recipe is specific to the architecture of that model. The sparsification recipes are produced in `markdown` or `yaml` format and can be used directly across all SparseML utilities. Learn more about sparsification recipes here.
SparseML Recipe Template is built for users who may be inclined to create their recipes by hand, to enable more fine-grained control or to add custom modifiers when sparsifying a new model from scratch; it lets them generate a starting recipe with a single API command.
SparseML Recipe Template can optionally take in the path to a `torch.jit`-loadable model for which a user wishes to generate a sparsification recipe; if no such model is provided, a generic recipe based on good defaults and the designated parameters is produced. If you are using the Python API, you may also supply an already-instantiated PyTorch `Module`.
This utility also provides different options for configuring the generated recipe according to the required sparsification techniques. Users are encouraged to adjust the recipes to their specific needs.
To learn more about usage, invoke the Recipe Template CLI with the `--help` option:
```bash
sparseml.recipe_template --help
```
The specified arguments configure different aspects of the recipe: whether it should include pruning and, if so, which algorithm to use; whether it should include quantization and, if so, which algorithm; and the learning rate schedule type, among others.
- `--pruning true|false (default)|ALGORITHM`
  - ALGORITHM: global magnitude pruning, gradual magnitude pruning (sparsity per layer), ACDC, Movement, MFAC, OBC
- `--quantization true|false (default)|TARGET_HARDWARE` (defaults to the QAT algorithm; target hardware examples: `vnni`, `tensorrt`)
- `--lr TYPE`
  - TYPE: constant, stepped, cyclic, exponential, linear
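For example, the following invocation (flag values drawn from the options above; run `--help` for the full, authoritative set) requests pruning, quantization targeted at VNNI-capable hardware, and a cyclic learning rate schedule:

```bash
sparseml.recipe_template --pruning true --quantization vnni --lr cyclic
```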
Once `sparseml.recipe_template` is run, a base recipe template will be created with the necessary top-level variables based on the compression methods specified (i.e., pruning, quantization). These templates will either be Python strings with `{variable}` placeholders left in for on-the-fly formatting, or functions that take in the desired variables and return the constructed template.
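For illustration, a simplified template of the string-with-placeholders kind described above could look like the sketch below. The template string itself is made up for this example (it is not SparseML's internal template), though the modifier shown, `!GMPruningModifier`, and its fields follow standard SparseML recipe syntax.

```python
# Illustrative only: a simplified string template with {variable} placeholders,
# filled in on the fly -- not SparseML's actual internal template.
PRUNING_TEMPLATE = """
pruning_modifiers:
  - !GMPruningModifier
    params: __ALL_PRUNABLE__
    init_sparsity: {init_sparsity}
    final_sparsity: {final_sparsity}
    start_epoch: {start_epoch}
    end_epoch: {end_epoch}
"""

# Build a recipe fragment by substituting the desired variable values.
fragment = PRUNING_TEMPLATE.format(
    init_sparsity=0.05,
    final_sparsity=0.85,
    start_epoch=0.0,
    end_epoch=30.0,
)
print(fragment)
```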
Notes on template construction:
- For QAT only, a constant pruning modifier with `__ALL_PRUNABLE__` will be added.
- If a PyTorch model is supplied, pruning params will be selected using a helper function. Additional filters that could be applied, among others, include (see the sketch after this list):
  - Ignoring depthwise layers in MobileNet-like models (i.e., convs with `groups == num_channels`)
  - Ignoring the first and last layers for pruning
  - [stretch] Setting a minimum number of parameters a layer should have to be considered prunable in the recipe
- Variables and values in the template recipe should then be updated according to user input and the standards described above.
- Different pruning modifiers may have different constructor arguments; this needs to be accounted for when swapping the algorithm.
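As a rough sketch of how such filters might look (hypothetical helper names; this is not SparseML's actual selection code):

```python
import torch.nn as nn

def is_prunable(module: nn.Module, min_params: int = 1000) -> bool:
    """Hypothetical filter mirroring the notes above -- not SparseML's helper."""
    if not isinstance(module, (nn.Conv2d, nn.Linear)):
        return False
    # Ignore depthwise convolutions in MobileNet-like models (groups == num_channels).
    if isinstance(module, nn.Conv2d) and module.groups > 1 and module.groups == module.in_channels:
        return False
    # [stretch] Require a minimum parameter count for a layer to be worth pruning.
    if sum(p.numel() for p in module.parameters()) < min_params:
        return False
    return True

def select_pruning_params(model: nn.Module) -> list:
    """Collect prunable layer names, ignoring the first and last prunable layers."""
    names = [name for name, module in model.named_modules() if is_prunable(module)]
    return names[1:-1]
```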
The output will be a `recipe.md` file, with information added based on the intended usage.
The YAML frontmatter of the markdown recipe will be the generated recipe template. In the markdown section of the recipe, the user will be provided with instructions on how to plug this recipe into a Manager and run training-aware sparsification (and one-shot for testing) using SparseML.
Additionally, a printout of the settable recipe variables and instructions on how to override them in the Manager API will be provided. [STRETCH] Parse documentation for each variable from comments in the recipe template, or [MINIMAL] direct the user to read the YAML section above.
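A minimal sketch of plugging the generated recipe into a Manager for training-aware sparsification is shown below. The `recipe_variables` keyword and the variable names passed to it are assumptions for illustration; use the variable names actually printed with your recipe.

```python
import torch
from sparseml.pytorch.optim import ScheduledModifierManager

model = torch.nn.Sequential(
    torch.nn.Linear(128, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10)
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

manager = ScheduledModifierManager.from_yaml(
    "recipe.md",                          # the generated recipe
    recipe_variables={"init_lr": 0.01},   # assumed kwarg for overriding recipe variables
)
optimizer = manager.modify(model, optimizer, steps_per_epoch=100)

# ... run the training loop with the wrapped optimizer ...

manager.finalize(model)
```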
- Sparse transfer mode (use a constant pruning modifier instead of a pruning modifier)
- Convenience functions to replace manager `from_yaml` (i.e., `manager.sparsification(...)`)
- For LR modifiers:
  - When pruning, cycle through `init_lr` and `final_lr`
  - When finetuning, cycle through `init_lr` and `final_lr`
  - When quantizing, hold constant at `final_lr`
When using the Python API, you can also fine-tune specific parameters at a more detailed level, such as the Epoch, LR, Pruning, QAT, and Modifier settings, in addition to the base arguments, to configure the generated recipe based on your needs.
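Assuming the Python entry point mirrors the CLI arguments (the import path and keyword names below are assumptions; check the API reference of your installed SparseML version), a sketch might look like:

```python
# Sketch only: import path and keyword names mirror the CLI flags above and
# are assumptions -- verify against your installed SparseML version.
import torchvision
from sparseml.pytorch import recipe_template  # assumed entry point

model = torchvision.models.resnet50(pretrained=True)

recipe = recipe_template(
    pruning="true",      # or a specific pruning algorithm, per the CLI options
    quantization=True,   # include QAT in the generated recipe
    lr="cyclic",         # learning rate schedule type
    model=model,         # optional: tailor the recipe to this architecture
)
print(recipe)
```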