Add ability to gather and analyse some model metadata #376

zjgarvey · 2024-10-23T19:33:39Z

Usage:

Running python run.py <other-args> --get-metadata saves a dictionary containing the model size and op frequencies to the log directory.
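As a rough illustration of what such a metadata dictionary could look like, here is a minimal sketch. It is not the code from this PR: the helper names collect_metadata and save_metadata and the file name metadata.json are hypothetical, and the op names are passed in as plain strings so the example runs without any model files.

```python
import json
from collections import Counter
from pathlib import Path


def collect_metadata(op_types, model_size):
    """Build a metadata dictionary from an iterable of op names and a model size.

    Hypothetical helper: the key names mirror the sample output in this PR,
    but the real implementation may differ.
    """
    # Sort the op names so the saved dictionary is easy to read and compare.
    op_frequency = dict(sorted(Counter(op_types).items()))
    return {"model_size": model_size, "op_frequency": op_frequency}


def save_metadata(metadata, log_dir, filename="metadata.json"):
    """Write the metadata as JSON into log_dir (the file name is an assumption)."""
    path = Path(log_dir) / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metadata, indent=4))
    return path


# With the onnx package installed, the op names could instead come from
# (node.op_type for node in onnx.load(model_path).graph.node).
example = collect_metadata(["Add", "MatMul", "Add", "Softmax"], model_size=1024)
print(json.dumps(example, indent=4))
```

Counting op types with a Counter keeps the sketch independent of how the model is loaded; only the iterable of names changes.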

After a run, use python utils/find_duplicate_models.py to save or print a JSON dump of redundant models, i.e. groups of models whose gathered metadata is identical.

Options:

"-s", "--simplified": return only the lists of model names, without the corresponding metadata.
"-o", "--output": name of a JSON file to save the result to.
"-r", "--rundirectory": run directory to search, for when run.py was run with a non-default run directory argument.

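The three options above could be declared with argparse roughly as follows. This is a sketch under assumptions, not the script's actual parser: the help strings, the build_parser name, and the "test-run" default for the run directory are guesses made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags listed in this PR description."""
    parser = argparse.ArgumentParser(
        description="Find groups of models with identical metadata."
    )
    parser.add_argument(
        "-s", "--simplified", action="store_true", default=False,
        help="only list model names, without the shared metadata",
    )
    parser.add_argument(
        "-o", "--output", default=None,
        help="JSON file to save the result to (assumed: print when omitted)",
    )
    parser.add_argument(
        "-r", "--rundirectory", default="test-run",  # default value is an assumption
        help="run directory to search for saved metadata",
    )
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["-s", "-o", "duplicates.json"])
print(args.simplified, args.output, args.rundirectory)
```
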
Sample:

I saved the tests below to a file called sample.txt.

add_test
model--bart-base-booksum--KamilAin
model--bart-base-cnn--ainize
model--bart-base-few-shot-k-1024-finetuned-squad-seed-2--anas-awadalla
model--bart-base-few-shot-k-1024-finetuned-squad-seed-4--anas-awadalla

With a clean test-run directory, I ran

python run.py --testsfile=sample.txt --stages "setup" --get-metadata

The result of running

python utils/find_duplicate_models.py -s

was:

[
    [
        "model--bart-base-booksum--KamilAin",
        "model--bart-base-cnn--ainize"
    ],
    [
        "model--bart-base-few-shot-k-1024-finetuned-squad-seed-4--anas-awadalla",
        "model--bart-base-few-shot-k-1024-finetuned-squad-seed-2--anas-awadalla"
    ]
]

Without the -s argument, the output also includes the shared metadata for each group:

[
    {
        "models": [
            "model--bart-base-booksum--KamilAin",
            "model--bart-base-cnn--ainize"
        ],
        "shared_metadata": {
            "model_size": 712772272,
            "op_frequency": {
                "Add": 227,
                "Cast": 13,
                "Concat": 188,
                "Constant": 886,
                "ConstantOfShape": 6,
                "Div": 44,
                "Equal": 5,
                "Erf": 12,
                "Expand": 5,
                "Gather": 64,
                "Less": 1,
                "MatMul": 133,
                "Mul": 99,
                "Pow": 32,
                "Range": 3,
                "ReduceMean": 64,
                "Reshape": 187,
                "Shape": 67,
                "Slice": 2,
                "Softmax": 18,
                "Sqrt": 32,
                "Squeeze": 2,
                "Sub": 35,
                "Transpose": 90,
                "Unsqueeze": 325,
                "Where": 8
            }
        }
    },
    {
        "models": [
            "model--bart-base-few-shot-k-1024-finetuned-squad-seed-4--anas-awadalla",
            "model--bart-base-few-shot-k-1024-finetuned-squad-seed-2--anas-awadalla"
        ],
        "shared_metadata": {
            "model_size": 558176646,
            "op_frequency": {
                "Add": 229,
                "Cast": 17,
                "Concat": 193,
                "Constant": 937,
                "ConstantOfShape": 12,
                "Div": 44,
                "Equal": 10,
                "Erf": 12,
                "Expand": 11,
                "Gather": 70,
                "Less": 1,
                "MatMul": 133,
                "Mul": 103,
                "Pow": 32,
                "Range": 6,
                "ReduceMean": 64,
                "Reshape": 191,
                "ScatterND": 2,
                "Shape": 83,
                "Slice": 7,
                "Softmax": 18,
                "Split": 1,
                "Sqrt": 32,
                "Squeeze": 4,
                "Sub": 35,
                "Transpose": 90,
                "Unsqueeze": 333,
                "Where": 13
            }
        }
    }
]
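One way to produce groupings shaped like the two outputs above is to key each model on a canonical serialization of its metadata. The sketch below assumes the metadata has already been loaded into a dictionary keyed by model name; find_duplicates and the model_a/model_b/model_c entries are hypothetical and only meant to show the idea, not the logic actually used in find_duplicate_models.py.

```python
import json
from collections import defaultdict


def find_duplicates(metadata_by_model, simplified=False):
    """Group model names whose metadata dictionaries are identical."""
    groups = defaultdict(list)
    for name, metadata in metadata_by_model.items():
        # sort_keys makes the key independent of dictionary insertion order.
        key = json.dumps(metadata, sort_keys=True)
        groups[key].append(name)

    results = []
    for key, names in groups.items():
        if len(names) < 2:
            continue  # a model with unique metadata is not redundant
        if simplified:
            results.append(names)
        else:
            results.append({"models": names, "shared_metadata": json.loads(key)})
    return results


# Made-up metadata: model_a and model_b match, model_c differs in size.
sample = {
    "model_a": {"model_size": 100, "op_frequency": {"Add": 2, "MatMul": 1}},
    "model_b": {"model_size": 100, "op_frequency": {"MatMul": 1, "Add": 2}},
    "model_c": {"model_size": 250, "op_frequency": {"Add": 2, "MatMul": 1}},
}
print(json.dumps(find_duplicates(sample, simplified=True), indent=4))
```

Dropping groups of size one is consistent with the sample run, where add_test was in sample.txt but does not appear in either output.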

alt_e2eshark/utils/find_duplicate_models.py

saienduri · 2024-10-23T20:01:01Z

Thanks

zjgarvey added 2 commits October 23, 2024 14:17

Add ability to gather and analyse some model metadata

0cfce49

add simplified flag

761ec27

saienduri self-requested a review October 23, 2024 19:49

saienduri reviewed Oct 23, 2024

alt_e2eshark/utils/find_duplicate_models.py

keep .json files when using cleanup level 3

77aabfe

saienduri self-requested a review October 23, 2024 20:00

saienduri approved these changes Oct 23, 2024

saienduri merged commit dc15d53 into nod-ai:main Oct 23, 2024

saienduri mentioned this pull request Oct 28, 2024

removing model which are not valid and duplicates #379

Closed
