
[DOC]: Add information on how to generate new sample data for the ABP nvsmi example #1097

Closed
2 tasks done
mdemoret-nv opened this issue Jul 24, 2023 · 2 comments · Fixed by #1108
Assignees
Labels
doc Improvements or additions to documentation

Comments

@mdemoret-nv
Contributor

How would you describe the priority of this documentation request

Medium

Describe the future/missing documentation

Without running NetQ, there is no way to generate new sample data for the ABP nvsmi detection example. However, something like the following should allow users to generate new data from a local GPU.

import time

import pandas as pd
from pynvml.smi import NVSMI_QUERY_GPU
from pynvml.smi import nvidia_smi

# Output name
output_file = "nvsmi.json"

# Interval
interval_ms = 1000

query_opts = NVSMI_QUERY_GPU.copy()

# Remove the timestamp and supported clocks from the query
del query_opts["timestamp"]
del query_opts["supported-clocks"]

nvsmi = nvidia_smi.getInstance()

with open(output_file, "w", encoding="UTF-8") as f:

    while True:

        dq = nvsmi.DeviceQuery(list(query_opts.values()))

        output_dicts = []

        # Flatten the GPUs to allow for a new row per GPU
        for gpu in dq["gpu"]:
            single_gpu = dq.copy()

            # overwrite the gpu list with a single gpu
            single_gpu["gpu"] = gpu

            output_dicts.append(single_gpu)

        # Flatten the nested dicts into dotted column names
        # (record_prefix is omitted: it only applies when record_path is
        # given; the prefix is added by the rename below instead)
        df = pd.json_normalize(output_dicts)

        # Rename the id column to match the XML converted output from NetQ
        df.rename(columns={"gpu.id": "gpu.@id", "count": "attached_gpus"}, inplace=True)

        # Prefix every column to match the XML-converted output from NetQ
        df.rename(columns=lambda x: "nvidia_smi_log." + x, inplace=True)

        # Add the current timestamp
        df.insert(0, "timestamp", time.time())

        # Append one JSON record per row (JSON lines format)
        df.to_json(f, orient="records", lines=True)

        f.flush()

        time.sleep(interval_ms / 1000.0)
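
To sanity-check the flattening logic without a GPU or pynvml installed, the per-GPU copy-and-normalize step above can be exercised against a hand-built DeviceQuery-shaped dict. This is only a sketch: the keys and values below are invented for illustration, not real DeviceQuery output.

```python
import pandas as pd

# Hypothetical two-GPU DeviceQuery result; the structure mirrors what
# nvsmi.DeviceQuery returns, but all values here are made up.
dq = {
    "count": 2,
    "driver_version": "535.54.03",
    "gpu": [
        {"id": "00000000:01:00.0", "utilization": {"gpu_util": 12}},
        {"id": "00000000:02:00.0", "utilization": {"gpu_util": 34}},
    ],
}

# Same flattening as the script: one output row per GPU
rows = []
for gpu in dq["gpu"]:
    row = dq.copy()   # shallow copy of the top-level dict
    row["gpu"] = gpu  # replace the GPU list with this single GPU
    rows.append(row)

df = pd.json_normalize(rows)
df.rename(columns={"gpu.id": "gpu.@id", "count": "attached_gpus"}, inplace=True)
df.rename(columns=lambda c: "nvidia_smi_log." + c, inplace=True)

print(sorted(df.columns))
```

Each row ends up with dotted, nvidia_smi_log-prefixed column names such as nvidia_smi_log.gpu.@id, matching the XML-converted NetQ layout the example expects.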

This should be included in the documentation to make it clearer for the user how to run their own data through the example.

Where have you looked?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
  • I have searched the open documentation issues and have found no duplicates for this bug report
@mdemoret-nv mdemoret-nv added the doc Improvements or additions to documentation label Jul 24, 2023
@ABHIPATEL98

@mdemoret-nv Thank you for your response. We ran the script you provided and it works, but most of the values come back as N/A. When we feed the generated data to the "abp_nvsmi_detection" model, the pipeline stalls at "Inference Rate[Complete]: 0 inf [00:00, ? inf/s]" and reports that the "nvidia_smi_log.gpu.pci.tx_util" column is missing. Since NetQ is not open source, we are unable to generate new sample data for the ABP nvsmi detection example. Is there an alternative way to do this? Thank you in advance for your help.
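
For anyone debugging the same N/A problem: pynvml reports fields the local GPU does not support as the literal string "N/A", so the columns that carry no signal can be listed with a quick pandas check. The frame below is toy data standing in for the generated file, which in practice you would load with pd.read_json(output_file, lines=True).

```python
import pandas as pd

# Toy stand-in for the generated nvsmi data; column names are
# illustrative and the values are invented.
df = pd.DataFrame({
    "nvidia_smi_log.gpu.pci.tx_util": ["N/A", "N/A"],
    "nvidia_smi_log.gpu.utilization.gpu_util": [12, 34],
})

# Columns that are entirely "N/A" are unsupported on this GPU
na_cols = [c for c in df.columns if (df[c] == "N/A").all()]
print(na_cols)
```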

@efajardo-nv
Contributor

@ABHIPATEL98 could you try using this notebook to retrain the model with the intersection of columns between /datasets/training-data/abp-sample-nvsmi-training-data.json and your generated dataset? If the accuracy is acceptable, you can deploy the new model to Triton and run the inference pipeline against that.
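
One way to compute that intersection, keeping the training data's column order so the feature layout stays consistent. The column lists below are illustrative; in practice you would read both JSON-lines files with pd.read_json(path, lines=True) and compare their df.columns.

```python
# Hypothetical column sets standing in for the two datasets
train_cols = [
    "nvidia_smi_log.gpu.pci.tx_util",
    "nvidia_smi_log.gpu.fb_memory_usage.used",
    "nvidia_smi_log.gpu.utilization.gpu_util",
]
generated_cols = [
    "nvidia_smi_log.gpu.fb_memory_usage.used",
    "nvidia_smi_log.gpu.utilization.gpu_util",
]

# Preserve training-data order so the model's feature layout is stable
common = [c for c in train_cols if c in set(generated_cols)]
print(len(common), common)
```

The resulting list is what goes into the columns file, and its length is the value to pass as model_fea_length.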

Before running the pipeline, you'll need to update the columns file, as well as model_fea_length in your pipeline run command. For example, if your new column count is 18 (default is 29), you would update your command to use this:

pipeline-fil --model_fea_length=18 --columns_file=${MORPHEUS_ROOT}/morpheus/data/columns_fil.txt \

@efajardo-nv efajardo-nv self-assigned this Jul 26, 2023
rapids-bot bot pushed a commit that referenced this issue Aug 31, 2023
- Add script to ABP nvsmi example for generating sample data
- Data generated using script does not contain all the columns used to train the current nvsmi model. Retrain the model using the 18 overlapping columns.
- Update model, model config, training notebook/script, feature columns file
- Update README with instructions on how to run script

Closes #1097

Authors:
  - Eli Fajardo (https://github.com/efajardo-nv)

Approvers:
  - Michael Demoret (https://github.com/mdemoret-nv)
  - https://github.com/gbatmaz

URL: #1108
@github-project-automation github-project-automation bot moved this from Todo to Done in Morpheus Boards Aug 31, 2023