
[DOC]: Add information on how to generate new sample data for the ABP nvsmi example #1097

Closed
2 tasks done
mdemoret-nv opened this issue Jul 24, 2023 · 2 comments · Fixed by #1108
Assignees
Labels
doc Improvements or additions to documentation

Comments

@mdemoret-nv
Contributor

How would you describe the priority of this documentation request

Medium

Describe the future/missing documentation

Without running NetQ, there is no way to generate new sample data for the ABP nvsmi detection example. However, something like the following should allow users to generate new data from a local GPU.

import time

import pandas as pd
from pynvml.smi import NVSMI_QUERY_GPU
from pynvml.smi import nvidia_smi

# Output name
output_file = "nvsmi.json"

# Interval
interval_ms = 1000

query_opts = NVSMI_QUERY_GPU.copy()

# Remove the timestamp and supported clocks from the query
del query_opts["timestamp"]
del query_opts["supported-clocks"]

nvsmi = nvidia_smi.getInstance()

with open(output_file, "w", encoding="UTF-8") as f:

    while True:

        dq = nvsmi.DeviceQuery(list(query_opts.values()))

        output_dicts = []

        # Flatten the GPUs to allow for a new row per GPU
        for gpu in dq["gpu"]:
            single_gpu = dq.copy()

            # overwrite the gpu list with a single gpu
            single_gpu["gpu"] = gpu

            output_dicts.append(single_gpu)

        # Flatten the nested dicts into dotted column names
        # (record_prefix is omitted: it only applies when record_path is
        # given; the prefix is added by the rename below instead)
        df = pd.json_normalize(output_dicts)

        # Rename the id column to match the XML converted output from NetQ
        df.rename(columns={"gpu.id": "gpu.@id", "count": "attached_gpus"}, inplace=True)

        # Prefix every column to match the XML-converted output from NetQ
        df.rename(columns=lambda x: "nvidia_smi_log." + x, inplace=True)

        # Add the current timestamp
        df.insert(0, "timestamp", time.time())

        # Append one JSON record per row (JSON lines format)
        df.to_json(f, orient="records", lines=True)

        f.flush()

        time.sleep(interval_ms / 1000.0)
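
To sanity-check the flattening logic without a GPU or pynvml installed, the per-GPU copy-and-normalize step above can be exercised against a hand-built DeviceQuery-shaped dict. This is only a sketch: the keys and values below are invented for illustration, not real DeviceQuery output.

```python
import pandas as pd

# Hypothetical two-GPU DeviceQuery result; the structure mirrors what
# nvsmi.DeviceQuery returns, but all values here are made up.
dq = {
    "count": 2,
    "driver_version": "535.54.03",
    "gpu": [
        {"id": "00000000:01:00.0", "utilization": {"gpu_util": 12}},
        {"id": "00000000:02:00.0", "utilization": {"gpu_util": 34}},
    ],
}

# Same flattening as the script: one output row per GPU
rows = []
for gpu in dq["gpu"]:
    row = dq.copy()   # shallow copy of the top-level dict
    row["gpu"] = gpu  # replace the GPU list with this single GPU
    rows.append(row)

df = pd.json_normalize(rows)
df.rename(columns={"gpu.id": "gpu.@id", "count": "attached_gpus"}, inplace=True)
df.rename(columns=lambda c: "nvidia_smi_log." + c, inplace=True)

print(sorted(df.columns))
```

Each row ends up with dotted, nvidia_smi_log-prefixed column names such as nvidia_smi_log.gpu.@id, matching the XML-converted NetQ layout the example expects.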

This should be included in the documentation to make it clearer for the user how to run their own data through the example.

Where have you looked?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
  • I have searched the open documentation issues and have found no duplicates for this bug report
@mdemoret-nv mdemoret-nv added the doc Improvements or additions to documentation label Jul 24, 2023
@ABHIPATEL98

@mdemoret-nv Thank you for your response. We ran the script you provided and it works, but most of the values come back as N/A. When we feed the generated data to the "abp_nvsmi_detection" model, the pipeline stalls at "Inference Rate[Complete]: 0 inf [00:00, ? inf/s]" and reports that the "nvidia_smi_log.gpu.pci.tx_util" column is missing. Since NetQ is not open source, we are unable to generate new sample data for the ABP nvsmi detection example. Is there an alternative way to do this? Thank you in advance for your help.
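
For anyone debugging the same N/A problem: pynvml reports fields the local GPU does not support as the literal string "N/A", so the columns that carry no signal can be listed with a quick pandas check. The frame below is toy data standing in for the generated file, which in practice you would load with pd.read_json(output_file, lines=True).

```python
import pandas as pd

# Toy stand-in for the generated nvsmi data; column names are
# illustrative and the values are invented.
df = pd.DataFrame({
    "nvidia_smi_log.gpu.pci.tx_util": ["N/A", "N/A"],
    "nvidia_smi_log.gpu.utilization.gpu_util": [12, 34],
})

# Columns that are entirely "N/A" are unsupported on this GPU
na_cols = [c for c in df.columns if (df[c] == "N/A").all()]
print(na_cols)
```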

@efajardo-nv
Contributor

@ABHIPATEL98 could you try using this notebook to retrain the model with the intersection of columns between /datasets/training-data/abp-sample-nvsmi-training-data.json and your generated dataset? If the accuracy is acceptable, you can deploy the new model to Triton and run the inference pipeline against that.
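
One way to compute that intersection, keeping the training data's column order so the feature layout stays consistent. The column lists below are illustrative; in practice you would read both JSON-lines files with pd.read_json(path, lines=True) and compare their df.columns.

```python
# Hypothetical column sets standing in for the two datasets
train_cols = [
    "nvidia_smi_log.gpu.pci.tx_util",
    "nvidia_smi_log.gpu.fb_memory_usage.used",
    "nvidia_smi_log.gpu.utilization.gpu_util",
]
generated_cols = [
    "nvidia_smi_log.gpu.fb_memory_usage.used",
    "nvidia_smi_log.gpu.utilization.gpu_util",
]

# Preserve training-data order so the model's feature layout is stable
common = [c for c in train_cols if c in set(generated_cols)]
print(len(common), common)
```

The resulting list is what goes into the columns file, and its length is the value to pass as model_fea_length.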

Before running the pipeline, you'll need to update the columns file, as well as model_fea_length in your pipeline run command. For example, if your new column count is 18 (default is 29), you would update your command to use this:

pipeline-fil --model_fea_length=18 --columns_file=${MORPHEUS_ROOT}/morpheus/data/columns_fil.txt \

@efajardo-nv efajardo-nv self-assigned this Jul 26, 2023
rapids-bot bot pushed a commit that referenced this issue Aug 31, 2023
- Add script to ABP nvsmi example for generating sample data
- Data generated using script does not contain all the columns used to train the current nvsmi model. Retrain the model using the 18 overlapping columns.
- Update model, model config, training notebook/script, feature columns file
- Update README with instructions on how to run script

Closes #1097

Authors:
  - Eli Fajardo (https://github.com/efajardo-nv)

Approvers:
  - Michael Demoret (https://github.com/mdemoret-nv)
  - https://github.com/gbatmaz

URL: #1108
@github-project-automation github-project-automation bot moved this from Todo to Done in Morpheus Boards Aug 31, 2023