ops_tracking_toolkit

Operations Tracking Toolkit to monitor down nodes and cables in clusters

Requirements:

  • A user account set up on the ExtraView server, usually in the resolver role.
    • Socket connectivity to the ExtraView server API (usually at /evj/ExtraView/ev_api.action on the web server).
  • Python 3
  • pyextraview (currently required)
  • pip or easy_install
  • make
  • clush
    • clush must be fully configured to run commands on the cluster.
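
The requirements above can be checked before running the installer. The following is a minimal sketch and not part of the toolkit: the import name `pyextraview`, the installer command names, and the host and port passed to `can_connect` are assumptions that may need adjusting for your site.

```python
import importlib.util
import shutil
import socket
import sys

# Command names taken from the requirements list above.
REQUIRED_COMMANDS = ["make", "clush"]
# Assumed executable names for "pip or easy_install".
INSTALLERS = ["pip", "pip3", "easy_install"]


def missing_commands(commands, which=shutil.which):
    """Return the commands that cannot be found on PATH."""
    return [cmd for cmd in commands if which(cmd) is None]


def has_installer(which=shutil.which):
    """Return True if at least one Python package installer is on PATH."""
    return any(which(cmd) is not None for cmd in INSTALLERS)


def module_available(name):
    """Return True if a top-level module with this import name can be located."""
    return importlib.util.find_spec(name) is not None


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_requirements(host, port):
    """Collect the result of every check into a dictionary."""
    return {
        "python3": sys.version_info.major == 3,
        "missing_commands": missing_commands(REQUIRED_COMMANDS),
        "installer": has_installer(),
        # Assumption: the package is importable as "pyextraview".
        "pyextraview": module_available("pyextraview"),
        "extraview_reachable": can_connect(host, port),
    }
```

Calling `check_requirements("extraview.example.org", 443)` with your own server name and port (both hypothetical here) returns a dictionary of results. A successful TCP connection shows only that the web server is reachable. It does not confirm that the ExtraView user or its role is set up correctly, nor that clush is configured for the cluster.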

Limitations:

This project currently supports only sites running PBSPro, SGI ICE, and Mellanox InfiniBand. Support for generic Linux clusters and Slurm is planned.

Tools Provided:

  • Bad Cable List
    • Tool to track, control and repair InfiniBand cables.
  • Bad Node List
    • Tool to track, control and repair compute nodes scheduled by PBSPro.

Setup:

  1. Clone the repo
user@host# git clone https://github.com/NCAR/ops_tracking_toolkit.git
  2. Run make from the cloned directory to install
user@host# cd ops_tracking_toolkit
user@host# make

Configuration

TODO
