Operations Tracking Toolkit to monitor down nodes and cables in clusters
- A user must be set up on the ExtraView server, usually in the resolver role.
- Socket connectivity to the ExtraView server API (usually /evj/ExtraView/ev_api.action on the web server).
- Python3
- pyextraview (currently required)
- pip or easy_install
- make
- clush
  - clush must be fully configured to run on the cluster.
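The requirements above can be checked before running make. The script below is a minimal sketch and is not part of ops_tracking_toolkit. It assumes the pyextraview import name is `pyextraview` and that you pass the ExtraView web server host (and optionally a port, assumed to default to 443) on the command line. It only confirms that a TCP socket can be opened. It does not verify the API path, the resolver role, or the clush configuration.

```python
#!/usr/bin/env python3
"""Hypothetical pre-install check for the toolkit's requirements.

Not part of ops_tracking_toolkit. Assumes the import name "pyextraview"
and takes the ExtraView web server host (and optional port) as arguments.
"""
import importlib.util
import shutil
import socket
import sys


def missing_commands(names=("make", "clush")):
    """Return the required executables that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def module_available(name="pyextraview"):
    """Return True if `name` can be imported (the module name is an assumption)."""
    return importlib.util.find_spec(name) is not None


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


def main(argv):
    """Print each missing requirement and return the list of problems."""
    problems = [f"{cmd} not found on PATH" for cmd in missing_commands()]
    if not module_available():
        problems.append("pyextraview is not importable")
    if argv:
        host = argv[0]
        port = int(argv[1]) if len(argv) > 1 else 443
        if not can_connect(host, port):
            problems.append(f"cannot open a socket to {host}:{port}")
    for problem in problems:
        print("MISSING:", problem)
    if not problems:
        print("All checks passed.")
    return problems


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, `python3 check_prereqs.py extraview.example.org` checks the local tools and tries to open a socket to that host. Both the script name and the host are placeholders.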
This project is currently restricted to sites with PBSPro, SGI ICE, and Mellanox InfiniBand. Support for generic Linux clusters and Slurm is planned.
- Bad Cable List
  - Tool to track, control, and repair InfiniBand cables.
- Bad Node List
  - Tool to track, control, and repair PBSPro-scheduled compute nodes.
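As an illustration of where a bad node list could get its candidates, the sketch below parses the output of PBS Pro's `pbsnodes -l`, which lists nodes marked down, offline, or unknown. It is hypothetical and not the toolkit's implementation. It assumes each output line holds whitespace-separated columns: node name, state, and an optional free-text comment. The `DownNode` type and both function names are invented for this example.

```python
"""Hypothetical sketch: collect candidate entries for a bad node list.

Assumes the text comes from PBS Pro's `pbsnodes -l`, one node per line
as whitespace-separated columns: node name, state, optional comment.
"""
import subprocess
from typing import List, NamedTuple


class DownNode(NamedTuple):
    name: str
    state: str    # e.g. "down", "offline", or a comma-joined combination
    comment: str  # admin comment; empty if none was set


def parse_pbsnodes_l(text: str) -> List[DownNode]:
    """Parse `pbsnodes -l` style text into DownNode records."""
    nodes = []
    for line in text.splitlines():
        # Split into at most 3 fields so the comment keeps its spaces.
        fields = line.split(None, 2)
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        comment = fields[2].strip() if len(fields) == 3 else ""
        nodes.append(DownNode(fields[0], fields[1], comment))
    return nodes


def current_down_nodes() -> List[DownNode]:
    """Run `pbsnodes -l` on a PBS Pro host and parse its output."""
    result = subprocess.run(
        ["pbsnodes", "-l"], capture_output=True, text=True, check=True
    )
    return parse_pbsnodes_l(result.stdout)
```

With made-up input, `parse_pbsnodes_l("r1i0n0   offline   bad DIMM\nr1i0n5   down\n")` yields one record with the comment `"bad DIMM"` and one with an empty comment. Splitting into at most three fields keeps multi-word comments intact.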
- Clone the repo
user@host# git clone https://github.com/NCAR/ops_tracking_toolkit.git
user@host# cd ops_tracking_toolkit
- Run make to install
user@host# make
TODO