Skip to content

2426 hpc metrics

Florian Deconinck edited this page Nov 27, 2024 · 1 revision

HPC metrics

In collaboration with the NCCS, we are listing below a range of metrics that should be evaluated for each benchmark.

  • Time to solution
    • Despite no aiming for production-ready code by the end of the project we will still keep an eye on the "job-level" turn around and document improvement and potential non-numerics slowdown due to the technology swap.
  • Energy
    • Light software sampling to document amplitude of TPU’s chip: imprecise but can be easily ran with little overhead.
    • Hardware monitoring on selected runs for precise measure: precise but requires close cooperation with NCCS sys admin and IT.
  • Node-to-node
    • Compare CPU node with GPU nodes
    • Minimize generation difference for valid comparison
  • Node usage
    • Chip usage: measure in % of theoretical throughput rather than FLOPs
    • Chip idle time: important for hybrid work
  • Minimal hardware requirements
    • For developments
    • For scientific runs
Clone this wiki locally