When I use npkit_trace_generator.py to convert the trace file generated by npkit to a json file, I get some errors.
Traceback (most recent call last):
  File "/home/zhangshizhuo/msccl/tools/npkit_trace_generator.py", line 232, in <module>
    convert_npkit_dump_to_trace(args.input_dir, args.output_dir, npkit_event_def)
  File "/home/zhangshizhuo/msccl/tools/npkit_trace_generator.py", line 211, in convert_npkit_dump_to_trace
    gpu_events = parse_gpu_event_file(npkit_dump_dir, npkit_event_def, rank, buf_idx, gpu_clock_scale, cpu_clock_scale)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zhangshizhuo/msccl/tools/npkit_trace_generator.py", line 95, in parse_gpu_event_file
    'ts': curr_cpu_base_time + parsed_gpu_event['timestamp'] / gpu_clock_scale - curr_gpu_base_time,
          ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TypeError: unsupported operand type(s) for +: 'NoneType' and 'float'
Specifically, I used msccl-tools/examples/mscclang/allgather_recursive_doubling.py to generate the XML file and ran the communication on the cluster. The same error occurs when testing reduce scatter, but not with allreduce or alltoall. Can you help me with this error? Looking forward to your reply.
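To make the failure easier to reason about, here is a minimal sketch of what the traceback seems to show: curr_cpu_base_time is still None when the 'ts' expression on line 95 runs. It assumes (this is a guess, not confirmed against the script) that the base times start as None and are only set once a time-sync event has been read from the dump. The event labels "sync_cpu" and "sync_gpu" and the function names event_timestamp and convert are hypothetical stand-ins, not names from npkit_trace_generator.py.

```python
# Reproduce the exact error message from the traceback in isolation:
# adding None to a float raises TypeError.
try:
    None + 1000 / 2.0
except TypeError as exc:
    message = str(exc)


def event_timestamp(cpu_base, gpu_base, gpu_ts, gpu_clock_scale):
    """Same arithmetic as the 'ts' expression in the traceback, plus a guard.

    Returns None instead of raising when either base time is missing.
    """
    if cpu_base is None or gpu_base is None:
        return None
    return cpu_base + gpu_ts / gpu_clock_scale - gpu_base


def convert(events, gpu_clock_scale=1.0, cpu_clock_scale=1.0):
    """Walk (kind, timestamp) pairs; skip events seen before both sync points.

    'sync_cpu' and 'sync_gpu' are hypothetical labels for time-sync events.
    """
    cpu_base = None  # assumption: base times start unset
    gpu_base = None
    converted = []
    skipped = 0
    for kind, ts in events:
        if kind == "sync_cpu":
            cpu_base = ts / cpu_clock_scale
        elif kind == "sync_gpu":
            gpu_base = ts / gpu_clock_scale
        else:
            value = event_timestamp(cpu_base, gpu_base, ts, gpu_clock_scale)
            if value is None:
                skipped += 1  # event arrived before any sync event
            else:
                converted.append(value)
    return converted, skipped


# The first "op" precedes both sync events, so it is skipped, not fatal.
result = convert([("op", 10.0), ("sync_cpu", 100.0), ("sync_gpu", 40.0), ("op", 50.0)])
```

If this guess is right, counting the skipped events per dump file would show whether the allgather and reduce scatter dumps contain events recorded before any time sync, which might explain why allreduce and alltoall convert cleanly.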
Details
Generate xml file:
mpirun test: