Radiation plugin: Save amplitudes additionally per rank #4456

Conversation

franzpoeschel
Contributor

@franzpoeschel franzpoeschel commented Feb 8, 2023

Unlike the aggregated amplitudes, this uses std::complex to store the output, making things a bit inconsistent. The goal should be to use std::complex for the aggregated output too, but Richard says that some postprocessing still relies on the current format. (At the same time, we should remove the useless third dimension there.)

Output now looks like (with two ranks):

$ bpls radiationOpenPMD/e_radAmplitudes200.bp/   
  double          /data/200/DetectorMesh/Amplitude/x_Im           {128, 1024, 1}
  double          /data/200/DetectorMesh/Amplitude/x_Re           {128, 1024, 1}
  double          /data/200/DetectorMesh/Amplitude/y_Im           {128, 1024, 1}
  double          /data/200/DetectorMesh/Amplitude/y_Re           {128, 1024, 1}
  double          /data/200/DetectorMesh/Amplitude/z_Im           {128, 1024, 1}
  double          /data/200/DetectorMesh/Amplitude/z_Re           {128, 1024, 1}
  double          /data/200/DetectorMesh/DetectorDirection/x      {128, 1, 1}
  double          /data/200/DetectorMesh/DetectorDirection/y      {128, 1, 1}
  double          /data/200/DetectorMesh/DetectorDirection/z      {128, 1, 1}
  double          /data/200/DetectorMesh/DetectorFrequency/omega  {1, 1024, 1}
  double complex  /data/200/DetectorMesh/amplitude_distributed/x  {2, 128, 1024}
  double complex  /data/200/DetectorMesh/amplitude_distributed/y  {2, 128, 1024}
  double complex  /data/200/DetectorMesh/amplitude_distributed/z  {2, 128, 1024}
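For readers unfamiliar with the two layouts, here is a minimal reading sketch (not part of the PR; it assumes openPMD-api in Python, the file pattern e_radAmplitudes%T.bp, and the iteration and mesh names from the bpls listing above):

import numpy as np
import openpmd_api as io

series = io.Series("radiationOpenPMD/e_radAmplitudes%T.bp", access=io.Access_Type.read_only)
it = series.iterations[200]

# aggregated output: split into Re/Im components, with a superfluous trailing dimension
x_re = it.meshes["Amplitude"]["x_Re"].load_chunk()              # shape (128, 1024, 1)
x_im = it.meshes["Amplitude"]["x_Im"].load_chunk()
# per-rank output: native complex numbers, first axis = MPI rank
x_dist = it.meshes["amplitude_distributed"]["x"].load_chunk()   # shape (2, 128, 1024)
series.flush()

x_aggregated = (x_re + 1j * x_im)[:, :, 0]   # reassemble complex values, drop the extra dimension
print(x_aggregated.shape, x_dist.shape)      # (128, 1024) (2, 128, 1024)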

TODO

  • Testing, documentation
  • mesh naming?
  • Command line option to switch this on/off

@PrometheusPi
Member

Thanks @franzpoeschel for building this. I agree that at some point, it should be converted to complex only.
If I understand your bpls output above correctly, amplitude_distributed contains the data of the two MPI ranks, represented by the first dimension of the 2 x 128 x 1024 array, right?

@franzpoeschel
Contributor Author

> Thanks @franzpoeschel for building this. I agree that at some point, it should be converted to complex only. If I understand your bpls output above correctly, amplitude_distributed contains the data of the two MPI ranks, represented by the first dimension of the 2 x 128 x 1024 array, right?

Yes, exactly.
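To make that explicit, a small continuation of the reading sketch above (same assumed file and mesh names; not from the PR itself). Indexing the first axis selects one rank's contribution, and summing over it combines all ranks:

import openpmd_api as io

series = io.Series("radiationOpenPMD/e_radAmplitudes%T.bp", access=io.Access_Type.read_only)
it = series.iterations[200]
x_dist = it.meshes["amplitude_distributed"]["x"].load_chunk()   # (rank, direction, frequency)
series.flush()

rank0_x = x_dist[0]               # complex amplitude contributed by MPI rank 0
rank1_x = x_dist[1]               # ... and by MPI rank 1
combined_x = x_dist.sum(axis=0)   # sum over the rank axis -> shape (128, 1024)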

@PrometheusPi
Member

@franzpoeschel what is the status of this pull request? Should I do some testing?

@franzpoeschel
Contributor Author

> @franzpoeschel what is the status of this pull request? Should I do some testing?

It would be helpful, yes. It's all implemented, I just need to add some .rst documentation.

@PrometheusPi
Member

@franzpoeschel I just saw you pushed a few minutes ago. Should I still review?

@franzpoeschel
Contributor Author

> @franzpoeschel I just saw you pushed a few minutes ago. Should I still review?

Yes, I only rebased.

@PrometheusPi
Member

@franzpoeschel You pushed some changes. But I am not sure what changed - could you elaborate?

@franzpoeschel
Contributor Author

I rebased after disentangling this PR from the one with the Juwels templates.
I need to have a look at why it fails.

@franzpoeschel
Contributor Author

It seems that the CI is just failing some jobs? Otherwise, I've changed nothing.

@PrometheusPi
Member

I will trigger the CI again

@PrometheusPi
Member

@franzpoeschel I just tested your pull request using the share/picongpu/examples/Bunch setup and adding --e_radiation.distributedAmplitude 1 to the call of PIConGPU.
As soon as the radiation plugin started, the simulation crashed with a segmentation fault.

Two things that this default example does differently than the astrophysics simulations:

  • it uses a moving window
  • not all GPUs have particles, thus some contribute no radiation (all zeros)

@franzpoeschel
Contributor Author

Can you give me the entire command line call for PIConGPU that you used?
The radiation being zero should be no issue, as that does not affect the actual geometry.

@PrometheusPi
Member

PrometheusPi commented Apr 4, 2023

here is the call of picongpu:

  source /.../pr_4456/runs/001_bunch/tbg/handleSlurmSignals.sh mpiexec -np 32 /.../pr_4456/runs/001_bunch/input/bin/picongpu -d 2 8 2 -g 128 3072 128 -s 7500 --periodic 1 0 1 --e_energyHistogram.period 500 --e_energyHistogram.filter all --e_energyHistogram.binCount 1024 --e_energyHistogram.minEnergy 0 --e_energyHistogram.maxEnergy 500000 --e_radiation.period 1 --e_radiation.dump 2 --e_radiation.totalRadiation --e_radiation.start 2800 --e_radiation.end 3000 --e_radiation.distributedAmplitude 1 --e_macroParticlesCount.period 100 --versionOnce

/.../ is just an anonymization of the directory

@PrometheusPi
Member

Setting --e_radiation.distributedAmplitude to 0 also crashes when the radiation plugin starts at iteration 2800.

@PrometheusPi
Member

I will test whether the dev works.

@PrometheusPi
Member

dev works fine.
Should I test something specific @franzpoeschel ?

@franzpoeschel
Contributor Author

franzpoeschel commented Apr 5, 2023

I can't currently reproduce the crash; I ran exactly your command line call on a default PIConGPU Bunch simulation.
Things get incredibly slow once the radiation plugin kicks in, but that is probably in the nature of what the plugin does (is it?). Otherwise everything is functioning.

> ls
e_radAmplitudes_2800_0_0_0.bp4  e_radAmplitudes_2802_0_0_0.bp4  e_radAmplitudes_2804_0_0_0.bp4  e_radAmplitudes_2806_0_0_0.bp4
> bpls e_radAmplitudes_2800_0_0_0.bp4/
  double          /data/2800/DetectorMesh/Amplitude/x_Im           {128, 1024, 1}
  double          /data/2800/DetectorMesh/Amplitude/x_Re           {128, 1024, 1}
  double          /data/2800/DetectorMesh/Amplitude/y_Im           {128, 1024, 1}
  double          /data/2800/DetectorMesh/Amplitude/y_Re           {128, 1024, 1}
  double          /data/2800/DetectorMesh/Amplitude/z_Im           {128, 1024, 1}
  double          /data/2800/DetectorMesh/Amplitude/z_Re           {128, 1024, 1}
  double complex  /data/2800/DetectorMesh/Amplitude_distributed/x  {32, 128, 1024}
  double complex  /data/2800/DetectorMesh/Amplitude_distributed/y  {32, 128, 1024}
  double complex  /data/2800/DetectorMesh/Amplitude_distributed/z  {32, 128, 1024}
  double          /data/2800/DetectorMesh/DetectorDirection/x      {128, 1, 1}
  double          /data/2800/DetectorMesh/DetectorDirection/y      {128, 1, 1}
  double          /data/2800/DetectorMesh/DetectorDirection/z      {128, 1, 1}
  double          /data/2800/DetectorMesh/DetectorFrequency/omega  {1, 1024, 1}

I'm running this on the K80 partition of Hemera and the memory of that partition is barely sufficient to run the simulation, but it works.

What versions of openPMD and ADIOS2 are you using? Or are you using HDF5? Where do you run the setup and with which software environment? Did you change any templates?

(4 resolved review comments on include/picongpu/plugins/radiation/Radiation.hpp, outdated)
@PrometheusPi
Member

@franzpoeschel Sorry for the late reply.
Yes, the radiation plugin is computationally extremely expensive. The plugin causes the slowdown.
I ran on your branch, using the default fwkt_v100 setup. This uses:

  • openpmd/0.14.3-cuda115
  • adios2/2.7.1-cuda115
  • hdf5-parallel/1.12.0-cuda115

The radiation plugin itself tried to create an HDF5 file (the file was created, but it is zero bytes in size).

@PrometheusPi
Member

I can confirm that the test case runs on the Hemera K80 partition, but still crashes on the Hemera V100 partition.

@PrometheusPi
Member

I will check the validity of the K80 data ASAP.

@franzpoeschel
Contributor Author

Nope, even on V100 everything finishes cleanly for me..?

module load python/3.6.5
module load git
module load gcc/11.2.0
module load cmake/3.20.2
module load cuda/11.5
module load openmpi/4.1.1-cuda115
module load boost/1.78.0
module load zlib/1.2.11
module load libfabric/1.11.1-co79
module load c-blosc/1.14.4
module load hdf5-parallel/1.12.0-cuda115
module load libpng/1.6.35
module load adios2/2.7.1-cuda115
module load openpmd/0.14.3-cuda115

export PIC_BACKEND="cuda:70"

In this environment, I compiled a normal Bunch simulation and ran it:

#!/usr/bin/env bash
#SBATCH -n 32
#SBATCH -p fwkt_v100
#SBATCH -A fwkt_v100
#SBATCH --gres=gpu:4
#SBATCH --tasks-per-node=4

binary="$(realpath "$1/bin/picongpu")"

mkdir -p "$2"
cd "$2"

mpirun "$binary" \
    -d 2 8 2 \
    -g 128 3072 128 \
    -s 7500 \
    --periodic 1 0 1 \
    --e_energyHistogram.period 500 \
    --e_energyHistogram.filter all \
    --e_energyHistogram.binCount 1024 \
    --e_energyHistogram.minEnergy 0 \
    --e_energyHistogram.maxEnergy 500000 \
    --e_radiation.period 1 \
    --e_radiation.dump 2 \
    --e_radiation.totalRadiation \
    --e_radiation.start 2800 \
    --e_radiation.end 3000 \
    --e_radiation.distributedAmplitude 1 \
    --e_macroParticlesCount.period 100 \
    --versionOnce
> /trinity/shared/pkg/filelib/openpmd/0.14.3-cuda115/gcc/11.2.0/openmpi/4.1.1-cuda115/bin/openpmd-ls e_radAmplitudes_%T_0_0_0.h5
openPMD series: e_radAmplitudes_%T_0_0_0
openPMD standard: 1.1.0
openPMD extensions: 0

data author: unknown
data created: 2023-04-21 17:56:54 +0200
data backend: HDF5
generating machine: unknown
generating software: PIConGPU (version: 0.7.0-dev)
generating software dependencies: unknown

number of iterations: 101 (fileBased)
  all iterations: 2800 2802 2804 2806 2808 2810 2812 2814 2816 2818 2820 2822 2824 2826 2828 2830 2832 2834 2836 2838 2840 2842 2844 2846 2848 2850 2852 2854 2856 2858 2860 2862 2864 2866 2868 2870 2872 2874 2876 2878 2880 2882 2884 2886 2888 2890 2892 2894 2896 2898 2900 2902 2904 2906 2908 2910 2912 2914 2916 2918 2920 2922 2924 2926 2928 2930 2932 2934 2936 2938 2940 2942 2944 2946 2948 2950 2952 2954 2956 2958 2960 2962 2964 2966 2968 2970 2972 2974 2976 2978 2980 2982 2984 2986 2988 2990 2992 2994 2996 2998 3000 

number of meshes: 4
  all meshes:
    Amplitude
    Amplitude_distributed
    DetectorDirection
    DetectorFrequency

number of particle species: 0

@PrometheusPi
Member

@franzpoeschel that is strange. The only difference in how I ran it is that I was using tbg, and thus my call of picongpu looks slightly different.
I will have a closer look at what other differences there are ...

@PrometheusPi
Member

I quickly had a look at my K80 data. Everything looks plausible.
I am seeing a constant factor offset, but that is most likely my fault. I will check the code in detail.

@PrometheusPi
Member

@PrometheusPi left a review

It all looks correct to me. You seem to use the same data for the new method as for the previous lastRad output.
Nevertheless, the spectra show a significant difference in magnitude and range (so it is not just a factor missing somewhere). I will add details below.

(4 resolved review comments on include/picongpu/plugins/radiation/Radiation.hpp, outdated)
@PrometheusPi
Member

Details on differences:

If we take from the default Bunch example the last output at 3000, we get the final radiation as follows:

import numpy as np
import openpmd_api as io
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

series = io.Series("../runs/003_bunch_k80/simOutput/radiationOpenPMD/e_radAmplitudes_%T_0_0_0.h5", access=io.Access_Type.read_only)
it = series.iterations[3000]

data_all_x_Im = it.meshes["Amplitude"]["x_Im"].load_chunk()
data_all_x_Re = it.meshes["Amplitude"]["x_Re"].load_chunk()

data_all_y_Im = it.meshes["Amplitude"]["y_Im"].load_chunk()
data_all_y_Re = it.meshes["Amplitude"]["y_Re"].load_chunk()

data_all_z_Im = it.meshes["Amplitude"]["z_Im"].load_chunk()
data_all_z_Re = it.meshes["Amplitude"]["z_Re"].load_chunk()

series.flush()

# convert to complex numbers
# (note: the `*` between the Re and 1j*Im terms is the bug identified further down in this thread; it should be `+`)
data_all_x = (data_all_x_Re * 1j * data_all_x_Im)[:,:,0]
data_all_y = (data_all_y_Re * 1j * data_all_y_Im)[:,:,0]
data_all_z = (data_all_z_Re * 1j * data_all_z_Im)[:,:,0]

The data is still in PIConGPU units ($\sqrt{\mathrm{Js}}$).

To get the intensity in x-polarization, we need to compute the absolute square of the complex amplitude in x-polarization.

tmp_old = np.abs((data_all_x)**2)

Plotting the data as:

plt.pcolormesh(tmp_old, norm=LogNorm())
plt.colorbar()

results in the following plot (pcolormesh of tmp_old on a logarithmic color scale; image omitted).

The maximum is $>10^{29}$, the minimum is $<10^{1}$; thus the range is $>29$ orders of magnitude.

@PrometheusPi
Member

If we want to check whether the new per-MPI-rank output leads to the same final result, we need to sum over all MPI ranks and over all times.
This can be done as follows:

N_t = 201 # number of openPMD radiation plugin outputs
data_overTime_x = np.zeros((N_t, 32, 128, 1024), dtype=np.complex128)
data_overTime_y = np.zeros((N_t, 32, 128, 1024), dtype=np.complex128)
data_overTime_z = np.zeros((N_t, 32, 128, 1024), dtype=np.complex128)

for i, it in enumerate(series.iterations):
    it = series.iterations[it]
    data_dist_x = it.meshes["Amplitude_distributed"]['x'].load_chunk()
    data_dist_y = it.meshes["Amplitude_distributed"]['y'].load_chunk()
    data_dist_z = it.meshes["Amplitude_distributed"]['z'].load_chunk()
    series.flush()
    data_overTime_x[i, :, :, : ] = data_dist_x
    data_overTime_y[i, :, :, : ] = data_dist_y
    data_overTime_z[i, :, :, : ] = data_dist_z

We sum over time and MPI ranks and then compute the absolute square to convert to intensity (again just the x-component):

tmp_new = np.abs(np.sum(np.sum(data_overTime_x[:, :, :, :], axis=0)[:, :, :], axis=0)**2)

Plotting this via:

plt.pcolormesh(tmp_new, norm=LogNorm())
plt.colorbar()

gives the following plot (pcolormesh of tmp_new on a logarithmic color scale; image omitted)

and results in a maximum of $>10^{16}$, a minimum of $<10^2$ and thus a range of $>14$ orders of magnitude.

There seems to be no trivial factor missing.

@PrometheusPi
Member

If we plot the relative difference as follows

plt.pcolormesh(tmp_new / tmp_old, norm=LogNorm())
plt.colorbar()

we get the following plot (pcolormesh of tmp_new / tmp_old on a logarithmic color scale; image omitted).

The peak radiation is underestimated, while the radiation between the peaks is overestimated by the MPI-distributed version.
This would indicate insufficient time integration. However, the number of side bands and the peak width agree (which are a different metric for the duration).

@PrometheusPi
Member

@franzpoeschel found a bug in my Python script - his code was/is right, only my analysis was wrong.
The real and imaginary parts should be added:

data_all_x = (data_all_x_Re + 1j * data_all_x_Im)[:,:,0]
data_all_y = (data_all_y_Re + 1j * data_all_y_Im)[:,:,0]
data_all_z = (data_all_z_Re + 1j * data_all_z_Im)[:,:,0]

This results in the following plot (corrected spectrum; image omitted)

and close to no difference at all (plot of tmp_new / tmp_old; image omitted).
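To quantify "close to no difference", a hypothetical follow-up check (not part of the thread) could compare the two arrays directly, reusing tmp_old (recomputed with the corrected conversion above) and tmp_new from the earlier snippets:

import numpy as np

# relative deviation between the summed per-rank result and the aggregated result;
# values near zero mean the two outputs agree
rel_dev = np.abs(tmp_new - tmp_old) / np.maximum(tmp_old, 1e-30)
print("max:", rel_dev.max(), "median:", np.median(rel_dev))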

@franzpoeschel
Contributor Author

Great! Thanks for checking this :)

@PrometheusPi
Member

Regarding your open bullet points @franzpoeschel:
I think the naming is great 👍
I would consider testing done.
Do you want to add some more documentation? The documentation that is already in is good enough for me.

@franzpoeschel
Contributor Author

> Regarding your open bullet points @franzpoeschel: I think the naming is great 👍 I would consider testing done. Do you want to add some more documentation? The documentation that is already in is good enough for me.

I only forgot to check the bullet; the documentation is there, I'd say.

@PrometheusPi
Member

@steindev just ran into the same crash on the Hemera V100 as I did.

@franzpoeschel
Contributor Author

I shall try again with tbg

@PrometheusPi
Member

@franzpoeschel with your script it works for me - with tbg it doesn't. I do not understand 🤯

@franzpoeschel
Contributor Author

Yep, I also see the crash with tbg ...??
Also, reducing this to a single node still shows the bug.

@franzpoeschel
Contributor Author

franzpoeschel commented Apr 27, 2023

It's this line in the template script triggering the crash:

# The OMPIO backend in OpenMPI up to 3.1.3 and 4.0.0 is broken, use the
# fallback ROMIO backend instead.
#   see bug https://github.com/open-mpi/ompi/issues/6285
export OMPI_MCA_io=^ompio

@franzpoeschel
Contributor Author

This seems to be the same bug that we already saw with chunking enabled in the normal openPMD plugin. ROMIO can be used, but --e_radiation.openPMDConfig '{"hdf5":{"dataset":{"chunks":"none"}}}' must be specified.
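As a small aside (hypothetical, not from the PR), the JSON string for that option can also be generated programmatically instead of being typed by hand:

import json

# the workaround configuration mentioned above: disable HDF5 dataset chunking
openpmd_config = {"hdf5": {"dataset": {"chunks": "none"}}}
print("--e_radiation.openPMDConfig '%s'" % json.dumps(openpmd_config))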

Complete error backtrace is:

[gv020:134413] *** Process received signal ***
[gv020:134413] Signal: Segmentation fault (11)
[gv020:134413] Signal code: Address not mapped (1)
[gv020:134413] Failing at address: (nil)
[gv020:134413] [ 0] /lib64/libpthread.so.0(+0xf630)[0x2aaaacde6630]
[gv020:134413] [ 1] /trinity/shared/pkg/mpi/openmpi/4.1.1-cuda115/gcc/11.2.0/lib/openmpi/mca_io_romio321.so(ADIOI_Flatten+0x952)[0x2aaae5402df2]
[gv020:134413] [ 2] /trinity/shared/pkg/mpi/openmpi/4.1.1-cuda115/gcc/11.2.0/lib/openmpi/mca_io_romio321.so(ADIOI_Flatten_datatype+0x107)[0x2aaae5404617]
[gv020:134413] [ 3] /trinity/shared/pkg/mpi/openmpi/4.1.1-cuda115/gcc/11.2.0/lib/openmpi/mca_io_romio321.so(ADIO_Set_view+0x1f5)[0x2aaae53fad25]
[gv020:134413] [ 4] [gv024:114504] [ 0] /lib64/libpthread.so.0(+0xf630)[0x2aaaacde6630]
[gv020:134413] [ 5] /trinity/shared/pkg/mpi/openmpi/4.1.1-cuda115/gcc/11.2.0/lib/openmpi/mca_io_romio321.so(mca_io_romio321_file_set_view+0xcc)[0x2aaae53db55c]
[gv020:134413] [ 6] /trinity/shared/pkg/mpi/openmpi/4.1.1-cuda115/gcc/11.2.0/lib/libmpi.so.40(MPI_File_set_view+0x108)[0x2aaaaad430c8]
[gv020:134413] [ 7] /trinity/shared/pkg/mpi/openmpi/4.1.1-cuda115/gcc/11.2.0/lib/libmpi.so.40(MPI_File_set_view+0x108)[0x2aaaaad430c8]
[gv020:134413] [ 8] /trinity/shared/pkg/mpi/openmpi/4.1.1-cuda115/gcc/11.2.0/lib/libmpi.so.40(MPI_File_set_view+0x108)[0x2aaaaad430c8]
[gv020:134413] [ 9] /trinity/shared/pkg/filelib/hdf5-parallel/1.12.0-cuda115/gcc/11.2.0/openmpi/4.1.1-cuda115/lib/libhdf5.so.200(H5FD_write+0xed)[0x2aaaae9aee17]
[gv020:134413] [10] /trinity/shared/pkg/filelib/hdf5-parallel/1.12.0-cuda115/gcc/11.2.0/openmpi/4.1.1-cuda115/lib/libhdf5.so.200(H5F__accum_write+0x110e)[0x2aaaae975997]
[gv020:134413] [11] /trinity/shared/pkg/filelib/hdf5-parallel/1.12.0-cuda115/gcc/11.2.0/openmpi/4.1.1-cuda115/lib/libhdf5.so.200(H5PB_write+0xd6)[0x2aaaaeb1c707]
[gv020:134413] [12] /trinity/shared/pkg/filelib/hdf5-parallel/1.12.0-cuda115/gcc/11.2.0/openmpi/4.1.1-cuda115/lib/libhdf5.so.200(+0x10811a)[0x2aaaae8f611a]
[gv020:134413] [13] /trinity/shared/pkg/filelib/hdf5-parallel/1.12.0-cuda115/gcc/11.2.0/openmpi/4.1.1-cuda115/lib/libhdf5.so.200(H5D__chunk_allocate+0x163f)[0x2aaaae9029ed]
[gv020:134413] [14] /trinity/shared/pkg/filelib/hdf5-parallel/1.12.0-cuda115/gcc/11.2.0/openmpi/4.1.1-cuda115/lib/libhdf5.so.200(+0x12ee36)[0x2aaaae91ce36]
[gv020:134413] [15] /trinity/shared/pkg/filelib/hdf5-parallel/1.12.0-cuda115/gcc/11.2.0/openmpi/4.1.1-cuda115/lib/libhdf5.so.200(H5D__alloc_storage+0x275)[0x2aaaae92352c]
[gv020:134413] [16] /trinity/shared/pkg/filelib/hdf5-parallel/1.12.0-cuda115/gcc/11.2.0/openmpi/4.1.1-cuda115/lib/libhdf5.so.200(H5D__alloc_storage+0x275)[0x2aaaae92352c]
[gv020:134413] [17] /trinity/shared/pkg/filelib/hdf5-parallel/1.12.0-cuda115/gcc/11.2.0/openmpi/4.1.1-cuda115/lib/libhdf5.so.200(+0x12dadf)[0x2aaaae91badf]
[gv020:134413] [18] /trinity/shared/pkg/filelib/hdf5-parallel/1.12.0-cuda115/gcc/11.2.0/openmpi/4.1.1-cuda115/lib/libhdf5.so.200(H5D__create+0xca2)[0x2aaaae91e91b]
[gv020:134413] [19] /trinity/shared/pkg/filelib/hdf5-parallel/1.12.0-cuda115/gcc/11.2.0/openmpi/4.1.1-cuda115/lib/libhdf5.so.200(+0x141635)[0x2aaaae92f635]
[gv020:134413] [20] /trinity/shared/pkg/filelib/hdf5-parallel/1.12.0-cuda115/gcc/11.2.0/openmpi/4.1.1-cuda115/lib/libhdf5.so.200(+0x25981f)[0x2aaaaea4781f]
[gv020:134413] [21] /trinity/shared/pkg/filelib/hdf5-parallel/1.12.0-cuda115/gcc/11.2.0/openmpi/4.1.1-cuda115/lib/libhdf5.so.200(+0x25981f)[0x2aaaaea4781f]
[gv020:134413] [22] /trinity/shared/pkg/filelib/hdf5-parallel/1.12.0-cuda115/gcc/11.2.0/openmpi/4.1.1-cuda115/lib/libhdf5.so.200(H5G_traverse+0xc7)[0x2aaaae9fae1b]
[gv020:134413] [23] /trinity/shared/pkg/filelib/hdf5-parallel/1.12.0-cuda115/gcc/11.2.0/openmpi/4.1.1-cuda115/lib/libhdf5.so.200(H5G_traverse+0xc7)[0x2aaaae9fae1b]
[gv020:134413] [24] /trinity/shared/pkg/filelib/hdf5-parallel/1.12.0-cuda115/gcc/11.2.0/openmpi/4.1.1-cuda115/lib/libhdf5.so.200(+0x2518ef)[0x2aaaaea3f8ef]
[gv020:134413] [25] /trinity/shared/pkg/filelib/hdf5-parallel/1.12.0-cuda115/gcc/11.2.0/openmpi/4.1.1-cuda115/lib/libhdf5.so.200(H5L_link_object+0x6a)[0x2aaaaea49a68]
[gv020:134413] [26] /trinity/shared/pkg/filelib/hdf5-parallel/1.12.0-cuda115/gcc/11.2.0/openmpi/4.1.1-cuda115/lib/libhdf5.so.200(+0x45b9a3)[0x2aaaaec499a3]
[gv020:134413] [27] /trinity/shared/pkg/filelib/hdf5-parallel/1.12.0-cuda115/gcc/11.2.0/openmpi/4.1.1-cuda115/lib/libhdf5.so.200(H5VL__native_dataset_create+0x1ac)[0x2aaaaec6e922]
[gv020:134413] [28] /trinity/shared/pkg/filelib/hdf5-parallel/1.12.0-cuda115/gcc/11.2.0/openmpi/4.1.1-cuda115/lib/libhdf5.so.200(H5VL_dataset_create+0xc1)[0x2aaaaec53aa0]
[gv020:134413] [29] /trinity/shared/pkg/filelib/hdf5-parallel/1.12.0-cuda115/gcc/11.2.0/openmpi/4.1.1-cuda115/lib/libhdf5.so.200(H5VL_dataset_create+0xc1)[0x2aaaaec53aa0]

And it seems to be this issue: open-mpi/ompi#7795

@PrometheusPi
Member

Thanks @franzpoeschel for investigating this. Since this is not an issue with your code and since it is avoidable with proper settings, I will merge your pull request now.

@PrometheusPi merged commit 15e29ce into ComputationalRadiationPhysics:dev on May 2, 2023
@franzpoeschel
Contributor Author

> Since this is not an issue with your code and since it is avoidable with proper settings, I will merge your pull request now.

As a note: this was very likely triggered by making this plugin parallel, as it was formerly serial.
