Complex Numbers: define tuple of two as integral? #29
Comments
Careful: HDF5 Fortran support for compound data types is pretty bad (link moved: archive).
Additionally, other formats may not support complex numbers directly at all. To maximize portability, I would vote for the standard mandating complex storage as real and imaginary datasets, or amplitude and phase datasets. If using amplitude and phase, should the conversion factor to SI convert to radians or radians/2π?
Side note: ADIOS2 has first-class complex support.
I am not entirely decided on that yet; maybe some formats just don't support complex and I am really intrigued to push those to improve instead. We don't have "required" complex records or attributes yet besides the geometry.
I'm not sure what you mean by "intrigued". One way to handle this would be to say that if the format supports complex numbers then that should be used, and if not, then use real/imaginary or amp/phase sub-datasets.
Actually for the beam physics extension there is the photon polarization which is complex.
@ax3l If the format supports complex numbers as a native type then use the native type; otherwise store the complex numbers in sub-groups as real and imaginary with labels. I would not mandate that the phase be in any specific interval like [-pi, pi], since the integer part of the phase may be important. Question: If the format supports complex numbers, should a writer have the option to not use the native complex and instead store with re/im or amp/phase sub-groups?
Note: Since the beam physics extension is using complex numbers in more than one place, if a convention for complex numbers does not get into the main standard, then a convention will have to be put into the extension.
It looks like pytables and h5py both use 'r' and 'i' as member names for the compound datatype. Actually, Fortran has had a complex datatype since at least FORTRAN 77. I also don't like specifying phase and amplitude - the user should convert.
Yes using 'r' and 'i' to be compatible with pytables and h5py would be a good idea.
Fine with me.
Yes, by default we should also stay with what C11 and C++ do: a simple POD datatype of two floating-point values. The problem is less how to lay out the complex number and more how to describe it at the low level in file formats without introducing too many conventions we have to carry. ADIOS' native complex types already cover this. The Fortran issue with querying complex numbers mentioned above remains; have you by coincidence tried reading/writing one?
Chris has been using h5py. I have not.
@ChristopherMayes did writing a data set and/or attribute with complex numbers in h5py work for you? As far as C++ and Python are concerned, we will just add it here: openPMD/openPMD-api#203 and relax potential wording in the standard.
How would this new standard definition affect ℂ3 datasets as used in PIConGPU?
Here there are three complex numbers (x, y, z). PIConGPU uses a single level to store all 6 (real and imaginary) arrays. The convention for openPMD would be to have x, y, and z groups and then store "r" and "i" arrays as datasets of these groups. That is there would be two levels. |
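The two-level layout described above can be sketched as HDF5-style path names. This is an illustration only; the record name "B" and the component/part labels are examples, not names mandated by the standard:

```python
# Hypothetical sketch of the two-level layout for a C^3 record "B":
# one group per vector component (x, y, z), each holding "r" and "i" datasets.
components = ["x", "y", "z"]
parts = ["r", "i"]
paths = [f"B/{c}/{p}" for c in components for p in parts]
print(paths)
# ['B/x/r', 'B/x/i', 'B/y/r', 'B/y/i', 'B/z/r', 'B/z/i']
```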
@ax3l @ChristopherMayes
For a complex array Z, having a group ...
@ChristopherMayes
You could then store them as a regular record with record components as ℂ.
I think generally extensions can do this, but it makes parsing by a pure base-standard reader more complicated. My idea is actually, like the title of the issue, to just allow integral complex types (just like `float`), since I think HDF5 and ADIOS support this well for pairs of single and double precision. Yes, compatible to the C11 complex layout. That's at least something that one can implement in base-standard readers and writers, and that's exactly what h5py does. This is how this is handled in HDF5 by the convention of h5py:

```python
import h5py as h5
import numpy as np

x = np.array([1.+2j, 3.+4j, 5.+6j], np.cdouble)
x
# array([1.+2.j, 3.+4.j, 5.+6.j])
x.dtype
# dtype('complex128')

f = h5.File('complex.h5', 'w')
f['/mydata'] = x
f.close()
```

This creates a compound dataset:

```console
$ h5ls complex.h5
mydata                   Dataset {3}

$ h5ls -d -r complex.h5
/                        Group
/mydata                  Dataset {3}
    Data:
        (0) {1, 2}, {3, 4}, {5, 6}

$ h5dump complex.h5
HDF5 "complex.h5" {
GROUP "/" {
   DATASET "mydata" {
      DATATYPE  H5T_COMPOUND {
         H5T_IEEE_F64LE "r";
         H5T_IEEE_F64LE "i";
      }
      DATASPACE  SIMPLE { ( 3 ) / ( 3 ) }
      DATA {
      (0): {
            1,
            2
         },
      (1): {
            3,
            4
         },
      (2): {
            5,
            6
         }
      }
   }
}
}
```

ADIOS1 and ADIOS2 have native type support for single and double complex.
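As a sanity check on the layout claim, here is a stdlib-only sketch (no h5py required) of the interleaved (re, im) float64 byte layout that a double-precision complex value uses in memory, which is what the compound { "r", "i" } convention stores on disk:

```python
import struct

# interleaved (re, im) float64 pairs: the in-memory layout of a double-precision
# complex array, matching the on-disk compound { "r": f64, "i": f64 } convention
values = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
raw = b"".join(struct.pack("<dd", re, im) for re, im in values)
assert len(raw) == 3 * 16  # 16 bytes per double-precision complex element

# decoding the compound members back recovers the (re, im) pairs
decoded = [struct.unpack_from("<dd", raw, 16 * k) for k in range(3)]
assert decoded == values
```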
@ax3l Also, with your example above, do you have non-h5py code that can read this in? I would be curious to see this (or how to write this using non-h5py code). @ChristopherMayes
I would propose to just allow complex floating point types as integral types. HDF5, ADIOS1, and ADIOS2 support at least single and double precision complex.
I am trying to draft this as an example implementation in openPMD-api right now. It's similar to how we (and h5py) already handle this.
Actually I would say that the purpose of any standard is to improve portability at the expense of creativity. So how about this for the standard:
1) If a file format supports complex, that is to be used.
2) If a file format does not support complex but supports structs, then store complex as a struct with component labels "r" and "i".
3) If a file format does not support complex nor structs, then store complex numbers in two sub-datasets labeled "r" and "i".
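The three rules above can be sketched as a writer-side dispatch. This is illustrative only; the capability flags are hypothetical and not part of any openPMD API:

```python
def complex_storage_plan(supports_complex: bool, supports_structs: bool) -> str:
    # 1) native complex type if the file format has one
    if supports_complex:
        return "native complex"
    # 2) else a struct/compound with members "r" and "i"
    if supports_structs:
        return 'struct {"r", "i"}'
    # 3) else two sub-datasets labeled "r" and "i"
    return 'sub-datasets "r" and "i"'

print(complex_storage_plan(True, True))    # native complex
print(complex_storage_plan(False, True))   # struct {"r", "i"}
print(complex_storage_plan(False, False))  # sub-datasets "r" and "i"
```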
Hmmm. As far as I know HDF5 does not support complex numbers. Do you have a reference for this?
Yes, there are ways to express complex numbers in HDF5 which are broadly enough supported to be used (aka the h5py compound convention). Why not specify fallbacks: it complicates the parsing a bit, which makes auto-upgrades harder in the future and slows us down. We do not really have a format right now that does not support complex numbers, nor a workflow that is blocked by using a non-scientific data format that would need that. Therefore, better to leave fallbacks unstandardized, because we should not standardize things we do not already have concrete, working examples for. KISS principle.
Actually it does not look like the h5py way is broadly supported. For example see:
So, to be portable, the standard needs to lay out some guidelines on how complex numbers are to be stored. How would you word this in the standard? How about something like:
1) If a file format supports complex, that is to be used.
2) For HDF5, be compatible with h5py and store complex as a struct with component labels "r" and "i".
3) For other formats, contact the openPMD maintainers. :-)
Another thing: I don't see how the fallbacks complicate parsing. Can you provide an example?
Another issue I thought of: in the case where, say, the imaginary component is zero, if you follow the h5py way then you double the storage size needed as compared to using two sub-datasets. Is this penalty really worth it?
I think generally we can add a note for file formats. Such as "HDF5 bool and complex type representations must be h5py compatible."
I think that's generally the easiest way: for other formats, contact the reference implementation in openPMD-api.
If we implement sub-groups then we have to implement this sub-group parsing in all readers and upgrade scripts, which I try to avoid unless we have a strong use case.
I think there are several solutions to that without a premature optimization that complicates the standard. First of all, if a user knows an openPMD record component is purely real (or imaginary), then they can take a real data type and name the component accordingly. Second, every light-weight compression that can be opaquely added on the file format layer (HDF5, ADIOS1/2) will strongly compress zeroed-out components. This falls under premature optimization, I think.
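A quick stdlib-only illustration of the compression point, using zlib as a stand-in for HDF5's byte-level gzip filter: an interleaved complex array whose imaginary parts are all zero compresses far below its raw size, because every second 8-byte member is zeros:

```python
import struct
import zlib

# 1000 double-precision "complex" values with zeroed-out imaginary parts
raw = b"".join(struct.pack("<dd", float(k), 0.0) for k in range(1000))
compressed = zlib.compress(raw, 6)
print(len(raw), len(compressed))
# the zeroed component (and the sparse bytes of the reals) nearly vanishes
assert len(compressed) < len(raw) / 2
```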
(Sorry, clicked wrong button.)
OK but let me write something first since I have a new proposal... |
Upon further reflection I propose this for the standard: Complex data types are to be stored in a group with datasets (or constant record components if the values are constant) labeled "re" for the real part and "im" for the imaginary part. Using "re" and "im" storage is mandated even if the storage format supports native complex numbers. Motivation: the advantages are: ...
As reasoned above, we do not have a file format in our production workflows that does not support complex. Due to the reasons above (maintenance, reference implementations, upgrades, etc.) I would not standardize this until someone has a concrete example workflow where this is needed. You are of course welcome to provide PRs to the reference implementations (viewer, API, validator scripts, upgrader, VisIt, ...) with concrete challenges that cannot be addressed otherwise, and we can move the priority up. But I do not see the need right now with the tools and file formats we are using. I think this is a file format task and we intentionally rely on file formats for such things, because it's the scalable and maintainable thing to do. We can have a VC in the new year if you'd like to talk more about this.
This is not true. Let me quote again from https://mathematica.stackexchange.com/questions/63983/hdf5-and-complex-numbers
In terms of maintenance, reference implementations, upgrades, etc., my proposal is actually superior to using a struct since there is less code to write. |
I agree with @DavidSagan's post above about labeling 're' and 'im' datasets.
Let's have a VC on this in the new year.
Adding our own conventions adds compatibility neither to armadillo nor to other tools. Adopting a widely used convention such as h5py's adds native support in many tools and is not more complicated for all other adopters; we just document the used convention for HDF5 in our standard.
??? The quote I give above indicates that this is not true. What evidence do you have to the contrary?
What non-h5py based native tools are you thinking of?
Actually it is more complicated if you are using the standard C/C++/Fortran HDF5 interface. Not much more, but with the convention I suggested it basically takes two lines to read in complex data. With the h5py convention I have to set up the struct, which is definitely more work. Also, h5py does not do any compression at all for complex numbers, even if all the numbers are zero.
I am suggesting to continue this discussion in a VC after the holidays, with concrete examples and a simple compatibility matrix.
I heard your points: you can already implement this in 1.* for all scalar records without any change, and I mentioned a minimally invasive, non-overhead alternative above. Creating an HDF5 compound type is a four-liner, and I need to see concrete workflows of how you read data to understand more of your concrete needs after the break.
For example: are you using armadillo, or is this just one example out of tons of linear algebra libraries that we pick here? Is armadillo more used than h5py? Our own convention, instead of adopting one, will add native support to none of those. Also, do you even need complex numbers for non-scalar records? All questions we have to consider.
The compression argument is technically wrong: HDF5 can compress on the byte level of any type, and so does h5py, just activate the flag on variable creation. I am happy to help with that when I am not in an airport and not on holidays.
I agree, and concrete examples are a good thing to clarify the discussion. No, I don't use armadillo; it was just part of the quote discussing compatibility. Right now complex fields are stored in, for example, Bx, By, Bz scalar records. Perhaps this should be changed to B_field/x,y,z? And if you could give me an example of using compression, that would help.
Meeting minutes from Sep 2, 10:30 AM PT
Action items:
Related question: show an HDF5 Fortran compression example for @DavidSagan. This affects write API logic only and will be automatically decompressed on reads, opaque to the user reading the data as usual. This question came up as an example for complex types if one already knows that the imaginary part is zero, e.g. for specific time steps of a simulation.
@ax3l @ChristopherMayes OK I removed the paragraph in the Beam Physics extension on how to store complex numbers. The extension is now all set to become official.
HDF5 code example to define a new, h5py-compatible complex type:

```cpp
hid_t m_H5T_CFLOAT = H5Tcreate(H5T_COMPOUND, sizeof(float) * 2);
if( m_H5T_CFLOAT < 0 )
    throw std::runtime_error("[HDF5] Internal error: Failed to create complex float");
H5Tinsert(m_H5T_CFLOAT, "r", 0, H5T_NATIVE_FLOAT);
H5Tinsert(m_H5T_CFLOAT, "i", sizeof(float), H5T_NATIVE_FLOAT);

// at this point, m_H5T_CFLOAT can be used like any other pre-defined HDF5 type
// for reading and writing; use std::complex< float/double/long double > as C++ types

// testing a type before a read can use H5Tequal:
// - H5Tget_class(read_dataset_type) == H5T_COMPOUND &&
//   H5Tget_nmembers(read_dataset_type) == 2 &&
//   2x H5Tget_member_name
// ... or simpler:
// - read_dataset_type is a hid_t, too, and needs to be read first via H5Dget_type
//   if( H5Tequal(read_dataset_type, m_H5T_CFLOAT) ) ...

// before closing the HDF5 file, the type has to be freed again to avoid a resource
// leak; using the type referenced by m_H5T_CFLOAT after H5Tclose is not allowed
herr_t status = H5Tclose(m_H5T_CFLOAT);
if( status < 0 )
    throw std::runtime_error("[HDF5] Internal error: Failed to close complex float type");
```

h5py example:

```python
import h5py as h5
import numpy as np

data = np.array([4.5 + 1.1j, 6.7 - 2.2j], dtype=np.complex64)
# complex64 is 2*float, complex128 is 2*double, clongdouble is 2*long double
# ... store to h5py as usual
# ... read will be returning a dtype=np.complex<N> numpy array automatically
```

Fortran example:

```fortran
USE hdf5

INTEGER         :: hdferr              ! HDF5 error flag
INTEGER(HID_T)  :: my_h5_complex_real  ! Compound datatype identifier for complex REAL
INTEGER(SIZE_T) :: offset              ! Member's offset
LOGICAL         :: same_types          ! flag to indicate if two datatypes are equal

! ... open the file as usual

CALL h5tcreate_f(H5T_COMPOUND_F, SIZEOF(H5T_NATIVE_REAL)*2, my_h5_complex_real, hdferr)
offset = 0
CALL h5tinsert_f(my_h5_complex_real, "r", offset, H5T_NATIVE_REAL, hdferr)
offset = SIZEOF(H5T_NATIVE_REAL)
CALL h5tinsert_f(my_h5_complex_real, "i", offset, H5T_NATIVE_REAL, hdferr)

! ... use my_h5_complex_real just like pre-defined HDF5 types for write and read,
! checking error codes as usual; use COMPLEX as the data type

! probing prior to a read: testing if a type is a complex real
! - HDF5 type check if this is a H5T_COMPOUND_F type
! - HDF5 compound check for labels "r" and "i"
! This can be simplified by calling h5tequal_f:
! - read_dataset_type_id is an INTEGER(HID_T), too, and needs to be read first via h5dget_type_f
!   CALL h5tequal_f(read_dataset_type_id, my_h5_complex_real, same_types, hdferr)

! before closing the HDF5 file, the type has to be freed again to avoid a resource
! leak; using the type referenced by my_h5_complex_real after h5tclose_f is not allowed
CALL h5tclose_f(my_h5_complex_real, hdferr)
! ... close file
```
@ax3l Could you expand the above example to include reading and writing? Thanks.
Offline discussed: yes. @DavidSagan Can you please provide a minimal read/write example that just writes/reads a simple scalar field, which I can then extend? (Sorry for my weak Fortran.)
@ax3l Examples posted at Slack...
We made it 🎉! This is the F2003 version of the h5_compound.c example source code.

```fortran
! * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
!   Copyright by the Board of Trustees of the University of Illinois.         *
!   All rights reserved.                                                      *
!                                                                             *
!   This file is part of HDF5. The full HDF5 copyright notice, including      *
!   terms governing use, modification, and redistribution, is contained in    *
!   the files COPYING and Copyright.html. COPYING can be found at the root    *
!   of the source code distribution tree; Copyright.html can be found at the  *
!   root level of an installed copy of the electronic HDF5 document set and   *
!   is linked from the top-level documents page. It can also be found at      *
!   http://hdf.ncsa.uiuc.edu/HDF5/doc/Copyright.html. If you do not have      *
!   access to either file, you may request a copy from hdfhelp@ncsa.uiuc.edu. *
! * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
!
! This example shows how to create an array of a compound datatype which
! contains an array of type complex, how to write it to an HDF5 file,
! and how to read it back into a compound datatype.
!
PROGRAM compound_complex_fortran2003

  USE hdf5
  USE ISO_C_BINDING
  IMPLICIT NONE

  INTEGER, PARAMETER :: r_k8 = KIND(0.0d0)
  INTEGER, PARAMETER :: NMAX = 3

  INTEGER(HID_T)   :: sample_type_id, dset_id, dspace_id, file_id
  INTEGER(HSIZE_T) :: dims(1) = (/NMAX/)
  INTEGER          :: error

  COMPLEX(KIND=r_k8), DIMENSION(1:NMAX), TARGET :: samples, read_samples
  TYPE(C_PTR) :: f_ptr
  INTEGER(8)  :: real_size, real_complex_size

  real_size = storage_size(1.0_r_k8) / 8
  real_complex_size = real_size * 2_8  ! a complex is (real, real)

  ! Initialize data
  samples(1:NMAX) = (3.14159_r_k8, 2.71828_r_k8)

  ! Initialize the Fortran interface.
  CALL h5open_f(error)

  ! Create a new file using default properties.
  CALL h5fcreate_f("test.h5", H5F_ACC_TRUNC_F, file_id, error)

  ! Create the memory data type: a compound of two reals labeled "r" and "i".
  CALL H5Tcreate_f(H5T_COMPOUND_F, real_complex_size, sample_type_id, error)
  CALL H5Tinsert_f(sample_type_id, "r", &
       0_8, h5kind_to_type(r_k8, H5_REAL_KIND), error)
  CALL H5Tinsert_f(sample_type_id, "i", &
       real_size, h5kind_to_type(r_k8, H5_REAL_KIND), error)

  ! Create the dataspace.
  CALL h5screate_simple_f(1, dims, dspace_id, error)

  ! Create the dataset.
  CALL H5Dcreate_f(file_id, "samples", sample_type_id, dspace_id, dset_id, error)

  ! Write data to the dataset.
  f_ptr = C_LOC(samples(1))
  CALL H5Dwrite_f(dset_id, sample_type_id, f_ptr, error)

  ! Close up.
  CALL h5dclose_f(dset_id, error)
  CALL h5sclose_f(dspace_id, error)
  CALL H5Tclose_f(sample_type_id, error)
  CALL h5fclose_f(file_id, error)

  ! Open the file and the dataset.
  CALL H5Fopen_f("test.h5", H5F_ACC_RDONLY_F, file_id, error)
  CALL H5Dopen_f(file_id, "samples", dset_id, error)

  ! Re-create the memory data type.
  CALL H5Tcreate_f(H5T_COMPOUND_F, real_complex_size, sample_type_id, error)
  CALL H5Tinsert_f(sample_type_id, "r", &
       0_8, h5kind_to_type(r_k8, H5_REAL_KIND), error)
  CALL H5Tinsert_f(sample_type_id, "i", &
       real_size, h5kind_to_type(r_k8, H5_REAL_KIND), error)

  f_ptr = C_LOC(read_samples(1))
  CALL H5Dread_f(dset_id, sample_type_id, f_ptr, error)

  ! Display the fields.
  WRITE(*,'(A,3(" (",F8.5,",",F8.5,")"))') "SAMPLES =", read_samples(1:NMAX)

  CALL H5Tclose_f(sample_type_id, error)
  CALL H5Dclose_f(dset_id, error)
  CALL H5Fclose_f(file_id, error)
END PROGRAM compound_complex_fortran2003
```

Output analysis:

```console
$ h5ls -r -v test.h5
Opened "test.h5" with sec2 driver.
/                        Group
    Location:  1:96
    Links:     1
/samples                 Dataset {3/3}
    Location:  1:800
    Links:     1
    Storage:   48 logical bytes, 48 allocated bytes, 100.00% utilization
    Type:      struct {
                   "r"              +0    native double
                   "i"              +8    native double
               } 16 bytes

$ h5dump test.h5
HDF5 "test.h5" {
GROUP "/" {
   DATASET "samples" {
      DATATYPE  H5T_COMPOUND {
         H5T_IEEE_F64LE "r";
         H5T_IEEE_F64LE "i";
      }
      DATASPACE  SIMPLE { ( 3 ) / ( 3 ) }
      DATA {
      (0): {
            3.14159,
            2.71828
         },
      (1): {
            3.14159,
            2.71828
         },
      (2): {
            3.14159,
            2.71828
         }
      }
   }
}
}
```

And with h5py:

```python
import h5py as h5

f = h5.File("test.h5", "r")
f['samples'].dtype
# dtype('complex128')
f['samples'][()]
# array([3.14159+2.71828j, 3.14159+2.71828j, 3.14159+2.71828j])
```
I have now read all your suggestions. I have battled this myself and really find a complex data type useful and required. Some comments on @DavidSagan's comment:
The above comments do not mention whether they are stored as [(re, im), (re, im)] in a compound type, or as separate variables. This is important for performance-critical reads/writes. The following comments suggest a compound type, which I find the best option since the data is consecutive. For all of C/C++/Fortran one may optionally cast to the corresponding real variable (of twice the size) to accomplish this. Then attributes/compound information should trivially handle the complex specification. I would be really positive if this found its way into the standard.
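The "cast to the corresponding real variable of twice the size" idea can be sketched in stdlib Python. This is an illustration of the reinterpretation, not a zero-copy cast as one would do in C/C++/Fortran:

```python
import array
import struct

# a "complex" buffer: interleaved (re, im) float64 pairs in native byte order
z = [(1.5, -2.5), (3.0, 4.0)]
buf = b"".join(struct.pack("dd", re, im) for re, im in z)

# reinterpret the same bytes as a real array of twice the length
reals = array.array("d")
reals.frombytes(buf)
print(reals.tolist())
# [1.5, -2.5, 3.0, 4.0]
```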
It might be useful to allow tuples of two numbers to be defined as an integral type for complex components of records.
Alternatively, one would need a consistent naming convention for the components.
numpy complex types - how do h5py (FAQ: HDF5 struct) and the official HDF5 standard handle these?