Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compressed waveforms bank workflow #4969

Merged

Conversation

GarethCabournDavies
Copy link
Contributor

In order to make the generation of the bank with compressed waveforms much easier, and making this more straightforward for future developers/users, I've put this into a workflow.

I've also made changes which mean that when running pycbc_compress_bank, we can exclude certain waveforms; e.g. SPAtmplt, and these will be ignored when making the compressed bank.

When fetching the compressed bank, the fact that the template hash key isn't present will then mean that the waveform is just generated as normal.

Standard information about the request

This is a new feature
This change affects the offline search
This change changes nothing for scientific output, just makes things easier
This change: has appropriate unit tests, follows style guidelines (See e.g. PEP8), has been proposed using the contribution guidelines

This change will require additional dependencies for the CI: sbank
sbank will also need updated from its curent form as it needs a tweak i made to the hdf5 bank combiner there (I will link PR when I have made it)

Contents

I'll split this into subsections:

Minor changes to compression

  • I've put in some changes to how the compression is done/used.
    • Dont make a compressed waveform group for certain waveforms according to a new command line option
    • Make the waveform as normal if the compressed waveform group isn't in the bank

Making a workflow to generated the compressed waveform bank(s)

So that we can parallelise nicely, I've put this into a workflow which:

  • Splits the bank
  • Adds compressed waveforms to each split bank
  • Recombines these banks into any number of banks
    • Note that the offline search only uses compressed waveforms for the inspiral jobs, i.e. where the bank is split anyway.
  • Plot the waveform compression in histograms / scatter points and add to a results page

The plots are similar to:

image

the number of / types of these plots can be chosen in the config file

Links to any issues or associated PRs

sbank PR:

Testing performed

The workflow was run, and the results page can be seen at https://ldas-jobs.ligo.caltech.edu/~gareth.cabourndavies/testoutput/compress_bank_workflow/testing/compress_bank_results_page_new/

I will add more tests (e.g. profiling) soon

  • The author of this pull request confirms they will adhere to the code of conduct

@@ -50,5 +50,8 @@ pip install -r companion.txt
echo -e ">> [`date`] installing mpi4py"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we using this script anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm - that might be why Im having trouble getting the CI to work! I'll have a look once i am back from holiday

@GarethCabournDavies
Copy link
Contributor Author

Note this needs gwastro/sbank#64

I'm not sure whether to temporarily point to my branch, or to wait for a release

@GarethCabournDavies GarethCabournDavies force-pushed the compress_bank_workflow branch 2 times, most recently from a725f05 to 867d6ff Compare December 16, 2024 16:37
Copy link
Contributor

@spxiwh spxiwh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One change requested.


try :
if not (self.has_compressed_waveforms and self.enable_compressed_waveforms):
raise ValueError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit concerned that we could have a situation where get_decompressed_waveform isn't working, but we don't know because we're always falling back on the slower method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a UserWarning so that if this happens, at least there is evidence, and the message of which includes the approximant as well, so that it is immediately obvious if something is wrong

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Note that as the message changes, it will not be supressed for different approximants when repeated, see https://docs.python.org/3/library/warnings.html#repeated-warning-suppression-criteria)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant that we should distinguish between self.has_compressed_waveforms being false, where we should silently fall back, and a call to self.get_decompressed_waveform( failing, which would indicate a genuine problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Not clear why self.get_decompressed_waveform failures would not be ValueErrors???)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that makes sense. I've moved to a combined if/else with try/except which doesn't rely on the get_decompressed_waveform function not raising a ValueError

@@ -1122,12 +1122,13 @@ def __init__(self, ifos, exe_name, segs, file_url=None,
exe_name: string
A short description of the executable description, tagging
only the program that ran this job.
segs : igwn_segments.segment or igwn_segments.segmentlist
segs : igwn_segments.segment or igwn_segments.segmentlist or None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is rejected (or at least musn't be bundled in with this patch). Following other workflows, we provide a "dummy" segment if there is not a logical one to give, but a segment should still be given.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been reverted - it means that the files in the workflow are all named "...-0-0.ext" or similar, but that is a minor thing

Copy link
Contributor

@spxiwh spxiwh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One change requested.

@GarethCabournDavies GarethCabournDavies changed the title Draft: Compressed waveforms bank workflow Compressed waveforms bank workflow Jan 17, 2025
Copy link
Contributor

@spxiwh spxiwh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a bit confused about the fallback logic around calling the compressed waveforms, but will approve at this stage anyway. I don't need to see this again before it's merged if you choose to make any further changes to that logic.

@GarethCabournDavies
Copy link
Contributor Author

CI failure looks to be a Segfault in the 3-IFO coinc jobs - I will double-check that the inspiral output is the same before and after this change before I merge

@GarethCabournDavies GarethCabournDavies merged commit 91db678 into gwastro:master Jan 22, 2025
30 of 31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants