Packet formats from the LWA352 OVRO bifrost branch #206
Merged
jaycedowell
merged 112 commits into
ledatelescope:ibverb-support
from
realtimeradio:lwa352
Apr 19, 2024
This allows the DP4A library to be used, which is way faster
I.e., call xGPU passing pointers to already transferred data on the device. This gives up xGPU's pipelining abilities, but makes it easier to use the xGPU kernel alongside other consumers also using the same GPU input buffer.
Add some checking for proper pointer spaces. More checking required
Inspired by the CUBLAS usage in https://github.com/devincody/DSAbeamformer
Operates in 3 steps:
1. Transpose data and promote to float
2. Compute beams
3. Compute beam dynamic spectra, and sum to (in LWA352's case) 1 ms

Assumes no polarization ordering of the input, but relies on the user to upload beamforming coefficients which create X-pol and Y-pol beams. This is an easy way to deal with the arbitrary input ordering at runtime, but isn't very efficient (half the beamforming coefficients are zero). The kernel assumes the beams are constructed like this and uses that fact to generate averaged dynamic spectra (XX, YY, XY_r, XY_i).

May well have synchronization bugs which make the benchmarks meaningless, but currently obtains ~50 Gbps throughput (~9 MHz bandwidth for 4-bit inputs) with:
NANTS = 352
NPOLS = 2
NCHANS = 192 (4.4 MHz for LWA352)
NBEAMS = 32 (16 x 2-pols)
NTIMES = 480
NTIMES_SUM = 24 (1 ms)
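The three steps in this commit message can be illustrated with a toy, single-channel model in plain Python. This is only a sketch of the arithmetic, not the CUDA/CUBLAS kernel from the PR: the function names, the nibble layout in `unpack_4bit` (two's complement, real part in the high nibble), and the convention that beams `2k` and `2k+1` are the X-pol and Y-pol beams of pointing `k` are all assumptions made for illustration. The transpose in step 1 is a memory-layout detail and is omitted here.

```python
def unpack_4bit(byte):
    """Step 1 (promotion only): one packed 4+4-bit sample -> Python complex.

    Assumed layout: two's-complement nibbles, real part in the high nibble.
    """
    re = (byte >> 4) & 0xF
    im = byte & 0xF
    re = re - 16 if re > 7 else re
    im = im - 16 if im > 7 else im
    return complex(re, im)


def beamform(voltages, weights):
    """Step 2: beams[t][b] = sum_i weights[b][i] * voltages[t][i].

    A weight of zero for every input of the "wrong" polarization is what
    makes a beam an X-pol or Y-pol beam, whatever the input ordering is.
    """
    return [[sum(w * v for w, v in zip(wb, vt)) for wb in weights]
            for vt in voltages]


def dynamic_spectra(beams, ntimes_sum):
    """Step 3: average XX, YY, Re(XY*), Im(XY*) over blocks of ntimes_sum.

    Assumes beams 2k and 2k+1 are the X and Y beams of pointing k.
    """
    npointings = len(beams[0]) // 2
    out = []
    for t0 in range(0, len(beams) - ntimes_sum + 1, ntimes_sum):
        row = []
        for p in range(npointings):
            xx = yy = 0.0
            xy = 0j
            for bt in beams[t0:t0 + ntimes_sum]:
                x, y = bt[2 * p], bt[2 * p + 1]
                xx += (x * x.conjugate()).real
                yy += (y * y.conjugate()).real
                xy += x * y.conjugate()
            n = float(ntimes_sum)
            row.append((xx / n, yy / n, xy.real / n, xy.imag / n))
        out.append(row)
    return out


# Two inputs (one X-pol, one Y-pol), two time samples, one pointing.
raw = [[0x1F, 0x20], [0x1F, 0x20]]
voltages = [[unpack_4bit(b) for b in row] for row in raw]
weights = [[1 + 0j, 0j],   # X beam: only the X-pol input contributes
           [0j, 1 + 0j]]   # Y beam: only the Y-pol input contributes
spectra = dynamic_spectra(beamform(voltages, weights), ntimes_sum=2)
```

With half of each weight row equal to zero, half of the multiplications in `beamform` are wasted, which is the inefficiency the commit message mentions.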
Move the IB verbs receiving class to a dedicated C file. When the hashpipe_ibverbs library is used directly within packet_capture.hpp, something in that file breaks the compatibility of the ibverbs structs with those interpreted by hashpipe (their sizes are different). Odd, but working around it for now.
Evidently, this is the trigger to make `like_bmon` work its magic
Increase RX packet depth of the IB verbs interface to 32k (this seems to be the maximum). Make the packet handler use AVX stream store instructions.
1. The receiver is currently hard-coded for 64 pols per packet. It would be trivial to parameterize this, but it may have some small performance implication.
2. The code loads 64-bit values into a 256-bit AVX register before writing to memory. If the IBV interface can be tweaked to enforce alignment (talk to DM about this), the first stage won't be necessary.
3. 64 pols per packet = 512 bits per memory write (1 freq channel of data). Newer machines supporting AVX512 could probably run faster than the current code by using _mm512_stream_si512 in place of _mm256_stream_si256.
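The sizes quoted in this commit message can be checked with a few lines of arithmetic. The sketch below assumes 8 bits per sample (4-bit real plus 4-bit imaginary), which is what 512 bits for 64 pols implies; the helper names are hypothetical and only count instructions, they do not touch any AVX intrinsics.

```python
def stores_per_channel(pols=64, bits_per_sample=8, register_bits=256):
    """Number of streaming stores needed to write one frequency channel."""
    payload_bits = pols * bits_per_sample
    if payload_bits % register_bits:
        raise ValueError("payload is not a whole number of registers")
    return payload_bits // register_bits


def loads_per_register(register_bits=256, load_bits=64):
    """Number of narrow loads needed to fill one wide register."""
    return register_bits // load_bits


avx2_stores = stores_per_channel(register_bits=256)    # 256-bit stream stores
avx512_stores = stores_per_channel(register_bits=512)  # 512-bit stream stores
fills = loads_per_register()                           # 64-bit loads per 256-bit register
```

Under these assumptions one channel takes two 256-bit stores, each filled from four 64-bit loads, versus a single 512-bit store on an AVX512 machine.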
The behaviour of the traceback library has changed in py3, so remove the now nonexistent call.
Tweak error handling to properly pass an exception to the cleanup print.
Fix missing decode().
Time metrics for processing / waiting for input/output data are helpful for figuring out the bottlenecks in the pipeline, but aren't particularly intuitive (IMO) measures of whether things are "fast enough".
This is a gross thing to hardcode, so FIXME. But, having new sequences periodically means that the header timestamps can be used as actual timestamps, rather than just counting bytes in some infinite data stream (which doesn't seem like a good idea when the input stream is from a network, and could conceivably behave strangely). Having timestamps derived from actual packet headers periodically seems sensible(?)
Lwa352 ibv
Allow an option to beamform and integrate in one hit by passing ntime_blocks>0 when initializing the library. Otherwise, don't transpose or integrate the data. This change allows multiple downstream processes to use raw beamformer data for their own, different purposes -- e.g., one generating integrated dynamic spectra, and one generating VLBI voltage beams.
Reaches 27Gbps on LWA352 pipeline
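To put the throughput figures in this PR on a common footing, the input data rate can be converted to processed bandwidth. The sketch below assumes critically sampled channels, 4-bit real plus 4-bit imaginary samples, no packet-header overhead, and that the quoted rates refer to this raw input stream; the function names are hypothetical.

```python
def bits_per_spectral_sample(nants=352, npols=2, bits_per_component=4):
    """Bits needed for one time sample of one channel across all inputs."""
    return nants * npols * 2 * bits_per_component  # factor 2: real + imaginary


def bandwidth_mhz(throughput_gbps, nants=352, npols=2):
    """Bandwidth (MHz) that a given input data rate (Gbps) corresponds to."""
    return throughput_gbps * 1e9 / bits_per_spectral_sample(nants, npols) / 1e6


def required_gbps(bw_mhz, nants=352, npols=2):
    """Input data rate (Gbps) needed to carry a given bandwidth (MHz)."""
    return bw_mhz * 1e6 * bits_per_spectral_sample(nants, npols) / 1e9


bw_at_50 = bandwidth_mhz(50)   # the beamformer benchmark figure
bw_at_27 = bandwidth_mhz(27)   # the pipeline figure
need_4p4 = required_gbps(4.4)  # the 192-channel LWA352 band
```

Under these assumptions, 50 Gbps corresponds to about 8.9 MHz (consistent with the "~9 MHz" quoted for the beamformer), 27 Gbps to about 4.8 MHz, and the 4.4 MHz band needs about 24.8 Gbps.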
Replace JH's libhashpipeibverbs IBV capture code with JD's dedicated bifrost source. Remove the philosophical quirk of having bifrost depend on hashpipe.
Sequence only changes if out-of-order packets indicate the upstream transmitters have reset
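One way to picture this rule is a tracker that starts a new sequence only when a packet's time tag falls far behind the latest one seen. This is a hypothetical sketch, not the logic in the PR: the function name, the integer time tags, and the `reset_threshold` criterion for telling a transmitter reset apart from ordinary network reordering are all assumptions.

```python
def sequence_starts(time_tags, reset_threshold):
    """Return the time tag at which each sequence begins.

    A packet that is behind the latest tag by no more than reset_threshold
    is treated as ordinary reordering; a larger backwards jump is treated
    as an upstream reset and begins a new sequence.
    """
    starts = []
    latest = None
    for tag in time_tags:
        if latest is None or latest - tag > reset_threshold:
            starts.append(tag)   # first packet, or transmitters have reset
            latest = tag
        elif tag > latest:
            latest = tag         # normal forward progress
    return starts


# Tag 102 arrives late (reordering); the jump from 104 to 5 looks like a reset.
starts = sequence_starts([100, 101, 103, 102, 104, 5, 6, 7], reset_threshold=16)
```

Here `starts` is `[100, 5]`: the late packet does not split the stream, while the large backwards jump does.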
Codecov Report
Attention: Patch coverage is

Additional details and impacted files
@@            Coverage Diff             @@
##   ibverb-support    #206    +/-   ##
==================================================
- Coverage     67.79%   67.77%   -0.02%
==================================================
  Files            66       66
  Lines          5744     5747      +3
==================================================
+ Hits           3894     3895      +1
- Misses         1850     1852      +2
…nerated vs received.
Fix off-by-one error in the power beam header
jaycedowell merged commit 9d18f89 into ledatelescope:ibverb-support on Apr 19, 2024
12 of 14 checks passed
LWA352 OVRO bifrost branch, including SNAP2 F-Engine packet format