Packet formats from the LWA352 OVRO bifrost branch #206

Merged: 112 commits, Apr 19, 2024

@jack-h jack-h commented Jun 1, 2023

LWA352 OVRO bifrost branch, including SNAP2 F-Engine packet format

jack-h and others added 30 commits June 11, 2020 03:32
This allows the DP4A library to be used, which is way faster
I.e., call xGPU passing pointers to already transferred
data on the device. This gives up xGPU's pipelining abilities,
but makes it easier to use the xGPU kernel alongside other consumers
also using the same GPU input buffer.
Add some checking for proper pointer spaces. More checking required
Inspired by the CUBLAS usage in https://github.com/devincody/DSAbeamformer

Operates in 3 steps --
1. Transpose data and promote to float
2. Compute beams
3. Compute beam dynamic spectra, and sum to (in LWA352's case) 1ms

Assumes no polarization ordering of input, but relies on user to
upload beamforming coefficients which create X-pol and Y-pol beams.
This is an easy way to deal with the arbitrary input ordering at runtime,
but isn't very efficient (half the beamforming coeffs are zero).
The kernel assumes the beams are constructed like this and uses the
fact to generate averaged dynamic spectra (XX, YY, XY_r, XY_i).
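
The three steps listed above can be sketched in plain Python for a single frequency channel. This is only an illustration, not the CUDA/cuBLAS kernel from this branch: the function name `beamform_and_integrate`, the data layout, and the convention that beams `2k` and `2k+1` are the X-pol and Y-pol beams of pointing `k` are all assumptions made for the example.

```python
def beamform_and_integrate(voltages, weights, ntimes_sum):
    """Toy single-channel beamformer (hypothetical helper, not the bifrost API).

    voltages:   list over time of lists over inputs of (re, im) integer pairs.
    weights:    list over beams of lists over inputs of complex coefficients.
                Assumed layout: beams 2k and 2k+1 are the X-pol and Y-pol
                beams of pointing k.
    ntimes_sum: number of time samples to integrate per output spectrum.
    Returns a list over integrations of lists over pointings of
    (XX, YY, XY_r, XY_i) tuples.
    """
    ntimes = len(voltages)
    ninputs = len(voltages[0])
    nbeams = len(weights)
    assert ntimes % ntimes_sum == 0 and nbeams % 2 == 0

    # Step 1: transpose to input-major order and promote to float complex.
    promoted = [[complex(float(voltages[t][i][0]), float(voltages[t][i][1]))
                 for t in range(ntimes)] for i in range(ninputs)]

    # Step 2: beams[b][t] = sum over inputs of weights[b][i] * promoted[i][t].
    beams = [[sum(weights[b][i] * promoted[i][t] for i in range(ninputs))
              for t in range(ntimes)] for b in range(nbeams)]

    # Step 3: form XX, YY and the complex XY product, summed over ntimes_sum.
    out = []
    for t0 in range(0, ntimes, ntimes_sum):
        block = []
        for k in range(nbeams // 2):
            xx = yy = 0.0
            xy = 0j
            for t in range(t0, t0 + ntimes_sum):
                x, y = beams[2 * k][t], beams[2 * k + 1][t]
                xx += x.real ** 2 + x.imag ** 2
                yy += y.real ** 2 + y.imag ** 2
                xy += x * y.conjugate()
            block.append((xx, yy, xy.real, xy.imag))
        out.append(block)
    return out


# Two inputs (input 0 carries X, input 1 carries Y), one pointing.
# Half of the coefficients are zero, as the commit message notes.
weights = [[1 + 0j, 0j],   # X-pol beam
           [0j, 1 + 0j]]   # Y-pol beam
voltages = [[(1, 0), (0, 1)],
            [(1, 0), (0, 1)]]
spectra = beamform_and_integrate(voltages, weights, ntimes_sum=2)
```

Because the input ordering is encoded entirely in `weights`, the same routine works for any cabling, at the cost of multiplying by the zero coefficients.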

May well have synchronization bugs which make the benchmarks meaningless,
but currently obtains ~50 Gbps throughput (~9 MHz bandwidth for 4-bit inputs) with:

NANTS = 352
NPOLS = 2
NCHANS = 192 (4.4 MHz for LWA352)
NBEAMS = 32 (16 x 2-pols)
NTIMES = 480
NTIMES_SUM = 24 (1ms)
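
As a rough cross-check of the figures above, the bandwidth equivalent of the quoted throughput can be computed directly. The sketch assumes that "4-bit inputs" means 4-bit real plus 4-bit imaginary (8 bits per complex sample) and that channels are critically sampled (one complex sample per second per Hz of bandwidth); neither assumption is stated in the commit message.

```python
NANTS = 352
NPOLS = 2
NTIMES = 480
NTIMES_SUM = 24
BITS_PER_SAMPLE = 4 + 4  # assumption: 4-bit real + 4-bit imaginary


def input_bandwidth_hz(throughput_bps):
    """Bandwidth whose input data rate equals throughput_bps."""
    bits_per_hz = NANTS * NPOLS * BITS_PER_SAMPLE  # 5632 bits per second per Hz
    return throughput_bps / bits_per_hz


bw = input_bandwidth_hz(50e9)
print(f"{bw / 1e6:.2f} MHz")  # consistent with the quoted ~9 MHz

# Number of integrated spectra produced per NTIMES-sample block.
integrations_per_block = NTIMES // NTIMES_SUM
```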
Move IB verbs receiving class to a dedicated C file. When
using the hashpipe_ibverbs library within packet_capture.hpp
directly, something in that file breaks the compatibility
of the ibverbs structs (their sizes are different) with those
interpreted by hashpipe. Odd, but working around for now.
Evidently, this is the trigger to make `like_bmon` work
its magic
Increase RX packet depth of IB verbs interface to 32k (this
seems to be the maximum).

Make packet handler use AVX stream store instructions.

1. The receiver is currently hard coded for 64 pols per packet.
   It would be trivial to parameterize this, but it may have some
   small performance implication.
2. Code loads 64-bit values into a 256-bit AVX register before writing
   to memory. If the IBV interface can be tweaked to enforce alignment
   (talk to DM about this) the first stage won't be necessary.
3. 64 pols per packet = 512 bits per memory write (1 freq channel of data).
   Newer machines supporting AVX512 could probably run faster than the current
   code by using _mm512_stream_si512 in place of _mm256_stream_si256
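
The sizes in the notes above fit together as follows. This is only bookkeeping in Python, not the receiver code; the 8 bits per pol is inferred from note 3 (512 bits / 64 pols), and the constant and function names are made up for the example.

```python
POLS_PER_PACKET = 64
BITS_PER_CHANNEL = 512  # one frequency channel of data, from note 3

# Inferred sample size: 512 bits spread over 64 pols.
bits_per_pol = BITS_PER_CHANNEL // POLS_PER_PACKET


def ops_per_channel(register_bits):
    """Loads or stores of a given width needed to move one channel."""
    return BITS_PER_CHANNEL // register_bits


loads_64 = ops_per_channel(64)     # 64-bit loads used to fill AVX registers
stores_256 = ops_per_channel(256)  # _mm256_stream_si256 stores (current code)
stores_512 = ops_per_channel(512)  # _mm512_stream_si512 stores (AVX512)
```

Under these numbers an AVX512 machine would need one streaming store per channel instead of two, which is the potential gain note 3 points at.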
The behaviour of the traceback library has changed in py3, so remove
the now nonexistent call.

Tweak error handling to properly pass an exception to the cleanup print

Fix missing decode()
Time metrics for processing / waiting for input/output
data are helpful for figuring out the bottlenecks in the
pipeline, but aren't particularly intuitive (IMO)
measures of whether things are "fast enough"
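
One more direct measure of "fast enough" is throughput compared against the rate the pipeline must sustain. A minimal sketch, assuming the block size in bytes and the elapsed wall-clock time are available; the function names are hypothetical and are not part of bifrost's logging API.

```python
def throughput_gbps(nbytes, elapsed_s):
    """Convert bytes processed in elapsed_s seconds to gigabits per second."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return nbytes * 8 / elapsed_s / 1e9


def is_fast_enough(nbytes, elapsed_s, required_gbps):
    """True if the measured rate meets or exceeds the required rate."""
    return throughput_gbps(nbytes, elapsed_s) >= required_gbps


# Example: 1.25 GB processed in one second.
rate = throughput_gbps(1_250_000_000, 1.0)
```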
This is a gross thing to hardcode, so FIXME.
But, having new sequences periodically means that the header
timestamps can be used as actual timestamps, rather
than just counting bytes in some infinite data stream (which
doesn't seem like a good idea when the input stream is from a
network, and could conceivably behave strangely).

Having timestamps derived from actual packet headers periodically
seems sensible(?)
Allow an option to beamform and integrate in one hit by
passing ntime_blocks>0 when initializing the library. Otherwise
don't transpose or integrate the data.

This change allows multiple downstream processes to use raw beamformer
data for their own, different purposes -- e.g., one generating integrated
dynamic spectra, and one generating VLBI voltage beams
Reaches 27 Gbps on LWA352 pipeline
Replace JH's libhashpipeibverbs IBV capture code with JD's dedicated bifrost
source. Remove the philosophical quirk of having bifrost depend on hashpipe.
Sequence only changes if out-of-order packets indicate
the upstream transmitters have reset
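
The reset rule described above can be sketched as a small state tracker. Everything here is hypothetical (the class name `SequenceTracker`, the `max_backstep` tolerance and its default); the capture code in this branch may decide differently. The idea is that ordinary reordering is tolerated, and a new sequence starts only when a packet's counter falls far behind the newest one seen, which suggests the transmitter restarted its counter.

```python
class SequenceTracker:
    """Decide when an incoming packet should start a new sequence."""

    def __init__(self, max_backstep=1024):
        # Assumed tolerance: how far behind the newest counter a packet may
        # arrive and still be treated as ordinary reordering.
        self.max_backstep = max_backstep
        self.latest = None
        self.nsequences = 0

    def update(self, seq):
        """Return True if this packet starts a new sequence."""
        if self.latest is None:
            # First packet ever seen always opens a sequence.
            self.latest = seq
            self.nsequences = 1
            return True
        if seq < self.latest - self.max_backstep:
            # Counter jumped far backwards: treat as an upstream reset.
            self.latest = seq
            self.nsequences += 1
            return True
        # In order, or mildly out of order: stay in the current sequence.
        self.latest = max(self.latest, seq)
        return False


tracker = SequenceTracker(max_backstep=10)
flags = [tracker.update(s) for s in [100, 101, 99, 102, 5, 6]]
```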
@jaycedowell changed the title from "Lwa352" to "Packet formats from the LWA352 OVRO bifrost branch" on Jun 1, 2023

codecov-commenter commented Jun 1, 2023

Codecov Report

Attention: Patch coverage is 33.33333%, with 2 lines in your changes missing coverage. Please review.

Project coverage is 67.77%. Comparing base (96bc19a) to head (e32a32b).
Report is 1 commits behind head on ibverb-support.

| Files | Patch % | Lines |
|---|---|---|
| python/bifrost/packet_capture.py | 33.33% | 2 Missing ⚠️ |
Additional details and impacted files
@@                Coverage Diff                 @@
##           ibverb-support     #206      +/-   ##
==================================================
- Coverage           67.79%   67.77%   -0.02%     
==================================================
  Files                  66       66              
  Lines                5744     5747       +3     
==================================================
+ Hits                 3894     3895       +1     
- Misses               1850     1852       +2     


@jaycedowell jaycedowell merged commit 9d18f89 into ledatelescope:ibverb-support Apr 19, 2024
12 of 14 checks passed
3 participants