frombuffer byte order #306
I am not sure I see the problem here. First, this is the
Second, the
Do you mean you tried
Would it help you, if we implemented
Finally, if you find out how this is done in |
@ubIQio I think the cleanest solution is the
This would still break strict |
Thanks for the quick response, and also for creating a landmark piece of open source software for the microcontroller world.

The Numpy docs on byteswapping give an overview of the problem. I am in the case "Data and dtype endianness don't match, change dtype to match data". So in Numpy I would just use dtype='>H' to give the correct endianness, which does not change the array in memory. This numpy code shows the problem and the solution:

```python
import numpy as np

buffer = b'\x40\x00\xda\xff\xd6\xc1\x2e\x0d\x40\x00\xde\xff\xca\xc1\x3c\x0d\x40\x00\xe6\xff\xc4\xc1\x40\x0d\x40\x00\xe0\xff\xb8\xc1\x32\xc1\xff'
print('buffer: ', buffer)

a = np.frombuffer(buffer, dtype=np.uint16, offset=1)
print('\nCurrent:\t', a)

b = np.frombuffer(buffer, dtype='>H', offset=1)
print('\nDesired:\t', b)
```

I can then perform my FFT before passing the result over the wire. The byteswap method is not preferred, as it changes the data in memory and is therefore slower and more memory intensive; that is to be avoided, because these frames arrive frequently, and, as you point out, it is not numpy compliant. I can imagine that a more sophisticated dtype is not an easy task to implement, but for reading streams from hardware, a key use case of the frombuffer method, it would be extremely useful. However, support for either method would be greatly appreciated. |
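For reference, a minimal sketch (with made-up byte values) contrasting the two numpy approaches discussed above: a byte-order-aware dtype only reinterprets the buffer, while byteswap produces a rewritten copy.

```python
import numpy as np

raw = b'\x01\x02\x03\x04'               # a big-endian uint16 stream

view = np.frombuffer(raw, dtype='>u2')  # no data movement: [258, 772]
copy = np.frombuffer(raw, dtype=np.uint16).byteswap()
# on a little-endian host this also yields [258, 772], but via a copy;
# byteswap(inplace=True) would fail here, since frombuffer returns a
# read-only view of the original bytes
print(view, copy)
```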
If you mean that we have to operate on the RAM, then the statement is correct, but it doesn't imply that we would have to use a lot of RAM: you can swap the two bytes in place. Also, while
I agree, and I also see that this is going to be a recurring issue. The only question is how we should pull this off. Here are the implementation difficulties:
Let me know your thoughts! |
Ignoring the problem of possibly moving an element past the end of the buffer, I think "offset=1" is wrong but looks plausible much of the time. (It looks plausible only because consecutive samples happen to share the same high byte, so pairing one sample's high byte with the next sample's low byte still reproduces a believable value.)

Plausible:

```python
>>> import struct
>>> buf = struct.pack("<4h", 1020, 1021, 1020, 1021)
>>> struct.unpack(">3h", buf[1:-1])
(1021, 1020, 1021)
```

Clearly incorrect:

```python
>>> buf = struct.pack("<4h", 1020, 1030, 1020, 1030)
>>> struct.unpack(">3h", buf[1:-1])
(774, 1276, 774)
```

If the underlying buffer is immutable according to python (e.g., item assignment would raise an exception), then I doubt ulab should go modifying it. Adding 4 more data types seems like it's not good for firmware size. I think either ulab needs to take a copy before modifying the byte order, or else the original process producing the buffer should be modified so that it can provide a mutable buffer as an output (or swap the bytes itself!) |
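For concreteness, the mutability distinction mentioned above, as a trivial plain-Python sketch:

```python
ro = bytes(8)        # immutable buffer: ulab shouldn't modify it
rw = bytearray(8)    # mutable buffer: in-place swapping is an option

rw[0] = 0xFF         # fine
try:
    ro[0] = 0xFF     # item assignment on an immutable buffer...
except TypeError:
    print('read-only')  # ...raises, the "immutable according to python" case
```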
Hi Zoltán,

Indeed; I'm far from being a compiler expert, but since at least the Tensilica Xtensa and Cortex-M0 ISAs have byte-swap instructions, and the ndarray may be in PSRAM, in the general case I would think it more efficient to perform the swap in registers than in main memory, i.e. on read; however, as you say

Presumably this would affect execution speed only in the degenerate case, when byte swapping is required, since a dtype lookup of some sort is always performed?

Agreed, in this use case the byteswap is likely the best way to go. You would know much better than me how frequently this is done. So in the end, I guess both options (as provided by numpy) will be required; and so a project grows... I'd be happy with an extra byteswap keyword argument just for the FFT, but that is certainly not numpy compliant... Maybe if a 'swap dtype' were allowed as a type, then, since the dtype is known to functions, functions could gradually be enabled to support it, starting with the FFT of course :) |
@jepler yes, the offset is a bit of a hack in order to align the bytes of interest from the hardware; it is not part of the byte swap issue. (I pad the capture buffer by 1 at the end to allow for this, so an 8192-byte buffer of 1024*8 bytes is passed to frombuffer as an 8193-byte buffer.) I.e.

```python
import numpy as np

# create fake sample buffer
n = 1024*8  # must be multiple of 8
buf = np.array(range(n+1), dtype=np.uint8)  # create buffer with pad byte
a = np.frombuffer(buf, dtype='>H', offset=1).reshape(n>>3, 4)[:, 0:3]
```

results in

```
buf: [  0   1   2 ... 254 255   0]
a: [[  258   772  1286]
    [ 2314  2828  3342]
    [ 4370  4884  5398]
    ...
    [59882 60396 60910]
    [61938 62452 62966]
    [63994 64508 65022]]
```
|
The problem is not necessarily where the swapping happens, but that you still have to treat both cases. As Jeff pointed out above, what you are, in effect, asking for is 3 extra data types, and that is expensive. To put this in perspective: the firmware size for many of the functions, and for all binary operators, scales with the square of the number of data types. Currently we have 5, which means 25 possible combinations. In certain cases you can take shortcuts, so, e.g., the associative operators can be condensed into 13 cases, but the others can't. Now, if you add 3 more data types, then you end up with 64 combinations. That is really too many. And on top of that, you could then ask why the complex
That is right, but you still have to implement and compile the code for the degenerate case.
The main reason for developing
RAM is another concern. I can assure you that, when implementing a new function, I evaluate many possible implementations in terms of speed, RAM, and firmware size. It would be really foolish to throw everything to the wind in a haphazard manner. (As a side note: for me, this is what makes the whole project so exciting. The various constraints, and how you can negotiate your way around them.)
No, it is not.
Partial implementations are a nightmare. Just think about the documentation effort, and the number of cases when it would trip people up. I think I wouldn't like to go down that path. |
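A back-of-envelope sketch of the scaling argument above, assuming the current five dtypes are uint8, int8, uint16, int16, and float, with the three byte-swapped multi-byte types as the hypothetical additions:

```python
# each binary operator must handle every (lhs, rhs) dtype pair
current = ['uint8', 'int8', 'uint16', 'int16', 'float']  # assumed current set
swapped = ['>uint16', '>int16', '>float']                # hypothetical additions

print(len(current) ** 2)                    # 25 combinations today
print((len(current) + len(swapped)) ** 2)   # 64 combinations after adding 3
```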
I think this is a
It is definitely a no-go.
What we could do is add the reversed byte order only in

@ubIQio I think this is the solution. Let's keep the issue open. I will set out to work out the implementation details. It shouldn't take too long. I hope we can wrap this up in a week, or less. I really appreciate that you brought this up.

@CallumJHays Cal, it seems that we should add keyword arguments to the |
Sounds good; it will be interesting to see what the time penalty will be for the byte-swapped buffer copy. It is a shame that many ADCs, and worst of all I2S, use big-endian data, since the data arrives fast (for a microcontroller), is in the wrong format, and has a real-time requirement. Of course, there are those who would say: if real time is a concern, why are you using Python anyway? Many thanks again for your quick response. |
It is next to impossible to predict. Results from https://github.com/thiagofe/ulab_samples scatter a great deal, even if you normalise by the clock rate. |
That would be awesome! |
@ubIQio Could you, please, check out #317, and see if it does what it is supposed to, and if the performance is acceptable. You can now do something like

```python
from ulab import numpy as np

a = np.array([1, 2, 3, 4, 5, 6], dtype=np.uint16)
b = a.byteswap(inplace=False)
print(a)
print(b)

c = a.byteswap(inplace=True)
print(a)
print(c)
```

I haven't yet had time to write up the docs. Do you think you could whip up a small test suite? If you do that, then request a pull against the byteswap branch! Thanks! |
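A minimal sketch of what such a test could look like, assuming the #317 API shown above; the expected values are just the 16-bit byte swaps of 1..6:

```python
from ulab import numpy as np

a = np.array([1, 2, 3, 4, 5, 6], dtype=np.uint16)

# 0x0001 -> 0x0100 = 256, 0x0002 -> 0x0200 = 512, and so on
b = a.byteswap(inplace=False)
print(b)  # expect [256, 512, 768, 1024, 1280, 1536]
print(a)  # expect the original [1, 2, 3, 4, 5, 6]: a copy was made

c = a.byteswap(inplace=True)
print(a)  # expect [256, 512, 768, 1024, 1280, 1536]: swapped in place
print(c)  # expect the same swapped values
```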
Truly awesome. FYI, on my platform (ESP32 with SPIRAM), a raw 8193-byte frombuffer takes 1.1 ms, and with the in-place byteswap it's only 1.9 ms. (I was surprised that the in-place version ran a lot faster than the copy, which takes about 4.5 ms.) You may be interested to know that the entire graph

```python
s = spy.signal.spectrogram(
        np.linalg.norm(
            np.frombuffer(buf, dtype=np.uint16, offset=1)
                .byteswap(inplace=True)
                .reshape((1024, 4))[:, 0:3],
            axis=1)
    ).tobytes()
```

takes 9 ms, which is absolutely amazing. Thanks v.much Zoltán, I'll go generate a test suite for that. |
Well, that is the overhead of allocating that much extra memory. However,
I am glad it worked out for you, and especially that I could prove you wrong 😉 You were concerned that we might consume too much time shifting the bytes to and fro. Since
Thanks! I'll delete the |
#325 completes the implementation. |
I like to think that it was my hunch that was wrong, not me 😉

The buffer is mutable in this case; it is created once and repeatedly filled in FIFO-sized chunks by I2C.readfrom_mem_into using a memoryview, until a complete frame is gathered. (It's actually an accelerometer, used as a vibration sensor.)

On the speed front, the 'implementation glitch' fix has made a dramatic improvement on my platform: the raw frombuffer is now ~120 µs, the in-place byteswap ~750 µs, and my graph is now down to ~6.8 ms. Again, a big thank you Zoltán. |
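A minimal sketch of that capture pattern, assuming MicroPython's machine.I2C; the device address, register, and chunk size are illustrative placeholders, not values from the thread:

```python
from machine import I2C

DEV_ADDR = 0x19   # hypothetical accelerometer I2C address
FIFO_REG = 0x28   # hypothetical FIFO data register
CHUNK = 32        # bytes read per FIFO drain (assumed)

buf = bytearray(1024 * 8 + 1)   # mutable frame buffer plus one pad byte
mv = memoryview(buf)            # lets slices be filled without copies

i2c = I2C(0)                    # bus and pins are board-specific

# fill everything except the trailing pad byte, one FIFO chunk at a time
for off in range(0, 1024 * 8, CHUNK):
    i2c.readfrom_mem_into(DEV_ADDR, FIFO_REG, mv[off:off + CHUNK])
```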
I have a buffer of data from an ADC which I am trying to access as an ndarray.

The 1024-byte buffer is organised in 'frames' of 8 bytes, the bytes of interest in each frame being 1 through 6: three pairs of 16-bit samples.

If I ignore the last byte of the buffer, I can get the buffer as an ndarray using

but the result is byteswapped values, e.g.

dtype = np.>uint16 causes a reboot; any ideas how this can be achieved?
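A sketch of the frame layout described above, using CPython's struct for illustration; the big-endian read matches the byteswapped values observed, and the x/y/z names and zeroed buffer are placeholders:

```python
import struct

FRAME = 8             # each 8-byte frame carries three 16-bit samples
buf = bytes(1024)     # stand-in for the ADC capture buffer

samples = []
for off in range(0, len(buf), FRAME):
    # bytes 1..6 of each frame: three big-endian unsigned 16-bit values
    x, y, z = struct.unpack_from('>3H', buf, off + 1)
    samples.append((x, y, z))
```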