Ideas for an audio transport sequence #1
To answer your question from the Mintty thread, the format I'm talking about has already been registered - that's why I was suggesting it. I just posted some links to the docs that I found. |
Thx for the update - yeah we posted somewhat interleaved. Will have a look at that. |
@j4james Small update on things - the VT525 is still powering up 😸. So far no capacitor has exploded (fingers crossed, I don't want to get into soldering). But I need to get a USB-to-serial adapter/cable first to really work with it. |
Another interim update: Bought an FTDI USB-serial adapter that's supposed to work with a VT420, according to several resources on the internet. It should work with another MMJ adapter at My best hope currently is to get the MMJ way working, prolly soldering the adapter myself. (The equipment is again buried at home somewhere, I will not get there again before mid November.) On the sequence side of things, I can at least test the local echo mode of the VT525, with very weird results. DECPS plays once a short beep at different frequencies following |
Thanks for the update. I'm afraid hardware was never my strong point, and it's been years since I've messed with cabling, so I won't be of much help to you. But @hackerb9 has made a bunch of notes on the cabling he is using for his VT340 which might be useful (see here). And if you get stuck, he may be able to advise you.
Yeah, I wouldn't want to risk that if I were you.
That's a bit disappointing. Testing with local echo was going to be my backup plan if I managed to get hold of a VT525 and couldn't get the serial connection working. It would be really annoying if that's as good as it gets, but hopefully you'll be able to get a proper connection set up eventually. Take your time though - I don't want you to blow anything up! 😉
Yeah, I'm not particularly concerned about that either. |
Thx, this looks like a helpful resource, esp. regarding the flow control options. I checked the setup screens again and the vt525 should support all flow control options - XON/XOFF (software) and DTR/DSR (vt400+ line) and even RTS/CTS ("modern" serial devices). Since it already supports RTS/CTS, I hope to get away with null modem crossing at the DB-25 port, so soldering the MMJ adapter is now my second best option. Update on sequence handling (local echo):
After resetting to factory defaults it now sometimes allows me to input multiple DECPS, but I have not yet figured out under which circumstances. For some reason my best success was with 32 as the duration, while all other values there worked only once (again I hear no difference in duration on the beep). But even 32 did not work reliably; it would also block beeps randomly. Imho the local echo mode is not helpful for getting DECPS explained. |
@hackerb9 I can test your snippet once I've got the null modem cable in place. The vt525 supports one more setting from the options - some sort of modem emulation. But I have not yet figured out how that's supposed to work. My initial guess was that it would make the device operate in modem mode, so I could skip the null modem crossing. But I was not able to get it working with that, prolly due to handshake issues (I have no oscilloscope to check 😞, also that modem mode is not described in detail anywhere). |
If you've got GNU/Linux, just run However, I'm guessing "modem emulation" has nothing to do with a null modem. The VT340 text programming manual has a detailed section on "Modem Control Modes" in the Communication chapter. (Pages 275–280). There are three options: disabled, VT220, and V.25-bis. Any of them will work with a properly wired cable and the right settings on the host side, but the easiest thing to do would be to set it to "disabled" so the terminal doesn't require hardware handshaking before allowing communication. If it is what I think it is, I do not see any advantage to enabling modem control unless you are actually using the terminal with a modem. |
In case you haven't seen it, chapter 9 of the VT520/VT525 programmers manual (EK-VT520-RM) also has a bunch of technical information on the communications ports. |
Oops, thx for the pointers to the manuals, yeah I did not see those chapters (I was under the impression they would deal with connectivity in those early setup chapters 😊). Edit: @hackerb9 Yes, you're right, the modem thingy seems to be that modem control to drive it behind a "modem layer", not a null modem itself. |
Short update on things - after 3 weeks of waiting I got the nullmodem adapter, still waiting on the MMJ adapter @hackerb9 pointed me to above (prolly stuck in a container somewhere around the world). With the nullmodem adapter the connection works under these conditions:
Still have to mess with all connection details, and prolly will do some writeup of it. Some first impressions on DECPS:
For DECPS I will try to record sound + screen update clips from test cases, so everyone can check for timings, frequencies or wave forms. @j4james I was able to fix your "happy birthday" snippet by expanding the multi-note sequences into single-note ones. It seems that's the only supported sequence form. Edit: |
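The expansion described above (one multi-note DECPS sequence rewritten as several single-note ones) can be sketched as follows. This is only an illustration, assuming the documented form `CSI Pvolume ; Pduration ; Pnote ; ... , ~`; the helper name `split_decps` is hypothetical and is not the snippet that was actually tested on the VT525.

```python
CSI = "\x1b["  # 7-bit Control Sequence Introducer


def split_decps(seq):
    """Expand one multi-note DECPS sequence into single-note sequences.

    Assumes the form CSI Pvolume ; Pduration ; Pnote ; ... , ~
    (hypothetical helper, for illustration only).
    """
    if not (seq.startswith(CSI) and seq.endswith(",~")):
        raise ValueError("not a DECPS sequence")
    params = seq[len(CSI):-2].split(";")
    if len(params) < 3:
        raise ValueError("DECPS needs volume, duration and at least one note")
    volume, duration, *notes = params
    # Repeat volume and duration for every note.
    return [f"{CSI}{volume};{duration};{note},~" for note in notes]


if __name__ == "__main__":
    for single in split_decps("\x1b[3;8;1;3;5,~"):
        print(repr(single))
```

Each resulting sequence repeats the original volume and duration, so the only thing that changes is how many notes one sequence carries.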
I suppose this is understandable, considering they only mentioned the multiple note support in the setup, and not in any of the other places that the
But this doesn't make any sense then. Why would they talk about the sound buffer storing 16 notes, if it blocks after 1? I could understand blocking after 1 And when you say it blocks at every single sequence, does that mean you won't see any output following the
Do you see Another thought that occurred to me - is it possible it's sending an XOFF which causes the client side to pause its output, but if you ignored the flow control you could potentially send more output while the note was still playing? I'm probably grasping at straws, but I have to admit I find this behavior a bit disappointing. Not being able to play anything in the background at all kind of limits the usefulness of this sequence. |
Yes, this doesn't seem to be in line with the docs, or we read something completely different into that. It's also possible that different flow control modes have different side effects here (note I only got XON/XOFF working atm). Your example does this:
In general I would not be surprised if software flow control behaves much different than hardware flow control.
I hear you. It would still be possible to do that, but "background play" would be very limited - you basically need to split screen updates into very small chunks perfectly timed at the note changes. That's really nasty, and also does not allow bigger updates, as big transmissions would create perceivable sound delays. Still hoping that changing the flow control settings will reveal the audio buffer behavior as described in the docs. Will do more tests once I've got the connection / env details sorted out. |
@j4james Took a closer look at your ECMA35 based suggestion for audio data transport and the corresponding audio ISO spec. Several notes from my side (without final judgement yet on whether that's good or bad): Overall the situation regarding possibly failing devices/software with that ECMA35 encoding might not be that bad. So I would not exclude the idea right from the beginning. Whether we can gain bandwidth with it (compared to a fully inlined base64 sequence) mainly depends on the translation mode we need to go with (note that the more complex translations replace certain bytes with 2 7bit bytes, which leads to 1.5x bandwidth for full 7bit translation, where base64 is only at 1.33x). Correction: Looking through the modes in more detail, imho these 3 are worth a closer look:
Currently I'd favor mode 5, as I think that this is the most straightforward one to implement on the TE parser side, while not penalizing bandwidth much. The additional translation needs are fully within the data, thus TEs not implementing this are not affected at all (they only need DOCS start/end recognition). If it turns out we have to go with 7 bit only for some reason, I would ditch the ECMA35 approach in favor of a DCS/OSC sequence with base64 payload. |
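The 1.5x versus 1.33x comparison above can be reproduced with a little arithmetic. The sketch below rests on a simplifying assumption that is not taken from the spec: a translation mode is modeled as "some fraction of the byte values is replaced by two 7-bit bytes", and the payload is treated as uniformly random. Both function names are hypothetical.

```python
from fractions import Fraction


def base64_expansion():
    # base64 emits 4 output characters for every 3 input bytes.
    return Fraction(4, 3)


def escaping_expansion(escaped_fraction):
    # Assumed model: an escaped byte costs 2 output bytes, any other
    # byte costs 1, so the average cost per input byte is 1 + fraction.
    return 1 + Fraction(escaped_fraction)


if __name__ == "__main__":
    print(float(base64_expansion()))
    print(float(escaping_expansion(Fraction(1, 2))))
```

Under this model the 1.5x figure corresponds to half of all byte values needing the two-byte form, and an escaping scheme only beats base64 when fewer than a third of the payload bytes need escaping.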
I haven't looked at this stuff very closely, but my impression was that the data was self terminating, i.e. the packet headers have length indicators, and there is some concept of a "last block" which I figured would let you know once you'd reached the end of the stream.
Hard to say until I've actually tried it, but I wouldn't have thought the C0 handling was the problem. Once the mode is activated I'd just redirect all the data to a separate parser until terminated (I imagine something like Tektronix mode would work the same way). The tricky part for us is the UTF-8 parsing which is handled at a higher level. The way we currently deal with that (when switching to an 8-bit ISO-2022 encoding) is by resetting the code page to ISO-1252, essentially passing the data through as-is. However, that does require an app to flush the output immediately after switching modes to ensure that you don't have data with different encodings arriving in the same packet. If that became a problem, it's probably fixable, but hasn't been a priority for us so far. I believe XTerm may have a similar limitation.
This wouldn't be my first choice, but it's up to you. If you do go this route, though, I'd recommend sixel over base64. It's maybe not that big a deal, but I think base64 introduces an unnecessary level of complexity when you aren't actually constrained by the limitations of 7-bit email. |
Back on the previous topic of
But I don't see how that would work. Based on the test case you ran for me, my understanding is that nothing else is going to happen while a note is playing. So it doesn't matter how small your screen updates are, everything is going to come to a standstill as soon as you play a note. If you could play at least one note in the background, while still continuing to update the screen, that would be fine, but that doesn't seem to be the case. Or have I misunderstood what's happening?
Thinking about this some more, I'm inclined to leave in support for multiple notes even if the VT525 didn't actually support that. It is at least part of the official documentation, it's not likely to break backwards compatibility with apps that are limiting themselves to one note at a time, and there are already a number of modern terminals supporting multiple notes. |
Yes, it is defined that way, but self-termination does not work with TEs that don't implement the details, does it? They would not "understand" the termination from the data within.
Yeah the "transport layer-like" utf-8 handling will certainly cause issues. Isnt xterm fully utf-8 only internally since several years (would always need that
Ah well, the alphabet to use is quite the least concern for me there. I think sixel's alphabet would be easier to translate due to its continuous character space? About DECPS:
Nope, that's exactly what happens. What I tried to illustrate was - you can still get notes coupled to screen updates through, if you split that cleverly. But yes, both sound and screen will have pauses from the other taking its time. No real background playing possible. At least with XON/XOFF flow control (others yet to be tested).
Yes, I was actually thinking the same. In the end it depends on how closely you want to emulate a certain device. |
No, but regardless of the format, I wouldn't want to send a large chunk of audio data to a TE without first confirming that it could actually handle that. Even with a With That said, I don't mean to pressure you to go this route if you don't like the idea. It was just something I thought worth investigating before inventing your own thing.
No. You can get 8-bit ISO-2022 support in XTerm either using the
Yeah, that's what I was getting at. It doesn't need a dictionary lookup, and you also don't need to worry about all that
Certainly not that strict by default, otherwise I'd also be disabling all the modern |
Yeah, the TE choking issue will haunt us in every case without prior testing for support. I hate this situation, I'd love to use more DCS for newer stuff (as I see DEC's DCS variant as more capable than OSC), but support is so lousy across the board, geez. Software, unlike real devices, is easy to fix, and still the field does not move much. Idc much if the kernel consoles don't make any ground beyond vt100 emulation, but isn't basic sequence type support somewhat mandatory after being specified for >30 years?
Well, to put it bluntly - in my world the ECMA35 encoding juggling is almost dead (mainly due to utf8 forcing everything into a higher level "transport encoding"), with unicode being capable of replacing those encodings from its bigger codepoint space. By pulling a binary payload out of the hat we lose that unicode "symmetry". Furthermore, sequence payload types (like OSC/DCS/APC) offer enough functionality with proper encapsulation (given the parser implements at least recognition to skip them). So that's where my sentiments against reviving DOCS specs come from. (Not even sure if reviving is the right term here - do you know any applications that used this audio spec?) Mind you - when I saw Annex F I had to laugh, it even mentions JPEG. I did only a quick scan over it - it does not mention sixels anywhere, does it?
Ah ok - thought, it gets internally mapped to unicode chars. (time to check source, just have read this somewhere in the past)
In SIMD code that's prolly only 3 instructions (63-126 range check + subtract 63 + a shuffle extraction), while standard base64 is quite expensive even in SIMD. |
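The decode steps named above (range check for 63-126, subtract 63, then repack the 6-bit values into bytes) can be illustrated with a scalar sketch. This is not the SIMD code under discussion; the function names and the bit packing order (most significant 6-bit group first, no padding characters) are assumptions made for the example.

```python
def sixel64_encode(data):
    """Encode bytes with the sixel alphabet: 6-bit value v -> chr(63 + v).

    Assumption: groups are packed most significant first, without padding.
    """
    out = bytearray()
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        n = int.from_bytes(chunk.ljust(3, b"\0"), "big")
        chars = [((n >> shift) & 0x3F) + 63 for shift in (18, 12, 6, 0)]
        # k input bytes need k + 1 output characters.
        out.extend(chars[:len(chunk) + 1])
    return bytes(out)


def sixel64_decode(text):
    """Inverse of sixel64_encode: range check, subtract 63, repack."""
    out = bytearray()
    for i in range(0, len(text), 4):
        group = text[i:i + 4]
        if any(c < 63 or c > 126 for c in group):
            raise ValueError("character outside the sixel range 63-126")
        if len(group) == 1:
            raise ValueError("dangling character at end of input")
        n = 0
        for c in group:
            n = (n << 6) | (c - 63)
        n <<= 6 * (4 - len(group))  # left-align a short final group
        out.extend(n.to_bytes(3, "big")[:len(group) - 1])
    return bytes(out)


if __name__ == "__main__":
    encoded = sixel64_encode(b"hello")
    print(encoded, sixel64_decode(encoded))
```

Because the alphabet is one contiguous range, the character-to-value step is a single subtraction, with no lookup table and no special cases for `+`, `/` or `=` as in standard base64.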
Well that ITU audio spec is at least 25 years old, but I doubt you'd argue that's a reason for it to be mandatory. TE's have a right to choose what protocols they want to implement, and if they've chosen to emulate a VT100, then it's hardly surprising that they don't support DCS when the VT100 never did either. But the problem isn't just lack of DCS support. You've also got to worry about terminals that do support DCS, but put arbitrary limits on it. For example, Kitty will stop processing the sequence after a certain length, and it doesn't just ignore anything over that limit - it dumps the rest of the content out to the screen. At least that was the case for the last version I tested. Bottom line: don't expect a newly invented protocol to just work without first querying the terminal to see if it's actually supported.
No. This would have been long after the days of sixel images. |
Ofc not. The difference is that ECMA-48 is a fundamental standard that has existed since the mid 70s, where the sequence types are specified, while the audio spec is totally optional and much younger.
That's the sad part - ofc they are free to do as they please, but that's one of the reasons why moving the terminal interface forward is an almost futile endeavor. With that thinking we are basically stalled in the early 80s forever. Also it is not the full truth, since things like unicode or basic color support also made their hacky way into the slowpokes.
Then it operates outside of ECMA-48, as far as I can tell. There is no length limit specified anywhere, so not covered by the specs. |
I think you're misinterpreting the purpose of ECMA-48. It was never intended to dictate what controls a device should support - it was about standardizing the escape sequences to use for functionality that you may or may not choose to implement. And I don't think there's anything in the standard that requires a device to hide the content of a DCS sequence. And it's not like a terminal emulator can "emulate" ECMA-48 anyway - that's just not a thing. They're typically going to be emulating one or more hardware terminals, which might possibly have conformed to ECMA-48. But matching the device's actual behavior is the main requirement.
I disagree. There's nothing stopping you using sequences that some terminals can't handle, or even inventing new sequences if you think that's necessary. It just requires that the sender and receiver first reach an agreement on which sequences are actually supported. DEC terminals achieved this with things like DA reports, conformance levels, and mode queries. I don't see why modern terminals couldn't do the same thing.
I hate this behavior, so I don't want to seem like I'm defending it, but technically there is nothing in ECMA-48 (as far as I've seen) that says a device couldn't do something like that. That said, if a terminal is claiming to emulate a VT220 (which is what Kitty identifies as), then I would expect it to match the VT220's interpretation of DCS, and that has no such limit AFAIK. So my criticism would be that it's a poor VT220 emulation. |
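The idea of sender and receiver first agreeing on what is supported, as DEC terminals did with DA reports, can be sketched as a parser for a Primary Device Attributes reply (the answer to `CSI c`). The helper names are hypothetical. The example checks extension code 4, which DEC terminals use to report sixel graphics; no DA code for audio is assumed to exist.

```python
import re

# Matches a Primary DA reply of the form: CSI ? Ps ; Ps ; ... c
_DA1 = re.compile(r"\x1b\[\?([0-9;]*)c")


def parse_da1(response):
    """Return the numeric parameters of a DA1 reply, or None if absent."""
    match = _DA1.search(response)
    if match is None:
        return None
    return [int(p) for p in match.group(1).split(";") if p]


def has_extension(response, code):
    """True if `code` appears among the extensions after the first parameter."""
    params = parse_da1(response)
    return params is not None and code in params[1:]


if __name__ == "__main__":
    reply = "\x1b[?62;1;4;6c"  # sample reply, made up for the example
    print(parse_da1(reply), has_extension(reply, 4))
```

An application would send the query, read the reply with a timeout, and only emit the newer sequence when the expected code is present.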
ECMA-48, as I read it, is a huge grab bag of possible options. Each terminal will implement a different subset and no terminal will implement all of it. As usual, I may be mistaken, but I see one of the benefits of ECMA-48 (and the subsequent ANSI standard) as having created a way for terminals to ignore the subsets they don't handle, including ones that didn't exist at the time they were manufactured. While I think DEC's labeling of sixels as "ANSI-compliant" is mostly marketing, it did have some meaning: an ANSI compatible terminal that doesn't support sixels can silently ignore them. Of course, it is completely allowed by the spec to not ignore unknown sequences and instead barf characters all over the screen. But, ECMA-48 at least gives terminals a chance to know what might happen. For example:
On the face of it, it is simply saying that ECMA-48 does not define anything about the maximum number of parameters. But a careful programmer would see that as saying there is no limit and their code should handle arbitrarily long sequences. I'm disappointed to hear that Kitty failed that test, but yes, its behavior is within the ECMA-48 standard. (On the other hand, a device which explodes upon receiving DCS would also be within spec.) |
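To make the "silently ignore what you don't handle, at any length" reading concrete, here is a minimal sketch of a filter that drops 7-bit DCS strings (`ESC P` ... `ESC \`) from a byte buffer without any length limit. The function name is hypothetical, and the sketch is simplified: it ignores the 8-bit C1 forms and the CAN/SUB cancellation that real terminals also handle, and it works on a single buffer rather than a stream.

```python
ESC = 0x1B


def strip_dcs(data):
    """Remove 7-bit DCS strings (ESC P ... ESC \\) of any length from data."""
    out = bytearray()
    in_dcs = False
    i = 0
    n = len(data)
    while i < n:
        byte = data[i]
        nxt = data[i + 1] if i + 1 < n else None
        if not in_dcs and byte == ESC and nxt == ord("P"):
            in_dcs = True   # DCS introducer: start discarding
            i += 2
            continue
        if in_dcs and byte == ESC and nxt == ord("\\"):
            in_dcs = False  # ST (string terminator): resume normal output
            i += 2
            continue
        if not in_dcs:
            out.append(byte)
        i += 1
    return bytes(out)


if __name__ == "__main__":
    payload = b"ab\x1bPq" + b"?" * 100000 + b"\x1b\\cd"
    print(strip_dcs(payload))
```

The state is a single flag, so the cost of skipping does not depend on how long the ignored string is, and nothing from the payload leaks to the output.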
Since you mentioned it here above (and also in some terminal-wg thread) - you are totally right with your idea about using the sixel alphabet for a 6-bit encoding, instead of normal base64. Did some first decoder tests, it is massively faster than any base64 algo out there (ofc values are only for my machine):
compared to base64:
The 17 GB/s is my machine's single channel memory bus limit (yes, oldish laptop here), so it basically runs at memcopy speed for me. That's not possible with any base64 SIMD trickery (I can only run up to AVX2 algos on this machine); there is only one known method that comes close to memcopy speeds, on AVX512 machines (https://arxiv.org/abs/1910.05109). But AVX512 is very power hungry and only available on big CPUs. So I stand corrected and think that all new sequences that need payload encoding should use base64-sixel (well, that's what I've called it for now). Base64 is a very poor choice compared to base64-sixel. So thx for bringing this up, and not following "industry-standards" blindly. |
Coming from mintty/mintty#1122.