Direct writing mode #292

Open
occasionallydavid opened this issue Nov 18, 2024 · 3 comments

Comments

@occasionallydavid

occasionallydavid commented Nov 18, 2024

Hey there,

I absolutely love imap-codec; it has made it possible to pull together an IMAP server in record time, even as a Rust newcomer. I'm a little less loving of imap-next though, the interface is causing some headaches and I was wondering if you could offer suggestions (or consider mine).

For example, when producing a large FETCH response with the simplest use of the API, it is necessary to temporarily stop calling enqueue_....() and pump next() / stream.flush() on a regular basis; otherwise the entire response is buffered in memory. Deciding when to pump next/flush creates a new headache: calling it for each response item causes large CPU overhead, perhaps as a result of heavy syscall use writing small messages. Finding a balance is hard because there doesn't appear to be any information available to estimate the current size of the output buffer. At present I have hard-wired an "if >10 responses sent with no corresponding ResponseSent, loop flushing until <= 10" check inside the FETCH response handler, which is not ideal. It also does not distinguish between tiny responses (e.g. "(UID)") and large responses that fetch the whole message body and headers.
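One way to make the "10 responses" rule account for response size is to count pending bytes instead of pending responses. The following is only a minimal sketch: `FlushBudget`, its method names, and the watermark parameters are hypothetical and not part of imap-next, and how the encoded length of each response is obtained is left open.

```rust
/// Hypothetical helper (not part of imap-next): tracks how many encoded
/// bytes have been enqueued but not yet confirmed as sent.
pub struct FlushBudget {
    pending_bytes: usize,
    low_water: usize,
    high_water: usize,
}

impl FlushBudget {
    /// `high_water`: stop enqueueing once this many bytes are pending.
    /// `low_water`: resume enqueueing once pending bytes drop to this level.
    pub fn new(low_water: usize, high_water: usize) -> Self {
        assert!(low_water <= high_water);
        Self { pending_bytes: 0, low_water, high_water }
    }

    /// Record a response of `len` bytes handed to the queue.
    /// Returns true when the caller should stop enqueueing and start pumping.
    pub fn on_enqueued(&mut self, len: usize) -> bool {
        self.pending_bytes += len;
        self.pending_bytes >= self.high_water
    }

    /// Record that a previously enqueued response of `len` bytes was sent.
    /// Returns true while the caller should keep pumping.
    pub fn on_sent(&mut self, len: usize) -> bool {
        self.pending_bytes = self.pending_bytes.saturating_sub(len);
        self.pending_bytes > self.low_water
    }

    /// Bytes currently enqueued and not yet reported as sent.
    pub fn pending_bytes(&self) -> usize {
        self.pending_bytes
    }
}
```

With this shape, many tiny "(UID)" responses are batched before a flush, while a single large body crosses the high-water mark on its own and triggers pumping immediately.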

This dance is necessary not just for memory's sake, but also for response latency. Fetching my largest folder of 200k items takes 6 seconds of raw CPU time just to build the response (quite a reasonable overhead, I think), but without the above loop that turns into a 6-second delay before the client sees the first byte of the response.

Finally, it is necessary to continuously call next()/flush() during large writes to detect client state: there is no point burning CPU producing a large response for a low-bandwidth client, or continuing to generate a response for a client that has hung or disconnected.
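The point about not generating data for a dead client can be illustrated with plain `std::io::Write`, independent of imap-next. In this sketch, `stream_lazily` and `FailingWriter` are hypothetical names: items are produced lazily and each one is written before the next is generated, so the first write error stops generation.

```rust
use std::io::{self, Write};

/// Hypothetical helper: writes lazily generated items one at a time and
/// stops at the first I/O error. On success returns the number of items
/// written; on failure returns how many were fully written plus the error.
pub fn stream_lazily<W, I>(out: &mut W, items: I) -> Result<usize, (usize, io::Error)>
where
    W: Write,
    I: IntoIterator<Item = Vec<u8>>,
{
    let mut sent = 0;
    for item in items {
        // The next item is only generated after this write has succeeded.
        if let Err(e) = out.write_all(&item) {
            return Err((sent, e));
        }
        sent += 1;
    }
    out.flush().map_err(|e| (sent, e))?;
    Ok(sent)
}

/// Stand-in for a client that disconnects: accepts `remaining` bytes,
/// then reports a broken pipe.
pub struct FailingWriter {
    pub remaining: usize,
}

impl Write for FailingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if self.remaining == 0 {
            return Err(io::Error::new(io::ErrorKind::BrokenPipe, "client went away"));
        }
        let n = buf.len().min(self.remaining);
        self.remaining -= n;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

Passing an iterator that builds each FETCH line on demand means the CPU cost of the remaining items is never paid once the writer reports an error.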

I like how imap-next abstracts away all the details of the protocol, but what I really wish for is an interface like server.write_data(&data).await, which would avoid all the internal buffering and the parallel world that shadows the underlying network state. Another benefit of blocking the calling function is that large message bodies could be shared rather than copied just to enter a queue they will almost immediately leave. Ideally this would completely decouple resident memory usage from the actual size of the messages being sent. Is that something that might be possible?
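To make the request concrete, here is a blocking analogue of such a direct-write interface built on `std::io::Write`. It is a sketch under assumptions, not imap-next's API: `DirectWriter`, `write_data`, and the `threshold` parameter are hypothetical. Small items are coalesced into a bounded buffer to avoid one syscall per item, while large borrowed bodies go straight to the underlying writer without being copied into a queue.

```rust
use std::io::{self, Write};

/// Hypothetical direct writer: coalesces small writes, passes large
/// borrowed buffers straight through to the underlying writer.
pub struct DirectWriter<W: Write> {
    inner: W,
    small: Vec<u8>,
    threshold: usize,
}

impl<W: Write> DirectWriter<W> {
    pub fn new(inner: W, threshold: usize) -> Self {
        Self { inner, small: Vec::with_capacity(threshold), threshold }
    }

    /// Blocks until `data` is either buffered (small) or handed to the
    /// underlying writer (large). Errors surface to the caller at once.
    pub fn write_data(&mut self, data: &[u8]) -> io::Result<()> {
        if data.len() >= self.threshold {
            // Preserve ordering, then write the borrowed body without copying.
            self.flush_small()?;
            self.inner.write_all(data)
        } else {
            self.small.extend_from_slice(data);
            if self.small.len() >= self.threshold {
                self.flush_small()?;
            }
            Ok(())
        }
    }

    /// Number of small-item bytes currently held in memory.
    pub fn buffered_len(&self) -> usize {
        self.small.len()
    }

    fn flush_small(&mut self) -> io::Result<()> {
        if !self.small.is_empty() {
            self.inner.write_all(&self.small)?;
            self.small.clear();
        }
        Ok(())
    }

    /// Flushes everything and returns the underlying writer.
    pub fn finish(mut self) -> io::Result<W> {
        self.flush_small()?;
        self.inner.flush()?;
        Ok(self.inner)
    }
}
```

In this design the writer never holds more than roughly twice `threshold` bytes of its own, regardless of how large the message bodies are. An async version would have the same structure with an async writer in place of `Write`.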

Thanks

@soywod
Contributor

soywod commented Nov 18, 2024

imap-next is still considered relatively low-level; you may be interested in the higher-level client suite imap-client. For an even higher level of abstraction you have email-lib. Then you have applications at the top level like Himalaya CLI.

Regarding the CPU usage, #290 may drastically improve it.

@occasionallydavid
Author

Thank you for the pointers @soywod. This is actually a component in an email app, used to expose its custom storage. The linked PR looks very relevant; I will test it shortly.

@jakoschiko
Collaborator

Hi, thanks for giving us feedback!

> I'm a little less loving of imap-next though, the interface is causing some headaches

Don't worry, we are feeling the same.

> For example, when producing a large FETCH response with the simplest use of the API, it is necessary to temporarily stop calling enqueue_....() and pump next() / stream.flush() on a regular basis; otherwise the entire response is buffered in memory. Deciding when to pump next/flush creates a new headache: calling it for each response item causes large CPU overhead, perhaps as a result of heavy syscall use writing small messages. Finding a balance is hard because there doesn't appear to be any information available to estimate the current size of the output buffer. At present I have hard-wired an "if >10 responses sent with no corresponding ResponseSent, loop flushing until <= 10" check inside the FETCH response handler, which is not ideal. It also does not distinguish between tiny responses (e.g. "(UID)") and large responses that fetch the whole message body and headers.

Features like batching are definitely out of scope for imap-next. But I agree that the current API makes easy tasks like this very complicated.

> This dance is necessary not just for memory's sake, but also for response latency. Fetching my largest folder of 200k items takes 6 seconds of raw CPU time just to build the response (quite a reasonable overhead, I think), but without the above loop that turns into a 6-second delay before the client sees the first byte of the response.
>
> Finally, it is necessary to continuously call next()/flush() during large writes to detect client state: there is no point burning CPU producing a large response for a low-bandwidth client, or continuing to generate a response for a client that has hung or disconnected.

I have no experience with this. Is 6s really slow for processing 200k items on a single thread? Anyway, I would love to see a flamegraph for your case.

> I like how imap-next abstracts away all the details of the protocol, but what I really wish for is an interface like server.write_data(&data).await, which would avoid all the internal buffering and the parallel world that shadows the underlying network state. Another benefit of blocking the calling function is that large message bodies could be shared rather than copied just to enter a queue they will almost immediately leave. Ideally this would completely decouple resident memory usage from the actual size of the messages being sent. Is that something that might be possible?

Using async for imap-next is difficult. We tried it out at first, but it forced us to do I/O inside the server/client state. The code was really complicated. Sans I/O improved the maintainability a lot.
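For readers unfamiliar with the term, the sans-I/O idea can be sketched as follows. This is a generic illustration, not imap-next's actual design: `SansIoQueue`, `output`, `consumed`, and `pump` are hypothetical names. The state object only tracks what should be written; a separate driver owns the socket and reports progress back.

```rust
use std::collections::VecDeque;
use std::io::{self, Write};

/// Hypothetical sans-I/O send queue: it never touches a socket.
pub struct SansIoQueue {
    pending: VecDeque<Vec<u8>>,
    offset: usize, // bytes of the front message already written
}

impl SansIoQueue {
    pub fn new() -> Self {
        Self { pending: VecDeque::new(), offset: 0 }
    }

    pub fn enqueue(&mut self, message: Vec<u8>) {
        self.pending.push_back(message);
    }

    /// Bytes the driver should try to write next, if any.
    pub fn output(&self) -> Option<&[u8]> {
        self.pending.front().map(|m| &m[self.offset..])
    }

    /// The driver reports that `n` bytes of `output()` were written.
    /// Returns 1 if that completed the front message, otherwise 0.
    pub fn consumed(&mut self, n: usize) -> usize {
        let front_len = match self.pending.front() {
            Some(m) => m.len(),
            None => return 0,
        };
        self.offset += n;
        if self.offset >= front_len {
            self.pending.pop_front();
            self.offset = 0;
            1
        } else {
            0
        }
    }
}

/// Driver: performs the actual I/O and feeds results back into the state.
/// Returns the number of messages fully sent.
pub fn pump<W: Write>(queue: &mut SansIoQueue, out: &mut W) -> io::Result<usize> {
    let mut sent = 0;
    while let Some(chunk) = queue.output() {
        let n = out.write(chunk)?;
        if n == 0 {
            return Err(io::Error::new(io::ErrorKind::WriteZero, "writer accepted no bytes"));
        }
        sent += queue.consumed(n);
    }
    out.flush()?;
    Ok(sent)
}
```

Because the queue has no I/O of its own, the same state logic can be driven by a blocking loop, an async runtime, or a test harness, which is the maintainability benefit described above. The cost is the one raised in this issue: the caller has to decide when to run the driver.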

We intentionally tried to implement as few features in imap-next as possible. We also extracted a lot of code into other libraries so that it can be reused without imap-next, e.g. Fragmentizer is now part of imap-codec. The server side of imap-next now has fewer than 1000 lines of code, so it's not that complicated to try out alternative APIs.

I have the impression that the API you have in mind would be rather opinionated. We wanted to keep imap-next as unopinionated as possible, so that it can be the base for more opinionated libraries like imap-client. Not sure if we succeeded. The problem with enqueue and ownership in particular is really painful, and I don't know how to solve it.

To be honest, I'm not sure how to continue. We tried out different APIs, and the current one is the least bad regarding maintainability and usability. I don't expect big changes in the near future, unless someone has a brilliant proposal :p
