Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Line-wise transforms #695

Closed
ehmicky opened this issue Jan 16, 2024 · 2 comments · Fixed by #699
Closed

Line-wise transforms #695

ehmicky opened this issue Jan 16, 2024 · 2 comments · Fixed by #699

Comments

@ehmicky
Copy link
Collaborator

ehmicky commented Jan 16, 2024

Background

This feature was suggested by both #21 and #121.
Also mentioned by @sindresorhus in #693 (comment).

Rationale

Users can now transform stdin/stdout/stderr/stdio[*] by passing generators to those options.

The vast majority of processes' input and output is text-based. When transforming text-based data, it is very convenient to map one line at a time. For example, this is required for #121.

Currently, each chunk being transformed might be broken in the middle of a line. For example, the following does not currently work because the word secret might be split into 2 different chunks.

const transform = async function * (chunks) {
	for await (const chunk of chunks) {
		if (!chunk.includes('secret')) {
			yield chunk;
		}
	}
};

const {stdout} = await execa('echo', ['This is a secret.'], {stdout: transform});

Solution

Transforms should iterate over chunks line by line. We would use \n as an end delimiter (which works for \r\n too).

However, I think we should still allow users to opt out of this behavior because, although less common, some processes' input/output is in fact binary. Splitting binary input/output line by line would result in chunks of varying size, some potentially very large if the binary input/output does not contain any newline. This would take away the main benefits of streaming (incremental CPU load + low memory).

API

The users would get this behavior by default:

execa('...', {stdout: transform});

To allow users to opt-out, I would suggest the following format.

execa('...', {stdout: { transform, binary: true }});

In other words, a transform can be either AsyncGeneratorFunction or { transform: AsyncGeneractorFunction, binary: boolean }.

This format is also extensible to other options.

Alternatives

One other possible API would be:

execa('...', {stdout: ['binary', transform]});

However, some users might want to use multiple transforms for stdout, with some of them being binary, others not. For example, one transform might convert a binary stdout to a text-based one. Then a second transform might operate line-wise. The above approach would not allow this.

execa('...', {stdout: ['binary', binaryToTextTransform, secretFilterTransform]});

Yet another possible API would be:

execa('...', {stdout: transform, binary: true});

However, this has the same problem. It also has an additional one: it would apply to all streams (stdin/stdout/stderr/stdio[*]). For example, some processes might produce a binary stdout but a text-based stderr.

execa('...', {stdout: binaryTransform, stderr: textTransform, binary: true});

Implementing this should be rather simple, since splitting chunks line-wise can be done using the same kind of generator users can now use themselves. In other words, we just need to add an internal generator, like we already do for encoding the chunks.

const encodingStartStringGenerator = async function * (chunks) {
const textDecoder = new TextDecoder();
for await (const chunk of chunks) {
yield textDecoder.decode(chunk, {stream: true});
}
const lastChunk = textDecoder.decode();
if (lastChunk !== '') {
yield lastChunk;
}
};

What do you think @sindresorhus?

@sindresorhus
Copy link
Owner

execa('...', {stdout: { transform, binary: true }});

👍

@ehmicky
Copy link
Collaborator Author

ehmicky commented Jan 17, 2024

Done in #699.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants