bytes, strings: add iterator forms of existing functions #61901

rsc · 2023-08-09T16:04:30Z

We propose to add the following functions to package bytes and package strings, to allow callers to iterate over these results without having to allocate the entire result slice. This text shows only the string package form.

This is one of a collection of proposals updating the standard library for the new 'range over function' feature (#61405). It would only be accepted if that proposal is accepted. See #61897 for a list of related proposals.

Iterating over lines is an incredibly common operation that we’ve resisted adding only because we didn’t want to encourage allocation of a potentially large slice. Iterators provide a way to finally add it.

// Lines returns an iterator over the newline-terminated lines in the string s.
// The lines yielded by the iterator include their terminating newlines.
// If s is empty, the iterator yields no lines at all.
// If s does not end in a newline, the final yielded line will not end in a newline.
func Lines(s string) iter.Seq[string] {
	return func(yield func(string) bool) bool {
		for s != "" {
			var line string
			if i := strings.Index(s, "\n"); i >= 0 {
				line, s = s[:i+1], s[i+1:]
			} else {
				line, s = s, ""
			}
			if !yield(line) {
				return false
			}
		}
		return true
	}
}

Iterating over bytes in a string is common and too difficult, since range ranges over runes. This function will inline to the obvious for loop (because we will make sure it does):

// Bytes returns an iterator over bytes in s, yielding both the index and the byte.
func Bytes(s string) iter.Seq2[int, byte] {
	return func(yield func(int, byte) bool) bool {
		for i := range len(s) {
			if !yield(i, s[i]) {
				return false
			}
		}
		return true
	}
}

Iterating over runes is served by a regular range loop, but like slices.All and maps.All, it could be useful as an input to other iterator adapters. The name is Runes, not Seq or All, so that it's clear at call sites what is being iterated over (runes not bytes).

// Runes returns an iterator over runes in s, yielding both the start index and the rune.
func Runes(s string) iter.Seq2[int, rune] {
	return func(yield func(int, rune) bool) bool {
		for i, c := range s {
			if !yield(i, c) {
				return false
			}
		}
		return true
	}
}

Similar to Lines, there should be iterator forms of Split, Fields, and Runes, to avoid requiring the allocation of a slice when the caller only wants to iterate over the individual results. If we were writing the library from scratch, we might use the names Split, Fields, and Runes for the iterator-returning versions, and code that wanted the full slice could use slices.Collect. But that's not an option here, so we add a distinguishing Seq suffix. We do not expect that new functions will use the Seq suffix. For example the function above is Lines, not LinesSeq.

// SplitSeq returns an iterator over all substrings of s separated by sep.
// The iterator yields the same strings that would be returned by Split(s, sep),
// but without constructing the slice.
func SplitSeq(s, sep string) iter.Seq[string] {
	if sep == "" {
		return runeSplitSeq(s)
	}
	return func(yield func(string) bool) bool {
		for {
			i := strings.Index(s, sep)
			if i < 0 {
				break
			}
			frag := s[:i]
			if !yield(frag) {
				return false
			}
			s = s[i+len(sep):]
		}
		return yield(s)
	}
}

func runeSplitSeq(s string) iter.Seq[string] {
	return func(yield func(string) bool) bool {
		for s != "" {
			_, size := utf8.DecodeRuneInString(s)
			if !yield(s[:size]) {
				return false
			}
			s = s[size:]
		}
		return true
	}
}

// SplitAfterSeq returns an iterator over substrings of s split after each instance of sep.
func SplitAfterSeq(s, sep string) iter.Seq[string]

// FieldsSeq returns an iterator over substrings of s split around runs of
// whitespace characters, as defined by unicode.IsSpace. ...
func FieldsSeq(s string) iter.Seq[string]

// FieldsFuncSeq returns an iterator over substrings of s split around runs of
// Unicode code points satisfying f(c). ...
func FieldsFuncSeq(s string, f func(rune) bool) iter.Seq[string]

bcmills · 2023-08-09T16:14:25Z

Iterating over runes is served by a regular range loop

Will Bytes and Runes be added to the bytes package as well? Runes in particular is not entirely trivial today for byte-slices.

jimmyfrasche · 2023-08-09T18:32:42Z

Should Lines also yield the 1-based line number? It's not always needed but it'd be easy enough to ignore when you don't.

earthboundkid · 2023-08-09T20:16:02Z

Iterating over bytes in a string is common and too difficult, since range ranges over runes.

Is i, c := range []byte(s) too difficult? I would say the advantage is that strings.Bytes is more discoverable, but the []byte() version is preferred when you don't need an iterator for some reason.

rsc · 2023-08-09T21:41:26Z

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

DeedleFake · 2023-08-09T22:42:30Z

Iterating over bytes in a string is common and too difficult, since range ranges over runes.

Is i, c := range []byte(s) too difficult? I would say the advantage is that strings.Bytes is more discoverable, but the []byte() version is preferred when you don't need an iterator for some reason.

That relies on specialized optimizations to avoid an allocation and a copy. I'm not sure if those optimizations exist right now. The current best way to do it, as far as I know, is

for i := 0; i < len(s); i++ {
  c := s[i]
  // ...
}

I don't think that's particularly difficult, personally, but I think that strings.Bytes() could still be useful as a feed in to a transformative iterator.

ianlancetaylor · 2023-08-09T23:37:43Z

Similar to Lines, there should be iterator forms of Split, Fields, and Runes

Unless I misunderstand I think you just mean Split and Fields. And there is another misplaced reference to Runes a few lines down.

benhoyt · 2023-08-10T05:03:46Z

I like this overall. Two things I wonder about:

It seems to me that Lines should trim the line ending from the string. That's how the existing similar tool bufio.ScanLines works, and it's almost always what you want, otherwise you have to call strings.TrimSpace or strings.TrimRight before you use the string. It seems to me for the few cases you really need the line ending (or the ability to rebuild the exact input text) you can use strings.Index manually.
I wonder if SplitSeq and FieldsSeq will pay their way. I often need the length of the split result, for example with ad-hoc parsing of separator-separated values, you often want to know the number of columns/fields. (Though I admit that since strings.Cut, that takes care of quite a few previous use cases.) Should we do some analysis of this in real-world code to determine whether it's worth adding these?

andig · 2023-08-10T20:15:07Z

Is i, c := range []byte(s) too difficult?

Don't think so. And shouldn't there be only one way of doing things in Go?

jimmyfrasche · 2023-08-10T20:30:27Z

You can pass iterators to other functions instead of just using them immediately in a range loop, so being able to make iterators for built-in things is still useful.

aarzilli · 2023-08-11T08:16:26Z

ISTM that strings.Lines encourages people to write:

buf, _ := os.ReadFile(
for line := range strings.Lines(string(buf)) { ... }

I'd rather see an iterator version of bufio.Scanner.

Merovius · 2023-08-11T09:41:01Z

@aarzilli I think it's pretty trivial to add an All() method to bufio.Scanner. strings.Lines would still be useful. Note that s := bufio.NewScanner(strings.NewReader(str)); for _, line := range s.All() { … } requires copying data around a bunch (because of how io.Reader works), while for line := range strings.Lines(str) { … } doesn't copy anything.

willfaught · 2023-08-14T04:19:46Z

Bytes(s string) iter.Seq2[int, byte]

I wonder if it's necessary to return the index for bytes, since it's trivial to count them yourself. I guess it's nice to mirror the for/range statement.

func SplitSeq(s, sep string) iter.Seq[string]

Should this return the start index of the string too?

Similar to Lines, there should be iterator forms of Split, Fields, and Runes

Should we have SplitAfterNSeq and SplitNSeq too? If not, why? These are the only ones left out.

I wonder if SplitSeq and FieldsSeq will pay their way. I often need the length of the split result, for example with ad-hoc parsing of separator-separated values, you often want to know the number of columns/fields. (Though I admit that since strings.Cut, that takes care of quite a few previous use cases.) Should we do some analysis of this in real-world code to determine whether it's worth adding these?

If you need to know the length, the existing functions can handle that case.

earthboundkid · 2023-08-15T15:30:44Z

Join would be useful to have. For example, just now I wanted to make a String() method for a named slice. The easy way to do it would be strings.JoinAll(iterutil.Map(slices.Values(s), Element.String), ", ").

rsc · 2023-08-30T17:24:19Z

Finishing this proposal discussion is blocked on #61405.

gopherbot · 2024-01-26T04:35:43Z

Change https://go.dev/cl/558735 mentions this issue: bytes, strings: add SplitSeq, SplitAfterSeq, FieldsSeq, FieldsFuncSeq

rsc · 2024-02-08T23:59:30Z

Have all remaining concerns about this proposal been addressed?

The proposal is to add these to both bytes and strings:

// Lines returns an iterator over the newline-terminated lines in the string s.
// The lines yielded by the iterator include their terminating newlines.
// If s is empty, the iterator yields no lines at all.
// If s does not end in a newline, the final yielded line will not end in a newline.
func Lines(s string) iter.Seq[string]

// SplitSeq returns an iterator over all substrings of s separated by sep.
// The iterator yields the same strings that would be returned by Split(s, sep),
// but without constructing the slice.
func SplitSeq(s, sep string) iter.Seq[string]

// SplitAfterSeq returns an iterator over substrings of s split after each instance of sep.
// The iterator yields the same strings that would be returned by SplitAfter(s, sep),
// but without constructing the slice.
func SplitAfterSeq(s, sep string) iter.Seq[string]

// FieldsSeq returns an iterator over substrings of s split around runs of
// whitespace characters, as defined by unicode.IsSpace.
// The iterator yields the same strings that would be returned by Fields(s),
// but without constructing the slice.
func FieldsSeq(s string) iter.Seq[string]

// FieldsFuncSeq returns an iterator over substrings of s split around runs of
// Unicode code points satisfying f(c).
// The iterator yields the same strings that would be returned by FieldsFunc(s, f),
// but without constructing the slice.
func FieldsFuncSeq(s string, f func(rune) bool) iter.Seq[string]

fzipp · 2024-02-09T00:06:44Z

// Lines returns an iterator over the newline-terminated lines in the string s.
// The lines yielded by the iterator include their terminating newlines.

I would not expect them to include the terminating newlines if I didn't read the documentation.

rsc · 2024-02-09T03:32:27Z

I would not expect them to include the terminating newlines if I didn't read the documentation.

Good reason to read the documentation! 😄

If the newlines are not included, then the user of the iterator cannot distinguish Lines("abc") from Lines("abc\n"). It is often important to know whether a file has a non-terminated final line.

DeedleFake · 2024-02-09T04:42:06Z

If the 'last' line does end in a newline, does Lines() yield an empty line without a newline after it? In other words, would Lines("example\n") yield just "example\n" or would it yield "example\n" and then ""?

Edit: And if it does yield that last line, how is Lines(str) any different from SplitAfterSeq(str, "\n") besides having a cleaner, shorter name?

jimmyfrasche · 2024-02-09T20:10:37Z

I know there are cases where it matters if the file ends with a newline or not but I don't think I have ever had to deal with that in any code I have ever written.

earthboundkid · 2024-02-09T20:43:23Z

I think the reason to keep the \n is then you don't have to document what happens to \r\n.

rsc · 2024-02-14T18:07:16Z

If the 'last' line does end in a newline, does Lines() yield an empty line without a newline after it? In other words, would Lines("example\n") yield just "example\n" or would it yield "example\n" and then ""?

It does not yield a final "".

myitcv · 2024-03-02T09:56:03Z

Just a minor thought on the docs for Lines:

// Lines returns an iterator over the newline-terminated lines in the string s.
// The lines yielded by the iterator include their terminating newlines.
// If s is empty, the iterator yields no lines at all.
// If s does not end in a newline, the final yielded line will not end in a newline.

If s does not contain any newline characters, a strict reading of the first sentence seems to imply I will not iterate over anything:

// Lines returns an iterator over the newline-terminated lines in the string s.

Because in that case there are no newline-terminated lines in the string s.

Similarly the last sentence does not quite sit right for me:

// If s does not end in a newline, the final yielded line will not end in a newline.

Because if it doesn't end in a newline, it surely falls outside the definition of "an iterator over the newline-terminated lines".

All that said, I haven't managed to conjure up anything better! Because capturing the edge cases in pithy documentation here seems particularly hard.

rsc · 2024-03-08T03:05:10Z

The accepted version of Lines is what is in the accept message: the newlines are included in the lines, for the reasons given above. In essentially all cases I can think of, the newline chopping is easily done when parsing the line inside the loop over the lines. As others have noted, it is easy to do s = strings.TrimSuffix(s, "\n").
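
A sketch of the in-loop trimming follows. trimEOL is a hypothetical helper, and its handling of "\r\n" is an assumption added for illustration, since the accepted doc comment only talks about "\n". The slice literal stands in for the lines an iterator like Lines would yield.

```go
package main

import (
	"fmt"
	"strings"
)

// trimEOL removes one trailing "\n" and, only when that newline was
// present, one "\r" directly before it.
func trimEOL(line string) string {
	line, hadNewline := strings.CutSuffix(line, "\n")
	if hadNewline {
		line = strings.TrimSuffix(line, "\r")
	}
	return line
}

func main() {
	// These are the kinds of values a newline-preserving iterator yields.
	for _, line := range []string{"k=v\r\n", "x=y\n", "last"} {
		fmt.Printf("%q\n", trimEOL(line))
	}
}
```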

@dsnet mentioned bufio.Reader.ReadLine, but that doc comment says "ReadLine is a low-level line-reading primitive. Most callers should use Reader.ReadBytes('\n') or Reader.ReadString('\n') instead or use a Scanner." ReadBytes and ReadString do include the delimiter, while Scanner by default does not. The standard library is as split as this discussion, but the basic primitives ReadBytes and ReadString preserve the \n, while the higher-level, more configurable API defaults to removing them but is easily reconfigured (with a different split function).

My general rule for parsing functionality is to avoid throwing away information, which is the justification for keeping the \n.

gopherbot · 2024-05-21T14:40:54Z

Change https://go.dev/cl/587095 mentions this issue: bytes, strings: add Lines, SplitSeq, SplitAfterSeq, FieldsSeq, FieldsFuncSeq

jub0bs · 2024-09-20T12:12:46Z

I'm late to the party, but I noticed that the doc comments (at tip) of Lines, SplitSeq, and SplitAfterSeq (in both the bytes and strings packages) all contain the following statement:

It returns a single-use iterator.

By inspecting the implementation of those functions, I understand why the iterators they return are single-use. For example, the push iterator returned by strings.Lines is stateful because it closes over that function's string parameter and mutates it:

func Lines(s string) iter.Seq[string] {
	return func(yield func(string) bool) {
		for len(s) > 0 {
			var line string
			if i := IndexByte(s, '\n'); i >= 0 {
				line, s = s[:i+1], s[i+1:] // <-------------
			} else {
				line, s = s, ""
			}
			if !yield(line) {
				return
			}
		}
		return
	}
}

However, I don't understand why the resulting iterators have to be single-use. I may have missed something, but I don't think their single-use nature was discussed in this proposal. What's the rationale for it?

For convenience, shouldn't iterators be reusable whenever possible/inexpensive? If reusability is indeed desired, achieving it in this case would be straightforward; for instance, for strings.Lines, it's a one-line fix:

func Lines(s string) iter.Seq[string] {
	return func(yield func(string) bool) {
		s := s // <--------- local copy
		for len(s) > 0 {
			var line string
			if i := IndexByte(s, '\n'); i >= 0 {
				line, s = s[:i+1], s[i+1:]
			} else {
				line, s = s, ""
			}
			if !yield(line) {
				return
			}
		}
		return
	}
}

(playground)

ianlancetaylor · 2024-09-20T23:04:33Z

We wouldn't want to do that for bytes.Lines, because the byte slice might change. And there is something to be said for consistency between strings.Lines and bytes.Lines. So, maybe we should change it, but I'm not sure.

jub0bs · 2024-09-21T03:55:29Z

@ianlancetaylor Good point. I guess consistency between strings.Lines and bytes.Lines should prevail over the reusability of strings iterators.

jub0bs · 2024-10-05T09:38:02Z

I know the proposal has already been accepted (which rejoices me), but here is some anecdotal argument in favour of strings.SplitSeq's addition: rs/cors#170

earthboundkid · 2024-10-24T19:46:26Z

Today I wished I had strings.JoinSeq (and could have sworn it was already in Go 1.23), so I opened #70034.

gopherbot · 2024-12-17T17:14:21Z

Change https://go.dev/cl/637176 mentions this issue: bytes, strings: add cross-references in docstrings

For newly added funcs SplitSeq, SplitAfterSeq, FieldsSeq, FieldsFuncSeq. Updates #61901. Change-Id: I3c97bfd9c2250de68aaea348c82a05635ee797af Reviewed-on: https://go-review.googlesource.com/c/go/+/637176 Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Robert Griesemer <gri@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>

gopherbot · 2024-12-18T16:04:31Z

Change https://go.dev/cl/637358 mentions this issue: bytes, strings: examples for funcs added in Go 1.24

rsc added the Proposal label Aug 9, 2023

gopherbot added this to the Proposal milestone Aug 9, 2023

rsc mentioned this issue Aug 9, 2023

iter: new package for iterators #61897

Closed

ianlancetaylor added this to Proposals Aug 9, 2023

ianlancetaylor moved this to Incoming in Proposals Aug 9, 2023

rsc moved this from Incoming to Active in Proposals Aug 9, 2023

dominikh mentioned this issue Aug 10, 2023

staticcheck: recommend iterator-based alternatives from the standard library dominikh/go-tools#1432

Open

myaaaaaaaaa mentioned this issue Oct 19, 2023

spec: add range over int, range over func #61405

Closed

aimuz mentioned this issue May 21, 2024

bytes, strings: add Lines, SplitSeq, SplitAfterSeq, FieldsSeq, FieldsFuncSeq #67543

Closed

aimuz added a commit to aimuz/go that referenced this issue May 21, 2024

bytes, strings: add Lines, SplitSeq, SplitAfterSeq, FieldsSeq, Fields…

a29818a

…FuncSeq Fixes golang#61901.

gopherbot closed this as completed in 9becf40 Aug 14, 2024

jub0bs mentioned this issue Sep 26, 2024

proposal: errors: add All and AllAs iterators #66455

Closed

earthboundkid mentioned this issue Oct 24, 2024

proposal: strings, bytes: Add JoinSeq #70034

Open

dmitshur modified the milestones: Backlog, Go1.24 Nov 23, 2024

bep mentioned this issue Nov 27, 2024

Go 1.24 notes/TODOs gohugoio/hugo#13091

Open

10 tasks

gopherbot mentioned this issue Dec 5, 2024

api: audit for Go 1.24 #70701

Closed

database64128 mentioned this issue Dec 19, 2024

prefixset: add iterators over prefixes aromatt/netipds#13

Merged
