feat(array): less leaky string array #5483

Merged (2 commits, merged May 23, 2024)


@mhilton (Contributor) commented May 21, 2024

Change the behaviour of the string array back to the old behaviour, where calling the Value method returns a string that is backed by the arrow memory buffer. This avoids allocating data in memory that the memory allocator does not track.

The implementation of array.String has been simplified somewhat as part of the new behaviour.

There are a number of places where correct behaviour relies on copies of the data being made. To avoid having to fix all of these in the same PR, a temporary ValueCopy function has been added to maintain the old semantics. It is used everywhere the Value function was previously used, except where the value is obviously processed immediately and then discarded.
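A minimal sketch of the rule of thumb above, using a hypothetical, simplified stringArray in place of array.String (one data buffer plus offsets). A value that is only compared and then dropped can use the buffer-backed Value, whereas a value retained beyond the array's lifetime, such as a map key, needs ValueCopy. The type and helper names are illustrative assumptions, not the PR's code.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringArray is a hypothetical stand-in for array.String: all values live
// in one data buffer, delimited by offsets (len(offsets) == Len()+1).
type stringArray struct {
	data    []byte
	offsets []int
}

func (a *stringArray) Len() int { return len(a.offsets) - 1 }

// Value returns a string backed by a.data without copying. It must not be
// retained after the array's memory is released.
func (a *stringArray) Value(i int) string {
	b := a.data[a.offsets[i]:a.offsets[i+1]]
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(&b[0], len(b))
}

// ValueCopy copies the value into new Go-heap memory, so it remains valid
// after the array is released but is not tracked by the allocator.
func (a *stringArray) ValueCopy(i int) string {
	return string(a.data[a.offsets[i]:a.offsets[i+1]])
}

// countEqual compares each value and immediately discards it, so Value is safe.
func countEqual(a *stringArray, want string) int {
	n := 0
	for i := 0; i < a.Len(); i++ {
		if a.Value(i) == want {
			n++
		}
	}
	return n
}

// distinct keeps values as map keys that may outlive the array, so it uses
// ValueCopy to preserve the old copying semantics.
func distinct(a *stringArray) map[string]struct{} {
	seen := make(map[string]struct{})
	for i := 0; i < a.Len(); i++ {
		seen[a.ValueCopy(i)] = struct{}{}
	}
	return seen
}

func main() {
	a := &stringArray{data: []byte("abab"), offsets: []int{0, 1, 2, 3, 4}}
	fmt.Println(countEqual(a, "a")) // 2
	fmt.Println(len(distinct(a)))   // 2
}
```

Storing a string in a map keeps only its header, so a buffer-backed key would keep pointing into arrow memory; that is why such call sites are the ones that still need a copy.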

The places where the ValueCopy function is used will be addressed one at a time until significant levels of unaccounted memory can be avoided.

Checklist

Dear Author 👋, the following checks should be completed (or explicitly dismissed) before merging.

  • ✏️ Write a PR description, regardless of triviality, to include the value of this PR
  • 🔗 Reference related issues
  • 🏃 Test cases are included to exercise the new code
  • 🧪 If new packages are being introduced to stdlib, link to Working Group discussion notes and ensure it lands under experimental/
  • 📖 If language features are changing, ensure docs/Spec.md has been updated

Dear Reviewer(s) 👋, you are responsible (among others) for ensuring the completeness and quality of the above before approval.

@mhilton requested a review from a team as a code owner May 21, 2024 09:42

@appletreeisyellow (Contributor) left a comment

It seems a lot of files were touched in this PR, but the main change was adding the binaryArray interface and adapting it to different types. Many of the file changes are just renaming the method. I left some non-blocking comments and questions ✅
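The comment describes a single interface adapted to several array types. The sketch below shows that general pattern in Go; the method set of binaryArray and the two adapter types are hypothetical assumptions for illustration, not the interface actually added in the PR.

```go
package main

import "fmt"

// binaryArray is a hypothetical interface: any array whose elements can be
// viewed as raw bytes. The method names are assumptions, not the PR's API.
type binaryArray interface {
	Len() int
	ValueBytes(i int) []byte
}

// offsetArray stores variable-length values in one buffer, like a string array.
type offsetArray struct {
	data    []byte
	offsets []int
}

func (a offsetArray) Len() int { return len(a.offsets) - 1 }

func (a offsetArray) ValueBytes(i int) []byte {
	return a.data[a.offsets[i]:a.offsets[i+1]]
}

// sliceArray stores each value in its own slice.
type sliceArray [][]byte

func (a sliceArray) Len() int { return len(a) }

func (a sliceArray) ValueBytes(i int) []byte { return a[i] }

// totalSize works with any type adapted to binaryArray.
func totalSize(a binaryArray) int {
	n := 0
	for i := 0; i < a.Len(); i++ {
		n += len(a.ValueBytes(i))
	}
	return n
}

func main() {
	s := offsetArray{data: []byte("foobar"), offsets: []int{0, 3, 6}}
	b := sliceArray{[]byte("xy"), []byte("z")}
	fmt.Println(totalSize(s), totalSize(b)) // 6 3
}
```

Code written against the interface does not need to know which concrete array type it is handling, which is what lets a change like this touch many call sites mechanically.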

I would like another pair of eyes to review this!

Comment on lines +176 to +183
// ValueCopy returns the value at the requested position copied into a
// new memory location. This value will remain valid after the array is
// released, but is not tracked by the memory allocator.
//
// This function is intended to be temporary while changes are being
// made to reduce the amount of unaccounted data memory.
func (a *String) ValueCopy(i int) string {
	return string(a.ValueRef(i).Bytes())
}

Contributor

Thanks for the comments. It is helpful! 👍

Review thread on internal/arrowutil/iterator.gen.go.tmpl (outdated, resolved)
Comment on lines +198 to +200
// Buffer returns the memory buffer that contains the value.
func (r StringRef) Buffer() *arrowmem.Buffer {
	return r.buf
}

Contributor

I can't find anywhere that uses the Buffer() function. Is it still needed?

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be used by the follow-up PRs in this series.
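The thread does not say how Buffer() will be used. One plausible use, offered purely as an assumption, is letting a consumer keep a value alive by retaining the reference-counted buffer instead of copying the bytes (arrow's Go memory buffers are reference counted through Retain and Release). The sketch uses simplified, hypothetical stand-ins (refBuffer, stringRef) rather than the real arrow or Flux types.

```go
package main

import "fmt"

// refBuffer is a hypothetical stand-in for a reference-counted arrow buffer.
// Simplified: the count is not atomic and no allocator is involved.
type refBuffer struct {
	data []byte
	refs int
}

func (b *refBuffer) Retain() { b.refs++ }

func (b *refBuffer) Release() {
	b.refs--
	if b.refs == 0 {
		b.data = nil // a real buffer would return its memory to the allocator
	}
}

// stringRef is a hypothetical reference to one value inside a buffer.
type stringRef struct {
	buf        *refBuffer
	start, end int
}

func (r stringRef) Bytes() []byte { return r.buf.data[r.start:r.end] }

// Buffer exposes the owning buffer so callers can manage its lifetime.
func (r stringRef) Buffer() *refBuffer { return r.buf }

func main() {
	buf := &refBuffer{data: []byte("hello world"), refs: 1} // owned by the array
	ref := stringRef{buf: buf, start: 6, end: 11}

	// Keep the value alive past the array by retaining its buffer.
	ref.Buffer().Retain()
	buf.Release() // the array drops its reference

	fmt.Println(string(ref.Bytes()), buf.refs) // world 1

	ref.Buffer().Release() // last reference dropped
	fmt.Println(buf.data == nil) // true
}
```

Retaining keeps the memory inside the allocator's accounting, whereas a copy (as ValueCopy makes) moves it onto the Go heap.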

Co-authored-by: Chunchun Ye <14298407+appletreeisyellow@users.noreply.github.com>
@mhilton merged commit 96ae92b into master May 23, 2024
7 checks passed

3 participants