ARROW-100: [C++] Computing RowBatch size #61

pcmoritz · 2016-04-14T05:16:17Z

Implement RowBatchWriter::DataHeaderSize and arrow::ipc::GetRowBatchSize. To achieve this, the Flatbuffer metadata is written to a temporary buffer and its size is determined. This commit also adds MockMemorySource, a new MemorySource that tracks the amount of memory written.

Author: Philipp Moritz <pcmoritz@gmail.com>
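The dry-run idea described above can be sketched as follows. It assumes a serializer that can run against any sink exposing `Write(position, data, nbytes)`. A first pass against a counting sink yields the exact size, and the real write then goes into a buffer of exactly that size. `CountingSink`, `BufferSink`, `WriteBatch` and `ComputeBatchSize` are hypothetical names, not Arrow APIs. The layout (body buffers back to back, followed by a metadata blob) is also an assumption chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for MockMemorySource: stores nothing and only records
// how far into the destination the writes reach.
class CountingSink {
 public:
  void Write(int64_t position, const uint8_t* /*data*/, int64_t nbytes) {
    extent_ = std::max(extent_, position + nbytes);
  }
  int64_t extent() const { return extent_; }

 private:
  int64_t extent_ = 0;
};

// Hypothetical sink that writes into a preallocated contiguous buffer,
// e.g. a shared-memory segment sized by the dry run.
class BufferSink {
 public:
  explicit BufferSink(std::vector<uint8_t>* buffer) : buffer_(buffer) {}
  void Write(int64_t position, const uint8_t* data, int64_t nbytes) {
    // The dry run sized the buffer, so every write must fit inside it.
    assert(position >= 0 && position + nbytes <= static_cast<int64_t>(buffer_->size()));
    if (nbytes == 0) return;  // avoid memcpy on a possibly null pointer
    std::memcpy(buffer_->data() + position, data, static_cast<size_t>(nbytes));
  }

 private:
  std::vector<uint8_t>* buffer_;
};

// Hypothetical serializer: the same code path runs against either sink.
template <typename Sink>
void WriteBatch(const std::vector<std::string>& buffers, const std::string& metadata,
                Sink* sink) {
  int64_t offset = 0;
  for (const auto& buf : buffers) {
    sink->Write(offset, reinterpret_cast<const uint8_t*>(buf.data()),
                static_cast<int64_t>(buf.size()));
    offset += static_cast<int64_t>(buf.size());
  }
  sink->Write(offset, reinterpret_cast<const uint8_t*>(metadata.data()),
              static_cast<int64_t>(metadata.size()));
}

// Pass 1: dry run against the counting sink to learn the exact size.
int64_t ComputeBatchSize(const std::vector<std::string>& buffers,
                         const std::string& metadata) {
  CountingSink counter;
  WriteBatch(buffers, metadata, &counter);
  return counter.extent();
}
```

For example, with body buffers `"abcd"` and `"efghij"` and metadata `"META"`, `ComputeBatchSize` returns 14. A 14-byte buffer written through `BufferSink` then holds `"abcdefghijMETA"`. Because both passes share `WriteBatch`, the computed size cannot drift from what is actually written. The cost is walking the batch twice, although the counting pass copies no data.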


emkornfield · 2016-04-14T06:39:04Z

cpp/src/arrow/ipc/memory.h

@@ -121,6 +121,26 @@ class MemoryMappedSource : public MemorySource {
  std::unique_ptr<Impl> impl_;
 };

+// A MemorySource that tracks the size of allocations from a memory source


This probably belongs in test-common.h, along with the implementation (I'm not sure whether it is worth creating a new .cc file or just inlining it).


emkornfield · 2016-04-19T04:31:28Z

I'm sorry, it looks like my change did have some conflicts with yours (and it got merged first). Do you mind rebasing?

wesm · 2016-04-19T13:34:03Z

Sorry about that. I'll review/merge once this is rebased.

pcmoritz · 2016-04-19T22:22:12Z

Thanks. Please hold off a little longer on that; I'd like to test it properly with all the other new IPC code that was added. I expect to finish this tonight.

pcmoritz · 2016-04-20T05:02:00Z

The PR should be ready now!

tnachen · 2016-04-20T14:26:51Z

cpp/src/arrow/ipc/memory.cc

+}
+
+Status MockMemorySource::Write(int64_t position, const uint8_t* data, int64_t nbytes) {
+  pos_ = std::max(pos_, position + nbytes);


Why only keep the max here?

The goal here is to determine how many bytes lie between the beginning of the buffer and the location where the last byte is written. GetRowBatchSize will mostly be used to determine how much shared memory to allocate for IPC, and that is the quantity we care about there. If the memory written is noncontiguous, it is not clear what the desired behaviour would be.

See this comment at the beginning of GetRowBatchSize:
// Compute the precise number of bytes needed in a contiguous memory segment to
// write the row batch.

I think this is a variable naming and documentation problem. Can you change the variable name to extent_bytes_written_ or something similar and add a comment to Position (or rename Position) to indicate that it returns the smallest number of bytes containing the modified region of the MockMemorySource? Thanks.

Thanks, it makes sense now.
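The reasoning in this thread (track the furthest byte touched rather than a running total) and the suggested `extent_bytes_written_` naming can be sketched as below. `ByteSink` is a hypothetical stand-in for `MemorySource`; the real `Write` in the diff above returns `Status`, not `bool`. `GetExtentBytesWritten` and `GetSumOfWriteSizes` are hypothetical names, and the methods actually chosen in the later rename commit are not shown on this page. The write patterns in the demo are assumed, picked only to show where a plain sum goes wrong.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical minimal write interface standing in for MemorySource.
class ByteSink {
 public:
  virtual ~ByteSink() = default;
  virtual bool Write(int64_t position, const uint8_t* data, int64_t nbytes) = 0;
};

// Stores no data; only measures the region that writes have touched.
class ExtentTrackingSink : public ByteSink {
 public:
  bool Write(int64_t position, const uint8_t* /*data*/, int64_t nbytes) override {
    extent_bytes_written_ = std::max(extent_bytes_written_, position + nbytes);
    sum_of_write_sizes_ += nbytes;
    return true;
  }

  // Smallest number of bytes, counted from offset 0, that contains every byte
  // written so far: the size of the contiguous segment a writer would need.
  int64_t GetExtentBytesWritten() const { return extent_bytes_written_; }

  // Plain sum of all nbytes, kept only for comparison.
  int64_t GetSumOfWriteSizes() const { return sum_of_write_sizes_; }

 private:
  int64_t extent_bytes_written_ = 0;
  int64_t sum_of_write_sizes_ = 0;
};

// Two assumed write patterns where a running total gives the wrong size.
void DemoExtentVsSum() {
  ExtentTrackingSink gap;  // 4 bytes, a 4-byte gap (e.g. padding), then 16 bytes
  gap.Write(0, nullptr, 4);
  gap.Write(8, nullptr, 16);
  assert(gap.GetExtentBytesWritten() == 24);  // the gap must be allocated too
  assert(gap.GetSumOfWriteSizes() == 20);     // the sum undercounts

  ExtentTrackingSink overlap;  // 16-byte body, then a 4-byte header rewritten at 0
  overlap.Write(0, nullptr, 16);
  overlap.Write(0, nullptr, 4);
  assert(overlap.GetExtentBytesWritten() == 16);
  assert(overlap.GetSumOfWriteSizes() == 20);  // the sum overcounts
}
```

Taking the maximum of `position + nbytes` gives the answer a contiguous allocation needs in both cases. A running total undercounts when writes leave gaps and overcounts when they overlap.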

wesm · 2016-04-23T15:10:47Z

+1, thank you

wesm · 2016-04-23T15:13:20Z

@pcmoritz I've made you a Contributor on JIRA so you'll be able to assign yourself JIRAs going forward


* Support casting boolean to bigint (apache#60)
* remove log4j as it's not used (apache#61)
  Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
* Add stripe iteration support for batch_size reading in the ORC Scanner (apache#63)
* Install re2 headers (apache#66)

Co-authored-by: PHILO-HE <feilong.he@intel.com>
Co-authored-by: zhixingheyi-tian <xiangxiang.shen@intel.com>


…o UnionVector (apache#61)

When a DecimalVector is promoted to a UnionVector via a PromotableWriter, the UnionVector will have the decimal vector in its internal struct vector, but the decimalVector field will not be set. If UnionReader.read is then used to read from the UnionVector, it will fail when it tries to read one of the promoted decimal values, because decimalVector is null and the exact decimal type is not provided. This failure is unnecessary, though, as we have a pre-existing decimal vector; the caller just does not know the exact type, and it shouldn't be required to. The change here is to check for a pre-existing decimal vector in the internal struct when getDecimalVector() is called. If one exists, set the decimalVector field and return. Otherwise, if none exists, throw the exception.

emkornfield reviewed Apr 14, 2016

factor out GetRowBatchSize test, use MockMemorySource to implement GetRowBatchSize, unify DataHeaderSize and TotalBytes into GetTotalSize

9b69f12

pcmoritz added 2 commits April 19, 2016 14:57

merge GetRowBatchSize

67af8e1

fix maximum recursion depth

6b798f8

add tests for more datatypes

3484458

tnachen reviewed Apr 20, 2016

pcmoritz added 2 commits April 22, 2016 17:37

rename MockMemorySource methods to reflect better what they are doing

253c9f0

fix formating

e95fc5c

pcmoritz force-pushed the rowbatchsize branch from 2ba5d35 to e95fc5c April 23, 2016 00:56

asfgit closed this in a541644 Apr 23, 2016

praveenbingo pushed a commit to praveenbingo/arrow that referenced this pull request Aug 30, 2018

GDV-20: [Java] support varlen types in gandiva (apache#61)

a766748

- added java bindings for varlen types/literals
- minor cleanups in llvm generator and engine (reported by clang-tidy)

praveenbingo pushed a commit to praveenbingo/arrow that referenced this pull request Aug 30, 2018

GDV-20: [Java] support varlen types in gandiva (apache#61)

2984e64

- added java bindings for varlen types/literals
- minor cleanups in llvm generator and engine (reported by clang-tidy)

praveenbingo pushed a commit to praveenbingo/arrow that referenced this pull request Aug 30, 2018

GDV-20: [Java] support varlen types in gandiva (apache#61)

a7cf7fc

- added java bindings for varlen types/literals
- minor cleanups in llvm generator and engine (reported by clang-tidy)

praveenbingo pushed a commit to praveenbingo/arrow that referenced this pull request Aug 30, 2018

GDV-20: [Java] support varlen types in gandiva (apache#61)

0dcd35e

- added java bindings for varlen types/literals
- minor cleanups in llvm generator and engine (reported by clang-tidy)

praveenbingo pushed a commit to praveenbingo/arrow that referenced this pull request Sep 4, 2018

GDV-20: [Java] support varlen types in gandiva (apache#61)

0ceefd4

- added java bindings for varlen types/literals
- minor cleanups in llvm generator and engine (reported by clang-tidy)

praveenbingo pushed a commit to praveenbingo/arrow that referenced this pull request Sep 10, 2018

GDV-20: [Java] support varlen types in gandiva (apache#61)

3a59101

- added java bindings for varlen types/literals
- minor cleanups in llvm generator and engine (reported by clang-tidy)

praveenbingo pushed a commit to praveenbingo/arrow that referenced this pull request Sep 10, 2018

GDV-20: [Java] support varlen types in gandiva (apache#61)

05d57e0

- added java bindings for varlen types/literals
- minor cleanups in llvm generator and engine (reported by clang-tidy)

vfraga pushed a commit to rafael-telles/arrow that referenced this pull request Dec 14, 2021

Merge pull request apache#61 from abenaru/close-on-completion

e1d6194

Implements close on completion

zhouyuan added a commit to zhouyuan/arrow that referenced this pull request Dec 23, 2021

remove log4j as it's not used (apache#61)

916fda6

Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>

rafael-telles pushed a commit to rafael-telles/arrow that referenced this pull request Jan 20, 2022

Merge pull request apache#61 from abenaru/close-on-completion

b3e3591

Implements close on completion

vfraga pushed a commit to rafael-telles/arrow that referenced this pull request Mar 29, 2022

Merge pull request apache#61 from abenaru/close-on-completion

880fbbe

Implements close on completion

github-actions bot mentioned this pull request Nov 26, 2024

GH-43769: [Java] Pin Java JNI CI build to llvm 16 #43770

Closed


pcmoritz commented Apr 14, 2016

emkornfield Apr 14, 2016

emkornfield commented Apr 19, 2016

wesm commented Apr 19, 2016

pcmoritz commented Apr 19, 2016

pcmoritz commented Apr 20, 2016

tnachen Apr 20, 2016

pcmoritz Apr 20, 2016 (edited)

wesm Apr 22, 2016 (edited)

pcmoritz Apr 23, 2016

tnachen Apr 23, 2016

wesm commented Apr 23, 2016

wesm commented Apr 23, 2016
