Description
array_constructor is very slow: #5958 (comment)
array_constructor uses BaseVector::copyRanges, which is very slow for flat vectors. FlatVector::copy(source, rows, toSourceRow) is faster.
Switching from copyRanges to copy significantly speeds up array_constructor for primitive types and structs. However, this change makes array_constructor slower for arrays and maps.
The slowdown occurs because ArrayVector and MapVector do not implement copy(source, rows, toSourceRow). They rely on BaseVector::copy to translate rows + toSourceRow into ranges, and this extra processing causes a performance regression.
Questions:
Should we optimize FlatVector::copyRanges by implementing a version of copyValuesAndNulls for ranges?
Should we optimize Array/MapVector::copy(source, rows, toSourceRow)?
PR #6568 adds a benchmark and optimizes array_constructor to use copy(source, rows, toSourceRow) for primitive types and structs of primitive types, and copyRanges for everything else. This change speeds up array_constructor for primitive types and structs of primitive types by about 5x.
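To make the two copy paths discussed in this issue concrete, here is a minimal standalone sketch. It is not Velox code: the `CopyRange` struct and the `toRanges` and `copyRowsDirect` helpers are hypothetical names that use plain `std::vector` in place of real vectors and selectivity masks. It only illustrates the idea that translating rows + toSourceRow into ranges is an extra pass, whereas a flat buffer can be copied row by row directly. Whether the real BaseVector::copy merges adjacent rows the way this sketch does is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative stand-in for a copy range: copy `count` consecutive rows
// starting at `sourceIndex` in the source to `targetIndex` in the target.
struct CopyRange {
  int32_t sourceIndex;
  int32_t targetIndex;
  int32_t count;
};

// Hypothetical helper: converts a sorted list of selected target rows plus a
// target-to-source row mapping into ranges. Adjacent target rows whose source
// rows are also adjacent are merged into one range.
std::vector<CopyRange> toRanges(
    const std::vector<int32_t>& selectedRows,
    const std::vector<int32_t>& toSourceRow) {
  std::vector<CopyRange> ranges;
  for (int32_t row : selectedRows) {
    assert(row >= 0 && static_cast<size_t>(row) < toSourceRow.size());
    const int32_t sourceRow = toSourceRow[row];
    if (!ranges.empty()) {
      CopyRange& last = ranges.back();
      if (last.targetIndex + last.count == row &&
          last.sourceIndex + last.count == sourceRow) {
        ++last.count; // Extend the current run.
        continue;
      }
    }
    ranges.push_back({sourceRow, row, 1});
  }
  return ranges;
}

// Hypothetical direct path for a flat buffer: each selected row is copied
// straight from its mapped source row, with no intermediate range list.
void copyRowsDirect(
    const std::vector<int64_t>& source,
    const std::vector<int32_t>& selectedRows,
    const std::vector<int32_t>& toSourceRow,
    std::vector<int64_t>& target) {
  for (int32_t row : selectedRows) {
    target[row] = source[toSourceRow[row]];
  }
}
```

In this sketch the range-based path allocates and fills a vector of ranges before any data moves, which is the kind of extra processing the issue attributes to the fallback for arrays and maps.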
CC: @laithsakka @bikramSingh91 @Yuhta @kevinwilfong