Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Int128 division implementation #2

Closed
wants to merge 2 commits into from

Conversation

karteekmurthys
Copy link

Arithmetic division operator implementation. This implementation is similar to
duckdb HugeInt division operation. Refer:
https://github.com/duckdb/duckdb/blob/master/src/common/types/hugeint.cpp#L257.

The algorithm is defined in Knuth's Art of Programming Vol2 chapter 4.3.1. This
implementaion is similar to performing division with paper and pen. Here is an
example:

lhs  1023
rhs  31

Split into 64-bit integers:

UPPER(int64) |    LOWER(uint64)
-------------------------------------
lhs  0         |  1023
rhs  0         |  31

        0000100001   <--- QUOTIENT = 33
       -------------------
11111 | 1111111111
      - 11111
       -------------
        000001
             11
             111
             1111
             11111
           - 11111
          --------------
             00000  <--- Remainder = 0

Testing:
Added unit tests.

Arithmetic division operator implementation. This implementation is similar to
duckdb HugeInt division operation. Refer:
https://github.com/duckdb/duckdb/blob/master/src/common/types/hugeint.cpp#L257.

The algorithm is defined in Knuth's Art of Programming Vol2 chapter 4.3.1. This
implementaion is similar to performing division with paper and pen. Here is an
example:

```
lhs  1023
rhs  31

Split into 64-bit integers:

UPPER(int64) |    LOWER(uint64)
-------------------------------------
lhs  0         |  1023
rhs  0         |  31

        0000100001   <--- QUOTIENT = 33
       -------------------
11111 | 1111111111
      - 11111
       -------------
        000001
             11
             111
             1111
             11111
           - 11111
          --------------
             00000  <--- Remainder = 0
```

Testing:
Added unit tests.
karteekmurthys pushed a commit that referenced this pull request May 1, 2022
…tor#1500)

Summary:
Enhance printExprWithStats to identify common-sub expressions.

For example, `c0 + c1` is a common sub-expression in
`"(c0 + c1) % 5", " (c0 + c1) % 3"` expression set. It is evaluated only once and
there is a single Expr object that represents it. That object appears in the
expression tree twice. printExprWithStats does not show the runtime stats for
second instance of that expression and instead annotates it with `[CSE https://github.com/facebookincubator/velox/issues/2]`,
where CSE stands for common sub-expression and 2 refers to the first instance
of the expression.

```
mod [cpu time: 50.49us, rows: 1024] -> BIGINT [#1]
   cast(plus as BIGINT) [cpu time: 68.15us, rows: 1024] -> BIGINT [#2]
      plus [cpu time: 51.84us, rows: 1024] -> INTEGER [#3]
         c0 [cpu time: 0ns, rows: 0] -> INTEGER [#4]
         c1 [cpu time: 0ns, rows: 0] -> INTEGER [#5]
   5:BIGINT [cpu time: 0ns, rows: 0] -> BIGINT [#6]

mod [cpu time: 49.29us, rows: 1024] -> BIGINT [#7]
   cast((plus(c0, c1)) as BIGINT) -> BIGINT [CSE #2]
   3:BIGINT [cpu time: 0ns, rows: 0] -> BIGINT [#8]
```

Pull Request resolved: facebookincubator#1500

Reviewed By: Yuhta

Differential Revision: D35994836

Pulled By: mbasmanova

fbshipit-source-id: 6bacbbe61b68dad97ce2fd5f99610c4ad55897be
@karteekmurthys karteekmurthys deleted the branch convert_decimal_string July 8, 2022 16:18
@karteekmurthys karteekmurthys deleted the int128_division branch July 8, 2022 16:21
majetideepak pushed a commit that referenced this pull request Sep 8, 2022
Summary:
Pull Request resolved: facebookincubator#2465

Fix data race exposed by TSAN:

```
WARNING: ThreadSanitizer: data race (pid=3333333)
  Write of size 8 at 0x7ffedeec8260 by main thread:
    #2 DriverTest_yield_Test::TestBody() velox/exec/tests/DriverTest.cpp:478 (velox_exec_test+0x5815ab)

  Previous read of size 8 at 0x7ffedeec8260 by thread T32:
    #2 DriverTest_yield_Test::TestBody()::$_3::operator()() const velox/exec/tests/DriverTest.cpp:480 (velox_exec_test+0x599aae)
```

Reviewed By: xiaoxmeng

Differential Revision: D39311571

fbshipit-source-id: 1062e97c874d59333d0534ce23660f2fbbd1ae18
majetideepak pushed a commit that referenced this pull request Sep 8, 2022
Summary:
Pull Request resolved: facebookincubator#2466

Fix data race exposed by TSAN:

```
WARNING: ThreadSanitizer: data race (pid=3539135)
  Write of size 8 at 0x7ffd6053de08 by main thread:
    #2 DriverTest_pauserNode_Test::TestBody() velox/exec/tests/DriverTest.cpp:677 (velox_exec_test+0x58205e)

  Previous read of size 8 at 0x7ffd6053de08 by thread T32:
    #2 DriverTest_pauserNode_Test::TestBody()::$_6::operator()() const velox/exec/tests/DriverTest.cpp:680 (velox_exec_test+0x5a4441)
```

Reviewed By: xiaoxmeng

Differential Revision: D39311986

fbshipit-source-id: c7f36bac377f243b879eb7c11bd8159e36b5fbee
pramodsatya pushed a commit that referenced this pull request Sep 20, 2023
Summary:
Pull Request resolved: facebookincubator#6568

array_constructor is very slow: facebookincubator#5958 (comment)

array_constructor uses BaseVector::copyRanges, which is somewhat fast for arrays and maps, but very slow for primitive types:

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

FlatVector<T>::copy(source, rows, toSourceRow) is faster.

Switching from copyRanges to copy speeds up array_constructor for primitive types and structs significantly. Yet, this change makes arrays and maps slower.

The slowness is due to ArrayVector and MapVector not having implementation for copy(source, rows, toSourceRow). They rely on BaseVector::copy to translate rows + toSourceRow to ranges. This extra processing causes perf regression.

Hence, we use copy for primitive types and structs of these and copyRanges for everything else.

```
Before:

array_constructor_ARRAY_nullfree##1                        16.80ms     59.53
array_constructor_ARRAY_nullfree##2                        27.02ms     37.01
array_constructor_ARRAY_nullfree##3                        38.03ms     26.30
array_constructor_ARRAY_nullfree##2_null                   52.86ms     18.92
array_constructor_ARRAY_nullfree##2_const                  54.97ms     18.19
array_constructor_ARRAY_nulls##1                           30.61ms     32.66
array_constructor_ARRAY_nulls##2                           55.01ms     18.18
array_constructor_ARRAY_nulls##3                           80.69ms     12.39
array_constructor_ARRAY_nulls##2_null                      69.10ms     14.47
array_constructor_ARRAY_nulls##2_const                    103.85ms      9.63

After:

array_constructor_ARRAY_nullfree##1                        15.25ms     65.58
array_constructor_ARRAY_nullfree##2                        25.11ms     39.82
array_constructor_ARRAY_nullfree##3                        34.59ms     28.91
array_constructor_ARRAY_nullfree##2_null                   53.61ms     18.65
array_constructor_ARRAY_nullfree##2_const                  51.48ms     19.42
array_constructor_ARRAY_nulls##1                           29.99ms     33.34
array_constructor_ARRAY_nulls##2                           55.91ms     17.89
array_constructor_ARRAY_nulls##3                           81.73ms     12.24
array_constructor_ARRAY_nulls##2_null                      66.97ms     14.93
array_constructor_ARRAY_nulls##2_const                     92.96ms     10.76

Before:

array_constructor_INTEGER_nullfree##1                      19.72ms     50.71
array_constructor_INTEGER_nullfree##2                      34.51ms     28.97
array_constructor_INTEGER_nullfree##3                      47.95ms     20.86
array_constructor_INTEGER_nullfree##2_null                 58.68ms     17.04
array_constructor_INTEGER_nullfree##2_const                45.15ms     22.15
array_constructor_INTEGER_nulls##1                         29.99ms     33.34
array_constructor_INTEGER_nulls##2                         55.32ms     18.08
array_constructor_INTEGER_nulls##3                         78.53ms     12.73
array_constructor_INTEGER_nulls##2_null                    72.24ms     13.84
array_constructor_INTEGER_nulls##2_const                   71.13ms     14.06

After:

array_constructor_INTEGER_nullfree##1                       3.39ms    294.89
array_constructor_INTEGER_nullfree##2                       7.35ms    136.10
array_constructor_INTEGER_nullfree##3                      10.78ms     92.74
array_constructor_INTEGER_nullfree##2_null                 11.29ms     88.57
array_constructor_INTEGER_nullfree##2_const                10.14ms     98.65
array_constructor_INTEGER_nulls##1                          4.49ms    222.53
array_constructor_INTEGER_nulls##2                          9.78ms    102.29
array_constructor_INTEGER_nulls##3                         14.69ms     68.08
array_constructor_INTEGER_nulls##2_null                    12.14ms     82.36
array_constructor_INTEGER_nulls##2_const                   12.27ms     81.53

Before:

array_constructor_MAP_nullfree##1                          17.34ms     57.65
array_constructor_MAP_nullfree##2                          29.84ms     33.51
array_constructor_MAP_nullfree##3                          41.51ms     24.09
array_constructor_MAP_nullfree##2_null                     56.57ms     17.68
array_constructor_MAP_nullfree##2_const                    71.68ms     13.95
array_constructor_MAP_nulls##1                             36.22ms     27.61
array_constructor_MAP_nulls##2                             68.18ms     14.67
array_constructor_MAP_nulls##3                             95.12ms     10.51
array_constructor_MAP_nulls##2_null                        86.42ms     11.57
array_constructor_MAP_nulls##2_const                      120.10ms      8.33

After:

array_constructor_MAP_nullfree##1                          17.05ms     58.66
array_constructor_MAP_nullfree##2                          28.42ms     35.18
array_constructor_MAP_nullfree##3                          36.96ms     27.06
array_constructor_MAP_nullfree##2_null                     55.64ms     17.97
array_constructor_MAP_nullfree##2_const                    67.53ms     14.81
array_constructor_MAP_nulls##1                             32.91ms     30.39
array_constructor_MAP_nulls##2                             64.50ms     15.50
array_constructor_MAP_nulls##3                             95.71ms     10.45
array_constructor_MAP_nulls##2_null                        77.22ms     12.95
array_constructor_MAP_nulls##2_const                      114.91ms      8.70

Before:

array_constructor_ROW_nullfree##1                          33.88ms     29.52
array_constructor_ROW_nullfree##2                          62.00ms     16.13
array_constructor_ROW_nullfree##3                          89.54ms     11.17
array_constructor_ROW_nullfree##2_null                     78.46ms     12.75
array_constructor_ROW_nullfree##2_const                    95.53ms     10.47
array_constructor_ROW_nulls##1                             44.11ms     22.67
array_constructor_ROW_nulls##2                            115.43ms      8.66
array_constructor_ROW_nulls##3                            173.61ms      5.76
array_constructor_ROW_nulls##2_null                       130.40ms      7.67
array_constructor_ROW_nulls##2_const                      169.97ms      5.88

After:

array_constructor_ROW_nullfree##1                           5.55ms    180.15
array_constructor_ROW_nullfree##2                          12.83ms     77.94
array_constructor_ROW_nullfree##3                          18.89ms     52.95
array_constructor_ROW_nullfree##2_null                     18.74ms     53.36
array_constructor_ROW_nullfree##2_const                    18.16ms     55.07
array_constructor_ROW_nulls##1                             11.29ms     88.61
array_constructor_ROW_nulls##2                             18.57ms     53.86
array_constructor_ROW_nulls##3                             34.20ms     29.24
array_constructor_ROW_nulls##2_null                        25.05ms     39.92
array_constructor_ROW_nulls##2_const                       25.15ms     39.77
```

Reviewed By: laithsakka

Differential Revision: D49272797

fbshipit-source-id: 55d83de7b69c7ae4b72b5a5ae62a7868f36b0e19
pramodsatya pushed a commit that referenced this pull request Sep 20, 2023
Summary:
Pull Request resolved: facebookincubator#6566

FlatVector::copyRanges is slow when there are many small ranges. I noticed this while investigating slowness of array_constructor which used created 1-row ranges: facebookincubator#5958 (comment)

```
FlatVector.h

  void copyRanges(
      const BaseVector* source,
      const folly::Range<const BaseVector::CopyRange*>& ranges) override {
    for (auto& range : ranges) {
      copy(source, range.targetIndex, range.sourceIndex, range.count);
    }
  }
```

This change optimizes FlatVector<T>::copyRanges using code from copy(source, targetIndex, sourceIndex, count) which copies one range. This change also replaces the latter call with a call to copyRanges, hence, the overall amount of code is about the same.

An earlier change optimized array_constructor to not use copyRanges for primitive types, but it is still used by Array/MapVector::copyRanges, although, these do not create 1-row ranges. Still the array_constructor benchmark shows some wins for arrays and maps.

```
Before:

array_constructor_ARRAY_nullfree##2_const                  54.31ms     18.41
array_constructor_ARRAY_nulls##1                           33.50ms     29.85
array_constructor_ARRAY_nulls##2                           59.05ms     16.93
array_constructor_ARRAY_nulls##3                           88.36ms     11.32
array_constructor_ARRAY_nulls##2_null                      74.53ms     13.42
array_constructor_ARRAY_nulls##2_const                    102.54ms      9.75

After:

array_constructor_ARRAY_nullfree##2_const                  41.36ms     24.18
array_constructor_ARRAY_nulls##1                           30.51ms     32.78
array_constructor_ARRAY_nulls##2                           55.13ms     18.14
array_constructor_ARRAY_nulls##3                           77.93ms     12.83
array_constructor_ARRAY_nulls##2_null                      68.84ms     14.53
array_constructor_ARRAY_nulls##2_const                     83.91ms     11.92

Before:

array_constructor_MAP_nullfree##2_const                    67.44ms     14.83
array_constructor_MAP_nulls##1                             37.00ms     27.03
array_constructor_MAP_nulls##2                             67.76ms     14.76
array_constructor_MAP_nulls##3                            100.88ms      9.91
array_constructor_MAP_nulls##2_null                        84.22ms     11.87
array_constructor_MAP_nulls##2_const                      122.55ms      8.16

After:

array_constructor_MAP_nullfree##2_const                    49.94ms     20.03
array_constructor_MAP_nulls##1                             34.34ms     29.12
array_constructor_MAP_nulls##2                             55.23ms     18.11
array_constructor_MAP_nulls##3                             82.64ms     12.10
array_constructor_MAP_nulls##2_null                        70.74ms     14.14
array_constructor_MAP_nulls##2_const                       88.13ms     11.35
```

Reviewed By: laithsakka

Differential Revision: D49269500

fbshipit-source-id: 7b702921202f4bb8d10a252eb0ab20f0e5792ae6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant