Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge changes to fork #1

Merged
merged 198 commits into from
May 13, 2022
Merged

Merge changes to fork #1

merged 198 commits into from
May 13, 2022

Conversation

Arty-Maly
Copy link
Owner

No description provided.

laithsakka and others added 30 commits April 13, 2022 15:33
Summary: title.

Reviewed By: mbasmanova

Differential Revision: D35623222

fbshipit-source-id: e26b5ac0a04c57c898db5afa1adb74e9bcb7cd0b
Summary:
`Feature Of The Month` was incorrectly tagged as a chapter and hence
 it appeared in the main `Monthly Updates` list. It is now tagged as a section.

Pull Request resolved: #1395

Reviewed By: laithsakka

Differential Revision: D35627299

Pulled By: mbasmanova

fbshipit-source-id: 73f2d96fa0f4a925807f1ececca7096bfe2b4b16
…Results() (#1419)

Summary:
Pull Request resolved: #1419

Some queries might have local exchanges before INTERMEDIATE aggregation. The local exchanges produce dictionaries, so we need to expect these in CountAggregate::addSingleGroupIntermediateResults()
Example query: https://www.internalfb.com/intern/presto/query/?query_id=20220411_214537_01209_dehmi#sql
The diff:
* Adds support for any encoded vector in CountAggregate::addSingleGroupIntermediateResults().
* Adds support for RoundRobin Local Partition Node in PlanBuilder.
* Removes the check Partial Aggregation before the Final Aggregation.

Reviewed By: mbasmanova

Differential Revision: D35603243

fbshipit-source-id: 273932df0c6faabf2e8135d9d4a33216bc9255cf
Summary:
Pull Request resolved: #1413

title

Reviewed By: kevinwilfong

Differential Revision: D35479812

fbshipit-source-id: 1de41765ac69939d43e408b8a9c4b22e73672ece
Summary:
Pull Request resolved: #1418

to be consistent with ArrayView and MapView, materialize is added as well to StringView.

Reviewed By: kevinwilfong

Differential Revision: D35623645

fbshipit-source-id: 900d6f16f2da7060206730faa2ee25ebd56ba235
Summary:
Start tracking CPU and wall time spent evaluating each expression. Also, track total number of rows processed.

Pull Request resolved: #1412

Reviewed By: pedroerp

Differential Revision: D35622286

Pulled By: mbasmanova

fbshipit-source-id: 3c2a9cbc377757cf08fb086ce0c3941cdcfa924c
Summary: Pull Request resolved: #1420

Reviewed By: spershin

Differential Revision: D35631589

Pulled By: mbasmanova

fbshipit-source-id: 0fb04f11ca44a22b92c72464be68fcd56fdbd752
Summary: Pull Request resolved: #1421

Reviewed By: spershin

Differential Revision: D35632552

Pulled By: mbasmanova

fbshipit-source-id: de9545cb6330caeecab3e62d67a252f058d70e69
Summary:
Use ContinuePromise instead of VeloxPromise<bool> throughout the code base for
consistency (with ContinueFuture) and readability.

Pull Request resolved: #1410

Reviewed By: kagamiori

Differential Revision: D35617219

Pulled By: mbasmanova

fbshipit-source-id: 5bce0aad8f7bb9f97d94bc688751c5594aa23783
Summary:
Parquet reader didn't use ScanSpec to figure out the output layout. In case of
q18 this resulted in the reader returning columns in the wrong order, i.e.
(custkey, orderkey) instead of (orderkey, custkey).

Also, added support for kBigintValuesUsingHashTable filter, a contribution from
Ge (gggrace14).

Pull Request resolved: #1423

Reviewed By: gggrace14

Differential Revision: D35635866

Pulled By: mbasmanova

fbshipit-source-id: ba6306a7539b2d5bbaa4b197f6f1b893ed8a0d0f
Summary:
Add a version of PlanBuilder::tableScan() that allows to specify table name and
column aliases in addition to SQL expressions for range and remaining filters.

This allows to write TPC-H query plans more easily. Here is an example from q6:

```
.tableScan(
    kLineitem,
    selectedRowType,
    fileColumnNames,
    {"l_discount between 0.05 and 0.07",
     "l_quantity < 24.0"})
```

Pull Request resolved: #1414

Reviewed By: kgpai

Differential Revision: D35628758

Pulled By: mbasmanova

fbshipit-source-id: 183fcbd836dbd505d5aee74b14c9c0a695adcefc
…#1424)

Summary:
Pull Request resolved: #1424

Some queries might have local exchanges before INTERMEDIATE aggregation. The local exchanges produce dictionaries, so we need to expect these in addSingleGroupIntermediateResults() of the array_agg, map_agg and checksum aggregations.

Reviewed By: mbasmanova

Differential Revision: D35636100

fbshipit-source-id: 2c18bf01faba407e4070feecaf7acd0c904c286a
Summary:
Add helper function to print expression tree annotation with runtime stats such
as cpu time and number of processed rows.

Also, add cpu time tracking for Cast expression. Tracking for other built-in
expressions such as AND, OR, SWITCH will come in follow-up PRs.

Here are some examples:

2 expressions evaluated on flat vectors of size 1024: `(c0 + 3) * c1`, `(c0 + c1) % 2 = 0`

```
multiply [cpu time: 53.70us, rows: 1024] -> BIGINT
   plus [cpu time: 70.69us, rows: 1024] -> BIGINT
      cast(c0 as BIGINT) [cpu time: 132.21us, rows: 1024] -> BIGINT
         c0 [cpu time: 0ns, rows: 0] -> INTEGER
      3:BIGINT [cpu time: 0ns, rows: 0] -> BIGINT
   cast(c1 as BIGINT) [cpu time: 67.17us, rows: 1024] -> BIGINT
      c1 [cpu time: 0ns, rows: 0] -> INTEGER

eq [cpu time: 87.21us, rows: 1024] -> BOOLEAN
   mod [cpu time: 55.06us, rows: 1024] -> BIGINT
      cast(plus as BIGINT) [cpu time: 64.71us, rows: 1024] -> BIGINT
         plus [cpu time: 60.85us, rows: 1024] -> INTEGER
            c0 [cpu time: 0ns, rows: 0] -> INTEGER
            c1 [cpu time: 0ns, rows: 0] -> INTEGER
      2:BIGINT [cpu time: 0ns, rows: 0] -> BIGINT
   0:BIGINT [cpu time: 0ns, rows: 0] -> BIGINT
```

2 expressions evaluated on dictionary-encoded vectors of size 1024 where each
row is repeated 5 times: `(c0 + 3) * c1`, `(c0 + c1) % 2 = 0`. Notice that
evaluation happened only on 1/5 of the rows.

```
multiply [cpu time: 35.52us, rows: 205] -> BIGINT
   plus [cpu time: 46.66us, rows: 205] -> BIGINT
      cast(c0 as BIGINT) [cpu time: 80.15us, rows: 205] -> BIGINT
         c0 [cpu time: 0ns, rows: 0] -> INTEGER
      3:BIGINT [cpu time: 0ns, rows: 0] -> BIGINT
   cast(c1 as BIGINT) [cpu time: 34.44us, rows: 205] -> BIGINT
      c1 [cpu time: 0ns, rows: 0] -> INTEGER

eq [cpu time: 43.46us, rows: 205] -> BOOLEAN
   mod [cpu time: 35.93us, rows: 205] -> BIGINT
      cast(plus as BIGINT) [cpu time: 34.83us, rows: 205] -> BIGINT
         plus [cpu time: 34.43us, rows: 205] -> INTEGER
            c0 [cpu time: 0ns, rows: 0] -> INTEGER
            c1 [cpu time: 0ns, rows: 0] -> INTEGER
      2:BIGINT [cpu time: 0ns, rows: 0] -> BIGINT
   0:BIGINT [cpu time: 0ns, rows: 0] -> BIGINT
```

Pull Request resolved: #1425

Reviewed By: pedroerp

Differential Revision: D35643641

Pulled By: mbasmanova

fbshipit-source-id: f0f563776bb0249f29d30dde5c1e8efa4710cd24
…1347)

Summary:
Pull Request resolved: #1347

SimpleVector uses a string-ly typed map to store metadata.  This seems to be used primarily for storing min/max stats as well as the bias for BiasVectors.

This is inefficient because we end up de/serializing back and forth from strings, and doesn't work for all types (e.g. opaque types).  This causes issues in VectorMaker as it tries to initialize stats when creating vectors, so we can't create vectors of Opaque types.

Since SimpleVector is templated on the underlying type, I add a SimpleVectorStats object, also templated on this type, and pass this to the vector's constructor instead of the map of strings.  This way min/max don't need to be serialized/deserialized, and it works for any type.

Reviewed By: laithsakka

Differential Revision: D35363043

fbshipit-source-id: 674cb6d8c2ac4d47a94c9ac9a2138fae4d618c86
Summary:
Add a mechanism to register a listener to receive runtime statistics about
expression evaluation at destruction of ExprSet. This is useful for logging and
debugging. An earlier PR #1404 introduced a similar mechanism to receive
runtime statistics about Task execution.

Pull Request resolved: #1428

Reviewed By: pedroerp

Differential Revision: D35656774

Pulled By: mbasmanova

fbshipit-source-id: 6f18a75d622b44800ec6d8a28984d58cd592ac3f
…1415)

Summary:
Pull Request resolved: #1415

This part is step one towards deprecating old writers.
The following diff will replace ArrayWriterT and switch out_type<Array> meaning, the same for all other types.

Reviewed By: mbasmanova

Differential Revision: D35625851

fbshipit-source-id: 29b01bdbce75fa0f694d17495240981ad380b9c7
Summary: Upgrade libbpf to latest version in getdeps.

Reviewed By: sharmafb

Differential Revision: D35595843

fbshipit-source-id: 0baa7ad9ee031214990d25034bc71c18e69c564f
Reviewed By: YiqiuLiu

Differential Revision: D35679075

fbshipit-source-id: c778564a4a3d73b031222907119913481a4f57a1
Summary:
Pull Request resolved: #1432

Fulfilling promises under a lock runs a risk of dealocking in case when the promise callback also uses some locks and is run in the same executor.
The common practice is to move out promises from class members into the local variables unders the lock and then fullfill promises, using local variables, outside of the lock.

Reviewed By: tanjialiang

Differential Revision: D35682204

fbshipit-source-id: 4e3c4519c5d07d75df1ae1b6028fa1f58ae11826
Summary:
GitHub commits:

facebook/folly@851645e

Reviewed By: wittgenst

fbshipit-source-id: d6a3288e301a9afab5b2a99a9c1d3e802136a289
Summary:
Decom ZMQ from OSS build
{gif:p0tv3fvs}

Reviewed By: shih-hao-tseng

Differential Revision: D35223763

fbshipit-source-id: 11e559c1a26ae0b7d651fc60ddcd9c2f9ff1dc4c
Summary:
GitHub commits:

facebook/folly@39e5f2f

Reviewed By: wittgenst

fbshipit-source-id: 7a698a77f9e794f7e6b4020cddb048ccc1dd90e5
Summary:
GitHub commits:

facebook/folly@0534212

Reviewed By: wittgenst

fbshipit-source-id: 916b998b99413f61fa67ecd72144dae6eec31d45
Summary:
GitHub commits:

facebook/folly@54b24d0
facebook/mvfst@744ae1f

Reviewed By: wittgenst

fbshipit-source-id: 1e5204362eff2f019c14b2d21b230f3b431c7ee2
Summary:
GitHub commits:

facebook/folly@b7a60b2

Reviewed By: yns88

fbshipit-source-id: 4c469d1869e7d5b3a8212073cef22ce6a71d82d3
Summary:
Pull Request resolved: #1433

This method is being called under a lock.

Reviewed By: tanjialiang

Differential Revision: D35699012

fbshipit-source-id: 825fd3c44919cc7099a569051727a6f0b52d09b9
Summary:
Pull Request resolved: #1417

- This diff adds the support of type access and cast to generics.
- It introduces 4 functions that can be called on the GenericView:
   - type()
   - kind()
   -  castTo<T>: performs an unchecked cast and returns arg_type<T>. A safety debug time type
      check will happen.
   -  tryCastTo<T>: return std::optional<returns arg_type<T>>, performs  unchecked cast. return
      std::null opt if T does not match the type of the vector.

- **Cost**:
   - The first time we do the cast we create the readers corresponding to that type. Then for the
coming rows, the cost is a couple of instructions; checking reader is created,  accessing the reader
in the variant and returning the element at the row index. In some non-common cases there is additional
check that the type casted to is consistent across rows.
  - TryCastTo is more expensive, since it does a type check as well.
  - In general its not expensive to use it with complex types, but avoid using it with primitives by either
implementing a function specialized when input is primitive (see == function as example ).  Or casting to
complex types already specialized with primitive types, i.e. Array<int> Array<double> instead of Array<Any>
 and then casting Any to int for every element.

- **What can be casted to?**
This diff enabled cast to all basic types plus Array<Any> Map<Any, Any>, Row<Any>, Row<Any, Any> and
Row<Any, Any,..etc> up to 5.. This allow to recursively traverse complex types.

- **How to add a new casted to type?**
  - Any type except Generic<> it self, (Cast to self type) or Variadic<...>.

- The diff adds example function HasDuplicate, which checks if an array has duplicate items.

Reviewed By: kevinwilfong

Differential Revision: D35616634

fbshipit-source-id: 847af1dfc3af9b49ce32386b05bef585be955a81
Sergey Pershin and others added 28 commits May 11, 2022 09:46
Summary:
Pull Request resolved: #1571

X-link: facebookexternal/presto_cpp#739

Adding support of constant partitioning columns to HashPartitionFunction and HivePartitionFunction.
Few other cosmetic changes.

Reviewed By: mbasmanova

Differential Revision: D36273727

fbshipit-source-id: 12f6a92e638d6f013a4f6b088a9f78902c9f7398
Summary:
Pull Request resolved: #1578

Add a PlanBuilder.tableScane helper function to reduce the amount of
boilerplace when pulling data out of TPC-H connector.

Reviewed By: kagamiori

Differential Revision: D36303299

fbshipit-source-id: d5898559a94c97850a38711a90202caf24e8fca6
Summary:
GitHub commits:

facebook/fbthrift@199e05a
facebook/folly@a95ef47
facebook/litho@3a3cbb8
facebookresearch/FLSim@170043f

Reviewed By: wittgenst

fbshipit-source-id: 1298c60ab21d870a28b70c967f5119e5cad5cf85
Summary: Pull Request resolved: #1575

Reviewed By: kgpai

Differential Revision: D36282667

fbshipit-source-id: 47b8a98ae4253efd6011421928cc3bf0ae7dae11
…ctionAdapter (#1576)

Summary:
Pull Request resolved: #1576

This diff adds a reader optimized for reading primitive values that are encoded in
ConstantVectors or FlatVectors.

It avoids the expensive decoding done in DecodedVectors and branches needed to
access data in those vectors by recognizing that data in FlatVectors can be accessed
in rawValues/rawNulls directly using the index from the caller, and using index 0 in the
case of ConstantVectors.  This is equivalent to multiplying the index provided by the
caller by 1 or 0 respectively.

To keep the explosion factor of template functions in the SimpleFunctionAdapter down
from 3^n if we were to use a separate ConstantVectorReader and FlatVectorReader, or
2^n if we were to use a combined ConstantFlatVectorReader, the optimization is only
applied if all primitives are encoded using Constant or Flat Vectors, keeping the factor
to a constant 2.  This is a fairly common case with nested functions given that
SimpleFunctions always produce FlatVectors, and literals are always ConstantVectors.

Reviewed By: mbasmanova

Differential Revision: D36289776

fbshipit-source-id: cf99f9d3ae6710701cc8b33ea872effa7794e64b
Summary:
GitHub commits:

facebook/fbthrift@b77ce24
facebook/folly@43b2bd8
facebook/litho@e637f74

Reviewed By: wittgenst

fbshipit-source-id: 47381ea11f709f241c831e427b9251c75e513004
Summary:
Applies the black-fbsource codemod with the new build of pyfmt.

paintitblack

Reviewed By: lisroach

Differential Revision: D36324783

fbshipit-source-id: 280c09e88257e5e569ab729691165d8dedd767bc
Summary:
Updates QueryCtx.h to pass Mac CI build.

Pull Request resolved: #1532

Reviewed By: kgpai

Differential Revision: D36186319

Pulled By: oerling

fbshipit-source-id: 427f0b5d128082cbe98bde692708df6705276f68
Summary:
Pull Request resolved: #1586

X-link: facebookexternal/presto_cpp#743

Adding unit tests for HivePartitionFunction and VectorHasher.
Using typedef for some shared pointers.
Fixing bug in VectorHasher, when we used zero for precomputed hash for null values, should use 'kNullHash'.

Reviewed By: mbasmanova

Differential Revision: D36335387

fbshipit-source-id: 71aac0f5bdf4feb26bcc99d42968e38edb449b94
Summary:
X-link: facebookexternal/presto_cpp#723

Split VectorUdfTypeSystem.h into :
1. UdfTypeResolver.h
2. StringWriter.h
3. VectorReaders.h
4. VectorWriters.h

Why?
1. I need to access the resolver from ComplexViewTypes.h,
which would create a cycle.
2. Better organization and reduce header bloating.

Reviewed By: kevinwilfong

Differential Revision: D35908921

fbshipit-source-id: d60e93a7311b0772f056a80e49152ae19e282f44
Summary:
Pull Request resolved: #1485

The copy_from utilizes the vector copy_from which uses
memcpy when possible. And hence is faster than copying
element by element.

Reviewed By: kevinwilfong

Differential Revision: D35952258

fbshipit-source-id: d64dc783dc58f01add49afcb42d6b467edec160b
Summary:
Pull Request resolved: #1494

title

Reviewed By: kevinwilfong

Differential Revision: D35981026

fbshipit-source-id: dac1cdb6d24d07f977a8d148f5014be99879e5a0
…g. (#1495)

Summary:
Pull Request resolved: #1495

since vector writers increase vectors sizes exponentially, finish is called at the end of all rows processing to downsize to the actual needed size.

finish was not called recursively leaving
some unused vector space allocated.

Reviewed By: kevinwilfong

Differential Revision: D35986157

fbshipit-source-id: 00d254b6c1d17fc1d9c306eef0c59d4a48fe2245
Summary:
Pull Request resolved: #1496

title.

Reviewed By: kevinwilfong

Differential Revision: D35987071

fbshipit-source-id: 57f06c7d9c951e944ad009e4552aa0bf5aa7623e
Summary:
Pull Request resolved: #1507

- rename VectorOptionalValueAccessor to OptionalAccessor.
- template OptionalAccessor on the Velox type T instead of VectorReader<T>.

Reviewed By: kevinwilfong

Differential Revision: D36018419

fbshipit-source-id: 4b3400056ff9ce2b46e7db09ea47b74e5b1d8944
Summary: Pull Request resolved: #1593

Reviewed By: Yuhta

Differential Revision: D36344603

Pulled By: mbasmanova

fbshipit-source-id: 8dcafa0f5e4de93b45b5f546f8c9d79baf12ae1b
Summary: Pull Request resolved: #1591

Reviewed By: Yuhta

Differential Revision: D36343988

Pulled By: mbasmanova

fbshipit-source-id: 1cc5077dc71157546d7206c0e7f340f2ca63bad9
Summary:
Fix double counting for common sub-expressions (CTEs) when aggregating stats for
ExprSetListener.

Pull Request resolved: #1594

Reviewed By: Yuhta

Differential Revision: D36345242

Pulled By: mbasmanova

fbshipit-source-id: 7d635612f1d447b01de24fd916b1236f1189cad2
Summary:
Add support for more types in `HivePartitionFunction`:
- `TINYINT`
- `SMALLINT`
- `INTEGER`
- `REAL`
- `DOUBLE`
- `VARBINARY`
- `TIMESTAMP`
- `DATE`

Fixes #327

Pull Request resolved: #1466

Reviewed By: Yuhta

Differential Revision: D36345326

Pulled By: mbasmanova

fbshipit-source-id: a32b76e8608cac3a8044ef0d468a2c479f62037d
Summary: Pull Request resolved: #1597

Reviewed By: Yuhta

Differential Revision: D36347521

Pulled By: mbasmanova

fbshipit-source-id: de5d58f0963f7f8acba01081aae05cb550b56a18
Summary:
Do not hard-code "0" for plan node ID used to add splits to. Instead, check that
plan has a single leaf node and use it's ID.

Use PlanNodePtr and RowTypePtr for readability.

Refactor HiveConnectorTestBase to reduce copy-paste.

Pull Request resolved: #1596

Reviewed By: Yuhta

Differential Revision: D36347173

Pulled By: mbasmanova

fbshipit-source-id: 201f9a6f76472237548759efecb6cab6e3078a41
Summary:
Similar to #1574

It is useful to know the average batch / vector size used in expression
evaluation. We already collect total number of processed rows. This change is
to start collecting total number of processed batches / vectors. This allows to
compute the average batch size as num rows / num batches.

Pull Request resolved: #1595

Reviewed By: kagamiori

Differential Revision: D36345658

Pulled By: mbasmanova

fbshipit-source-id: 11ee568830b5e4925ffbb17934b819dc3620eda3
Summary:
GitHub commits:

facebook/folly@ff8474e
a4fe613
facebookresearch/beanmachine@9682522

Reviewed By: wittgenst

fbshipit-source-id: fba0199c03071fcb38d9d08ae949ee79a0ebe04a
Summary: Pull Request resolved: #1587

Reviewed By: laithsakka, kgpai

Differential Revision: D36361879

Pulled By: mbasmanova

fbshipit-source-id: 19c2a4e4f256995d7d3ad434e9f47cc18f7303af
@Arty-Maly Arty-Maly merged commit 0e4b9d3 into Arty-Maly:main May 13, 2022
Arty-Maly pushed a commit that referenced this pull request May 13, 2022
…tor#1500)

Summary:
Enhance printExprWithStats to identify common-sub expressions.

For example, `c0 + c1` is a common sub-expression in
`"(c0 + c1) % 5", " (c0 + c1) % 3"` expression set. It is evaluated only once and
there is a single Expr object that represents it. That object appears in the
expression tree twice. printExprWithStats does not show the runtime stats for
second instance of that expression and instead annotates it with `[CSE https://github.com/facebookincubator/velox/issues/2]`,
where CSE stands for common sub-expression and 2 refers to the first instance
of the expression.

```
mod [cpu time: 50.49us, rows: 1024] -> BIGINT [#1]
   cast(plus as BIGINT) [cpu time: 68.15us, rows: 1024] -> BIGINT [facebookincubator#2]
      plus [cpu time: 51.84us, rows: 1024] -> INTEGER [facebookincubator#3]
         c0 [cpu time: 0ns, rows: 0] -> INTEGER [facebookincubator#4]
         c1 [cpu time: 0ns, rows: 0] -> INTEGER [facebookincubator#5]
   5:BIGINT [cpu time: 0ns, rows: 0] -> BIGINT [facebookincubator#6]

mod [cpu time: 49.29us, rows: 1024] -> BIGINT [facebookincubator#7]
   cast((plus(c0, c1)) as BIGINT) -> BIGINT [CSE facebookincubator#2]
   3:BIGINT [cpu time: 0ns, rows: 0] -> BIGINT [facebookincubator#8]
```

Pull Request resolved: facebookincubator#1500

Reviewed By: Yuhta

Differential Revision: D35994836

Pulled By: mbasmanova

fbshipit-source-id: 6bacbbe61b68dad97ce2fd5f99610c4ad55897be
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.