-
Notifications
You must be signed in to change notification settings - Fork 178
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
12d3d12
commit 902df9c
Showing
1 changed file
with
93 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,93 @@ | ||
|
||
## Arrangement Normal Form (ArrNF) | ||
|
||
Arrangement Normal Form (ArrNF) is a way to represent Materialize and Differential Dataflow computations. | ||
Each term in ArrNF is headed by an optional arrangement-forming operator, and the body of the term forms no other arrangements. | ||
|
||
The intent of ArrNF is to align the presentation of the planned work with the primary costs: arrangement formation and maintenance. | ||
ArrNF is meant to simplify the questions of "where", "what", and potentially "why" a computation requires arrangement resources. | ||
* **Where** does each arrangements occur? | ||
* **What** does each arrangement contain? | ||
* **Why** does each arrangement exist? | ||
|
||
The quality of the answers may vary, but ArrNF means to put their subjects as the subjects of its representation. | ||
|
||
ArrNF describes dataflows, which are comprised of a sequence of basic blocks, which are each an optional arrangement forming operator atop any number of arrangement-free operators. | ||
``` | ||
Dataflow ::= [ Block ]* | ||
Block ::= [ Arr ] Str | ||
Arr ::= ArrangeBy | Reduce | TopK | Threshold | ||
Str ::= Constant | Get | ||
::= [ Map | Filter | Project | FlatMap | Negate] Str | ||
::= Join [ Get ]* | ||
::= Union [ Str ]+ | ||
``` | ||
This grammar leaves various details unexplored, for example how `Get` references specific collections, how we determine whether a term is well-typed, and exactly what sort of behavior various operators have. | ||
Subtlely buried in here is that `Join` must not form arrangements, which means that "multi-way differential" joins are not covered, and they must be disassembled to their intermediate arrangement-forming parts. | ||
|
||
The `Str` term has four productions, corresponding to 0-ary "leaves", 1-ary linear operators, and variadic product (`Join`) and sum (`Union`) operators. | ||
These terms admit further normalization, if we are interested: as all the terms are linear, they can be re-arranged to form | ||
``` | ||
Str ::= Union [ [ Negate ] Unary Leaf ]+ | ||
Unary ::= [ Map | Filter | Project | FlatMap ]* | ||
Leaf ::= Constant | Get | Join [ Get ]+ | ||
``` | ||
Said aloud: all stream terms are `Union`s of (optionally negated) unary operators, applied to either a constant, another collection, or a join of other (arranged) collections. | ||
|
||
### An example | ||
|
||
Consider the following fragment, taken from a recent incident reproduction | ||
``` | ||
cte l4 = | ||
Project (#1..=#19) | ||
Union | ||
Negate | ||
Project (#1..=#20) | ||
Join on=(#0 = #21) type=differential | ||
Get l2 | ||
ArrangeBy keys=[[#0]] | ||
Distinct project=[#0] | ||
Project (#0) | ||
Get l3 | ||
Project (#1..=#20) | ||
Get l1 | ||
Project (#1..=#20) | ||
Get l3 | ||
``` | ||
We can rewrite this in ArrNF as two blocks, with a division forced at the `Distinct` operator. | ||
``` | ||
[l4.tmp0] | ||
Distinct Project(#0) Get l3 | ||
[l4] | ||
Union Negate Project(#2..=#20) Join on=(#0 = #21) Get(l2) l4.tmp0 | ||
Project(#2..=#20) Get l1 | ||
Project(#2..=#20) Get l3 | ||
``` | ||
The same information is present in both plans, but where the first provides nested scoping information about how the `Distinct` is used, the second calls out the moments where resources are used and what data must be captured. | ||
Arguably cosmetic differences like putting sequences of unary operators on one line reduce the cognitive load vs the same information in a tree representation. | ||
|
||
### Reversing out SQL | ||
|
||
One potential benefit of ArrNF is the ability to translate back to SQL. | ||
|
||
Each `Str` term, in the second and more opinionated form, can be reversed out to SQL without much work. | ||
The `Union` and optional `Negate` terms translate to `UNION ALL` and `EXCEPT ALL` (put the negative terms last). | ||
The `Unary Leaf` pair becomes a mostly standard "select-project-join" term, with some caveats around `FlatMap` and whether it fits in SQL at all. | ||
|
||
For example, the fragments above would become the SQL | ||
```sql | ||
CREATE VIEW l4 AS | ||
WITH l4.tmp0 AS SELECT DISTINCT (#0) FROM l3 | ||
-- Definition of l4 using l4.tmp0 | ||
SELECT #2 ..= #20 FROM l1 | ||
UNION ALL | ||
SELECT #2 ..= #20 FROM l3 | ||
EXCEPT ALL | ||
SELECT #2 ..= #20 FROM l2, l4.tmp WHERE #0 = #21; | ||
``` | ||
Although this SQL likely has little to do with the source SQL, it can be understood by most readers. | ||
It is a query that can potentially be run, modulo column referencing syntax, and its results counted and otherwise assessed by the user. | ||
|
||
In principle, one could go as far as rewriting input SQL into ArrNF SQL, as a SQL-centric form of plan pinning. | ||
ArrNF slices larger plans into basic blocks that are sufficiently simple that there is little to no further intra-block action for an optimizer, at least not that influences the arrangement resources used. |