Update TPCH documentation to include schemas, table names, row counts
emily-chan authored and steveburnett committed Nov 20, 2024
1 parent 8c0b38b commit 12d69d2
Showing 1 changed file with 34 additions and 5 deletions.
39 changes: 34 additions & 5 deletions presto-docs/src/main/sphinx/connector/tpch.rst
@@ -42,7 +42,7 @@ The TPCH connector supplies several schemas::
sf3000
sf30000
tiny
(11 rows)
(10 rows)
Ignore the standard schema ``information_schema`` which exists in every
catalog and is not directly provided by the TPCH connector.
@@ -51,10 +51,39 @@ Every TPCH schema provides the same set of tables. Some tables are
identical in all schemas. Other tables vary based on the *scale factor*
which is determined based on the schema name. For example, the schema
``sf1`` corresponds to scale factor ``1`` and the schema ``sf300``
corresponds to scale factor ``300``. The TPCH connector provides an
infinite number of schemas for any scale factor, not just the few common
ones listed by ``SHOW SCHEMAS``. The ``tiny`` schema is an alias for scale
factor ``0.01``, which is a very small data set useful for testing.
corresponds to scale factor ``300``. The scale factor represents the approximate size,
in gigabytes, of the entire set of tables when stored uncompressed. For example,
``sf1`` implies that writing all tables to disk uncompressed would require approximately 1 GB.
The TPCH connector accepts a schema for any scale factor,
including floating-point values,
not just the few common ones listed by ``SHOW SCHEMAS``.
The ``tiny`` schema is an alias for scale factor ``0.01``,
which is a very small data set useful for testing.
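
The naming rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this section, not the connector's own code; the function name ``schema_to_scale_factor`` is hypothetical, and the rejection of non-positive values is an assumption.

```python
import math


def schema_to_scale_factor(schema: str) -> float:
    """Map a TPCH schema name to its scale factor (illustrative only)."""
    # ``tiny`` is documented as an alias for scale factor 0.01.
    if schema == "tiny":
        return 0.01
    # Every other schema name is expected to look like ``sf<number>``.
    if not schema.startswith("sf"):
        raise ValueError(f"not a TPCH scale-factor schema: {schema!r}")
    # float() raises ValueError when the suffix is not numeric.
    factor = float(schema[2:])
    # Assumption: only finite, positive scale factors make sense.
    if not math.isfinite(factor) or factor <= 0:
        raise ValueError(f"invalid scale factor in schema: {schema!r}")
    return factor
```

For example, ``schema_to_scale_factor("sf300")`` returns ``300.0``, and ``schema_to_scale_factor("tiny")`` returns ``0.01``.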

For more information, see the `TPCH Specification document <https://www.tpc.org/TPC_Documents_Current_Versions/pdf/TPC-H_v3.0.1.pdf>`_.
Section 1.2 of that document defines the TPCH table schemas.

Schema Scale Factors and Corresponding Table Row Counts
-------------------------------------------------------
The following example query returns the row count of the ``customer`` table in the ``sf1`` schema:

.. code-block:: sql

    SELECT COUNT(*) FROM tpch.sf1.customer;

===============  ==========  ==========  ===========  ============  =============  =============  ============  =============  =============
Schema           ``tiny``    ``sf1``     ``sf100``    ``sf1000``    ``sf10000``    ``sf100000``   ``sf300``     ``sf3000``     ``sf30000``
Table Name
===============  ==========  ==========  ===========  ============  =============  =============  ============  =============  =============
``customer``     1.5K        150K        15M          150M          1.5B           15B            45M           450M           4.5B
``lineitem``     60K         6M          600M         6B            60B            600B           1.8B          18B            180B
``nation``       25          25          25           25            25             25             25            25             25
``orders``       15K         1.5M        150M         1.5B          15B            150B           450M          4.5B           45B
``part``         2K          200K        20M          200M          2B             20B            60M           600M           6B
``partsupp``     8K          800K        80M          800M          8B             80B            240M          2.4B           24B
``region``       5           5           5            5             5              5              5             5              5
``supplier``     100         10K         1M           10M           100M           1B             3M            30M            300M
===============  ==========  ==========  ===========  ============  =============  =============  ============  =============  =============
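
The figures in the table follow a simple pattern: ``nation`` and ``region`` have the same row count in every schema, while the other tables grow in proportion to the scale factor. A rough sketch of that relationship is shown here. It uses the rounded ``sf1`` values from the table as the baseline, so its results are approximations; the names ``SF1_ROWS``, ``FIXED_ROWS``, and ``estimate_rows`` are hypothetical.

```python
# Approximate row counts at scale factor 1, taken from the table above.
SF1_ROWS = {
    "customer": 150_000,
    "lineitem": 6_000_000,
    "orders": 1_500_000,
    "part": 200_000,
    "partsupp": 800_000,
    "supplier": 10_000,
}

# Tables whose size does not depend on the scale factor.
FIXED_ROWS = {"nation": 25, "region": 5}


def estimate_rows(table: str, scale_factor: float) -> int:
    """Estimate the row count of a TPCH table at a given scale factor."""
    if table in FIXED_ROWS:
        return FIXED_ROWS[table]
    if table not in SF1_ROWS:
        raise KeyError(f"unknown TPCH table: {table!r}")
    # Linear scaling from the sf1 baseline; rounding absorbs float error.
    return round(SF1_ROWS[table] * scale_factor)
```

For instance, ``estimate_rows("customer", 0.01)`` gives 1500, which matches the ``tiny`` column, and ``estimate_rows("lineitem", 300)`` gives 1800000000, which matches the ``sf300`` column. Running ``SELECT COUNT(*)`` against the connector remains the way to obtain exact counts.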

General Configuration Properties
---------------------------------