From 12d69d2f2bdcc4ceee141e92211b8a0738fe945c Mon Sep 17 00:00:00 2001
From: Emily Chan
Date: Wed, 13 Nov 2024 16:25:04 -0800
Subject: [PATCH] Update TPCH documentation to include schemas, table names, row counts

---
 .../src/main/sphinx/connector/tpch.rst        | 39 ++++++++++++++++---
 1 file changed, 34 insertions(+), 5 deletions(-)

diff --git a/presto-docs/src/main/sphinx/connector/tpch.rst b/presto-docs/src/main/sphinx/connector/tpch.rst
index 6f0a346715984..28f40897b1fd3 100644
--- a/presto-docs/src/main/sphinx/connector/tpch.rst
+++ b/presto-docs/src/main/sphinx/connector/tpch.rst
@@ -42,7 +42,7 @@ The TPCH connector supplies several schemas::
     sf3000
     sf30000
     tiny
-    (11 rows)
+    (10 rows)
 
 Ignore the standard schema ``information_schema`` which exists in every
 catalog and is not directly provided by the TPCH connector.
@@ -51,10 +51,39 @@ Every TPCH schema provides the same set of tables. Some tables are
 identical in all schemas. Other tables vary based on the *scale factor*
 which is determined based on the schema name. For example, the schema
 ``sf1`` corresponds to scale factor ``1`` and the schema ``sf300``
-corresponds to scale factor ``300``. The TPCH connector provides an
-infinite number of schemas for any scale factor, not just the few common
-ones listed by ``SHOW SCHEMAS``. The ``tiny`` schema is an alias for scale
-factor ``0.01``, which is a very small data set useful for testing.
+corresponds to scale factor ``300``. The scale factor represents the approximate size,
+in gigabytes, of the entire set of tables when stored uncompressed. For example,
+``sf1`` implies that writing all tables to disk uncompressed would require approximately 1 GB.
+The TPCH connector provides an infinite number of schemas
+for any scale factor, including floating-point values,
+not just the few common ones listed by ``SHOW SCHEMAS``.
+The ``tiny`` schema is an alias for scale factor ``0.01``,
+which is a very small data set useful for testing.
+
+For more information, review the `TPCH Specification document `_.
+In section 1.2 of the document, the TPCH schemas are provided.
+
+Schema Scale Factors and Corresponding Table Row Counts
+-------------------------------------------------------
+Example query to return row counts from schema ``sf1`` and table ``customer``:
+
+.. code-block:: sql
+
+    SELECT COUNT(*) FROM tpch.sf1.customer;
+
+=============== ========== ========== =========== ============ ============= ============= ============ ============= =============
+Schema          ``tiny``   ``sf1``    ``sf100``   ``sf1000``   ``sf10000``   ``sf100000``  ``sf300``    ``sf3000``    ``sf30000``
+Table Name
+=============== ========== ========== =========== ============ ============= ============= ============ ============= =============
+``customer``    1.5K       150K       15M         150M         1.5B          15B           45M          450M          4.5B
+``lineitem``    60K        6M         600M        6B           60B           600B          1.8B         18B           180B
+``nation``      25         25         25          25           25            25            25           25            25
+``orders``      15K        1.5M       150M        1.5B         15B           150B          450M         4.5B          45B
+``part``        2K         200K       20M         200M         2B            20B           60M          600M          6B
+``partsupp``    8K         800K       80M         800M         8B            80B           240M         2.4B          24B
+``region``      5          5          5           5            5             5             5            5             5
+``supplier``    100        10K        1M          10M          100M          1B            3M           30M           300M
+=============== ========== ========== =========== ============ ============= ============= ============ ============= =============
 
 General Configuration Properties
 ---------------------------------
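
Note on the row-count table in this patch: the listed values appear to scale linearly with the scale factor, except for ``nation`` and ``region``, which stay fixed. The sketch below is not part of the patch or of Presto. It is a hypothetical helper (``approx_row_count`` and ``SF1_ROWS`` are names invented here) that reproduces the table's rounded figures under that linear-scaling assumption, which may help when sanity-checking a schema the table does not list. The ``lineitem`` figure in particular is the rounded value from the table, not an exact count.

```python
# Rounded row counts for scale factor 1, copied from the ``sf1`` column
# of the table in the patch. These are approximations, not exact counts.
SF1_ROWS = {
    "customer": 150_000,
    "lineitem": 6_000_000,
    "nation": 25,
    "orders": 1_500_000,
    "part": 200_000,
    "partsupp": 800_000,
    "region": 5,
    "supplier": 10_000,
}

# Tables whose size is the same in every schema, per the table in the patch.
FIXED_TABLES = {"nation", "region"}


def approx_row_count(table: str, scale_factor: float) -> int:
    """Estimate a table's row count, assuming linear scaling with the scale factor."""
    base = SF1_ROWS[table]
    if table in FIXED_TABLES:
        return base
    return round(base * scale_factor)


if __name__ == "__main__":
    # The ``tiny`` schema is an alias for scale factor 0.01.
    for name in sorted(SF1_ROWS):
        print(name, approx_row_count(name, 0.01))
```

An estimate from this helper can be compared against the result of ``SELECT COUNT(*)`` on the corresponding schema; small differences for ``lineitem`` are expected because the table reports rounded values.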