Add documentation for file-based metastore
steveburnett committed Feb 7, 2025
1 parent a6d82fe commit 35be659
Showing 3 changed files with 81 additions and 0 deletions.
1 change: 1 addition & 0 deletions presto-docs/src/main/sphinx/connector.rst
@@ -17,6 +17,7 @@ from different data sources.
connector/deltalake
connector/druid
connector/elasticsearch
connector/file-based-metastore
connector/googlesheets
connector/hana
connector/hive
78 changes: 78 additions & 0 deletions presto-docs/src/main/sphinx/connector/file-based-metastore.rst
@@ -0,0 +1,78 @@
====================
File-Based Metastore
====================

.. contents::
    :local:
    :backlinks: none
    :depth: 1

Overview
^^^^^^^^

For testing or development purposes, Presto can be configured to use a
filesystem directory as a Hive metastore. The directory can be on the local
filesystem or on a remote filesystem such as Amazon S3.

The file-based metastore works only with the following connectors:

* :doc:`/connector/deltalake`
* :doc:`/connector/hive`
* :doc:`/connector/hudi`
* :doc:`/connector/iceberg`

Configuring a File-Based Metastore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1. In ``etc/catalog/``, find the catalog properties file for the supported
   connector.

2. In the catalog properties file, set the following properties:

   .. code-block:: none

       hive.metastore=file
       hive.metastore.catalog.dir=file:///<catalog-dir>

   Replace ``<catalog-dir>`` in the example with the path to a directory on a
   filesystem that Presto can access.
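The same two properties can also be set from a script. The following is a minimal Python sketch. It assumes the catalog file is ``etc/catalog/hive.properties`` and the metastore directory is ``/data/hive_data``. The helper ``set_file_metastore`` is hypothetical and is not part of Presto.

```python
from pathlib import Path


def set_file_metastore(properties_file: Path, catalog_dir: Path) -> None:
    """Hypothetical helper: point a catalog at a file-based metastore.

    Sets hive.metastore=file and hive.metastore.catalog.dir to a file:/// URI.
    Other lines are kept unchanged, and missing keys are appended.
    """
    catalog_dir = catalog_dir.resolve()
    # Create the metastore directory up front so the path is known to exist.
    catalog_dir.mkdir(parents=True, exist_ok=True)
    wanted = {
        "hive.metastore": "file",
        # Path.as_uri() turns an absolute path into a file:/// URI.
        "hive.metastore.catalog.dir": catalog_dir.as_uri(),
    }
    lines = properties_file.read_text().splitlines() if properties_file.exists() else []
    out, seen = [], set()
    for line in lines:
        key = line.split("=", 1)[0].strip()
        if key in wanted:
            if key not in seen:
                # Replace the first occurrence in place.
                out.append(f"{key}={wanted[key]}")
                seen.add(key)
            # Later duplicates of a managed key are dropped.
        else:
            out.append(line)
    out.extend(f"{k}={v}" for k, v in wanted.items() if k not in seen)
    properties_file.write_text("\n".join(out) + "\n")
```

For example, ``set_file_metastore(Path("etc/catalog/hive.properties"), Path("/data/hive_data"))`` writes ``hive.metastore.catalog.dir=file:///data/hive_data`` into that file.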

Using a File-Based Warehouse
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Create a schema:

.. code-block:: none

    CREATE SCHEMA hive.warehouse;

With ``hive.metastore.catalog.dir=file:///data/hive_data``, this query creates
the directory ``/data/hive_data/warehouse``.

Create a table in any file format that the connector supports. For example,
with the Hive connector:

.. code-block:: none

    CREATE TABLE hive.warehouse.orders_csv ("order_name" varchar, "quantity" varchar)
        WITH (format = 'CSV');
    CREATE TABLE hive.warehouse.orders_parquet ("order_name" varchar, "quantity" int)
        WITH (format = 'PARQUET');

The Hive connector's CSV format supports only ``varchar`` columns. For this
reason, ``quantity`` is declared as ``varchar`` in ``orders_csv``.

These queries create the directories ``/data/hive_data/warehouse/orders_csv``
and ``/data/hive_data/warehouse/orders_parquet``. You can insert data into
these tables and query them.
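The file-based metastore maps schemas and tables to directories under the catalog directory, so the resulting layout can be checked from a script. The following is a minimal Python sketch. It assumes ``hive.metastore.catalog.dir=file:///data/hive_data`` and assumes each table uses the default location under its schema directory. The helpers ``expected_dirs`` and ``missing_dirs`` are hypothetical.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def expected_dirs(catalog_uri: str, schema: str, tables: list[str]) -> list[Path]:
    """Return the schema directory followed by one directory per table.

    Assumes default table locations of the form <catalog-dir>/<schema>/<table>.
    """
    parsed = urlparse(catalog_uri)
    if parsed.scheme != "file":
        # Remote filesystems such as S3 would need their own client library.
        raise ValueError("only file:// catalog directories are handled here")
    schema_dir = Path(url2pathname(parsed.path)) / schema
    return [schema_dir] + [schema_dir / table for table in tables]


def missing_dirs(paths: list[Path]) -> list[Path]:
    """Return the paths that do not exist (yet) as directories."""
    return [p for p in paths if not p.is_dir()]


if __name__ == "__main__":
    for p in expected_dirs("file:///data/hive_data", "warehouse",
                           ["orders_csv", "orders_parquet"]):
        print(p)
```

Running ``missing_dirs`` on the result after the ``CREATE`` statements gives a quick check that the schema and tables were created where expected.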

Reading Existing Data Files with a File-Based Metastore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To read existing data files, the metastore needs to know the file schema. File
formats that contain the schema, such as Parquet, need no additional work. For
other file formats, such as CSV, you must do one of the following:

* manually specify the schema, as shown in the example above
* provide ``.prestoSchema`` and ``.prestoPermissions`` files

After the table is created with the required schema, you can move existing
data files into the table directory.

For example, after a CSV file ``orders.csv`` containing ``books, 100`` is moved
to ``/data/hive_data/warehouse/orders_csv``, Presto can query its contents
through the ``hive.warehouse.orders_csv`` table.
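Placing an existing file into a table directory can be scripted. The following is a minimal Python sketch. It assumes the ``orders_csv`` table already exists at ``/data/hive_data/warehouse/orders_csv``. The helpers ``write_orders_csv`` and ``place_data_file`` are hypothetical. The sketch copies the file so the original is kept; ``shutil.move`` could be used instead if the source should be removed.

```python
import csv
import shutil
from pathlib import Path


def write_orders_csv(path: Path, rows: list[tuple[str, str]]) -> None:
    """Write rows as comma-separated lines with no header row.

    A header line would presumably be read as a data row by the
    orders_csv table defined above, so none is written.
    """
    with path.open("w", newline="") as f:
        # lineterminator="\n" produces Unix line endings instead of "\r\n".
        csv.writer(f, lineterminator="\n").writerows(rows)


def place_data_file(src: Path, table_dir: Path) -> Path:
    """Copy an existing data file into a table directory and return its new path."""
    if not table_dir.is_dir():
        raise FileNotFoundError(f"table directory does not exist: {table_dir}")
    return Path(shutil.copy2(src, table_dir / src.name))
```

For example, ``place_data_file(Path("orders.csv"), Path("/data/hive_data/warehouse/orders_csv"))`` puts the file where the ``orders_csv`` table reads its data.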

2 changes: 2 additions & 0 deletions presto-docs/src/main/sphinx/installation/deployment.rst
@@ -7,6 +7,8 @@ Deploying Presto
    :backlinks: none
    :depth: 1

.. _Installing Presto:

Installing Presto
-----------------
