Add support for Apache Drill #6610

Merged
merged 7 commits into from
May 29, 2019
31 changes: 31 additions & 0 deletions docs/installation.rst
@@ -392,6 +392,12 @@ Here's a list of some of the recommended packages.
| Pinot         | ``pip install pinotdb``             | ``pinot+http://controller:5436/``               |
|               |                                     | ``query?server=http://controller:5983/``        |
+---------------+-------------------------------------+-------------------------------------------------+
| Apache Drill  |                                     | For the REST API:                               |
|               |                                     | ``drill+sadrill://``                            |
|               |                                     | For JDBC:                                       |
|               |                                     | ``drill+jdbc://``                               |
+---------------+-------------------------------------+-------------------------------------------------+


Note that many other databases are supported, the main criteria being the
existence of a functional SQLAlchemy dialect and Python driver. Googling
@@ -449,6 +455,31 @@ Required environment variables: ::

See `Teradata SQLAlchemy <https://github.com/Teradata/sqlalchemy-teradata>`_.

Apache Drill
------------

At the time of writing, the SQLAlchemy dialect for Drill is not available on PyPI and must be installed from its GitHub repository:
`SQLAlchemy Drill <https://github.com/JohnOmernik/sqlalchemy-drill>`_

Alternatively, you can clone and install it entirely from the command line: ::

    git clone https://github.com/JohnOmernik/sqlalchemy-drill
    cd sqlalchemy-drill
    python3 setup.py install
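
You can confirm that the dialect is importable from the Python environment Superset runs in with a check like the following. This is a minimal sketch that assumes the package's import name is ``sqlalchemy_drill``. ``drill_dialect_installed`` is a hypothetical helper and is not part of Superset or the dialect.

```python
import importlib.util


def drill_dialect_installed() -> bool:
    """Return True if the sqlalchemy-drill package can be imported.

    Assumes the package's import name is ``sqlalchemy_drill``.
    """
    # find_spec returns None (rather than raising) for a missing top-level module.
    return importlib.util.find_spec("sqlalchemy_drill") is not None


if __name__ == "__main__":
    print("sqlalchemy-drill importable:", drill_dialect_installed())
```

Run the check with the same interpreter that runs Superset. A ``False`` result may mean the package was installed into a different environment.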

Once the dialect is installed, you can connect to Drill in two ways: through its REST interface or over JDBC. If you connect over JDBC, you must also have the
Drill JDBC driver installed.

The basic connection string for Drill, which uses the REST interface, looks like this: ::

    drill+sadrill://{username}:{password}@{host}:{port}/{storage_plugin}?use_ssl=True

If you are using JDBC to connect to Drill, the connection string looks like this: ::

    drill+jdbc://{username}:{password}@{host}:{port}/{storage_plugin}
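
The username and password are embedded in the URI, so characters such as ``@`` or ``:`` in a password must be percent-encoded. The sketch below assembles either form of the connection string using only the standard library. ``build_drill_uri`` is a hypothetical helper. The example ports (8047 for the REST interface, 31010 for JDBC) are Drill's usual defaults, and your deployment may use different ones.

```python
from typing import Optional
from urllib.parse import quote


def build_drill_uri(driver: str, username: str, password: str, host: str,
                    port: int, storage_plugin: str,
                    use_ssl: Optional[bool] = None) -> str:
    """Fill in the Drill URI templates shown above (hypothetical helper).

    driver is "sadrill" (REST interface) or "jdbc".
    """
    if driver not in ("sadrill", "jdbc"):
        raise ValueError("driver must be 'sadrill' or 'jdbc'")
    # Percent-encode credentials so '@', ':' or '/' cannot break URI parsing.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    uri = "drill+{}://{}:{}@{}:{}/{}".format(
        driver, user, pwd, host, port, storage_plugin)
    # The use_ssl query parameter only appears in the REST template.
    if driver == "sadrill" and use_ssl is not None:
        uri += "?use_ssl={}".format(use_ssl)
    return uri


if __name__ == "__main__":
    print(build_drill_uri("sadrill", "admin", "p@ss:word", "localhost", 8047,
                          "dfs", use_ssl=False))
    # drill+sadrill://admin:p%40ss%3Aword@localhost:8047/dfs?use_ssl=False
    print(build_drill_uri("jdbc", "admin", "secret", "drill.example.com",
                          31010, "dfs"))
    # drill+jdbc://admin:secret@drill.example.com:31010/dfs
```

Encoding the credentials with ``safe=""`` is deliberately strict. Even ``/`` is escaped, because an unescaped slash in a password would be read as the start of the path.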

For a complete walkthrough of using Apache Drill with Superset, see this tutorial:
`Visualize Anything with Superset and Drill <http://thedataist.com/visualize-anything-with-superset-and-drill/>`_

Caching
-------

60 changes: 60 additions & 0 deletions superset/db_engine_specs.py
@@ -724,6 +724,66 @@ def get_table_names(cls, inspector, schema):
        return sorted(inspector.get_table_names())


class DrillEngineSpec(BaseEngineSpec):
    """Engine spec for Apache Drill"""
    engine = 'drill'

    time_grain_functions = {
        None: '{col}',
        'PT1S': "nearestDate({col}, 'SECOND')",
        'PT1M': "nearestDate({col}, 'MINUTE')",
        'PT15M': "nearestDate({col}, 'QUARTER_HOUR')",
        'PT0.5H': "nearestDate({col}, 'HALF_HOUR')",
        'PT1H': "nearestDate({col}, 'HOUR')",
        'P1D': 'TO_DATE({col})',
        'P1W': "nearestDate({col}, 'WEEK_SUNDAY')",
        'P1M': "nearestDate({col}, 'MONTH')",
        'P0.25Y': "nearestDate({col}, 'QUARTER')",
        'P1Y': "nearestDate({col}, 'YEAR')",
    }

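    # Drill's TO_DATE treats a numeric argument as milliseconds since the
    # Unix epoch, so epoch seconds are scaled up before conversion.
    @classmethod
    def epoch_to_dttm(cls):
        return 'TO_DATE({col} * 1000)'

    @classmethod
    def epoch_ms_to_dttm(cls):
        return 'TO_DATE({col})'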

    @classmethod
    def convert_dttm(cls, target_type, dttm):
        tt = target_type.upper()
        if tt == 'DATE':
            return "CAST('{}' AS DATE)".format(dttm.isoformat()[:10])
        elif tt == 'TIMESTAMP':
            return "CAST('{}' AS TIMESTAMP)".format(
                dttm.strftime('%Y-%m-%d %H:%M:%S'))
        return "'{}'".format(dttm.strftime('%Y-%m-%d %H:%M:%S'))

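    @classmethod
    def adjust_database_uri(cls, uri, selected_schema):
        # The URI's database part is "{storage_plugin}", optionally followed
        # by "/{schema}"; keep only the storage plugin. uri.database is None
        # when the URI has no path, so fall back to an empty string.
        database = (uri.database or '').split('/')[0]
        if selected_schema:
            uri.database = database + '/' + selected_schema
        return uri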

    @classmethod
    def select_star(cls, my_db, table_name, engine, schema=None, limit=100,
                    show_cols=False, indent=True, latest_partition=True,
                    cols=None):
        return super().select_star(my_db, table_name, engine,
                                   schema, limit,
                                   show_cols, indent, latest_partition, cols)

    @classmethod
    def get_table_names(cls, inspector, schema):
        return sorted(inspector.get_table_names(schema))

    @classmethod
    def get_view_names(cls, inspector, schema):
        return sorted(inspector.get_view_names(schema))


class MySQLEngineSpec(BaseEngineSpec):
    engine = 'mysql'
    max_column_name_length = 64
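
The string templates in DrillEngineSpec can be exercised without installing Superset to see what SQL they produce. The sketch below copies a subset of the time grain templates and the same branching as `convert_dttm`. It also mimics the schema handling of `adjust_database_uri` on a plain string. `grain_expression` and `schema_path` are hypothetical helpers. Treating a missing database as an empty string is an assumption of this sketch, not behavior taken from the diff.

```python
from datetime import datetime
from typing import Optional

# A subset of DrillEngineSpec.time_grain_functions, copied for illustration.
TIME_GRAINS = {
    None: "{col}",
    "PT1H": "nearestDate({col}, 'HOUR')",
    "P1D": "TO_DATE({col})",
    "P1W": "nearestDate({col}, 'WEEK_SUNDAY')",
}


def grain_expression(grain: Optional[str], col: str) -> str:
    """Substitute a column expression into a time grain template."""
    return TIME_GRAINS[grain].format(col=col)


def convert_dttm(target_type: str, dttm: datetime) -> str:
    """Same branching as DrillEngineSpec.convert_dttm."""
    tt = target_type.upper()
    if tt == "DATE":
        return "CAST('{}' AS DATE)".format(dttm.isoformat()[:10])
    if tt == "TIMESTAMP":
        return "CAST('{}' AS TIMESTAMP)".format(
            dttm.strftime("%Y-%m-%d %H:%M:%S"))
    return "'{}'".format(dttm.strftime("%Y-%m-%d %H:%M:%S"))


def schema_path(database: Optional[str],
                selected_schema: Optional[str]) -> Optional[str]:
    """Mimic adjust_database_uri on the URI's database component.

    Keeps the storage plugin (text before the first '/') and appends the
    selected schema; without a selected schema the value is unchanged.
    """
    if not selected_schema:
        return database
    storage_plugin = (database or "").split("/")[0]
    return storage_plugin + "/" + selected_schema


if __name__ == "__main__":
    print(grain_expression("P1W", "event_ts"))
    # nearestDate(event_ts, 'WEEK_SUNDAY')
    print(convert_dttm("timestamp", datetime(2019, 5, 29, 13, 45, 7)))
    # CAST('2019-05-29 13:45:07' AS TIMESTAMP)
    print(schema_path("dfs/old", "tmp"))
    # dfs/tmp
```

The time grain keys are ISO 8601 durations. A missing grain (`None`) passes the column through untouched, which is why its template is just `{col}`.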