feat: Add Apache Doris support #24714

Merged
merged 12 commits into from
Nov 21, 2023
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -131,6 +131,7 @@ Here are some of the major database solutions that are supported:
<img src="superset-frontend/src/assets/images/teradata.png" alt="teradata" border="0" width="200" height="80"/>
<img src="superset-frontend/src/assets/images/yugabyte.png" alt="yugabyte" border="0" width="200" height="80"/>
<img src="superset-frontend/src/assets/images/starrocks.png" alt="starrocks" border="0" width="200" height="80"/>
<img src="superset-frontend/src/assets/images/doris.png" alt="doris" border="0" width="200" height="80"/>
</p>

**A more comprehensive list of supported databases** along with the configuration instructions can be found [here](https://superset.apache.org/docs/databases/installing-database-drivers).
27 changes: 27 additions & 0 deletions docs/docs/databases/doris.mdx
@@ -0,0 +1,27 @@
---
title: Apache Doris
hide_title: true
sidebar_position: 5
version: 1
---

## Doris

The [sqlalchemy-doris](https://pypi.org/project/pydoris/) library is the recommended
Review comment (Member):

Suggested change
The [sqlalchemy-doris](https://pypi.org/project/pydoris/) library is the recommended
The [sqlalchemy-doris](https://pypi.org/project/pydoris/) library is the recommended way to connect to Apache Doris through SQLAlchemy.

way to connect to Apache Doris through SQLAlchemy.
Review comment (Member):

Suggested change
way to connect to Apache Doris through SQLAlchemy.


You'll need the following setting values to form the connection string:

- **User**: User Name
- **Password**: Password
- **Host**: Doris FE Host
- **Port**: Doris FE port
- **Catalog**: Catalog Name
- **Database**: Database Name


Here's what the connection string looks like:

```
doris://<User>:<Password>@<Host>:<Port>/<Catalog>.<Database>
```
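
As a minimal sketch, the connection string can be assembled in Python from the settings listed above. The helper name `build_doris_uri` and all sample values are hypothetical, and percent-encoding the credentials is an assumption made for this sketch rather than something this page prescribes.

```python
from urllib.parse import quote


def build_doris_uri(user, password, host, port, catalog, database):
    """Assemble doris://<User>:<Password>@<Host>:<Port>/<Catalog>.<Database>."""
    # Percent-encode credentials so characters such as '@' or ':' do not
    # break the URL structure (an assumption for this sketch).
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    return f"doris://{user_enc}:{password_enc}@{host}:{port}/{catalog}.{database}"


# Hypothetical values, for illustration only.
uri = build_doris_uri("admin", "p@ss", "doris-fe.example.com", 9030, "internal", "sales")
print(uri)
```

The resulting string is what would be pasted into the SQLAlchemy URI field when adding the database.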
1 change: 1 addition & 0 deletions docs/docs/databases/installing-database-drivers.mdx
@@ -63,6 +63,7 @@ Some of the recommended packages are shown below. Please refer to [setup.py](htt
| [Trino](/docs/databases/trino) | `pip install trino` | `trino://{username}:{password}@{hostname}:{port}/{catalog}` |
| [Vertica](/docs/databases/vertica) | `pip install sqlalchemy-vertica-python` | `vertica+vertica_python://<UserName>:<DBPassword>@<Database Host>/<Database Name>` |
| [YugabyteDB](/docs/databases/yugabytedb) | `pip install psycopg2` | `postgresql://<UserName>:<DBPassword>@<Database Host>/<Database Name>` |
| [Doris](/docs/databases/doris) | `pip install pydoris` | `doris://<User>:<Password>@<Host>:<Port>/<Catalog>.<Database>` |
---

Note that many other databases are supported, the main criteria being the existence of a functional
5 changes: 5 additions & 0 deletions docs/src/resources/data.js
@@ -117,4 +117,9 @@ export const Databases = [
href: 'https://www.microsoft.com/en-us/sql-server',
imgName: 'msql.png',
},
{
title: 'Apache Doris',
href: 'https://doris.apache.org/',
imgName: 'doris.png',
},
];
Binary file added docs/static/img/databases/doris.png
1 change: 1 addition & 0 deletions setup.py
@@ -205,6 +205,7 @@ def get_git_sha() -> str:
"vertica": ["sqlalchemy-vertica-python>=0.5.9, < 0.6"],
"netezza": ["nzalchemy>=11.0.2"],
"starrocks": ["starrocks>=1.0.0"],
"doris": ["pydoris>=1.0.0"],
Review comment (Member):

Could we include an upper bound of 2.0.0 to prevent potential breaking changes when the dependencies are bumped?

Reply (Contributor Author):

OK, I will modify it. Thanks for your suggestion.

},
python_requires="~=3.9",
author="Apache Software Foundation",
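
The upper bound requested in the review could look like the sketch below. The `<2.0.0` ceiling comes from the reviewer's comment; the specifier that was eventually committed is not shown in this diff. The `allowed` helper is a hypothetical, simplified stand-in for a real version-specifier parser and only handles plain `X.Y.Z` versions.

```python
# Hypothetical extras entry with the upper bound suggested in review.
extras_require = {
    "doris": ["pydoris>=1.0.0, <2.0.0"],
}


def allowed(version: str) -> bool:
    """Simplified check for >=1.0.0, <2.0.0 on plain X.Y.Z versions."""
    parts = tuple(int(p) for p in version.split("."))
    return (1, 0, 0) <= parts < (2, 0, 0)


print(allowed("1.3.2"))
print(allowed("2.0.0"))
```

Capping the major version keeps a future incompatible driver release from being picked up when dependencies are bumped.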
Binary file added superset-frontend/src/assets/images/doris.png
285 changes: 285 additions & 0 deletions superset/db_engine_specs/doris.py
@@ -0,0 +1,285 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
import logging
import re
from re import Pattern
from typing import Any, Optional
from urllib import parse
from flask_babel import gettext as __
from sqlalchemy import Float, Integer, Numeric, types, String, TEXT
from sqlalchemy.engine.url import URL
from sqlalchemy.sql.type_api import TypeEngine
from superset.db_engine_specs.mysql import MySQLEngineSpec
from superset.errors import SupersetErrorType
from superset.utils.core import GenericDataType


# Regular expressions to catch custom errors
CONNECTION_ACCESS_DENIED_REGEX = re.compile(
    "Access denied for user '(?P<username>.*?)'"
)
CONNECTION_INVALID_HOSTNAME_REGEX = re.compile(
    "Unknown Doris server host '(?P<hostname>.*?)'"
)
CONNECTION_UNKNOWN_DATABASE_REGEX = re.compile(
    "Unknown database '(?P<database>.*?)'"
)
CONNECTION_HOST_DOWN_REGEX = re.compile(
    "Can't connect to Doris server on '(?P<hostname>.*?)'"
)
SYNTAX_ERROR_REGEX = re.compile(
    "check the manual that corresponds to your MySQL server "
    "version for the right syntax to use near '(?P<server_error>.*)"
)

logger = logging.getLogger(__name__)


class TINYINT(Integer):
    __visit_name__ = "TINYINT"


class LARGEINT(Integer):
    __visit_name__ = "LARGEINT"


class DOUBLE(Float):
    __visit_name__ = "DOUBLE"


class HLL(Numeric):
    __visit_name__ = "HLL"


class BITMAP(Numeric):
    __visit_name__ = "BITMAP"


class QUANTILE_STATE(Numeric):
    __visit_name__ = "QUANTILE_STATE"


class AGG_STATE(Numeric):
    __visit_name__ = "AGG_STATE"


class ARRAY(TypeEngine):  # pylint: disable=no-init
    __visit_name__ = "ARRAY"

    @property
    def python_type(self) -> Optional[type[list[Any]]]:
        return list


class MAP(TypeEngine):  # pylint: disable=no-init
    __visit_name__ = "MAP"

    @property
    def python_type(self) -> Optional[type[dict[Any, Any]]]:
        return dict


class STRUCT(TypeEngine):  # pylint: disable=no-init
    __visit_name__ = "STRUCT"

    @property
    def python_type(self) -> Optional[type[Any]]:
        return None


class DorisEngineSpec(MySQLEngineSpec):
    engine = "pydoris"
    # Aliases are a set of names, so membership tests match whole names only.
    engine_aliases = {"doris"}
    engine_name = "Apache Doris"
    max_column_name_length = 64
    default_driver = "pydoris"
    sqlalchemy_uri_placeholder = (
        "doris://user:password@host:port/catalog.db[?key=value&key=value...]"
    )
    encryption_parameters = {"ssl": "0"}
    supports_dynamic_schema = True

    column_type_mappings = (  # type: ignore
        (
            re.compile(r"^tinyint", re.IGNORECASE),
            TINYINT(),
            GenericDataType.NUMERIC,
        ),
        (
            re.compile(r"^largeint", re.IGNORECASE),
            LARGEINT(),
            GenericDataType.NUMERIC,
        ),
        (
            re.compile(r"^decimal.*", re.IGNORECASE),
            types.DECIMAL(),
            GenericDataType.NUMERIC,
        ),
        (
            re.compile(r"^double", re.IGNORECASE),
            DOUBLE(),
            GenericDataType.NUMERIC,
        ),
        (
            re.compile(r"^varchar(\((\d+)\))*$", re.IGNORECASE),
            types.VARCHAR(),
            GenericDataType.STRING,
        ),
        (
            re.compile(r"^char(\((\d+)\))*$", re.IGNORECASE),
            types.CHAR(),
            GenericDataType.STRING,
        ),
        (
            re.compile(r"^json.*", re.IGNORECASE),
            types.JSON(),
            GenericDataType.STRING,
        ),
        (
            re.compile(r"^binary.*", re.IGNORECASE),
            types.BINARY(),
            GenericDataType.STRING,
        ),
        (
            re.compile(r"^quantile_state", re.IGNORECASE),
            QUANTILE_STATE(),
            GenericDataType.STRING,
        ),
        (
            re.compile(r"^agg_state.*", re.IGNORECASE),
            AGG_STATE(),
            GenericDataType.STRING,
        ),
        (
            re.compile(r"^hll", re.IGNORECASE),
            HLL(),
            GenericDataType.STRING,
        ),
        (
            re.compile(r"^bitmap", re.IGNORECASE),
            BITMAP(),
            GenericDataType.STRING,
        ),
        (
            re.compile(r"^array.*", re.IGNORECASE),
            ARRAY(),
            GenericDataType.STRING,
        ),
        (
            re.compile(r"^map.*", re.IGNORECASE),
            MAP(),
            GenericDataType.STRING,
        ),
        (
            re.compile(r"^struct.*", re.IGNORECASE),
            STRUCT(),
            GenericDataType.STRING,
        ),
        (
            re.compile(r"^datetime.*", re.IGNORECASE),
            types.DATETIME(),
            GenericDataType.STRING,
        ),
        (
            re.compile(r"^date.*", re.IGNORECASE),
            types.DATE(),
            GenericDataType.STRING,
        ),
        (
            re.compile(r"^text.*", re.IGNORECASE),
            TEXT(),
            GenericDataType.STRING,
        ),
        (
            re.compile(r"^string.*", re.IGNORECASE),
            String(),
            GenericDataType.STRING,
        ),
    )


    custom_errors: dict[Pattern[str], tuple[str, SupersetErrorType, dict[str, Any]]] = {
        CONNECTION_ACCESS_DENIED_REGEX: (
            __('Either the username "%(username)s" or the password is incorrect.'),
            SupersetErrorType.CONNECTION_ACCESS_DENIED_ERROR,
            {"invalid": ["username", "password"]},
        ),
        CONNECTION_INVALID_HOSTNAME_REGEX: (
            __('Unknown Doris server host "%(hostname)s".'),
            SupersetErrorType.CONNECTION_INVALID_HOSTNAME_ERROR,
            {"invalid": ["host"]},
        ),
        CONNECTION_HOST_DOWN_REGEX: (
            __('The host "%(hostname)s" might be down and can\'t be reached.'),
            SupersetErrorType.CONNECTION_HOST_DOWN_ERROR,
            {"invalid": ["host", "port"]},
        ),
        CONNECTION_UNKNOWN_DATABASE_REGEX: (
            __('Unable to connect to database "%(database)s".'),
            SupersetErrorType.CONNECTION_UNKNOWN_DATABASE_ERROR,
            {"invalid": ["database"]},
        ),
        SYNTAX_ERROR_REGEX: (
            __(
                'Please check your query for syntax errors near "%(server_error)s". '
                "Then, try running your query again."
            ),
            SupersetErrorType.SYNTAX_ERROR,
            {},
        ),
    }

    @classmethod
    def adjust_engine_params(
        cls,
        uri: URL,
        connect_args: dict[str, Any],
        catalog: Optional[str] = None,
        schema: Optional[str] = None,
    ) -> tuple[URL, dict[str, Any]]:
        database = uri.database
        if schema and database:
            schema = parse.quote(schema, safe="")
            if "." in database:
                database = database.split(".")[0] + "." + schema
            else:
                database = "internal." + schema
            uri = uri.set(database=database)

        return uri, connect_args

    @classmethod
    def get_schema_from_engine_params(
        cls,
        sqlalchemy_uri: URL,
        connect_args: dict[str, Any],
    ) -> Optional[str]:
        """
        Return the configured schema.

        For doris the SQLAlchemy URI looks like this:

            doris://localhost:9030/catalog.database

        """
        database = sqlalchemy_uri.database.strip("/")

        if "." not in database:
            return None

        return parse.unquote(database.split(".")[1])
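
The two class methods above replace or read the database part of `<Catalog>.<Database>`. The standalone sketch below mirrors that string logic without SQLAlchemy or Superset. The function names `swap_schema` and `extract_schema` are hypothetical, and they operate on the plain database string rather than on a `URL` object.

```python
from typing import Optional
from urllib import parse


def swap_schema(database: str, schema: Optional[str]) -> str:
    """Mirror adjust_engine_params: keep the catalog, replace the schema."""
    if schema and database:
        schema = parse.quote(schema, safe="")
        if "." in database:
            return database.split(".")[0] + "." + schema
        # No catalog in the URI: fall back to the "internal" catalog.
        return "internal." + schema
    return database


def extract_schema(database: str) -> Optional[str]:
    """Mirror get_schema_from_engine_params: return the part after the dot."""
    database = database.strip("/")
    if "." not in database:
        return None
    return parse.unquote(database.split(".")[1])


print(swap_schema("hive.sales", "marketing"))
print(swap_schema("sales", "marketing"))
print(extract_schema("hive.my%20db"))
print(extract_schema("sales"))
```

Note that a URI without a catalog prefix yields no schema on read, while on write the same URI is treated as belonging to the `internal` catalog.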
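
To illustrate how the named groups in the error regexes feed the message templates in `custom_errors`, here is a minimal sketch using only the standard library. The raw driver message is hypothetical, and how Superset applies `custom_errors` internally is not shown in this diff; the sketch only demonstrates the regex-to-template step.

```python
import re

CONNECTION_ACCESS_DENIED_REGEX = re.compile(
    "Access denied for user '(?P<username>.*?)'"
)
TEMPLATE = 'Either the username "%(username)s" or the password is incorrect.'

# Hypothetical raw driver message, for illustration only.
raw_message = "Access denied for user 'analyst'@'10.0.0.5' (using password: YES)"

match = CONNECTION_ACCESS_DENIED_REGEX.search(raw_message)
# Fall back to the raw message when the pattern does not match.
friendly = TEMPLATE % match.groupdict() if match else raw_message
print(friendly)
```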