Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(stdlib/sql): add SAP HANA db support #3098

Merged
merged 15 commits into from
Oct 6, 2020

Conversation

alespour
Copy link
Contributor

@alespour alespour commented Aug 11, 2020

This PR add SAP HANA db support to Flux.

It uses older v0.14.1 Go driver (Mar 16, 2019), because it is the last version that works with Go 1.12.

Done checklist

  • Test cases written

@codecov-commenter
Copy link

codecov-commenter commented Aug 11, 2020

Codecov Report

Merging #3098 into master will increase coverage by 0.01%.
The diff coverage is 54.54%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3098      +/-   ##
==========================================
+ Coverage   49.58%   49.60%   +0.01%     
==========================================
  Files         341      342       +1     
  Lines       35727    35844     +117     
==========================================
+ Hits        17716    17781      +65     
- Misses      15543    15587      +44     
- Partials     2468     2476       +8     
Impacted Files Coverage Δ
stdlib/sql/to.go 45.41% <40.00%> (+0.12%) ⬆️
stdlib/sql/source_validator.go 75.00% <50.00%> (-1.93%) ⬇️
stdlib/sql/hdb.go 55.23% <55.23%> (ø)
stdlib/sql/from.go 33.65% <100.00%> (+1.30%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 57a7373...1c09486. Read the comment docs.

@alespour alespour marked this pull request as ready for review August 17, 2020 16:02
@nathanielc nathanielc requested review from a team and wolffcm and removed request for a team September 9, 2020 16:29
@alespour alespour force-pushed the feat/sql-sap-hana branch 2 times, most recently from 570db57 to 1afba6a Compare September 16, 2020 11:57
Copy link

@wolffcm wolffcm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks great as usual, I just had some questions.

} else { // table in user default schema
where = fmt.Sprintf("WHERE TABLE_NAME=UPPER('%s')", table)
}
return fmt.Sprintf(hdbDoIfTableNotExistsTemplate, where, query)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little worried that someone could craft a table name that contains characters like '; ... or something and do something nefarious. example

Do you have any thoughts on this? For some of the places where we use %s I imagine that we could use parameters ? instead?

Copy link
Contributor Author

@alespour alespour Sep 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not get ? placeholder to work with the SQL DO script (SQL Error 1287 - identifier must be declared), but I at least changed it to use declared variables for schema and table. I hope it is safer now...

The resulting query now looks like eg.

DO
BEGIN
    DECLARE SCHEMA_NAME NVARCHAR(11) = 'bike_stores';
    DECLARE TABLE_NAME NVARCHAR(5) = 'xcopy';
    DECLARE X_EXISTS INT = 0;
    SELECT COUNT(*) INTO X_EXISTS FROM TABLES WHERE SCHEMA_NAME=UPPER(:SCHEMA_NAME) AND TABLE_NAME=UPPER(:TABLE_NAME);
    IF :X_EXISTS = 0
    THEN
        CREATE TABLE bike_stores.xcopy (NOTE NVARCHAR(5000),...);
    END IF;
END;

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is better but it still seems like the CREATE TABLE statement allows for an injection.

If I were to do something like sql.to(table: "foo (i INTEGER); DROP TABLE bar; ..."), then there's still a problem, if I'm understanding right.

What if you always quoted the identifiers in the CREATE TABLE statement:

CREATE TABLE "bike_stores"."xcopy" ("NOTE" NVARCHAR(5000),...);

And if any of the identifiers contain a ", you can either escape it, or produce a user error. Does that make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improved, I hope :) Since by default object names are converted to uppercase by HDB unless quoted, and quotes are used to escape possibly malign SQL code here and not to mark the value as case-sensitive, uppercase is assumed and used for new output table name, which I think is conforming and safe.

Table created like bellow then allows a user to insert data both with any of the following statements:

INSERT INTO bike_stores.orders_copy_q2 ... (note, ...) VALUES (...)
INSERT INTO BIKE_STORES.ORDERS_COPY_Q2 ... (NOTE, ...) VALUES (...).
INSERT INTO "BIKE_STORES"."ORDERS_COPY_Q2" ... ("NOTE", ...) VALUES (...).

The if-not-exist-then-create now looks like this:

BEGIN
    DECLARE SCHEMA_NAME NVARCHAR(11) = 'BIKE_STORES';
    DECLARE TABLE_NAME NVARCHAR(14) = 'ORDERS_COPY_Q2';
    DECLARE X_EXISTS INT = 0;
    SELECT COUNT(*) INTO X_EXISTS FROM TABLES WHERE SCHEMA_NAME=ESCAPE_DOUBLE_QUOTES(:SCHEMA_NAME) AND TABLE_NAME=ESCAPE_DOUBLE_QUOTES(:TABLE_NAME);
    IF :X_EXISTS = 0
    THEN
        CREATE TABLE "BIKE_STORES"."ORDERS_COPY_Q2" ("NOTE" NVARCHAR(5000),...);
    END IF;
END;

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is much better. Thanks for the attention to this.

@wolffcm wolffcm merged commit cef010b into influxdata:master Oct 6, 2020
onelson added a commit that referenced this pull request Dec 22, 2021
This diff closes the opening for SQL injection via column identifiers in
hdb, but highlights the inconsistency between the quote/escape strategy
used for table names and column names. Table names were already being
quoted/escaped, but were also being transformed to uppercase.

It could be argued we should do the same transformation with column
identifiers, but I would advocate _the other way_ (to not transform the
case at all anywhere).

The issue is automatically forcing identifiers to uppercase, then
quoting them, means tables/columns created via DDL in the db engine
effectively become unreachable from Flux when they:

- happen to be quoted (preserving case)
- happen to _not be uppercase_

By adding a perceived convenience with the transformation, we reduce
control and possibly make valid identifiers impossible to
reference.

For the original discussion on the strategy, see:
<#3098>
onelson added a commit that referenced this pull request Dec 22, 2021
This diff closes the opening for SQL injection via column identifiers in
hdb, but highlights the inconsistency between the quote/escape strategy
used for table names and column names. Table names were already being
quoted/escaped, but were also being transformed to uppercase.

It could be argued we should do the same transformation with column
identifiers, but I would advocate _the other way_ (to not transform the
case at all anywhere).

The issue is automatically forcing identifiers to uppercase, then
quoting them, means tables/columns created via DDL in the db engine
effectively become unreachable from Flux when they:

- happen to be quoted (preserving case)
- happen to _not be uppercase_

By adding a perceived convenience with the transformation, we reduce
control and possibly make valid identifiers impossible to
reference.

For the original discussion on the strategy, see:
<#3098>
onelson added a commit that referenced this pull request Dec 27, 2021
This diff closes the opening for SQL injection via column identifiers in
hdb, but highlights the inconsistency between the quote/escape strategy
used for table names and column names. Table names were already being
quoted/escaped, but were also being transformed to uppercase.

It could be argued we should do the same transformation with column
identifiers, but I would advocate _the other way_ (to not transform the
case at all anywhere).

The issue is automatically forcing identifiers to uppercase, then
quoting them, means tables/columns created via DDL in the db engine
effectively become unreachable from Flux when they:

- happen to be quoted (preserving case)
- happen to _not be uppercase_

By adding a perceived convenience with the transformation, we reduce
control and possibly make valid identifiers impossible to
reference.

For the original discussion on the strategy, see:
<#3098>
onelson added a commit that referenced this pull request Jan 11, 2022
This diff closes the opening for SQL injection via column identifiers in
hdb, but highlights the inconsistency between the quote/escape strategy
used for table names and column names. Table names were already being
quoted/escaped, but were also being transformed to uppercase.

It could be argued we should do the same transformation with column
identifiers, but I would advocate _the other way_ (to not transform the
case at all anywhere).

The issue is automatically forcing identifiers to uppercase, then
quoting them, means tables/columns created via DDL in the db engine
effectively become unreachable from Flux when they:

- happen to be quoted (preserving case)
- happen to _not be uppercase_

By adding a perceived convenience with the transformation, we reduce
control and possibly make valid identifiers impossible to
reference.

For the original discussion on the strategy, see:
<#3098>
onelson added a commit that referenced this pull request Jan 12, 2022
* fix(stdlib/sql): quote/escape table and column identifiers

* refactor: update hdb to use quoted column identifiers

This diff closes the opening for SQL injection via column identifiers in
hdb, but highlights the inconsistency between the quote/escape strategy
used for table names and column names. Table names were already being
quoted/escaped, but were also being transformed to uppercase.

It could be argued we should do the same transformation with column
identifiers, but I would advocate _the other way_ (to not transform the
case at all anywhere).

The issue is automatically forcing identifiers to uppercase, then
quoting them, means tables/columns created via DDL in the db engine
effectively become unreachable from Flux when they:

- happen to be quoted (preserving case)
- happen to _not be uppercase_

By adding a perceived convenience with the transformation, we reduce
control and possibly make valid identifiers impossible to
reference.

For the original discussion on the strategy, see:
<#3098>

* fix: resolve issue with identifier casing in hdb

The HDB code path will try to force identifiers to all be UPPER CASE.
The proc for automatically creating the target table (when missing) only
uppercased the table name in certain situations (when specified with a
schema name stem). In the other case, the table name was incorrectly
left as-is.

HDB also handled quoting/escaping column names as a part of the
"translate func" for this driver, so the initial work to handle this
closer to the top of the `to` impl resulted in "stacked quoting" which
caused problems in SQL generation.

If we move all column escaping/quoting to happen during translation, we
can unify things a bit.

* refactor: quote idents closer to where they are templated

The column translate func now escapes column names for DDL, quoteIdent for
insert. Feels silly to do this in two places.

If the translate func returned pairs of (escaped) column name + type, then
we could use the translate func to generate identifiers for both cases.

Currently it returns a string containing both together.

* docs: add remark on ident quoting/escaping being tranlate func's job

* refactor: include quote-reliant column name in DDL

Expand the surface area of the integration tests to include a column
name that has a space in it, requiring it to be quoted.

This change is "double duty" in so much as it also introduces a nullable
column into the mix. This complicates the tables we assert against since
it forces the use of `union` and `debug.opaque` to be able to represent
the table we expect to get back from the database.

* test: update column translate tests to assert quoted DDL

The column translate functions should now be returning quoted column
identifiers.

* fix: quote/escape string literals in SQL DDL

Mitigate risk of SQL injection by escaping interior single quotes when
generating string literals for SQL DDL.

* fix: quote/escape another string literal, add comments

* refactor: mark mysql and postgres quote helpers as private

* chore: favor docker stop instead of docker rm for automatic volume cleanup

* fix: string literals escape ' as ''

* fix: identifiers escape " as ""

* test: add sql injection attempt tests

* fix: escape ` as ``, fixup hdb escaping

* test: comment out vertica injection attempt, possible driver bug

* chore: make fmt

* refactor: tighten up test code with custom assertion helpers

Many of the assertions in the `sql.to` tests contain duplicate blocks
which were originally written in a very verbose style.

Slim the tests down by pulling the repetion out into helper functions.

* refactor: hoist "seed want" values up to top of acceptance tests

* chore: make generate
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants