fix: Use SQLAlchemy to generate an insert statement #2843

Merged

Conversation

edgarrmondragon
Collaborator

@edgarrmondragon edgarrmondragon commented Jan 28, 2025

Summary by Sourcery

Update insert statements to use SQLAlchemy executables.

Bug Fixes:

  • Fix insert statements to use SQLAlchemy to prevent SQL injection vulnerabilities.

Enhancements:

  • Refactor SQL insert statement generation to use SQLAlchemy.

Tests:

  • Update tests to reflect the changes in insert statement generation.

📚 Documentation preview 📚: https://meltano-sdk--2843.org.readthedocs.build/en/2843/

Contributor

sourcery-ai bot commented Jan 28, 2025

Reviewer's Guide by Sourcery

This pull request refactors the SQL insert statement generation to use SQLAlchemy, which prevents SQL injection vulnerabilities and improves code maintainability. The changes include modifying the generate_insert_statement method to return an SQLAlchemy Insert object instead of a raw SQL string, and updating tests to reflect this change.
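The injection concern mentioned above can be illustrated with a small sketch. The previous string-building code is not shown in this conversation, so the f-string below is a hypothetical stand-in, and the table and column names are made up. The SQLite dialect is used only to render the statement; no database connection is opened.

```python
import sqlalchemy as sa
from sqlalchemy.dialects import sqlite

# A hostile property name, as might arrive in an untrusted stream schema.
hostile = 'id"; DROP TABLE users; --'

# Hypothetical naive approach: the name is pasted into the SQL text,
# so its embedded double quote closes the identifier early.
naive = f'INSERT INTO users ("{hostile}") VALUES (?)'

# SQLAlchemy approach: the name is only ever treated as an identifier,
# and the compiler escapes the embedded quote by doubling it.
table = sa.Table("users", sa.MetaData(), sa.Column(hostile, sa.String))
safe = str(sa.insert(table).compile(dialect=sqlite.dialect()))

print(naive)
print(safe)
```

In the compiled statement the whole hostile string stays inside one quoted identifier, whereas in the naive string the text after the first closing quote would be parsed as SQL.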

Sequence diagram for SQLAlchemy insert statement generation

sequenceDiagram
    participant Client
    participant SQLSink
    participant SQLAlchemy
    participant Database

    Client->>SQLSink: bulk_insert_records()
    activate SQLSink
    SQLSink->>SQLSink: generate_insert_statement()
    SQLSink->>SQLAlchemy: Create Table object
    SQLAlchemy-->>SQLSink: Return Table
    SQLSink->>SQLAlchemy: Generate Insert statement
    SQLAlchemy-->>SQLSink: Return Insert object
    SQLSink->>Database: Execute Insert
    Database-->>SQLSink: Confirm insertion
    SQLSink-->>Client: Complete
    deactivate SQLSink

Class diagram showing SQL sink changes

classDiagram
    class SQLSink {
        +generate_insert_statement(full_table_name, schema)
        +bulk_insert_records(records, schema)
        -conform_schema(schema)
        -conform_record(record)
    }

    class SQLConnector {
        +create_engine()
        +quote(name) deprecated
        +parse_full_table_name(full_table_name)
    }

    class SQLAlchemy {
        +Table
        +Column
        +MetaData
        +Insert
    }

    SQLSink --> SQLConnector
    SQLSink --> SQLAlchemy

    note for SQLSink "Now uses SQLAlchemy for insert statements"
    note for SQLConnector "quote() method deprecated"

Flow diagram of insert statement generation

graph TD
    A[Start] --> B[Get Schema Properties]
    B --> C[Parse Table Name]
    C --> D[Create SQLAlchemy Table]
    D --> E[Create Columns]
    E --> F[Generate Insert Statement]
    F --> G[Return SQLAlchemy Insert]

    style D fill:#f9f,stroke:#333,stroke-width:2px
    style F fill:#f9f,stroke:#333,stroke-width:2px
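
The steps in this flow can be sketched roughly as follows. This is a minimal illustration, not the code merged in singer_sdk/sinks/sql.py: the standalone function, the `rpartition`-based name parsing (a stand-in for `parse_full_table_name`), and the sample schema are all assumptions. Every column is typed as `sa.String`, mirroring the simplification discussed in the review.

```python
import sqlalchemy as sa
from sqlalchemy.sql.expression import Insert


def generate_insert_statement(full_table_name: str, schema: dict) -> Insert:
    """Build an INSERT for the properties of a JSON schema (illustrative only)."""
    # Get schema properties.
    property_names = list(schema["properties"])

    # Hypothetical stand-in for SQLConnector.parse_full_table_name():
    # split "schema.table" into its parts, allowing a bare table name.
    schema_name, _, table_name = full_table_name.rpartition(".")

    # Create the SQLAlchemy table and its columns.
    table = sa.Table(
        table_name,
        sa.MetaData(),
        *(sa.Column(name, sa.String) for name in property_names),
        schema=schema_name or None,
    )

    # Generate and return the Insert object rather than a SQL string.
    return sa.insert(table)


stmt = generate_insert_statement(
    "analytics.users",
    {"properties": {"id": {"type": "integer"}, "name": {"type": "string"}}},
)
rendered = str(stmt)
print(rendered)
```

Calling `str()` on the returned object renders it with SQLAlchemy's default dialect, which is one way a test can compare the generated SQL text.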

File-Level Changes

Change: Refactor SQL insert statement generation to use SQLAlchemy.
Details:
  • Modified generate_insert_statement to return an SQLAlchemy Insert object.
  • Removed the manual construction of the SQL insert statement string.
  • Added logic to create an SQLAlchemy table object based on the schema.
  • Added a deprecation warning for returning a string from generate_insert_statement.
Files:
  • singer_sdk/sinks/sql.py

Change: Update tests to reflect the changes in insert statement generation.
Details:
  • Updated tests to assert that generate_insert_statement returns an SQLAlchemy Insert object.
  • Updated tests to compare the rendered SQL string of the SQLAlchemy Insert object.
  • Removed string comparison of the generated SQL statement.
Files:
  • tests/core/sinks/test_sql_sink.py
  • tests/samples/test_target_sqlite.py

Change: Deprecate the quote method in SQLConnector.
Details:
  • Added a deprecation warning to the quote method.
  • Suggested using or overriding FullyQualifiedName instead.
Files:
  • singer_sdk/connectors/sql.py
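
For readers unfamiliar with the deprecation pattern listed above, here is a minimal sketch using only the standard library. The `Connector` class, the warning message, and the quoting behaviour are hypothetical; the SDK's actual implementation and wording are not shown in this conversation.

```python
import warnings


class Connector:
    """Hypothetical stand-in for SQLConnector; not the SDK's implementation."""

    def quote(self, name: str) -> str:
        # Emit the warning at the caller's location (stacklevel=2).
        warnings.warn(
            "quote() is deprecated; use or override FullyQualifiedName instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Quote each dotted part separately (illustrative behaviour only).
        return ".".join(f'"{part}"' for part in name.split("."))


# Record warnings so the deprecation can be inspected, as a test might do.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    quoted = Connector().quote("analytics.users")

print(quoted, [str(w.message) for w in caught])
```

The method keeps working while callers are steered toward the replacement, which is the usual reason to deprecate rather than remove.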


codecov bot commented Jan 28, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.33%. Comparing base (e0deb8f) to head (06a2f92).
Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2843      +/-   ##
==========================================
+ Coverage   91.31%   91.33%   +0.01%     
==========================================
  Files          62       62              
  Lines        5206     5205       -1     
  Branches      672      671       -1     
==========================================
  Hits         4754     4754              
  Misses        319      319              
+ Partials      133      132       -1     


codspeed-hq bot commented Jan 28, 2025

CodSpeed Performance Report

Merging #2843 will not alter performance

Comparing edgarrmondragon/fix/sqlalchemy-insert-statement (06a2f92) with main (57085dd)

Summary

✅ 7 untouched benchmarks

@edgarrmondragon edgarrmondragon force-pushed the edgarrmondragon/fix/sqlalchemy-insert-statement branch 4 times, most recently from ac6d156 to db06328 Compare January 28, 2025 18:19
@edgarrmondragon
Collaborator Author

@sourcery-ai review

Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey @edgarrmondragon - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider mapping JSON schema types to appropriate SQLAlchemy column types instead of assuming all columns are strings. This would provide better type safety and data integrity.
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟡 Testing: 1 issue found
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good


tests/core/sinks/test_sql_sink.py (review comment resolved)
@edgarrmondragon edgarrmondragon force-pushed the edgarrmondragon/fix/sqlalchemy-insert-statement branch from db06328 to 06a2f92 Compare January 28, 2025 18:24
@edgarrmondragon edgarrmondragon marked this pull request as ready for review January 28, 2025 18:29
@edgarrmondragon edgarrmondragon merged commit a4f663e into main Jan 28, 2025
36 checks passed
@edgarrmondragon edgarrmondragon deleted the edgarrmondragon/fix/sqlalchemy-insert-statement branch January 28, 2025 18:30
Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey @edgarrmondragon - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider using appropriate SQLAlchemy column types instead of assuming String for all columns. This would provide better type safety and potentially better performance.
Here's what I looked at during the review
  • 🟡 General issues: 2 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good


Comment on lines +296 to +298
sa.Column(
    name, sa.String
)  # Assuming all columns are of type String for simplicity # noqa: E501
Contributor


issue (performance): Using String type for all columns could cause data type mismatches and performance issues.

Consider mapping the schema types to appropriate SQLAlchemy types based on the conformed_schema['properties'] type definitions.
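
One way this suggestion could look is sketched below. The mapping table and the `column_for` helper are hypothetical and cover only a few JSON schema types; they are not part of the pull request or of the SDK.

```python
import sqlalchemy as sa

# Hypothetical mapping from JSON schema type names to SQLAlchemy types.
JSONSCHEMA_TO_SA = {
    "integer": sa.Integer,
    "number": sa.Float,
    "boolean": sa.Boolean,
    "string": sa.String,
}


def column_for(name: str, prop: dict) -> sa.Column:
    """Pick a column type from a property's "type", defaulting to String."""
    types = prop.get("type", "string")
    if isinstance(types, str):
        types = [types]
    # Ignore "null", which only marks the property as optional.
    non_null = [t for t in types if t != "null"]
    sa_type = JSONSCHEMA_TO_SA.get(non_null[0], sa.String) if non_null else sa.String
    return sa.Column(name, sa_type)


properties = {
    "id": {"type": ["null", "integer"]},
    "name": {"type": "string"},
    "tags": {"type": "array"},  # unmapped type, falls back to String
}
columns = [column_for(name, prop) for name, prop in properties.items()]
table = sa.Table("users", sa.MetaData(), *columns)
print({c.name: type(c.type).__name__ for c in table.columns})
```

Falling back to `sa.String` for unmapped types keeps the behaviour of the reviewed code for anything the mapping does not cover.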

conformed_schema = self.conform_schema(schema)
property_names = list(conformed_schema["properties"])

_, schema_name, table_name = self.connector.parse_full_table_name(
Contributor


issue: Need to handle case where schema_name is None.

The Table constructor will fail if schema_name is None. Consider adding a conditional to handle this case.
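
This concern can be checked quickly. In SQLAlchemy's documented signature, the `schema` argument of `Table` defaults to `None`, so the sketch below passes `None` explicitly. It assumes that `parse_full_table_name` yields `None` for a missing schema, which this conversation does not confirm; the table and column names are made up.

```python
import sqlalchemy as sa

# Build a table with no schema, as would happen for a bare table name.
table = sa.Table(
    "users",
    sa.MetaData(),
    sa.Column("id", sa.String),
    schema=None,
)

# Render the insert with the default dialect; no connection is needed.
rendered = str(sa.insert(table))
print(rendered)
```

With `schema=None` the table name is rendered without a schema prefix, so no extra conditional appears to be needed for this case.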

@edgarrmondragon edgarrmondragon self-assigned this Jan 29, 2025
@edgarrmondragon edgarrmondragon added SQL Support for SQL taps and targets Type/Tap Singer taps labels Jan 29, 2025
@edgarrmondragon edgarrmondragon added this to the v0.44 milestone Jan 29, 2025