Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for parallel exports #52

Merged
merged 13 commits into from
Mar 28, 2019

Conversation

anish749
Copy link
Contributor

@anish749 anish749 commented Mar 8, 2019

First draft for adding support for parallel exports based on an int / long splitting column.

Implements #51

@anish749
Copy link
Contributor Author

anish749 commented Mar 8, 2019

PTAL @labianchin

@anish749 anish749 requested a review from labianchin March 8, 2019 15:27
@codecov
Copy link

codecov bot commented Mar 8, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@24872c7). Click here to learn what that means.
The diff coverage is 92.3%.

@@           Coverage Diff            @@
##             master     #52   +/-   ##
========================================
  Coverage          ?   88.9%           
  Complexity        ?     165           
========================================
  Files             ?      21           
  Lines             ?     622           
  Branches          ?      43           
========================================
  Hits              ?     553           
  Misses            ?      47           
  Partials          ?      22

return Lists.newArrayList(
String.format("SELECT * FROM %s%s%s", this.tableName(), where, limit));

if (parallelism().isPresent() && splitColumn().isPresent()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe extract this if branch into separate method?

final ResultSet
resultSet =
statement.executeQuery(query);
resultSet.first();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[main] INFO com.spotify.dbeam.avro.BeamJdbcAvroSchema - Elapsed time to schema 0.452 seconds
[main] ERROR com.spotify.dbeam.jobs.ExceptionHandling - Failure:
org.postgresql.util.PSQLException: Operation requires a scrollable ResultSet, but this ResultSet is FORWARD_ONLY.
        at org.postgresql.jdbc.PgResultSet.checkScrollable(PgResultSet.java:280)
        at org.postgresql.jdbc.PgResultSet.first(PgResultSet.java:355)
        at com.spotify.dbeam.args.QueryBuilderArgs.findSplitLimits(QueryBuilderArgs.java:178)
        at com.spotify.dbeam.args.QueryBuilderArgs.buildQueries(QueryBuilderArgs.java:139)
        at com.spotify.dbeam.jobs.JdbcAvroJob.prepareExport(JdbcAvroJob.java:93)
        at com.spotify.dbeam.jobs.JdbcAvroJob.runExport(JdbcAvroJob.java:134)
        at com.spotify.dbeam.jobs.JdbcAvroJob.main(JdbcAvroJob.java:142)

Maybe use checkState(resultSet.next(), "Min/Max query returned empty results"); instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.. I'll add a check

@anish749
Copy link
Contributor Author

@labianchin this is now ready for merging. I would update the documentation and add best practices separately.

@labianchin
Copy link
Collaborator

:shipit:

Let's ship this, try a bit and then iterate a bit more on it. We might still be missing metrics and maybe some tests. But given this is a new feature and it does not break existing flows, we can ship and try.

@labianchin labianchin merged commit b43d2b0 into spotify:master Mar 28, 2019
@anish749 anish749 deleted the parallel-exports branch March 29, 2019 22:23
labianchin pushed a commit that referenced this pull request Apr 16, 2019
* add draft support for parallel exports

* separate functions for split queries

* fix doc

* fix logic and add test cases

* add check for -ve parallelism

* refactor metering for parallel exports

* check result set

* rename to query parallelism to not conflict with beam's targetParallelism

* fix record gauge reporting

* change order of e2e test

* fix metering for parallel queries

* add e2e test for parallel query
labianchin pushed a commit that referenced this pull request Apr 16, 2019
* add draft support for parallel exports

* separate functions for split queries

* fix doc

* fix logic and add test cases

* add check for -ve parallelism

* refactor metering for parallel exports

* check result set

* rename to query parallelism to not conflict with beam's targetParallelism

* fix record gauge reporting

* change order of e2e test

* fix metering for parallel queries

* add e2e test for parallel query
labianchin pushed a commit that referenced this pull request Apr 16, 2019
* add draft support for parallel exports

* separate functions for split queries

* fix doc

* fix logic and add test cases

* add check for -ve parallelism

* refactor metering for parallel exports

* check result set

* rename to query parallelism to not conflict with beam's targetParallelism

* fix record gauge reporting

* change order of e2e test

* fix metering for parallel queries

* add e2e test for parallel query
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants