Suggest workarounds for partitionBy in Spark 1.0.0 due to SPARK-1931 #908
Conversation
We encourage users to build the latest version of Spark from the master branch, which contains a fix. Alternatively, a workaround is to partition the edges before constructing the graph.
Merged build triggered.
Merged build started.
@pwendell Since this is a post-release doc change, I wasn't sure which branch to submit against -- let me know if I should rebase to another one.
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
It's really nice of you to update the document with the workaround. Ideally we should have tried to convince Patrick et al. of the importance of this fix and why it should be in 1.0. IMHO, partitionBy is one of the most important operations in GraphX for many algorithms.
@npanj It's unfortunate that there'll be a big known problem in the 1.0.0 release. Still, I think it's tolerable for the following reasons:
The only major problem would be if the third-party Spark distributions don't upgrade to 1.0.1 in a timely manner.
Hey @npanj, Matei and I talked with Ankur a bunch, and this was a tough call, but together we decided to release without it. This fix is already in the release branch, so it will be available right away for those who want it. We'll likely have a 1.0.1 pretty quickly with a handful of fixes, and this can be included.
## Workaround for `Graph.partitionBy` in Spark 1.0.0
<a name="partitionBy_workaround"></a>

The [`Graph.partitionBy`][Graph.partitionBy] operator allows users to choose the graph partitioning strategy, but due to [SPARK-1931](https://issues.apache.org/jira/browse/SPARK-1931), this method is broken in Spark 1.0.0. We encourage users to build the latest version of Spark from the master branch, which contains a fix. Alternatively, a workaround is to partition the edges before constructing the graph, as follows:
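The code example that followed this paragraph in the docs is not reproduced in this excerpt. A minimal sketch of the approach it describes — key each edge by the partition a `PartitionStrategy` assigns it, shuffle with the ordinary RDD `partitionBy`, then build the graph — might look like the following; the helper name `partitionEdges` is illustrative, not from the original docs:

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.SparkContext._ // pair-RDD implicits, needed in Spark 1.0
import org.apache.spark.rdd.RDD
import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy}

// Sketch of the workaround: shuffle the edges with the plain RDD partitionBy
// (unaffected by SPARK-1931) *before* building the graph, instead of calling
// the broken Graph.partitionBy afterwards.
def partitionEdges[ED](
    edges: RDD[Edge[ED]],
    strategy: PartitionStrategy,
    numPartitions: Int): RDD[Edge[ED]] = {
  edges
    // Key each edge by the partition the chosen strategy assigns it.
    .map(e => (strategy.getPartition(e.srcId, e.dstId, numPartitions), e))
    // Shuffle with the ordinary RDD partitioner.
    .partitionBy(new HashPartitioner(numPartitions))
    // Drop the keys, keeping the partitioning just established.
    .mapPartitions(_.map(_._2), preservesPartitioning = true)
}

// Example usage: partition the edges first, then construct the graph from them.
// val partitioned = partitionEdges(edges, PartitionStrategy.EdgePartition2D, numPartitions = 8)
// val graph = Graph.fromEdges(partitioned, defaultValue = 1)
```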
I'd actually suggest they build Spark from the 1.0 branch if this fix is the main thing they are interested in.
Looks good, made some minor comments. We just need to remember to revert this in 1.0.1 :)
@pwendell Thanks for the comments - done.
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
Hey @ankurdave, I'd suggest just updating the 1.0.0 docs on the website manually for now, and then we can remove it for 1.0.1. What do you think?
Sure, I'll do that now and then close this PR. People who rebuild the docs before 1.0.1 is out will need to include these changes, though.
Done. Closing.
Applied PR #908 to the generated docs: apache/spark#908
The Graph.partitionBy operator allows users to choose the graph partitioning strategy, but due to SPARK-1931, this method is broken in Spark 1.0.0. This PR updates the GraphX docs for Spark 1.0.0 to encourage users to build the latest version of Spark from branch-1.0, which contains a fix. Alternatively, it suggests a workaround involving partitioning the edges before constructing the graph.