[SPARK-3427] [GraphX] Avoid active vertex tracking in static PageRank #2308

Closed
wants to merge 1 commit into from

Conversation

ankurdave
Contributor

GraphX's current implementation of static (fixed iteration count) PageRank uses the Pregel API. This unnecessarily tracks active vertices, even though in static PageRank all vertices are always active. Active vertex tracking incurs the following costs:

  1. A shuffle per iteration to ship the active sets to the edge partitions.
  2. A hash table creation per iteration at each partition to index the active sets for lookup.
  3. A hash lookup per edge to check whether the source vertex is active.

I reimplemented static PageRank using the lower-level GraphX API instead of the Pregel API. In benchmarks on a 16-node m2.4xlarge cluster, this reduced running time by 23% (from 514 s to 397 s, mean over 3 trials; about a 1.29x speedup) for 10 iterations of PageRank on a synthetic graph with 10M vertices and 1.27B edges.
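
For illustration only, here is a minimal single-machine Python sketch of the idea. It is not the GraphX code in this patch: the function names, the toy update rule, and the `reset_prob` default are assumptions made for the example. The first function carries an active set the way a Pregel-style loop would, and the second drops it. Because every vertex is active in every iteration of static PageRank, both return the same ranks.

```python
from collections import defaultdict


def out_degrees(edges):
    """Count outgoing edges per source vertex."""
    deg = defaultdict(int)
    for src, _dst in edges:
        deg[src] += 1
    return deg


def pagerank_with_active_set(vertices, edges, num_iter, reset_prob=0.15):
    """Pregel-style loop (hypothetical): maintains an active set it never needs."""
    deg = out_degrees(edges)
    ranks = {v: 1.0 for v in vertices}
    active = set(vertices)  # rebuilt every iteration, like the per-partition index
    for _ in range(num_iter):
        sums = defaultdict(float)
        for src, dst in edges:
            if src in active:  # one hash lookup per edge, always true here
                sums[dst] += ranks[src] / deg[src]
        ranks = {v: reset_prob + (1.0 - reset_prob) * sums[v] for v in vertices}
        active = set(vertices)  # static PageRank: all vertices stay active
    return ranks


def pagerank_static(vertices, edges, num_iter, reset_prob=0.15):
    """Same update rule with no active-set bookkeeping (hypothetical)."""
    deg = out_degrees(edges)
    ranks = {v: 1.0 for v in vertices}
    for _ in range(num_iter):
        sums = defaultdict(float)
        for src, dst in edges:
            sums[dst] += ranks[src] / deg[src]
        ranks = {v: reset_prob + (1.0 - reset_prob) * sums[v] for v in vertices}
    return ranks
```

In this sketch the active set costs only a set rebuild and a membership test per edge. In a distributed setting the same bookkeeping also requires the per-iteration shuffle listed as cost 1 above.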

@ankurdave
Contributor Author

Jenkins, test this please.

@SparkQA

SparkQA commented Sep 7, 2014

QA tests have started for PR 2308 at commit 449996a.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Sep 7, 2014

QA tests have finished for PR 2308 at commit 449996a.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ankurdave
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Sep 7, 2014

QA tests have started for PR 2308 at commit 449996a.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Sep 7, 2014

QA tests have finished for PR 2308 at commit 449996a.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ankurdave
Contributor Author

Jenkins, test this please.

@SparkQA

SparkQA commented Sep 12, 2014

QA tests have started for PR 2308 at commit 449996a.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Sep 12, 2014

QA tests have finished for PR 2308 at commit 449996a.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ankurdave
Contributor Author

@rxin

@rxin
Contributor

rxin commented Sep 12, 2014

Thanks. Merging this.

@asfgit asfgit closed this in 15a5645 Sep 12, 2014