Commit

Updated API to be similar to KMeans plus other changes requested by Xiangrui on the PR
sboeschhuawei committed Jan 30, 2015
1 parent c12dfc8 commit 24fbf52
Showing 6 changed files with 532 additions and 187 deletions.
299 changes: 299 additions & 0 deletions data/mllib/pic_data.txt

Large diffs are not rendered by default.

30 changes: 0 additions & 30 deletions docs/mllib-clustering-pic.md

This file was deleted.

21 changes: 19 additions & 2 deletions docs/mllib-clustering.md
@@ -34,8 +34,25 @@ a given dataset, the algorithm returns the best clustering result).
* *initializationSteps* determines the number of steps in the k-means\|\| algorithm.
* *epsilon* determines the distance threshold within which we consider k-means to have converged.

[Power Iteration Clustering](mllib-clustering-pic.md) that uses the Power Iteration method combined with KMeans clustering to
cluster points based on a Gaussian measure of the input data pairwise similarity.
### Power Iteration Clustering

Power iteration clustering is a scalable and efficient algorithm for clustering points given pairwise affinity values. Internally the algorithm:

* accepts a [Graph](https://spark.apache.org/docs/0.9.2/api/graphx/index.html#org.apache.spark.graphx.Graph) that represents a normalized pairwise affinity between all input points
* calculates the principal eigenvalue and eigenvector
* clusters each of the input points according to its principal eigenvector component value
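
The steps listed above can be illustrated with a small standalone sketch. This is not the MLlib implementation and does not use the GraphX `Graph` input; it is a plain-Python approximation written under several assumptions: a dense Gaussian affinity matrix, a fixed number of power iterations (rather than a convergence test), a degree-based start vector as suggested in the Lin and Cohen paper, and a simple one-dimensional k-means on the resulting vector. The function names `gaussian_affinity`, `power_iteration_embedding`, and `kmeans_1d` are hypothetical. Note that for a row-normalized matrix the fully converged principal eigenvector is constant, so the sketch deliberately stops after a few iterations.

```python
import math


def gaussian_affinity(points, sigma=1.0):
    """Dense pairwise Gaussian affinities with a zero diagonal."""
    n = len(points)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((p - q) ** 2 for p, q in zip(points[i], points[j]))
                a[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return a


def power_iteration_embedding(affinity, max_iter=10):
    """Run a few power-iteration steps on the row-normalized affinity matrix.

    Assumes every point has a positive total affinity (no zero rows).
    """
    n = len(affinity)
    degrees = [sum(row) for row in affinity]
    # Row-normalize: W = D^-1 * A
    w = [[x / degrees[i] for x in affinity[i]] for i in range(n)]
    # Degree-based start vector, normalized to sum to 1
    total = sum(degrees)
    v = [d / total for d in degrees]
    for _ in range(max_iter):
        nxt = [sum(w[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(x) for x in nxt)  # L1 normalization
        v = [x / norm for x in nxt]
    return v


def kmeans_1d(values, k, max_iter=20):
    """Tiny 1-D k-means with centers spread evenly over the value range."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assign each value to its nearest center
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in values]
        # Move each non-empty center to the mean of its members
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


# Two well-separated groups of different sizes (3 and 4 points)
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (10.0, 10.0), (10.1, 10.0), (10.0, 10.1), (10.1, 10.1)]
embedding = power_iteration_embedding(gaussian_affinity(points))
labels = kmeans_1d(embedding, k=2)
print(labels)
```

The two groups are given different sizes on purpose: with a degree-based start vector, perfectly symmetric clusters would receive identical embedding values and could not be told apart by this simplified version.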

Details of this algorithm are found in [Power Iteration Clustering, Lin and Cohen](http://www.icml2010.org/papers/387.pdf).

Example output from our implementation for a dataset inspired by the paper, but with five clusters instead of three:

<p style="text-align: center;">
<img src="img/PIClusteringFiveCirclesInputsAndOutputs.png"
title="The Property Graph"
alt="The Property Graph"
width="50%" />
<!-- Images are downsized intentionally to improve quality on retina displays -->
</p>

### Examples

7 changes: 0 additions & 7 deletions mllib/pom.xml
@@ -108,13 +108,6 @@
<type>test-jar</type>
<scope>test</scope>
</dependency>
<!-- <dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-graphx_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<type>test-jar</type>
<scope>test</scope>
</dependency> -->
</dependencies>
<profiles>
<profile>