Graph-based Recommender

Graph-based recommender for BBC positive only feedback dataset using Gremlin graph query language. See the excellent blog post of M. Rodriguez to get an idea of the basic principles of a graph-based recommender system.

How-To-Run

Download Titan Graph Database v0.4.4 (all): http://thinkaurelius.github.io/titan/
Set the Java VM args for groovy/gremlin. The complete BBC 6.7 Mio ratings dataset needs at least 35Gb (!) of RAM. If you want to serialize the graph to disk you also need to increase the stack size. Hence:
```
JAVA_OPTIONS="-Xms1g -Xmx35g -Xss100m"
export JAVA_OPTIONS
```

Pass main.groovy to gremlin.sh in script mode (-e):

titan-all-0.4.4/bin/gremlin.sh -e path/to/Graph-based-recommender/main.groovy -p dir/for/optional/graph/serialization -u some,user,ids,to,process path/to/BBC/dataset path/to/output/directory

Graph Structure

Graphs queried for recommendations have the structure depicted below.

Vertices in red: User
Vertices in cyan: Shows
Vertices in purple: Show Attributes (e.g. 'genres/comedy' or 'service/bbcfour')
User -> Show edge: 'watched'
Show -> Attribute edge: 'show has attribute'

If a user watches the same show more than once the graph contains multiple 'watched' edges between the user and the show.

Recommendation Strategies

The recommendations are determined by the implementation of IRecommendationStrategy:

interface IRecommendationStrategy{
	List<Vertex> recommendShowsToUser(Vertex user);
}

RankedCollaborativeFilteringRecommendationStrategy: Pure collaborative filtering. Recommended are shows that have not been watched by the current user but by users that watched at least one of the shows that the current user watched. The retrieved shows are ranked by the number of shortest paths (length == 3) from the current user to the new shows.

class RankedCollaborativeFilteringRecommendationStrategy implements IRecommendationStrategy{
    List recommendShowsToUser(Vertex user){
        // find all path from current user to new shows in one other user distance, count number of paths to new show, order descending by counts 
        def showsWatchedByCurrentUser = [];
        // gremlin code follows
        return user.out('watched').aggregate(showsWatchedByCurrentUser).in('watched').out('watched').except(showsWatchedByCurrentUser).groupCount.cap.orderMap(T.decr).toList();
    }
}

If we apply this strategy on the sample graph with:

titan-all-0.4.4/bin/gremlin.sh -e graphRecommender/main.groovy -t -u U1,U6,U12 . .

We get:

Your input arguments: [-t, -u, U1,U6,U12, ., .]
-- Test graph loaded: tinkergraph[vertices:24 edges:36] --
-- Some basic graph properties:
---- Number of vertices: 24
---- Number of vertices of type 'Show': 7
---- Number of vertices of type 'User': 14
---- Number of vertices of type 'Attribute': 3
---- Number of edges: 36
---- Number of edges of type 'hasAttribute': 11
---- Number of edges of type 'watched': 25
---------------------------------
Calculating recommendations for 3 users
-- Start recommendation calculation --
-- Start recommendation calculation for user U1 (1/3) --
### Watched shows (5): ###
S6 - <NA> - <NA>
S7 - <NA> - <NA>
S6 - <NA> - <NA>
S1 - <NA> - <NA>
S1 - <NA> - <NA>
### Recommendations (3, best first): ###
1 - S5 - <NA> - <NA>
2 - S2 - <NA> - <NA>
3 - S3 - <NA> - <NA>
-- Recommendation calculation for user U1 completed (1/3) --
-- Start recommendation calculation for user U6 (2/3) --
### Watched shows (1): ###
S5 - <NA> - <NA>
### Recommendations (1, best first): ###
1 - S6 - <NA> - <NA>
-- Recommendation calculation for user U6 completed (2/3) --
-- Start recommendation calculation for user U12 (3/3) --
### Watched shows (3): ###
S2 - <NA> - <NA>
S3 - <NA> - <NA>
S1 - <NA> - <NA>
### Recommendations (2, best first): ###
1 - S6 - <NA> - <NA>
2 - S7 - <NA> - <NA>
-- Recommendation calculation for user U12 completed (3/3) --
-- Recommendation calculation completed --
Calculating recommendations for 3 users took 0.153 seconds (0.051 s per user).
Application terminated without exceptions.

ItemAttributesConsideringRankedCollaborativeFilteringRecommendationStrategy: Collaborative filtering with mixed in metadata / attributes similarity. Recommended are shows that have not been watched by the current user but by users that watched at least one of the shows that the current user watched OR shows that have a common attribute. The retrieved shows are ranked by the number of shortest paths (length == 3) from the current user to the new shows, either passing through an other user or a shared attribute.

class ItemAttributesConsideringRankedCollaborativeFilteringRecommendationStrategy implements IRecommendationStrategy{
    List recommendShowsToUser(Vertex user){
        // find all path from current user to new shows in one other user or an attribute distance, count number of paths to new show, order descending by counts 
        def showsWatchedByCurrentUser = [];
        return user.out('watched').aggregate(showsWatchedByCurrentUser).both('watched','hasAttribute').both('watched', 'hasAttribute').except(showsWatchedByCurrentUser).groupCount.cap.orderMap(T.decr).toList();
    }
}

If we apply this strategy on the sample graph with:

titan-all-0.4.4/bin/gremlin.sh -e graphRecommender/main.groovy -t -u U1,U6,U12 . .

We get:

Your input arguments: [-t, -u, U1,U6,U12, ., .]
-- Test graph loaded: tinkergraph[vertices:24 edges:36] --
-- Some basic graph properties:
---- Number of vertices: 24
---- Number of vertices of type 'Show': 7
---- Number of vertices of type 'User': 14
---- Number of vertices of type 'Attribute': 3
---- Number of edges: 36
---- Number of edges of type 'hasAttribute': 11
---- Number of edges of type 'watched': 25
---------------------------------
Calculating recommendations for 3 users
-- Start recommendation calculation --
-- Start recommendation calculation for user U1 (1/3) --
### Watched shows (5): ###
S6 - <NA> - <NA>
S7 - <NA> - <NA>
S6 - <NA> - <NA>
S1 - <NA> - <NA>
S1 - <NA> - <NA>
### Recommendations (4, best first): ###
1 - S5 - <NA> - <NA>
2 - S2 - <NA> - <NA>
3 - S3 - <NA> - <NA>
4 - S4 - <NA> - <NA>
-- Recommendation calculation for user U1 completed (1/3) --
-- Start recommendation calculation for user U6 (2/3) --
### Watched shows (1): ###
S5 - <NA> - <NA>
### Recommendations (5, best first): ###
1 - S6 - <NA> - <NA>
2 - S7 - <NA> - <NA>
3 - S3 - <NA> - <NA>
4 - S4 - <NA> - <NA>
5 - S2 - <NA> - <NA>
-- Recommendation calculation for user U6 completed (2/3) --
-- Start recommendation calculation for user U12 (3/3) --
### Watched shows (3): ###
S2 - <NA> - <NA>
S3 - <NA> - <NA>
S1 - <NA> - <NA>
### Recommendations (4, best first): ###
1 - S7 - <NA> - <NA>
2 - S6 - <NA> - <NA>
3 - S4 - <NA> - <NA>
4 - S5 - <NA> - <NA>
-- Recommendation calculation for user U12 completed (3/3) --
-- Recommendation calculation completed --
Calculating recommendations for 3 users took 0.156 seconds (0.052 s per user).
Application terminated without exceptions.

RankedItemAttributesRecommendationStrategy: Item attributes / metadata only based recommendations.

class RankedItemAttributesRecommendationStrategy implements IRecommendationStrategy{
    List recommendShowsToUser(Vertex user){
        // get attributes of all users shows, get unwatched shows with same attributes, rank shows by number of paths to the show and append to end of result list, loop starting from previously found shows
        def l=[]; def a=[]; def s=[];
        user.out('watched').as('x').except(s).aggregate(s).out('hasAttribute').except(a).aggregate(a).in('hasAttribute').except(s).gather{def counts=it.countBy{it}; counts=counts.sort{i, j -> j.value <=> i.value}; l.addAll(counts.keySet()); return it}.scatter.loop('x'){it.loops < 20}.iterate();
        return l
    }
}

If we apply this strategy on the sample graph with:

titan-all-0.4.4/bin/gremlin.sh -e graphRecommender/main.groovy -t -u U1,U6,U12 . .

We get:

Your input arguments: [-t, -u, U1,U6,U12, ., .]
-- Test graph loaded: tinkergraph[vertices:24 edges:36] --
-- Some basic graph properties:
---- Number of vertices: 24
---- Number of vertices of type 'Show': 7
---- Number of vertices of type 'User': 14
---- Number of vertices of type 'Attribute': 3
---- Number of edges: 36
---- Number of edges of type 'hasAttribute': 11
---- Number of edges of type 'watched': 25
---------------------------------
Calculating recommendations for 3 users
-- Start recommendation calculation --
-- Start recommendation calculation for user U1 (1/3) --
### Watched shows (5): ###
S6 - <NA> - <NA>
S7 - <NA> - <NA>
S6 - <NA> - <NA>
S1 - <NA> - <NA>
S1 - <NA> - <NA>
### Recommendations (4, best first): ###
1 - S5 - <NA> - <NA>
2 - S2 - <NA> - <NA>
3 - S3 - <NA> - <NA>
4 - S4 - <NA> - <NA>
-- Recommendation calculation for user U1 completed (1/3) --
-- Start recommendation calculation for user U6 (2/3) --
### Watched shows (1): ###
S5 - <NA> - <NA>
### Recommendations (5, best first): ###
1 - S6 - <NA> - <NA>
2 - S7 - <NA> - <NA>
3 - S3 - <NA> - <NA>
4 - S4 - <NA> - <NA>
5 - S2 - <NA> - <NA>
-- Recommendation calculation for user U6 completed (2/3) --
-- Start recommendation calculation for user U12 (3/3) --
### Watched shows (3): ###
S2 - <NA> - <NA>
S3 - <NA> - <NA>
S1 - <NA> - <NA>
### Recommendations (4, best first): ###
1 - S7 - <NA> - <NA>
2 - S6 - <NA> - <NA>
3 - S4 - <NA> - <NA>
4 - S5 - <NA> - <NA>
-- Recommendation calculation for user U12 completed (3/3) --
-- Recommendation calculation completed --
Calculating recommendations for 3 users took 0.177 seconds (0.059 s per user).
Application terminated without exceptions.

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
README.md		README.md
main.groovy		main.groovy
sample-graph.jpg		sample-graph.jpg

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Graph-based Recommender

How-To-Run

Graph Structure

Recommendation Strategies

About

Releases

Packages

Languages

faberchri/Graph-based-recommender

Folders and files

Latest commit

History

Repository files navigation

Graph-based Recommender

How-To-Run

Graph Structure

Recommendation Strategies

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages