This is the implementation of the GLL-based context-free path querying (CFPQ) algorithm. The proposed algorithm solves both the reachability-only and the all-paths problems for the all-pairs and the multiple sources cases. It handles queries in Extended Backus-Naur Form (EBNF) using Recursive State Machines (RSM)
This project uses maven, to build a jar-file with the procedure in this project, simply package the project with maven:
mvn clean package
This will produce a jar-file,target/cfpq-gll-neo4j-procedure-1.0.0.jar
,
that can be deployed in the plugin
directory of your Neo4j instance.
Once the code is compiled, the JAR file needs to be put into the plugin's directory of Neo4j root folder. To call the stored procedure the following Cypher query can be used,
CALL cfpq.gll.getReachabilities(nodes, rsm)
where nodes
is a collection of start nodes, and rsm
is a string representation of RSM specified over relations types.
CREATE (n1:Node{id:1})
CREATE (n2:Node{id:2})
CREATE (n3:Node{id:3})
CREATE (n4:Node{id:4})
CREATE (n1)-[:a]->(n2)
CREATE (n2)-[:a]->(n3)
CREATE (n3)-[:a]->(n1)
CREATE (n3)-[:b]->(n4)
CREATE (n4)-[:b]->(n3)
WITH
'StartState(id=0,nonterminal=Nonterminal(S),isStart=true,isFinal=true);' +
'TerminalEdge(tail=0,head=0,terminal=Terminal(a));' +
'TerminalEdge(tail=0,head=0,terminal=Terminal(b));' as rsm
MATCH (n1:Node{id:1})
CALL cfpq.gll.getReachabilities([n1], rsm)
YIELD first, second
RETURN first.id, second.id
first | second |
---|---|
1 | 1 |
1 | 2 |
1 | 3 |
1 | 4 |
The evaluation of real-world graphs demonstrates that the utilization of RSMs increases the performance of query evaluation. Being implemented as a stored procedure for Neo4j, our solution demonstrates better performance than a similar solution for RedisGraph. The performance of the solution of regular path queries is comparable with the performance of the native Neo4j solution, and in some cases, it requires significantly less memory.
Machine configuration: PC with Ubuntu 20.04, Intel Core i7-6700 3.40GHz CPU, DDR4 64Gb RAM.
Environment configuration:
- OpenJDK 64-Bit Server VM Corretto-17.0.8.8.1 (build 17.0.8.1+8-LTS, mixed mode, sharing).
- JVM heap configuration: 55Gb both xms and xmx.
- Neo4j 5.0.12 is used. Almost all configurations are default except one:
- memory_transaction_global_max_size parameter is set to 0, which means unlimited memory usage per transaction.
The graph data is selected from CFPQ_Data dataset. Graphs related to RDF analysis problems were taken.
A detailed description of the graphs is listed below.
RDF analysis
Graph name | |V| | |E| | #subClassOf | #type | #broaderTransitive |
---|---|---|---|---|---|
Core | 1 323 | 2 752 | 178 | 0 | 0 |
Pathways | 6 238 | 12 363 | 3 117 | 3 118 | 0 |
Go hierarchy | 45 007 | 490 109 | 490 109 | 0 | 0 |
Enzyme | 48 815 | 86 543 | 8 163 | 14 989 | 8 156 |
Eclass_514en | 239 111 | 360 248 | 90 962 | 72 517 | 0 |
Geospecies | 450 609 | 2 201 532 | 0 | 89 065 | 20 867 |
Go | 582 929 | 1 437 437 | 94 514 | 226 481 | 0 |
Taxonomy | 5 728 398 | 14 922 125 | 2 112 637 | 2 508 635 | 0 |
Regular queries Regular queries were generated using a well-established set of templates for RPQ algorithms evaluation. Four nontrivial templates (that contain compositions of Kleene star and union) that are expressible in Cypher syntax to be able to compare native Neo4j querying algorithm with this solution were chosen.
Used templates are presented below.
reg1 = (π | π)*
reg2 = π* π*
reg3 = (π | π | π)+
reg4 = (π | π)+ (π | π)+
Respective path patterns expressed in Cypher are presented below.
reg1 = ()-[:a| :b]->{0,}()
reg2 = ()-[:a]->{0,}()-[:b]->{0,}()
reg3 = ()-[:a | :b | :c]->{1,}()
reg4 = ()-[:a | :b]->{1,}()-[:c | :d]->{1,}()
Context-Free Queries
All queries used in the evaluation are variants of same-generation query. The inverse of an x
relation and the respective edge is denoted as x_r
.
Grammars used for RDF graphs:
G1
S -> subClassOf_r S subClassOf | subClassOf_r subClassOf
| type_r S type | type_r type
G2
S -> subClassOf_r S subClassOf | subClassOf
Geo
S -> broaderTransitive S broaderTransitive_r
| broaderTransitive broaderTransitive_r
Respective RSMs are presented below.
![](https://private-user-images.githubusercontent.com/31728695/272900126-e7f23dd4-8bd4-45cf-9cc6-0d3a0b3d2fed.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyMzIyNDksIm5iZiI6MTczOTIzMTk0OSwicGF0aCI6Ii8zMTcyODY5NS8yNzI5MDAxMjYtZTdmMjNkZDQtOGJkNC00NWNmLTljYzYtMGQzYTBiM2QyZmVkLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEwVDIzNTkwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTgwMjE1OWE1ZWVhNDdhZjVmNTY3NDFlMTFkMDBkYTQwOTVlYjU0OTk0YjgzY2JjYmJmYjQ3YTEzOGE1MGRhY2MmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.xCvOkZh52DXpGNUDC5Y68wJv7aMcTIlqoNEJvT2kXUM)
![](https://private-user-images.githubusercontent.com/31728695/272898694-a04a7eb7-f42e-4b72-aa94-d884a247a143.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyMzIyNDksIm5iZiI6MTczOTIzMTk0OSwicGF0aCI6Ii8zMTcyODY5NS8yNzI4OTg2OTQtYTA0YTdlYjctZjQyZS00YjcyLWFhOTQtZDg4NGEyNDdhMTQzLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEwVDIzNTkwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTY4OGJlNGNmMDM3MjU0Yjk3Njk5M2RjOWQyZTY1YjFkMTgxMTA3NTFjMGRmMDBhOGE4ZTgxMjJkNzkyNDc2ZGImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.XhmPRZkwtsnWc5cBQojvPkVvYZfcMqRFA6KrdPqIp0A)
![](https://private-user-images.githubusercontent.com/31728695/272898719-83371945-ebd7-4c01-b102-66456abe857a.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyMzIyNDksIm5iZiI6MTczOTIzMTk0OSwicGF0aCI6Ii8zMTcyODY5NS8yNzI4OTg3MTktODMzNzE5NDUtZWJkNy00YzAxLWIxMDItNjY0NTZhYmU4NTdhLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEwVDIzNTkwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWYwMWFkZTVlNGVlMTAxNTFmNDY3MmQwMjYyYzE5Y2Y5YzYzOWNhMWQ1M2UyZjFjMjBkY2JhZTRjOWI2YjUyYTAmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.IlDcknsJ3z0m_KeCFcawB6VULpqYJA05oGNwAud_v_g)
The starting sets for the multiple sources querying are generated from all vertices of a graph with a random permutation. Chunks of size 1, 10, and 100 were used.
Results for queries G1, G2 and Geo are presented below respectively. The performance results are compared with matrix-based CFPQ algorithm implemented in RedisGraph by Arseniy Terekhov et al in paper.
Multiple sources CFPQ reachability results for queries related to RDF analysis and G1
Multiple sources CFPQ reachability results for queries related to RDF analysis and G2
Multiple sources CFPQ reachability results for queries related to RDF analysis and Geo
Results for queries reg1, reg2, reg3 and reg4 are presented below respectively. The performance results were compared with native Neo4j solution.
Multiple sources RPQ reachability results for queries related to RDF analysis and reg1 (native solution failed with OOM on last two graphs)
Multiple sources RPQ reachability results for queries related to RDF analysis and reg2 (native solution failed with OOM on last two graphs)
Multiple sources RPQ reachability results for queries related to RDF analysis and reg3 (native solution failed with OOM on last two graphs)
Multiple sources RPQ reachability results for queries related to RDF analysis and reg4 (native solution failed with OOM on last two graphs)
This project is licensed under Apache License 2.0. License text can be found in the license file