Pattern Match Pattern

okram edited this page Jun 10, 2011 · 18 revisions

Many graph applications involve identifying patterns in a graph, that is, identifying subgraphs that match a particular topology and/or feature set. In single-relational graphs, usually only the topology is considered. For graphs that are labeled and have key/value pairs on their elements (i.e. property graphs), the features on that topology must be considered as well for the match to be generally useful.

Table and Supporting Pipes

The table step is useful for cataloging parts of the graph that match a particular pattern. The best way to explain its use is by comparison with the graph pattern match language SPARQL. An example SPARQL query is provided below. The query returns those vertices that Marko knows along with their respective creations.

SELECT ?x ?y WHERE {
  marko knows ?x
  ?x created ?y
}
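To make the semantics of this query concrete, here is a minimal Python sketch of triple-pattern matching. It is not SPARQL or Gremlin code. The `TRIPLES` list and the `match` helper are hypothetical, and the edge data is an assumed reconstruction of the toy graph's knows and created edges.

```python
# Assumption: these triples mirror the "knows" and "created" edges of the toy graph.
TRIPLES = [
    ("marko", "knows", "vadas"),
    ("marko", "knows", "josh"),
    ("marko", "created", "lop"),
    ("josh", "created", "ripple"),
    ("josh", "created", "lop"),
    ("peter", "created", "lop"),
]


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable must agree with any binding made earlier.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                # A constant must match the triple exactly.
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)


query = [("marko", "knows", "?x"), ("?x", "created", "?y")]
results = [(b["?x"], b["?y"]) for b in match(query, TRIPLES)]
print(results)
```

Each pattern either extends the current binding or rejects the triple, so only bindings consistent across both patterns survive.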

SPARQL is an excellent, intuitive language for defining patterns in an RDF graph (i.e. multi-relational graph). Unfortunately, RDF (and thus, SPARQL) does not support key/value pairs on the elements of the graph; that is, it does not support property graphs (save through complex and verbose reification constructs). As such, SPARQL does not map easily to the property graph domain.

In Gremlin, the SPARQL query above is accomplished using the table step and its respective TablePipe and Table object. First, let's load the toy graph diagrammed in Defining a Property Graph.

gremlin> g = TinkerGraphFactory.createTinkerGraph()
==>tinkergraph[vertices:6 edges:6]

In the example below, a traversal is executed starting from marko (vertex 1). The table step takes the named parts (x and y) of each path that reaches it and inserts them into the Table t as a row.

gremlin> t = new Table()    
gremlin> g.v(1).out('knows').as('x').out('created').as('y').table(t)
==>v[5]
==>v[3]
gremlin> t
==>[x:v[4], y:v[5]]
==>[x:v[4], y:v[3]]

As such, components of the traversal's history (path) are saved into the table and serve as the desired return bindings. With respect to the SPARQL query shown earlier, the columns of t are equivalent to ?x and ?y.
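The idea of saving named path components as rows can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the Gremlin or Pipes implementation. `OUT_EDGES`, `out`, and `table` are hypothetical names, and the adjacency data is an assumed reconstruction of the toy graph.

```python
# Assumption: adjacency reconstructed from the toy graph; keys and targets are vertex ids.
OUT_EDGES = {
    1: [("knows", 2), ("knows", 4), ("created", 3)],
    4: [("created", 5), ("created", 3)],
    6: [("created", 3)],
}


def out(paths, label):
    """Extend each path by one outgoing edge carrying the given label."""
    for path in paths:
        for edge_label, target in OUT_EDGES.get(path[-1], []):
            if edge_label == label:
                yield path + [target]


def table(paths, names):
    """Record the named positions of every path as one row."""
    return [{name: path[index] for name, index in names.items()} for path in paths]


# Start at vertex 1, follow "knows", then "created".
paths = out(out([[1]], "knows"), "created")
# Path position 1 plays the role of 'x' and position 2 the role of 'y'.
t = table(paths, {"x": 1, "y": 2})
print(t)
```

Each path carries its full history, so the rows are just a projection of that history onto the named positions.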

Finally, to give the developer flexibility, it's possible to evaluate closures over those path components prior to inserting them into the table.

gremlin> t = new Table()                                                       
gremlin> g.v(1).out('knows').as('x').out('created').as('y').table(t){it.name}{it.name}
==>v[5]
==>v[3]
gremlin> t
==>[x:josh, y:ripple]
==>[x:josh, y:lop]

In the above Gremlin snippets, we found all the people that Marko knows who have created a product. Specifically, Marko knows Josh and Josh has created both Ripple and LoP.
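The per-column closure idea can be sketched in Python as a function applied to each value before the row is stored. `add_row` and `name_of` are hypothetical helpers, not part of any Gremlin API. The `NAMES` mapping is taken from the outputs shown above.

```python
# Vertex id -> name, as seen in the console output above.
NAMES = {3: "lop", 4: "josh", 5: "ripple"}


def name_of(vertex_id):
    """Stand-in for the {it.name} closure."""
    return NAMES[vertex_id]


def add_row(table, column_names, values, column_functions=None):
    """Append one row, passing each value through its column's function first."""
    if column_functions is None:
        # With no functions given, values are stored unchanged.
        column_functions = [lambda value: value] * len(values)
    row = {
        name: fn(value)
        for name, fn, value in zip(column_names, column_functions, values)
    }
    table.append(row)


t = []
for x, y in [(4, 5), (4, 3)]:
    add_row(t, ["x", "y"], [x, y], [name_of, name_of])
print(t)
```

Applying the function at insertion time means the table holds the derived values (names) rather than the vertices themselves.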

For those familiar with Pipes, TablePipe is a SideEffectPipe and, as such, can be “capped.” Capping emits the side effect (here, the table) in place of the traversed vertices.

gremlin> g.v(1).out('knows').as('x').out('created').as('y').table(new Table()).cap() >> 1
==>[x:v[4], y:v[5]]
==>[x:v[4], y:v[3]]
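The difference between a side-effect step and its capped form can be illustrated with a small Python model. This is only a sketch of the concept. `TableStep` and `cap` are hypothetical and do not reproduce the Pipes classes.

```python
class TableStep:
    """Passes paths through while recording one row per path as a side effect."""

    def __init__(self, paths):
        self.paths = paths
        self.table = []  # the side effect

    def __iter__(self):
        for path in self.paths:
            self.table.append({"x": path[1], "y": path[2]})
            yield path[-1]  # the step itself still emits the end vertex


def cap(step):
    """Drain the step, then emit its side effect as the single result."""
    for _ in step:
        pass
    yield step.table


paths = [[1, 4, 5], [1, 4, 3]]
emitted = list(TableStep(paths))      # uncapped: the end vertices come out
capped = next(cap(TableStep(paths)))  # capped: the table comes out instead
print(emitted)
print(capped)
```

In this model, `next(...)` pulls the single object that the capped step emits.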

Table Methods

Table offers numerous methods. Here are some examples of their use:

gremlin> t
==>[x:josh, y:ripple]
==>[x:josh, y:lop]
gremlin> t.getRow(0)
==>josh
==>ripple
gremlin> t.getColumn(1)
==>ripple
==>lop
gremlin> t.getRow(1).get('y')      
==>lop
gremlin> t.getRow(1).get(1)  
==>lop
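The row and column access shown above can be mimicked with a small Python model. This is a sketch, not the actual Table class. The `Row` and `Table` classes here are hypothetical, and column lookup by name in `get_column` is an assumption, since the session above only shows lookup by index.

```python
class Row(list):
    """A list of values that can also be read by column name."""

    def __init__(self, values, column_names):
        super().__init__(values)
        self.column_names = column_names

    def get(self, key):
        # Accept either a column name or a positional index.
        index = self.column_names.index(key) if isinstance(key, str) else key
        return self[index]


class Table:
    def __init__(self, column_names):
        self.column_names = list(column_names)
        self.rows = []

    def add_row(self, *values):
        self.rows.append(Row(values, self.column_names))

    def get_row(self, index):
        return self.rows[index]

    def get_column(self, key):
        # Collect the same column from every row.
        return [row.get(key) for row in self.rows]


t = Table(["x", "y"])
t.add_row("josh", "ripple")
t.add_row("josh", "lop")
print(t.get_row(0))
print(t.get_column(1))
print(t.get_row(1).get("y"))
```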