Skip to content

Pivoting With Query

lvca edited this page Dec 22, 2012 · 3 revisions

One-sentence summary of this page.

Determining the pivot class on select queries

When planning a SELECT OrientDB query is important to determine the model class that will be taken as pivot class of the query. This class is expressed in the 'from' clause. It affects other elements in the query as follows:

  • projections will be referred to the pivot class. Is possible to traverse within a projection to refer neightbor classes by chaining edge syntax expressions (i.e. {{{in[However, consider that multiple results from a projection traversed from pivot class will be returned as a collection within the result set (unless is a single value).
  • filtering conditions in the "WHERE" clause are also referred to the pivot class. It is also possible to traverse to neightbor classes in order to compose advanced conditions by using edge syntax expressions (e.g. and inlabel='company'.).out.out[IN '0000345'}}}).
  • the 'ORDER BY' clause will be referred to one of the projections and must be returned as a single value per record (i.e. an attribute of the pivot class or a single attribute of a neighbor class). It will not be possible to order by traversed projections in a single query if they return multiple results (as a collection). Therefore, in queries using "ORDER BY" clause, there is not possible choice of the pivot class as it most be the one containing the attribute to ORDER BY.

Additionally, there are performance considerations that should be considered on selecting the pivot class. Assuming 2 classes as follows:

65205b66ddede24ec097328bc88a7b84

Queries:

  1. select ... from CountryType from [1. select ... from PersonType from [...]}

The query (1) will intend to apply the "where" filtering and projections to less number of vertices and, therefore will perform faster that the query (2). Therefore, it is advisable to assign the pivot class to the class with higher average of items relevant for the query to remove unnecessary loops from the evaluation, i.e. usually the one with lower multiplicity.

Switching the pivot class within a query

According to previous discussion, is possible to find conflicting requirements on determining the pivot class. Suppose the case where we need to 'ORDER BY' a class with a very high multiplicity (say, millions of vertices), but most of these vertices are not relevant for the outcome of our query.

On one hand, according to the requirements of the 'ORDER BY' clause, we are forced to choose the class containing the attribute to order by, as the pivot class. But, as commented, this class can not be an optimal choice from a performance point of view if only a small subset of vertices is relevant to the query. In this case, we have a design conflict between having a poor performance by setting the pivot class as the class containing the attribute to order by sentence with higher multiplicity, or get a best performing query by taking out the "ORDER BY" clause and ordering results in the invoking Java code. If we choose to execute the full operation in one query, indices can be used to improve the poor performance, but it would be usually an overkill as a consequence of a bad query planning.

A more elegant solution can be achieved by the technique of nesting queries, as shown below:

    SELECT
      in[label='city'].out.name AS name,
      in[label='city'].out.out[label='city'].size() AS count_of_cities,
      CityLat,
      CityLong,
      distance(CityLat, CityLong, 51.513363,-0.089178) AS distance
    FROM ( 
      SELECT flatten( in[label='region'].out.out[label='city'].in )
      FROM CountryType WHERE id IN '0032'
    ) WHERE
      CityLat <> '' AND
      CityLong  <> ''
      ORDER BY distance

This nested query represents a two-fold operation, taking best of both worlds. The first subquery (in red) uses the class with lower multiplicity as pivot class, so the number of required loops is smaller, and as a result, delivers a better performance. The resulting set of vertices from the first subquery, which includes the ordering parameter (in blue), (allowing the ORDER BY operation), is taken as pivot class for the second subquery. The flatten() function is required to expose items from the first subquery as a flat structure to the second query. The higher the multiplicity and number of no relevant records in the class with the parameter to "ORDER BY", the more convenient becomes using this approach.

Clone this wiki locally