diff --git a/Full-Text-Index.md b/Full-Text-Index.md index 58ede03c..4a286f19 100644 --- a/Full-Text-Index.md +++ b/Full-Text-Index.md @@ -9,9 +9,9 @@ In addition to the standard FullText Index, which uses the SB-Tree index algorit **Syntax**: -```sql -CREATE INDEX ON (prop-names) FULLTEXT ENGINE LUCENE -``` +
+orientdb> CREATE INDEX <name> ON <class-name> (prop-names) FULLTEXT ENGINE LUCENE
+                                                                                                                  
The following SQL statement will create a FullText index on the property `name` for the class `City`, using the Lucene Engine. @@ -26,68 +26,53 @@ orientdb> CREATE INDEX City.name_description ON FULLTEXT ENGINE LUCENE +The default analyzer used by OrientDB when a Lucene index is created is the [StandardAnalyzer](https://lucene.apache.org/core/6_3_0/core/org/apache/lucene/analysis/standard/StandardAnalyzer.html). ### Analyzer -This creates a basic FullText Index with the Lucene Engine on the specified properties. In the even that you do not specify the analyzer, OrientDB defaults to [StandardAnalyzer](http://lucene.apache.org/core/4_7_0/analyzers-common/org/apache/lucene/analysis/standard/StandardAnalyzer.html). - -In addition to the StandardAnalyzer, you can also create indexes that use different analyzer, using the `METADATA` operator through [`CREATE INDEX`](SQL-Create-Index.md). - -
-orientdb> CREATE INDEX City.name ON City(name) FULLTEXT ENGINE LUCENE METADATA
-          {"analyzer": "org.apache.lucene.analysis.en.EnglishAnalyzer"}
-
- -**(from 2.1.16)** - -From version 2.1.16 it is possible to provide a set of stopwords to the analyzer to override the default set of the analyzer: +In addition to the StandardAnalyzer, full text indexes can be configured to use a different analyzer via the `METADATA` operator of [`CREATE INDEX`](SQL-Create-Index.md). +Configure the index on `City.name` to use the `EnglishAnalyzer`:
-orientdb> CREATE INDEX City.name ON City(name) FULLTEXT ENGINE LUCENE METADATA
-          {
-          "analyzer": "org.apache.lucene.analysis.en.EnglishAnalyzer",
-          "analyzer_stopwords": ["a", "an", "and", "are", "as", "at", "be", "but", "by" ]
-          }
-          
-          
+orientdb> CREATE INDEX City.name ON City(name)
+            FULLTEXT ENGINE LUCENE METADATA {
+                "analyzer": "org.apache.lucene.analysis.en.EnglishAnalyzer"
+            }
 
-**(from 2.2)** - -Starting from 2.2 it is possible to configure different analyzers for indexing and querying. +Configure the index on `City.name` to use different analyzers for indexing and querying.
-orientdb> CREATE INDEX City.name ON City(name) FULLTEXT ENGINE LUCENE METADATA
-          {
-          "index": "org.apache.lucene.analysis.en.EnglishAnalyzer",
-          "query": "org.apache.lucene.analysis.standard.StandardAnalyzer"
+orientdb> CREATE INDEX City.name ON City(name)
+            FULLTEXT ENGINE LUCENE METADATA {
+                "index": "org.apache.lucene.analysis.en.EnglishAnalyzer",
+                "query": "org.apache.lucene.analysis.standard.StandardAnalyzer"
           }
 
-EnglishAnalyzer will be used to analyze text while indexing and the StandardAnalyzer will be used to analyze query terms. +`EnglishAnalyzer` will be used to analyze text while indexing and the `StandardAnalyzer` will be used to analyze query terms. -It is posssbile to configure analyzers at field level +A more detailed configuration, for a multi-field index, could be:
-orientdb> CREATE INDEX City.name_description ON City(name, lyrics, title,author, description) FULLTEXT ENGINE LUCENE METADATA
-          {
-            "default": "org.apache.lucene.analysis.standard.StandardAnalyzer",
-            "index": "org.apache.lucene.analysis.core.KeywordAnalyzer",
-            "query": "org.apache.lucene.analysis.standard.StandardAnalyzer",
-            "name_index": "org.apache.lucene.analysis.standard.StandardAnalyzer",
-            "name_query": "org.apache.lucene.analysis.core.KeywordAnalyzer",
-            "lyrics_index": "org.apache.lucene.analysis.en.EnglishAnalyzer",
-            "title_index": "org.apache.lucene.analysis.en.EnglishAnalyzer",
-            "title_query": "org.apache.lucene.analysis.en.EnglishAnalyzer",
-            "author_query": "org.apache.lucene.analysis.core.KeywordAnalyzer",
-            "description_index": "org.apache.lucene.analysis.standard.StandardAnalyzer",
-            "description_index_stopwords": [
-              "the",
-              "is"
-            ]
-
-          }
+orientdb> CREATE INDEX City.name_description ON City(name, lyrics, title, author, description)
+            FULLTEXT ENGINE LUCENE METADATA {
+                "default": "org.apache.lucene.analysis.standard.StandardAnalyzer",
+                "index": "org.apache.lucene.analysis.core.KeywordAnalyzer",
+                "query": "org.apache.lucene.analysis.standard.StandardAnalyzer",
+                "name_index": "org.apache.lucene.analysis.standard.StandardAnalyzer",
+                "name_query": "org.apache.lucene.analysis.core.KeywordAnalyzer",
+                "lyrics_index": "org.apache.lucene.analysis.en.EnglishAnalyzer",
+                "title_index": "org.apache.lucene.analysis.en.EnglishAnalyzer",
+                "title_query": "org.apache.lucene.analysis.en.EnglishAnalyzer",
+                "author_query": "org.apache.lucene.analysis.core.KeywordAnalyzer",
+                "description_index": "org.apache.lucene.analysis.standard.StandardAnalyzer",
+                "description_index_stopwords": [
+                  "the",
+                  "is"
+                ]
+            }
 
With this configuration, the underlying Lucene index will work in a different way on each field: @@ -98,14 +83,18 @@ With this configuration, the underlying Lucene index will work in a different way * *author*: indexed and searched with KeywordAnalyzer * *description*: indexed with StandardAnalyzer with a given set of stopwords -You can also use the FullText Index with the Lucene Engine through the Java API. -```java -OSchema schema = databaseDocumentTx.getMetadata().getSchema(); -OClass oClass = schema.createClass("Foo"); -oClass.createProperty("name", OType.STRING); -oClass.createIndex("City.name", "FULLTEXT", null, null, "LUCENE", new String[] { "name"}); -``` +### Java API + +The FullText Index with the Lucene Engine is configurable through the Java API. + +

+    OSchema schema = databaseDocumentTx.getMetadata().getSchema();
+    OClass oClass = schema.createClass("City");
+    oClass.createProperty("name", OType.STRING);
+    oClass.createIndex("City.name", "FULLTEXT", null, null, "LUCENE", new String[] { "name"});
+
+
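The per-field analyzer settings in the multi-field `METADATA` example can be read as a lookup with fallbacks. The sketch below is an illustration only, assuming the order `<field>_<mode>`, then `<mode>`, then `default`. That order is inferred from the field list above: *author* defines only `author_query`, yet it is described as indexed with KeywordAnalyzer, which matches the generic `index` entry. `AnalyzerFallbackSketch`, `resolve` and `exampleMetadata` are hypothetical names and are not part of the OrientDB or Lucene API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: mirrors the lookup order suggested by the field list,
// it does not call into OrientDB or Lucene.
final class AnalyzerFallbackSketch {

    // Returns the analyzer class name for a field and a mode ("index" or "query").
    // Assumed order: "<field>_<mode>", then "<mode>", then "default".
    static String resolve(Map<String, String> metadata, String field, String mode) {
        String perField = metadata.get(field + "_" + mode);
        if (perField != null) {
            return perField;
        }
        String perMode = metadata.get(mode);
        if (perMode != null) {
            return perMode;
        }
        return metadata.get("default");
    }

    // Analyzer entries copied from the multi-field CREATE INDEX example
    // (the stopwords array is left out because it is not an analyzer name).
    static Map<String, String> exampleMetadata() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("default", "org.apache.lucene.analysis.standard.StandardAnalyzer");
        m.put("index", "org.apache.lucene.analysis.core.KeywordAnalyzer");
        m.put("query", "org.apache.lucene.analysis.standard.StandardAnalyzer");
        m.put("name_index", "org.apache.lucene.analysis.standard.StandardAnalyzer");
        m.put("name_query", "org.apache.lucene.analysis.core.KeywordAnalyzer");
        m.put("lyrics_index", "org.apache.lucene.analysis.en.EnglishAnalyzer");
        m.put("title_index", "org.apache.lucene.analysis.en.EnglishAnalyzer");
        m.put("title_query", "org.apache.lucene.analysis.en.EnglishAnalyzer");
        m.put("author_query", "org.apache.lucene.analysis.core.KeywordAnalyzer");
        m.put("description_index", "org.apache.lucene.analysis.standard.StandardAnalyzer");
        return m;
    }

    public static void main(String[] args) {
        Map<String, String> m = exampleMetadata();
        // Print the analyzer chosen for each field in both modes.
        for (String field : new String[] {"name", "lyrics", "title", "author", "description"}) {
            System.out.println(field + " index=" + resolve(m, field, "index")
                    + " query=" + resolve(m, field, "query"));
        }
    }
}
```

Under this reading, a field with no entry of its own (for example the query side of *lyrics*) falls back to the generic `query` analyzer.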
## Query parser @@ -161,16 +150,16 @@ SELECT from Person WHERE name LUCENE "name" It is possible to fine tune the behaviour of the underlying Lucene's IndexWriter
-CREATE INDEX City.name ON City(name) FULLTEXT ENGINE LUCENE METADATA
-{
-  "directory_type": "nio",
-  "use_compound_file": false,
-  "ram_buffer_MB": "16",
-  "max_buffered_docs": "-1",
-  "max_buffered_delete_terms": "-1",
-  "ram_per_thread_MB": "1024",
-  "default": "org.apache.lucene.analysis.standard.StandardAnalyzer"
-}
+CREATE INDEX City.name ON City(name)
+    FULLTEXT ENGINE LUCENE METADATA {
+        "directory_type": "nio",
+        "use_compound_file": false,
+        "ram_buffer_MB": "16",
+        "max_buffered_docs": "-1",
+        "max_buffered_delete_terms": "-1",
+        "ram_per_thread_MB": "1024",
+        "default": "org.apache.lucene.analysis.standard.StandardAnalyzer"
+    }
 
 
@@ -186,20 +175,26 @@ It is possible to fine tune the behaviour of the underlying Lucene's IndexWriter For a detailed explanation of config parameters and IndexWriter behaviour -* indexWriterConfig : https://lucene.apache.org/core/5_0_0/core/org/apache/lucene/index/IndexWriterConfig.html -* indexWriter: https://lucene.apache.org/core/5_0_0/core/org/apache/lucene/index/IndexWriter.html +* indexWriterConfig : https://lucene.apache.org/core/6_3_0/core/org/apache/lucene/index/IndexWriterConfig.html +* indexWriter: https://lucene.apache.org/core/6_3_0/core/org/apache/lucene/index/IndexWriter.html ## Querying Lucene FullText Indexes -You can query the Lucene FullText Index using the custom operator `LUCENE` with the [Query Parser Syntax](http://lucene.apache.org/core/5_4_1/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description) from the Lucene Engine. +You can query the Lucene FullText Index using the custom operator `LUCENE` with the [Query Parser Syntax](http://lucene.apache.org/core/6_3_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description) from the Lucene Engine.
 orientdb> SELECT FROM V WHERE name LUCENE "test*"
 
This query searches for `test`, `tests`, `tester`, and so on from the property `name` of the class `V`. +The query can use the proximity operator (_~_), the required (_+_) and prohibit (_-_) operators, phrase queries, and regexp queries: + +
+orientdb> SELECT FROM Article WHERE content LUCENE "(+graph -rdbms) AND +cloud"
+
-### Working with Multiple Fields + +### Working with multiple fields In addition to the standard Lucene query above, you can also query multiple fields. For example, @@ -207,19 +202,60 @@ In addition to the standard Lucene query above, you can also query multiple fiel orientdb> SELECT FROM Class WHERE [prop1, prop2] LUCENE "query" -In this case, if the word `query` is a plain string, the engine parses the query using [MultiFieldQueryParser](http://lucene.apache.org/core/4_7_0/queryparser/org/apache/lucene/queryparser/classic/MultiFieldQueryParser.html) on each indexed field. +In this case, if the word `query` is a plain string, the engine parses the query using [MultiFieldQueryParser](http://lucene.apache.org/core/6_3_0/queryparser/org/apache/lucene/queryparser/classic/MultiFieldQueryParser.html) on each indexed field. To execute a more complex query on each field, surround your query with parentheses, which causes the query to address specific fields.
-orientdb> SELECT FROM CLass WHERE [prop1, prop2] LUCENE "(prop1:foo AND prop2:bar)"
+orientdb> SELECT FROM Article WHERE [content, author] LUCENE "(content:graph AND author:john)"
+
+ +Here, the engine parses the query using the [QueryParser](http://lucene.apache.org/core/6_3_0/queryparser/org/apache/lucene/queryparser/classic/QueryParser.html). + +### Numeric and date range queries + +If the index is defined over a numeric field (INTEGER, LONG, DOUBLE) or a date field (DATE, DATETIME), the engine supports [range queries](http://lucene.apache.org/core/6_3_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Range_Searches). +Suppose you have a `City` class with a multi-field Lucene index defined: + +
+orientdb> 
+CREATE CLASS City EXTENDS V
+CREATE PROPERTY City.name STRING
+CREATE PROPERTY City.size INTEGER
+CREATE INDEX City.name ON City(name,size) FULLTEXT ENGINE LUCENE
+
 
-Here, hte engine parses the query using the [QueryParser](http://lucene.apache.org/core/4_7_0/queryparser/org/apache/lucene/queryparser/classic/QueryParser.html) +Then query using ranges: + +
+orientdb> 
+SELECT FROM City WHERE [name,size] LUCENE 'name:cas* AND size:[15000 TO 20000]'
+
+
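When the range bounds come from application code, the Lucene query string from the example above can be assembled programmatically. The following is a minimal sketch, assuming the field names `name` and `size` from the `City` example. `LuceneRangeQuerySketch` and its methods are hypothetical helpers, not an OrientDB or Lucene API, and they do not escape Lucene special characters in the input.

```java
// Hypothetical helper that only builds query strings for the LUCENE operator.
final class LuceneRangeQuerySketch {

    // Inclusive numeric range clause, e.g. size:[15000 TO 20000].
    static String inclusiveRange(String field, long from, long to) {
        if (from > to) {
            throw new IllegalArgumentException("from must not exceed to");
        }
        return field + ":[" + from + " TO " + to + "]";
    }

    // Prefix (wildcard) clause, e.g. name:cas*.
    static String prefix(String field, String prefix) {
        return field + ":" + prefix + "*";
    }

    // Joins clauses so that all of them are required.
    static String and(String... clauses) {
        return String.join(" AND ", clauses);
    }

    public static void main(String[] args) {
        String lucene = and(prefix("name", "cas"), inclusiveRange("size", 15000, 20000));
        // Embed the Lucene query in the SQL statement shown in the example.
        System.out.println("SELECT FROM City WHERE [name,size] LUCENE '" + lucene + "'");
    }
}
```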
+ +Ranges can be applied to DATE/DATETIME fields as well. Create a Lucene index over a property: + +
+orientdb> 
+CREATE CLASS Article EXTENDS V
+CREATE PROPERTY Article.createdAt DATETIME
+CREATE INDEX Article.createdAt ON Article(createdAt) FULLTEXT ENGINE LUCENE
+
+
+ +Then query to retrieve only the articles published in a given time range: + +
+orientdb> 
+SELECT FROM Article WHERE createdAt LUCENE '[201612221000 TO 201612221100]'
+
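The range bounds in the query above look like timestamps written as `yyyyMMddHHmm`. The sketch below shows one way to produce such bounds from `java.time` values. The pattern is an assumption inferred from the sample literal `201612221000`, and other precisions may also be accepted by the engine. `LuceneDateRangeSketch` is a hypothetical helper, not part of the OrientDB API.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper that formats a DATETIME range for the LUCENE operator.
final class LuceneDateRangeSketch {

    // Assumed format: minute precision, inferred from the value 201612221000.
    private static final DateTimeFormatter MINUTE_PRECISION =
            DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    // Inclusive range clause, e.g. [201612221000 TO 201612221100].
    static String range(LocalDateTime from, LocalDateTime to) {
        if (to.isBefore(from)) {
            throw new IllegalArgumentException("to must not be before from");
        }
        return "[" + from.format(MINUTE_PRECISION) + " TO " + to.format(MINUTE_PRECISION) + "]";
    }

    // Builds the SQL statement from the Article example with the given range.
    static String selectArticles(LocalDateTime from, LocalDateTime to) {
        return "SELECT FROM Article WHERE createdAt LUCENE '" + range(from, to) + "'";
    }

    public static void main(String[] args) {
        LocalDateTime from = LocalDateTime.of(2016, 12, 22, 10, 0);
        LocalDateTime to = LocalDateTime.of(2016, 12, 22, 11, 0);
        System.out.println(selectArticles(from, to));
    }
}
```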
+ + + ### Retrieve the Score -When the lucene index is used in a query, the results set carries a context variable for each record rappresenting the score. +When the Lucene index is used in a query, the result set carries a context variable for each record representing the score. To display the score add `$score` in projections. ``` @@ -228,7 +264,7 @@ SELECT *,$score FROM V WHERE name LUCENE "test*" ## Creating a Manual Lucene Index -Beginning with version 2.1, the Lucene Engine supports index creation without the need for a class. +The Lucene Engine supports index creation without the need for a class. **Syntax**: