
select expand(rid) from index:collection.field where key lucene limit N - work slow #7513

Closed
apapacy opened this issue Jun 29, 2017 · 11 comments

@apapacy

apapacy commented Jun 29, 2017

OrientDB Version: 2.2.22

Java Version: openjdk version "1.8.0_131"

OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-0ubuntu1.16.04.2-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

OS: ubuntu 16.04 lts

Expected behavior

select expand(rid) from index:collection.field where key lucene 'Харько~0.2' limit 3
should work fine

Actual behavior

It doesn't work. The OrientDB process becomes corrupted and the database is no longer accessible. The process has to be killed and restarted with start.sh.

Steps to reproduce

Insert more than 1,000,000 documents containing the same "word" into a class, then run select ... where key lucene 'word~0.3' limit 2

select expand(rid) from index:collection.field where key lucene 'Харько~0.2' limit 3 - does not work

@robfrank
Contributor

How is the index created? Are you using a Russian analyzer or the standard one?
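For reference, an index with a custom analyzer is created like this (class and field names here are placeholders, see http://orientdb.com/docs/last/Full-Text-Index.html):

```sql
-- Sketch: full-text Lucene index using the Russian analyzer
-- instead of the default StandardAnalyzer
CREATE INDEX collection.field ON collection (field)
  FULLTEXT ENGINE LUCENE
  METADATA {"analyzer": "org.apache.lucene.analysis.ru.RussianAnalyzer"}
```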

robfrank added a commit that referenced this issue Jun 30, 2017
robfrank added a commit that referenced this issue Jun 30, 2017
@robfrank
Contributor

I'm unable to reproduce the problem. The commits I added are a small SQL script that creates a sample dataset, plus a test. Tested over a remote connection too.
Could you provide more sample data and the command used to create the index?

@robfrank
Contributor

Documentation about lucene indexes:

http://orientdb.com/docs/last/Full-Text-Index.html

You will find there how to configure custom analyzers for fields.

Without sample dataset and command used to create the index, I can't do more.

@apapacy
Author

apapacy commented Jun 30, 2017

I'll do another test with an analyzer.

My sample data is 4,500,000 documents: {message: 'Харьков - 1'}, {message: 'Харьков - 2'}, ...
My query: select expand(rid) from index:test.message key 'Харько~0.3' limit 2.

Maybe "limit 2" is not working the way I thought, and all the data is being processed?

@robfrank
Contributor

robfrank commented Jun 30, 2017

Why not simply use select from test where lucene 'Харько~0.3' limit 2 ?

Moreover, it seems that your "message" fields contain a single term; why are you using the fuzzy operator '~'?

The limit is processed AFTER the search on the index; the Lucene index only handles the Lucene query, not the SQL part. So the index could find 10,000 docs, and then only the first 2 will be returned.
This is reasonable, because the full-text query could be part of a more complex query, and the limit should apply to the global result, not only to the index one.
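Concretely, with the dataset from this issue the two query shapes look like this (a sketch; the field and index names are the ones used above):

```sql
-- Index-level query: Lucene first collects EVERY matching rid,
-- and only then does the SQL layer apply LIMIT to that full collection.
SELECT expand(rid) FROM index:test.message WHERE key LUCENE 'Харько~0.3' LIMIT 2

-- Class-level query: same limit semantics, but there is no
-- intermediate rid collection for expand() to materialize.
SELECT FROM test WHERE message LUCENE 'Харько~0.3' LIMIT 2
```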

@apapacy
Author

apapacy commented Jun 30, 2017

#7193

If you create 2 indexes on the same column (a fulltext Lucene one and some other kind), the select does not use the right index but the first one created (Hobson's choice).

The same select where lucene 'Харько~0.3' limit 2 works fine,
but only if a single index is created on the field.

So, my aim is to use OrientDB with its great Lucene search in production without any problems.

@robfrank
Contributor

Thanks for the explanation about the API. How many documents are extracted by the query without the limit clause? I tested only the behavior of the query, but I guess the problem is the size of the result set in conjunction with expand. Could you restrict the query to reduce the number of documents retrieved?

@apapacy
Author

apapacy commented Jun 30, 2017

My "limit" is 2.

My create statement
create class russian
create property russian.message string

create index russian.message on russian(message)
fulltext engine lucene metadata {
"analyzer": "org.apache.lucene.analysis.ru.RussianAnalyzer"
}

My insertion

odb.insert().into('russian').set({message: `Харьков - ${k} - ${i}`}).one().catch(function(err){console.log(err);});
k = 1...1000
i = 1...1000

So there are 1,000,000 documents of class "russian", and all of them match the where condition.

select * from russian where message lucene 'Харько~0.1' limit 2 - works fine
select rid from index:russian.message where key lucene 'Харько~0.1' limit 2 - works fine

But this select seems too heavy (about 10 s):
select expand(rid) from index:russian.message where key lucene 'Харько~0.1' limit 2

(screenshot omitted)

If the document count is not 1,000,000 but 4,500,000, the process becomes corrupted.

@apapacy apapacy changed the title select expand(rid) from index:collection.field where key lucene - corrupt process select expand(rid) from index:collection.field where key lucene limit N - work slow Jun 30, 2017
@luigidellaquila
Member

Hi @apapacy

The problem seems to be in the SQL query executor, not in the Lucene index plugin.
I'm fixing it now

Thanks

Luigi

@luigidellaquila
Member

Hi @apapacy

I just pushed a fix on branch 2.2.x, the fix will be available in V 2.2.23

Thanks

Luigi

@apapacy
Author

apapacy commented Jul 2, 2017

Thanks.


6 participants