
select expand(rid) from index:collection.field where key lucene limit N - work slow #7513

Closed
apapacy opened this issue Jun 29, 2017 · 11 comments

@apapacy

apapacy commented Jun 29, 2017

OrientDB Version: 2.2.22

Java Version: openjdk version "1.8.0_131"

OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-0ubuntu1.16.04.2-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

OS: ubuntu 16.04 lts

Expected behavior

select expand(rid) from index:collection.field where key lucene 'Харько~0.2' limit 3
should work fine

Actual behavior

It doesn't work. The OrientDB process becomes corrupted and the database is no longer accessible. The process has to be killed and restarted with start.sh.

Steps to reproduce

Insert more than 1,000,000 documents containing the same "word" into a class, then run select ... where key lucene 'word~0.3' limit 2

select expand(rid) from index:collection.field where key lucene 'Харько~0.2' limit 3 - does not work

@robfrank
Contributor

How is the index created? Are you using a Russian analyzer or the standard one?
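For reference, an index with a custom analyzer is created like this (class and field names here are placeholders, see http://orientdb.com/docs/last/Full-Text-Index.html):

```sql
-- Sketch: full-text Lucene index using the Russian analyzer
-- instead of the default StandardAnalyzer
CREATE INDEX collection.field ON collection (field)
  FULLTEXT ENGINE LUCENE
  METADATA {"analyzer": "org.apache.lucene.analysis.ru.RussianAnalyzer"}
```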

robfrank added a commit that referenced this issue Jun 30, 2017
robfrank added a commit that referenced this issue Jun 30, 2017
@robfrank
Contributor

I'm unable to reproduce the problem. The commits I added are a small SQL script that creates a sample dataset, plus a test. Tested over a remote connection too.
Could you provide more sample data and the command used to create the index?

@robfrank
Contributor

Documentation about lucene indexes:

http://orientdb.com/docs/last/Full-Text-Index.html

You will find there how to configure custom analyzers for fields.

Without sample dataset and command used to create the index, I can't do more.

@apapacy
Author

apapacy commented Jun 30, 2017

I'll do another test with an analyzer.

My sample data is 4,500,000 documents: {message: 'Харьков - 1'}, {message: 'Харьков - 2'}, ...
My query: select expand(rid) from index:test.message key 'Харько~0.3' limit 2.

Maybe "limit 2" is not working the way I thought, and all the data is being processed?

@robfrank
Contributor

robfrank commented Jun 30, 2017

Why not simply use select from test where lucene 'Харько~0.3' limit 2 ?

Moreover, it seems that your "message" fields contain a single term; why are you using the fuzzy operator '~'?

The limit is processed AFTER the search on the index; the Lucene index only handles the Lucene query, not the SQL part. So the index could find 10,000 docs, and then only the first 2 will be returned.
This is reasonable, because the full-text query could be part of a more complex query, and the limit should apply to the global result, not only to the index one.
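Concretely, with the dataset from this issue the two query shapes look like this (a sketch; the field and index names are the ones used above):

```sql
-- Index-level query: Lucene first collects EVERY matching rid,
-- and only then does the SQL layer apply LIMIT to that full collection.
SELECT expand(rid) FROM index:test.message WHERE key LUCENE 'Харько~0.3' LIMIT 2

-- Class-level query: same limit semantics, but there is no
-- intermediate rid collection for expand() to materialize.
SELECT FROM test WHERE message LUCENE 'Харько~0.3' LIMIT 2
```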

@apapacy
Author

apapacy commented Jun 30, 2017

#7193

If you create 2 indexes on the same column (a fulltext Lucene one and some other kind), the select does not use the right index but the first one created (Hobson's choice).

The same select where lucene 'Харько~0.3' limit 2 works fine,
but only if a single index is created on the field.

So, my aim is to use OrientDB with its great Lucene search in production without any problems.

@robfrank
Contributor

Thanks for the explanation about the API. How many documents are extracted by the query without the limit clause? I tested only the behavior of the query, but I guess the problem is the size of the result set in conjunction with expand. Could you restrict the query to reduce the number of documents retrieved?

@apapacy
Author

apapacy commented Jun 30, 2017

My "limit" is 2.

My create statement
create class russian
create property russian.message string

create index russian.message on russian(message)
fulltext engine lucene metadata {
"analyzer": "org.apache.lucene.analysis.ru.RussianAnalyzer"
}

My insertion

odb.insert().into('russian').set({message: `Харьков - ${k} - ${i}`}).one().catch(function(err){console.log(err);});
k = 1...1000
i = 1...1000

So there are 1,000,000 documents of class "russian", and all of them match the where condition.

select * from russian where message lucene 'Харько~0.1' limit 2 - works fine
select rid from index:russian.message where key lucene 'Харько~0.1' limit 2 - works fine

But this select seems too heavy (about 10 s):
select expand(rid) from index:russian.message where key lucene 'Харько~0.1' limit 2

(screenshot omitted)

If the document count is not 1,000,000 but 4,500,000, the process becomes corrupted.

@apapacy apapacy changed the title select expand(rid) from index:collection.field where key lucene - corrupt process select expand(rid) from index:collection.field where key lucene limit N - work slow Jun 30, 2017
@luigidellaquila
Member

Hi @apapacy

The problem seems to be in the SQL query executor, not in the Lucene index plugin.
I'm fixing it now

Thanks

Luigi

@luigidellaquila
Member

Hi @apapacy

I just pushed a fix on branch 2.2.x, the fix will be available in V 2.2.23

Thanks

Luigi

@apapacy
Author

apapacy commented Jul 2, 2017

Thanks.


6 participants