Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question regarding date-range queries with composite index #6795

Closed
1 task done
rdelangh opened this issue Oct 10, 2016 · 10 comments
Closed
1 task done

Question regarding date-range queries with composite index #6795

rdelangh opened this issue Oct 10, 2016 · 10 comments

Comments

@rdelangh
Copy link

OrientDB Version, operating system, or hardware.

  • v2.2.10

Operating System

  • Linux

I have a class 'myclass' with many properties, among which a DATETIME property named 'start_date'. I made a UNIQUE index
CREATE INDEX myindex ON myclass(start_date, filename, recordnumber) UNIQUE

If I want to search on a range of 'start_date' and a particular filename value. I would expect that the following query returns results in a reasonable timeframe:

SELECT max(someotherproperty) FROM myclass WHERE start_date BETWEEN '2016-10-07 10:00:00' AND '2016-10-07 12:00:00' AND filename = 'foobar'

However after 3 hours of waiting still no result returned, the query is still busy... There is approx 5M records per day in this class, and about 600M records in the entire class. So I suspect strongly that this query is doing a full scan of the class, not using the index.

I read the documentation on indexes and composite indexes, range queries etc. But cannot figure out what's happening (or not) with the above query. Asking an explain plan is of no use, I have the impression that this explain plan is determined from how the query has ran, after it did ran completely. Which takes hours (or more) to complete...

@luigidellaquila
Copy link
Member

Hi @rdelangh

did you try to do the query with LIMIT 1 and see how long it takes to return?

I think the easiest thing to do is to create a small dataset with your data model and try an EXPLAIN on that to make sure that the query is using the index

Thanks

Luigi

@rdelangh
Copy link
Author

I have another class "mds_csv_cf12" with many fewer records.
It has a unique, SBTREE index "idx_mds_csv_cf12_0" on the properties ["START_DATETIME","FILENAME","RECORD_NUMBER"]

I run a
SELECT * FROM mds_csv_cf12
and this shows me a list of 20 records, among which I see some that have "START_DATETIME" in the day June-22 2016.

However, if I run these queries

select count(*) FROM INDEX:idx_mds_csv_cf12_0 WHERE key between '2016-06-22 00:00:00' AND '2016-06-22 23:00:00'

or

select count(*) FROM INDEX:idx_mds_csv_cf12_0 WHERE key between ['2016-06-22 00:00:00'] AND ['2016-06-22 23:00:00']

the result comes back in a split second but is each time 0 (zero), however there is definately such records in the class...

The "explain select ..." statement shows me no info at all about using index :

{
"result": [
{
"@type": "d",
"@Version": 0,
"projectionElapsed": 0,
"groupByElapsed": 1,
"user": "#5:0",
"elapsed": 1.714646,
"resultType": "collection",
"resultSize": 1,
"@fieldTypes": "projectionElapsed=l,groupByElapsed=l,user=x,elapsed=f"
}
],
"notification": "Query executed in 0.11 sec. Returned 1 record(s)"
}
No comprendo ... :-(

@rdelangh
Copy link
Author

@luigidellaquila , @lvca
hi Luigi, I think I found the issue: the ODB syntax parser does not recognize a string such as '2016-06-22 00:00:00' as something that's a valid DATETIME value !

If I change the criteria with strings such as '2016-06-22 00:00:00.000' (that is including the milliseconds part), then the index is used, and the query runs very fast...

-> can this be enhanced in the parser, that it recognizes DATETIME values better from partial strings like '2016-06-22 00:00:00' or even '2016-06-22' ?

@luigidellaquila
Copy link
Member

Hi @rdelangh

Thanks, good catch!
I'll check it ASAP

Thanks

Luigi

@rdelangh
Copy link
Author

rdelangh commented Nov 2, 2016

@luigidellaquila
hi Luigi, any update on this, please ?

1 similar comment
@rdelangh
Copy link
Author

@luigidellaquila
hi Luigi, any update on this, please ?

@rdelangh
Copy link
Author

hello ODB gurus, is here any update possible please?

@luigidellaquila
Copy link
Member

Hi @rdelangh

So sorry, let me check it now...

Luigi

@luigidellaquila
Copy link
Member

Hi @rdelangh

I just tried it on latest 2.2.17 and it seems to correctly use the index... maybe it's already fixed?
Do you have a chance to try it on your dataset and see if it actually is?

Thanks

Luigi

@luigidellaquila
Copy link
Member

Hi @rdelangh

I'm closing this, it should be solved long time ago.

Please feel free to re-open if you see any related problems

Thanks

Luigi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

2 participants