NullPointerException at com.orientechnologies.orient.core.index.sbtree.local.OSBTree.findBucket (OSBTree.java:1754) #7053

Closed
4 tasks
jamieb22 opened this issue Jan 3, 2017 · 19 comments

@jamieb22

jamieb22 commented Jan 3, 2017

OrientDB Version, operating system, or hardware.

v2.2.14

Operating System

  • [x] Linux
  • MacOSX
  • Windows
  • Other Unix
  • Other, name?

Expected behavior and actual behavior

We do not expect to receive NullPointerExceptions.

017-01-03 13:04:38 c.s.a.te [ERROR] null
java.lang.NullPointerException: null
at com.orientechnologies.orient.core.index.sbtree.local.OSBTree.findBucket(OSBTree.java:1754)
at com.orientechnologies.orient.core.index.sbtree.local.OSBTree.get(OSBTree.java:200)
at com.orientechnologies.orient.core.index.engine.OSBTreeIndexEngine.get(OSBTreeIndexEngine.java:128)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.doGetIndexValue(OAbstractPaginatedStorage.java:1777)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.getIndexValue(OAbstractPaginatedStorage.java:1766)
at com.orientechnologies.orient.core.index.OIndexOneValue.get(OIndexOneValue.java:56)
at com.orientechnologies.orient.core.index.OIndexOneValue.get(OIndexOneValue.java:40)
at com.orientechnologies.orient.core.index.OIndexAbstractDelegate.get(OIndexAbstractDelegate.java:58)
at com.orientechnologies.orient.core.index.OIndexTxAwareOneValue.get(OIndexTxAwareOneValue.java:262)
at com.orientechnologies.orient.core.index.OIndexTxAwareOneValue.get(OIndexTxAwareOneValue.java:40)
at com.tinkerpop.blueprints.impls.orient.OrientBaseGraph.getVertices(OrientBaseGraph.java:813)
..
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

Steps to reproduce the problem

@andrii0lomakin
Member

@jamieb22 it means that either the tree is broken because of a bug in the tree algorithm, or the database crashed and was then not restored correctly. Could you tell us whether the database was shut down incorrectly? If it was not, could you send me the database indexes so that I can analyze them?

@jamieb22
Author

jamieb22 commented Jan 4, 2017

It is possible that it wasn't shut down correctly, though this instance is running on one of our cloud servers. We have a policy of not using kill -9 unless there is no other choice. Because the instance runs in the cloud, the machine also never experiences a sudden loss of power. I'll try to get the db. Andrey, I think you know this already, but it needs to be stated: a general problem with OrientDB continues to be that the database is brittle, and over time information is lost. We see this behavior on pretty much all installations.

@andrii0lomakin
Member

andrii0lomakin commented Jan 4, 2017

@jamieb22 do you mean that information is lost after an incorrect shutdown? We never claimed that nothing would be lost: the last operations will always be lost after an incorrect shutdown. We aim to preserve the integrity of the data, not to save 100% of it after a hard kill; no database can guarantee that no information will be lost. Did you log when and why the hard kill happened?

@jamieb22
Author

jamieb22 commented Jan 4, 2017

Andrey, I never said that it was a hard kill. I said it was a possibility, but unlikely. Is the OrientDB index affected? Is there a way to get OrientDB to reindex? Can I simply delete certain index files?

@jamieb22
Author

jamieb22 commented Jan 4, 2017

FYI: Our product is using MVStore heavily for queuing purposes and it never gets corrupted. Yet, we see Orient DB corruption on a regular basis. Both databases are attached to the same shutdown routines.

@jamieb22
Author

jamieb22 commented Jan 4, 2017

Andrey, on a more constructive note: there are a huge number of WAL Closer Task threads. Does OrientDB run a fixed number of WAL Closer Tasks per database? In a scenario with a large number of databases, this would result in a large number of threads. If you are not doing so already, wouldn't it be more appropriate to use a common thread pool across multiple OrientDB databases?

#1669 daemon prio=5 os_prio=0 tid=0x00007fc2d98c9800 nid=0x4a94 waiting on condition [0x00007fbf713be000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000648589a80> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
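
The common thread pool suggested in this comment could look roughly like the sketch below. It is only an illustration built on `java.util.concurrent`; the class `SharedMaintenancePool` and its methods are hypothetical names and are not part of OrientDB. Each database registers its periodic task with one shared scheduler instead of starting its own thread.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: none of these names come from OrientDB.
public class SharedMaintenancePool {
    // One small daemon pool shared by every open database in the JVM.
    private final ScheduledExecutorService pool;

    public SharedMaintenancePool(int threads) {
        AtomicInteger counter = new AtomicInteger();
        this.pool = Executors.newScheduledThreadPool(threads, r -> {
            Thread t = new Thread(r, "shared-maintenance-" + counter.incrementAndGet());
            t.setDaemon(true);
            return t;
        });
    }

    // A database registers its periodic task here instead of owning a thread.
    public ScheduledFuture<?> register(Runnable task, long periodMillis) {
        return pool.scheduleWithFixedDelay(task, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Stops accepting work and waits briefly for running tasks to finish.
    public void shutdown() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        SharedMaintenancePool shared = new SharedMaintenancePool(2);
        AtomicInteger runs = new AtomicInteger();
        // 100 simulated databases, still only 2 maintenance threads.
        for (int db = 0; db < 100; db++) {
            shared.register(runs::incrementAndGet, 10);
        }
        Thread.sleep(200);
        shared.shutdown();
        System.out.println("task runs: " + runs.get());
    }
}
```

With this shape the thread count stays fixed no matter how many databases are open, at the cost of tasks queuing behind each other when one of them runs long.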

@andrii0lomakin
Member

@jamieb22
> There are a huge number of WAL Closer Task

Since 2.2.15 we have only one WAL Closer task.

@jamieb22
Author

jamieb22 commented Jan 4, 2017

I don't see 2.2.15. The latest version is 2.2.14: http://orientdb.com/download/

@andrii0lomakin
Member

Yes, I mean that the upcoming 2.2.15 version has only one such task.

@jamieb22
Author

jamieb22 commented Jan 4, 2017

Thank you. Is there a way to fix this Null Pointer on the Index? Can I delete index files?

@jamieb22
Author

jamieb22 commented Jan 4, 2017

FYI: I believe we have a similar problem with the Write Cache Flush Task. On systems with many databases, there are a huge number of Write Cache Flush threads. This is not so scalable.

"OrientDB Write Cache Flush Task" #5445 daemon prio=10 os_prio=0 tid=0x00007fb05868f800 nid=0x51be runnable [0x00007fad7619f000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000064b71bd10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

@jamieb22
Author

jamieb22 commented Jan 4, 2017

[Screenshot: 2017-01-04 13_40_20-tomcat - yourkit java profiler 2016 02-b43 - 64-bit]

The above illustrates the point. On one machine, we have around 100 or so Write Cache Flush threads, and none of them are working particularly hard. It seems a waste of resources to have so many threads.
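
For anyone who wants to check the same thing without a profiler, a rough way to count live threads by name from inside the JVM is sketched below. The class `ThreadNameCensus` is a hypothetical helper, not an OrientDB or YourKit API; it relies only on `Thread.getAllStackTraces()` from the JDK and strips a trailing numeric suffix so that numbered pool threads are grouped together.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only.
public class ThreadNameCensus {
    // Groups threads by name, ignoring a trailing "-12" or " #12" style suffix.
    public static Map<String, Integer> countByName(Iterable<Thread> threads) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : threads) {
            String key = t.getName().replaceAll("[-#\\s]*\\d+$", "");
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints one line per distinct thread name with the number of live threads.
        countByName(Thread.getAllStackTraces().keySet())
            .forEach((name, n) -> System.out.println(n + "\t" + name));
    }
}
```

Run inside the affected application (for example from a diagnostic endpoint), this would show at a glance whether one task name accounts for most of the threads.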

@andrii0lomakin
Member

That is the WAL Flush Task, not the Write Cache. It is also fixed in version 2.2.15.

@andrii0lomakin
Member

Regarding the NPE, @taburet will look into it. You can delete the index and rebuild it, and I am sure that will fix the NPE itself.

@taburet
Contributor

taburet commented Jan 9, 2017

Hi @jamieb22! It's hard to tell what is wrong with the database at this point. I'm considering the following scenarios:

  1. The database, or at least the indexes, was corrupted because of OOMs (Very high memory consumption #6922). This is the most probable scenario. The easiest fix is to rebuild the indexes and hope the actual data is not corrupted. OOMs in general are a pain, since once you catch an OOM there is nothing you can do about it. OrientDB 3.x will bring a new query processing engine with a much lower memory footprint.

  2. There is a very rare problem with WAL "head" cutting in 2.2.x, which we are fixing now. But it's very unlikely that your data was affected by this problem.

  3. There still may be some hidden bug in binary data manipulation while tracking transaction changes. That is pretty unlikely too.

Please provide the sample of a corrupted database, if that is possible. I will try to analyse it to recover more information about the issue.

@andrii0lomakin
Member

@jamieb22 we found two issues that are, with high probability, the root causes of your problem. Both will be fixed in the upcoming 2.2.15, which I have asked to be released ASAP.

@andrii0lomakin
Member

@jamieb22 the fix will be in 2.2.15, which is scheduled for next week. You use Lucene, which uses MMapDirectory, which means Lucene data files consume the whole OS page buffer. As a result, there was a risk in previous versions that pages were written only partially; this bug is now fixed. I suggest you export and re-import the database once you have updated to the new version.
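
To make the "pages written only partially" failure mode concrete, here is a minimal sketch of how a storage layer can detect such a torn page by storing a checksum in the page header. This is a generic technique shown under assumptions: the class `PageChecksum`, the 4096-byte page size, and the header layout are all hypothetical and do not describe OrientDB's actual page format or its fix.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical page layout: 8-byte CRC32 header followed by the payload.
public class PageChecksum {
    public static final int PAGE_SIZE = 4096;   // assumed page size
    public static final int HEADER = Long.BYTES; // checksum lives in the first 8 bytes

    // CRC32 over everything after the header.
    static long crcOfPayload(byte[] page) {
        CRC32 crc = new CRC32();
        crc.update(page, HEADER, page.length - HEADER);
        return crc.getValue();
    }

    // Builds a full page: zero-padded payload plus its checksum in the header.
    public static byte[] seal(byte[] payload) {
        if (payload.length > PAGE_SIZE - HEADER) {
            throw new IllegalArgumentException("payload too large");
        }
        byte[] page = new byte[PAGE_SIZE];
        System.arraycopy(payload, 0, page, HEADER, payload.length);
        ByteBuffer.wrap(page).putLong(0, crcOfPayload(page));
        return page;
    }

    // A page is intact only if the stored checksum matches the payload.
    public static boolean isIntact(byte[] page) {
        return page.length == PAGE_SIZE
            && ByteBuffer.wrap(page).getLong(0) == crcOfPayload(page);
    }

    public static void main(String[] args) {
        byte[] oldData = new byte[PAGE_SIZE - HEADER];
        byte[] newData = new byte[PAGE_SIZE - HEADER];
        Arrays.fill(oldData, (byte) 1);
        Arrays.fill(newData, (byte) 2);
        byte[] oldPage = seal(oldData);
        byte[] newPage = seal(newData);
        // Simulate a torn write: only the first half of the new page reached disk.
        byte[] torn = oldPage.clone();
        System.arraycopy(newPage, 0, torn, 0, PAGE_SIZE / 2);
        System.out.println("new page intact:  " + isIntact(newPage));
        System.out.println("torn page intact: " + isIntact(torn));
    }
}
```

A check like this only detects a partial write after the fact; repairing the data still requires something like a rebuild or an export and re-import, as suggested above.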

@jamieb22
Author

jamieb22 commented Jan 13, 2017

Andrey, I tried out your patch. I still see the same error. I assume it cannot proceed because the database is already corrupted. Is this correct?

2017-01-13 17:08:30 c.s.a.s.e.ChainedException [ERROR] null
java.lang.NullPointerException: null
        at com.orientechnologies.orient.core.index.sbtree.local.OSBTree.findBucket(OSBTree.java:1765)
        at com.orientechnologies.orient.core.index.sbtree.local.OSBTree.get(OSBTree.java:200)
        at com.orientechnologies.orient.core.index.engine.OSBTreeIndexEngine.get(OSBTreeIndexEngine.java:128)
        at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.doGetIndexValue(OAbstractPaginatedStorage.java:1779)
        at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.getIndexValue(OAbstractPaginatedStorage.java:1768)
        at com.orientechnologies.orient.core.index.OIndexOneValue.get(OIndexOneValue.java:56)
        at com.orientechnologies.orient.core.index.OIndexOneValue.get(OIndexOneValue.java:40)
        at com.orientechnologies.orient.core.index.OIndexAbstractDelegate.get(OIndexAbstractDelegate.java:58)
        at com.orientechnologies.orient.core.index.OIndexTxAwareOneValue.get(OIndexTxAwareOneValue.java:262)
        at com.orientechnologies.orient.core.index.OIndexTxAwareOneValue.get(OIndexTxAwareOneValue.java:40)
        at com.tinkerpop.blueprints.impls.orient.OrientBaseGraph.getVertices(OrientBaseGraph.java:835)
        at 
:197)

@andrii0lomakin
Member

@jamieb22 could you do what I suggested? This fix prevents the problem, but if the data is already broken it obviously will not help. I also explained in the previous message why it happened:
"As a result, there was a risk in previous versions that pages were written only partially; this bug is now fixed. I suggest you export and re-import the database once you have updated to the new version."

7 participants