Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failed to create a vertex and caught OSBTreeBonsaiLocalException #7584

Closed
mexeniz opened this issue Jul 25, 2017 · 19 comments
Closed

Failed to create a vertex and caught OSBTreeBonsaiLocalException #7584

mexeniz opened this issue Jul 25, 2017 · 19 comments
Labels

Comments

@mexeniz
Copy link

mexeniz commented Jul 25, 2017

OrientDB Version: v.2.2.10

Java Version: 1.8.0_112

OS: FreeBSD 11.0-RELEASE-p1

Hi, I ran OrientDB server on FreeBSD 11.
I have found a problem about running sql to create vertices via binary protocol.
My query is to create a new vertex and update values of its predecessors.
When a server is under heavy load, it randomly fails to process some queries.

My query

let file = create vertex files set name=‘file1.txt’, size=10
create edge hardlink from $file to #32:12951 set type='file’
update (select from (traverse out('HardLink','ShareLink') from $file) where @Class='folders') increment size = 10
update (select from (traverse out('HardLink','ShareLink', 'MemberOf') from $file) where @Class='users' OR @Class='domains') increment usage = 10
commit retry 100

Here is error log from orientdb.err.

Error Log

2017-07-24 17:45:09:778 INFO {db=FTO-Structure-2.0} null possible previous close - Flush [OChannelBinaryServer]
2017-07-24 17:45:09:778 INFO {db=FTO-Structure-2.0} /10.10.9.44:59678 - Reading byte (1 byte)... [OChannelBinaryServer]
2017-07-24 17:45:09:921 INFO {db=FTO-Structure-2.0} /10.10.9.44:59678 - Read byte: 41 [OChannelBinaryServer]
2017-07-24 17:45:09:921 INFO {db=FTO-Structure-2.0} /10.10.9.44:59678 - Reading int (4 bytes)... [OChannelBinaryServer]
2017-07-24 17:45:09:921 INFO {db=FTO-Structure-2.0} /10.10.9.44:59678 - Read int: 12 [OChannelBinaryServer]
2017-07-24 17:45:09:921 INFO {db=FTO-Structure-2.0} /10.10.9.44:59678 - Reading byte (1 byte)... [OChannelBinaryServer]
2017-07-24 17:45:10:020 INFO {db=FTO-Structure-2.0} /10.10.9.44:59678 - Read byte: 115 [OChannelBinaryServer]
2017-07-24 17:45:10:020 INFO {db=FTO-Structure-2.0} /10.10.9.44:59678 - Reading chunk of bytes. Reading chunk length as int (4 bytes)... [OChannelBinaryServer]
2017-07-24 17:45:10:020 INFO {db=FTO-Structure-2.0} /10.10.9.44:59678 - Read chunk lenght: 762 [OChannelBinaryServer]
2017-07-24 17:45:10:021 INFO {db=FTO-Structure-2.0} /10.10.9.44:59678 - Reading 762 bytes... [OChannelBinaryServer]
2017-07-24 17:45:10:021 INFO {db=FTO-Structure-2.0} /10.10.9.44:59678 - Read 762 bytes: ����s����SQL����begin
let file = create vertex files set name=‘file1.txt’
create edge hardlink from $file to #32:12951 set type='file’, size=10
update (select from (traverse out('HardLink','ShareLink') from $file) where @Class='folders') increment size = 10
update (select from (traverse out('HardLink','ShareLink', 'MemberOf') from $file) where @Class='users' OR @Class='domains') increment usage = 10
commit retry 100
return $file�� [OChannelBinaryServer]
2017-07-24 17:45:10:024 INFO {db=FTO-Structure-2.0} null possible previous close - Flush [OChannelBinaryServer]
2017-07-24 17:45:10:024 INFO {db=FTO-Structure-2.0} /10.10.9.44:59678 - Writing byte (1 byte): 1 [OChannelBinaryServer]
2017-07-24 17:45:10:024 INFO {db=FTO-Structure-2.0} /10.10.9.44:59678 - Writing int (4 bytes): 12 [OChannelBinaryServer]

2017-07-24 17:45:10:024 INFO {db=FTO-Structure-2.0} /10.10.9.44:59678 - Writing byte (1 byte): 1 [OChannelBinaryServer]
2017-07-24 17:45:10:024 INFO {db=FTO-Structure-2.0} /10.10.9.44:59678 - Writing string (4+71=75 bytes): com.orientechnologies.orient.core.exception.OSBTreeBonsaiLocalException [OChannelBinaryServer]
2017-07-24 17:45:10:024 INFO {db=FTO-Structure-2.0} /10.10.9.44:59678 - Writing string (4+112=116 bytes): Error creation of sbtree with namecollections_32
DB name="FTO-Structure-2.0"
Component Name="collections_32" [OChannelBinaryServer]
2017-07-24 17:45:10:025 INFO {db=FTO-Structure-2.0} /10.10.9.44:59678 - Writing byte (1 byte): 1 [OChannelBinaryServer]

2017-07-24 17:45:10:025 INFO {db=FTO-Structure-2.0} /10.10.9.44:59678 - Writing string (4+30=34 bytes): java.lang.NullPointerException [OChannelBinaryServer]
2017-07-24 17:45:10:025 INFO {db=FTO-Structure-2.0} /10.10.9.44:59678 - Writing string (4+0=4 bytes): null [OChannelBinaryServer]
2017-07-24 17:45:10:025 INFO {db=FTO-Structure-2.0} /10.10.9.44:59678 - Writing byte (1 byte): 0 [OChannelBinaryServer]

Could you tell me what is OSBTreeBonsaiLocalException and how to prevent this error?
Thank you.

@andrii0lomakin
Copy link
Member

@mexeniz there are a lot of reasons why it can happen. Do you have a full stack trace or could you send us database and test case to reproduce the issue?

@andrii0lomakin
Copy link
Member

@mexeniz as another possible tool for detection of the issue I can provide you new build which you can use, to identify the root cause of the issue.

@mexeniz
Copy link
Author

mexeniz commented Jul 26, 2017

@Laa Sorry, My database is confidential so I can't send it and this problem is completely random. Anyway, could you tell how me how to get a full stack trace? Also, I would like to try your new build. Thank you.

@andrii0lomakin
Copy link
Member

Hi, @mexeniz. I see I think the best way to understand the root cause of your exception is to upgrade to latest 2.2.24 version. It contains much more detailed logging of exceptions which are thrown by storage. Also, it provides some data verification tools which may be handy to identify the reason of your problem. WDYT ?

@mexeniz
Copy link
Author

mexeniz commented Jul 26, 2017

@Laa Alright, I will upgrade it to 2.2.24. Should I edit any orientdb config to get more detailed log?

@mexeniz
Copy link
Author

mexeniz commented Jul 26, 2017

I have upgraded orientdb to 2.2.24 and set log level to ALL.
Still found errors. Error log is in an attached file.
error_log.txt

@andrii0lomakin
Copy link
Member

@mexeniz I suppose we will need to add more logging to identify the root cause of the issue. Right now I can not do it because I am busy with another fix, but tomorrow I will be able to provide you new distribution with more logging which will help us to solve your issue.

@andrii0lomakin
Copy link
Member

@mexeniz sorry for the delay, we had an urgent issue. I will create a build for you with additional logging at first half of the day tomorrow. I very appreciate your help, but we had to concentrate on urgent task today so all tasks are delayed a bit.

@andrii0lomakin
Copy link
Member

Hi, @mexeniz , could you run this distribution and send me the output of the log? Probably that is not latest build with logs which I send to you, but it will make our target of fixing issue closer https://drive.google.com/file/d/0B2oZq2xVp841SDU2N0JkNWNTSVk/view?usp=sharing .

@mexeniz
Copy link
Author

mexeniz commented Jul 28, 2017

@Laa Thank you for this distribution. I will have time to test with the new distribution again on next Monday and send you logs ASAP.

@mexeniz
Copy link
Author

mexeniz commented Jul 31, 2017

@Laa Just finished testing with the new distribution. I still found the problem and error log is in attached file below.
error_log2.txt

The same logging configs were used, but log format were likely to be changed a lot. Here are logging configs that I changed from default.

1. orientdb-server-log.properties

handlers = java.util.logging.ConsoleHandler, java.util.logging.FileHandler

.level = ALL
com.orientechnologies.level = ALL
com.orientechnologies.orient.server.distributed.level = INFO

java.util.logging.ConsoleHandler.level = INFO
java.util.logging.ConsoleHandler.formatter = com.orientechnologies.common.log.OAnsiLogFormatter

java.util.logging.FileHandler.level = INFO
java.util.logging.FileHandler.pattern=../log/orient-server.log
java.util.logging.FileHandler.formatter = com.orientechnologies.common.log.OLogFormatter
java.util.logging.FileHandler.limit=10000000
java.util.logging.FileHandler.count=10

2. a part from orientdb-server-config.xml

<properties>
        <!-- DATABASE POOL: size min/max -->
        <entry name="db.pool.min" value="1"/>
        <entry name="db.pool.max" value="50"/>

        <!-- PROFILER: configures the profiler as <seconds-for-snapshot>,<archive-snapshot-size>,<summary-size> -->
        <entry name="profiler.enabled" value="false"/>
        <!-- <entry name="profiler.config" value="30,10,10" /> -->
	<entry value="finest" name="log.console.level" />
	<entry value="finest" name="log.file.level" />
    </properties>

@andrii0lomakin
Copy link
Member

@mexeniz I suppose I found the reason of your issue, prior 2.2.15 version it was an issue, which appears exactly during high load. The problem was that data files could be flushed only partially. Unfortunately damaged data structure can be fixed only by using export/import from to JSON.
So I suggest you do following:

  1. Export database to JSON.
  2. Create a database using 2.2.24 version.
  3. Import database from JSON in a newly created database.
  4. Run the system using 2.2.24 version for a couple of weeks, if an issue will not be reproduced (and I suppose that it will be the case) we can close given bug.

Could you make steps which I wrote to you?

@mexeniz
Copy link
Author

mexeniz commented Aug 1, 2017

@Laa Yes, I could. Thank you for the instructions. I will report to you again if an issue is reproduced.

@mexeniz
Copy link
Author

mexeniz commented Aug 4, 2017

@Laa I have tried to export and import database. There was a problem that a rid of some entries was changed. I have also tried importing with '-preserveClusterIDs=true' but rids were still changed. Then, I investigated the entries in exported database JSON file and found that their rid was the same as one in original database.

Example query
select @rid, username from users where username.toLowerCase() in ["u1@cat.com","u2@cat.com","u3@cat.com","u4@cat.com"]

Result from original
screen shot 2560-08-04 at 11 28 58 am
Result from re-import
screen shot 2560-08-04 at 11 29 14 am

I have 2 questions.

  1. Is this normal for damaged database that rid might be changed?
  2. Is 'preserveClusterIDs' flag actually for preventing rid change?

PS. Load test is in progress.

@andrii0lomakin
Copy link
Member

@mexeniz rids never persevered during export import, does not matter whether DB is damaged or not. Is it a problem for you ? Do you use rids as external links? We typically suggest not doing that.

@andrii0lomakin
Copy link
Member

About your second question, this flag is used to preserve ids of clusters, not rids of records. I mean for example cluster with the name has "cat1" has id = 42, after import, the same cluster will have the same id 42 but rids of records will be changed.

@mexeniz
Copy link
Author

mexeniz commented Aug 4, 2017

@Laa Thank you very much for answers. Non-preserved record id is a problem and I use it as an external link, but it is okay. I would do a little more work on migration script.

@mexeniz
Copy link
Author

mexeniz commented Aug 10, 2017

@iaa I've done many load tests and also tested with OrientDB v. 2.2.18 which we used on production. An issue wasn't reproduced anymore. Thank you very much for your help!

@andrii0lomakin
Copy link
Member

@mexeniz cool but I strongly recommend to update to at least 2.2.20 version. It will prevent possible problems with memory consumption in your case I think there will be no difference to update to 2.2.20 or to 2.2.25 so probably better to update to 2.2.25 at once.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

3 participants