
Guide to improving performance based on the use case

Introduction

This guide contains general tips to optimize applications that use OrientDB. Below you can find links to specific guides for each database type used.

General settings

JVM settings

The suggested JVM settings for running an application that uses OrientDB are:

    -server -XX:+AggressiveOpts -XX:CompileThreshold=200

Configuration

OrientDB can be configured in several ways. To see the current configuration, use the console with the config command.

To dump the OrientDB configuration you can set a parameter at JVM launch:

    java -Denvironment.dumpCfgAtStartup=true ...

Or via API at any time:

    OGlobalConfiguration.dumpConfiguration(System.out);

By command line

    java -Dcache.size=10000 -Dstorage.keepOpen=true ...

By server configuration

Put the entries to configure in the <properties> section of the orientdb-server-config.xml file. Example:

      ...
      <properties>
        <entry name="cache.size" value="10000" />
        <entry name="storage.keepOpen" value="true" />
      </properties>
      ...

At run-time

      OGlobalConfiguration.MVRBTREE_NODE_PAGE_SIZE.setValue(2048);

Parameters

To learn more, look at the Java enum OGlobalConfiguration.java.

| Area | Parameter | Default 32bit | Default 64bit | Default Server 32bit | Default Server 64bit | Allowed input | Description | Since |
|------|-----------|---------------|---------------|----------------------|----------------------|---------------|-------------|-------|
| Environment | environment.dumpCfgAtStartup | true | true | true | true | true or false | Dumps the configuration at application startup | |
| Environment | environment.concurrent | true | true | true | true | true or false | Specifies if running in a multi-threaded environment. Setting this to false turns off the internal lock management | |
| Memory | memory.optimizeThreshold | 0.85 | 0.85 | 0.85 | 0.85 | 0.5-0.95 | Threshold of heap memory at which to start optimizing memory usage. Deprecated since 1.0rc7 | |
| Storage | storage.keepOpen | true | true | true | true | true or false | Tells the engine not to close the storage when a database is closed. Storages will be closed when the process shuts down | |
| Storage | storage.record.lockTimeout | 5000 | 5000 | 5000 | 5000 | 0-N | Maximum timeout in milliseconds to lock a shared record | |
| Cache | cache.level1.enabled | true | true | false | false | true or false | Uses the level-1 cache | |
| Cache | cache.level1.size | -1 | -1 | 0 | 0 | -1 - N | Size of the level-1 cache in terms of record entries. -1 means no limit, but when free heap memory is low, cache entries are freed | |
| Cache | cache.level2.enabled | true | true | false | false | true or false | Uses the level-2 cache | |
| Cache | cache.level2.size | -1 | -1 | 0 | 0 | -1 - N | Size of the level-2 cache in terms of record entries. -1 means no limit, but when free heap memory is low, cache entries are freed | |
| Database | db.mvcc | true | true | true | true | true or false | Enables Multi-Version Concurrency Control (MVCC) or not | |
| Database | object.saveOnlyDirty | false | false | false | false | true or false | The Object Database saves only objects bound to dirty records | |
| Database | nonTX.recordUpdate.synch | false | false | false | false | true or false | Executes a synch against the file system at every record operation. This slows down record updates but guarantees reliability on unreliable drives | |
| Transaction | tx.useLog | true | true | true | true | true or false | Transactions use a log file to store temporary data to be rolled back in case of crash | |
| Transaction | tx.log.fileType | classic | classic | classic | classic | 'classic' or 'mmap' | File type to handle transaction logs: mmap or classic | |
| Transaction | tx.log.synch | false | false | false | false | true or false | Executes a synch against the file system for each log entry. This slows down transactions but guarantees transaction reliability on unreliable drives | |
| Transaction | tx.commit.synch | false | false | true | true | true or false | Synchronizes the storage after transaction commit (see [Disable the disk synch](#Disable_the_disk_synch)) | |
| TinkerPop Blueprints | blueprints.graph.txMode | 0 | 0 | 0 | 0 | 0 or 1 | Transaction mode used in the TinkerPop Blueprints implementation. 0 = Automatic (default), 1 = Manual | |
| Index | index.auto.rebuildAfterNotSoftClose | true | true | true | true | | Auto-rebuilds all automatic indexes upon database open when the database wasn't closed properly | 1.3.0 |
| MVRB Tree (index and dictionary) | mvrbtree.lazyUpdates | 20000 | 20000 | 1 | 1 | -1=auto, 0=always lazy until lazySave() is explicitly called by the application, 1=no lazy, commit at each change, >1=commit every X changes | Configures the MVRB Trees (indexes and dictionaries) as buffered or not | |
| MVRB Tree (index and dictionary) | mvrbtree.nodePageSize | 128 | 128 | 128 | 128 | 63-65535 | Page size of each single node. 1,024 means that 1,024 entries can be stored inside a node | |
| MVRB Tree (index and dictionary) | mvrbtree.loadFactor | 0.7f | 0.7f | 0.7f | 0.7f | 0.1-0.9 | HashMap load factor | |
| MVRB Tree (index and dictionary) | mvrbtree.optimizeThreshold | 200000 | 200000 | 200000 | 200000 | 10-N | Auto-optimizes the MVRB Tree every X operations (get, put and remove). -1=auto (default) | |
| MVRB Tree (index and dictionary) | mvrbtree.entryPoints | 16 | 16 | 16 | 16 | 1-200 | Number of entry points to start searching entries | |
| MVRB Tree (index and dictionary) | mvrbtree.optimizeEntryPointsFactor | 1.0f | 1.0f | 1.0f | 1.0f | 0.1-N | Multiplicand factor to apply to the entry-points list (parameter mvrbtree.entrypoints) to determine if optimization is needed | |
| MVRB Tree RIDs (index and dictionary) | mvrbtree.ridBinaryThreshold | 8 | 8 | 8 | 8 | -1 - N | Valid for sets of RIDs. The threshold, as number of entries, above which binary streaming is used instead of classic string streaming. -1 means never use binary streaming | |
| MVRB Tree RIDs (index and dictionary) | mvrbtree.ridNodePageSize | 16 | 16 | 16 | 16 | 4 - N | Page size of each treeset node. 16 means that 16 entries can be stored inside each node | |
| MVRB Tree RIDs (index and dictionary) | mvrbtree.ridNodeSaveMemory | false | false | false | false | true or false | Saves memory by not keeping RIDs in memory but creating them at every access | |
| Lazy Collections | lazyset.workOnStream | true | true | false | false | true or false | Works directly on the streamed buffer to reduce memory footprint and improve performance | |
| File (I/O) | file.lock | false | false | false | false | true or false | Locks the used files so other processes can't modify them | |
| File (I/O) | file.defrag.strategy | 0 | 0 | 0 | 0 | 0,1 | Strategy to recycle free space. 0=recycles the first hole with enough size (default): fast, 1=recycles the best hole: better usage of space but slower | |
| File (I/O) | file.defrag.holeMaxDistance | 32768 (32Kb) | 32768 (32Kb) | 32768 (32Kb) | 32768 (32Kb) | 8K-N | Max distance in bytes between holes to defragment them. Set it to -1 to use a dynamic size. Pay attention that if the database is huge, moving blocks to defragment could be expensive | |
| File (I/O) | file.mmap.useOldManager | false | false | false | false | true or false | Manager used to handle mmap files. true = use the old manager, false = use the new manager | |
| File (I/O) | file.mmap.lockMemory | true | true | true | true | true or false | When using the new mmap manager, specifies whether to prevent memory swapping. true = lock memory, false = don't lock memory (for this parameter to take effect you need the Orient Native OS jar in the classpath) | |
| File (I/O) | file.mmap.strategy | 0 | 0 | 0 | 0 | 0-4 | Strategy to use with memory-mapped files. 0 = use mmap always, 1 = use mmap on writes, and on reads only when the block pool is free, 2 = use mmap on writes, and on reads only when the block is already available, 3 = use mmap only if the block is already available, 4 = never use mmap | |
| File (I/O) | file.mmap.blockSize | 1048576 (1Mb) | 1048576 (1Mb) | 1048576 (1Mb) | 1048576 (1Mb) | 10K-N | Size of the memory-mapped block (this property takes effect only if file.mmap.useOldManager is set to true) | |
| File (I/O) | file.mmap.bufferSize | 8192 (8Kb) | 8192 (8Kb) | 8192 (8Kb) | 8192 (8Kb) | 1K-N | Size of the buffer for direct access to the file through the channel (this property takes effect only if file.mmap.useOldManager is set to true) | |
| File (I/O) | file.mmap.maxMemory | 134217728 (134Mb) | (maxOsMemory - maxProcessHeapMemory) / 2 | 134217728 (134Mb) | (maxOsMemory - maxProcessHeapMemory) / 2 | 100000 - the maximum allowed by the OS | Max memory allocatable by the memory mapping manager. Note that on 32-bit OSes the limit is 2Gb, but it can vary from OS to OS (this property takes effect only if file.mmap.useOldManager is set to true) | |
| File (I/O) | file.mmap.overlapStrategy | 2 | 2 | 2 | 2 | 0-2 | Strategy when a request overlaps in-memory buffers: 0 = use the channel access, 1 = force the in-memory buffer and use the channel access, 2 = always create an overlapped in-memory buffer (default) (this property takes effect only if file.mmap.useOldManager is set to true) | |
| File (I/O) | file.mmap.forceDelay | 500 (0.5sec) | 500 (0.5sec) | 500 (0.5sec) | 500 (0.5sec) | 100-5000 | Delay time in ms to wait for another force flush of the memory-mapped block to disk | |
| File (I/O) | file.mmap.forceRetry | 20 | 20 | 20 | 20 | 0-N | Number of times the memory-mapped block will try to flush to disk | |
| JNA | jna.disable.system.library | true | true | true | true | true or false | Disables using the JNA installed on your system and uses the JNA bundled with the database instead | |
| Networking (I/O) | network.socketBufferSize | 32768 | 32768 | 32768 | 32768 | 8K-N | TCP/IP socket buffer size | |
| Networking (I/O) | network.lockTimeout | 15000 (15secs) | 15000 (15secs) | 15000 (15secs) | 15000 (15secs) | 0-N | Timeout in ms to acquire a lock against a channel, 0=no timeout | |
| Networking (I/O) | network.socketTimeout | 10000 (10secs) | 10000 (10secs) | 10000 (10secs) | 10000 (10secs) | 0-N | TCP/IP socket timeout in ms, 0=no timeout | |
| Networking (I/O) | network.retry | 5 | 5 | 5 | 5 | 0-N | Number of times the client connection retries to connect to the server in case of failure | |
| Networking (I/O) | network.retryDelay | 500 (0.5sec) | 500 (0.5sec) | 500 (0.5sec) | 500 (0.5sec) | 1-N | Number of ms the client waits before reconnecting to the server in case of failure | |
| Networking (I/O) | network.binary.maxLength | 100000 (100Kb) | 100000 (100Kb) | 100000 (100Kb) | 100000 (100Kb) | 1K-N | TCP/IP max content length in bytes of BINARY requests | |
| Networking (I/O) | network.binary.readResponse.maxTime | 30 | 30 | 30 | 30 | 0-N | Maximum time (in seconds) to wait until the response is read; otherwise the response is dropped from the channel | 1.0rc9 |
| Networking (I/O) | network.binary.debug | false | false | false | false | true or false | Debug mode: prints all the incoming data on the binary channel | |
| Networking (I/O) | network.http.maxLength | 100000 (100Kb) | 100000 (100Kb) | 100000 (100Kb) | 100000 (100Kb) | 1000-N | TCP/IP max content length in bytes of HTTP requests | |
| Networking (I/O) | network.http.charset | utf-8 | utf-8 | utf-8 | utf-8 | Supported HTTP charsets | HTTP response charset | |
| Networking (I/O) | network.http.sessionExpireTimeout | 300 (5min) | 300 (5min) | 300 (5min) | 300 (5min) | 0-N | Timeout, in seconds, after which an HTTP session is considered expired | |
| Profiler | profiler.enabled | false | false | false | false | true or false | Enables the recording of statistics and counters | |
| Profiler | profiler.autoDump.interval | 0 | 0 | 0 | 0 | 0=inactive, >0=time in seconds | Dumps the profiler values at regular intervals. Time is expressed in seconds | 1.0rc8 |
| Profiler | profiler.autoDump.reset | true | true | true | true | true or false | Resets the profiler at every auto dump | 1.0rc8 |
| Profiler | profiler.config | null | null | null | null | String with 3 comma-separated values in the format `<seconds-for-snapshot>,<archive-snapshot-size>,<summary-size>` | Configures the profiler | 1.2.0 |
| Log | log.console.level | info | info | info | info | fine, info, warn, error | Console logging level | |
| Log | log.file.level | fine | fine | fine | fine | fine, info, warn, error | File logging level | |
| Client | client.channel.minPool | 1 | 1 | 1 | 1 | 1-N | Minimum size of the channel pool | |
| Client | client.channel.maxPool | 5 | 5 | 5 | 5 | 1-N | Maximum size of the channel pool | |
| Server | server.channel.cleanDelay | 5000 | 5000 | 5000 | 5000 | 0-N | Time in ms of delay to check pending closed connections | 1.0 |
| Server | server.log.dumpClientExceptionLevel | FINE | FINE | FINE | FINE | OFF, FINE, CONFIG, INFO, WARNING, SEVERE | Logs client exceptions. Use any level supported by the Java java.util.logging.Level class | 1.0 |
| Server | server.log.dumpClientExceptionFullStackTrace | false | false | false | false | true or false | Dumps the full stack trace of the exception sent to the client | 1.0 |
| Server | server.cache.staticFile | false | false | false | false | true or false | Caches static resources after they are loaded (it was server.cache.file.static before) | 1.0 |
| Distributed cluster | distributed.async.timeDelay | 0 | 0 | 0 | 0 | 0-N | Delay time (in ms) of synchronization with slave nodes. 0 means early synchronization | |
| Distributed cluster | distributed.sync.maxRecordsBuffer | 100 | 100 | 100 | 100 | 0-10000 | Maximum number of records to buffer before sending to the slave nodes | |

NOTE: On 64-bit systems you do not have the memory limitations of 32-bit systems.

Memory optimization

What can make the difference is the right balance between the heap and the virtual memory used by Memory Mapping, especially on large datasets (GBs, TBs and more) where the in-memory cache structures count less than raw I/O.

For example, if you can assign 4GB to the Java process, it could be better to assign a small heap and a large amount of virtual memory. Rather than:

    java -Xmx4g ...

You could instead try this:

    java -Xmx800m -Dfile.mmap.maxMemory=3.2gb ...

The parameter file.mmap.maxMemory tells how much memory to use for Memory Mapping at the storage level. The default value for 32-bit systems is very small (134 MB), but with a 32-bit architecture you have many limitations and need to be careful not to set it too large. On 64-bit systems the suggestion is to set it to (OS total memory - OrientDB heap) * 85%. Reduce the 85% when you're running other memory-expensive processes on your OS.

On 64-bit systems the default value is (maxOsMemory - maxProcessHeapMemory) / 2.
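
As an illustration only, here is a minimal sketch of the 85% sizing rule above. The total OS memory value is an assumption you must replace with your machine's real amount, and it assumes your version exposes the FILE_MMAP_MAX_MEMORY constant:

    // Hypothetical sizing following the rule (OS total memory - OrientDB heap) * 85%.
    long osTotalMemory = 4L * 1024 * 1024 * 1024;           // assumption: 4 GB of physical RAM
    long heapMemory    = Runtime.getRuntime().maxMemory();  // the heap assigned with -Xmx
    long mmapMemory    = (long) ((osTotalMemory - heapMemory) * 0.85);

    // Apply it before opening any storage.
    OGlobalConfiguration.FILE_MMAP_MAX_MEMORY.setValue(mmapMemory);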

NOTE: If you use too much memory your system will go into swap and the entire machine will slow down. Play with this parameter to find the best value for your configuration.

File System access strategy

This is more technical. It tells the storage engine which strategy to use when accessing the file system. Previous versions always used strategy 0, namely Memory Mapping for everything. Mode 1 uses Memory Mapping, but on reads only if there is room in memory; otherwise a regular Java NIO file channel read is used. Strategy 2 is more conservative, since reads use Memory Mapping only if the requested data has already been loaded in memory. Strategy 3 means use Memory Mapping until there is space in the pool, then use regular Java NIO file channel read/write. Strategy 4 means don't use Memory Mapping at all.

By default, strategy 1 is used, but feel free to test the others to find the best one for your use case.
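
If you prefer the API to the -Dfile.mmap.strategy=N command-line flag, a minimal sketch (assuming your version exposes the FILE_MMAP_STRATEGY constant) is:

    // Switch to the more conservative strategy 2: mmap reads only for blocks already in memory.
    OGlobalConfiguration.FILE_MMAP_STRATEGY.setValue(2);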

Remote connections

There are many ways to improve performance when you access the database using a remote connection.

Network Connection Pool

Each client, by default, uses only one network connection to talk with the server. Multiple threads on the same client share the same network connection pool.

When you have multiple threads, this could become a bottleneck, since a lot of time is spent waiting for a free network connection. This is why it's important to configure the network connection pool.

The configuration is very simple, with just 2 parameters:

  • minPool is the initial size of the connection pool. The default value is configured by the global parameter "client.channel.minPool" (see parameters)
  • maxPool is the maximum size the connection pool can reach. The default value is configured by the global parameter "client.channel.maxPool" (see parameters)

At the first connection, minPool is used to pre-create network connections against the server. When a client thread asks for a connection and the whole pool is busy, a new connection is created until maxPool is reached.

If all the pool connections are busy, the client thread waits for the first free connection.

Example of configuration by using database properties:

    database = new ODatabaseDocumentTx("remote:localhost/demo");
    database.setProperty("minPool", 2);
    database.setProperty("maxPool", 5);
    
    database.open("admin", "admin");

Enlarge timeouts

If you see a lot of messages like:

    WARNING: Connection re-acquired transparently after XXXms and Y retries: no errors will be thrown at application level

it probably means that the default timeouts are too low and the server-side operations need more time to complete. It's strongly suggested that you enlarge the timeouts only after trying to enlarge the Network Connection Pool. The timeout parameters to tune are the following (see the sketch after the list):

  • network.lockTimeout, the timeout in ms to acquire a lock against a channel. The default is 15 seconds.
  • network.socketTimeout, the TCP/IP Socket timeout in ms. The default is 10 seconds.
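
A minimal sketch of enlarging both timeouts via the API, assuming the NETWORK_LOCK_TIMEOUT and NETWORK_SOCKET_TIMEOUT constants are available in your version:

    // Raise the channel lock timeout to 30 seconds and the socket timeout to 60 seconds.
    OGlobalConfiguration.NETWORK_LOCK_TIMEOUT.setValue(30000);
    OGlobalConfiguration.NETWORK_SOCKET_TIMEOUT.setValue(60000);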

Query

Use of indexes

The first improvement to speed up queries is to create indexes on the fields used in WHERE conditions. For example, this query:

    SELECT FROM Profile WHERE name = 'Jay' 

browses the entire "profile" cluster looking for records that satisfy the conditions. The solution is to create an index on the 'name' property with:

    CREATE INDEX profile.name UNIQUE

Use NOTUNIQUE instead of UNIQUE if the values are not unique.
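
The same kind of index can also be created through the Java API. A minimal sketch, assuming an open database instance db and that the 'name' property doesn't exist yet on the Profile class:

    // Create the 'name' property and a UNIQUE index on it.
    OClass profileClass = db.getMetadata().getSchema().getClass("Profile");
    profileClass.createProperty("name", OType.STRING).createIndex(OClass.INDEX_TYPE.UNIQUE);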

For more complex queries like

    select * from testClass where prop1 = ? and prop2 = ?

a composite index should be used:

    CREATE INDEX compositeIndex ON testClass (prop1, prop2) UNIQUE

or via Java API:

    oClass.createIndex("compositeIndex", OClass.INDEX_TYPE.UNIQUE, "prop1", "prop2");

Moreover, thanks to partial match searching, this index will also be used to optimize queries like

    select * from testClass where prop1 = ?

For a deeper understanding of query optimization, look at the unit test: http://code.google.com/p/orient/source/browse/trunk/tests/src/test/java/com/orientechnologies/orient/test/database/auto/SQLSelectIndexReuseTest.java

Right usage of the graph

OrientDB is a graph database. This means that traversing is very efficient. You can use this feature to optimize queries. A common technique is Pivoting.

Avoid use of @rid in WHERE conditions

Using @rid in WHERE conditions slows down queries. It's much better to use the RecordID as the target. Example:

Change this:

    SELECT FROM Profile WHERE @rid = #10:44

With this:

    SELECT FROM #10:44

Similarly, change this:

    SELECT FROM Profile WHERE @rid IN [#10:44, #10:45]

With this:

    SELECT FROM [#10:44, #10:45]
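
The same principle applies to the Java API: instead of a query with @rid in the WHERE clause, you can load the record directly by its RecordID. A minimal sketch (the RID is only an example):

    // Load record #10:44 directly, without running a query.
    ODocument doc = db.load(new ORecordId("#10:44"));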

Massive Insertion

Use the Massive Insert intent

Intents suggest to OrientDB what you're going to do. In this case, you're telling OrientDB that you're executing a massive insertion, and OrientDB auto-reconfigures itself to obtain the best performance. When done, you can remove the intent by setting it to null.

Example:

    db.declareIntent( new OIntentMassiveInsert() );
    
    // YOUR MASSIVE INSERTION
    
    db.declareIntent( null );
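
A slightly fuller sketch of the same pattern, assuming a document database instance db and a hypothetical Person class:

    db.declareIntent( new OIntentMassiveInsert() );

    for (int i = 0; i < 1000000; ++i) {
      // The class and field names below are illustrative only.
      ODocument person = new ODocument("Person");
      person.field("id", i);
      person.field("name", "John" + i);
      db.save(person);
    }

    db.declareIntent( null );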

Massive Updates

Updates generate "holes" at the Storage level, because the new record rarely fits the size of the previous one exactly. Holes are free spaces between data. Holes are recycled, but an excessive number of small holes has the same effect as a highly defragmented File System: space is wasted (because small holes can't be easily recycled) and performance degrades as the database grows.

Oversize

If you know you will update certain types of records, create a class for them and set the Oversize (default is 0) to 2 or more.

By default the OGraphVertex class has an oversize value set to 2. If you define your own classes, set this value to at least 2.

    OClass myClass = db.getMetadata().getSchema().createClass("Car");
    myClass.setOverSize(2);

Wise use of transactions

To obtain really linear performance with OrientDB you should avoid using Transactions as far as you can. In fact, OrientDB keeps all the changes in memory until you flush them with a commit, so the bottleneck is your heap space and the management of the local transaction cache (implemented as a Map).

Transactions slow down massive inserts unless you're using a "remote" connection. In that case they speed up the insertion, because the client/server communication happens only at commit time.
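
For example, over a remote connection you can group a batch of inserts into a single transaction, so that the client/server round trip happens only at commit time. A minimal sketch (class and field names are illustrative):

    db.begin();
    for (int i = 0; i < 100; ++i) {
      ODocument doc = new ODocument("Person");
      doc.field("name", "Jay" + i);
      db.save(doc);
    }
    // All the changes are sent to the server in one shot here.
    db.commit();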

Disable Transaction Log

If you need to group operations in a logical transaction to speed up remote execution, but can renounce the Transaction Log, just disable it by setting the property tx.useLog to false.

Via JVM configuration:

    java ... -Dtx.useLog=false ...

or via API:

    OGlobalConfiguration.TX_USE_LOG.setValue(false);

NOTE: If the JVM crashes, OrientDB may not be able to roll back the pending transaction.
