Quorum 2 not reached #6438

Closed
Dmitry-PlainID opened this issue Jul 19, 2016 · 15 comments

@Dmitry-PlainID

We upgraded OrientDB to version 2.2.4 because of a bug that was fixed in that version (bug number 7075).
Now, after restarting our nodes, we receive the following error when trying to update a record.

Our environment includes 2 nodes:
WildFly 10 application server with an embedded instance of OrientDB 2.2.4 (Xmx was set to 2G and XX:MaxDirectMemorySize=512g)
OrientDB 2.2.4 node

After we restart one of them, we cannot update any record.
We also tried to update from the console and got the same error.

Caused by: com.orientechnologies.orient.server.distributed.task.ODistributedOperationException: Quorum 2 not reached for request (id=1.2 task=tx user=#5:0). Elapsed=90005ms No server in conflict. Received:

  • nodeRuleEngine: TX[1]{#174:3 v.1}
  • main: waiting-for-response
    DB name="RuleEngineDB"
    at com.orientechnologies.orient.server.distributed.ODistributedResponseManager.manageConflicts(ODistributedResponseManager.java:612)
    at com.orientechnologies.orient.server.distributed.ODistributedResponseManager.getFinalResponse(ODistributedResponseManager.java:339)
    at com.orientechnologies.orient.server.distributed.impl.ODistributedDatabaseImpl.waitForResponse(ODistributedDatabaseImpl.java:568)
    at com.orientechnologies.orient.server.distributed.impl.ODistributedDatabaseImpl.send2Nodes(ODistributedDatabaseImpl.java:295)
    at com.orientechnologies.orient.server.distributed.impl.ODistributedAbstractPlugin.sendRequest(ODistributedAbstractPlugin.java:578)
    at com.orientechnologies.orient.server.distributed.impl.ODistributedTransactionManager$1.call(ODistributedTransactionManager.java:157)
    at com.orientechnologies.orient.server.distributed.impl.ODistributedTransactionManager$1.call(ODistributedTransactionManager.java:109)
    at com.orientechnologies.orient.server.distributed.impl.ODistributedStorage.executeOperationInLock(ODistributedStorage.java:1079)
    at com.orientechnologies.orient.server.distributed.impl.ODistributedTransactionManager.commit(ODistributedTransactionManager.java:109)
    at com.orientechnologies.orient.server.distributed.impl.ODistributedStorage.commit(ODistributedStorage.java:1228)
    at com.orientechnologies.orient.core.tx.OTransactionOptimistic.doCommit(OTransactionOptimistic.java:557)
    at com.orientechnologies.orient.core.tx.OTransactionOptimistic.commit(OTransactionOptimistic.java:111)
    at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.commit(ODatabaseDocumentTx.java:2672)
    at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.commit(ODatabaseDocumentTx.java:2641)
    at com.tinkerpop.blueprints.impls.orient.OrientTransactionalGraph.commit(OrientTransactionalGraph.java:175)
    at com.plainid.j2c.graphdb.orientdb.managed.OrientDBManagedConnection$1.commit(OrientDBManagedConnection.java:87)
@prjhub added the bug label Jul 19, 2016
@lvca
Member

lvca commented Jul 19, 2016

How many servers are running?

@Dmitry-PlainID
Author

We run 2 servers:

  1. OrientDB console
  2. Embedded server in our application

@lvca
Member

lvca commented Jul 19, 2016

The error says that "Quorum 2" was not reached, so at some point you had 2 servers. The majority for 2 servers is (N/2+1) = 2/2+1 = 2. If you run only one server, you can never reach the quorum. If you want a dynamic configuration, I suggest setting distributed.autoRemoveOfflineServers=0 in orientdb-server-config.xml (last section), so that as soon as a node goes offline, it's removed from the list.
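
The majority rule quoted above can be checked with a few lines of Python. This is only an illustrative sketch of the arithmetic (integer division, then add one); `majority_quorum` and `can_reach_quorum` are hypothetical helper names, not an OrientDB API.

```python
def majority_quorum(total_servers: int) -> int:
    """Return the majority quorum for a cluster: floor(N / 2) + 1."""
    if total_servers < 1:
        raise ValueError("a cluster needs at least one server")
    return total_servers // 2 + 1


def can_reach_quorum(total_servers: int, online_servers: int) -> bool:
    """True when enough servers are online to satisfy the majority quorum."""
    return online_servers >= majority_quorum(total_servers)


if __name__ == "__main__":
    # (registered servers, online servers) pairs to compare.
    for total, online in [(2, 2), (2, 1), (3, 2)]:
        print(total, online, majority_quorum(total), can_reach_quorum(total, online))
```

With two registered servers the quorum is 2, so a single online server cannot satisfy it. By the same formula, if the offline server is dropped from the list, N becomes 1 and the quorum becomes 1.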

@lvca self-assigned this Jul 19, 2016
@Dmitry-PlainID
Author

I have 2 running nodes, and in the log I see that they are synchronised:
Members [2] {
Member [127.0.0.1]:2434
Member [127.0.0.1]:2435 this
}
I get the "Quorum 2" error when trying to update or create a record.

@lvca
Member

lvca commented Jul 20, 2016

This exception:

nodeRuleEngine: TX[1]{#174:3 v.1}
main: waiting-for-response

means the client was attached to the server nodeRuleEngine, which executed a create record in a transaction, but the server "main" did not respond. How is the "main" node configured? Are you running the main server as embedded? Is this happening at the beginning, after a while or randomly?
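
To make the reading of the "Received:" block concrete, here is a small Python sketch that splits each `node: response` line and lists the nodes still marked `waiting-for-response`. The line format is assumed from the two lines quoted above, and `pending_nodes` is a hypothetical helper, not part of OrientDB.

```python
def pending_nodes(received_block: str) -> list:
    """Return node names whose response is still 'waiting-for-response'."""
    pending = []
    for line in received_block.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only: the response itself may contain colons.
        node, response = line.split(":", 1)
        if response.strip() == "waiting-for-response":
            pending.append(node.strip())
    return pending


if __name__ == "__main__":
    block = """
    nodeRuleEngine: TX[1]{#174:3 v.1}
    main: waiting-for-response
    """
    print(pending_nodes(block))
```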

@OTalmor

OTalmor commented Jul 20, 2016

main is a regular node.
configurations:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<orient-server>
    <handlers>
        <handler class="com.orientechnologies.orient.graph.handler.OGraphServerHandler">
            <parameters>
                <parameter value="true" name="enabled"/>
                <parameter value="50" name="graph.pool.max"/>
            </parameters>
        </handler>
        <handler class="com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin">
            <parameters>
                <parameter value="main" name="nodeName"/>
                <parameter value="true" name="enabled"/>
                <parameter value="${ORIENTDB_HOME}/config/default-distributed-db-config.json" name="configuration.db.default"/>
                <parameter value="${ORIENTDB_HOME}/config/hazelcast.xml" name="configuration.hazelcast"/>
            </parameters>
        </handler>
        <handler class="com.orientechnologies.orient.server.handler.OJMXPlugin">
            <parameters>
                <parameter value="false" name="enabled"/>
                <parameter value="true" name="profilerManaged"/>
            </parameters>
        </handler>
        <handler class="com.orientechnologies.orient.server.handler.OAutomaticBackup">
            <parameters>
                <parameter value="false" name="enabled"/>
                <parameter value="${ORIENTDB_HOME}/config/automatic-backup.json" name="config"/>
            </parameters>
        </handler>
        <handler class="com.orientechnologies.orient.server.handler.OServerSideScriptInterpreter">
            <parameters>
                <parameter value="true" name="enabled"/>
                <parameter value="SQL" name="allowedLanguages"/>
            </parameters>
        </handler>
    </handlers>
    <network>
        <sockets>
            <socket implementation="com.orientechnologies.orient.server.network.OServerTLSSocketFactory" name="ssl">
                <parameters>
                    <parameter value="false" name="network.ssl.clientAuth"/>
                    <parameter value="config/cert/orientdb.ks" name="network.ssl.keyStore"/>
                    <parameter value="password" name="network.ssl.keyStorePassword"/>
                    <parameter value="config/cert/orientdb.ks" name="network.ssl.trustStore"/>
                    <parameter value="password" name="network.ssl.trustStorePassword"/>
                </parameters>
            </socket>
            <socket implementation="com.orientechnologies.orient.server.network.OServerTLSSocketFactory" name="https">
                <parameters>
                    <parameter value="false" name="network.ssl.clientAuth"/>
                    <parameter value="config/cert/orientdb.ks" name="network.ssl.keyStore"/>
                    <parameter value="password" name="network.ssl.keyStorePassword"/>
                    <parameter value="config/cert/orientdb.ks" name="network.ssl.trustStore"/>
                    <parameter value="password" name="network.ssl.trustStorePassword"/>
                </parameters>
            </socket>
        </sockets>
        <protocols>
            <protocol implementation="com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary" name="binary"/>
            <protocol implementation="com.orientechnologies.orient.server.network.protocol.http.ONetworkProtocolHttpDb" name="http"/>
        </protocols>
        <listeners>
            <listener protocol="binary" socket="default" port-range="2424-2430" ip-address="0.0.0.0"/>
            <listener protocol="http" socket="default" port-range="2480-2490" ip-address="0.0.0.0">
                <commands>
                    <command implementation="com.orientechnologies.orient.server.network.protocol.http.command.get.OServerCommandGetStaticContent" pattern="GET|www GET|studio/ GET| GET|*.htm GET|*.html GET|*.xml GET|*.jpeg GET|*.jpg GET|*.png GET|*.gif GET|*.js GET|*.css GET|*.swf GET|*.ico GET|*.txt GET|*.otf GET|*.pjs GET|*.svg GET|*.json GET|*.woff GET|*.woff2 GET|*.ttf GET|*.svgz" stateful="false">
                        <parameters>
                            <entry value="Cache-Control: no-cache, no-store, max-age=0, must-revalidate\r\nPragma: no-cache" name="http.cache:*.htm *.html"/>
                            <entry value="Cache-Control: max-age=120" name="http.cache:default"/>
                        </parameters>
                    </command>
                    <command implementation="com.orientechnologies.orient.graph.server.command.OServerCommandGetGephi" pattern="GET|gephi/*" stateful="false"/>
                </commands>
                <parameters>
                    <parameter value="utf-8" name="network.http.charset"/>
                    <parameter value="true" name="network.http.jsonResponseError"/>
                </parameters>
            </listener>
        </listeners>
    </network>
    <storages/>
    <users>
        <user resources="*" password="{PBKDF2WithHmacSHA256}XXX" name="root"/>
        <user resources="connect,server.listDatabases,server.dblist" password="{PBKDF2WithHmacSHA256}YYY" name="guest"/>
    </users>
    <properties>
        <entry value="1" name="db.pool.min"/>
        <entry value="50" name="db.pool.max"/>
        <entry value="false" name="profiler.enabled"/>
        <entry value="30000" name="distributed.crudTaskTimeout"/>
        <entry value="0" name="distributed.autoRemoveOfflineServers"/>
    </properties>
    <isAfterFirstTime>true</isAfterFirstTime>
</orient-server>
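
A quick way to confirm which distributed settings a node actually loads is to read the `<properties>` entries out of the config file. The sketch below uses Python's standard `xml.etree.ElementTree` on a trimmed-down sample that mirrors the `<properties>` section above; `server_properties` and `SAMPLE_CONFIG` are hypothetical names, and a real check would read the file from disk instead.

```python
import xml.etree.ElementTree as ET

# Trimmed sample mirroring the <properties> section of the config above.
SAMPLE_CONFIG = """<orient-server>
    <properties>
        <entry value="30000" name="distributed.crudTaskTimeout"/>
        <entry value="0" name="distributed.autoRemoveOfflineServers"/>
    </properties>
</orient-server>"""


def server_properties(config_xml: str) -> dict:
    """Map each <properties>/<entry> name to its value (values stay strings)."""
    root = ET.fromstring(config_xml)
    return {
        entry.get("name"): entry.get("value")
        for entry in root.findall("./properties/entry")
    }


if __name__ == "__main__":
    props = server_properties(SAMPLE_CONFIG)
    print(props.get("distributed.autoRemoveOfflineServers"))
    print(props.get("distributed.crudTaskTimeout"))
```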

@OTalmor

OTalmor commented Jul 20, 2016

main is not embedded

@lvca
Member

lvca commented Jul 20, 2016

Is this happening at the beginning, after a while or randomly?

@Dmitry-PlainID
Author

The main server is an OrientDB console, not embedded. This happens after the first restart of one of the nodes.

@lvca
Member

lvca commented Aug 2, 2016

We resolved this issue in 2.2.6 (and 2.2.7-SNAPSHOT). Could you please test with it?

@lvca closed this as completed Sep 1, 2016
@lvca
Member

lvca commented Sep 1, 2016

Closing this because there was no answer. If you still see this issue after 2.2.6, please reopen or comment. Thanks.

@will-Niu

will-Niu commented Jun 29, 2017

I have 2 running nodes, both master.

2017-06-29 21:53:39:908 INFO  [node_1] Distributed servers status (*=current @=lockmgr[node_1]):

+------------+------+------------------------------------+-----+---------+--------------------+--------------------+------------------------+
|Name        |Status|Databases                           |Conns|StartedOn|Binary              |HTTP                |UsedMemory              |
+------------+------+------------------------------------+-----+---------+--------------------+--------------------+------------------------+
|node_1(*)(@)|ONLINE|will_test_db_1=ONLINE (MASTER)      |9    |21:44:18 |192.168.100.75:2424 |192.168.100.75:2480 |274.17MB/3.83GB (6.98%) |
|            |      |GratefulDeadConcerts=ONLINE (MASTER)|     |         |                    |                    |                        |
|            |      |cmdb_org_8888=ONLINE (MASTER)       |     |         |                    |                    |                        |
|node_163    |ONLINE|will_test_db_1=ONLINE (MASTER)      |2    |21:44:40 |192.168.100.163:2424|192.168.100.163:2480|896.84MB/3.83GB (22.85%)|
|            |      |GratefulDeadConcerts=ONLINE (MASTER)|     |         |                    |                    |                        |
|            |      |cmdb_org_8888=ONLINE (MASTER)       |     |         |                    |                    |                        |
+------------+------+------------------------------------+-----+---------+--------------------+--------------------+------------------------+
 [OHazelcastPlugin]
2017-06-29 21:54:10:866 WARNI [node_1] Timeout (10339ms) on waiting for synchronous responses from nodes=[node_1, node_163] responsesSoFar=[node_163] request=(id=0.324 task=command_sql(create class v_instance_CLUSTER extends V) user=#5:0) [ODistributedDatabaseImpl]
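
The warning above lists both the nodes that were expected to answer and the ones that already did, so the missing responder can be found with a set difference. The Python sketch below assumes the `nodes=[...]` and `responsesSoFar=[...]` format seen in this single log line; `missing_responders` is a hypothetical helper, not an OrientDB tool.

```python
import re


def missing_responders(log_line: str) -> list:
    """Return nodes listed in nodes=[...] that are absent from responsesSoFar=[...]."""

    def names(key: str) -> list:
        # Capture the comma-separated names between the brackets after "key=".
        match = re.search(rf"\b{key}=\[([^\]]*)\]", log_line)
        if match is None:
            return []
        return [n.strip() for n in match.group(1).split(",") if n.strip()]

    responded = set(names("responsesSoFar"))
    return [n for n in names("nodes") if n not in responded]


if __name__ == "__main__":
    line = (
        "Timeout (10339ms) on waiting for synchronous responses from "
        "nodes=[node_1, node_163] responsesSoFar=[node_163] "
        "request=(id=0.324 task=command_sql(create class v_instance_CLUSTER extends V) user=#5:0)"
    )
    print(missing_responders(line))
```

For the line above, the node still missing from the responses is node_1, which is the server that logged the warning.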

@will-Niu

2 nodes, multicast deployment on a LAN.
I ran into the problem above and am having trouble solving it. Thank you!

@lvca
Member

lvca commented Jul 2, 2017

@will-Niu which release are you using? Can you please try with the latest, 2.2.22?
