Skip to content
This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

After stepping down the primary node, queries failed with error 16550: "not authorized for query on test.users" #247

Closed
tanpeter opened this issue Dec 24, 2013 · 18 comments
Assignees
Milestone

Comments

@tanpeter
Copy link

I'm using moped 1.5.1, mongodb 2.4.8

Steps to reproduce:

  1. Have a replica set of 3 nodes
  2. A script that does one insert, and one query every couple seconds in the loop (using eventual consistency)
  3. While the script is running, on mongo shell, do rs0:PRIMARY> rs.stepDown(). As the result, one of the secondary nodes becomes primary and the ex-primary becomes a secondary.
  4. Before the replica set information gets refreshed, insert operations still go the ex-primary and get dropped (no new doc created).
  5. After the replica set information refreshes, new docs are now created.
  6. However the query operation starts failing with this error:
#<Moped::Errors::QueryFailure: The operation: #<Moped::Protocol::Query
  @length=61
  @request_id=98
  @response_to=0
  @op_code=2004
  @flags=[:slave_ok]
  @full_collection_name="test.users"
  @skip=0
  @limit=-1
  @selector={:_id=>"52ba15d345f540523400007a"}
  @fields=nil>
failed with error 16550: "not authorized for query on test.users"
  1. On the ex-primary, now-secondary mongodb logs, similar error:
[conn327] assertion 16550 not authorized for query on test.users ns:test.users query:{ _id: ObjectId('52ba0b3c45f5406fd9000086') }
  1. The error only happens on the ex-primary, the other secondary node is fine.
  2. The error won't go away until either I restart the script, or I restart the ex-primary node.

It seems to me that moped continues to use the connection to the ex-primary node for querying purpose now but mongodb requires reauth (or even reconnect, that's what mongo shell does).

Thanks in advance for looking into the issue.

-Peter

@tanpeter
Copy link
Author

Here is my test script used in the steps above.

require 'moped'
require 'pp'

#Moped.logger = Logger.new($stdout)

session = Moped::Session.new(%w(
  node1, node2, node3
), refresh_interval: 60)
session.use :test
session.login("test", "password")

1000.to_i.times {|x|
  begin
    id = Moped::BSON::ObjectId.new
    session.with(safe: false)[:users].insert({_id: id, created_at: Time.now })
    sleep 1
    pp x
    pp session[:users].find(_id: id).one
    sleep 1
  rescue => e
    pp e
  end
}

@tanpeter
Copy link
Author

tanpeter commented Jan 8, 2014

In moped v1.5.1, node#query method, it tried to re-auth when getting 16550 unauthorized error (you can see the comment says exactly like the scenario I reported). The problem is the "database" variable could be either String or Symbol; when it is a Symbol, that causes the auth.has_key?(database) lookup fails (line 372) so the re-auth never happens.

    def query(database, collection, selector, options = {})
      operation = Protocol::Query.new(database, collection, selector, options)

      process(operation) do |reply|
        if reply.query_failed?
          if reply.unauthorized? && auth.has_key?(database)
            # If we got here, most likely this is the case of Moped
            # authenticating successfully against the node originally, but the
            # node has been reset or gone down and come back up. The most
            # common case here is a rs.stepDown() which will reinitialize the
            # connection. In this case we need to requthenticate and try again,
            # otherwise we'll just raise the error to the user.
            login(database, *auth[database])
            reply = query(database, collection, selector, options)
          else
            raise Errors::QueryFailure.new(operation, reply.documents.first)
          end
        end
        reply
      end
    end

Converting the database variable to a String fixed the issue. However it appears that in 2.0.0-beta, the error/failover handling and auth have major refactoring and it looks like this bug doesn't happen anymore (I didn't do thorough testing with 2.0.0).

To workaround this issue, just use string as a database name (e.g. session.use 'test' not session.use :test) and you won't run into this issue.

@ghost ghost assigned durran Jan 8, 2014
@arthurnn
Copy link
Contributor

arthurnn commented Jan 8, 2014

Can you tried this against moped master? is the error still happening?

@tanpeter
Copy link
Author

tanpeter commented Jan 9, 2014

I ran against the latest master branch; in short, yes it is still happening. But the sequence of events has changed so I'll just state what I observed below:

  1. It seems like even with safe: false option (which should be default), an insert operation incurs a read, probably the getlasterror command (in v1.5.1 with safe: false, inserts never complaint about unauthorized errors, now they do). Here is a reply from an insert operation:
#<Moped::Protocol::Reply
  @length=177
  @request_id=7658
  @response_to=11
  @op_code=1
  @flags=[:await_capable]
  @cursor_id=0
  @offset=0
  @count=1
  @documents=[{"n"=>0, "lastOp"=>#<BSON::Timestamp:0x007fa26bb324b8 @seconds=1389227990, @increment=1>, "connectionId"=>212, "writtenTo"=>[{"_id"=>0, "host"=>"ip-10-231-128-103:27017"}], "wtime"=>0, "err"=>nil, "ok"=>1.0}]>
  1. Using the same test script in my second post, all the read operations went to the primary. I tried "session.with(consistency: :eventual)" and it didn't help.
  2. Because of 2, when I did rs.stepDown(), reads went to the new primary and they were fine. However I did two more rs.stepDown (I have total 3 nodes), it made moped to send read operations to the node which had been stepped down before, and 16550 unauthorized errors happened:

@durran
Copy link
Member

durran commented Jan 9, 2014

See readme for master changes - safe: false and consistency: :eventual are no longer valid options and would be noops. https://github.com/mongoid/moped/blob/master/CHANGELOG.md#api-changes-backwards-incompatible

@tanpeter
Copy link
Author

tanpeter commented Jan 9, 2014

Thanks for the explanation, I should have read the CHANGELOG.

Now with read: :secondary option, the same error occurs whenever the read hits the ex-primary node

#<Moped::Protocol::Reply
  @length=100
  @request_id=176592
  @response_to=85
  @op_code=1
  @flags=[:query_failure]
  @cursor_id=0
  @offset=0
  @count=1
  @documents=[{"$err"=>"not authorized for query on test.users", "code"=>16550}]>
#<Moped::Errors::QueryFailure: The operation: #<Moped::Protocol::Query
  @length=61
  @request_id=85
  @response_to=0
  @op_code=2004
  @flags=[:slave_ok]
  @full_collection_name="test.users"
  @skip=0
  @limit=-1
  @selector={:_id=>BSON::ObjectId('52cde95e737061416a110000')}
  @fields=nil>
failed with error 16550: "not authorized for query on test.users"

@tanpeter
Copy link
Author

tanpeter commented Jan 9, 2014

Also the workaround for moped 1.5.1 (ie. use session.use 'test' instead of session.use :test) doesn't work for 2.0

@chrisckchang
Copy link

@ptan-iswifter are you still seeing this issue with 2.0.0? I just tried testing on 1.5.1 and 2.0.0 and it looks like this is still an issue... :/

@chrisckchang
Copy link

was able to resolve using refresh_interval setting.

@duromano
Copy link

@chrischang12 how exactly did you solve this with the refresh_interval setting?

@chrisckchang
Copy link

@duromano refresh_interval helps discover the status of nodes quicker. by default mongoid/moped caches information about a node for 300 seconds/5 minutes. if there are any changes to a node's state within these 5 minutes, that means that the driver won't know about it. lowering the refresh_interval makes the driver check the status of nodes more frequently.

see source:
https://github.com/mongoid/moped/blob/master/lib/moped/cluster.rb#L96

@duromano
Copy link

@chrischang12 hmmm I see.. It makes sense! Thanks!
But I'm not sure if it will solve my problem, cause once this "Moped authorization error" pops up, it never comes back to normal (even after 5, 30, 60.. minutes), unless I restart my application.
What drives me crazy is the fact that the application runs normal for days, and suddenly I get this error.. :/

@chrisckchang
Copy link

@duromano i'm not too familiar with how moped creates/manages connections but one potential workaround could be simply closing the connection that throws the error after X number of times and creating a new one. i'm assuming here that you're only getting that error when the driver reuses existing connections, as opposed to newly created connections.

@zarqman
Copy link
Contributor

zarqman commented Sep 4, 2014

We just got caught by this in production. Seems both rs.reconfig() and rs.stepDown() can trigger it.

A review of the mongodb logs suggests that moped may be losing the authentication credentials as a db.auth() call is not being logged (but is logged when the app first starts, so we know auth is in use and configured correctly). We do see a new connection to the mongodb server at the time of the error, just not the expected db.auth() right after.

If this diagnosis is true, that's the real source of the problem, and not the error & trace below, which are just the natural result of the auth error.

Current error, as of moped 2.0.0:

Moped::Errors::QueryFailure: The operation: #<Moped::Protocol::Query
  @length=202
  @request_id=6
  @response_to=0
  @op_code=2004
  @flags=[:slave_ok]
  @full_collection_name="xxxx.yyyy"
  @skip=0
  @limit=-1
  @selector={"$query"=>{"domains"=>"zzzz"}, "$orderby"=>{:_id=>1}}
  @fields=nil>
failed with error 16550: "not authorized for query on xxxx.yyyy"

Backtrace (apologies for formatting):

moped/operation/read.rb ? in block in execute at line 50
moped/node.rb ? in [] at line 594
moped/node.rb ? in block (2 levels) in flush at line 594
moped/node.rb ? in map at line 593
moped/node.rb ? in block in flush at line 593
moped/node.rb ? in block in logging at line 617
active_support/notifications.rb ? in block in instrument at line 159
active_support/notifications/instrumenter.rb ? in instrument at line 20
active_support/notifications.rb ? in instrument at line 159
moped/instrumentable.rb ? in instrument at line 31
moped/node.rb ? in logging at line 616
moped/node.rb ? in flush at line 587
moped/node.rb ? in process at line 391
moped/operation/read.rb ? in execute at line 48
moped/node.rb ? in read at line 648
moped/node.rb ? in query at line 411
moped/query.rb ? in block in first at line 128
moped/cluster.rb ? in block in with_primary at line 243
moped/node.rb ? in block in ensure_primary at line 204
moped/executable.rb ? in execute at line 25
moped/node.rb ? in ensure_primary at line 203
moped/cluster.rb ? in with_primary at line 242
moped/read_preference/primary_preferred.rb ? in block in with_node at line 47
moped/read_preference/selectable.rb ? in call at line 65
moped/read_preference/selectable.rb ? in with_retry at line 65
moped/read_preference/primary_preferred.rb ? in with_node at line 45
moped/query.rb ? in first at line 127
mongoid/query_cache.rb ? in block in first_with_cache at line 187
mongoid/query_cache.rb ? in with_cache at line 135
mongoid/query_cache.rb ? in first_with_cache at line 186
mongoid/contextual/mongo.rb ? in block (2 levels) in first at line 199
mongoid/contextual/mongo.rb ? in with_sorting at line 535
mongoid/contextual/mongo.rb ? in block in first at line 198
mongoid/contextual/mongo.rb ? in try_cache at line 447
mongoid/contextual/mongo.rb ? in first at line 197
mongoid/contextual.rb ? in first at line 20
mongoid/findable.rb ? in find_by at line 107
our_model.rb ? in smart_find at line 141 (application)
...

@wandenberg
Copy link
Contributor

Hi,

I was working on these problems and created a fix here.
The is the same fix to 1.5.2 is here.

@nadilsons
Copy link

Great @wandenberg !!!
Works fine here! Tks!

@wandenberg
Copy link
Contributor

This is good.
I did just a small change, move the code to (1.5.0|2.0.0)-fix_retries_operations branches
and created the pull request for 2.0.0 and 1.5.2

@arthurnn
Copy link
Contributor

closing this as #324 was merged and it should solve this issue

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants