Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Opening realm with cached user while offline results in fatal error and session does not retry connection #7349

Closed
sync-by-unito bot opened this issue Feb 16, 2024 · 15 comments · Fixed by #7469
Assignees

Comments

@sync-by-unito
Copy link

sync-by-unito bot commented Feb 16, 2024

This ticket was created based on the customer reported issue in HELP-55798 and is a P2, since that ticket is also P2.

With the changes for updating the use of app::Config::base_url and adding the App::update_base_url() function, the first SyncSession opened after app start will always request the user creds to be updated (and thereby updating the location info) if the location had not been requested via another App HTTP request.

If the Realm is opened while the client is offline, this results in a fatal error due to the resolve failed or offline error returned while requesting the location, which is reported to the client via the registered error_handler and the SyncSession becomes inactive without retrying the connection attempt.

Copy link
Author

sync-by-unito bot commented Feb 16, 2024

➤ michael-wb commented:

If the location request fails when a Realm is opened (e.g. offline, can't find server, etc.), then a fatal error is definitely being reported to the client and the session will not attempt to reconnect without being manually resumed.

A couple of thoughts on possible implementations:

  1. Add a new sync session state (e.g. AccessTokenReconnectWait) and add the plumbing to create a timer on the event loop via the SyncManager to attempt the reconnect, with basic hold-off logic (e.g. previous wait * 2, up to max of 5 mins) if the client remains offline.
  2. Do nothing - if they get this error, they will need to manually resume the session - or ensure it is revived if handle_reconnect() is called when the client is back online
  3. Update the starting of the SyncSession without a sync route so the Connection object ensures the location is up to date - a bit more plumbing will be needed, since the sessions created won’t have a sync_route until the location is updated. Then the connection object can handle the reconnect timing.

The current route being investigated/implemented is #3, since it will work somewhat similar to how the access token is refreshed if the initial request fails to update the access token prior to starting the session - it starts the session anyways and uses the Connection connect logic to request an updated access token when the websocket fails to be established.

Copy link
Author

sync-by-unito bot commented Feb 19, 2024

➤ michael-wb commented:

The most straightforward approach seems to be to add a new state to SyncSession (WaitForLocation) that the session will enter if it is activated prior to the location being updated. The SyncSession will notify the SyncManager to update the location by creating a timer via the SyncClient->Client->ClientImpl that will handle the update when the timer expires. If the location is not updated successfully, the timer will be restarted with a longer delay (up to 15 minutes) if the location is not updated. Calling handle_reconnect() or resume() will force the location update timer to be restarted (e.g. request the location immediately).

Copy link
Author

sync-by-unito bot commented Mar 4, 2024

➤ michael-wb commented:

Adding a more detailed of the description/requirements of this issue, since this is extending into 4 week territory
As a decision made during the design of the changes for updating the BaseUrl, the caching of the location metadata was deprecated in favor of always requesting the location metadata prior to issuing an HTTP appserver request or initiating a websocket connection with the server. When caching was used for the location metadata, the App and SyncSession would always use the location metadata stored in the metadata cache, regardless of the current value of the base_url in App::Config. This made it impossible to update the base_url via App::Config without first deleting the metadata realm file.
The base_url value comes into play for the following situations:

  • The default base_url (https://realm.mongodb.com) requests the location metadata from the global Atlas endpoint to retrieve the hostname information for HTTP and websocket requests
  • If a specific deployment region is selected:
    o A location request to the global Atlas endpoint will either return a 301/308 response or the correct location info that points to the correct server
    o A location request to an old deployment region will return a 301/308 response to redirect the client to request the location info from the correct server
  • If the base_url is set or changed in the App::Config, the client should use that address for HTTP requests and websockets the next time the App is created

With the changes for the "update BaseURL" project, if the client is currently online and the configured base_url value is correct, the current operation works fine:

  • HTTP appserver requests will automatically request the location prior to sending the appserver request.
  • SyncSession will automatically to request the location prior to opening the first websocket to the server.
  • The call to App::update_base_url() will update the base URL location metadata and new HTTP requests and websocket connections will use the new server.
  • If a 301/308 redirect is detected while issuing an HTTP appservices request, the redirect support in App will request the correct location metadata from the new server location.
  • A 301/308 redirect detected while opening a websocket connection would result in requesting an updated access token, which will force the location metadata to be updated prior to requesting the access token.
  • If the base_url is updated in the App::Config or the App::update_base_url() function is called the correct base URL value is always used when communicating with the server.

It becomes a little bit tricky if the client is currently offline, since not all the changes work as expected:

  • HTTP appserver requests (and the location metadata request) will fail, which is the same as what happens today.
  • The SyncSession will request the location metadata when it is activated and will pass an error via the error_handler after one attempt at updating the location metadata, which requires the SyncSession to be resumed in order to reconnect (or attempt to retrieve the location metadata again).
  • The call to App::update_base_url() will fail since the server cannot be reached and the base URL will remain the original value.
  • If the SyncSession's location metadata was up to date prior to going offline, the SyncSession will kick off the internal session and the usual retry/backoff mechanism will come into play to handle the connection to the server.

Copy link
Author

sync-by-unito bot commented Mar 4, 2024

➤ michael-wb commented:

Some potential implementations for handling the location update:

  • If a SyncSession is activated and the location needs to be updated, coordinate with the SyncManager to handle the location update (using the App) and schedule retries from the SyncManager if the location update fails
  • If a ClientImpl::Connection is activated without an updated location, trigger an error via the error_handler to attempt to refresh the location using the App reference in the SyncSession. If the location update fails, the connection will attempt to update the location on the next connection retry.
  • Restore the location metadata cache and include additional storage for the last used base_url value. When the SyncSession is activated, it will check to see if the configured base_url is the same as that stored with the location metadata. If they are the same, then the session will start with the cached info, otherwise, the location will be updated using one of the methods above. For HTTP calls, the location will always be update on the first request after App was created. There is a (very) slight risk of the location metadata being out of date if the cached info is used but the underlying deployment region has been changed, although should be resolved by the 301/308 redirections.

Copy link
Author

sync-by-unito bot commented Mar 5, 2024

➤ Jonathan Reams commented:

So the thing that I've always found confusing is that you always have a base url. Either it's the default value in App::Config or the user has manually set a value in App::Config or the user has called update_base_url() and explicitly set it that way. We can always just try to connect to whatever the current value of base url is and we can always get a redirect that may update our baseurl until the next time the app restarts or update_base_url is called. the sync::Session already knows how to back off after a failed connection attempt, and I don't know why we need this new separate backoff procedure to lookup the url to connect to when we could just try to connect via sync::Session and get a redirect.

Copy link
Author

sync-by-unito bot commented Mar 21, 2024

➤ Andy Wang commented:

Hi [~michael.wilkersonbarker@mongodb.com] , just to follow up on this one. I checked the github and found there seems to be code already merged and the fix is completed. Can I consider this issue fixed? and if yes, is there anything the customer needs to do to implement the fix? 

ty!

Copy link
Author

sync-by-unito bot commented Mar 21, 2024

➤ michael-wb commented:

Hi [~andy.wang@mongodb.com],

A fix has been merged into the Core master branch and we will need to create a release (likely tomorrow) that will contain this fix. Once this has been released, the SDKs will need to create a release that includes this change. I am hoping we can have this available by the end of the week or early next week.

Since it looks like issues have been reported for JS and Swift, I will reach out to those SDK teams once the Core release is ready so we can make a build available for them ASAP.

Thanks,

Michael

Copy link
Author

sync-by-unito bot commented Mar 21, 2024

➤ Andy Wang commented:

thanks so much for the fast reply [~michael.wilkersonbarker@mongodb.com] !!!

Can I safely assume the moment I see a new SDK version release is when I can revert back to the customer and ask them to upgrade the SDK version?

Copy link
Author

sync-by-unito bot commented Mar 21, 2024

➤ michael-wb commented:

Yes, I will also try to keep an eye on the SDKs to let you know when they have been released.

Thanks for following up!

Copy link
Author

sync-by-unito bot commented Mar 28, 2024

➤ Andy Wang commented:

hi [~michael.wilkersonbarker@mongodb.com] , 

sorry for bothering you again. Does realm 14.4.1 fixes the issue? I saw there's a new release last week and this ticket has been closed. just wondering:) ty!

Copy link
Author

sync-by-unito bot commented Mar 28, 2024

➤ michael-wb commented:

Hi [~andy.wang@mongodb.com] - yes, Realm 14.4.1 / Swift v10.49.1 fixes this particular issue, but there is an outstanding issue with @AutoOpen in the Swift SDK that is being fixed in RCORE-2060, which is currently in review.

Copy link
Author

sync-by-unito bot commented Mar 28, 2024

➤ Andy Wang commented:

Hey [~michael.wilkersonbarker@mongodb.com] , thanks for the fast reply!

Will we get a newer realm core release in this case?

I assume it will be safe for me to wait up till you inform me it is okay to ask the customer to retry.

Copy link
Author

sync-by-unito bot commented Mar 28, 2024

➤ michael-wb commented:

If they are not using @AutoOpen this release is fine. The primary outstanding issue is that @AutoOpen doesn't drop into opening the realm if the location update fails (e.g. client is offline) and remains stuck in the .connecting state.

Copy link
Author

sync-by-unito bot commented Apr 27, 2024

➤ Andy Wang commented:

Hey [~michael.wilkersonbarker@mongodb.com] , 

I see that we might have a new Rm CORE release (v 14.6.0). Is it now safe for me to inform the customer to update their app? TY!

Copy link
Author

sync-by-unito bot commented Apr 27, 2024

➤ danieltabacaru commented:

[~andy.wang@mongodb.com] Not yet. The SDKs need a new release too.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 27, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.