
During rolling deploy it is possible for the old application pod to interact with the updated database #1867

Open · jobara opened this issue Jul 24, 2023 · 17 comments · May be fixed by #1981

@jobara
Collaborator

jobara commented Jul 24, 2023

Prerequisites

Describe the bug

In our current rolling deploy system, as new pods are being deployed, an old pod sticks around until the new ones are ready for use. However, there is a single shared database that all of the pods connect to. The issue is that a user may be interacting with the old pod after the database has already been migrated to a new structure. This can lead to data corruption and/or 500 errors for the user, because the old application's expectations of the data no longer match the current database schema.

Expected behavior

We should minimize or eliminate the possibility of the old application and the new database interacting with each other.

@jobara jobara added bug Something isn't working help wanted Extra attention is needed labels Jul 24, 2023
@jobara jobara added this to the 1.2.0 milestone Jul 24, 2023
@colleenskemp

This ticket captures the following related sub-tickets:
#1728
#1686
#1550

@colleenskemp

@jobara - We understand that this is not a priority at this time. Is that right? Our team's sense is that we could turn off rolling updates, but then we would have downtime for each deployment. This might not be worth our time.

Do you agree?

@jobara
Collaborator Author

jobara commented Aug 2, 2023

@colleenskemp I'll have to think some more on this. I'll check in with @michelled when she's back.

@jobara
Collaborator Author

jobara commented Sep 27, 2023

At the dev check-in meeting with @JureUrsic, @peterhebert, and @michelled we discussed using Laravel's maintenance mode for this. When the deploy starts, the script would call `php artisan down`; after the deploy finishes, it would call `php artisan up`. Any users accessing the site during the maintenance window would see a maintenance page.
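For reference, a minimal sketch of what wrapping the migration step in maintenance mode could look like as an Artisan command. The `deploy:safely` command name is hypothetical; in this project the equivalent logic would presumably live in the existing global deploy command (DeployGlobal.php):

```php
<?php

namespace App\Console\Commands;

use Illuminate\Console\Command;
use Illuminate\Support\Facades\Artisan;

// Hypothetical sketch; not the project's actual deploy command.
class DeploySafely extends Command
{
    protected $signature = 'deploy:safely';

    protected $description = 'Run migrations while the site is in maintenance mode';

    public function handle(): int
    {
        // Put the application into maintenance mode before touching the schema.
        Artisan::call('down', ['--retry' => 60]);

        // --force runs the migrations non-interactively in production.
        Artisan::call('migrate', ['--force' => true]);

        // Take the site out of maintenance mode once the schema matches the new code.
        Artisan::call('up');

        return Command::SUCCESS;
    }
}
```

Whether `up` should also run when the migration fails is a design choice; leaving the site down on failure may be safer, since the code and schema would otherwise be out of sync.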

@jobara
Collaborator Author

jobara commented Oct 25, 2023

@JureUrsic I was thinking about this today and wondering when/where it should run. I was thinking it could go around the migration step in DeployGlobal.php, but I'm not sure, because wouldn't the old web head need to come down before we take the site out of maintenance mode? Also, are you able to take on this task?

@JureUrsic
Contributor

@jobara it should go into the "local" command, at the start and at the end

@JureUrsic
Contributor

I can run some tests on dev; just give me the commands to run.

@jobara
Collaborator Author

jobara commented Oct 26, 2023

> I can run some tests on dev; just give me the commands to run.

@JureUrsic thanks, you can use the `php artisan down` and `php artisan up` commands. See Laravel's maintenance mode documentation for more information.

@jobara
Collaborator Author

jobara commented Nov 14, 2023

@JureUrsic the other day I manually reset the database in the dev deploy. As part of that, I put the site in maintenance mode. However, after bringing the site back up using `php artisan up`, the site was taken out of maintenance mode, but for several minutes it remained inaccessible and returned a 500 error (from nginx, I believe). So the site actually looked broken for a while. I'm not sure if this will happen with the plans we have for this ticket, but it's something to look into along with it.

@SantiagoG-Colab

@marvinroman

@marvinroman
Contributor

So the problem with maintenance mode currently is that the health check on the pods also receives the maintenance-mode response, so the pod is considered unhealthy and the load balancer doesn't forward connections to it.

We will take the following actions to fix:

  • Create a health check that will bypass maintenance mode.
  • Put the `php artisan down`/`up` calls in the `php artisan deploy:global` command.
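A minimal sketch of the first item above, assuming the default Laravel middleware layout and a hypothetical `/healthcheck` path (the actual route in the branch may differ). Paths listed in `$except` skip the maintenance-mode response, so the load balancer's probe keeps receiving a 200 while the rest of the site returns 503:

```php
<?php

namespace App\Http\Middleware;

use Illuminate\Foundation\Http\Middleware\PreventRequestsForMaintenanceMode as Middleware;

class PreventRequestsForMaintenanceMode extends Middleware
{
    /**
     * The URIs that should remain reachable while maintenance mode is enabled.
     *
     * @var array<int, string>
     */
    protected $except = [
        'healthcheck', // hypothetical path polled by the pod's readiness probe
    ];
}
```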

@marvinroman
Contributor

@jobara I've made the necessary changes in the branch associated with this issue. Let me know if you'd like me to create a PR for it.

@jobara
Collaborator Author

jobara commented Nov 16, 2023

@marvinroman thanks for working on this. Yes, please file a PR for the changes.

@jobara
Collaborator Author

jobara commented Nov 16, 2023

> So the problem with maintenance mode currently is that the health check on the pods also receives the maintenance-mode response, so the pod is considered unhealthy and the load balancer doesn't forward connections to it.
>
> We will take the following actions to fix:
>
>   • Create a health check that will bypass maintenance mode.
>   • Put the `php artisan down`/`up` calls in the `php artisan deploy:global` command.

Regarding the health check, glancing at your branch, it looks like it checks the DB now. But I guess that won't really tell us whether the website itself is actually being served properly. Is there a way to check different things depending on whether or not the site is in maintenance mode?

Regarding turning maintenance mode on/off in the global deploy, will that affect the original instance as well and not just the two new ones that are in the process of spinning up?
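For illustration, a minimal sketch of a maintenance-mode-aware check, assuming a hypothetical `/healthcheck` route that is excluded from the maintenance-mode middleware: it verifies the database connection and reports maintenance mode separately rather than letting it fail the probe.

```php
<?php

// routes/web.php — hypothetical /healthcheck route; it must also be listed in the
// maintenance-mode middleware's $except array so it stays reachable while the site is down.

use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Route;

Route::get('/healthcheck', function () {
    try {
        // getPdo() throws if the pod cannot reach the database.
        DB::connection()->getPdo();
        $database = 'ok';
    } catch (\Throwable $e) {
        $database = 'unreachable';
    }

    return response()->json([
        'database' => $database,
        // Reported for observability; maintenance mode does not mark the pod unhealthy.
        'maintenance' => app()->isDownForMaintenance(),
    ], $database === 'ok' ? 200 : 503);
});
```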

@jobara
Collaborator Author

jobara commented Nov 16, 2023

@marvinroman also, in your branch I noticed that it brings the site back up after 5 minutes. These kinds of timers are always risky, since we don't know whether the task is still running or finished some time earlier. Is it possible to get a hook into when the new pods are actually being used, and/or when the old pods have all been removed?

@marvinroman
Contributor

> > So the problem with maintenance mode currently is that the health check on the pods also receives the maintenance-mode response, so the pod is considered unhealthy and the load balancer doesn't forward connections to it.
> >
> > We will take the following actions to fix:
> >
> >   • Create a health check that will bypass maintenance mode.
> >   • Put the `php artisan down`/`up` calls in the `php artisan deploy:global` command.
>
> Regarding the health check, glancing at your branch, it looks like it checks the DB now. But I guess that won't really tell us whether the website itself is actually being served properly. Is there a way to check different things depending on whether or not the site is in maintenance mode?
>
> Regarding turning maintenance mode on/off in the global deploy, will that affect the original instance as well and not just the two new ones that are in the process of spinning up?

This is a health check of the pod, not the site, used to decide whether the load balancer should forward connections to the pod. In other words, it checks whether the pod's services are running properly. We have an external check that determines site health and will notify us of site issues.

When maintenance mode is activated, it applies across all the pods.

@marvinroman
Contributor

I agree that there are risks associated with a timer, but we haven't found an alternative at this time.

We have determined that lifecycle hooks aren't possible to use in our infrastructure at this time.
