[202111 -> 202205][warm-boot]: DB migration doesn't work #11824

Open
nazariig opened this issue Aug 23, 2022 · 6 comments
@nazariig
Collaborator

nazariig commented Aug 23, 2022

Description

The issue is caused by the DB migrator:

root@sonic:/home/admin# /usr/local/bin/db_migrator.py -o migrate
Traceback (most recent call last):
  File "/usr/local/bin/db_migrator.py", line 816, in main
    result = getattr(dbmgtr, operation)()
  File "/usr/local/bin/db_migrator.py", line 768, in migrate
    next_version = getattr(self, version)()
AttributeError: 'DBMigrator' object has no attribute 'version_2_0_4'
'DBMigrator' object has no attribute 'version_2_0_4'
usage: db_migrator.py [-h] [-o operation migrate, set_version, get_version] [-s unix socket] [-n asic namespace]

optional arguments:
  -h, --help            show this help message and exit
  -o operation (migrate, set_version, get_version)
                        operation to perform [default: get_version]
  -s unix socket        the unix socket that the desired database listens on
  -n asic namespace     The asic namespace whose DB instance we need to connect

Configuration reload is also broken:

root@sonic:/home/admin# config reload -y -f
Disabling container monitoring ...
Stopping SONiC target ...
Running command: /usr/local/bin/sonic-cfggen  -j /etc/sonic/init_cfg.json  -j /etc/sonic/config_db.json  --write-to-db
Running command: /usr/local/bin/db_migrator.py -o migrate
Traceback (most recent call last):
  File "/usr/local/bin/db_migrator.py", line 816, in main
    result = getattr(dbmgtr, operation)()
  File "/usr/local/bin/db_migrator.py", line 768, in migrate
    next_version = getattr(self, version)()
AttributeError: 'DBMigrator' object has no attribute 'version_2_0_4'
'DBMigrator' object has no attribute 'version_2_0_4'
usage: db_migrator.py [-h] [-o operation migrate, set_version, get_version] [-s unix socket] [-n asic namespace]

optional arguments:
  -h, --help            show this help message and exit
  -o operation (migrate, set_version, get_version)
                        operation to perform [default: get_version]
  -s unix socket        the unix socket that the desired database listens on
  -n asic namespace     The asic namespace whose DB instance we need to connect
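
Both tracebacks end at the same `getattr(self, version)()` lookup. The following is a minimal sketch of that dispatch pattern, not the actual db_migrator.py code: the class `DBMigratorSketch`, its two handlers, and the convention that a handler returns the next version name (or `None` at the end of the chain) are assumptions made for illustration. Only the `getattr` lookup and the `version_2_0_4` name come from the output above.

```python
class DBMigratorSketch:
    """Hypothetical stand-in for DBMigrator; only the dispatch pattern is modeled."""

    def __init__(self, stored_version):
        # Version string as it would be read from the database.
        self.stored_version = stored_version

    # Illustrative handlers: each one returns the name of the next handler.
    def version_1_0_1(self):
        return "version_1_0_2"

    def version_1_0_2(self):
        # Returning None marks the end of the chain in this sketch.
        return None

    def migrate(self):
        version = self.stored_version
        visited = []
        while version:
            visited.append(version)
            # Same lookup as in the traceback: a stored version with no
            # matching method raises AttributeError here.
            version = getattr(self, version)()
        return visited


# A version this class knows about walks the chain to the end.
print(DBMigratorSketch("version_1_0_1").migrate())

# A version written by another branch has no handler, so the lookup fails.
try:
    DBMigratorSketch("version_2_0_4").migrate()
except AttributeError as exc:
    print(exc)
```

Under these assumptions, any stored version string that the installed migrator does not define as a method stops the whole migration, which matches the failure shown for both the direct invocation and `config reload`.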

Steps to reproduce the issue:

  1. Install 202111 image
  2. Make sure config_db.json doesn't have version defined
  3. Reload configuration
  4. Install 202205 image
  5. Run warm-reboot

Describe the results you received:

Configuration migration fails after warm-reboot

Describe the results you expected:

Configuration migration completes successfully

Output of show version:

SONiC Software Version: SONiC.202205.21-5c306cc2e_Internal
Distribution: Debian 11.4
Kernel: 5.10.0-12-2-amd64
Build commit: 5c306cc2e
Build date: Tue Aug 16 17:28:01 UTC 2022
Built by: sw-r2d2-bot@r-build-sonic-ci02-244

Platform: x86_64-mlnx_msn4600c-r0
HwSKU: ACS-MSN4600C
ASIC: mellanox
ASIC Count: 1
Serial Number: MT2053X21259
Model Number: MSN4600-CS2FO
Hardware Revision: A1
Uptime: 05:02:38 up 3 min,  1 user,  load average: 1.18, 0.94, 0.41
Date: Sat 14 May 2022 05:02:38

Docker images:
REPOSITORY                                         TAG                            IMAGE ID       SIZE
docker-orchagent                                   202205.21-5c306cc2e_Internal   7b830bf4ac69   471MB
docker-orchagent                                   latest                         7b830bf4ac69   471MB
docker-teamd                                       202205.21-5c306cc2e_Internal   720cf72527fc   453MB
docker-teamd                                       latest                         720cf72527fc   453MB
docker-macsec                                      latest                         959fe4dc45af   455MB
docker-syncd-mlnx                                  202205.21-5c306cc2e_Internal   cf601dc58937   852MB
docker-syncd-mlnx                                  latest                         cf601dc58937   852MB
docker-platform-monitor                            202205.21-5c306cc2e_Internal   bd92acc74a9b   855MB
docker-platform-monitor                            latest                         bd92acc74a9b   855MB
docker-dhcp-relay                                  latest                         a1354d11d617   446MB
docker-sonic-telemetry                             202205.21-5c306cc2e_Internal   c02a9b7c90f3   517MB
docker-sonic-telemetry                             latest                         c02a9b7c90f3   517MB
docker-lldp                                        202205.21-5c306cc2e_Internal   489006bb88af   479MB
docker-lldp                                        latest                         489006bb88af   479MB
docker-router-advertiser                           202205.21-5c306cc2e_Internal   66ce07fe6902   437MB
docker-router-advertiser                           latest                         66ce07fe6902   437MB
docker-mux                                         202205.21-5c306cc2e_Internal   452a17f01c75   485MB
docker-mux                                         latest                         452a17f01c75   485MB
docker-database                                    202205.21-5c306cc2e_Internal   9c904fb2b204   437MB
docker-database                                    latest                         9c904fb2b204   437MB
docker-fpm-frr                                     202205.21-5c306cc2e_Internal   ee1e4edb0cc2   454MB
docker-fpm-frr                                     latest                         ee1e4edb0cc2   454MB
docker-nat                                         202205.21-5c306cc2e_Internal   c13577bb50e4   428MB
docker-nat                                         latest                         c13577bb50e4   428MB
docker-snmp                                        202205.21-5c306cc2e_Internal   62f8db365873   454MB
docker-snmp                                        latest                         62f8db365873   454MB
docker-sflow                                       202205.21-5c306cc2e_Internal   a999881d642f   426MB
docker-sflow                                       latest                         a999881d642f   426MB
docker-sonic-mgmt-framework                        202205.21-5c306cc2e_Internal   0732de8d491e   554MB
docker-sonic-mgmt-framework                        latest                         0732de8d491e   554MB

@zhangyanzhao
Collaborator

This happens on multiple ASIC platforms only. If the minigraph is loaded first, warm-reboot works fine. Known issue, no fix.

@zhangyanzhao added the "MSFT" and "Triaged" labels on Aug 31, 2022
@yxieca
Contributor

yxieca commented Aug 31, 2022

This is a known limitation: db_migrator cannot work when there is a version chain disruption.

To work around this issue, please run "sudo config load_minigraph -y" and "config save -y" to generate a new config_db.json. With that, the warm-reboot upgrade should work again.

Going forward, we need to bump the major version in the master branch as soon as a new feature/release branch is created.

This known issue has no fix.

@nazariig
Collaborator Author

nazariig commented Sep 1, 2022

> This is a known limitation: db_migrator cannot work when there is a version chain disruption.
>
> To work around this issue, please run "sudo config load_minigraph -y" and "config save -y" to generate a new config_db.json. With that, the warm-reboot upgrade should work again.
>
> Going forward, we need to bump the major version in the master branch as soon as a new feature/release branch is created.
>
> This known issue has no fix.

@yxieca I'm not sure how this workaround is supposed to mitigate the issue:

> please run "sudo config load_minigraph -y" and "config save -y" to generate a new config_db.json

I tried to do the following:

  1. Install SONiC-OS-202111.95-949b426a2_Internal
  2. Run config load_minigraph -y, then config save -y
  3. DB version is:
root@sonic:/home/admin# cat /etc/sonic/config_db.json | jq .VERSIONS
{
  "DATABASE": {
    "VERSION": "version_2_0_4"
  }
}
  4. Install SONiC-OS-202205.27-0adfd724e_Internal
  5. warm-reboot -v
  6. DB version is:
root@sonic:/home/admin# sonic-cfggen --from-db --print | jq .VERSIONS
{
  "DATABASE": {
    "VERSION": "version_2_0_4"
  }
}
  7. orchagent (OA) crashed due to double provisioning of the policer:
root@sonic:/home/admin# zgrep -a 'trapGroupUpdatePolicer\|Copp\|Policer\|trap group\|APPLY_VIEW' /var/log/syslog
Sep  1 12:41:47.534526 sonic NOTICE swss#coppmgrd: :- setCoppTrapStateOk: Publish arp(ok) to state db
Sep  1 12:41:47.534526 sonic NOTICE swss#coppmgrd: :- setCoppTrapStateOk: Publish bgp(ok) to state db
Sep  1 12:41:47.534693 sonic NOTICE swss#coppmgrd: :- setCoppTrapStateOk: Publish dhcp_relay(ok) to state db
Sep  1 12:41:47.534801 sonic NOTICE swss#coppmgrd: :- setCoppTrapStateOk: Publish ip2me(ok) to state db
Sep  1 12:41:47.534907 sonic NOTICE swss#coppmgrd: :- setCoppTrapStateOk: Publish lacp(ok) to state db
Sep  1 12:41:47.534907 sonic NOTICE swss#coppmgrd: :- setCoppTrapStateOk: Publish lldp(ok) to state db
Sep  1 12:41:47.535028 sonic NOTICE swss#coppmgrd: :- setCoppTrapStateOk: Publish udld(ok) to state db
Sep  1 12:41:47.535231 sonic NOTICE swss#coppmgrd: :- setCoppGroupStateOk: Publish default(ok) to state db
Sep  1 12:41:47.535456 sonic NOTICE swss#coppmgrd: :- setCoppGroupStateOk: Publish queue1_group1(ok) to state db
Sep  1 12:41:47.535695 sonic NOTICE swss#coppmgrd: :- setCoppGroupStateOk: Publish queue4_group1(ok) to state db
Sep  1 12:41:47.535982 sonic NOTICE swss#coppmgrd: :- setCoppGroupStateOk: Publish queue4_group2(ok) to state db
Sep  1 12:41:47.536149 sonic NOTICE swss#coppmgrd: :- setCoppGroupStateOk: Publish queue4_group3(ok) to state db
Sep  1 12:42:00.399424 sonic NOTICE swss#orchagent: :- processCoppRule: Set trap group default to host interface
Sep  1 12:42:00.399424 sonic WARNING swss#orchagent: :- trapGroupUpdatePolicer: Creating policer for existing Trap group: 11000000000003 (name:default).
Sep  1 12:42:00.400993 sonic NOTICE swss#orchagent: :- createPolicer: Create policer for trap group default
Sep  1 12:42:00.403345 sonic NOTICE swss#orchagent: :- createPolicer: Bind policer to trap group default:
Sep  1 12:42:00.403861 sonic NOTICE syncd#SDK: [SAI_HOST_INTERFACE.NOTICE] mlnx_sai_host_interface.c[2291]- mlnx_create_hostif_trap_group: Create trap group, #0 QUEUE=1
Sep  1 12:42:00.404152 sonic NOTICE syncd#SDK: [SAI_HOST_INTERFACE.NOTICE] mlnx_sai_host_interface.c[2338]- mlnx_create_hostif_trap_group: Created trap group Trap group 1
Sep  1 12:42:00.404668 sonic NOTICE swss#orchagent: :- processCoppRule: Create host interface trap group queue1_group1
Sep  1 12:42:00.404686 sonic WARNING swss#orchagent: :- trapGroupUpdatePolicer: Creating policer for existing Trap group: 11000000000b69 (name:queue1_group1).
Sep  1 12:42:00.405685 sonic NOTICE swss#orchagent: :- createPolicer: Create policer for trap group queue1_group1
Sep  1 12:42:00.406942 sonic NOTICE swss#orchagent: :- createPolicer: Bind policer to trap group queue1_group1:
Sep  1 12:42:00.408504 sonic NOTICE syncd#SDK: [SAI_HOST_INTERFACE.NOTICE] mlnx_sai_host_interface.c[2291]- mlnx_create_hostif_trap_group: Create trap group, #0 QUEUE=4
Sep  1 12:42:00.408780 sonic NOTICE syncd#SDK: [SAI_HOST_INTERFACE.NOTICE] mlnx_sai_host_interface.c[2338]- mlnx_create_hostif_trap_group: Created trap group Trap group 2
Sep  1 12:42:00.409182 sonic NOTICE swss#orchagent: :- processCoppRule: Create host interface trap group queue4_group1
Sep  1 12:42:00.412807 sonic NOTICE syncd#SDK: [SAI_HOST_INTERFACE.NOTICE] mlnx_sai_host_interface.c[2291]- mlnx_create_hostif_trap_group: Create trap group, #0 QUEUE=4
Sep  1 12:42:00.413056 sonic NOTICE syncd#SDK: [SAI_HOST_INTERFACE.NOTICE] mlnx_sai_host_interface.c[2338]- mlnx_create_hostif_trap_group: Created trap group Trap group 3
Sep  1 12:42:00.413458 sonic NOTICE swss#orchagent: :- processCoppRule: Create host interface trap group queue4_group2
Sep  1 12:42:00.413478 sonic WARNING swss#orchagent: :- trapGroupUpdatePolicer: Creating policer for existing Trap group: 11000000000b70 (name:queue4_group2).
Sep  1 12:42:00.414384 sonic NOTICE swss#orchagent: :- createPolicer: Create policer for trap group queue4_group2
Sep  1 12:42:00.415688 sonic NOTICE swss#orchagent: :- createPolicer: Bind policer to trap group queue4_group2:
Sep  1 12:42:00.420002 sonic NOTICE syncd#SDK: [SAI_HOST_INTERFACE.NOTICE] mlnx_sai_host_interface.c[2291]- mlnx_create_hostif_trap_group: Create trap group, #0 QUEUE=4
Sep  1 12:42:00.420242 sonic NOTICE syncd#SDK: [SAI_HOST_INTERFACE.NOTICE] mlnx_sai_host_interface.c[2338]- mlnx_create_hostif_trap_group: Created trap group Trap group 4
Sep  1 12:42:00.420600 sonic NOTICE swss#orchagent: :- processCoppRule: Create host interface trap group queue4_group3
Sep  1 12:42:05.893481 sonic NOTICE swss#orchagent: :- syncd_apply_view: Notify syncd APPLY_VIEW
Sep  1 12:42:05.893481 sonic NOTICE swss#orchagent: :- notifySyncd: sending syncd: APPLY_VIEW
Sep  1 12:42:05.893779 sonic NOTICE syncd#SDK: :- processNotifySyncd: very first run is TRUE, op = APPLY_VIEW
Sep  1 12:42:06.171933 sonic NOTICE syncd#SDK: :- threadFunction: time span 278 ms for 'notify:APPLY_VIEW'
Sep  1 12:42:07.172003 sonic NOTICE syncd#SDK: :- threadFunction: time span 1278 ms for 'notify:APPLY_VIEW'
Sep  1 12:42:08.172192 sonic NOTICE syncd#SDK: :- threadFunction: time span 2278 ms for 'notify:APPLY_VIEW'
Sep  1 12:42:09.172218 sonic NOTICE syncd#SDK: :- threadFunction: time span 3278 ms for 'notify:APPLY_VIEW'
Sep  1 12:42:10.172372 sonic NOTICE syncd#SDK: :- threadFunction: time span 4278 ms for 'notify:APPLY_VIEW'
Sep  1 12:42:11.172523 sonic NOTICE syncd#SDK: :- threadFunction: time span 5278 ms for 'notify:APPLY_VIEW'
Sep  1 12:42:12.172662 sonic NOTICE syncd#SDK: :- threadFunction: time span 6278 ms for 'notify:APPLY_VIEW'
Sep  1 12:42:13.172728 sonic NOTICE syncd#SDK: :- threadFunction: time span 7278 ms for 'notify:APPLY_VIEW'
Sep  1 12:42:13.950128 sonic NOTICE syncd#SDK: :- processNotifySyncd: setting very first run to FALSE, op = APPLY_VIEW
Sep  1 12:42:14.319946 sonic NOTICE swss#orchagent: :- processCoppRule: Set trap group default to host interface
Sep  1 12:42:14.323467 sonic ERR swss#orchagent: :- trapGroupUpdatePolicer: Failed to apply attribute[2].id=0 to policer for trap group:default, error:-5
  8. DB migration failed:
root@sonic:/home/admin# /usr/local/bin/db_migrator.py -o migrate
Traceback (most recent call last):
  File "/usr/local/bin/db_migrator.py", line 816, in main
    result = getattr(dbmgtr, operation)()
  File "/usr/local/bin/db_migrator.py", line 768, in migrate
    next_version = getattr(self, version)()
AttributeError: 'DBMigrator' object has no attribute 'version_2_0_4'
'DBMigrator' object has no attribute 'version_2_0_4'
usage: db_migrator.py [-h] [-o operation migrate, set_version, get_version] [-s unix socket] [-n asic namespace]

optional arguments:
  -h, --help            show this help message and exit
  -o operation (migrate, set_version, get_version)
                        operation to perform [default: get_version]
  -s unix socket        the unix socket that the desired database listens on
  -n asic namespace     The asic namespace whose DB instance we need to connect
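
The DB version checks in the steps above can also be scripted before an upgrade. This is a rough sketch under stated assumptions: the `VERSIONS` / `DATABASE` / `VERSION` layout is taken from the jq output above, while `TargetMigratorSketch`, its single handler, `read_db_version`, and `has_handler` are hypothetical names introduced for illustration and are not part of db_migrator.py.

```python
import json
from typing import Optional

# Sample input copied from the jq output in this thread.
SAMPLE_CONFIG = json.loads("""
{
  "VERSIONS": {
    "DATABASE": {
      "VERSION": "version_2_0_4"
    }
  }
}
""")


class TargetMigratorSketch:
    """Hypothetical migrator of the target image; the handler name is illustrative."""

    def version_1_0_1(self):
        return None


def read_db_version(config: dict) -> Optional[str]:
    # Returns None when the config has no version defined.
    return config.get("VERSIONS", {}).get("DATABASE", {}).get("VERSION")


def has_handler(migrator_cls: type, version: Optional[str]) -> bool:
    # True only if the class defines a callable named after the version.
    return version is not None and callable(getattr(migrator_cls, version, None))


version = read_db_version(SAMPLE_CONFIG)
print(version, has_handler(TargetMigratorSketch, version))
```

A check of this kind only reports whether a handler exists for the stored version; it does not say whether the migration itself would produce a correct configuration.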

@yxieca
Contributor

yxieca commented Sep 2, 2022

This change (sonic-net/sonic-utilities#2272) needs to be cherry-picked into the 202111 branch.

@stepanblyschak
Collaborator

@yxieca What is required to overcome this issue? I assume we need to upgrade the old 202111 image to a 202111 image that includes sonic-net/sonic-utilities#2272, and then upgrade to 202205?
I am looking at sonic-net/sonic-utilities#2272 and don't see migration functions for the 2_0_X versions, so I believe this will not work.

@vaibhavhd changed the title from "[202205][warm-boot]: DB migration doesn't work" to "[202111 -> 202205][warm-boot]: DB migration doesn't work" on Oct 3, 2022
@vaibhavhd
Contributor

We need this PR on the 202111 branch for the 202111 -> 202205 upgrade path to work:
sonic-net/sonic-utilities#2272

The tags were added to PR 2272, but the commit has not been cherry-picked so far.
