
Improve event processing performance by passing host_id through event data #5514

Closed
AlanCoding opened this issue Dec 16, 2019 · 6 comments

@AlanCoding
Member

ISSUE TYPE
  • Feature Idea
SUMMARY

Proposal to eliminate the database query that currently runs on every job event save to obtain the id of the host the event operates on.

This logic would be replaced:

    if self.job.inventory.kind == 'smart':
        # optimization to avoid calling inventory.hosts, which
        # can take a long time to run under some circumstances
        from awx.main.models.inventory import SmartInventoryMembership
        membership = SmartInventoryMembership.objects.filter(
            inventory=self.job.inventory, host__name=self.host_name
        ).first()
        if membership:
            host_id = membership.host_id
        else:
            host_id = None
    else:
        host_qs = self.job.inventory.hosts.filter(name=self.host_name)
        host_id = host_qs.only('id').values_list('id', flat=True).first()

The host id can be obtained from the inventory script data when these parameters are used: `/api/v2/inventories/45/script/?towervars=1&hostvars=1`. Example:

    "_meta": {
        "hostvars": {
            "change_of_vars": {
                "dynamic_2019_10_29_15_23_48.194654": "host_static_value",
                "static_key": "host_dynamic_2019_10_29_15_28_08.940019",
                "dynamic_2019_10_29_15_24_26.098715": "host_static_value",
                "dynamic_2019_10_29_15_28_08.940019": "host_static_value",
                "remote_tower_enabled": "true",
                "remote_tower_id": 370
            },
            "moover-2019_10_29_15_28_08.940019": {
                "static_var": "static_value",
                "remote_tower_enabled": "true",
                "remote_tower_id": 373
            }
        }
    }
}

This proposal is to apply those parameters to the normal pre-job-run inventory script generation. Then, when saving events, obtain `remote_tower_id` from the hostvars. This will have to be done in ansible-runner: the value will need to be added into the event data, then read in the event-saving code and used in place of the query that obtains the host id.
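As a rough sketch of what the replacement lookup could look like (the helper names and the assumption that the raw script data is available at event-save time are illustrative, not the actual AWX implementation):

```python
def build_host_id_map(script_data):
    """Build a host-name -> database-id map from inventory script output
    generated with ?hostvars=1&towervars=1. Runs once per job, not per event."""
    host_map = {}
    for name, hostvars in script_data.get('_meta', {}).get('hostvars', {}).items():
        host_id = hostvars.get('remote_tower_id')
        if host_id is not None:
            host_map[name] = host_id
    return host_map


def host_id_for_event(event_data, host_map):
    """Replace the per-event database query with a dict lookup."""
    return host_map.get(event_data.get('host'))
```

The point is that only the O(1) dict lookup happens on every event save, while the inventory script (and thus the map) is produced once per job run.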

Related to or maybe part of #4209 #5443. This can be done and tested separately before the main feature work. Ping @chrismeyersfsu

@chrismeyersfsu
Member

`remote_tower_id` sounds too much like a variable intended for the multiple-Tower inventory feature. How about `tower_host_id`?

@ryanpetrello ryanpetrello self-assigned this Jan 9, 2020
ryanpetrello added a commit to ryanpetrello/awx that referenced this issue Jan 9, 2020
we've always performed these (fairly expensive) queries *on every event
save* - if you're processing tens of thousands of events in short
bursts, this is way too slow

see: ansible#5514
ryanpetrello added a commit to ryanpetrello/awx that referenced this issue Jan 13, 2020
additionally, optimize away several per-event host lookups and
changed/failed propagation lookups

we've always performed these (fairly expensive) queries *on every event
save* - if you're processing tens of thousands of events in short
bursts, this is way too slow

this commit also introduces a new command for profiling the insertion
rate of events, `awx-manage callback_stats`

see: ansible#5514
@kdelee
Member

kdelee commented Jan 21, 2020

Could this change have caused this traceback in logs while saving events?

2020-01-21 17:56:54,831 ERROR    awx.main.commands.run_callback_receiver Database Error Saving Job Event
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 240, in _commit
    return self.connection.commit()
psycopg2.errors.ForeignKeyViolation: insert or update on table "main_inventoryupdateevent" violates foreign key constraint "main_inventoryupdate_inventory_update_id_8974f1f7_fk_main_inve"
DETAIL:  Key (inventory_update_id)=(2038) is not present in table "main_inventoryupdate".


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/dispatch/worker/callback.py", line 57, in flush
    cls.objects.bulk_create(events)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/query.py", line 480, in bulk_create
    obj_without_pk._state.db = self.db
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/transaction.py", line 240, in __exit__
    connection.commit()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 262, in commit
    self._commit()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 240, in _commit
    return self.connection.commit()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/base/base.py", line 240, in _commit
    return self.connection.commit()
django.db.utils.IntegrityError: insert or update on table "main_inventoryupdateevent" violates foreign key constraint "main_inventoryupdate_inventory_update_id_8974f1f7_fk_main_inve"
DETAIL:  Key (inventory_update_id)=(2038) is not present in table "main_inventoryupdate".


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.ForeignKeyViolation: insert or update on table "main_inventoryupdateevent" violates foreign key constraint "main_inventoryupdate_inventory_update_id_8974f1f7_fk_main_inve"
DETAIL:  Key (inventory_update_id)=(2038) is not present in table "main_inventoryupdate".


..... lots of tracebacks ... eventually stops and things go on working....

@AlanCoding
Member Author

That looks more related to teardown code. Inventory update events don't link to a host (only job and ad hoc command events do). What you're seeing sounds like someone deleted a running inventory update, or its related inventory or organization. We have logic to prevent that, but it might not be airtight.

Or it could be that the inventory update finished and was deleted, but not all of its events had finished processing. In that case, the tracebacks are not necessarily bad.
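The failure mode itself is generic: inserting a child row whose parent row was already deleted violates the foreign key. A minimal standalone illustration with sqlite3 (the table names only mimic the traceback; this is not AWX code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE main_inventoryupdate (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE main_inventoryupdateevent ("
    " id INTEGER PRIMARY KEY,"
    " inventory_update_id INTEGER NOT NULL"
    "     REFERENCES main_inventoryupdate(id))"
)
conn.execute("INSERT INTO main_inventoryupdate (id) VALUES (2038)")
# ... the update finishes and is deleted before all its events are saved ...
conn.execute("DELETE FROM main_inventoryupdate WHERE id = 2038")
try:
    # a later save of leftover events then fails with an integrity error
    conn.execute(
        "INSERT INTO main_inventoryupdateevent (inventory_update_id) VALUES (2038)"
    )
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```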

@ryanpetrello
Contributor

ryanpetrello commented Jan 21, 2020

@kdelee that error is what @AlanCoding suggested, and isn't unique to this change - it occurs if the callback receiver tries to save an inventory update event that's related to an inventory update that no longer exists

you can reproduce this pretty easily by:

  1. Stopping the callback receiver
  2. Launching an inventory update
  3. After state='successful' (though no stdout will show), delete the inventory update
  4. Start the callback receiver and observe these errors

as long as things recover and stdout still flows for new jobs after this, errors in the logs for this situation are fine/expected

@kdelee
Member

kdelee commented Jan 21, 2020

ok cool

@elyezer elyezer self-assigned this Feb 12, 2020
@elyezer
Member

elyezer commented Feb 12, 2020

Did the following to verify this:

  1. Created an inventory with about 4 hosts
  2. Ran a job template that interacted with all the hosts
  3. Checked that the runner-related events had the host information present in summary_fields, and that the event fields host and host_name were set accordingly.

See below an example event for one of the hosts:

        {
            "id": 559,
            "type": "job_event",
            "url": "/api/v2/job_events/559/",
            "related": {
                "job": "/api/v2/jobs/34/",
                "children": "/api/v2/job_events/559/children/",
                "host": "/api/v2/hosts/1/"
            },
            "summary_fields": {
                "host": {
                    "id": 1,
                    "name": "localhost",
                    "description": ""
                },
                "job": {
                    "id": 34,
                    "name": "Demo Job Template",
                    "description": "",
                    "status": "successful",
                    "failed": false,
                    "elapsed": 3.594,
                    "type": "job",
                    "job_template_id": 7,
                    "job_template_name": "Demo Job Template"
                },
                "role": {}
            },
            "created": "2020-02-12T16:12:21.842052Z",
            "modified": "2020-02-12T16:12:21.946018Z",
            "job": 34,
            "event": "runner_on_ok",
            "counter": 8,
            "event_display": "Host OK",
            "event_data": {
                "playbook": "hello_world.yml",
                "playbook_uuid": "dfb499d2-c070-425e-9d01-882e606180e7",
                "play": "Hello World Sample",
                "play_uuid": "0242ac12-0005-6c79-06b4-000000000006",
                "play_pattern": "all",
                "task": "Hello Message",
                "task_uuid": "0242ac12-0005-6c79-06b4-000000000008",
                "task_action": "debug",
                "task_args": "",
                "task_path": "/tmp/awx_34_ezqkhwpa/project/hello_world.yml:4",
                "host": "localhost",
                "remote_addr": "localhost",
                "res": {
                    "msg": "Hello World!",
                    "_ansible_verbose_always": true,
                    "_ansible_no_log": false,
                    "changed": false
                },
                "event_loop": null,
                "uuid": "c681c3d4-83ce-423d-8a11-46bd2c616e37"
            },
            "event_level": 3,
            "failed": false,
            "changed": false,
            "uuid": "c681c3d4-83ce-423d-8a11-46bd2c616e37",
            "parent_uuid": "0242ac12-0005-6c79-06b4-000000000008",
            "host": 1,
            "host_name": "localhost",
            "playbook": "hello_world.yml",
            "play": "Hello World Sample",
            "task": "Hello Message",
            "role": "",
            "stdout": "\u001b[0;32mok: [localhost] => {\u001b[0m\r\n\u001b[0;32m    \"msg\": \"Hello World!\"\u001b[0m\r\n\u001b[0;32m}\u001b[0m",
            "start_line": 7,
            "end_line": 10,
            "verbosity": 0
        }

With this we can consider this issue as verified.
