Merge pull request #51 from digital-land/incidentLog/performanceDbBugs
updated incident log for recent page errors on submit
DilwoarH authored Oct 10, 2024
2 parents 330b014 + d116a3d commit 1cce138
Showing 1 changed file with 36 additions and 0 deletions.
36 changes: 36 additions & 0 deletions docs/run-book.md
@@ -76,6 +76,42 @@ information from the document.

## Incident Response History

### Broken pages on [submit](https://submit.planning.data.gov.uk/) service - 2024-10-02

#### In attendance

* Providers team
* Owen

#### Description

The dataset details page stopped working and returned an error indicating that the request parameters were invalid.

#### Running log

- On October 2nd at 10:27 AM, an 'invalid parameters' error was reported when accessing certain URLs in the staging and production environments.
- At 10:53 AM, an investigation began and traced the issue to a table rename on the performance database. See [this PR](https://github.com/digital-land/digital-land-builder/pull/29).
- At 10:56 AM, a fix was identified and [a PR](https://github.com/digital-land/submit/pull/496) was created to resolve the issue.
- At 11:09 AM, [the fix](https://github.com/digital-land/submit/pull/496) was reviewed and approved.
- At 11:31 AM, the issue was confirmed as resolved.
- At 12:13 PM, a related issue was reported: the summary table on the overview page showed incorrect metrics for each dataset. This was also traced to changes to the performance database. See [#29](https://github.com/digital-land/digital-land-builder/pull/29) and [#31](https://github.com/digital-land/digital-land-builder/pull/31/files).
- At 1:11 PM, [a PR](https://github.com/digital-land/submit/pull/504) was created to fix the related issue.
- At 2:20 PM, [the PR](https://github.com/digital-land/submit/pull/504) was reviewed and approved.

#### Postmortem

The root cause of the incident was a set of changes to the performance database that broke the queries used by the [submit](https://submit.planning.data.gov.uk/) service: specifically, the renaming of the table `column_field_summary` to `endpoint_dataset_resource_summary`, along with two of the columns within that table.
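
The failure mode can be reproduced in miniature. The sketch below is illustrative only: it uses an in-memory SQLite database as a stand-in for the performance database, and the column names, sample row, and query are hypothetical. Only the two table names come from this log.

```python
import sqlite3

# Hypothetical stand-in for the performance database. The real engine,
# columns, and queries are not described in this log.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE column_field_summary (dataset TEXT, field_count INTEGER)")
conn.execute("INSERT INTO column_field_summary VALUES ('example-dataset', 3)")

# A query written against the old table name (hypothetical).
OLD_QUERY = "SELECT dataset, field_count FROM column_field_summary"


def run_query(connection, sql):
    """Return (rows, error) so the caller can see how the query failed."""
    try:
        return connection.execute(sql).fetchall(), None
    except sqlite3.OperationalError as exc:
        return None, str(exc)


# Before the rename the query succeeds.
before = run_query(conn, OLD_QUERY)

# The table rename described in the postmortem.
conn.execute(
    "ALTER TABLE column_field_summary RENAME TO endpoint_dataset_resource_summary"
)

# After the rename the unchanged query fails because the table is gone.
after = run_query(conn, OLD_QUERY)
print(before, after)
```

The second call returns an error rather than rows, which is the kind of breakage a consumer sees when it is not updated alongside the schema.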


#### Actions to Prevent Similar Incidents in the Future

- Implement regression testing to ensure changes to the database schema do not break the application
- Consider using an API to interact with the database, which would allow for easier testing and validation of changes
- Improve communication and coordination between teams to prevent similar incidents from occurring in the future
- Ensure adequate smoke tests are created to test all user journeys in production / staging environments
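
As one possible shape for the regression testing action above, the following sketch checks a database against a list of tables and columns the application depends on. It assumes a SQLite database purely for illustration. `EXPECTED_SCHEMA`, `find_schema_drift`, and the column names are hypothetical; only the table name is taken from this log.

```python
import sqlite3

# Hypothetical contract: tables and columns the application's queries rely on.
# Only the table name comes from this log; the column names are placeholders.
EXPECTED_SCHEMA = {
    "endpoint_dataset_resource_summary": {"dataset", "resource"},
}


def find_schema_drift(connection, expected):
    """Return a list of human-readable problems; an empty list means no drift."""
    problems = []
    for table, columns in expected.items():
        # Table names come from the trusted constant above, not user input.
        # PRAGMA table_info yields one row per column; index 1 is the name.
        rows = connection.execute(f"PRAGMA table_info('{table}')").fetchall()
        if not rows:
            problems.append(f"missing table: {table}")
            continue
        actual = {row[1] for row in rows}
        for column in sorted(columns - actual):
            problems.append(f"missing column: {table}.{column}")
    return problems


if __name__ == "__main__":
    # Simulate a database that still uses the old table name.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE column_field_summary (dataset TEXT)")
    for problem in find_schema_drift(demo, EXPECTED_SCHEMA):
        print(problem)
```

A check like this could run in CI for both the database build and the consuming service, so that a rename fails a test before it reaches staging or production.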

### Slow Running Queries on Server - 2024-09-17

#### In attendance