diff --git a/architecture/design/archive/2022-11/index.html b/architecture/design/archive/2022-11/index.html
index 74179aa..d7dfc3c 100644
--- a/architecture/design/archive/2022-11/index.html
+++ b/architecture/design/archive/2022-11/index.html
@@ -157,6 +157,7 @@

Solution design - November 2022

The solution architecture was modelled using the C4 approach.

System Context

+

Diagram elements:

* Data Provider: Prepares and publishes data.
* Planner: Browses the available datasets.
* Data Host: Hosts data provider datasets.
* Collection Pipeline: Collects data from publishers, checking for errors and merging data before storing.
* Application: Main application presenting data to users in multiple formats.
* Map Tile API: Map tile server, serving vectors for use in the main application user interface.
* Data API: Data server, responding to SQL queries from the main application.
* Archive Storage: Stores data from publishers, making it available for sync with other applications.

Relationship labels:

* Publishes Data
* Retrieves Data [HTTPS]
* Stores data [HTTPS]
* Sync SQLite files [Lambda]
* Sync map tile files [ECS Task]
* Sync live data [ECS Task]
* Requests data [HTTPS]
* Browses Data [HTTPS]
* Views Map Tiles [HTTPS]

diff --git a/runbook.html b/runbook.html
index e8581a4..bf7b09c 100644
--- a/runbook.html
+++ b/runbook.html
@@ -234,13 +234,23 @@

Common Issues

TBA

Incident Response History

Outage - Datasette - 2024-08-23

+

In attendance

+

Async on Slack:

+ +

Description

Overnight we had requests on the providers site, which uses Datasette to access data. The URL used is datasette.planning.data.gov.uk. The Datasette application was reporting 502 errors, implying that the server was down.

Running log

-
* At around 03:00 AM and 07:00 AM the requests were made and Sentry reported the events to the notification channel
-* At 08:00 AM the infrastructure team responded to and investigated these errors. It was found that the application
-

had automatically reset itself and the site was back up and operational.

+

On (2024-08-23)

+ +

Postmortem

On further investigation, it was identified that the traffic was mostly due to crawlers on the providers site. The providers team will be adding a robots.txt file to the service to stop this traffic in the future.
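As a rough illustration of how such a rule could be checked before deployment, the sketch below parses a robots.txt body with Python's standard `urllib.robotparser` and asks whether a crawler may fetch a given URL. The rules, the `ExampleBot` user agent, and the `example.com` URLs are all hypothetical; the actual robots.txt planned by the providers team is not described here. Note that robots.txt is advisory, so it only reduces traffic from crawlers that choose to honour it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; the real file is not specified.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt body permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/private/report", "/public/page"):
        url = "https://example.com" + path  # placeholder host
        print(path, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

A check like this could run in a test suite so that a later change to the file does not accidentally re-open the paths that caused the crawler traffic.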

@@ -250,7 +260,7 @@

Postmortem

We will continue to monitor the impact of these errors after the robots.txt is implemented.

The smoke tests did not trigger an alarm, implying that the service was never down for as long as ten minutes and had recovered on its own.
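The reasoning above can be sketched as a small function: an alarm fires only when consecutive failed checks span at least ten minutes, so a shorter outage that recovers by itself stays silent. Only the ten-minute figure comes from this postmortem; the function name, the shape of the check results, and the check timings are assumptions, and the real smoke tests may be configured differently.

```python
from datetime import datetime, timedelta

# Ten minutes is taken from the postmortem; everything else is assumed.
ALARM_AFTER = timedelta(minutes=10)


def should_alarm(results, threshold=ALARM_AFTER):
    """Return True if failed checks run continuously for at least `threshold`.

    `results` is a list of (checked_at, ok) tuples, one per smoke test run.
    """
    failing_since = None
    for checked_at, ok in sorted(results, key=lambda r: r[0]):
        if ok:
            # Any successful check resets the failure window.
            failing_since = None
            continue
        if failing_since is None:
            failing_since = checked_at
        if checked_at - failing_since >= threshold:
            return True
    return False


# Hypothetical check times, for illustration only.
start = datetime(2024, 8, 23, 3, 0)
short_outage = [
    (start, False),
    (start + timedelta(minutes=5), False),
    (start + timedelta(minutes=8), True),
]
long_outage = [
    (start, False),
    (start + timedelta(minutes=5), False),
    (start + timedelta(minutes=10), False),
]
```

Under these assumptions `should_alarm(short_outage)` is false and `should_alarm(long_outage)` is true, which matches the observation that a service which resets itself quickly never raises the alarm.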

Outage - Check Service - 2024-07-17

-

In attendance

+

In attendance

Async on Slack: