Merge pull request #147 from digital-land/entity_range_doc

entity-organisation

averheecke-tpx authored Nov 6, 2024
2 parents af7f2d4 + ea49fa3 commit 3cce79e
Showing 3 changed files with 42 additions and 5 deletions.

This method defines the required prerequisite parameters for us.
Syntax
```
make add-data COLLECTION=[COLLECTION_NAME] INPUT_CSV=[INPUT_FILE]
```
For example
```
make add-data COLLECTION=conservation-area INPUT_CSV=import.csv
```
1. **(Optional) Update entity-organisation.csv**
If the data that has been added is part of the `conservation-area` collection (e.g. the `conservation-area` and `conservation-area-document` datasets), the entity range must be added as a new row. This is done using the entities generated in `lookup`: use the first and last of the newly generated entity numbers, e.g. if `44012346` is the first and `44012370` is the last, use these as `entity-minimum` and `entity-maximum` (see the example row after this list).
For an explanation of how the file works, see [entity-organisation](Configure-an-endpoint.md).
1. **Check results**
After running the command, the `endpoint.csv`, `lookup.csv`, and `source.csv` files should have been modified.
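For illustration, the new `entity-organisation.csv` row for the example numbers above might look like the sketch below; the organisation `local-authority:XYZ` is a placeholder, and the header assumes the fields described in [entity-organisation](Configure-an-endpoint.md):
```
dataset,organisation,entity-minimum,entity-maximum
conservation-area,local-authority:XYZ,44012346,44012370
```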
When handling this type of endpoint, two possible scenarios may arise.
We might receive an endpoint that contains both Tree and TPZ data. When this happens we can usually use a `filter.csv` configuration to process a subset of the endpoint data for each dataset. Data supplied like this should have a `tree-preservation-zone-type` field for the TPZ data, which should contain one of `area`, `woodland` or `group` for TPZs and `individual` for trees.
> **NOTE!**
> `filter.csv` config for a dataset will only work with a field that is in the dataset schema, and the `tree-preservation-zone-type` field is not in the `tree` schema. So if you need to filter tree data using this field, it will first need to be mapped to a field in the `tree` schema that can then be used by `filter.csv`. You can use the `tree-preservation-order-tree` field (which isn't in the website guidance or tech spec, but is in the [specification repo spec](https://github.com/digital-land/specification/blob/main/content/dataset/tree.md)), like this [example in column.csv](https://github.com/digital-land/config/blob/main/pipeline/tree-preservation-order/column.csv#L201).
For example:
`column.csv` config
```
dataset,endpoint,resource,column,field,start-date,end-date,entry-date
tree,d6abdbc3123bc4b60ee9d34ab1ec52dda34d67e6260802df6a944a5f7d09352b,,tree_preservation_zone_type,tree-preservation-order-tree,,,
```
`filter.csv` config
```
dataset,resource,field,pattern,entry-number,start-date,end-date,entry-date,endpoint
tree-preservation-zone,,tree-preservation-zone-type,(?!Individual),,,,,d6abdbc3123bc4b60ee9d34ab1ec52dda34d67e6260802df6a944a5f7d09352b
tree,,tree-preservation-order-tree,Individual,,,,,d6abdbc3123bc4b60ee9d34ab1ec52dda34d67e6260802df6a944a5f7d09352b
```
### Tree data with polygon instead of point
By default, the tree dataset `wkt` field (which is the incoming geometry from the resource) is mapped to `point` by a global mapping in `column.csv`. When a provider gives us `polygon` data instead of a `point`, we need to add a mapping in the `column.csv` file for the specific endpoint or resource from `wkt` to `geometry`, which will override the default mapping.
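As a rough sketch of such an override (the endpoint hash is a placeholder, and the header matches the `column.csv` example above):
```
dataset,endpoint,resource,column,field,start-date,end-date,entry-date
tree,<endpoint-hash>,,wkt,geometry,,,
```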

_I think this is used to set a default value using another field in the resource, but it's uncertain how this differs from column. Need more info._

## [pipeline/entity-organisation](https://github.com/digital-land/specification/blob/main/content/dataset/entity-organisation.md?plain=1)

Used to set entity ranges for organisations within the conservation-area collection. This ensures that entities within a given range are linked to a specific organisation when we have lookup entries for the same entity from different organisations, which helps prioritise data from the authoritative source.

> _Example_
> For the `conservation-area` dataset, we have an entry for `local-authority:BAB` with an `entity-minimum` of `44005968` and an `entity-maximum` of `44005997`. This sets out that any entity within that range will be part of that organisation. More ranges for that organisation and dataset can be added in following rows, e.g. setting the next range as `44008683 -> 44008684`.

Important fields:

- `dataset` \- the dataset to target, e.g. `conservation-area-document`
- `organisation` \- the organisation to apply to, e.g. `local-authority:BAB`
- `entity-minimum` \- sets the starting point of that range (inclusive)
- `entity-maximum` \- sets the ending point of that range (inclusive)
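Putting that example together, the corresponding rows might look like the sketch below (the header ordering is an assumption based on the fields listed above):
```
dataset,organisation,entity-minimum,entity-maximum
conservation-area,local-authority:BAB,44005968,44005997
conservation-area,local-authority:BAB,44008683,44008684
```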

## [pipeline/filter](https://github.com/digital-land/specification/blob/main/content/dataset/filter.md?plain=1)

Used to filter a resource so that only a subset of the records are processed, based on whether the values in one of the resource's fields are in a user-defined list.
Important fields:
- `field` \- the field to search for the pattern
- `pattern` \- the pattern to search for in the field (can just be a string, _does this accept regex like in patch?_)

> **NOTE!**
> Filter config for a dataset will only work for fields that are in the dataset schema. So if you need to filter based on a column that's in the source data and not in the schema, you will first need to map it to a schema column using `column.csv` config.
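For a concrete illustration of the layout, here is the tree-preservation-zone filter shown earlier in this commit (header plus one row). Note that the pattern there, `(?!Individual)`, is a regular expression, which suggests regex patterns are accepted:
```
dataset,resource,field,pattern,entry-number,start-date,end-date,entry-date,endpoint
tree-preservation-zone,,tree-preservation-zone-type,(?!Individual),,,,,d6abdbc3123bc4b60ee9d34ab1ec52dda34d67e6260802df6a944a5f7d09352b
```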
## [pipeline/lookup](https://github.com/digital-land/specification/blob/main/content/dataset/lookup.md?plain=1)
Expand Down
docs/data-operations-manual/Tutorials/Monitoring-Data-Quality.md
`digital-land retire-endpoints-and-sources retire.csv`

Success criteria:
No erroring endpoints listed in the query for the scope of the ticket.

## Invalid Organisations

One of our monitoring tasks is patching any `invalid organisation` issues that arise. This usually happens when the organisation value provided for the endpoint is wrong or missing, e.g. a blank field or the wrong organisation name / identifier.

A list of invalid organisation issues can be obtained by downloading a CSV file from either the [issue summary table](https://config-manager-prototype.herokuapp.com/reporting/odp-summary/issue) or the [overview issue table](https://config-manager-prototype.herokuapp.com/reporting/overview) and filtering for `invalid organisations` under `issue-type`.

To fix this, we can make use of the `patch.csv` file. More information on how this file works can be found in the pipeline/patch section in [configure an endpoint](../How-To-Guides/Adding/Configure-an-endpoint.md).

For example, if we were given the wrong `OrganisationURI` in a `brownfield-land` dataset, we can patch it by targeting the endpoint, giving the current URI in the `pattern` field and the desired URI in the `value` field, like so:

```
brownfield-land,,OrganisationURI,http://opendatacommunities.org/id/london-borough-council/hammersmith-and-,http://opendatacommunities.org/doc/london-borough-council/hammersmith-and-fulham,,,,,890c3ac73da82610fe1b7d444c8c89c92a7f368316e3c0f8d2e72f0c439b5245
```
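The row above is shown without a header; the columns appear to follow the ordering below (an assumption based on the `filter.csv` header shown earlier, with a `value` column after `pattern`; check `patch.csv` in the config repo to confirm):
```
dataset,resource,field,pattern,value,entry-number,start-date,end-date,entry-date,endpoint
```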
To test it, follow the guidance in [building a collection locally](../How-To-Guides/Testing/Building-a-collection-locally), keeping the new patch entry and focusing on the desired endpoint.
