Harvesting pipeline documentation #17

Merged: 23 commits, merged on Jan 4, 2024.

Commits:
5de600c new: start harvesting pipeline documentation (nickumia-reisys, Sep 11, 2023)
5894911 refactor: move to docs folder (nickumia-reisys, Sep 11, 2023)
0e30190 new: generate diagrams from code (nickumia-reisys, Sep 12, 2023)
09ce4ee new: add trigger condition to pipeline (nickumia-reisys, Sep 12, 2023)
8747afe new/docs: dcat gather stage (nickumia-reisys, Sep 13, 2023)
ed2479e refactor: uniformalize all arrows (nickumia-reisys, Sep 14, 2023)
37fe035 new: finish dcat? (nickumia-reisys, Sep 19, 2023)
1c001cf new: single xml doc diagram (nickumia-reisys, Sep 19, 2023)
49751e9 refactor: detangle diagram by duplicating references (nickumia-reisys, Sep 19, 2023)
e1aa69b refactor: introduce subgraphs to organize stages (nickumia-reisys, Sep 20, 2023)
de73d75 update: gather iso/non-iso (nickumia-reisys, Sep 20, 2023)
951fa61 refactor/update: use subgraphs on dcat diagram + mark end of for loops (nickumia-reisys, Sep 21, 2023)
5120485 new: waf gather completed (nickumia-reisys, Sep 25, 2023)
084881a new: waf fetch completed (nickumia-reisys, Sep 25, 2023)
f1a2172 new: waf import completed (nickumia-reisys, Sep 25, 2023)
75491d4 docs: add code reference to complex functions (nickumia-reisys, Sep 25, 2023)
d2662be new: initial pass at arcgis pipeline (nickumia-reisys, Sep 25, 2023)
34f2f00 test appearance of mermaid in markdown (btylerburton, Nov 28, 2023)
73435d4 adds ability to render mermaid in markdown and script to export SVG (btylerburton, Dec 13, 2023)
0dcccaa adds new compare doc (btylerburton, Dec 13, 2023)
b10ab7d fix typo (btylerburton, Dec 13, 2023)
25f8e3b Merge branch 'main' into data-pipeline-docs (btylerburton, Dec 19, 2023)
4c16eb9 Merge branch 'main' into data-pipeline-docs (btylerburton, Jan 4, 2024)
2 changes: 2 additions & 0 deletions .gitignore
@@ -9,6 +9,8 @@ venv/
credentials.py
dist/

# node
node_modules/
# any previous versions of schemas
**/dataset_**.json
**/catalog_**.json
1 change: 1 addition & 0 deletions docs/diagrams/mermaid/dest/arcgis-1.svg
1 change: 1 addition & 0 deletions docs/diagrams/mermaid/dest/arcgis.md
@@ -0,0 +1 @@
![diagram](./arcgis-1.svg)
1 change: 1 addition & 0 deletions docs/diagrams/mermaid/dest/dcat-1.svg
1 change: 1 addition & 0 deletions docs/diagrams/mermaid/dest/dcat.md
@@ -0,0 +1 @@
![diagram](./dcat-1.svg)
1 change: 1 addition & 0 deletions docs/diagrams/mermaid/dest/h20_compare_dcat-1.svg
1 change: 1 addition & 0 deletions docs/diagrams/mermaid/dest/h20_compare_dcat.md
@@ -0,0 +1 @@
![diagram](./h20_compare_dcat-1.svg)
1 change: 1 addition & 0 deletions docs/diagrams/mermaid/dest/new_harvesting-1.svg
1 change: 1 addition & 0 deletions docs/diagrams/mermaid/dest/new_harvesting.md
@@ -0,0 +1 @@
![diagram](./new_harvesting-1.svg)
1 change: 1 addition & 0 deletions docs/diagrams/mermaid/dest/old_harvesting-1.svg
1 change: 1 addition & 0 deletions docs/diagrams/mermaid/dest/old_harvesting.md
@@ -0,0 +1 @@
![diagram](./old_harvesting-1.svg)
1 change: 1 addition & 0 deletions docs/diagrams/mermaid/dest/single_xml-1.svg
1 change: 1 addition & 0 deletions docs/diagrams/mermaid/dest/single_xml.md
@@ -0,0 +1 @@
![diagram](./single_xml-1.svg)
1 change: 1 addition & 0 deletions docs/diagrams/mermaid/dest/waf_xml-1.svg
1 change: 1 addition & 0 deletions docs/diagrams/mermaid/dest/waf_xml.md
@@ -0,0 +1 @@
![diagram](./waf_xml-1.svg)
20 changes: 20 additions & 0 deletions docs/diagrams/mermaid/makeDoc.mjs
@@ -0,0 +1,20 @@
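import { run } from "@mermaid-js/mermaid-cli";
import { readdir } from "node:fs/promises";
import { join, resolve } from "node:path";

// Paths resolve against the current working directory,
// so run this script from the repository root.
const fileSrc = resolve("./docs/diagrams/mermaid/src");
const fileDest = resolve("./docs/diagrams/mermaid/dest");

(async () => {
  try {
    const files = await readdir(fileSrc);
    for (const file of files) {
      console.log(`Found file: ${file}`);
      // For Markdown input, mermaid-cli writes a rewritten Markdown file to dest
      // plus one SVG per diagram (e.g. arcgis.md -> arcgis.md + arcgis-1.svg).
      await run(join(fileSrc, file), join(fileDest, file), {
        puppeteerConfig: { headless: "old" },
      });
      console.log(); // blank line between files
    }
  } catch (err) {
    console.error(err);
    process.exitCode = 1; // surface the failure to the calling shell/CI
  }
})();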
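The `dest/` listing in this PR pairs every source Markdown file with a rewritten Markdown file and one numbered SVG per diagram (`arcgis.md` becomes `arcgis.md` containing `![diagram](./arcgis-1.svg)`). The sketch below spells out that naming convention without Puppeteer: it finds each `mermaid` code fence, assigns it `<stem>-<n>.svg`, and swaps the fence for an image link. It is a hypothetical illustration of the observed output (the helper `planMermaidOutputs` is not part of mermaid-cli), useful for predicting which files a run should produce.

```javascript
// Three backticks, built from a string so they don't clash with Markdown fences.
const FENCE = "`".repeat(3);

// Hypothetical helper: reproduces the naming seen in docs/diagrams/mermaid/dest.
// Nothing is rendered; it only plans SVG names and rewrites the Markdown.
function planMermaidOutputs(fileName, markdown) {
  // "docs/diagrams/mermaid/src/arcgis.md" -> "arcgis"
  const stem = fileName.replace(/^.*[\\/]/, "").replace(/\.[^.]*$/, "");
  // Match from an opening mermaid fence to the first bare closing fence.
  const pattern = new RegExp(
    `^${FENCE}mermaid[^\\n]*\\n[\\s\\S]*?^${FENCE}[ \\t]*$`,
    "gm"
  );
  const svgs = [];
  const rewritten = markdown.replace(pattern, () => {
    const svg = `${stem}-${svgs.length + 1}.svg`; // numbering starts at 1
    svgs.push(svg);
    return `![diagram](./${svg})`;
  });
  return { svgs, markdown: rewritten };
}

// Usage: a one-diagram source file.
const source = [
  "# ArcGIS",
  FENCE + "mermaid",
  "flowchart LR",
  "  gather_stage ==> fetch_stage",
  FENCE,
  "",
].join("\n");
console.log(planMermaidOutputs("src/arcgis.md", source));
```

For that input the helper plans a single `arcgis-1.svg` and replaces the fence with `![diagram](./arcgis-1.svg)`, the same link that `dest/arcgis.md` contains.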
115 changes: 115 additions & 0 deletions docs/diagrams/mermaid/src/arcgis.md
@@ -0,0 +1,115 @@
```mermaid
flowchart LR

%% Algorithm
gather_stage ==> fetch_stage
fetch_stage ==> import_stage

subgraph gather_stage [Gather Stage]
direction TB
gs([GATHER STARTED])
ge([GATHER ENDED])
gs ==> is_extra_search_criteria
is_extra_search_criteria == Yes ==> add_to_query
is_extra_search_criteria == No ==> basic_query
add_to_query ==> basic_query
basic_query ==> get_for_all_time
get_for_all_time ==> query_arcgis
query_arcgis ==> get_current_objects
get_current_objects ==> compute_new
compute_new ==> create_object
compute_new ==> compute_deleted
compute_deleted ==> create_object
compute_deleted ==> compute_changed
compute_changed ==> is_date_different
is_date_different == Yes ==> create_object
is_date_different == No ==> skip
compute_changed ==> ge
end
subgraph fetch_stage [Fetch Stage]
direction TB
fs([FETCH STARTED])
fe([FETCH ENDED])
fs ==> do_nothing
do_nothing ==> fe
end
subgraph import_stage [Import Stage]
direction TB
is([IMPORT STARTED])
ie([IMPORT ENDED])
is ==> is_object_empty
is_object_empty-. Yes .-> skip_2
is_object_empty == No ==> get_existing_object
get_existing_object ==> is_existing_object
is_existing_object == Yes ==> mark_not_current
is_existing_object == No ==> is_delete
mark_not_current ==> is_delete
is_delete == Yes ==> delete
is_delete == No ==> is_object_content_empty
delete ==> ie
is_object_content_empty-. Yes .-> error
is_object_content_empty == No ==> make_package_dict
%% Code: https://github.com/GSA/ckanext-geodatagov/blob/984dc47087f981c15f7878bef5a96970adb78125/ckanext/geodatagov/harvesters/arcgis.py#L338-L431
make_package_dict-. error .-> ie
make_package_dict ==> is_status_new
is_status_new == Yes ==> default_create_package_schema
is_status_new == No ==> default_update_package_schema
default_update_package_schema ==> is_status_new_2
default_create_package_schema ==> is_status_new_2
is_status_new_2 == Yes ==> generate_guid
generate_guid ==> save_object_reference
save_object_reference ==> create
is_status_new_2 == No ==> is_status_changed
is_status_changed == Yes ==> is_existing_object_2
is_existing_object_2 == Yes ==> mark_not_current_2[[Mark Previous Harvest Object as not current]]
is_existing_object_2 == No ==> update
mark_not_current_2 ==> update
update ==> ie
create ==> ie
is_status_changed == No ==> ie
end

%% Data
error[\Error/]
skip[/Skip\]
skip_2[/Skip\]

%% Functions
%% Code: https://github.com/ckan/ckan/blob/master/ckan/logic/schema.py#L115-L194
default_update_package_schema[[Default Update]]
default_create_package_schema[[Default Create]]

create_object[[Create New Object]]
update[[Update Dataset]]
do_nothing[[Nothing to do]]
create[[Create New Package]]
delete[[Delete Package]]
save_object_reference[[Save Object Reference in Package]]
generate_guid[[Generate GUID]]
get_existing_object[[Get Existing Harvest Object]]
mark_not_current[[Mark Previous Harvest Object as not current]]
add_to_query[[Add search to basic query]]
basic_query[[Query all data from all time]]
get_for_all_time[[Build data 100 rows at a time]]
query_arcgis[[Query Server]]
get_current_objects[[Get Existing Harvest Objects]]
compute_new[[Calculate new objects]]
compute_deleted[[Calculate deleted objects]]
compute_changed[[Calculate changed objects]]

%% Code: https://github.com/GSA/ckanext-geodatagov/blob/984dc47087f981c15f7878bef5a96970adb78125/ckanext/geodatagov/harvesters/arcgis.py#L338-L431
make_package_dict[[ArcGIS Package Create]]


%% Conditional Checks
is_extra_search_criteria{Are there extra search parameters?}
is_existing_object{Does the object exist?}
is_existing_object_2{Does the object exist?}
is_object_empty{Is Object Empty?}
is_object_content_empty{Is the Object content empty?}
is_delete{Should the dataset be deleted?}
is_status_new{Is the Status new?}
is_status_new_2{Is the Status new?}
is_status_changed{Is the Status changed?}
is_date_different{Is the Date different?}

```
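The gather stage in the diagram above pages through the ArcGIS server and then compares what it found with the harvest objects already stored. The sketch below restates that logic in JavaScript under stated assumptions: records and harvest objects are plain objects with hypothetical `guid` and `modified` fields, and `fetchPage(start, num)` is a hypothetical callback standing in for the server query that resolves to `{ results, nextStart }`, with `nextStart` set to -1 after the last batch (roughly how the ArcGIS portal search API reports paging). It illustrates the diagram; it is not a port of the Python harvester linked there.

```javascript
// Page size matching the "100 rows at a time" step in the diagram.
const PAGE_SIZE = 100;

// Collect every record by requesting PAGE_SIZE rows per call until the
// (assumed) fetchPage callback stops returning a positive nextStart.
async function queryAllRecords(fetchPage) {
  const records = [];
  let start = 1; // assumption: 1-based start index
  while (typeof start === "number" && start > 0) {
    const { results, nextStart } = await fetchPage(start, PAGE_SIZE);
    records.push(...results);
    start = nextStart;
  }
  return records;
}

// Split remote records into new / deleted / changed / skipped relative to
// the harvest objects already stored, matching on the hypothetical `guid`.
function classifyRecords(remoteRecords, existingObjects) {
  const remote = new Map(remoteRecords.map((r) => [r.guid, r]));
  const existing = new Map(existingObjects.map((o) => [o.guid, o]));

  // "Calculate new objects": on the server but never harvested.
  const created = [...remote.keys()].filter((guid) => !existing.has(guid));
  // "Calculate deleted objects": harvested before but gone from the server.
  const deleted = [...existing.keys()].filter((guid) => !remote.has(guid));

  // "Calculate changed objects": present in both; "Is the Date different?"
  const changed = [];
  const skipped = [];
  for (const [guid, record] of remote) {
    const previous = existing.get(guid);
    if (!previous) continue; // already counted as new
    if (record.modified !== previous.modified) changed.push(guid);
    else skipped.push(guid);
  }
  return { created, deleted, changed, skipped };
}
```

Keying both sides by `guid` turns each "Calculate ..." step into a set difference or intersection, so the comparison stays linear in the number of records.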
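The import stage branches on the harvest object's status before any package is written. A minimal sketch of that branching as a pure function that returns the planned action and the steps taken on the way, assuming a harvest object shaped like `{ guid, status, content }` with `status` being `"new"`, `"change"`, or `"delete"`, and a caller-supplied `existingObject` (all names are hypothetical; the real logic lives in the linked `arcgis.py` and CKAN's default create/update package schemas). It assumes a delete needs no further processing and marks a previous harvest object as not current only once.

```javascript
// Decide what the import stage should do with one harvest object.
// Field names and step labels are hypothetical stand-ins for the harvester's data.
function planImport(harvestObject, existingObject) {
  // "Is Object Empty?" -> skip
  if (!harvestObject) return { action: "skip", steps: [] };

  const steps = [];
  // "Does the object exist?" -> mark the previous harvest object as not current
  if (existingObject) steps.push("mark_not_current");

  // "Should the dataset be deleted?" (assumed to end the import)
  if (harvestObject.status === "delete") return { action: "delete", steps };

  // "Is the Object content empty?" -> error
  if (!harvestObject.content) return { action: "error", steps };

  // "Is the Status new?" chooses the CKAN schema (create vs. update).
  const isNew = harvestObject.status === "new";
  steps.push(isNew ? "default_create_schema" : "default_update_schema");

  if (isNew) {
    // New packages get a GUID and a back-reference to the harvest object.
    steps.push("generate_guid", "save_object_reference");
    return { action: "create", steps };
  }
  if (harvestObject.status === "change") return { action: "update", steps };

  // Neither new nor changed: nothing to import.
  return { action: "none", steps };
}
```

Returning a plan instead of calling CKAN directly keeps each diamond in the diagram testable on its own.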