Documentation New Guides & Updates for 1.4.0 - Round 1 #200

Merged
merged 49 commits into from
Jun 18, 2024
1204c97
May 2024 Docs Images Part 1
alliomeria May 3, 2024
cdda1f8
Updated SBR home image
alliomeria May 3, 2024
2860512
Update strawberryrunners.md
alliomeria May 3, 2024
f837730
New SBR Subtitle + IIIF Server Settings docs
alliomeria May 3, 2024
d2a6830
Update strawberryrunners_pager_ocr.md
alliomeria May 7, 2024
beeea1e
Update strawberryrunners_webpage_text.md
alliomeria May 7, 2024
66b9132
Update strawberryrunners_webpage_text.md
alliomeria May 7, 2024
bf60ff5
Update strawberryrunners_wacz_binary.md
alliomeria May 7, 2024
babc48b
Update find_and_replace.md
alliomeria May 7, 2024
538a27a
Update find_and_replace.md
alliomeria May 7, 2024
9313e8d
Update iiif_server_settings.md
alliomeria Jun 17, 2024
4301bcc
IIIF Content Search Overview
alliomeria Jun 17, 2024
7a15d95
Update iiif-content-search.md
alliomeria Jun 17, 2024
3619737
Update index.md
alliomeria Jun 17, 2024
fa15c3b
Update inthewild.md
alliomeria Jun 17, 2024
59f0dde
Update strawberryfield-formatters.md
alliomeria Jun 17, 2024
7d3a14c
Update webformsasinput.md
alliomeria Jun 18, 2024
6c4c534
Update mkdocs.yml
alliomeria Jun 18, 2024
3c08f7b
Update mkdocs.yml
alliomeria Jun 18, 2024
db6e80a
Create 101_guides_list.md
alliomeria Jun 18, 2024
5877067
Update 101_guides_list.md
alliomeria Jun 18, 2024
4ec8281
Update 101_guides_list.md
alliomeria Jun 18, 2024
b49125e
Update 101_guides_list.md
alliomeria Jun 18, 2024
87ca45d
Update 101_guides_list.md
alliomeria Jun 18, 2024
9761b05
Update mkdocs.yml
alliomeria Jun 18, 2024
61f3b82
Update index.md
alliomeria Jun 18, 2024
405f648
Update presentations_events.md
alliomeria Jun 18, 2024
a1c8700
Update webformsasinput.md
alliomeria Jun 18, 2024
7ccdc25
Update webformsasinput.md
alliomeria Jun 18, 2024
79097f0
Add files via upload
alliomeria Jun 18, 2024
42332b5
Update webformsasinput.md
alliomeria Jun 18, 2024
abb3806
Add files via upload
alliomeria Jun 18, 2024
3412a8b
Rename displays-modes-2024.png to display-modes-2024.png
alliomeria Jun 18, 2024
9fb8cf3
Update webformsasinput.md
alliomeria Jun 18, 2024
ba9bf83
more updated images
alliomeria Jun 18, 2024
a84e644
Update webformsasinput.md
alliomeria Jun 18, 2024
0aea479
Update strawberryfield-formatters.md
alliomeria Jun 18, 2024
af55ccf
Add files via upload
alliomeria Jun 18, 2024
a5fe454
Update webformsasinput.md
alliomeria Jun 18, 2024
f3dd288
Update inthewild.md
alliomeria Jun 18, 2024
c9ebaf0
Update mkdocs.yml
alliomeria Jun 18, 2024
ca47108
Update docs/iiif-content-search.md
alliomeria Jun 18, 2024
5f9ce65
Update strawberryrunners_subtitle.md
alliomeria Jun 18, 2024
88b0842
Update end of strawberryrunners_subtitle.md
alliomeria Jun 18, 2024
49eba8c
Update docs/webformsasinput.md
alliomeria Jun 18, 2024
759d3eb
Update iiif-content-search.md
alliomeria Jun 18, 2024
68625b3
Update find_and_replace.md
alliomeria Jun 18, 2024
164ba16
Update iiif_server_settings.md
alliomeria Jun 18, 2024
86835e6
Update iiif_server_settings.md
alliomeria Jun 18, 2024
37 changes: 37 additions & 0 deletions docs/101_guides_list.md
@@ -0,0 +1,37 @@
---
title: Archipelago 101 - Core Documentation Guides
tags:
- Archipelago 101
- Documentation
---


# Archipelago 101: Core Documentation Guides

Top 10 guides we recommend you review as you get started working with Archipelago:

1. [Metadata in Archipelago](metadatainarchipelago.md): a long and worthwhile read that covers the fundamentals of Archipelago's architecture and approach to metadata and data

2. [Strawberryfield Formatters](strawberryfield-formatters.md): overview of the general setup of an Archipelago Digital Object (ADO) page and the way your ADO JSON metadata and data are output

3. [Primer on Display Modes & How to Create a Webform as an Input Method](webformsasinput.md): deeper look at Display Modes and Form Modes, two ways you'll be interacting with your ADOs most frequently

4. [Twig Templates and Archipelago](metadatatwigs.md): a great place to dive into one of Archipelago's best loved feature areas

5. [Archipelago Multi Importer](ami_index.md): all about Archipelago's batch ingest and update functionality

6. [Search and Solr Overview](search_solr_index.md): for repositories, it's all about the search
* [In-a-nutshell : JSON data to Strawberry Keyname Providers to Solr](search_solr_index.md#in-a-nutshell-json-data-to-strawberry-keyname-providers-to-solr): essential overview of the pipeline from JSON data into and out of Solr
* [Strawberry Key Name Providers, Solr Field, and Facet Configuration](strawberry_key_name_providers.md): fundamental information for site administrators

7. [Advanced Batch Find and Replace](find_and_replace.md): targeted batch updates for your ADO metadata

8. [Strawberry Runners Post-Processing Configuration](strawberryrunners.md): background post-processing defaults and options for all your file transformation and data indexing needs

9. [Archipelago Local Deployment Guide](archipelago-deployment-readme.md): get your own local Archipelago up and running in about 15 minutes

10. [Archipelago Presentations, Events, and Additional Resources](presentations_events.md): features recordings and links to different Archipelago workshops, conference presentations, and other helpful references

___

Thank you for reading! Please contact us on our [Archipelago Commons Google Group](https://groups.google.com/forum/#!forum/archipelago-commons) with any questions or feedback.
8 changes: 8 additions & 0 deletions docs/find_and_replace.md
@@ -91,6 +91,14 @@ After reviewing the 'Important Notes & Workflow Recommendations' below, please s

The Actions available through Archipelago's Advanced Batch Find and Replace can potentially have repository-wide effects. It is strongly recommended that you proceed with caution when executing any of the available Actions.


!!! warning "Adding New Facets"

The default Facets available through Archipelago's Advanced Batch Find and Replace each have an important configuration selection in place. For every [new Facet you add](strawberry_key_name_providers.md) for Find and Replace, you need to select both checkboxes: `VBO Batch Facet processor` under 'VBO batch handler', and `Use URL based facets in VBO Batches` under 'VBO batch handler settings'. These selections ensure that the "visible" list/count of objects you filter using a Facet is respected when VBO actually executes the batch changes for any Find and Replace Action.
Member

Ok! Here is a missing piece of documentation that needs to be added. VBO does not pass a "limit" (except if your VIEW actually has "SHOW" set to a defined number of results, which most users will never use). Because of that, a run will default to the Search API/Solr defined Limit, found at
http://localhost:8001/admin/config/search/search-api/server/esmero_solr/edit (under the Advanced tab). This means that if you see 1000 objects in your Find and Replace, select all, and run it, and that value is 100, only 100 will be processed. Nothing we can do around that (for now, except open an ISSUE, tag me, maybe we can find a way). We need a note for that 🐝 !

Contributor Author

Ok, added an additional statement about this behavior, using some of the language you shared in your above comment. Let me know what you think @DiegoPino

Also, please be aware that Drupal's VBO does not pass a "limit" (except if your View is actually configured to "SHOW" a defined number of results, which most users will never use). Because of that, when you run a VBO-based action, the batch limit defaults to the Search API/Solr defined Limit. You can view this Limit at
'~yoursite/admin/config/search/search-api/server/esmero_solr/edit', under the Advanced tab. This means that if the Search API/Solr defined Limit is set to 100, and you then see 1000 objects in your Find and Replace results and select all 1000 for a batch change operation, only 100 changes will be processed when you run your Find and Replace action. Archipelago currently has no way to work around this VBO-related behavior (if it affects you, please open an ISSUE, perhaps a way can be found!).


## Simulation Mode

Before executing any of the available Find and Replace Actions, the best-practice workflow recommendation is to **always** first run in Simulation Mode:
67 changes: 67 additions & 0 deletions docs/iiif-content-search.md
@@ -0,0 +1,67 @@
---
title: IIIF Content Search API Integration
tags:
- IIIF
- IIIF Server Settings Form
- IIIF Content Search API
- Solr
- Solr Fields
- Solr Index
---

# IIIF Content Search API Integration

Beginning in release 1.3.0 and now fully mature in 1.4.0, Archipelago features IIIF Content Search API integration with attendant default configurations and settings.

Through a non-trifling amount of code and maths, Archipelago speaks the IIIF Content Search API language using data from your Archipelago Digital Objects, enabling you to search within Mirador (or other supported viewers) for specific hits within OCR, VTT files, or manually created textual annotations.

Please also see the related [IIIF Server Settings Form](iiif_server_settings.md), and Strawberry Runners guides for [Reviewing and adjusting the `pager` and `ocr` Post-Processor operations](strawberryrunners_pager_ocr.md) and [Reviewing and Adjusting the `subtitle` Post-Processor operations](strawberryrunners_subtitle.md).

## 1. IIIF Manifest Templates

First, Archipelago's default IIIF Manifest templates explicitly state, in the 'service' key, that they support the three versions of the IIIF Content Search API.

```JSON
"service": [
  {
    "id": "{{ baseurl }}iiifcontentsearch/v2/do/{{ node.uuid.value }}/metadatadisplayexposed/iiifmanifest/mode/advanced/page/0",
    "type": "SearchService2"
  },
  {
    "id": "{{ baseurl }}iiifcontentsearch/v1/do/{{ node.uuid.value }}/metadatadisplayexposed/iiifmanifest/mode/advanced/page/0",
    "type": "SearchService1",
    "@context": "http://iiif.io/api/search/1/context.json",
    "profile": "http://iiif.io/api/search/1/search"
  },
  {
    "@id": "{{ baseurl }}iiifcontentsearch/v1/do/{{ node.uuid.value }}/metadatadisplayexposed/iiifmanifest/mode/advanced/page/0",
    "@context": "http://iiif.io/api/search/0/context.json",
    "profile": "http://iiif.io/api/search/0/search"
  }
],
```
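
As a rough illustration of how a client or script might read these declarations, the Python sketch below scans a rendered manifest's `service` list and collects the Content Search endpoints it declares. The `example.org` base URL, the placeholder UUID, and the helper name `declared_search_services` are assumptions made for this example; they are not part of Archipelago.

```python
# Hypothetical rendered manifest fragment; domain and UUID are placeholders.
BASE = "https://example.org/"
UUID = "00000000-0000-4000-8000-000000000000"
TAIL = f"/do/{UUID}/metadatadisplayexposed/iiifmanifest/mode/advanced/page/0"

manifest = {
    "service": [
        {"id": f"{BASE}iiifcontentsearch/v2{TAIL}", "type": "SearchService2"},
        {
            "id": f"{BASE}iiifcontentsearch/v1{TAIL}",
            "type": "SearchService1",
            "@context": "http://iiif.io/api/search/1/context.json",
            "profile": "http://iiif.io/api/search/1/search",
        },
        {
            "@id": f"{BASE}iiifcontentsearch/v1{TAIL}",
            "@context": "http://iiif.io/api/search/0/context.json",
            "profile": "http://iiif.io/api/search/0/search",
        },
    ]
}


def declared_search_services(manifest):
    """Return (label, url) pairs for every Content Search service entry."""
    found = []
    for service in manifest.get("service", []):
        # Newer entries use "id"/"type"; the oldest one uses "@id"/"profile".
        url = service.get("id") or service.get("@id")
        label = service.get("type") or service.get("profile")
        if url and "/iiifcontentsearch/" in url:
            found.append((label, url))
    return found


for label, url in declared_search_services(manifest):
    print(label, url)
```

A manifest with no `service` key yields an empty list, which is the situation in which a viewer would not offer search within the object.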

## 2. API Endpoints Exposure

Next, in the default exposed metadata API endpoints (generated from the IIIF templates), Archipelago provides the specific structure needed for the IIIF Content Search API. Each request passes along the data about “the template containing it”, the IIIF Content Search API version, the mode (simple or advanced), and the UUID of the Archipelago Digital Object being searched (the one that contains the raw data feeding the template, or at least its top-level/parent object).
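
To make those pieces concrete, here is a minimal Python sketch that splits an endpoint URL, shaped like the `id` values in the template snippet above, into its parts. The regular expression, the function name `parse_search_endpoint`, the `example.org` URL, and the assumption that endpoint machine names contain only lowercase letters, digits, and underscores are all illustrative choices, not Archipelago's own parsing code.

```python
import re
from urllib.parse import urlsplit

# Mirrors the path layout seen in the template's "id" values (assumed shape).
ENDPOINT_PATTERN = re.compile(
    r"/iiifcontentsearch/(?P<version>v[12])"
    r"/do/(?P<node_uuid>[0-9a-fA-F-]{36})"
    r"/metadatadisplayexposed/(?P<endpoint>[a-z0-9_]+)"
    r"/mode/(?P<mode>simple|advanced)"
    r"/page/(?P<page>\d+)$"
)


def parse_search_endpoint(url):
    """Split a Content Search endpoint URL into its parts, or return None."""
    path = urlsplit(url).path  # drops any ?q=... query string
    match = ENDPOINT_PATTERN.search(path)
    if match is None:
        return None
    parts = match.groupdict()
    parts["page"] = int(parts["page"])
    return parts


example = (
    "https://example.org/iiifcontentsearch/v2/do/"
    "00000000-0000-4000-8000-000000000000"
    "/metadatadisplayexposed/iiifmanifest/mode/advanced/page/0?q=moon"
)
print(parse_search_endpoint(example))
```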

## 3. Pathway into and out of the Solr Index

Then, Archipelago's backend recreates an ADO's IIIF manifest using this data (basically repeating what the client did before), but uses JMESPath expressions to extract just what is needed. It flips the order of the structure, putting IIIF Image IDs as "top keys" that reference canvases and, if present, their #xywh selectors (for the annotation text).

Using this transformed data, Archipelago's backend search can be limited to OCR generated only from those images (this matters, as Archipelago repositories can contain millions of OCR'd documents). Archipelago's internal search then natively returns, via the [Bavarian State Library’s Solr OCR highlight plugin](https://github.com/dbmdz/solr-ocrhighlighting/), the relevant hits within the specified ADO. These hits are then reprocessed into IIIF compliant (W3C) annotations and mapped back to results as “canvases with images”.
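
The "flipping" step can be pictured with a small Python sketch. It is only an approximation under stated assumptions: it walks a IIIF Presentation 3 style manifest with plain loops instead of JMESPath, it treats each painting annotation's `body` as a single object and its `target` as a string, and the sample manifest, URLs, and function name `images_to_canvases` are hypothetical. It is not Archipelago's backend code.

```python
from collections import defaultdict


def images_to_canvases(manifest):
    """Map each image ID to the canvas targets (and selectors) that show it."""
    flipped = defaultdict(list)
    for canvas in manifest.get("items", []):
        for page in canvas.get("items", []):
            for annotation in page.get("items", []):
                body = annotation.get("body", {})
                if body.get("type") != "Image":
                    continue
                # A target may carry a "#xywh=..." fragment for a canvas region.
                target = annotation.get("target", canvas["id"])
                canvas_id, _, fragment = target.partition("#")
                flipped[body["id"]].append(
                    {"canvas": canvas_id, "selector": fragment or None}
                )
    return dict(flipped)


# Hypothetical two-page manifest; all URLs are placeholders.
sample = {
    "items": [
        {
            "id": "https://example.org/do/ado-1/canvas/p1",
            "type": "Canvas",
            "items": [{"type": "AnnotationPage", "items": [{
                "type": "Annotation",
                "motivation": "painting",
                "body": {"id": "https://example.org/iiif/image-1", "type": "Image"},
                "target": "https://example.org/do/ado-1/canvas/p1#xywh=0,0,1200,1800",
            }]}],
        },
        {
            "id": "https://example.org/do/ado-1/canvas/p2",
            "type": "Canvas",
            "items": [{"type": "AnnotationPage", "items": [{
                "type": "Annotation",
                "motivation": "painting",
                "body": {"id": "https://example.org/iiif/image-2", "type": "Image"},
                "target": "https://example.org/do/ado-1/canvas/p2",
            }]}],
        },
    ]
}

for image_id, targets in images_to_canvases(sample).items():
    print(image_id, targets)
```

With image IDs as the top-level keys, a search that returns hits per image can look up the owning canvas directly instead of re-walking the manifest for each hit.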

## Things to keep in mind

- To make this performant, Archipelago uses two levels of caches that are invalidated automatically whenever any "ingredient" they depend on is modified.

- Archipelago can also tell the backend to use a "different" template than the one used at the front (Mirador), allowing you to define which "canvases" are searchable. This is not a normal use case, but still a valid one. And you can, per resource, have complex logic and/or different Viewers, even on a one by one basis.

### Acknowledgements

Archipelago's developers would like to extend our gratitude to our community, especially to [Mike](https://github.com/digitaldogsbody) and [Johannes](https://github.com/jbaiter) for their work and help, and everyone else in the IIIF and repository communities for all the amazing tools, viewers, specs and cookbook examples.

___

Thank you for reading! Please contact us on our [Archipelago Commons Google Group](https://groups.google.com/forum/#!forum/archipelago-commons) with any questions or feedback.

86 changes: 86 additions & 0 deletions docs/iiif_server_settings.md
@@ -0,0 +1,86 @@
---
title: IIIF Server Settings Form Default Settings
tags:
- IIIF
- IIIF Server Settings Form
- IIIF Content Search API
- Solr
- Solr Fields
- Solr Index
---

# IIIF Server Settings Form Default Settings

The IIIF Server Settings Form is used to configure different IIIF related settings used throughout your Archipelago environment. We strongly advise keeping the default settings intact. The necessary [Solr Fields](strawberry_key_name_providers.md#creating-a-solr-field) listed below should be set up by default.

You can find the IIIF Configuration Form:

- Through the `Manage` menu > `Configuration` > `Archipelago` > `Configure Strawberry Runners Post Processors`
- Directly at `/admin/config/archipelago/iiif`

![IIIF Server Settings Form](images/iiif_server_settings_form.png)

On the IIIF Server Settings Form page, you will see the following:


1. Note that these 'IIIF Server configuration URLs are used as defaults for field formatters using IIIF, but can be overridden on a one by one basis when setting up your formatters for each Display Mode.'

2. Base URL of your IIIF Media Server public accessible from the Outside World.
- Please provide a publicly accessible IIIF server URL. This URL will be used for AJAX and JS calls. Trailing Slashes will be removed.
- Set to `http://localhost:8183/iiif/2` by default.
- We do not recommend changing this selection.

3. Base URL of your IIIF Media Server accessible from inside this Webserver.
- Please provide an Internal IIIF server URL. This URL will be used by Internal Server calls and needs to be locally accessible by your server, e.g. 127.0.0.1 or a local Docker alias. Trailing Slashes will be removed.
- Set to `http://esmero-cantaloupe:8182/iiif/2` by default.
- We do not recommend changing this selection.

4. Checkbox to 'Enable IIIF Content Search API V1 and V2 endpoints'.
- Checked by default in later (1.4.0+) versions of Archipelago.
- See the [related (and essential) IIIF Manifest snippet shared here](iiif-content-search.md#1-iiif-manifest-templates)
- APIs are accessible at the following path: "/iiifcontentsearch/{version}/do/{node_uuid}/metadatadisplayexposed/{metadataexposeconfig_entity}/mode/{mode}/page/{page}" with:
- {version} one of [v1,v2]
- {node_uuid} the UUID of the ADO whose Manifest you want to search inside
- {metadataexposeconfig_entity} the machine name of the exposed Metadata Display endpoint used to render the Manifest that is calling the API (e.g. iiifmanifest)
- {mode} one of [simple,advanced]. Advanced is the smartest choice. Simple is faster, but requires your Canvas ids to be exactly in this pattern http(s)://domain.ext/do/{node_uuid}/{file_uuid}/canvas/{internal_to_the_file_sequence_order}
- {page} 0 to N depending on the number of results. By default please use 0

5. Checkbox to 'Only allow searches inside a Manifest If the Manifest itself (for an ADO) defines the Search Endpoints as a Service'
- Checked by default in later (1.4.0+) versions of Archipelago.
- If enabled we will double check if the calling IIIF Manifest defines the Endpoint(s) in the `service` key. If unchecked any Manifest will be searchable by calling an API URL directly.

6. IIIF Content Search API: field(s) that holds Parent Nodes
- Strawberry Flavor Data Source Search API Fields that can be used to connect a Strawberry Flavor to a Parent ADO.
- Default specified fields are: `Strawberryfield Flavor Datasource >> SBF Parent ID` and `Strawberryfield Flavor Datasource >> SBF Parent Node >> isPartOf >> ID`

7. Strawberry Runner processors that should be searched against for visual highlights.
- e.g Strawberry Flavor Data might have been generated by the "ocr" strawberry runners processor. A comma separated list of processors (machine names) that generated miniOCR.
- Default is: `ocr`
- If you are using the [Strawberry Runners `pager` and `ocr` post-processors](strawberryrunners_pager_ocr.md), you should always keep this enabled.

8. Strawberry Runner processors that should be searched against for time based media.
- e.g Strawberry Flavor Data might have been generated by the "subtitle" strawberry runners processor. These will have time based fragments and will match IIIF Annotations with motivation supplementing and target the time based media on the parent Canvas. A comma separated list of processors (machine names) that generated time based transcripts encoded as miniOCR.
- Default is: `subtitle`

9. Check to 'Target the VTT Supplementing Annotation'
- If enabled (aligned with the specs) the target of a hit result will point to the supplementing Annotation containing in its body the VTT file. If not, it will point to the Canvas containing in its body a Media Resource (less precise but more compatible with Viewers).
- If you are using the [Strawberry Runners `subtitle` post-processor](strawberryrunners_subtitle.md), you should always keep this enabled.

10. Strawberry Runner processors that should be searched against plain text extractions.
- e.g Strawberry Flavor Data might have been generated by the "text" strawberry runners processor. These will not have coordinates but will match IIIF Annotations with motivation supplementing and target the whole canvas. A comma separated list of processors (machine names) that generated time based transcripts encoded as miniOCR.
- Default is: `text`
- If you are using the [Strawberry Runners `subtitle` post-processor](strawberryrunners_subtitle.md), you should always keep this enabled.

11. IIIF Content Search API: field(s) that hold the URI of the File that produced the Searchable content
- Strawberry Flavor Data Source Search API Fields that hold the URI of the File that generated its content.
- Default specified fields are: `Strawberryfield Flavor Datasource >> Parent File`, `Strawberryfield Flavor Datasource >> SBF source or related URI/URL`, and `Strawberryfield Flavor Datasource >> Parent File >> URI`

12. IIIF Content Search API: Max Results per Page
- Default is: `25`

13. IIIF Content Search API: Max allowed characters/length for a Search term
- Default is: `64`
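
The endpoint path pattern from item 4 and the term-length default from item 13 can be combined into a small client-side helper. The Python sketch below is one possible way to do that, not an official client: the function name `build_search_url` and the `example.org` base URL are made up for illustration, the `q` query parameter is taken from the IIIF Content Search specification, and rejecting over-long terms on the client is an assumption about convenient behavior, since how the server treats them is not described here.

```python
from urllib.parse import urlencode

ALLOWED_VERSIONS = ("v1", "v2")
ALLOWED_MODES = ("simple", "advanced")
MAX_TERM_LENGTH = 64  # form default (item 13); adjust if your site differs


def build_search_url(base_url, node_uuid, term, version="v2",
                     endpoint="iiifmanifest", mode="advanced", page=0):
    """Assemble a Content Search URL following the documented path pattern."""
    if version not in ALLOWED_VERSIONS:
        raise ValueError(f"version must be one of {ALLOWED_VERSIONS}")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {ALLOWED_MODES}")
    if page < 0:
        raise ValueError("page must be 0 or greater")
    if not term or len(term) > MAX_TERM_LENGTH:
        raise ValueError(f"term must be 1 to {MAX_TERM_LENGTH} characters")
    path = (
        f"/iiifcontentsearch/{version}/do/{node_uuid}"
        f"/metadatadisplayexposed/{endpoint}/mode/{mode}/page/{page}"
    )
    # Trailing slashes on the base URL are dropped before joining.
    return f"{base_url.rstrip('/')}{path}?{urlencode({'q': term})}"


url = build_search_url(
    "https://example.org/",
    "00000000-0000-4000-8000-000000000000",
    "moon landing",
)
print(url)
```

Validating `version`, `mode`, and `page` before sending the request keeps malformed calls from reaching the server, and the defaults follow the recommendations above (`advanced` mode, page `0`).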

___

Return to the main [Strawberry Runners](strawberryrunners.md) or the [Archipelago Documentation main page](index.md).
Binary file added docs/images/ado-type-to-view-mode-mapping.png
Binary file added docs/images/display-modes-2024.png
Binary file added docs/images/forms-modes-2024.png
Binary file added docs/images/iiif_server_settings_form.png
Binary file added docs/images/manage-display-2024.png
Binary file added docs/images/manage-display-coll.png
Binary file added docs/images/manage-form-display-2024.png
Binary file added docs/images/managing-display-modes-2024.png
Binary file added docs/images/sbr_subtitle.png
Binary file added docs/images/strawberryrunnershome_updated.png
14 changes: 10 additions & 4 deletions docs/index.md
@@ -1,9 +1,15 @@
# Archipelago Commons Intro

Archipelago Commons, or simply Archipelago, is an Open Source Digital Objects Repository / DAM Server Architecture based on the popular CMS [`Drupal 9/10`](https://www.drupal.org) and released under [`GLP V.3 License`](https://www.gnu.org/licenses/gpl-3.0.txt). Archipelago is developed and supported at the [Metropolitan New York Library Council (METRO)](https://metro.org).
Archipelago Commons, or simply Archipelago, is an Open Source Digital Objects Repository / DAM Server Architecture based on the popular CMS [`Drupal 9/10+`](https://www.drupal.org) and released under [`GPL V.3 License`](https://www.gnu.org/licenses/gpl-3.0.txt). Archipelago is developed and supported at the [Metropolitan New York Library Council (METRO)](https://metro.org).

Archipelago is a mix of deeply integrated custom-coded Drupal modules (made with care by us) and a curated and well-configured Drupal instance, running under a discrete and well-planned set of service containers. Learn more about the different [`Software Services`](devops.md) used by Archipelago.
Archipelago is a mix of deeply integrated custom-coded Drupal modules (made with care by us, the [Digital Services Team and METRO](https://metro.org/digital-services)) and a curated and well-configured Drupal instance, running under a discrete and well-planned set of complementary additional service containers. You can learn more about the different [Software Services used by Archipelago here](devops.md), and [Archipelago's unique approach to Metadata here](metadatainarchipelago.md).

Archipelago's primary focus is to serve the greater [`GLAM community`](https://en.wikipedia.org/wiki/GLAM_(industry_sector)) by providing a flexible, consistent, and unified way of describing, storing, linking, exposing metadata and media assets. We respect identities and existing workflows. We endeavor to design Archipelago in ways that empower communities of every size and shape.
Archipelago's primary focus is to serve the greater [`GLAM community`](https://en.wikipedia.org/wiki/GLAM_(industry_sector)) (libraries, archives, museums, universities and colleges, cultural heritage organizations) by providing a flexible, consistent, and unified way of describing, storing, linking, and exposing the metadata and media assets that make up rich repository collections all around our shared beautiful world. We respect identities and existing workflows, and we endeavor to design Archipelago in ways that empower communities of every size, shade, and shape.

Finally, Archipelago tries to stay humble, slim, and nimble in nature with a small codebase full of inline comments and `@todos`. All of our work is driven by a clear and [concise but thoughtful planned technical roadmap --updated in tandem with new releases](https://github.com/esmero/archipelago-deployment/issues/243).

We recommend you start with the [Core Documentation Guides listed here](101_guides_list.md) as you begin your Archipelago explorations.
___

Thank you for reading! Please contact us on our [Archipelago Commons Google Group](https://groups.google.com/forum/#!forum/archipelago-commons) with any questions or feedback.

Finally, Archipelago tries to stay humble, slim, and nimble in nature with a small code base full of inline comments and `@todos`. All of our work is driven by a clear and [concise but thoughtful planned technical roadmap --updated in tandem with new releases](https://github.com/esmero/archipelago-deployment/issues/243).