From a52a8384ffebb2244e687aef1cbe479aa06f5b76 Mon Sep 17 00:00:00 2001
From: qqmyers
Date: Thu, 13 Jun 2024 16:03:33 -0400
Subject: [PATCH] docs

---
 doc/release-notes/10579-avoid-solr-deletes.md       | 9 +++++++++
 doc/sphinx-guides/source/developers/performance.rst | 1 +
 doc/sphinx-guides/source/installation/config.rst    | 3 +++
 3 files changed, 13 insertions(+)
 create mode 100644 doc/release-notes/10579-avoid-solr-deletes.md

diff --git a/doc/release-notes/10579-avoid-solr-deletes.md b/doc/release-notes/10579-avoid-solr-deletes.md
new file mode 100644
index 00000000000..1062a2fb78f
--- /dev/null
+++ b/doc/release-notes/10579-avoid-solr-deletes.md
@@ -0,0 +1,9 @@
+A feature flag called "reduce-solr-deletes" has been added to improve how datafiles are indexed. When the flag is enabled,
+Dataverse will avoid pre-emptively deleting existing Solr documents for the files prior to sending updated information. This
+should improve performance and will allow additional optimizations going forward.
+
+The /api/admin/index/status and /api/admin/index/clear-orphans calls
+(see https://guides.dataverse.org/en/latest/admin/solr-search-index.html#index-and-database-consistency)
+will now find and remove (respectively) additional permissions-related Solr documents that were not being detected before.
+Reducing the overall number of documents will improve Solr performance, and large sites may wish to periodically call the
+clear-orphans API.
\ No newline at end of file
diff --git a/doc/sphinx-guides/source/developers/performance.rst b/doc/sphinx-guides/source/developers/performance.rst
index 562fa330d75..0044899a581 100644
--- a/doc/sphinx-guides/source/developers/performance.rst
+++ b/doc/sphinx-guides/source/developers/performance.rst
@@ -121,6 +121,7 @@ While in the past Solr performance hasn't been much of a concern, in recent year
 We are tracking performance problems in `#10469 `_.

 In a meeting with a Solr expert on 2024-05-10 we were advised to avoid joins as much as possible.
(It was acknowledged that many Solr users make use of joins because they have to, like we do, to keep some documents private.) Toward that end we have added two feature flags called ``avoid-expensive-solr-join`` and ``add-publicobject-solr-field`` as explained under :ref:`feature-flags`. It was confirmed experimentally that performing the join on all the public objects (published collections, datasets and files), i.e., the bulk of the content in the search index, was indeed very expensive, especially on a large instance the size of the IQSS prod. archive, especially under indexing load. We confirmed that it was in fact unnecessary and were able to replace it with a boolean field directly in the indexed documents, which is achieved by the two feature flags above. However, as of writing this, this mechanism should still be considered experimental.
+Another flag, ``reduce-solr-deletes``, avoids deleting Solr documents for files in a dataset prior to sending updates. This is expected to improve indexing performance to some extent and is a step towards avoiding unnecessary updates (i.e. when a doc would not change).

 Datasets with Large Numbers of Files or Versions
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/doc/sphinx-guides/source/installation/config.rst b/doc/sphinx-guides/source/installation/config.rst
index 8fb9460892b..f6e33a2678d 100644
--- a/doc/sphinx-guides/source/installation/config.rst
+++ b/doc/sphinx-guides/source/installation/config.rst
@@ -3274,6 +3274,9 @@ please find all known feature flags below. Any of these flags can be activated u
     * - add-publicobject-solr-field
       - Adds an extra boolean field `PublicObject_b:true` for public content (published Collections, Datasets and Files). Once reindexed with these fields, we can rely on it to remove a very expensive Solr join on all such documents in Solr queries, significantly improving overall performance (by enabling the feature flag above, `avoid-expensive-solr-join`).
These two flags are separate so that an instance can reindex their holdings before enabling the optimization in searches, thus avoiding having their public objects temporarily disappear from search results while the reindexing is in progress.
       - ``Off``
+    * - reduce-solr-deletes
+      - Avoids deleting and recreating Solr documents for dataset files when reindexing.
+      - ``Off``

 **Note:** Feature flags can be set via any `supported MicroProfile Config API source`_, e.g. the environment variable ``DATAVERSE_FEATURE_XXX`` (e.g. ``DATAVERSE_FEATURE_API_SESSION_AUTH=1``). These environment variables can be set in your shell before starting Payara. If you are using :doc:`Docker for development `, you can set them in the `docker compose `_ file.
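
Reviewer note (not part of the patch): the note above gives only the pattern ``DATAVERSE_FEATURE_XXX`` and one example, so the following is a minimal sketch of how the new flag's environment variable name and the two index admin URLs from the release note might be derived. The naming rule (upper-case, hyphens to underscores) is inferred from that single example, and the helper names and the base URL ``http://localhost:8080`` are hypothetical and do not come from the patch.

```python
# Sketch only. The naming rule is an assumption inferred from the one example
# in the note: api-session-auth -> DATAVERSE_FEATURE_API_SESSION_AUTH.
FEATURE_PREFIX = "DATAVERSE_FEATURE_"

# The two admin calls named in the release note.
INDEX_ACTIONS = ("status", "clear-orphans")


def feature_flag_env_var(flag_name: str) -> str:
    """Map a feature flag name to its assumed environment variable name."""
    return FEATURE_PREFIX + flag_name.strip().upper().replace("-", "_")


def admin_index_url(base_url: str, action: str) -> str:
    """Build the URL for one of the index admin calls from the release note."""
    if action not in INDEX_ACTIONS:
        raise ValueError(f"unknown index action: {action!r}")
    return f"{base_url.rstrip('/')}/api/admin/index/{action}"


if __name__ == "__main__":
    # The base URL is a placeholder for a local installation (assumption).
    base = "http://localhost:8080"
    print(f"{feature_flag_env_var('reduce-solr-deletes')}=1")
    for action in INDEX_ACTIONS:
        print(admin_index_url(base, action))
```

The sketch only builds strings and makes no network calls. Whether the derived variable name is accepted should be checked against the feature flag documentation before relying on it.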