tweak docs and release note #10022
pdurbin committed Apr 16, 2024
1 parent 41eb617 commit 22e5b1a
Showing 4 changed files with 18 additions and 7 deletions.
6 changes: 5 additions & 1 deletion doc/release-notes/10022_upload_redirect_without_tagging.md
@@ -1 +1,5 @@
If your S3 store does not support tagging and gives an error when redirecting uploads, you can disable the tagging by using the ``dataverse.files.<id>.disable-tagging`` JVM option. Disabling the tagging makes it harder to identify abandoned files (created in cases where the user does not complete the upload operation) with an external script but they can still be removed using the [Cleanup Storage of a Dataset](https://guides.dataverse.org/en/5.13/api/native-api.html#cleanup-storage-of-a-dataset) API endpoint.
If your S3 store does not support tagging and gives an error when direct upload is configured, you can disable tagging by using the ``dataverse.files.<id>.disable-tagging`` JVM option. For more details, see https://dataverse-guide--10029.org.readthedocs.build/en/10029/developers/big-data-support.html#s3-tags as well as #10022 and #10029.

## New config options

- dataverse.files.<id>.disable-tagging
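
As an illustrative sketch only (the store id ``s3`` is a placeholder; substitute your own store id), the option can be set via asadmin, matching the command shown in the Installation Guide:

``./asadmin create-jvm-options "-Ddataverse.files.s3.disable-tagging=true"``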
2 changes: 1 addition & 1 deletion doc/sphinx-guides/source/api/native-api.rst
@@ -2013,7 +2013,7 @@ The fully expanded example above (without environment variables) looks like this
.. _cleanup-storage-api:

Cleanup storage of a Dataset
Cleanup Storage of a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is an experimental feature and should be tested on your system before using it in production.
7 changes: 6 additions & 1 deletion doc/sphinx-guides/source/developers/big-data-support.rst
@@ -81,7 +81,12 @@ with the contents of the file cors.json as follows:
Alternatively, you can enable CORS using the AWS S3 web interface, using json-encoded rules as in the example above.

Since the direct upload mechanism creates the final file rather than an intermediate temporary file, user actions, such as neither saving or canceling an upload session before closing the browser page, can leave an abandoned file in the store. The direct upload mechanism attempts to use S3 Tags to aid in identifying/removing such files. Upon upload, files are given a "dv-state":"temp" tag which is removed when the dataset changes are saved and the new file(s) are added in the Dataverse installation. Note that not all S3 implementations support Tags: Minio does not. WIth such stores, direct upload works, but Tags are not used.
.. _s3-tags:

S3 Tags and Direct Upload
~~~~~~~~~~~~~~~~~~~~~~~~~

Since the direct upload mechanism creates the final file rather than an intermediate temporary file, a user who neither saves nor cancels an upload session before closing the browser page can leave an abandoned file in the store. The direct upload mechanism attempts to use S3 tags to aid in identifying/removing such files. Upon upload, files are given a "dv-state":"temp" tag which is removed when the dataset changes are saved and new files are added in the Dataverse installation. Note that not all S3 implementations support tags. Minio, for example, does not. With such stores, direct upload may not work and you might need to disable tagging. For details, look for ``dataverse.files.<id>.disable-tagging`` under :ref:`list-of-s3-storage-options` in the Installation Guide.
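
As a rough illustration (not from the guide itself; bucket name and object key are placeholders), one way to check whether the ``temp`` tag is present on an in-progress upload is with the AWS CLI:

``aws s3api get-object-tagging --bucket <your-bucket> --key <storage-identifier-of-the-file>``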

Trusted Remote Storage with the ``remote`` Store Type
-----------------------------------------------------
10 changes: 6 additions & 4 deletions doc/sphinx-guides/source/installation/config.rst
@@ -1189,19 +1189,21 @@ Larger installations may want to increase the number of open S3 connections allo

``./asadmin create-jvm-options "-Ddataverse.files.<id>.connection-pool-size=4096"``

By default, when redirecting an upload to the S3 storage, Dataverse will place a ``temp`` tag on the file being uploaded for an easier cleanup if the file is not added to the dataset after upload (e.g., if the user cancels the operation).
If your S3 store does not support tagging and gives an error when redirecting uploads, you can disable that tag by using the ``dataverse.files.<id>.disable-tagging`` JVM option. For example:
By default, when direct upload to an S3 store is configured, Dataverse will place a ``temp`` tag on the file being uploaded for easier cleanup in case the file is not added to the dataset after upload (e.g., if the user cancels the operation). (See :ref:`s3-tags`.)
If your S3 store does not support tagging and gives an error when direct upload is configured, you can disable tagging by using the ``dataverse.files.<id>.disable-tagging`` JVM option. For example:

``./asadmin create-jvm-options "-Ddataverse.files.<id>.disable-tagging=true"``

Disabling the ``temp`` tag makes it harder to identify abandoned files that are not used by your Dataverse instance (i.e. one cannot search for the temp tag in a delete script). These should still be removed to avoid wasting storage space. To clean up these files and any other leftover files, regardless of whether the temp tag is applied, you can use the [Cleanup Storage of a Dataset](https://guides.dataverse.org/en/5.13/api/native-api.html#cleanup-storage-of-a-dataset) API endpoint.
Disabling the ``temp`` tag makes it harder to identify abandoned files that are not used by your Dataverse instance (i.e. one cannot search for the ``temp`` tag in a delete script). These should still be removed to avoid wasting storage space. To clean up these files and any other leftover files, regardless of whether the ``temp`` tag is applied, you can use the :ref:`cleanup-storage-api` API endpoint.
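
One way a call to that endpoint could look is sketched below (see :ref:`cleanup-storage-api` for the authoritative syntax; ``dryrun=true`` is meant to report what would be deleted without deleting anything):

``curl -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/cleanStorage?persistentId=$PERSISTENT_ID&dryrun=true"``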

If you would like to configure Dataverse to use a custom S3 service instead of Amazon S3, add the options for the custom URL and region as documented below. Check above whether your desired combination has already been tested and which other options were set for a successful integration.

Lastly, go ahead and restart your Payara server. With Dataverse deployed and the site online, you should be able to upload datasets and data files and see the corresponding files in your S3 bucket. Within a bucket, the folder structure emulates that found in local file storage.
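
As a quick, optional check (the bucket name is a placeholder), you can list the uploaded objects with the AWS CLI:

``aws s3 ls s3://<your-bucket>/ --recursive``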

.. _list-of-s3-storage-options:

List of S3 Storage Options
##########################

@@ -1229,7 +1231,7 @@ List of S3 Storage Options
dataverse.files.<id>.payload-signing ``true``/``false`` Enable payload signing. Optional ``false``
dataverse.files.<id>.chunked-encoding ``true``/``false`` Disable chunked encoding. Optional ``true``
dataverse.files.<id>.connection-pool-size <?> The maximum number of open connections to the S3 server ``256``
dataverse.files.<id>.disable-tagging ``true``/``false`` Do not place the ``temp`` tag when redirecting the upload to the S3 server ``false``
dataverse.files.<id>.disable-tagging ``true``/``false`` Do not place the ``temp`` tag when redirecting the upload to the S3 server. ``false``
=========================================== ================== =================================================================================== =============

.. table::
