Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(PXP-6815): update MDS object after file upload. #29

Merged
merged 14 commits into from
Oct 20, 2020

Conversation

johnfrancismccann
Copy link
Contributor

@johnfrancismccann johnfrancismccann commented Oct 2, 2020

Extend indexs3client job to update the _bucket, _filename, _file_extension, and _upload_status keys in corresponding Metadata Service object after a file has been uploaded to the bucket.

Jira ticket

Metadata Service config must be nested in the CONFIG_FILE environment variable:

{
        "url": "http://indexd-service/index",
	"username": "mr happy cat",
	"password": "whiskers",
	"metadataService": {
		"url": "http://revproxy-service/mds",
		"username": "mr friendly cat",
		"password": "paws"
	}
}

PLEASE DO NOT MERGE YET. kube-setup-ssjdispatcher PR to set up MDS creds must be merged first. Additionally, this indexs3client PR should go out in the same monthly release as the SSJDispatcher ticket/PR to require MDS creds on startup.

New Features

  • Update the _bucket, _filename, _file_extension, and _upload_status
    keys in corresponding Metadata Service object after a file has been uploaded to the bucket.

Breaking Changes

  • Going forward, indexs3client now requires the Metadata Service to be deployed. Metadata Service creds are now required in the CONFIG_FILE environment variable in order for this job to run. Without both Indexd and MDS creds configured, this job will immediately exit and update neither Indexd or MDS.

Improvements

  • Determine if Indexd record already has hashes and size populated before downloading S3 object and calculating hashes. If so, don’t bother downloading/recalculating.
  • Define ConfigInfo struct so that the config (url and creds) for new services can be easily unmarshalled from the CONFIG_FILE environment variable
  • Make use of retryablehttp package to abstract http request retry logic and only retry on 500-range errors

Dependency updates

  • Add github.com/hashicorp/go-retryablehttp v0.6.7 to go.mod

Deployment changes

  • metadata-service must already be present or added to the "versions": { section of the cdis-manifest. After metadata-service has been confirmed to be in the cdis-manifest, the now-required Metadata Service creds for SSJDispatcher/indexs3client must be configured by running gen3 roll all with JENKINS_HOME not set. gen3 roll all calls gen3 kube-setup-ssjdispatcher, which now automatically configures the SSJDispatcher secret with the appropriate Metadata Service creds. After running gen3 roll all, SSJDispatcher should be configured such that SSJDispatcher can automatically pass the Metadata Service creds to the indexs3client job instance (i.e. after running gen3 roll all, no further action should be required by DevOps).

Copy link
Contributor

@giangbui giangbui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!


log.Printf("Attempting to get rev for record %s in Indexd", uuid)
rev, err := GetIndexdRecordRev(uuid, configInfo.Indexd.URL)
mdsUploadedBody := fmt.Sprintf(`{"_upload_status": "uploaded", "_filename": "%s"}`, filename)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the ticket and design doc mention _file_type but not _filename

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I talked with @Avantol13 about this. We thought _filename could be preferable. I just updated the design doc to reflect this.

}
log.Printf("Got rev %s from Indexd for record %s", rev, uuid)

updateMetadataObjectWrapper(uuid, configInfo, `{"_upload_status": "indexs3client job calculating hashes and size"}`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all the others statuses in the design doc are a single word (or underscore-separated words)

you might want to check with @Avantol13 if this is an acceptable status. if not, maybe uploaded is fine, or maybe something like processing if you do want to differentiate it from uploaded?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah just one thing to keep in mind is what _upload_status should be in the case that this job fails to download and calculate hashes for the file object (e.g. maybe the file is too big and can't be downloaded)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that's not detailed in the design doc, i'll let you and Alex make that call

)

// Checks for presence of MDS config, calls UpdateMetadataObject, and logs info
func updateMetadataObjectWrapper(uuid string, configInfo *ConfigInfo, body string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function doesn't exit in case of error, idk if that's on purpose

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that was on purpose. The idea is that this wrapper does all the error handling and logging for the caller. I had been calling UpdateMetadataObject from handlers/handler.go in three places and just wanted to DRY.

@@ -58,68 +58,241 @@
}
],
"results": {
"Gopkg.lock": [
"go.sum": [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can exclude go.sum from the scanning

@themarcelor themarcelor changed the title (PXP-6815): update MDS object after file upload (PXP-6815): update MDS object after file upload. Oct 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants