OADP-774 must-gather: add timeout to velero logs/describe, var typos, remove duplicate logs, add make run
#816
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
make run
/lgtm
mkdir -p ${object_collection_path}
oc describe podvolumerestores.velero.io ${pvr} --namespace ${ns} &> "${object_collection_path}/pvr-describe-${pvr}.txt"
echo "[cluster=${cluster}][ns=${ns}][pod=${pod}] Collecting Pod logs..."
oc logs --all-containers --namespace ${ns} ${pod} --since ${logs_since} &> "${object_collection_path}/current.log" &
echo "[cluster=${cluster}][ns=${ns}][pod=${pod}] Collecting previous Pod logs..."
This is where the empty [pod=] logs came from.
/retest-required
/hold
This did not fix the issue we were trying to fix. must-gather is still getting stuck at "Collecting volumesnapshotlocations".
New changes are detected. LGTM label has been removed.
Everything seems to work.
 fi
 echo "[cluster=${cluster}][ns=${ns}] Gathering 'velero restore logs ${restore}'"
-oc -n ${ns} exec $(oc -n ${ns} get po -l component=velero -o custom-columns=name:.metadata.name --no-headers) -- /bin/bash -c "/velero restore logs ${restore} --insecure-skip-tls-verify=${skip_tls} --timeout=30s" &> "${object_collection_path}/restore-${restore}.log" &
+oc -n ${ns} exec $(oc -n ${ns} get po -l component=velero -o custom-columns=name:.metadata.name --no-headers) -- /bin/bash -c "timeout 30s /velero restore logs ${restore} --insecure-skip-tls-verify=${skip_tls} --timeout=30s" &> "${object_collection_path}/restore-${restore}.log" &
Here we force velero CLI commands that involve downloadrequest.Stream to time out, which resolves must-gather getting stuck when it queries a nonexistent backup storage location.
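For anyone who wants to see the mechanism locally, here is a minimal sketch, assuming GNU coreutils `timeout` is available (this thread does not confirm which `timeout` implementation the velero image ships). `sleep 5` is only a stand-in for a velero command that hangs.

```shell
#!/usr/bin/env bash
# `sleep 5` is a stand-in for a hung velero CLI call; it is not a real velero command.
start=$(date +%s)

status=0
timeout 1s sleep 5 || status=$?

elapsed=$(( $(date +%s) - start ))

# GNU timeout exits with status 124 when it had to terminate the command.
echo "exit status: ${status}, elapsed: ${elapsed}s"
```

The wrapped command is terminated after roughly one second instead of running for the full five, which is the behaviour the `timeout 30s` prefix relies on.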
avoids vmware-tanzu/velero#5324
make run
/unhold ready for review
/retest
make run
@kaovilai should there be a default timeout value?
 else
-oc -n ${ns} exec --request-timeout=${timeout} $(oc -n ${ns} get po -l component=velero -o custom-columns=name:.metadata.name --no-headers) -- /bin/bash -c "/velero describe backup ${backup} --insecure-skip-tls-verify=${skip_tls} --details" &> "${object_collection_path}/backup-describe-${backup}.txt" &
+oc -n ${ns} exec --request-timeout=${timeout} $(oc -n ${ns} get po -l component=velero -o custom-columns=name:.metadata.name --no-headers) -- /bin/bash -c "timeout ${timeout} /velero describe backup ${backup} --insecure-skip-tls-verify=${skip_tls} --details" &> "${object_collection_path}/backup-describe-${backup}.txt" &
Do we need --request-timeout=${timeout} here if timeout is already being applied inside the velero container?
In theory, no.
From further testing and doc digging, --request-timeout has a very different effect from timeout in the velero container.
--request-timeout is the length of time to wait before giving up on a single api-server request.
If the api-server has already responded (the /velero CLI has been executed but has yet to print to stdout), --request-timeout does not kill a stuck velero CLI process, and must-gather still gets stuck.
So I propose we keep both for the case where $timeout is defined.
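To make the two layers concrete, here is a rough sketch that only prints the command line rather than running it. `build_describe_cmd` is a hypothetical helper, not part of this PR; the argument values are placeholders; `--insecure-skip-tls-verify` is left out for brevity; and the behaviour when no timeout is given is an assumption, since that branch is not shown in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): prints the command that would be run.
# ns, pod, backup and timeout mirror the variable names used in the diff.
build_describe_cmd() {
  local ns="$1" pod="$2" backup="$3" timeout="${4:-}"
  if [ -z "${timeout}" ]; then
    # Assumption: with no timeout configured, neither limit is applied.
    printf 'oc -n %s exec %s -- /bin/bash -c "/velero describe backup %s --details"\n' \
      "${ns}" "${pod}" "${backup}"
  else
    # --request-timeout bounds the api-server request made by oc;
    # the inner `timeout` kills the velero process if it hangs after it has started.
    printf 'oc -n %s exec --request-timeout=%s %s -- /bin/bash -c "timeout %s /velero describe backup %s --details"\n' \
      "${ns}" "${timeout}" "${pod}" "${timeout}" "${backup}"
  fi
}

# Placeholder values for illustration only.
build_describe_cmd example-ns velero-pod example-backup 30s
```

Keeping both limits means a slow api-server request and a hung velero process are each bounded by their own mechanism.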
/retest
Checking for ways to write something other than an empty file when the timeout is triggered.
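One possible approach, sketched locally and not necessarily what this PR ends up doing: check the exit status of `timeout` and append a note to the output file. `run_with_timeout_note` and the message text are hypothetical, GNU coreutils `timeout` is assumed, and `sleep 5` stands in for a hung command.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper; the message text is illustrative, not what the PR writes.
run_with_timeout_note() {
  local limit="$1" outfile="$2"
  shift 2
  local status=0
  # Capture stdout and stderr of the wrapped command in the output file.
  timeout "${limit}" "$@" &> "${outfile}" || status=$?
  if [ "${status}" -eq 124 ]; then
    # 124 is the status GNU timeout returns when it terminated the command.
    echo "command timed out after ${limit}: $*" >> "${outfile}"
  fi
  return "${status}"
}

out=$(mktemp)
run_with_timeout_note 1s "${out}" sleep 5 || true
cat "${out}"
```

In this sketch the file is never left empty after a timeout, so someone reading the must-gather output can tell a timeout apart from a command that printed nothing.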
/retest
@shubham-pampattiwar When a timeout occurs, the file (backup/restore-log/describe-) will contain output like the following if timeout killed the program.
Specifically
@kaovilai: all tests passed!
@kaovilai LGTM!
make run
/cherry-pick oadp-1.1
/cherry-pick oadp-1.0
@kaovilai: new pull request created: #821
@kaovilai: #816 failed to apply on top of branch "oadp-1.0":
OADP-774
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>