-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OADP-774 must-gather: add timeout to velero logs/describe, var typos, remove duplicate logs, add make run
#816
Changes from all commits
4271a6a
45956db
228195c
cf13ed2
fbfd433
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,12 +9,8 @@ max_parallelism=$4 | |
pvr=$5 | ||
object_collection_path=$6 | ||
|
||
# Gather PVR describe and logs | ||
# Gather PVR describe | ||
mkdir -p ${object_collection_path} | ||
oc describe podvolumerestores.velero.io ${pvr} --namespace ${ns} &> "${object_collection_path}/pvr-describe-${pvr}.txt" | ||
echo "[cluster=${cluster}][ns=${ns}][pod=${pod}] Collecting Pod logs..." | ||
oc logs --all-containers --namespace ${ns} ${pod} --since ${logs_since} &> "${object_collection_path}/current.log" & | ||
echo "[cluster=${cluster}][ns=${ns}][pod=${pod}] Collecting previous Pod logs..." | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is where the empty |
||
oc logs --previous --all-containers --namespace ${ns} ${pod} --since ${logs_since} &> "${object_collection_path}/previous.log" & | ||
|
||
# logs covered by restic pod logs in gather_logs_pods | ||
wait |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,14 +12,14 @@ timeout=$7 | |
skip_tls=$8 | ||
|
||
# Gather restore describe and logs | ||
mkdir -p "{object_collection_path}" | ||
mkdir -p "${object_collection_path}" | ||
echo "[cluster=${cluster}][ns=${ns}] Gathering 'velero restore describe ${restore}'" | ||
if [ "$timeout" = "0s" ]; then | ||
oc -n ${ns} exec $(oc -n ${ns} get po -l component=velero -o custom-columns=name:.metadata.name --no-headers) -- /bin/bash -c "/velero describe restore ${restore} --insecure-skip-tls-verify=${skip_tls} --details" &> "${object_collection_path}/restore-describe-${restore}.txt" & | ||
oc -n ${ns} exec $(oc -n ${ns} get po -l component=velero -o custom-columns=name:.metadata.name --no-headers) -- /bin/bash -c "timeout 30s /velero describe restore ${restore} --insecure-skip-tls-verify=${skip_tls} --details" &> "${object_collection_path}/restore-describe-${restore}.txt" & | ||
else | ||
oc -n ${ns} exec --request-timeout=${timeout} $(oc -n ${ns} get po -l component=velero -o custom-columns=name:.metadata.name --no-headers) -- /bin/bash -c "/velero describe restore ${restore} --insecure-skip-tls-verify=${skip_tls} --details" &> "${object_collection_path}/restore-describe-${restore}.txt" & | ||
oc -n ${ns} exec --request-timeout=${timeout} $(oc -n ${ns} get po -l component=velero -o custom-columns=name:.metadata.name --no-headers) -- /bin/bash -c "timeout ${timeout} /velero describe restore ${restore} --insecure-skip-tls-verify=${skip_tls} --details" &> "${object_collection_path}/restore-describe-${restore}.txt" & | ||
fi | ||
echo "[cluster=${cluster}][ns=${ns}] Gathering 'velero restore logs ${restore}'" | ||
oc -n ${ns} exec $(oc -n ${ns} get po -l component=velero -o custom-columns=name:.metadata.name --no-headers) -- /bin/bash -c "/velero restore logs ${restore} --insecure-skip-tls-verify=${skip_tls} --timeout=30s" &> "${object_collection_path}/restore-${restore}.log" & | ||
oc -n ${ns} exec $(oc -n ${ns} get po -l component=velero -o custom-columns=name:.metadata.name --no-headers) -- /bin/bash -c "timeout 30s /velero restore logs ${restore} --insecure-skip-tls-verify=${skip_tls} --timeout=30s" &> "${object_collection_path}/restore-${restore}.log" & | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here we are forcing velero CLI commands that involve downloadrequest.Stream to timeout which will resolve issues related to must-gather getting stuck when querying from nonexistent backup storage location. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. avoids vmware-tanzu/velero#5324 |
||
|
||
wait |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need
--request-timeout=${timeout}
here if timeout is getting passed in the velero container?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in theory no.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From further testing and doc digging.. --request-timeout has very different effects to
timeout
in velero container.--request-timeout is The length of time to wait before giving up on a single api-server request.
whereas, if the api-server has responded (
/velero
cli executed but yet to print to stdout), request-timeout do not work to kill a stuck velero CLI process and must-gather still get stuck.So I propose we keep both for the
$timeout
defined case.