Non block testing fix #363
base: main
…ing additions for activity tracing
@TonyB9000 I left some initial review comments. I want to spend more time studying the code to understand how everything gets called/passed around though.
@@ -92,7 +92,7 @@ def create():

     # Transfer to HPSS. Always keep a local copy.
     logger.debug(f"{ts_utc()}: calling hpss_put() for {get_db_filename(cache)}")
-    hpss_put(hpss, get_db_filename(cache), cache, keep=True)
+    hpss_put(hpss, get_db_filename(cache), cache, keep=args.keep)
This is specifically for archiving the database. I think we do want to always keep that, no?
Yes, I agree. That was a mistake. (But it always seems to remain in any case - a mystery)
@@ -169,9 +169,8 @@ def setup_create() -> Tuple[str, argparse.Namespace]:
     # Now that we're inside a subcommand, ignore the first two argvs
     # (zstash create)
     args: argparse.Namespace = parser.parse_args(sys.argv[2:])
-    if args.hpss and args.hpss.lower() == "none":
+    if not args.hpss or args.hpss.lower() == "none":
Parentheses just for clarity: if (not args.hpss) or (args.hpss.lower() == "none"):
args.hpss | args.hpss.lower() == "none" | args.non_blocking | original behavior | new behavior | change
---|---|---|---|---|---
T | T | T | args.hpss = "none", args.keep = True | args.hpss = "none", args.keep = True | N/A
T | T | F | args.hpss = "none" | args.hpss = "none", args.keep = True | Sets args.keep = True
T | F | T | args.keep = True | Nothing | No longer sets args.keep = True
T | F | F | Nothing | Nothing | N/A
F | N/A | T | args.keep = True | args.hpss = "none", args.keep = True | Sets args.hpss = "none"
F | N/A | F | Nothing | args.hpss = "none", args.keep = True | Sets args.hpss = "none", args.keep = True
Can you confirm these are the expected changes in behavior?
How did you arrive at the first two rows? Nothing in that code involves the status of "non-blocking".
Correct me if I'm wrong, but testing "if args.hpss" would only fail if the user included no "hpss" argument on the command line. That should be the same as "hpss=none" (unless some hidden config sets it elsewhere - I did not consider that).
In any case, (to my knowledge), the only time we intend to FORCE "keep" is when hpss=none. According to the "help" text, there is nothing that "non-blocking" (True or False) does to affect "keep".
Thus, rows 3 and 4 should not be seeing "keep = True" if the user did not specify keep.
I was looking at the combined behavior of
    if args.hpss and args.hpss.lower() == "none":
        args.hpss = "none"
    if args.non_blocking:
        args.keep = True
becoming
    if not args.hpss or args.hpss.lower() == "none":
        args.hpss = "none"
        args.keep = True
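One way to check a behavior table for this change is to run both versions of the logic over every input combination. The sketch below does that with a plain `argparse.Namespace`. The function names `old_logic`, `new_logic`, and `compare`, the starting value `keep=False`, and the path `"my/hpss/path"` are assumptions made for illustration and are not part of zstash.

```python
from argparse import Namespace
from itertools import product


def old_logic(args: Namespace) -> Namespace:
    # Behavior before the change, as quoted in the review comment.
    if args.hpss and args.hpss.lower() == "none":
        args.hpss = "none"
    if args.non_blocking:
        args.keep = True
    return args


def new_logic(args: Namespace) -> Namespace:
    # Behavior after the change: a missing or "none" hpss forces keep.
    if (not args.hpss) or (args.hpss.lower() == "none"):
        args.hpss = "none"
        args.keep = True
    return args


def compare(hpss, non_blocking):
    """Return ((old_hpss, old_keep), (new_hpss, new_keep)) for one combination."""
    old = old_logic(Namespace(hpss=hpss, non_blocking=non_blocking, keep=False))
    new = new_logic(Namespace(hpss=hpss, non_blocking=non_blocking, keep=False))
    return (old.hpss, old.keep), (new.hpss, new.keep)


if __name__ == "__main__":
    # "my/hpss/path" is a made-up value standing in for any real HPSS path.
    for hpss, non_blocking in product(["None", "my/hpss/path", None], [True, False]):
        old, new = compare(hpss, non_blocking)
        print(f"hpss={hpss!r:16} non_blocking={non_blocking!s:5} old={old} new={new}")
```

Each printed line corresponds to one combination of the three conditions, so unexpected differences between the old and new columns are easy to spot.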
> only fail if the user included no "hpss" argument on the command line.
Correct, and I don't think that is possible because we set it as required:
    required.add_argument(
        "--hpss",
        type=str,
        help=(
            'path to storage on HPSS. Set to "none" for local archiving. It also can be a Globus URL, '
            'globus://<GLOBUS_ENDPOINT_UUID>/<PATH>. Names "alcf" and "nersc" are recognized as referring to the ALCF HPSS '
            "and NERSC HPSS endpoints, e.g. globus://nersc/~/my_archive."
        ),
        required=True,
    )
> Thus, rows 3 and 4 should not be seeing "keep = True" if the user did not specify keep.
Ok, that makes sense.
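To illustrate what `required=True` guarantees, here is a minimal standalone sketch. The parser below is a stand-in built for this example, not the real zstash parser, and `build_parser` and `parse_or_none` are hypothetical helper names. With `required=True`, argparse exits when `--hpss` is omitted, so `args.hpss` is never `None` after a successful parse. An explicitly empty value still parses, which is one case where `not args.hpss` could be true.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Stand-in parser for illustration only; the help text is shortened.
    parser = argparse.ArgumentParser(prog="zstash create")
    required = parser.add_argument_group("required named arguments")
    required.add_argument(
        "--hpss",
        type=str,
        help='path to storage on HPSS. Set to "none" for local archiving.',
        required=True,
    )
    return parser


def parse_or_none(argv: List[str]) -> Optional[argparse.Namespace]:
    # argparse reports a missing required option by printing an error to
    # stderr and raising SystemExit; turn that into None for easy inspection.
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        return None


if __name__ == "__main__":
    print(parse_or_none([]))                  # missing --hpss
    print(parse_or_none(["--hpss", "none"]))  # explicit "none"
    print(parse_or_none(["--hpss", ""]))      # empty string is accepted
```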
@@ -157,6 +157,7 @@ def file_exists(name: str) -> bool:
             return True
     return False


+gv_push = 0
Why gv_push? A more descriptive name might be better. Maybe tar_file_count?
True, but it was just a way for me to track things. We could change it.
I wanted a variable to track "actual transfer submitted" (pushed), as opposed to just submitted to our globus_transfer() function, which may just add it to a pending transfer and return. I'll make it "gv_tarfiles_pushed".
@@ -215,7 +218,7 @@ def globus_transfer(  # noqa: C901
             fail_on_quota_errors=True,
         )
         transfer_data.add_item(src_path, dst_path)
-        transfer_data["label"] = subdir_label + " " + filename
+        transfer_data["label"] = label
Note to self: label is defined to be exactly the same thing above already.
Right.
for src_path in prev_transfers:
    os.remove(src_path)
prev_transfers = curr_transfers
curr_transfers = list()
You can just use = [] instead of = list().
I used to do that - but was cautioned against it (don't recall why). I'd be happy either way.
Hmm interesting, I wonder why. = [] definitely seems more "pythonic" to me, as is echoed on https://stackoverflow.com/questions/5790860/whats-the-difference-between-and-vs-list-and-dict.
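For what it's worth, the two spellings produce equal results. A small sketch of the difference in CPython, using the standard `dis` module: the literal compiles to a `BUILD_LIST` instruction, while `list()` looks up the name `list` and calls it. The helper names here are made up for the example.

```python
import dis


def new_list_literal():
    return []


def new_list_call():
    return list()


def opnames(func):
    # Names of the bytecode instructions CPython generated for func.
    return [ins.opname for ins in dis.get_instructions(func)]


if __name__ == "__main__":
    print(opnames(new_list_literal))
    print(opnames(new_list_call))
```

The exact instruction names vary between Python versions, so the printed lists are best read as a comparison rather than a fixed output.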
@@ -107,8 +114,17 @@ def setup_update() -> Tuple[argparse.Namespace, str]:
         help="Hard copy symlinks. This is useful for preventing broken links. Note that a broken link will result in a failed update.",
     )
     args: argparse.Namespace = parser.parse_args(sys.argv[2:])
-    if args.hpss and args.hpss.lower() == "none":
+
+    if not args.hpss or args.hpss.lower() == "none":
Parentheses, as in create, would be good: if (not args.hpss) or (args.hpss.lower() == "none"):
True. I was relying upon the default ("not" applies only to the very next argument). Also on short-circuit evaluation, where testing (A or B) never tests B when A is true, as it is unnecessary (useful when testing B might cause an exception).
I added the parentheses.
> Also on short-circuit evaluation, where testing (A or B) never tests B when A is true, as it is unnecessary (useful when testing B might cause an exception).
Yes, the parentheses are only for human readers. They shouldn't affect the code at all.
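A small sketch of both points, using only the standard library. `is_local_only` and `same_parse` are names invented for this example. The first function shows that the `or` never evaluates `hpss.lower()` when `hpss` is falsy; the second compares parsed expressions to show that the extra parentheses do not change how Python groups the condition.

```python
import ast
from typing import Optional


def is_local_only(hpss: Optional[str]) -> bool:
    # `not hpss` is checked first; when it is True, `or` short-circuits and
    # hpss.lower() is never called, so None cannot raise AttributeError here.
    return (not hpss) or (hpss.lower() == "none")


def same_parse(expr_a: str, expr_b: str) -> bool:
    # Parentheses are not stored in the AST, so two spellings that group the
    # same way produce identical dumps.
    tree_a = ast.dump(ast.parse(expr_a, mode="eval"))
    tree_b = ast.dump(ast.parse(expr_b, mode="eval"))
    return tree_a == tree_b


if __name__ == "__main__":
    print(is_local_only(None), is_local_only("None"), is_local_only("some/path"))
    print(same_parse('not h or h.lower() == "none"', '(not h) or (h.lower() == "none")'))
```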
args.hpss = "none"
args.keep - True
= True
Ah! That will make a difference! :) Good catch!
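To show why this typo is easy to miss, here is a minimal sketch assuming `keep` starts out as `False`. With `-` in place of `=`, the line is a valid expression statement: Python computes a subtraction, discards the result, and leaves `args.keep` unchanged without raising any error. The function names are hypothetical.

```python
from argparse import Namespace


def buggy_force_keep(args: Namespace) -> int:
    # Mirrors the typo: a subtraction whose result would normally be thrown
    # away. It is returned here only so the value can be inspected.
    return args.keep - True


def fixed_force_keep(args: Namespace) -> None:
    # The intended assignment.
    args.keep = True


if __name__ == "__main__":
    args = Namespace(keep=False)
    print(buggy_force_keep(args), args.keep)
    fixed_force_keep(args)
    print(args.keep)
```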
@forsyth2 Allow me to make some changes to address the clear mistakes above. Should take just a moment.
@TonyB9000 Can you push those changes? I've also reviewed the code logic; this looks good to me, aside from the already suggested changes.
Following the logic of the lists of transferred tars
    if transfer_type == "put":
        if not keep:
            if (scheme != "globus") or (
                globus_status == "SUCCEEDED"
            ):
                # Note: This is intended to fulfill the default removal of successfully-transfered
                # tar files when keep=False, irrespective of non-blocking status
                logger.info(f"{ts_utc()}: DEBUG: deleting transfered files {prev_transfers}")
                for src_path in prev_transfers:
                    os.remove(src_path)
                prev_transfers = curr_transfers
                curr_transfers = list()
Globus succeeded. We don't have to worry about these tars anymore; they've been transferred.
Earlier in
    curr_transfers.append(file_path)
which is how
Following the logic of `gv_push`
In
    # DEBUG: review accumulated items in TransferData
    logger.info(f"{ts_utc()}: TransferData: accumulated items:")
    attribs = transfer_data.__dict__
    for item in attribs["data"]["DATA"]:
        if item["DATA_TYPE"] == "transfer_item":
            gv_push += 1
            print(f" (routine) PUSHING (#{gv_push}) STORED source item: {item['source_path']}", flush=True)
Increment for every
In
    if transfer_data:
        # DEBUG: review accumulated items in TransferData
        logger.info(f"{ts_utc()}: FINAL TransferData: accumulated items:")
        attribs = transfer_data.__dict__
        for item in attribs["data"]["DATA"]:
            if item["DATA_TYPE"] == "transfer_item":
                gv_push += 1
                print(f" (finalize) PUSHING ({gv_push}) source item: {item['source_path']}", flush=True)
        # SUBMIT new transfer here
        logger.info(f"{ts_utc()}: DIVING: Submit Transfer for {transfer_data['label']}")
Again, increment for every
So,
We'll also need to fix the pre-commit check before merging.
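The two-list rotation in the quoted deletion code can be tried out in isolation. The following is a rough sketch under stated assumptions, not the zstash implementation: `TransferBatches`, `record`, and `on_success` are hypothetical names, and the Globus status check is reduced to a single call made once a transfer is known to have succeeded.

```python
import os
import tempfile
from typing import List


class TransferBatches:
    """Track tar files per transfer so local copies are deleted one batch late."""

    def __init__(self) -> None:
        self.prev_transfers: List[str] = []
        self.curr_transfers: List[str] = []

    def record(self, file_path: str) -> None:
        # Remember a tar file that was handed to the current transfer.
        self.curr_transfers.append(file_path)

    def on_success(self, keep: bool) -> List[str]:
        # Assumed to be called once a transfer has succeeded. Deletes the
        # previous batch, then rotates the lists, mirroring the quoted code.
        deleted: List[str] = []
        if not keep:
            for src_path in self.prev_transfers:
                os.remove(src_path)
                deleted.append(src_path)
            self.prev_transfers = self.curr_transfers
            self.curr_transfers = []
        return deleted


def touch(directory: str, name: str) -> str:
    # Create an empty placeholder file and return its path.
    path = os.path.join(directory, name)
    open(path, "w").close()
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        batches = TransferBatches()
        batches.record(touch(tmp, "000000.tar"))
        print(batches.on_success(keep=False))  # nothing in the previous batch yet
        batches.record(touch(tmp, "000001.tar"))
        print(batches.on_success(keep=False))  # removes the first tar
```

In this sketch a tar file is removed only on the success call after the one that follows its own submission, which is the one-batch lag that the `prev_transfers` and `curr_transfers` pair appears to provide.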
Summary
Unifies the non-blocking zstash behavior between both "create" and "update" operations.
Addresses issue #361,