Non block testing fix #363

Open: wants to merge 2 commits into base: main

TonyB9000 (Collaborator)

Summary

Unifies the non-blocking zstash behavior of the "create" and "update" operations.

Addresses issue #361.

TonyB9000 requested a review from forsyth2 on February 21, 2025 16:10

forsyth2 (Collaborator) left a comment

@TonyB9000 I left some initial review comments. I want to spend more time studying the code to understand how everything gets called and passed around, though.

@@ -92,7 +92,7 @@ def create():

# Transfer to HPSS. Always keep a local copy.
logger.debug(f"{ts_utc()}: calling hpss_put() for {get_db_filename(cache)}")
-hpss_put(hpss, get_db_filename(cache), cache, keep=True)
+hpss_put(hpss, get_db_filename(cache), cache, keep=args.keep)
Collaborator

This is specifically for archiving the database. I think we do want to always keep that, no?

Collaborator Author

Yes, I agree; that was a mistake. (But the database file always seems to remain in any case, which is a mystery.)

@@ -169,9 +169,8 @@ def setup_create() -> Tuple[str, argparse.Namespace]:
# Now that we're inside a subcommand, ignore the first two argvs
# (zstash create)
args: argparse.Namespace = parser.parse_args(sys.argv[2:])
-if args.hpss and args.hpss.lower() == "none":
+if not args.hpss or args.hpss.lower() == "none":
Collaborator

Parentheses just for clarity: if (not args.hpss) or (args.hpss.lower() == "none"):

| args.hpss | args.hpss.lower() == "none" | args.non_blocking | original behavior | new behavior | change |
|---|---|---|---|---|---|
| T | T | T | args.hpss = "none", args.keep = True | args.hpss = "none", args.keep = True | N/A |
| T | T | F | args.hpss = "none" | args.hpss = "none", args.keep = True | Sets args.keep = True |
| T | F | T | args.keep = True | Nothing | No longer sets args.keep = True |
| T | F | F | Nothing | Nothing | N/A |
| F | N/A | T | args.keep = True | args.hpss = "none", args.keep = True | Sets args.hpss = "none" |
| F | N/A | F | Nothing | args.hpss = "none", args.keep = True | Sets args.hpss = "none", args.keep = True |

Can you confirm these are the expected changes in behavior?

Collaborator Author

How did you arrive at the first two rows? Nothing in that code involves the "non-blocking" status.

Correct me if I'm wrong, but testing "if args.hpss" would only fail if the user included no "hpss" argument on the command line. That should be the same as "hpss=none" (unless some hidden config sets it elsewhere; I did not consider that).

In any case, to my knowledge, the only time we intend to FORCE "keep" is when hpss=none. According to the "help" text, nothing about "non-blocking" (True or False) should affect "keep".

Thus, rows 3 and 4 should not be seeing "keep = True" if the user did not specify keep.

Collaborator

I was looking at the combined behavior of

    if args.hpss and args.hpss.lower() == "none":
        args.hpss = "none"
    if args.non_blocking:
        args.keep = True

becoming

    if not args.hpss or args.hpss.lower() == "none":
        args.hpss = "none"
        args.keep = True
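To double-check the table against the two snippets, the sketch below runs both versions over one representative value per row. It assumes keep defaults to False (as an argparse store_true flag would) and uses types.SimpleNamespace as a stand-in for the parsed argparse.Namespace; original_logic and new_logic are hypothetical names, not zstash functions.

```python
from types import SimpleNamespace
from typing import Optional


def original_logic(hpss: Optional[str], non_blocking: bool) -> SimpleNamespace:
    # Pre-PR behavior as quoted above; keep=False is an assumed default.
    args = SimpleNamespace(hpss=hpss, non_blocking=non_blocking, keep=False)
    if args.hpss and args.hpss.lower() == "none":
        args.hpss = "none"
    if args.non_blocking:
        args.keep = True
    return args


def new_logic(hpss: Optional[str], non_blocking: bool) -> SimpleNamespace:
    # Behavior proposed in this PR, as quoted above.
    args = SimpleNamespace(hpss=hpss, non_blocking=non_blocking, keep=False)
    if (not args.hpss) or (args.hpss.lower() == "none"):
        args.hpss = "none"
        args.keep = True
    return args


# "None" is truthy and lowercases to "none", a real path is truthy but not
# "none", and "" is a falsy args.hpss: together they cover every table row.
for hpss in ["None", "/path/on/hpss", ""]:
    for non_blocking in (True, False):
        old = original_logic(hpss, non_blocking)
        new = new_logic(hpss, non_blocking)
        print(f"hpss={hpss!r:17} non_blocking={non_blocking!s:5} "
              f"old=({old.hpss!r}, keep={old.keep}) new=({new.hpss!r}, keep={new.keep})")
```

Each printed line corresponds to one row of the table; for example, a real path with non_blocking=True goes from keep=True to keep=False.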

Collaborator

> only fail if the user included no "hpss" argument on the command line.

Correct, and I don't think that is possible because we set it as required:

    required.add_argument(
        "--hpss",
        type=str,
        help=(
            'path to storage on HPSS. Set to "none" for local archiving. It also can be a Globus URL, '
            'globus://<GLOBUS_ENDPOINT_UUID>/<PATH>. Names "alcf" and "nersc" are recognized as referring to the ALCF HPSS '
            "and NERSC HPSS endpoints, e.g. globus://nersc/~/my_archive."
        ),
        required=True,
    )
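Because --hpss is required, argparse rejects a command line that omits it, so not args.hpss can only be true when the option is given an explicitly empty value (for example --hpss ""). The stand-alone parser below is a minimal sketch that checks both cases; it is not zstash's real parser, and build_parser, parse, and the group title are hypothetical.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in that mimics only the required --hpss option quoted above.
    parser = argparse.ArgumentParser(prog="demo-create")
    required = parser.add_argument_group("required named arguments")
    required.add_argument("--hpss", type=str, required=True)
    return parser


def parse(argv: List[str]) -> Optional[argparse.Namespace]:
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # argparse prints a usage error to stderr and exits when --hpss is missing.
        return None


print(parse([]))                         # None: the option was omitted
print(parse(["--hpss", "none"]).hpss)    # none
print(repr(parse(["--hpss", ""]).hpss))  # '': present, but falsy
```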

> Thus, rows 3 and 4 should not be seeing "keep = True" if the user did not specify keep.

Ok, that makes sense.

@@ -157,6 +157,7 @@ def file_exists(name: str) -> bool:
return True
return False

+gv_push = 0
Collaborator

Why gv_push? A more descriptive name might be better. Maybe tar_file_count?

Collaborator Author

True, but it was just a way for me to track things. We could change it.

I wanted a variable to track "actual transfer submitted" (pushed), as opposed to just submitted to our globus_transfer() function, which may just add it to a pending transfer and return. I'll make it "gv_tarfiles_pushed".

@@ -215,7 +218,7 @@ def globus_transfer( # noqa: C901
fail_on_quota_errors=True,
)
transfer_data.add_item(src_path, dst_path)
-transfer_data["label"] = subdir_label + " " + filename
+transfer_data["label"] = label
Collaborator

Note to self: label is already defined above as exactly this same value.

Collaborator Author

Right.

    for src_path in prev_transfers:
        os.remove(src_path)
    prev_transfers = curr_transfers
    curr_transfers = list()
Collaborator

You can just use = [] instead of = list().

Collaborator Author

I used to do that, but was cautioned against it (I don't recall why). I'd be happy either way.

Collaborator

Hmm, interesting; I wonder why. = [] definitely seems more "pythonic" to me, as is echoed on https://stackoverflow.com/questions/5790860/whats-the-difference-between-and-vs-list-and-dict.
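Either spelling creates a new, empty list, so the choice is stylistic here. What does matter in the snippet above is that curr_transfers is rebound to a new list rather than cleared in place, so prev_transfers keeps pointing at the old list. The sketch below illustrates this with plain lists; rotate is a hypothetical helper and the tar names are made up.

```python
from typing import List, Tuple


def rotate(prev: List[str], curr: List[str]) -> Tuple[List[str], List[str]]:
    # Same rebinding as "prev_transfers = curr_transfers; curr_transfers = []".
    prev = curr
    curr = []  # a brand-new empty list; list() would behave identically
    return prev, curr


rotated_prev, rotated_curr = rotate([], ["000000.tar", "000001.tar"])
print(rotated_prev, rotated_curr)  # ['000000.tar', '000001.tar'] []

# Contrast: clearing in place also empties the list that prev refers to.
curr = ["000002.tar"]
prev = curr        # both names now refer to the same list object
curr.clear()
print(prev)        # []

print([] == list())  # True
```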

@@ -107,8 +114,17 @@ def setup_update() -> Tuple[argparse.Namespace, str]:
help="Hard copy symlinks. This is useful for preventing broken links. Note that a broken link will result in a failed update.",
)
args: argparse.Namespace = parser.parse_args(sys.argv[2:])
-if args.hpss and args.hpss.lower() == "none":

+if not args.hpss or args.hpss.lower() == "none":
Collaborator

Parentheses, as in create, would be good: if (not args.hpss) or (args.hpss.lower() == "none"):

Collaborator Author

True. I was relying upon the default precedence ("not" applies only to the very next operand). I was also relying on the short-circuit, where testing (A or B) never tests B when A is true, as it is unnecessary (useful when testing B might cause an exception).

I added the parentheses.

Collaborator

> Also to the shortcut-pass where testing (A or B) never tests B when A is true, as it is unnecessary (useful when testing B might cause an exception).

Yes, the parentheses are only for human readers. They shouldn't affect the code at all.
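Both points can be checked directly: not binds more tightly than or, so the parenthesized and unparenthesized forms are the same expression, and or short-circuits, so hpss.lower() is never evaluated when hpss is None or empty. In this minimal sketch, is_local is a hypothetical helper, not part of zstash.

```python
from typing import Optional


def is_local(hpss: Optional[str]) -> bool:
    # "not" binds tighter than "or", so both spellings are the same expression.
    unparenthesized = not hpss or hpss.lower() == "none"
    parenthesized = (not hpss) or (hpss.lower() == "none")
    assert unparenthesized == parenthesized
    # When hpss is None or "", "or" stops after its left operand, so
    # hpss.lower() never runs and no AttributeError is raised.
    return parenthesized


print(is_local(None))                           # True
print(is_local("NONE"))                         # True
print(is_local("globus://nersc/~/my_archive"))  # False
```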

args.hpss = "none"
args.keep - True
Collaborator

This should be = True (an assignment), not - True.

Collaborator Author

Ah! That will make a difference! :) Good catch!
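The typo is easy to miss because it does not necessarily raise an error: args.keep - True is a valid expression whose result is simply discarded, so args.keep silently keeps its previous value. A minimal sketch, assuming keep defaults to False (SimpleNamespace stands in for the parsed argparse.Namespace):

```python
from types import SimpleNamespace

# Stand-in for the parsed arguments; keep=False is an assumed default.
args = SimpleNamespace(hpss="none", keep=False)

args.keep - True  # the typo: evaluates False - True (== -1) and throws it away
print(args.keep)  # False: nothing was assigned

args.keep = True  # the intended assignment
print(args.keep)  # True
```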

TonyB9000 (Collaborator Author)

@forsyth2 Allow me to make some changes to address the clear mistakes above. Should take just a moment.

forsyth2 (Collaborator) commented Mar 4, 2025

> Allow me to make some changes to address the clear mistakes above. Should take just a moment.

@TonyB9000 Can you push those changes?

I've also reviewed the code logic; this looks good to me, aside from the already suggested changes.

Following the logic of the lists of transferred tars

hpss_utils.add_files -> hpss.hpss_put -> hpss.hpss_transfer:

        if transfer_type == "put":
            if not keep:
                if (scheme != "globus") or (
                    globus_status == "SUCCEEDED"
                ):
                    # Note: This is intended to fulfill the default removal of successfully transferred
                    # tar files when keep=False, irrespective of non-blocking status
                    logger.info(f"{ts_utc()}: DEBUG: deleting transfered files {prev_transfers}")
                    for src_path in prev_transfers:
                        os.remove(src_path)
                    prev_transfers = curr_transfers
                    curr_transfers = list()

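Globus succeeded (or the transfer was not via Globus). We don't have to worry about the tars in prev_transfers anymore; they've been transferred.
Delete them, then move curr_transfers into prev_transfers and start a new, empty curr_transfers.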

Earlier in hpss.hpss_transfer, we saw:

curr_transfers.append(file_path)

which is how curr_transfers builds up the list of tars currently being transferred.
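The bookkeeping described above can be simulated without HPSS or Globus. In this minimal sketch, TransferLists is hypothetical; succeeded stands in for (scheme != "globus") or (globus_status == "SUCCEEDED"); each call appends before checking, as in hpss_transfer; and deletions are recorded instead of calling os.remove().

```python
from typing import List


class TransferLists:
    """Hypothetical stand-in for the prev/curr transfer lists in hpss.py."""

    def __init__(self) -> None:
        self.prev: List[str] = []
        self.curr: List[str] = []
        self.deleted: List[str] = []  # records what os.remove() would have removed

    def put(self, tar: str, succeeded: bool, keep: bool = False) -> None:
        self.curr.append(tar)  # same role as curr_transfers.append(file_path)
        if not keep and succeeded:
            # Delete only the previous batch, then rotate the lists.
            self.deleted.extend(self.prev)
            self.prev = self.curr
            self.curr = []


lists = TransferLists()
lists.put("000000.tar", succeeded=True)
lists.put("000001.tar", succeeded=False)
lists.put("000002.tar", succeeded=True)
print(lists.deleted, lists.prev, lists.curr)
# ['000000.tar'] ['000001.tar', '000002.tar'] []
```

In this simulation a tar is deleted only on a later successful call, one batch behind the transfer that carried it.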

Following the logic of `gv_push`

In globus.globus_transfer:

        # DEBUG: review accumulated items in TransferData
        logger.info(f"{ts_utc()}: TransferData: accumulated items:")
        attribs = transfer_data.__dict__
        for item in attribs["data"]["DATA"]:
            if item["DATA_TYPE"] == "transfer_item":
                gv_push += 1
                print(f"   (routine)  PUSHING (#{gv_push}) STORED source item: {item['source_path']}", flush=True)

Increment for every transfer_item we encounter.

In globus.globus_finalize:

    if transfer_data:
        # DEBUG: review accumulated items in TransferData
        logger.info(f"{ts_utc()}: FINAL TransferData: accumulated items:")
        attribs = transfer_data.__dict__
        for item in attribs["data"]["DATA"]:
            if item["DATA_TYPE"] == "transfer_item":
                gv_push += 1
                print(f"    (finalize) PUSHING ({gv_push}) source item: {item['source_path']}", flush=True)

        # SUBMIT new transfer here
        logger.info(f"{ts_utc()}: DIVING: Submit Transfer for {transfer_data['label']}")

Again, increment for every transfer_item we encounter.

gv_push is only ever incremented, never reset to 0. From Tony:

> I wanted a variable to track "actual transfer submitted" (pushed), as opposed to just submitted to our globus_transfer() function, which may just add it to a pending transfer and return.

So, gv_push simply counts the number of transfer_items encountered throughout the entire run.
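Because gv_push is a module-level variable that is rebound inside functions, each function that increments it needs a global gv_push declaration; that line is not visible in the excerpts above, and without it gv_push += 1 raises UnboundLocalError. The sketch below reproduces the counting with plain dicts shaped like the ["DATA"] list the DEBUG loops walk; count_pushed and the sample paths are hypothetical, and no Globus SDK objects are involved.

```python
from typing import Any, Dict, List

gv_push = 0  # module-level counter, as in the PR (never reset)


def count_pushed(payload: Dict[str, Any], tag: str) -> None:
    """Hypothetical helper mirroring the DEBUG loops quoted above."""
    global gv_push  # required to rebind the module-level name inside a function
    items: List[Dict[str, Any]] = payload["DATA"]
    for item in items:
        if item["DATA_TYPE"] == "transfer_item":
            gv_push += 1
            print(f"({tag}) PUSHING (#{gv_push}) source item: {item['source_path']}")


# Plain dicts shaped like the ["DATA"] list; not real TransferData objects.
batch_1 = {"DATA": [{"DATA_TYPE": "transfer_item", "source_path": "/cache/000000.tar"}]}
batch_2 = {"DATA": [
    {"DATA_TYPE": "transfer_item", "source_path": "/cache/000001.tar"},
    {"DATA_TYPE": "transfer_item", "source_path": "/cache/000002.tar"},
]}
count_pushed(batch_1, "routine")
count_pushed(batch_2, "finalize")
print(gv_push)  # 3: a running total for the whole run
```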

forsyth2 (Collaborator) commented Mar 4, 2025

We'll also need to fix the pre-commit check before merging.

Labels: None yet
Projects: None yet

2 participants