Adopt OS-native Filesystem Location for File Caching #777

Merged
merged 8 commits into from
Jul 13, 2022

Conversation

rmartin16
Member

@rmartin16 rmartin16 commented Jul 1, 2022

Fixes #374.

Scope

Store and maintain file caches in an OS-native cache directory and provide a migration for existing users.

Design

The appdirs package provides a simple API to access a (hopefully) unique filesystem directory to store and maintain file caches.

Linux: ~/.local/share/briefcase
macOS: ~/Library/Application Support/briefcase
Windows: ~\AppData\Local\BeeWare\briefcase

Interestingly, BeeWare is only incorporated into the cache file path for Windows... Based on the design of the appdirs API, we could simply call appdirs.user_cache_dir() without specifying appname or appauthor to get the base cache location for the platform and manually append BeeWare/briefcase from there.
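To make the per-platform difference concrete, here is a minimal stdlib-only sketch that reproduces the three locations listed above. It does not call appdirs; `native_data_dir` is a hypothetical helper that hard-codes the conventions and ignores environment overrides, so treat it as an illustration rather than what the library does internally.

```python
from pathlib import PurePosixPath, PureWindowsPath


def native_data_dir(platform, home, appname="briefcase", appauthor="BeeWare"):
    """Return the per-user directory for `appname` on `platform`.

    Hypothetical helper: it hard-codes the three locations listed in the
    PR description and ignores overrides such as XDG_DATA_HOME.
    """
    if platform == "linux":
        return PurePosixPath(home) / ".local" / "share" / appname
    if platform == "darwin":
        return PurePosixPath(home) / "Library" / "Application Support" / appname
    if platform == "win32":
        # Only the Windows layout includes the vendor (appauthor) segment.
        return PureWindowsPath(home) / "AppData" / "Local" / appauthor / appname
    raise ValueError(f"unsupported platform: {platform!r}")
```

Passing the platform and home directory explicitly keeps the sketch deterministic; real code would ask the library for the path instead of rebuilding it.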

When a user runs briefcase with this change, it will notify users of the data directory change.

Additionally, cookiecutter uses two directories in the user's home directory. This can be changed, though, via cookiecutter's config file. If it's useful, I can try to get cookiecutter to use this new cache as well.

I'll add and fix tests once the design is agreed on.

PR Checklist:

  • All new features have been tested
  • All new features have been documented
  • I have read the CONTRIBUTING.md file
  • I will abide by the code of conduct

@rmartin16 rmartin16 force-pushed the dirs branch 2 times, most recently from d28e99c to 901935e Compare July 1, 2022 20:27
@rmartin16
Member Author

rmartin16 commented Jul 1, 2022

I did just realize that the migration could be attempted again at a later date if ~/.briefcase sticks around and the new cache directory is somehow deleted. The new cache just disappearing seems entirely plausible since it lives in a directory tree dedicated to caches that could presumably be deleted without meaningful loss of data.

That might make it necessary to more forcibly tell users to manually delete ~/.briefcase or allow users an option for briefcase to delete that directory even if the migration fails or is skipped for other reasons....

Or maybe something more extreme like specifying a config directory for briefcase to specify this migration took place and should not be attempted again.....akin to a django migration sorta....

Member

@freakboy3742 freakboy3742 left a comment

Broadly looks good (although there's obviously tests and a changelog file still needed).

Some high level comments:

  1. Is cache path the right option?
  • macOS File System Programming Guide says "app created support files" go in app support; data caches should include "transient, downloadable content"; the system may delete the Caches directory, so your app must be able to recreate or download files as needed. The contents of ~/.briefcase could be almost entirely reproduced, as it's downloads; but it is also user specific, and I'm not sure Briefcase would take too kindly to having parts of its files deleted (especially if parts of the cache directory are going to be deleted). It's definitely not transient.
  • The XDG specification says user-specific non-essential data files should be stored in caches; "user specific data files" go in data. Briefcase's downloads are definitely essential.
  • The difference on Windows is a subdirectory, so I doubt there's a policy distinction.
  2. Is migration actually needed at all? I know I mentioned migration in my initial comment, but the example I give (Android emulator configs) won't actually be fixed by this; so it's worth questioning whether a migration is needed. Briefcase is still in the early stages of development. Adding the migration scheme is definitely a nice user affordance; but it comes at the cost of us needing to maintain a body of code that will improve the lives of a relatively small segment of users - people who are actively using Briefcase today. We would also need to make a decision about when we remove the migration support... which, frankly, if we did that after a single point release would probably have fixed every configuration that is likely to be affected. I'm not opposed to adding migration if folks think it's worthwhile; I just don't want to end up going down the path of maintaining it because of a quick comment 2+ years ago.
  3. I'm not concerned about macOS/Linux not having a vendor name in the path. I'm not sure we gain anything in particular by having that additional layer. However, I think we should continue to use the AppDirs API like it suggests on the box, even if the vendor detail isn't actually used on most platforms.
  4. The mainline seems like a reasonable place to put the migration; but we will want to put comments in the code that let us know what we're migrating and why, and what the condition is for removing the migration in the future.

migration_cookie.touch()
try:
    with self.input.wait_bar("Copying caches..."):
        self.shutil.copytree(dot_briefcase_path, self.cache_path)
Member

Is there a "move" vs "copy" option we should consider here? The briefcase path could easily be 2+GB... that's a lot of data to duplicate.

Member Author

A move should be possible....my preference for copying was that it is easier to recover from if something goes wrong midway. Although, I think the larger discussion on the need for a migration may make this moot.
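The trade-off between the two approaches can be sketched with the standard library. This is only an illustration under stated assumptions: `copy_then_remove` and `move` are hypothetical names, not functions from the PR, and the sketch ignores progress reporting and partial-failure cases beyond cleaning up an incomplete copy.

```python
import os
import shutil
from pathlib import Path


def copy_then_remove(old_dir: Path, new_dir: Path) -> None:
    """Copy old_dir to new_dir; delete old_dir only once the copy succeeded.

    Temporarily needs disk space for both copies, but the original stays
    intact if anything fails midway.
    """
    if new_dir.exists():
        raise FileExistsError(f"{new_dir} already exists")
    try:
        shutil.copytree(old_dir, new_dir)
    except Exception:
        # Discard the partial copy; the original has not been touched.
        shutil.rmtree(new_dir, ignore_errors=True)
        raise
    shutil.rmtree(old_dir)


def move(old_dir: Path, new_dir: Path) -> None:
    """Move old_dir to new_dir.

    On the same filesystem this is a rename and duplicates no data; across
    filesystems shutil.move falls back to copying and then deleting.
    """
    shutil.move(os.fspath(old_dir), os.fspath(new_dir))
```

For a directory that could be 2+GB, the rename path of `move` avoids the duplication concern raised above, while the copy variant is the one that is easier to recover from.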

@rmartin16
Member Author

rmartin16 commented Jul 5, 2022

Is cache path the right option?

I vacillated between using the cache or data dirs in this initial implementation....even at one point putting only the tools directory in the data dir. Given that's where the bulk of the data is anyway and users are certainly not going to have the best experience if this data is deleted, the data dir is the best choice. (This is especially true if an OS could delete just part of a cache directory.)

Is migration actually needed at all?

I see 3 options:

  1. Do nothing
  • Related to your comment about "move vs copy", this option basically guarantees existing users will have two copies of this potentially large data set.
  2. Delete ~/.briefcase and allow normal use to recreate the data files in the new location
  • This is the most straightforward transition due to its simplicity and avoids occupying twice as much space as necessary.
  • This choice could also be paired with directing the user to manually perform the migration if they want.
  • I'm imagining the implementation of this to be a prompt to the user to have ~/.briefcase automatically deleted & everything re-downloaded or to quit & "Copy the contents of ~/.briefcase to /path/to/briefcase/<data dir>" manually before continuing.
  3. Migrate the data files from ~/.briefcase to the new location
  • This is obviously the most complex but most user friendly.
  • I have already largely implemented this....but it is not exactly simple, tries to handle several corner cases, and therefore likely has problems I haven't yet identified.

Ultimately, I think this is an executive decision for the briefcase maintainers since they will inherit all responsibility of support...in reality, I think that's primarily you :)

FWIW, (as you said) given this project is still in early dev and has a relatively small user base, I think option 2 strikes the best balance. I'd expect most users to have fast enough internet access to simply re-download everything without too much delay (that said, I've had gigabit internet for a while and may have forgotten the pain)....but a simple backdoor of telling the user they can move the data themselves technically provides a straightforward way for users to avoid repeating these downloads.

I am also not 100% clear if the migration is "safe". That is, if everything being migrated will still work in a new location. Based on my testing on Windows for Android, my existing emulators seemed to still work okay. Additionally, everything I've seen seems to be relative-path based suggesting a move wouldn't be detrimental.

Finally, want to bump the cookiecutter comments since we will still be putting things in to a home dot-directory via using it.

@rmartin16 rmartin16 force-pushed the dirs branch 3 times, most recently from 59f25b8 to 18793d7 Compare July 5, 2022 19:26
@rmartin16 rmartin16 requested a review from freakboy3742 July 6, 2022 16:26
@freakboy3742
Member

Is migration actually needed at all?

  1. Do nothing
    ...
  2. Delete ~/.briefcase and allow normal use to recreate the data files in the new location
    ...
    FWIW, (as you said) given this project is still in early dev and has a relatively small user base, I think option 2 strikes the best balance. I'd expect most users to have fast enough internet access to simply re-download everything without too much delay (that said, I've had gigabit internet for a while and may have forgotten the pain)....but a simple backdoor of telling the user they can move the data themselves technically provides a straightforward way for users to avoid repeating these downloads.

I think a hybrid of 1 and 2 might be best. If the startup logic is:

if .briefcase exists:
    if new data location exists:
        Warn the user they haven't cleaned up, suggest cleanup
    else:
        Warn the user we're going to start in a new location, suggest cleanup or migration
        Create the new data location
    prompt the user to continue or cancel

then Briefcase won't ever be destructive, but the user will be clearly advised that (a) there's a migration path available, or (b) they haven't followed the migration advice.
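The startup logic above translates fairly directly into Python. The following is a minimal sketch, not the code that was merged: `startup_check` and the `confirm` callable are hypothetical names, and the message wording is invented for illustration.

```python
from pathlib import Path


def startup_check(dot_briefcase: Path, data_path: Path, confirm) -> bool:
    """Return True if the command should continue, False if the user cancels.

    `confirm` is a hypothetical callable that displays a message and returns
    True (continue) or False (cancel). Nothing is ever deleted here.
    """
    if not dot_briefcase.exists():
        # Nothing left over from the old layout, so there is nothing to say.
        return True

    if data_path.exists():
        message = f"{dot_briefcase} has not been cleaned up; consider deleting it."
    else:
        message = (
            f"Briefcase now stores its data in {data_path}. "
            f"Delete {dot_briefcase}, or move its contents to the new location."
        )
        # Create the new location so later runs take the cleanup branch.
        data_path.mkdir(parents=True)

    return confirm(message)
```

Because the function only warns and creates a directory, the worst outcome of a mistaken run is an extra prompt, which is the non-destructive property described above.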

I am also not 100% clear if the migration is "safe". That is, if everything being migrated will still work in a new location. Based on my testing on Windows for Android, my existing emulators seemed to still work okay. Additionally, everything I've seen seems to be relative-path based suggesting a move wouldn't be detrimental.

Yeah - that's my biggest concern with the migration approach. To paraphrase Tolstoy - all working systems are alike; but every broken system is broken in its own way. And diagnosing those edge cases is going to be painful.

Finally, want to bump the cookiecutter comments since we will still be putting things in to a home dot-directory via using it.

One possible improvement - it's possible to configure the cookiecutter directories (cookiecutters_dir and replay_dir). If we configure Briefcase to use a briefcase-specific cookiecutter folder, that would let us differentiate Briefcase cookiecutters from other cookiecutters that the user has, and makes it easier to clearly declare Briefcase bankruptcy by deleting the briefcase data folder.
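As a rough sketch of that idea, the snippet below writes a cookiecutter user config that places both directories under the Briefcase data folder. The file name `cookiecutter.yaml`, the subfolder names, and the helper `write_cookiecutter_config` are assumptions made for illustration; only the `cookiecutters_dir` and `replay_dir` keys come from the comment above. How the file is then handed to cookiecutter (for example through its config file option) is left out.

```python
import json
from pathlib import Path


def write_cookiecutter_config(data_path: Path) -> Path:
    """Write a cookiecutter config keeping its folders under data_path."""
    config_file = data_path / "cookiecutter.yaml"  # hypothetical file name
    settings = {
        "cookiecutters_dir": data_path / "cookiecutters",  # assumed subfolder
        "replay_dir": data_path / "cookiecutter_replay",  # assumed subfolder
    }
    data_path.mkdir(parents=True, exist_ok=True)
    # A JSON string is also a valid YAML double-quoted scalar for ordinary
    # paths, so json.dumps quotes spaces and backslashes safely.
    lines = [f"{key}: {json.dumps(str(path))}" for key, path in settings.items()]
    config_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return config_file
```

With everything under one folder, deleting the Briefcase data directory also removes the Briefcase-specific cookiecutter templates and replays.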

Member

@freakboy3742 freakboy3742 left a comment

Fairly uncontroversial, once the core of the migration decision is made. Other than that, one fairly minor tweak inline.

@@ -268,19 +270,18 @@ def run(self, args, env=None, **kwargs):
         docker_args.append(self.command.docker_image_tag(self.app))

         # ... then add the command (and its arguments) to run in the container
-        for arg in args:
-            arg = str(arg)
+        for arg in map(str, args):
Member

Very elegant 👍
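For readers unfamiliar with the idiom, this small standalone sketch shows that the two loop forms yield the same strings. The argument values are made up for the example and are not taken from the PR.

```python
from pathlib import PurePosixPath

# Made-up arguments mixing strings, paths and numbers.
args = ["briefcase", PurePosixPath("/app/src"), 42]

# Original form: convert explicitly inside the loop body.
converted_loop = []
for arg in args:
    arg = str(arg)
    converted_loop.append(arg)

# Revised form: convert as part of the iteration itself.
converted_map = []
for arg in map(str, args):
    converted_map.append(arg)
```

`map(str, args)` is lazy, so each element is converted only when the loop reaches it, and the loop variable is never rebound inside the body.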

os.fsdecode(self.command.dot_briefcase_path),
"/home/brutus/.briefcase",
os.fsdecode(self.command.tools_path),
f"{docker_data_path / 'tools'}",
Member

Why replace the tools path specifically, rather than the more generic base data path? I can't think of a specific case where this will be a problem, but it seems weird to specifically do a substitution on a path that is known to be a subdirectory.

Member Author

Ahh, good catch. I think this carried over when I was experimenting with the support and tools directories being in separate base directories. Updated to remap the single base directory.

@@ -143,6 +145,85 @@ def __init__(self, base_path, home_path=Path.home(), apps=None, input_enabled=Tr
self.logger = Log()
self.save_log = False

def migrate_to_os_cache(self):
Member

I haven't reviewed this method, on the basis that it's likely going to be gutted based on the discussion about the light-touch migration strategy.

@rmartin16 rmartin16 force-pushed the dirs branch 9 times, most recently from eb92c6c to 80cffc4 Compare July 12, 2022 22:48
Member

@freakboy3742 freakboy3742 left a comment

I've flagged one minor testing issue, but I'm happy to fix that in the merge process. Thanks for another awesome contribution!

cmd.data_path = tmp_path / "data_dir"
cmd.data_path.mkdir(parents=True)
cmd.input.boolean_input = MagicMock()
cmd.input.boolean_input.return_value = True
Member

This won't be called - we should add a check to verify there won't be any user input requested.

Member Author

Added check that boolean_input() is not called.
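A check of this kind can be written with `unittest.mock`. The sketch below is self-contained and uses a hypothetical stand-in, `maybe_prompt`, in place of the real method under test, so it shows the assertion pattern rather than the actual Briefcase test.

```python
from unittest.mock import MagicMock


def maybe_prompt(old_dir_exists: bool, boolean_input) -> bool:
    """Hypothetical stand-in for the code under test.

    Prompts only when the old directory is still present.
    """
    if old_dir_exists:
        return boolean_input("Continue?")
    return True


boolean_input = MagicMock(return_value=True)

# No old directory: the command proceeds without asking anything.
assert maybe_prompt(False, boolean_input) is True

# Fails with AssertionError if the mock was called at any point.
boolean_input.assert_not_called()
```

Asserting on the mock, rather than only setting its return value, is what turns "this won't be called" from an expectation into something the test enforces.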

@rmartin16
Member Author

I added a mkdir command to the end of the data dir check to ensure the new data directory exists. Since not all commands will create this directory, the user could otherwise get the full notice multiple times.

Moreover, during my testing, I noticed that if briefcase was invoked in the right way at the right time, then Docker could create the data directory as owned by root. Given this is possible, it may be good to integrate a mkdir into the BaseCommand's initialization when this check is removed.

Finally, I'm not a big fan of a log file being created if the user chooses not to continue....but avoiding that felt like a lot more work than it was worth...

Comment on lines 177 to 195
** NOTICE: Briefcase is changing it's data directory **
*************************************************************************

Briefcase is moving it's data directory from:

{dot_briefcase_path}

to:

{self.data_path}

If you continue, Briefcase will re-download the tools and data it
uses to build and package applications.

To avoid potentially large downloads and long installations, you
can manually move the old data directory to the new location.

If you continue and allow Briefcase to re-download it's tools, the
old data directory can be safely deleted.
Member Author

minor typos (3 occurrences): it's -> its

Member

/me grumbles... :-)

@codecov

codecov bot commented Jul 13, 2022

Codecov Report

Merging #777 (75b4e41) into main (c58095e) will increase coverage by 0.08%.
The diff coverage is 100.00%.

Impacted Files Coverage Δ
src/briefcase/commands/create.py 99.62% <ø> (ø)
src/briefcase/integrations/java.py 100.00% <ø> (ø)
src/briefcase/integrations/wix.py 100.00% <ø> (ø)
src/briefcase/commands/base.py 98.32% <100.00%> (+0.08%) ⬆️
src/briefcase/exceptions.py 97.59% <100.00%> (+2.59%) ⬆️
src/briefcase/integrations/docker.py 92.40% <100.00%> (ø)

@freakboy3742 freakboy3742 merged commit baf2942 into beeware:main Jul 13, 2022
@rmartin16 rmartin16 deleted the dirs branch July 13, 2022 01:53
@dgelessus

Thank you for implementing this! I have a couple of comments - unfortunately I only got notified when the original issue was resolved, so these are a bit late now...

The appdirs library hasn't received updates since 2020 and its maintenance status is unclear. There is a fork, platformdirs, which is actively updated and seems to be what most of the Python ecosystem uses nowadays. (Anecdata: my dev machine has platformdirs installed and not appdirs, and pip now vendors platformdirs rather than appdirs.)

I was going to complain about not using the cache directories, but after reading the Apple and XDG documentation linked above, I think you're actually right with choosing the normal "user data" directories. Briefcase's tools are indeed more than just caches, in that they are directly used during normal operation. It would be more correct to put downloaded archives into the cache directory, but AFAICT Briefcase already deletes all archives after they've been extracted, so this isn't a problem in practice.

(That said, I don't think there's a real danger of macOS randomly deleting only some files in the cache directory - the documentation says that "the system may delete the Caches/ directory", not "may delete files within Caches/". In any case, this only happens when you're quite low on disk space.)

I'm a bit disappointed that I'll still have to manually exclude the Briefcase data from my backups, but there's no easy solution to that I think - there are no standardized directories that are less transient than the cache directory, but still considered "unimportant" enough to not be backed up. At least on Windows the Briefcase tools will no longer be roamed across machines, which is good.

Finally, thanks for prompting the user before redownloading all of the files. I'm lucky enough that my home internet connection has decent bandwidth and no data cap, but it's not that way for everyone or all the time sadly. So the option to migrate the downloads is appreciated, even if it's a manual process.

@freakboy3742
Member

@dgelessus Thanks for that heads up about platformdirs - that seems like a change that might be worth making, especially if that's what pip is relying on (even if that is via vendoring)

rmartin16 added a commit to rmartin16/briefcase that referenced this pull request Jul 14, 2022
@rmartin16 rmartin16 mentioned this pull request Jul 14, 2022
4 tasks
Development

Successfully merging this pull request may close these issues.

Use appdirs to determine download cache directory instead of hardcoding ~/.briefcase
3 participants