Update optimizations #496
@marc-hb how do you think these options should play together with submodules? |
First I would like to say the new options and command line interface seem to capture perfectly all the sometimes contradicting requirements that were expressed in old, long, convoluted and confusing discussions scattered across multiple GitHub issues (and I plead guilty for some of that). Of course the devil is always in the details and time will tell but the first impression of the user interface is: wow, what still stands between west and world domination?

Now back to your question, the long story short is: I'm not sure. After months (years?) discussing git optimizations in west, you just made me realize I never asked myself these same questions about submodules. Maybe because I try to avoid them as much as possible?

I'm not sure how caching will work because I haven't looked at this PR yet and will probably not have any time this week. Note git submodules have for quite some time been using their parent's .git/ directory instead of their own, not sure whether that will help. Worst case west caching should cache nothing submodule related.

I just had a look
Submodules don't seem to have any

Sorry for not having much more than only superficial compliments, I'll try to get back to my usual negative self ASAP! |
@czeslawmakarski and @KaSroka please test and review |
Looks good, but I think it would be valuable if the fetch / narrow / cache options can be specified with |
OK, let's bikeshed that before I implement:
|
I prefer to leave |
I think so too. Thanks. |
sure, let's start simple so feel free to leave out |
I kind of disagree, especially for the |
I firmly disagree for the
Emphasis mine. BTW this highlights an important mismatch between
Besides poor performance[*] and unexpectedly breaking basic git features, misunderstandings are yet another reason to avoid shallow clones. They're really a hack. As a coincidence, I just experienced this error today after long minutes of git silence. No idea what it means:
[*] Try |
Now added. |
This allows the user to clone projects for the first time from existing local repositories instead of fetching from the network.

For an example, consider this west.yml:

  manifest:
    defaults:
      remote: myorg
    remotes:
      - name: myorg
        url-base: https://github.com/myorg
    projects:
      - name: foo
        path: modules/foo
        revision: deadbeef
      - name: bar
        path: modules/bar
        revision: abcd1234

Suppose the 'foo' and 'bar' repositories are already available in a /var/my-name-cache directory, like this (.git locations shown for clarity):

  /var/my-name-cache
  ├── bar
  │   └── .git
  └── foo
      └── .git

You can then run:

  west update --name-cache /var/my-name-cache

And the 'foo' and 'bar' projects will be cloned from '/var/my-name-cache/foo' and '/var/my-name-cache/bar', respectively, before being updated.

NOTE: It's /var/my-name-cache/foo, NOT /var/my-name-cache/modules/foo. Project names are unique in a west manifest, so putting them in the top level directory is safe. Doing it this way lets you use --name-cache without duplicating cache repositories if project paths might move around.

By contrast, if you had this /var/my-path-cache directory:

  my-path-cache
  └── modules
      ├── bar
      │   └── .git
      └── foo
          └── .git

You could run:

  west update --path-cache /var/my-path-cache

And get a similar result: the 'foo' and 'bar' projects will be cloned from '/var/my-path-cache/modules/foo' and '/var/my-path-cache/modules/bar', respectively.

Using either option, west update won't hit the network at all if the 'deadbeef' and 'abcd1234' SHAs are already available locally and the (default) '--fetch=smart' strategy for west update is used.

If both options are given, both caches are checked; --name-cache is checked first.

Signed-off-by: Martí Bolívar <marti.bolivar@nordicsemi.no>
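The lookup order described in this commit message can be sketched roughly as follows. This is only an illustration of the stated rules (name cache keyed by project name, path cache keyed by project path, name cache checked first); the function name find_cache_dir and its parameters are hypothetical and are not taken from west's source.

```python
from pathlib import Path
from typing import Optional


def find_cache_dir(name: str, path: str,
                   name_cache: Optional[str] = None,
                   path_cache: Optional[str] = None) -> Optional[Path]:
    """Return the first existing cache directory for a project, or None.

    Hypothetical helper for illustration; not west's actual code.
    """
    candidates = []
    if name_cache is not None:
        # Name cache: <cache>/<project name>, e.g. /var/my-name-cache/foo
        candidates.append(Path(name_cache) / name)
    if path_cache is not None:
        # Path cache: <cache>/<project path>, e.g. /var/my-path-cache/modules/foo
        candidates.append(Path(path_cache) / path)
    # The name cache candidate comes first, so it wins when both exist.
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return None
```

With the example manifest, find_cache_dir('foo', 'modules/foo', name_cache='/var/my-name-cache') would point at /var/my-name-cache/foo if that directory exists, and a caller would fall back to the network when the result is None.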
This moves the definition closer to the point of use, allows us to move the --stats bookkeeping into the method itself, and makes it more obvious that 'update' is the only command that ought to be hitting the network to fetch revisions. For now, we don't need any instance data, but that will change in a subsequent patch. Signed-off-by: Martí Bolívar <marti.bolivar@nordicsemi.no>
GitHub now does allow fetching SHAs. Signed-off-by: Martí Bolívar <marti.bolivar@nordicsemi.no>
Update.do_run() stashes the command line arguments in self.args almost before doing anything else. There's no need to pass it around or repeat that work. Signed-off-by: Martí Bolívar <marti.bolivar@nordicsemi.no>
Replace some unused things with _. Signed-off-by: Martí Bolívar <marti.bolivar@nordicsemi.no>
This can be used to influence the way west fetches. The motivating use case is to fetch with --depth=1. Since that's the case, the new option can be used to override any 'clone-depth' specified in the manifest for a project. Remove the extra 'with --depth=foo' in the output when the project has a clone depth accordingly, as this will be confusing if the user overrides it. However, this is only part of the story to make this optimization truly useful. We also need to allow the user to fetch something that might be a SHA directly. Without that, running 'west update -o --depth=1' on a project with a SHA as the revision will sync the entire remote ref space, including all tags, with --depth=1. Not quite what we're hoping for in terms of limiting network bandwidth in that case. Signed-off-by: Martí Bolívar <marti.bolivar@nordicsemi.no>
This option tells west to fetch project.revision exactly, and nothing else. This skips fetching tags, and fetches the revision directly even if it looks like a SHA, which may not work depending on the git host. Signed-off-by: Martí Bolívar <marti.bolivar@nordicsemi.no>
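To make the interaction between the fetch options and --narrow concrete, here is a minimal sketch of how a git fetch command line could be assembled. It is an assumption-laden illustration, not west's implementation: the names looks_like_sha and fetch_command, the SHA heuristic, the 'refs/heads/*:refs/west/*' refspec, and the explicit --no-tags are all choices made for this example.

```python
import re
from typing import List, Sequence


def looks_like_sha(revision: str) -> bool:
    """Heuristic (assumed for this sketch): 4 to 40 hex digits."""
    return re.fullmatch(r"[0-9a-fA-F]{4,40}", revision) is not None


def fetch_command(url: str, revision: str, narrow: bool = False,
                  fetch_opts: Sequence[str] = ()) -> List[str]:
    """Build a 'git fetch' argument list for one project (illustrative only)."""
    cmd = ["git", "fetch", "-f"]
    # Narrow fetches skip tags; otherwise tags are requested explicitly.
    cmd.append("--no-tags" if narrow else "--tags")
    # Extra user-supplied options, e.g. ["--depth=1"].
    cmd.extend(fetch_opts)
    if looks_like_sha(revision) and not narrow:
        # Without --narrow, a SHA is not fetched directly; sync branches
        # instead and hope the SHA is reachable (refspec is an assumption).
        refspec = "refs/heads/*:refs/west/*"
    else:
        # Branch or tag name, or --narrow: fetch exactly this revision.
        refspec = revision
    return cmd + ["--", url, refspec]
```

For example, fetch_command('https://github.com/myorg/foo', 'deadbeef', narrow=True, fetch_opts=['--depth=1']) asks for only that commit at depth 1, which is the combination the commit messages above describe as the motivating case.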
Updating manifest-rev is a separate operation from fetching, but the two are tracked and printed together in the --stats output. Now that this code is in an instance method that has access to the stats dict, it is easy to track manifest-rev and fetching separately, so make it happen.

Also resolve manifest-rev to a SHA before calling _update_manifest_rev(), so the reflog entry says:

  west update: moving to <SHA>

instead of something like:

  west update: moving to FETCH_HEAD^{commit}

Signed-off-by: Martí Bolívar <marti.bolivar@nordicsemi.no>
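The idea of resolving to a SHA before writing the reflog message can be sketched like this. The helper names and the fully qualified ref name 'refs/heads/manifest-rev' are assumptions for this example; only the reflog message format comes from the commit message above.

```python
import subprocess
from typing import List

# Assumed fully qualified name of the manifest-rev branch.
MANIFEST_REV = "refs/heads/manifest-rev"


def resolve_sha(repo: str, rev: str = "FETCH_HEAD") -> str:
    """Resolve a revision to a commit SHA using 'git rev-parse'."""
    result = subprocess.run(
        ["git", "rev-parse", f"{rev}^{{commit}}"],
        cwd=repo, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def update_ref_command(sha: str) -> List[str]:
    """Build a 'git update-ref' call whose reflog message names the SHA."""
    return ["git", "update-ref", "-m", f"west update: moving to {sha}",
            MANIFEST_REV, sha]
```

Passing the output of resolve_sha() to update_ref_command() yields a reflog entry that names the commit itself rather than a symbolic expression such as FETCH_HEAD^{commit}.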
Exercise --narrow alone and with --fetch-opt. Signed-off-by: Martí Bolívar <marti.bolivar@nordicsemi.no>
If true, 'west update' is always '--narrow'. Signed-off-by: Martí Bolívar <marti.bolivar@nordicsemi.no>
We are setting up various bits and pieces of instance state before getting to the meat of the work. This is getting to be enough lines that a dedicated helper feels right. Signed-off-by: Martí Bolívar <marti.bolivar@nordicsemi.no>
4351f01 to 73d2e19
Rebased to fix merge conflicts. |
f13a76d to 30782b3
These set default values of the --name-cache and --path-cache command line options, respectively. Signed-off-by: Martí Bolívar <marti.bolivar@nordicsemi.no>
30782b3 to 5599006
This could have really helped. Unfortunately it does not. I disconnected the network, ran |
This looks great to me already.
I'm not on the approvers list, but I checked and it works great for us. So I do approve. |
Looks good. Since I have changed projects, I don't have much room to test this, but it should greatly speed up CI that only cares about a given version and doesn't need history. This is what we needed in my old project.
Thanks for the reviews and testing! |
@mbolivar-nordic could you tell me how this should be used in a CI environment? |
@lucsegers I don't have any general advice; the results are always going to depend on the details. Example: In terms of using a cache, you need to set it up so your CI environment has a prepopulated cache available and mounted on the file system somewhere. The specifics for this will depend on your CI environment. Are you running CI in a bare metal server somewhere? Set up the cache in the file system and update it periodically. Are you using docker? Try to figure out a way to do a volume mount from the host machine, if you have control over the file system of the host machine. Etc. |
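As one possible way to "set up the cache in the file system and update it periodically", a CI maintenance job could plan git commands along these lines. This is a hedged sketch, not advice from the thread: the function name plan_name_cache_refresh, the name-to-URL mapping, and the choice of 'git fetch --tags origin' for refreshing are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Dict, List


def plan_name_cache_refresh(cache_dir: str,
                            projects: Dict[str, str]) -> List[List[str]]:
    """Return git commands that create or refresh a --name-cache directory.

    'projects' maps project names to clone URLs (illustrative input).
    """
    commands = []
    for name, url in sorted(projects.items()):
        target = Path(cache_dir) / name
        if (target / ".git").exists():
            # Already cached: just bring it up to date.
            commands.append(["git", "-C", str(target),
                             "fetch", "--tags", "origin"])
        else:
            # Not cached yet: clone into <cache_dir>/<project name>.
            commands.append(["git", "clone", url, str(target)])
    return commands
```

Each returned command could be executed with subprocess.run(cmd, check=True) from a scheduled job, and the resulting directory mounted into CI runs and passed to west update --name-cache.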
I just added a brand new "performance" label to about 20 issues and pull requests: AFAIK the most recent discussion happened in #638, so have a look at that one at least. |
This implements new west update options that can be used to optimize performance:

- --name-cache, --path-cache: allows cloning project repositories from a known place on the file system before hitting the network to fetch the current revisions (improves performance by not fetching if possible)
- --narrow: fetches the project revision directly even if it is a SHA, and does not fetch tags (improves performance by fetching fewer objects than the default west update)
- --fetch-opt: allows specifying additional options to each git fetch used by west update, such as --depth=1 (which may improve performance if many objects in history are not needed)

It also adds some new configuration options:

- update.narrow: boolean, default false. If true, west update uses --narrow always.
- update.path-cache: string, default no value. If nonempty, west update uses this as its --path-cache if not otherwise specified.
- update.name-cache: same as update.path-cache, but for the --name-cache option.