Allow using reference repositories to share objects #695
There are other (and not mutually exclusive) optimizations
I don't think this is how
Did you look at
The caching mechanism added by @mbolivar-nordic in c50d342 does not actually take advantage of Git's object sharing. It still clones all of the objects and the entire history of the repository; it merely does so locally rather than over the network. That is an improvement, of course, but not what is being asked for here. The crucial step is to set up a
On a related note, using
is much better than
which seems to be how west works when given a cached repository.
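As background to the point above: Git records borrowed object stores in `.git/objects/info/alternates`, one path per line, and `git clone --reference` writes that file for you. The sketch below does the same step by hand for an existing repository. `objects_dir_of` and `add_alternate` are hypothetical helpers written for illustration, not part of west or Git. One caveat applies to any such setup: the reference repository must stay available and must not have objects pruned out from under the repositories that borrow from it.

```python
from pathlib import Path


def objects_dir_of(reference_repo: Path) -> Path:
    """Locate the object store of a reference repository, bare or non-bare."""
    for candidate in (reference_repo / ".git" / "objects", reference_repo / "objects"):
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(f"no Git object store under {reference_repo}")


def add_alternate(repo: Path, reference_repo: Path) -> bool:
    """Make repo borrow objects from reference_repo via objects/info/alternates.

    Hypothetical helper, not a west or Git API. Assumes repo is a non-bare
    repository (object store under repo/.git/objects). Returns True if a new
    entry was written and False if the path was already listed.
    """
    alternates = repo / ".git" / "objects" / "info" / "alternates"
    # Git reads one object-store path per line; an absolute path keeps the
    # entry valid even if the workspace itself is moved.
    target = str(objects_dir_of(reference_repo).resolve())
    entries = alternates.read_text().splitlines() if alternates.exists() else []
    if target in entries:
        return False
    alternates.parent.mkdir(parents=True, exist_ok=True)
    # Rewrite the whole file so a missing trailing newline cannot merge lines.
    alternates.write_text("\n".join(entries + [target]) + "\n")
    return True
```

Usage would be along the lines of `add_alternate(Path("zephyr"), Path("/home/<user>/git-mirrors/zephyrproject-rtos/zephyr.git"))`, run before the first fetch so that, in principle, objects already present in the mirror need not be transferred again.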
Much smaller disk space... if you don't count the initial repos.
There is no doubt
So far you haven't provided any numbers, not even an order of magnitude. You don't sound like you've explored all available options either: your first sentence at the top is "--depth is the only efficient way I'm aware of", which is incorrect. Interactive users very rarely clone from scratch. In our CI,
So what is your use case? Development normally happens to fix tangible and measurable issues, not just "cool ideas". Before implementing one of the existing optimizations, @mbolivar-ampere spent a lot of time performing measurements. You can find those at one of the links I shared above if you're interested.
Think of many concurrent workspaces, not just a single one.
Multiple workspaces sharing the same Git objects is very clearly a huge advantage, both in terms of storage and speed of checkout. Imagine N users, or many concurrent CI jobs, using the same Git mirrors on a local NFS share. N workspaces sharing the same repository histories is very clearly an advantage over N workspaces with N replicas of the same history. And it's faster.
I'm not going to do that comparison. But feel free to do some Google searching on the advantages of sharing Git objects with a reference repository.
What exactly is the complexity? And no, it's not a few percents.
It should be self-evident. A single replicated history, which is a constant, versus N replicated histories.
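The "constant versus N" argument can be written as a simple storage model. The symbols are assumptions introduced here, not figures from this thread: H is the on-disk size of one copy of a project's object history, W the size of one checked-out working tree, and N the number of workspaces; objects that exist only in individual workspaces are ignored.

$$
\begin{aligned}
S_{\text{replicated}} &= N\,(H + W) \\
S_{\text{shared}} &= H + N\,W \\
S_{\text{replicated}} - S_{\text{shared}} &= (N - 1)\,H
\end{aligned}
$$

The saving is linear in both N and H and vanishes for N = 1. How large it is in practice depends on the actual value of H relative to W for the projects involved, which the model does not supply.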
I indeed have. As I said, the caching implementation in west is mediocre at best and doesn't address the issue of object sharing.
Now I am going to challenge statements like this -- please provide some numbers. How many users? How often?
The assumption in that statement is that the Git repositories involved are small. But what if large repositories are involved, and they don't use LFS?
I mentioned that earlier -- many concurrent workspaces using large repositories.
I appreciate that.
I'll take a look.
You're the one asking for a "clearly", "self-evident" new feature - without providing any number, reproducible use case, example, measurements of existing optimizations, prototype code or any offer to contribute or help[1]. You seem to have a performance problem to solve[2]. I don't. Now answering your question anyway:
Then share some reproducible example and actual data, not "self-evidence".

[1] "I'll leave it to the feature designer/developer..." - who is that? "I'm not going to do that comparison. Feel free to Google..."
This was just an example. Every feature and code addition increases complexity - and bugs, and maintenance costs. If you take a quick look at the git log, you'll notice this project is not really staffed with an army of full-time developers. Not even one full-time developer, in fact; very far from it. I have no idea what the complexity would be in this particular case, but your description of the new feature is not exactly short while still leaving a lot of open questions. If you think this would be a small effort, then I can't wait for your pull requests (with some sample data to back them up). Don't forget the test code.
It is desirable for a tool built on top of Git to allow using the facilities Git offers for dealing with various complexities, particularly cloning large repositories. west is deficient on that front because it does not support Git's object sharing mechanism, which is a well-known and primary feature of Git. While I'd like to motivate the need for my feature request with numbers, suffice it to say that some development and test environments rely on object sharing. I ask that the reader refer to the wealth of literature available on this topic to learn more. As for the problem statement being long: sure, it could have been more concise, but thoroughness was the goal. I realize and appreciate that, with limited time and resources, feature requests have to be addressed judiciously. I can take a stab at extending west and adding the desired behavior. I'll make a pull request if I decide that what I have is presentable. And I certainly hope that the conversation then goes a little better.
Turns out, this feature request is closely related to (really a duplicate of) #625.
Today, the only way I am aware of to efficiently handle cloning a large repository using `west update` is to limit the fetch depth, as in `-o=--depth=<n>`. I have noticed that the depth value has to be chosen carefully by the user because, evidently (based on my trial and error as well as a cursory look at the Python code), `<n>` must be deep enough to include the specific SHA listed in the `west.yml` file, or else the update step fails.

An alternative to limiting the fetch depth is to share objects with a local reference repository by setting up a `.git/objects/info/alternates` file. The request is for a feature akin to `git-clone`'s `--reference[-if-able]` option. See the documentation for more.

Ideally, one should be able to provide a reference for each repository separately. The format might also allow an option to specify a local prefix path for a particular URL base as a convenience. For example, a user may have locally cloned mirrors for all of the NCS repositories that they need under `/home/<user>/git-mirrors/nrf-connect` and want to associate that prefix path with the url-base `https://github.com/nrfconnect`.

I'll leave it to the feature designer/developer to determine when and in what format the reference repositories should be provided (and, of course, to define precedence rules for ambiguities, i.e. when the local reference path for a project could be determined in more than one way). One could think of an extension to the manifest format that adds an optional `local-refs` (plural) counterpart to `remotes` and/or a new field `local-ref` (singular) for each project alongside `name` and `repo-path`.

But since local references are very user- and environment-specific, perhaps paths to local references shouldn't be part of `west.yml` at all, but rather part of another manifest, say, `local-refs.yml`, that follows the format of `west.yml` but only accepts a subset of it, and that can be provided to `west init` or `west update` (or both) with an option like `--reference[-if-able]`, similar to `git-clone`'s. CLI option equivalents are also fine (e.g. `--local-ref-base https://github.com/nrfconnect,/home/<user>/git-mirrors/nrf-connect` or `--local-ref https://github.com/zephyrproject-rtos/zephyr,/home/<user>/git-mirrors/zephyrproject-rtos/zephyr.git`).
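A minimal sketch of how the precedence question above could be answered, under assumptions that are ours and not part of west or of this proposal: an explicit per-project reference (as in the `--local-ref URL,PATH` example) wins; otherwise the longest matching URL base (as in the `--local-ref-base URL,PATH` example) supplies a prefix, and the rest of the project URL is appended to it unchanged. `parse_pair`, `resolve_local_ref` and `clone_args` are hypothetical names. The resolved path is handed to Git's real `--reference-if-able` option, which skips a missing reference directory with a warning instead of aborting the clone.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Optional, Tuple


def parse_pair(value: str) -> Tuple[str, str]:
    """Split a hypothetical 'URL,PATH' CLI value at the first comma."""
    url, sep, path = value.partition(",")
    if not sep or not url or not path:
        raise ValueError(f"expected URL,PATH but got {value!r}")
    return url.rstrip("/"), path


def resolve_local_ref(
    project_url: str,
    per_project: Dict[str, str],
    per_base: Dict[str, str],
) -> Optional[str]:
    """Pick a local reference path for project_url, or None if there is none.

    Assumed precedence: exact per-project entry first, then the longest
    url-base that is a path prefix of the project URL.
    """
    url = project_url.rstrip("/")
    exact = {k.rstrip("/"): v for k, v in per_project.items()}
    if url in exact:
        return exact[url]
    best: Optional[Tuple[str, str]] = None
    for base, prefix in per_base.items():
        base = base.rstrip("/")
        # Require a '/' boundary so '.../nrfconnect' does not match '.../nrfconnect-x'.
        if url.startswith(base + "/") and (best is None or len(base) > len(best[0])):
            best = (base, prefix)
    if best is None:
        return None
    base, prefix = best
    # Assumes mirrors are laid out under the prefix with the same relative names.
    return str(PurePosixPath(prefix) / url[len(base) + 1:])


def clone_args(project_url: str, dest: str, local_ref: Optional[str]) -> List[str]:
    """Build a git clone command line, borrowing objects when a reference exists."""
    args = ["git", "clone"]
    if local_ref is not None:
        args += ["--reference-if-able", local_ref]
    return args + [project_url, dest]
```

If west creates project repositories with `git init` followed by `git fetch` rather than with `git clone`, the equivalent step would be writing the alternates file directly instead of building a clone command line.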