dvc: consider switching from GitPython #2215
Comments
Also, looks like
Dulwich is pure-python. I wonder if we can do We had some troubles with Dulwich trying to compile its C extension or something. @efiop has better idea and even prepared a patch for Dulwich. I'm not sure why we are not just specifying the global option to compile it w/o C-extension. |
Dulwich doesn't have wheels yet and to install it in pure-python form, we need to explicitly specify a special flag to pip install, which cannot be included into pygit2 requires libgit2, but it has a pretty good wheel selection already https://pypi.org/project/pygit2/#files , so also might be promising. We chose to use gitpython because it just worked and is the most popular project among these :) With more advanced use it does have a lot of drawbacks indeed. |
I really enjoy using the |
I am also in favor of |
It looks like it's getting more and more important. A few people mentioned partial checkout problems with gitpython. And it's very important to solve get/dvc API dependency on command line and local filesystem. @efiop can we prioritize the initial research around pygit2? |
Sure @shcheklein , done. |
We might reconsider using pygit2, there are lots of issues with it:
GitPython for comparison:
|
P.S. Dulwich is somewhere in between. |
@Suor Dulwich - is there active development, at least? Can you confirm that it does not depend on the command line? If we decide to get involved in one of those projects, we need to pick one that is more or less active, so we can be sure that we'll be able to release new versions, build wheels, etc. And it should be aligned with our long-term goals: having no dependency on the CLI is important, for example. |
@shcheklein Dulwich is not actively developed, none of the three is. Support is also so-so, I would not call it reliable. Why do you want to get away from the CLI? I see using it as an advantage: it makes things more reliable and easier to set up for users, since they have already done that. The only downsides I see are speed (presumably, and we need benchmarks to really say this is an issue) and processes on Windows, which we have worked around so far. |
Because it's extremely strange to have git installed as a dependency for an API to work. I've seen already some push back on this. |
I see. But for everything else it's an advantage. If we go If we go |
OK, how about we use both then? For the API we need some limited functionality: only fetch the bundle of files that belong to a revision. Can we do this somehow with some other library? |
I guess with some hack we may use |
@Suor what kind of hacks do you see? What else do we need from them to start using them? How stable is the "clone" part for them? |
@shcheklein looked through the code and it looks like simply changing calls in |
@Suor is there a function/command in Git to only checkout, I wonder? Anyway, sounds like a good workaround then. But it looks like we will need to find a way to support new stuff in GitPython. So, let's create a ticket for fixing GitPython + a ticket for fixing Dulwich and use it in the API. |
@shcheklein what do you mean by "only checkout", isn't |
@MrOutis just download the workspace that corresponds to a revision, as a tar bundle for example. Don't download .git, etc. Use the plain HTTP protocol, etc. I'm not sure if a git server supports stuff like this, but it would be handy for our API. |
@shcheklein looks like it does, but GitHub doesn't allow that transaction https://twitter.com/GitHubHelp/status/322818593748303873 |
GitPython causes constant headaches and considerable time loss investigating and fixing issues on Windows with files not closed and processes not terminated soon enough. We have several places in our code where we were forced to use retries to handle that.
The notable alternatives are pygit2 and Dulwich.
Does anyone know of any downsides of any of these? Or other alternatives? Or maybe there is a good reason we are using GitPython, which I am unaware of?
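For readers unfamiliar with the retry workaround mentioned above, here is a minimal sketch of what such a helper might look like. It is not DVC's actual code: the name `retry_on_permission_error` and its `attempts` and `delay` parameters are hypothetical, and it assumes the failure surfaces as a `PermissionError`, which is what Python typically raises on Windows when a file is still held open by another handle or process.

```python
import time


def retry_on_permission_error(func, *args, attempts=5, delay=0.1, **kwargs):
    """Call func(*args, **kwargs), retrying when it raises PermissionError.

    Hypothetical helper: waits between attempts with exponential backoff
    and re-raises the last error once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return func(*args, **kwargs)
        except PermissionError:
            if attempt == attempts - 1:
                # Out of attempts: let the caller see the original error.
                raise
            # Give the lingering process or file handle time to go away.
            time.sleep(delay * (2 ** attempt))
```

A call such as `retry_on_permission_error(shutil.rmtree, path)` would then tolerate a git subprocess that has not yet released its files. The cost is that every such call site needs wrapping, which is the maintenance burden the issue describes.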