Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manifest commands RFC #8

Closed
wants to merge 1 commit into from

Conversation

ulfalizer
Copy link
Contributor

@ulfalizer ulfalizer commented Aug 9, 2018

This RFC adds manifest parsing and three basic commands (sync, diff, and
status).

More error checking needs to be added. This is mostly to get some
feedback on the approach. There are some cases that turn tricky if you
always keep a local branch to avoid a detached HEAD.

I'm wondering if 'revision' is supposed to always point to a branch (as
opposed to e.g. a SHA). SHAs would be more flexible, but make it even
trickier to keep a local branch.

I've written a bit in
zephyrproject-rtos/zephyr#6770 as well.
This is a quick-and-dirty version of the local branch strategy.

Signed-off-by: Ulf Magnusson Ulf.Magnusson@nordicsemi.no

@mbolivar
Copy link
Contributor

mbolivar commented Aug 10, 2018

I'm wondering if 'revision' is supposed to always point to a branch (as
opposed to e.g. a SHA). SHAs would be more flexible, but make it even
trickier to keep a local branch.

Repo supports branches (this is what the tip manifest of the AOSP uses), tags (via revision="refs/tags/xxx", this is what all Android releases use), SHAs (this is what pinned manifests generated by repo manifest -r contain), and mixes of the three.

The ability to support SHAs is important: this is how you can use a manifest to freeze and distribute an entire configuration of all the Git trees to others, including users.

@mbolivar mbolivar self-requested a review August 10, 2018 12:52
Copy link
Contributor

@mbolivar mbolivar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ulfalizer, thanks and sorry for missing the discussion in the zephyr tree.

I think the biggest questions are around sync, as might be expected.

Comments below.

west/cmd/git.py Outdated
@@ -0,0 +1,255 @@
# Copyright (c) 2018 Open Source Foundries Limited.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

??

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Processed it as copy-paste. Glad I didn't go into law. :D

I'll put Nordic Semiconductor there instead.

west/cmd/git.py Outdated
def _git_helper(project, cmd, extra_args, cwd, capture_output):
# Runs a git command.
#
# cwd: Directory to switch to first (None = current directory)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nit, but it'd be nice for these to be in the same order as the function signature, and to document the project argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops... will fix

west/cmd/git.py Outdated

pipe = subprocess.PIPE if capture_output else None

return subprocess.run(('git', *args, *extra_args),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • We unfortunately can't use subprocess.run() yet. Zephyr's minimum Python version is 3.4, and it was added in 3.5.
  • Might be nice to add checking for Git with shutil.which and erroring out like cmake.py's run_build to the TODO list

west/cmd/git.py Outdated

'''west git commands'''

import argparse
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused

west/cmd/git.py Outdated
import collections
import os
import subprocess
import sys
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused

(Mind running this through flake8 when taking it out of RFC? It can be annoying but it's of course nice to have a style and also catches real bugs, so I think it'd be nice for west to run cleanly through it.)

west/cmd/git.py Outdated
# Load project information from manifest

with open(manifest_filename) as f:
manifest = yaml.load(f)['manifest']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, it's better to always use yaml.safe_load instead of yaml.load unless the unsafe version is explicitly desired. This is especially important here, since the manifest is coming from the Internet.

See https://pyyaml.org/wiki/PyYAMLDocumentation for details; tl;dr is yaml.load allows arbitrary code execution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hadn't noticed it. Will fix.


# TODO: Error checking (if _git(...).returncode != 0: ...)

for project in projects:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a TODO to handle projects that have been removed as well.

(I guess this means persisting a representation of the current projects somewhere and comparing it to the updated manifest that west sync pulls down.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should happen if they have been removed? Might be rude to just remove them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Projects removed from manifests should be removed from working copies. Otherwise there's old cruft around which no longer matters.

What to do with any local branches, though? Perhaps repo keeps git repositories and objects under .repo, and symlinks to those from working trees, precisely to handle this case without deleting local work. No symlinks on Windows, though.

Maybe error out and ask for a re-run with a --force option if there are repositories with local branches that would be removed, as a starting point?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also be mad if it removed any non-checked-in files I happened to have in there. "Lemme remove this stuff you no longer need for you" is a bit dangerous...

Personally, I'd be fine if they were just left behind, but we'll see.

if not os.path.exists(project.path):
_git_top(project, 'clone -b (branch) (url)/(name) (path)')
else:
# Fetch first to make sure the project's branch exists. It
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the thinking here seems to be that we can assume that local branches with the same names track remote branches named in the manifest. As you've noticed, this doesn't play well if the manifest names SHAs (or I guess tags) in its revision fields.

That kind of makes the way repo behaves (as I understand it) seem at least not crazy:

  • if detached head, synchronize to latest manifest version
  • if there is a local branch checked out, rebase on top of the new manifest branch

Updating a detached HEAD to another detached HEAD seems fine.

I've said before that I think rebase by default is counter-intuitive and not a good idea for West, though -- in my experience teaching people Repo, this is one of the most common frustrations and stumbling blocks, to the point that I usually teach repo sync -d first.

If we want to avoid that for West, I think these are the potentially reasonable options if there is a local branch:

  1. act like git pull (fetch + merge)
  2. act like git fetch (fetch only)
  3. scream and die unless a --rebase or --local-branch-strategy={rebase,detach,merge} or so flag was given
  4. always fetch and detach HEAD (unless overridden by a flag)

Am I missing any?

I'm leaning against 1, because it encourages people to submit bad pull requests with upstream merges inside, and 2, because west sync ought to, well, synchronize with upstream whenever possible.

The other two seem potentially OK to me.

Thoughts?

Copy link
Contributor Author

@ulfalizer ulfalizer Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. act like git pull (fetch + merge)

Would need to check out the branch first in that case (and there would have to
be a local branch). The current version does that, and then switches back to
the previous position. That feels a bit wonky though, but maybe it could work
alright in practice.

  1. act like git fetch (fetch only)

Would want some way to rebase local work on top of new upstream stuff, so would
probably need additional commands in that case. Wonder if that might be too
complex.

  1. scream and die unless a --rebase or --local-branch-strategy={rebase,detach,merge} or so flag was given
  2. always fetch and detach HEAD (unless overridden by a flag)

IMO, it's nice if sync (by default at least) brings in upstream work while also
preserving local changes. I suspect that's what most people would expect.

Keeping a local branch gets a bit tricky when you can have SHAs. I wonder if
there might be gotchas related to changing the revision in the manifest file
too (e.g. to a different branch).

Always just doing a git rebase <revision> is conceptually simplest at least
(and will work with a detached HEAD as well).

The implementation gets a bit tricky though, because e.g. git status won't
know what the upstream is, and so won't tell you stuff like how many commits
behind you are. Gotta implement that stuff manually I think. Would have that
problem with an SHA regardless though...

Copy link
Contributor Author

@ulfalizer ulfalizer Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only "robust" way I can think of to do it with a non-detached HEAD would be to have a single local branch (e.g. master), and updating it against the revision from the manifest (ignoring the branch name there).

That'd give you e.g. git status for free, but only if the revision from the manifest is a branch name (plus you'd have to update which branch the local branch tracks if the manifest is changed to use a different branch), so might have to implement that stuff manually anyway.

When the revision is an SHA, I don't think the local branch would give you much, beyond avoiding people having to see "detached HEAD" when they run git status manually.

Copy link
Contributor Author

@ulfalizer ulfalizer Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... thought of another thing that might work. Wonder if it's too magic though.

Could keep a ref (branch) that's always updated to point to the revision specified in the manifest, and that the user isn't expected to update (could put it in upstream/revision or the like, to make it clearer that it's "internal", though it'd be nice to explain it still).

That branch could then be set as the upstream of the local branch, and git status and the like would automatically give sensible information (might get confusing for git push and the like though).

Could be a horrible idea for reasons I haven't thought of yet though. It's something that's not a detached HEAD at least. :P

Copy link
Contributor Author

@ulfalizer ulfalizer Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems kinda neat in that you will always have an easy reference for the upstream revision at least, without having to go to the manifest.

Could have west fetch update the upstream/revision refs, and west sync rebase the local branches on top of them, if you wanted to split it up.

Copy link
Contributor

@mbolivar mbolivar Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. act like git pull (fetch + merge)

Would need to check out the branch first in that case (and there would have to
be a local branch).

Right -- note that this list was prefixed with (emphasis added): "I
think these are the potentially reasonable options if there is a
local branch
". Since we're discussing what to do in case of a local
branch (instead of detached HEAD), it's a safe assumption that there
is one :).

The current version does that, and then switches back to
the previous position. That feels a bit wonky though, but maybe it could work
alright in practice.

Sorry, I don't follow what you're saying. I'm thinking this option 1
would be something like:

git fetch <manifest-remote>/<manifest-revision>
git merge FETCH_HEAD

That doesn't require switching back to a previous position.

Note that I'm against this option as it encourages invalid pull
request submissions (i.e. PRs which include "merge upstream into my
feature branch" commits).

  1. act like git fetch (fetch only)
    Would want some way to rebase local work on top of new upstream stuff, so would
    probably need additional commands in that case. Wonder if that might be too
    complex.

Sorry, I don't follow this either. Why would we need a separate
rebase command, when there is already git rebase?

We haven't discussed the "submit a pull request" or "fetch a pull
request" commands, but updating local branches to rebase against
upstream might better belong as part of those.

  1. scream and die unless a --rebase or --local-branch-strategy={rebase,detach,merge} or so flag was given
  2. always fetch and detach HEAD (unless overridden by a flag)
    IMO, it's nice if sync (by default at least) brings in upstream work while also
    preserving local changes. I suspect that's what most people would expect.

Keeping a local branch gets a bit tricky when you can have SHAs. I wonder if
there might be gotchas related to changing the revision in the manifest file
too (e.g. to a different branch).

Always just doing a git rebase is conceptually simplest at least
(and will work with a detached HEAD as well).

I'm confused about what you're saying here. Do you think I've missed any
options? Do you lean towards or against any of the four I've listed
above?

The implementation gets a bit tricky though, because e.g. git status won't
know what the upstream is, and so won't tell you stuff like how many commits
behind you are. Gotta implement that stuff manually I think. Would have that
problem with an SHA regardless though...

Do you want to play around with repo start, repo sync, and repo status to get a baseline idea of how it works there? That might help
as a starting point, just seeing how at least one other tool works.

The way repo start works is that it creates a local branch that
tracks <manifest-remote>/<manifest-revision>. So repo start test <project> will create a test branch with the following in the
project's .git/config:

[branch "test"]
	remote = osf-dev
	merge = refs/heads/master

Then repo sync will rebase on top of there. And you also always know
the remote, so you can do things like:

$ repo info -o <project>
Manifest branch: master
Manifest merge branch: refs/heads/master
Manifest groups: all,-notdefault
----------------------------

Projects Overview
<project>
* test                              ( 1 commit, Mon Aug 13 13:13:35 2018 -0500)
                                      - 5136aa26 add test commit

Copy link
Contributor Author

@ulfalizer ulfalizer Aug 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't require switching back to a previous position.

I was assuming that you had some local branch representing the revision at the latest sync (and possibly containing some local changes as well). I think you'd want to update that one whenever you sync, even if you aren't currently on it, and you can't rebase a branch without being on it.

The switching back was just to return to whatever branch (or detached HEAD state) the repository was on previously, after rebasing the "revision-tracking" branch.

If the revision-tracking branch (upstream/revision or the like) is just a direct reference to the upstream revision and not meant to hold any local changes, it simplifies things though. Could just do git update-ref upstream/revision <revision> or the like to update it.

Sorry, I don't follow this either. Why would we need a separate
rebase command, when there is already git rebase?

Because I think you'd want some way to rebase all projects to the latest revision, without having to manually rebase each project.

Internally, it could just be an alias for west forall rebase upstream/revision, if a command like that gets added. That'd in turn run git rebase upstream/revision in each project.

We haven't discussed the "submit a pull request" or "fetch a pull
request" commands, but updating local branches to rebase against
upstream might better belong as part of those.

I wonder if you'd already have all the information you need from upstream/revision there. Would probably have to tinker a bit to figure out if there's something missing though.

I'm confused about what you're saying here. Do you think I've missed any
options? Do you lean towards or against any of the four I've listed
above?

I like rebase-by-default personally, so something like (1) is what I prefer. I wouldn't expect a sync operation to throw away my local changes (or scream about them), just bring them up-to-date against the latest version from upstream.

With the approach I'm thinking of right now, west sync would actually do something like this though:

  1. git fetch new changes from the remote
  2. Update the ref upstream/revision (or whatever a good name would be) to the revision specified in the manifest (which could be a branch or an SHA)
  3. Rebase the currently checked-out branch (or detached HEAD state) against upstream/revision

To start some new work, you could do git checkout -b fix-xyz upstream/revision in each repository (could have the start command do that for each repository, for example).

The way repo start works is that it creates a local branch that
tracks <manifest-remote>/<manifest-revision>. So repo start test will create a test branch with the following in the
project's .git/config:

Wonder if it does something similar to the above in the case where is an SHA. Don't think you can track an SHA. Feels weird at least.

Or do you mean "tracks" just in the sense that repo knows to update the local branch against it?

In my version (which might be bad for reasons I haven't thought of yet), the branch would explicitly track the upstream/revision branch. That way, git itself would give sensible information when you run git status ("You are x commits behind upstream/revision"), and it'd be pretty easy to understand how the west commands operate internally.

try:
pykwalify.core.Core(
source_file=manifest_filename, schema_files=[schema_filename]
).validate()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the pykwalify sources: they use safe_load, so this should be OK.

for mp in manifest['projects']:
# Fill in any missing fields in 'mp' with values from the 'defaults'
# dictionary
for key, val in manifest['defaults'].items():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems safe for now, but note that the manifest-wide defaults might contain keys which do not apply to projects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought it was defined as defaults for projects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's defaults for the manifest

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the manifest was a project description. :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repo lets you manage default values that are related to the behavior of the tool itself there. For example: <default sync-j="4"/> uses 4 concurrent TCP connections while syncing. I imagine we may end up with use cases that are similar eventually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, might have to change the approach at that point, if they get added to the defaults dict.

west/cmd/git.py Outdated
_git(project, 'status', extra_args=user_args)


def _add_common_git_flags(parser_adder, command):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, also: it would be nice to be able to take a nargs='*' argument specifying the list of projects to synchronize, diff against, or get status for.

Supporting either the name of the project in the manifest or its locally checked out path would be most convenient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, would be nice to have later

@ulfalizer
Copy link
Contributor Author

Fixed most of the code complaints.

This RFC adds manifest parsing and three basic commands (sync, diff, and
status).

More error checking needs to be added. This is mostly to get some
feedback on the approach. There are some cases that turn tricky if you
always keep a local branch to avoid a detached HEAD.

I'm wondering if 'revision' is supposed to always point to a branch (as
opposed to e.g. a SHA). SHAs would be more flexible, but make it even
trickier to keep a local branch.

I've written a bit in
zephyrproject-rtos/zephyr#6770 as well.

Signed-off-by: Ulf Magnusson <Ulf.Magnusson@nordicsemi.no>
@ulfalizer
Copy link
Contributor Author

Messed-up the branch name when I created this one. New PR is at #11.

@ulfalizer ulfalizer closed this Aug 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants