Generate Misc/NEWS from individual files #6
Rejected Ideas

Deriving Misc/NEWS from the commit logs

As part of the discussion surrounding Handling Misc/NEWS, the suggestion has come up of deriving the file from the commit logs themselves. In this scenario, the first line of a commit message would be taken to represent the news entry for the change. Some heuristic would be used to decide whether a change warranted a news entry, e.g. whether an issue number is listed. This idea has been rejected because some core developers prefer to write a news entry separately from the commit message. The argument is that the first line of a commit message and a news entry have different requirements in terms of brevity, what should be said, etc.

Deriving Misc/NEWS from bugs.python.org

A rejected solution to the NEWS file problem was to specify the entry on bugs.python.org [5]. This would mean an issue that is marked as "resolved" could not be closed until a news entry is added in the "news" field in the issue tracker. The benefit of tying the news entry to the issue is that it makes sure all changes worthy of a news entry have an accompanying issue. It also makes classifying a news entry automatic thanks to the Component field of the issue. The Versions field of the issue likewise ties the news entry to the Python releases that were affected. A script would be written to query bugs.python.org for relevant news entries for a release and to produce the output to be checked into the code repository. This approach is agnostic to whether a commit was done by CLI or bot. A drawback is that there is a disconnect between the actual commit that made the change and the news entry, since they live in separate places (in this case, GitHub and bugs.python.org). Making a commit would then require remembering to go back to bugs.python.org to add the news entry. |
The only potential solution other than individual files is a bot or script which collects news entries from messages in PR comments. The pro/con list is:

Individual files

Pro

Con

Bot

Pros

Cons
|
Managing individual files by hand also has the benefit of being able to communicate separately to reviewers via commit messages and to consumers via the release notes. We've had good luck with that in OpenStack, where the consumer of the software is often interested in very different information from the other contributors who may be reviewing patches. |
Hmm, it looks like the sample repo in python/cpython doesn't have any tags indicating when specific versions were released. Reno currently relies on tags for version identification. I could update it to look at something else -- how does one determine the current point release on a given branch? Is it in a file somewhere? |
@dhellmann, yeah, that's a problem with the current python/cpython repo. Tags were not replicated and pushed. We will have to re-push the repo with tags (perhaps using the tool and/or instructions from https://github.com/orsenthil/cpython-hg-to-git, which has been tried). |
@orsenthil oh, good, if the plan is to eventually have the tags in place then that's no problem. I assumed there was some other mechanism in place already. |
This tool should probably make this pretty easy: https://pypi.org/project/towncrier/ |
I started a thread on python-dev because Misc/NEWS became a blocking point with the new GitHub workflow:

FYI I wrote a tool which computes the released Python versions that include a change, from a list of commits, to generate a report on Python vulnerabilities:

The core of the feature is "git tag --contains sha1". On top of that I implemented logic to select which versions should be displayed. Output:

The tool also computes the number of days between the vulnerability disclosure date, commit date and release date. I chose to ignore beta and release candidate versions.

But I guess that there are other existing projects which would fit Misc/NEWS requirements better than my tool! (reno, towncrier, something else?) |
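A minimal sketch of that core step in Python (an illustration only, not Victor's actual tool; it assumes git is on PATH and the checkout has its release tags):

    # A minimal sketch (not Victor's actual tool): list the release tags that
    # contain a given commit via "git tag --contains <sha1>", keeping only
    # final releases (no alpha/beta/release-candidate tags).
    import re
    import subprocess

    FINAL_RELEASE = re.compile(r"v?\d+\.\d+(\.\d+)?$")

    def releases_containing(sha1, repo="."):
        out = subprocess.run(
            ["git", "-C", repo, "tag", "--contains", sha1],
            capture_output=True, text=True, check=True,
        ).stdout
        # Plain lexicographic sort is good enough for a sketch.
        return sorted(tag for tag in out.split() if FINAL_RELEASE.match(tag))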
changelog filenames
|
Additional requirements/requests for Misc/NEWS'ification
|
Since there seem to be multiple options between pre-existing tools and writing our own, we should start by identifying the requirements for what we want supported in the final NEWS file:

Features we want
Nice-to-haves would be:
Am I missing anything we need/want the solution to cover?

What if we were writing a tool from scratch?

Now what would a greenfield solution look like (to help set a baseline of what we might want a tool to do)? To me, we would have a directory to contain all news-related details. In that top-level directory would be subdirectories for each subsection of the release (e.g. "Core and Built-ins", "Library", etc.). Each NEWS entry file would then go into the appropriate subdirectory. The filename would be the issue related to the change, e.g.

Then upon release the RM would run a script which would read all the files, generate the appropriate text (which includes line-wrapping, etc.), and write out the file. Now we can either keep a single file with all entries (which gets expensive to load and view online, which also means we should add an appropriate

Next steps

First step is to figure out what our requirements are of any solution. So if I'm missing anything please reply here (I think I covered what @warsaw mentioned on python-dev above). Once we have the requirements known we can look at the available tools to see how close they come to meeting our needs and decide if we like any of them or want to write our own. |
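A very rough sketch of the "collect all the files and write out Misc/NEWS" step described above might look like this; the Misc/NEWS.d/<section>/bpo-NNNN.rst layout used here is an assumption for illustration, not a decided format:

    # Rough sketch of the "collect entry files and write Misc/NEWS" step.
    # The Misc/NEWS.d/<section>/bpo-NNNN.rst layout is assumed for illustration.
    import pathlib
    import textwrap

    NEWS_DIR = pathlib.Path("Misc/NEWS.d")

    def build_news():
        parts = []
        for section in sorted(p for p in NEWS_DIR.iterdir() if p.is_dir()):
            parts.append(section.name)
            parts.append("-" * len(section.name))
            for entry in sorted(section.glob("*.rst")):
                body = entry.read_text(encoding="utf-8").strip()
                # Prefix each entry with its issue number (taken from the
                # filename) and re-wrap the text for the generated file.
                parts.append(textwrap.fill(f"- {entry.stem}: {body}", width=79,
                                           subsequent_indent="  "))
            parts.append("")
        return "\n".join(parts)

    if __name__ == "__main__":
        pathlib.Path("Misc/NEWS").write_text(build_news(), encoding="utf-8")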
Thanks @brettcannon, I was thinking something similar, but slightly different from your implementation outline. I like naming files with their bug number, e.g.

I was thinking maybe we don't need subcategory subdirectories within there. Everything else can be specified using metadata in the .rst file. Kind of like the way Pelican supports metadata directives for the date, category, tags, and slugs. E.g.
Generally, you wouldn't have an

I don't know whether those directives are a reST feature or something added by Pelican. We could also use the RFC-822 style headers found in PEPs. |
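As a purely hypothetical illustration of the metadata-in-the-file idea (the example that originally followed the "E.g." above is not preserved in this thread), a tool could strip a small "field: value" header block off the top of an entry with a few lines of Python:

    # Hypothetical illustration only: peel Pelican-style "field: value" header
    # lines (category, tags, ...) off the top of a news entry file.
    import re

    META_LINE = re.compile(r"^[A-Za-z][\w-]*:\s*\S.*$")

    def split_entry(text):
        """Split a news entry into (metadata dict, body)."""
        head, sep, rest = text.partition("\n\n")
        lines = head.splitlines()
        if sep and lines and all(META_LINE.match(line) for line in lines):
            meta = {k.strip().lower(): v.strip()
                    for k, _, v in (line.partition(":") for line in lines)}
            return meta, rest.strip()
        return {}, text.strip()

    # split_entry("Category: Library\nTags: bpo-12345\n\nFix the thing.")
    # -> ({'category': 'Library', 'tags': 'bpo-12345'}, 'Fix the thing.')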
More about using major release numbers as the organizational structure:
|
Do we want to keep the news fragments around forever? I assumed that once they've been integrated into an actual news file they would be deleted. (+1 is for supporting deletion, -1 is for keeping the files around forever.) |
I was thinking we'd keep them around forever. They probably don't take much space, and they compress well, so it would be very handy to be able to do the historical greps from the master branch. |
The problem I see with keeping them around is simply the number of files the repo will grow by, which might affect checkout speed and maybe upset some file systems. Do we have any idea how many files 3.4 would have if we kept the files around for the life of the feature release? And if we generate a single file per feature release then you can grep that one file plus the entry files to get your version search without keeping duplicate data around.

As for the feature version subdirectories, the reason I didn't suggest that is it seemed unnecessary in a cherry-picking workflow. If a file is in e.g. the 3.6 branch then you know it's for 3.6 thanks to its mere existence. So even if we did use feature branch directories, it wouldn't matter if a fix is in the 3.7 directory in the 3.6 branch, as its existence means it applies to Python 3.6 itself; it just happens to have been committed to 3.7 first. But as I said, if the existence of the file means the change applies, then sticking it in a feature branch directory only comes up if you keep the individual files around.

Lastly, adding cross-references between feature branch subdirectories complicates cherry picks, as it adds a mandatory extra step. Leaving out cross-references allows cherry picks that merge cleanly to require no work beyond opening the PR, while cross-referencing adds a required step for every cherry-pick PR. |
+1 to Barry's variation. I like self-identifying files. For one thing, if a contributor submits a file with the wrong section, I presume it would be easier to edit the section line than to move the file.

What I would really like is auto-generation of backport pull requests. If this should not always be done, then use a symbol like '@' to trigger the auto-generation.

On dstufft's comment, I am not sure if a thumbs-up is for 'keep forever' or 'delete when integrated'. Thousands of files under 500 bytes on file systems that allocate 4000 bytes/file (Windows, last I knew) is a bit wasteful. I am for deleting individual files after they are sorted and concatenated into a 'raw' (unformatted) listing. This would accomplish compression without loss of information. (One would still grep the metadata.) It would also allow more than one formatting.

News entries can have non-ASCII chars (for names) but must be UTF-8 encoded. I suspect that we will occasionally get mis-encoded submissions. Will there be an auto encoding check somewhere? |
Moving a file isn't difficult; git will just delete the old one and add it again under the new name (and implicitly pick up that the file was moved). As for auto-generating cherry picks, see GH-8.

For voting on Donald's comment, I view a +1 as being for deletion. I have edited the comment to make it more clear (I also just looked at how @warsaw voted and voted the opposite 😉).

And for the encoding check, I'm sure we will implement a status check that will do things like verify formatting, check that the text encoding is UTF-8, etc. |
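As a rough illustration of how small such a check could be (an assumption about a possible status check, not an existing CI job), something like this could run over every news entry file touched by a pull request:

    # Sketch of a possible encoding/format status check for news entry files
    # (an assumption about how it might look, not an existing CI job).
    import pathlib
    import sys

    def check_entry(path):
        data = pathlib.Path(path).read_bytes()
        try:
            text = data.decode("utf-8")
        except UnicodeDecodeError as exc:
            return f"{path}: not valid UTF-8 ({exc})"
        if not text.strip():
            return f"{path}: entry is empty"
        return None

    if __name__ == "__main__":
        problems = [p for p in map(check_entry, sys.argv[1:]) if p]
        if problems:
            print("\n".join(problems))
            sys.exit(1)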
@brettcannon At least in the case of reno, the directory structure isn't needed for organizing the files because it pulls the content out of the git history, not from the currently checked out workspace. If we're worried about the number of note files, then a bunch of individual notes could be collapsed into one big file just before each release. Backports of fixes could still use separate files to allow for clean cherry-picks. |
In order to JOIN this metadata (description, issue numbers) with other metadata, something like YAML (YAML-LD?) with some type of URI would be most helpful. JSON-LD:

{"@id": "http://.../changelog#entry/uri/<checksum?>",
"description": "...",
"issues": [
{"n": n,
"description":""}]
}
|
One other thing I forgot to mention about why I suggested subdirectories for classification is that it does away with having to look up what the potential classification options are. I know when I happen to need to add an entry early on in a branch I always have to scroll back to see what sections we have used previously to figure out which one fits best. If we had directories we never deleted then that guesswork is gone.

@dhellmann so are you saying reno looks at what files changed in the relevant commit to infer what to classify the change as (e.g. if only stuff in Lib/ changed then it would automatically be classified as "Library")? |
Easily solved. Consider that git itself is storing kajillions of files in its object store. It manages this by employing a fascinating feature--available in all modern operating systems!--called a "subdirectory". In the case of git's object store, it snips off the first two characters of the object's hexified hash, and that becomes the subdirectory name. So there are potentially 256 subdirectories, and 1/256 on average of the files go in each subdir. In our case, I'd suggest that
|
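In Python terms, the git-style bucketing described in the previous comment amounts to something like this minimal sketch (the Misc/NEWS.d/next root used here is an assumption for illustration only):

    # Minimal sketch of git-style bucketing: the first two hex characters of a
    # hash pick the subdirectory, the rest names the file.  The Misc/NEWS.d/next
    # root is an assumption for illustration.
    import hashlib

    def entry_path(entry_text, root="Misc/NEWS.d/next"):
        digest = hashlib.sha1(entry_text.encode("utf-8")).hexdigest()
        return f"{root}/{digest[:2]}/{digest[2:]}.rst"

    # entry_path("Fix the thing.")  ->  "Misc/NEWS.d/next/<xx>/<rest-of-hash>.rst"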
I'll say that I don't see a whole lot of value to keeping the files around once they've been "compiled" into the relevant NEWS file. |
In that case, why have the discrete files in the first place! Just have people add their entries directly to the NEWS file. ... y'see, that's the problem we're trying to solve. When I cut a Python release, I always have a big painful merge I have to do by hand on the NEWS file. If we keep the discrete files, I can just regenerate NEWS and know it's going to be correct. |
Because many people adding and changing lines to the NEWS files causes merge conflicts. One person periodically "compiling" that NEWS file during a release does not. |
I don't want to go down that road.
I propose that we design the tool to use the discrete files, and Misc/NEWS is only ever generated from that tool. If, down the road, we decide that it's really okay, we can anoint Misc/NEWS as a second canonical location and delete the discrete files. What problem does keeping the discrete files cause? Is it just that you don't like it for some aesthetic reason? |
(I think codelabels are a helpful signal that's relevant to writing release notes.)

commit messages and codelabels
... codelabels are worth the effort because:
... Releaselogs and Codelabels are part of |
@larryhastings yep, I personally don't see a need for anything beyond the NEWS section and What's New relevance. To be perfectly honest, I say we drop the NEWS sections and just use the What's New sections, but I would want people like @warsaw and @doko42, who have to care more about the NEWS file, to say whether it is useful to them to know a change updated an extension module vs pure Python code.

As for the extra metadata like authors, backports, etc., I don't think that's necessary as that's covered by the git repo data. And I think I have seen security suggested, but I believe that should just be a classification in and of itself and trump all other classifications of the news entry. |
I've found the Misc/NEWS sections (including Library vs. Core/Builtin vs. Extension Modules) to be helpful when trying to sleuth out whether a downstream reported bug was caused by a change in upstream Python or not. |
+1 for what @warsaw noted - if something breaks after an upgrade, it's helpful to be able to go:
Sometimes "find in file" will short-circuit actually relying on the categorisation, but not always. |
If something breaks after an upgrade, you should write a test (an uncaught exception sys.exits nonzero), and then try and ballpark a range for `git bisect`.

How does `git bisect` work with the CPython branching / cherry-picking strategy? Can I just start with the most recent known good revision / version tag?

What is the syntax for calling one CPython test method?

This information could also be useful for documenting / automating a strategy for the 'which change caused a fault' root cause analysis process:

Bisection
| Wikipedia: https://en.wikipedia.org/wiki/Bisection_(software_engineering)

Bisection is a method for determining which change causes a fault (or a specific test to change from passing to failing or vice-versa). Many bisection algorithms take a start and end (“between here and here”) and do a binary search (“this half or that half”), checking out each revision and running a script that should return 0 for OK, or non-zero otherwise.

Code bisection with Git:
- https://www.kernel.org/pub/software/scm/git/docs/git-bisect.html
- https://www.kernel.org/pub/software/scm/git/docs/git-blame.html
- http://git-scm.com/book/en/Git-Tools-Debugging-with-Git
|
@westurner This is not the thread for you to try to tell operating system developers how to do our jobs (no thread is that thread). Reading docs can be done anywhere (and relatively quickly), while writing and running tests (especially under bisect) is far more time consuming and environmentally constrained (so you don't want to do it unless there's some reason to believe an upstream change may be at fault in the first place). |
Output from this collaborative effort could include devguide documentation. I think that git diff and git bisect are relevant to solving the same problem as a releaselog. That's all I have to add now.
|
@westurner they really are not even vaguely related. The issue is that for a consumer of an externally delivered package like Python, the only code we can reasonably expect them to bisect is their own, which is a consumer: the thing that has changed has changed as a large atomic unit comprising thousands of commits, bundled through at least two levels of delivery: upstream -> distro, distro -> user.

To successfully bisect in such a situation (and this is ignoring the complexities of interactions with non-stdlib components that also required changes over the evolution of the change) requires the consumer to learn how to build Python; how to build it using the distro package rules; and how to adjust those rules for older tree revisions where intermediate commits were never packaged. It's a huge effort vs 'oh look, the release notes say that the traceback package has been reimplemented, and my failure was in traceback, so I should look closely at that bit of code and maybe the upstream commits for it'. |
FWIW, I prefer a custom tool that lives in the Python organization on GitHub. I don't really want to leave comments like "can you please take a look at this?" every two weeks in order to get a simple fix merged. Since it will be in the Python organization and all core devs will have commit rights, I don't think maintaining such a tool will be an issue. I can help Larry with maintaining it if you want to see a list of maintainers. |
I doubt that bugging people to get fixes merged is going to be that big of a problem, and if it ends up being so we can easily fork the thing at that point. Deciding up front that we need our own thing on the off chance we can't get fixes merged seems a bit wasteful when we can, at any time, fork whatever solution we use and start maintaining it if it becomes an issue. |
I agree with @dstufft that worrying about maintenance is a bit premature (e.g. we already rely on Sphinx for building our docs which has its own dependencies). Now if Larry makes blurb more attractive because he simply makes it do the exact thing we want and that happens to not be what towncrier does then that's a legitimate reason for having our own tool. |
That was always my plan! Now if only we knew exactly what we wanted... |
This. Let's agree on the actual workflow and let Larry implement it. FWIW, Argument Clinic's clinic.py is a pleasure to work with, and maintaining it is frictionless since it's part of the CPython repo. |
I've rejected reno as an option. Thanks to @dhellmann and @rbtcollins for the time and effort put into proposing it. In the end, the fact that no one voted positively for it, and that the YAML format leads to more formatting than necessary for the common case compared to towncrier or blurb, makes me think it isn't the best fit for us. |
A thought in regards to

With a CPython-specific tool, CPython-specific service integrations aren't a problem. With a general purpose tool, we'd either need additional scripting around it (effectively creating our own tool anyway), or else come up with configurable solutions, rather than just handling the specific services we care about. Given how much more complex the CPython development process is than a more typical single-release-stream Python project, that seems like it could be a recipe for future conflict (to put it in relative terms: when it comes to process complexity, CPython is to most other projects as OpenStack is to CPython). |
@ncoghlan I can agree on querying bpo about the issues solved in a past release (using that REST endpoint); is that sufficient for the release notes? |
@soltysh @WadeFoster @mikeknoop
(modified) From #6 (comment) ("JSON-LD"):

{"@context": {
"py": "https://schema.python.org/v1#",
"bpo": "https://bug.python.org/issue",
"pr": "https://github.com/python/cpython/pulls/",
"ver": "https://schema.python.org/v1#releases/",
"t": "https://schema.python.org/v1#releaselog/tag/",
"pyreltag": "https://schema.python.org/v1#releaselog/tag/",
"label": "https://github.com/python/cpython/labels/",
"cved": "https://cvedetails.com/cve/CVE-",
"name": { "@id": "schema:name" },
"description": { "@id": "schema:description" },
"notes": { "@id": "py:notes", "@container": "@list"},
"issue": { "@id": "py:issue", "@container": "@list"},
"mentionedIssue": { "@id": "py:issue", "@container": "@list"},
"versions": { "@id": "py:versions", "@container": "@list"},
"pr": { "@id": "py:versions", "@container": "@list"},
"cve": { "@id": "py:versions", "@container": "@list"},
"tags": { "@id": "py:versions", "@container": "@list"}
},
"@graph": [{
"@type": "py:ReleaseLog",
"name": "Python Misc/NEWS.rst",
"notes": [{
"@type": "py:ReleaseNote",
"name": None,
"description":
"Add *argument* to function (closes #21) #feature #security #cve-2011-1015 #pr123 ",
"issue": [ "bpo:22" ],
"mentionedIssue": [ "bpo:21", "bpo:22" ],
"cve": [ "cved:2011-1015" ],
"pullRequests": [ "cpypr:123" ],
"versions": [ "ver:2.7", "ver:3.6.1" ],
"tags": [ "t:feature", "t:security" ]
},
{
"@type": "py:ReleaseNote",
"name": None,
"description":
"Fix thing #22 (closes #22) #bugfix #pr124 ",
"issue": [ "bpo:22" ],
"mentionedIssue": [ "bpo:22" ],
"pullRequest": [ "cpypr:124" ],
"versions": [ "ver:2.7", "ver:3.6.1" ],
"tags": [ "t:bugfix" ]
}
]
} |
... The |
@westurner please stop dumping your personal notes here. They are not contributing anything useful to this thread (e.g. none of us need a link to GitHub's API just pasted in a list as we all know how to use a search engine and there is no relevancy here for JSON-LD). You have now been warned twice in this thread about your posting habits. I know you mean well, but please keep your posts concise and on-point or else I will block you from this issue tracker for not being respectful of other people's time. |
Or you could store release note links/edges as restructuredtext line blocks:

Entry 1
========
"Add *argument* to function (closes #21) #feature #security #cve-2011-1015 #pr123
| issue: bpo-22
| cve: cved:2011-1015
| pullRequests: cpypr:123
| versions: 2.7, 3.6.1
| tags: "feature", "security"
|
I haven't found the time to sit down and write this properly, so here's a quick note on this topic before Brett makes up his mind. Sorry if it's a bit long / messy.

First, I don't have a strong opinion about what the input format to blurb should look like. If there was a consensus about "it should look like X", then I'd make it look like X. We don't seem to have a consensus about what the input format should look like, because I don't think we've reached consensus about what metadata the tool needs. We need to figure that out first.

Obviously necessary:
I would also like:

I believe Brett is also asking for:

By the way, towncrier's approach of pre-created directories named for the categories (2.) is a nice idea. That ensures people don't misspell the category name. blurb could easily switch to that. The only downside I know of is that, IIUC, git doesn't track directories as first-class objects, so we'd have to have an extra empty file in each directory.

blurb currently supports 1-3. It uses the filename for two bits of metadata (stable sorting order and category), and the contents of the file are simply the news entry. But it has kind of reached the limit of my comfort level regarding storing metadata in the filename. If we only want to add 4, a simple "consider for what's new please", then okay, I think we could live with sticking that in the filename, like adding .wn just before the extension, for example. If we want to add 5, then my inclination is to add a simple metadata blob to the contents of the file:
If we do that, then my inclination is further to move all the metadata into that blob:
(uncomment "what's new" to use it) The "blurb" tool would make it easy to add these, but users could also create the file by hand using a web page form that formats the output for them. (Making the entry entirely by hand might be tricky, since the nonce should be in a standardized format. Maybe we could give them a short blob of Python they run to generate one?) If all the metadata lives inside the file, then we don't care what the filename is, it just needs to be unique. |
For metadata-in-the-file, I quite like the format that the Nikola static blog generator uses:
As an added bonus, when the file has the

Using structured metadata like that would also open up future options for acknowledgements that aren't directly tracked in the git metadata - cases where we built on a patch written by someone else, or someone contributed API design ideas that someone else implemented, etc. At the moment we put that in the snippet body ("Initial patch by ..." and so forth), but a metadata field could more easily feed into ideas like auto-generating Misc/ACKS in addition to Misc/NEWS.

As far as a stable sort algorithm for display goes, we could then define that as:
|
I've decided to go with blurb. The fact that we're still discussing what we want to carry forward in the entries suggests we need as much flexibility in the tooling as possible. Thanks @dstufft for putting the work in to put towncrier forward (and once again to @dhellmann and @rbtcollins for reno).

I assume we will check it into

I've started #66 for discussing how we want to format the entries since this issue has gotten rather long and slightly unwieldy. |
I'd suggest putting it in the core-workflow repo or keeping it in its own repo, rather than putting it into Tools. Tools is OK for things that don't change very often (e.g. reindent.py), and for things where it's OK for new features to only go into new versions (e.g. Argument Clinic), but it's a pain for anything that's still under active development and needs to behave consistently across branches. |
As pulled from PEP 512:
Traditionally the Misc/NEWS file [19] has been problematic for changes which spanned Python releases. Oftentimes there will be merge conflicts when committing a change between e.g., 3.5 and 3.6 only in the Misc/NEWS file. It's so common, in fact, that the example instructions in the devguide explicitly mention how to resolve conflicts in the Misc/NEWS file [21] . As part of our tool modernization, working with the Misc/NEWS file will be simplified.
The planned approach is to use an individual file per news entry, containing the text for the entry. In this scenario each feature release would have its own directory for news entries and a separate file would be created in that directory that was either named after the issue it closed or a timestamp value (which prevents collisions). Merges across branches would have no issue as the news entry file would still be uniquely named and in the directory of the latest version that contained the fix. A script would collect all news entry files no matter what directory they reside in and create an appropriate news file (the release directory can be ignored as the mere fact that the file exists is enough to represent that the entry belongs to the release). Classification can either be done by keyword in the news entry file itself or by using subdirectories representing each news entry classification in each release directory (or classification of news entries could be dropped since critical information is captured by the "What's New" documents which are organized). The benefit of this approach is that it keeps the changes with the code that was actually changed. It also ties the message to being part of the commit which introduced the change. For a commit made through the CLI, a script could be provided to help generate the file. In a bot-driven scenario, the merge bot could have a way to specify a specific news entry and create the file as part of its flattened commit (while most likely also supporting using the first line of the commit message if no specific news entry was specified). If a web-based workflow is used then a status check could be used to verify that a news entry file is in the pull request to act as a reminder that the file is missing. Code for this approach has been written previously for the Mercurial workflow at http://bugs.python.org/issue18967. There are also tools from the community like https://pypi.python.org/pypi/towncrier, https://github.com/twisted/newsbuilder, and http://docs.openstack.org/developer/reno/.
Discussions at the Sep 2016 Python core-dev sprints led to this decision compared to the rejected approaches outlined in the Rejected Ideas section of this PEP. The separate files approach seems to have the right balance of flexibility and potential tooling out of the various options while solving the motivating problem.