Generate Misc/NEWS from individual files #6

Closed
brettcannon opened this issue Dec 10, 2016 · 126 comments
@brettcannon
Member

As pulled from PEP 512:

Traditionally the Misc/NEWS file [19] has been problematic for changes which spanned Python releases. Oftentimes there will be merge conflicts when committing a change between e.g., 3.5 and 3.6 only in the Misc/NEWS file. It's so common, in fact, that the example instructions in the devguide explicitly mention how to resolve conflicts in the Misc/NEWS file [21] . As part of our tool modernization, working with the Misc/NEWS file will be simplified.

The planned approach is to use an individual file per news entry, containing the text for the entry. In this scenario each feature release would have its own directory for news entries and a separate file would be created in that directory that was either named after the issue it closed or a timestamp value (which prevents collisions). Merges across branches would have no issue as the news entry file would still be uniquely named and in the directory of the latest version that contained the fix.

A script would collect all news entry files no matter what directory they reside in and create an appropriate news file (the release directory can be ignored as the mere fact that the file exists is enough to represent that the entry belongs to the release). Classification can either be done by keyword in the news entry file itself or by using subdirectories representing each news entry classification in each release directory (or classification of news entries could be dropped since critical information is captured by the "What's New" documents which are organized).

The benefit of this approach is that it keeps the changes with the code that was actually changed. It also ties the message to being part of the commit which introduced the change. For a commit made through the CLI, a script could be provided to help generate the file. In a bot-driven scenario, the merge bot could have a way to specify a specific news entry and create the file as part of its flattened commit (while most likely also supporting using the first line of the commit message if no specific news entry was specified). If a web-based workflow is used then a status check could be used to verify that a news entry file is in the pull request to act as a reminder that the file is missing. Code for this approach has been written previously for the Mercurial workflow at http://bugs.python.org/issue18967.
There are also tools from the community like https://pypi.python.org/pypi/towncrier, https://github.com/twisted/newsbuilder, and http://docs.openstack.org/developer/reno/.

Discussions at the Sep 2016 Python core-dev sprints led to this decision compared to the rejected approaches outlined in the Rejected Ideas section of this PEP. The separate files approach seems to have the right balance of flexibility and potential tooling out of the various options while solving the motivating problem.
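The collection script described in the PEP could start out like the minimal sketch below. It assumes a hypothetical layout of `<news root>/<release>/<classification>/<entry>.rst` and follows the PEP's rule that the release directory can be ignored: every entry file found anywhere under the root is collected, and only its immediate parent directory is read as the classification. `classification_of` and `collect_entries` are illustrative names, not part of any existing tool.

```python
from pathlib import Path
from typing import List, Tuple


def classification_of(entry: Path) -> str:
    # The classification subdirectory is the entry's immediate parent;
    # the release directory above it is deliberately ignored.
    return entry.parent.name


def collect_entries(news_root: Path) -> List[Tuple[str, str]]:
    """Return (classification, text) for every *.rst entry under news_root."""
    entries = []
    # rglob() walks every subdirectory, so an entry is found no matter
    # which release directory it was committed into.
    for path in sorted(news_root.rglob("*.rst")):
        text = path.read_text(encoding="utf-8").strip()
        entries.append((classification_of(path), text))
    return entries
```

Because only the file's existence and its parent directory matter, a cherry-picked entry that lands in a different release directory is still collected and classified the same way.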

@brettcannon
Member Author

Rejected Ideas

Deriving Misc/NEWS from the commit logs

As part of the discussion surrounding Handling Misc/NEWS, the suggestion has come up of deriving the file from the commit logs themselves. In this scenario, the first line of a commit message would be taken to represent the news entry for the change. Some heuristic to tie in whether a change warranted a news entry would be used, e.g., whether an issue number is listed.

This idea has been rejected due to some core developers preferring to write a news entry separate from the commit message. The argument is the first line of a commit message compared to that of a news entry have different requirements in terms of brevity, what should be said, etc.
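For comparison, the rejected commit-log approach could be sketched as follows. The issue-reference pattern (either `bpo-NNNN` or the older `Issue #NNNN` style) is an assumed stand-in for "whether an issue number is listed", and `news_entry_from_commit` is a hypothetical helper, not an existing tool.

```python
import re
from typing import Optional

# Assumed heuristic: a commit deserves a news entry only if it references an issue.
ISSUE_RE = re.compile(r"\b(?:bpo-|issue\s*#)(\d+)", re.IGNORECASE)


def news_entry_from_commit(message: str) -> Optional[str]:
    """Return the commit's first line as its news entry, or None if no issue is referenced."""
    lines = message.strip().splitlines()
    if not lines or ISSUE_RE.search(message) is None:
        return None
    return lines[0].strip()
```

The sketch also makes the objection concrete: the news entry is whatever the commit's first line says, with no room for separately worded text.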

Deriving Misc/NEWS from bugs.python.org

A rejected solution to the NEWS file problem was to specify the entry on bugs.python.org [5]. This would mean an issue that is marked as "resolved" could not be closed until a news entry is added in the "news" field in the issue tracker. The benefit of tying the news entry to the issue is it makes sure that all changes worthy of a news entry have an accompanying issue. It also makes classifying a news entry automatic thanks to the Component field of the issue. The Versions field of the issue also ties the news entry to which Python releases were affected. A script would be written to query bugs.python.org for relevant news entries for a release and to produce the output needed to be checked into the code repository. This approach is agnostic to whether a commit was done by CLI or bot. A drawback is that there's a disconnect between the actual commit that made the change and the news entry by having them live in separate places (in this case, GitHub and bugs.python.org). This would mean that after making a commit, one would have to remember to go back to bugs.python.org to add the news entry.

@brettcannon
Member Author

The only other potential solution besides individual files is a bot or script which collects news entries from messages in PR comments. The pro/con list is:

Individual files

Pros

  • Keeps NEWS entry with commit
  • All PR submitters learn about the NEWS file and thus a little bit more about our process

Cons

  • Requires the PR submitter to create the NEWS entry which makes contributions slower

Bot

Pros

  • Don't need to lean on PR submitters to create the NEWS entry

Cons

  • Another bot to maintain would be annoying
  • Script would need a way to keep track of what entries have been added to the NEWS file thus far so that there wasn't just a constant growth of the number of PRs that had to be scraped

@dhellmann
Member

Managing individual files by hand also has the benefit of letting you communicate separately with reviewers via commit messages and with consumers via the release notes. We've had good luck with that in OpenStack, where the consumer of the software is often interested in very different information from the other contributors who may be reviewing patches.

@dhellmann
Member

Hmm, it looks like the sample repo in python/cpython doesn't have any tags indicating when specific versions were released. Reno currently relies on tags for version identification. I could update it to look at something else -- how does one determine the current point release on a given branch? Is it in a file somewhere?

@orsenthil
Member

@dhellmann, yeah, that's a problem with the current python/cpython repo. Tags were not replicated and pushed. We will have to re-push the repo with tags (perhaps using the tool and/or instructions from https://github.com/orsenthil/cpython-hg-to-git, which have been tried).

@dhellmann
Member

@orsenthil oh, good, if the plan is to eventually have the tags in place then that's no problem. I assumed there was some other mechanism in place already.

@dstufft
Member

dstufft commented Feb 11, 2017

This tool should probably make this pretty easy https://pypi.org/project/towncrier/

@vstinner
Member

I started a thread on python-dev because Misc/NEWS became a blocker point with the new GitHub workflow:
https://mail.python.org/pipermail/python-dev/2017-February/147417.html


FYI I wrote a tool which computes the released Python versions including a change from a list of commits to generate a report on Python vulnerabilities:
https://github.com/haypo/python-security/blob/master/render_doc.py

The core of the feature is "git tag --contains sha1". Then I implemented a logic to select which versions should be displayed. Output:
http://python-security.readthedocs.io/vulnerabilities.html

The tool also computes the number of days between the vulnerability disclosure date, commit date and release date. I chose to ignore beta and release candidate versions.

But I guess that there are other existing projects which would fit Misc/NEWS requirements better than my tool! (reno, towncrier, something else?)
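The `git tag --contains` step could be sketched as below. It assumes CPython-style tag names such as `v3.6.1`, `v3.6.0rc1` or `v3.7.0a1`, drops alpha, beta and release-candidate tags as described above, and reports the first final release per X.Y branch. The function names are hypothetical; this is not the code in render_doc.py.

```python
import re
import subprocess
from typing import Dict, Iterable, List, Tuple

# Assumed tag format: final releases look like vX.Y or vX.Y.Z;
# a1/b1/rc1 suffixes mark pre-releases and fail this match.
FINAL_TAG_RE = re.compile(r"^v(\d+)\.(\d+)(?:\.(\d+))?$")

Version = Tuple[int, int, int]


def final_releases(tags: Iterable[str]) -> List[Version]:
    """Parse tag names, keeping only final releases, sorted oldest first."""
    versions = []
    for tag in tags:
        match = FINAL_TAG_RE.match(tag.strip())
        if match:
            major, minor, micro = match.groups()
            versions.append((int(major), int(minor), int(micro or 0)))
    return sorted(versions)


def first_release_per_branch(versions: Iterable[Version]) -> Dict[str, str]:
    """Map each X.Y branch to the earliest final release that contains the change."""
    first: Dict[str, str] = {}
    for major, minor, micro in sorted(versions):
        first.setdefault(f"{major}.{minor}", f"{major}.{minor}.{micro}")
    return first


def releases_containing(sha: str, repo: str = ".") -> Dict[str, str]:
    # `git tag --contains <commit>` lists every tag whose history includes the commit.
    out = subprocess.run(
        ["git", "tag", "--contains", sha],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return first_release_per_branch(final_releases(out.splitlines()))
```

Only `releases_containing` touches git; the two parsing helpers are pure functions, so the version-selection logic can be tested without a repository.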

@westurner

westurner commented Feb 26, 2017

changelog filenames

  • CHANGELOG.rst
  • HISTORY.rst
  • whatsnew.rst
  • Misc/NEWS

Escaping Markup

NOTE: commit logs may contain (executable) markup

Tools

Projects:

@westurner

Additional requirements/requests for Misc/NEWS'ification

  • SEC: Security
    • indicate that a release-noted change is security-relevant
    • link w/ @Haypo render_doc.py
      • reno YAML would be flexible here

@brettcannon
Member Author

Since there seem to be multiple options for pre-existing tools and writing our own, we should start by identifying the requirements of what we want to be supported in the final NEWS file:

Features we want

  • Sectioned by Python release with the release date
  • Release sub-sectioned by topic
  • Issues referenced along with explanation of what changed
  • Simple bullet list
  • Single file

Nice-to-haves would be:

  • Reference changed/affected module(s) for per-module grouping (and a "general" for multi-module changes)

Am I missing anything we need/want the solution to cover?

What if we were writing a tool from scratch?

Now what would a greenfield solution look like (to help set a baseline of what we might want a tool to do)? To me, we would have a directory to contain all news-related details. In that top-level directory would be subdirectories for each subsection of the release (e.g. "Core and Built-ins", "Library", etc.). Each NEWS entry file would then go into the appropriate subdirectory. The filename would be the issue related to the change, e.g. bpo-12345.rst (in all honesty we should have an issue per change in the NEWS file as a requirement since if it's important enough to be listed in the NEWS files then we need a way to track any past and future discussions related to the change, and yes, we can come up with a standard for listing multiple issue numbers). If we wanted to support listing affected module(s) we could have some convention in the files to list that sort of detail (e.g. "modules: importlib").
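Parsing a single entry under that layout might look like the following sketch. The `modules:` first line is the hypothetical convention mentioned above, and `NewsEntry` and `parse_entry` are illustrative names only.

```python
import re
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import List

# Entry files are named after their issue, e.g. bpo-12345.rst.
FILENAME_RE = re.compile(r"^bpo-(\d+)\.rst$")


@dataclass
class NewsEntry:
    section: str  # taken from the subdirectory name, e.g. "Library"
    issue: int
    text: str
    modules: List[str] = field(default_factory=list)


def parse_entry(path: PurePosixPath, text: str) -> NewsEntry:
    match = FILENAME_RE.match(path.name)
    if match is None:
        raise ValueError(f"not a news entry filename: {path.name}")
    lines = text.strip().splitlines()
    modules: List[str] = []
    # Optional first line such as "modules: importlib, pkgutil".
    if lines and lines[0].startswith("modules:"):
        modules = [name.strip() for name in lines[0][len("modules:"):].split(",") if name.strip()]
        lines = lines[1:]
    # Join the remaining non-blank lines into one paragraph; a release
    # script would re-wrap it when generating the NEWS file.
    body = " ".join(line.strip() for line in lines if line.strip())
    return NewsEntry(section=path.parent.name, issue=int(match.group(1)), text=body, modules=modules)
```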

Then upon release the RM would run a script which would read all the files, generate the appropriate text (which includes line-wrapping, etc.), and write out the file. Now we can either keep a single file with all entries (which gets expensive to load and view online, which also means we should add an appropriate .rst file extension), or a file per feature release (which makes searching the complete history a little harder as you then have to use something like grep to search multiple files quickly). I'm assuming we won't feel the need to drag NEWS entries forward -- e.g. entries made for 3.6.1 won't be pulled into the master branch for 3.7 -- since most changes in bugfix branches will also be fixed in the master branch and thus be redundant. But if people really care then a single PR to copy the updated NEWS file(s) to master upon release could be done.
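The release-time generation step (sorting entries into per-topic subsections and line-wrapping each one into a simple bullet list) could be sketched with the standard library's `textwrap`. Entries are plain `(section, issue, text)` tuples here; the title string (which could carry the release date), the 79-column width and the reST underline style are assumptions, not an agreed format.

```python
import textwrap
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def render_release(title: str, entries: Iterable[Tuple[str, int, str]], width: int = 79) -> str:
    """Render one release as reST: a title, then one bullet list per section."""
    by_section: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for section, issue, text in entries:
        by_section[section].append((issue, text))

    out = [title, "=" * len(title), ""]
    for section in sorted(by_section):
        out += [section, "-" * len(section), ""]
        for issue, text in sorted(by_section[section]):
            # Continuation lines are indented so they line up under the bullet text.
            out.append(textwrap.fill(f"- bpo-{issue}: {text}", width=width, subsequent_indent="  "))
        out.append("")
    return "\n".join(out)
```

Sections are sorted alphabetically here for determinism; a real tool would more likely use a fixed, agreed section order.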

Next steps

First step is to figure out what our requirements are of any solution. So if I'm missing anything please reply here (I think I covered what @warsaw mentioned on python-dev above). Once we have the requirements known we can look at the available tools to see how close they come to meeting our needs and decide if we like any of them or want to write our own.

@warsaw
Member

warsaw commented Feb 27, 2017

Thanks @brettcannon

I was thinking something similar, but slightly different to your implementation outline. I like naming files with their bug number, e.g. bpo-12345.rst but I was thinking that the top-level organizational directory would contain subdirectories per release, e.g. 3.7, 3.6, 3.5, etc. Symlinks or hardlinks would take care of issues that span multiple versions. If that doesn't work because of platform limitations, then a simple cross reference inside similarly named files in different directories would work just as well.

I was thinking maybe we don't need subcategory subdirectories within there. Everything else can be specified using metadata in the .rst file. Kind of like the way Pelican supports metadata directives for the date, category, tags, and slugs. E.g.

News/3.7/bpo-28598.rst

:section: core
:alsofixes: bpo-12345
:versionadded: 3.7.0a1
:backports: 3.6.1
:authors: Martijn Pieters

Support __rmod__ for subclasses of str being called before str.__mod__.

Generally, you wouldn't have an :alsofixes: directive, and you wouldn't have a :backports: directive for new features. Oh, and +1 for strongly recommending (and maybe requiring) anything in NEWS to have a bug. We could even add a CI gate so that any merge must have an entry in News/X.Y/

I don't know whether those directives are a reST feature or something added by Pelican. We could also use the RFC-822 style headers found in PEPs.
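A sketch of reading those metadata lines, assuming each one has the `:name: value` shape shown in the example and that the metadata block ends at the first line that does not. This hand-rolled parser is an illustration only; it is not how Pelican or docutils process field lists.

```python
import re
from typing import Dict, Tuple

# A metadata line looks like ":name: value", as in the example above.
FIELD_RE = re.compile(r"^:(\w+):\s*(.*)$")


def parse_metadata(text: str) -> Tuple[Dict[str, str], str]:
    """Split an entry into (metadata fields, entry body)."""
    lines = text.splitlines()
    i = 0
    while i < len(lines) and not lines[i].strip():  # skip leading blank lines
        i += 1
    fields: Dict[str, str] = {}
    # Fields run until the first line that is not ":name: value".
    while i < len(lines):
        match = FIELD_RE.match(lines[i])
        if match is None:
            break
        fields[match.group(1)] = match.group(2).strip()
        i += 1
    body = "\n".join(lines[i:]).strip()
    return fields, body
```

One catch: an entry body that starts with a reST role such as `:func:` would be misread as a field by this sketch, so a real format would want a mandatory blank line or a fixed list of field names. For the RFC-822 style alternative, the standard library's `email.message_from_string()` already parses `Name: value` header lines followed by a blank line and a body.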

@warsaw
Member

warsaw commented Feb 27, 2017

More about using major release numbers as the organizational structure:

  • It would make grepping for a change's scope easier (I could grep the entire News/ subdirectory, or just News/3.6/)
  • ls News/3.7 would give a really useful high level view of changes in a particular version.
  • The tool to generate NEWS.rst could be version limited by passing the sub-subdirectory, or generate news for all tracked releases by passing in News/

@dstufft
Member

dstufft commented Feb 27, 2017

Do we want to keep the news fragments around forever? I assumed that once they've been integrated into an actual news file, they would be deleted.

(+1 is supporting deletion, -1 is to keep files around forever.)

@warsaw
Member

warsaw commented Feb 27, 2017

I was thinking we'd keep them around forever. They probably don't take much space, and they compress well, so it would be very handy to be able to do the historical greps from the master branch.

@brettcannon
Member Author

The problem I see for keeping them around is simply the number of files the repo will grow by, which might affect checkout speed and maybe upset some file systems. Do we have any idea how many files 3.4 would have if we kept the files around for the life of the feature release? And if we generate a single file per feature release then you can grep that one file plus entry files to get your version search without keeping duplicate data around.

As for the feature version subdirectories, the reason I didn't suggest that is it seemed unnecessary in a cherry picking workflow. If a file is in the e.g. 3.6 branch then you know it's for 3.6 thanks to its mere existence. And so even if we did use feature branch directories it won't matter if a fix is in the 3.7 directory in the 3.6 branch as its existence means it applies to Python 3.6 itself, it just happens to have been committed to 3.7 first. But as I said, if the existence of the file means the change applies then sticking it in a feature branch directory only comes up if you keep the individual files around.

Lastly, adding cross-references between feature branch subdirectories complicates cherry picks as it adds a mandatory extra step. By leaving out cross-references it allows cherry picks that merge cleanly to not require any more work beyond opening the PR while cross-referencing adds a required step for every cherry pick PR.

@terryjreedy
Member

+1 to Barry's variation. I like self-identifying files. For one thing, if a contributor submits a file with the wrong section, I presume it would be easier to edit the section line than to move the file.

What I would really like is auto-generation of backport pull-requests. If this should not always be done, then use a symbol like '@' to trigger the auto-generation.

On dstufft's comment, I am not sure if thumbs-up is for 'keep forever' or 'delete when integrated'. 1000s of files under 500 bytes on file systems using 4000 bytes per file (Windows, last I knew) is a bit wasteful. I am for deleting individual files after they are sorted and concatenated into a 'raw' (unformatted) listing. This would accomplish compression without loss of information. (One would still grep the metadata.) It would allow more than one output formatting.

News entries can have non-ascii chars (for names) but must be utf-8 encoded. I suspect that we will occasionally get mis-encoded submissions. Will there be an auto encoding check somewhere?

@brettcannon
Member Author

Moving a file isn't difficult; git will just delete the old one and add it again under the new name (and implicitly pick up the file was moved).

As for auto-generating cherry picks, see GH-8.

For voting on Donald's comment, I view voting +1 is for deletion. I have edited the comment to make it more clear (I also just looked at how @warsaw voted and voted the opposite 😉 ).

And for the encoding check, I'm sure we will implement a status check that will do things like verify formatting, check the text encoding is UTF-8, etc.
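Such a status check could start from something like the sketch below, which takes the raw bytes of one entry file. The specific checks (strict UTF-8 decoding, no byte order mark, non-empty text) are assumptions about what the check would verify, and `check_news_entry` is a hypothetical name.

```python
from typing import List


def check_news_entry(data: bytes) -> List[str]:
    """Return a list of problems found in one news entry file's raw bytes."""
    try:
        text = data.decode("utf-8")  # strict decoding: any invalid byte raises
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8: {exc}"]
    problems = []
    if not text.strip():
        problems.append("entry is empty")
    # The plain "utf-8" codec keeps a leading BOM as U+FEFF, so it can be detected here.
    if text.startswith("\ufeff"):
        problems.append("entry starts with a UTF-8 byte order mark")
    return problems
```

In CI, the check would read each added entry with `Path.read_bytes()` and fail the status if any file returns a non-empty list.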

@dhellmann
Member

@brettcannon At least in the case of reno, the directory structure isn't needed for organizing the files because it pulls the content out of the git history, not from the currently checked out workspace.

If we're worried about the number of note files, then a bunch of individual notes could be collapsed into one big file just before each release. Backports of fixes could still use separate files to allow for clean cherry-picks.

@westurner

In order to JOIN this metadata (description, issue numbers) with other metadata, something like YAML (YAML-LD?) with some type of URI would be most helpful.

JSON-LD:

{"@id": "http://.../changelog#entry/uri/<checksum?>",
 "description": "...",
 "issues": [
   {"n": n,
     "description":""}]
}
  • reno adds a unique id to the filenames

@brettcannon
Member Author

One other thing I forgot to mention about why I suggested subdirectories for classification is it does away with having to look up what potential classification options there are. I know when I happen to need to add an entry early on in a branch I always have to scroll back to see what sections we have used previously to figure out which one fits best. If we had directories we never deleted then that guesswork is gone.

@dhellmann so are you saying reno looks at what files changed in the relevant commit to infer what to classify the change as (e.g. if only stuff in Lib/ changed then it would automatically be classified as "Library")?

@larryhastings
Contributor

The problem I see for keeping them around is simply the number of files the repo will grow by, which might affect checkout speed and maybe upset some file systems.

Easily solved. Consider that git itself is storing kajillions of files in its object store. It manages this by employing a fascinating feature--available in all modern operating systems!--called a "subdirectory".

In the case of git's object store, it snips off the first two characters of the object's hexified hash, and that becomes the subdirectory name. So there are potentially 256 subdirectories, and on average 1/256 of the files go in each subdir.

In our case, I'd suggest that

  • the tool globs all files in the entire directory tree automatically, sorting them internally, and
  • the tool creates the text file per checkin in a "yyyy.mm" subdirectory.
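Both ideas fit in a few lines. The sketch below shows git's fan-out of loose objects by the first two hex characters, and the proposed `yyyy.mm` subdirectory per check-in with a tool that globs the whole tree. The `News` root name and all helper names are hypothetical.

```python
from datetime import date
from pathlib import Path, PurePosixPath
from typing import List


def git_loose_object_path(hexsha: str) -> str:
    # Git stores a loose object under objects/<first 2 hex chars>/<remaining 38>,
    # which gives at most 256 fan-out directories.
    return f"{hexsha[:2]}/{hexsha[2:]}"


def entry_path(root: str, filename: str, when: date) -> PurePosixPath:
    # Proposed layout: one "yyyy.mm" subdirectory per month of check-ins.
    return PurePosixPath(root) / f"{when.year:04d}.{when.month:02d}" / filename


def all_entries(root: Path) -> List[Path]:
    # The tool ignores the subdirectory structure: glob everything, sort internally.
    return sorted(p for p in root.rglob("*") if p.is_file())
```

Because `yyyy.mm` names sort lexically in chronological order, sorting the full paths also orders entries by month without any extra metadata.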

@dstufft
Member

dstufft commented Feb 27, 2017

I'll say that I don't see a whole lot of value to keeping the files around once they've been "compiled" into the relevant NEWS file.

@larryhastings
Contributor

I'll say that I don't see a whole lot of value to keeping the files around once they've been "compiled" into the relevant NEWS file.

In that case, why have the discrete files in the first place! Just have people add their entries directly to the NEWS file.

... y'see, that's the problem we're trying to solve. When I cut a Python release, I always have a big painful merge I have to do by hand on the NEWS file. If we keep the discrete files, I can just regenerate NEWS and know it's going to be correct.

@dstufft
Member

dstufft commented Feb 27, 2017

In that case, why have the discrete files in the first place! Just have people add their entries directly to the NEWS file.

Because many people adding and changing lines to the NEWS files causes merge conflicts. One person periodically "compiling" that NEWS file during a release does not.

@larryhastings
Contributor

I don't want to go down that road.

  • TOOWTDI. There should be one canonical place for NEWS entries. You're proposing there be two.
  • If we allow the canonical place to be either, then somebody's going to say "I don't want to deal with this new file format! Why do you change things when they were perfectly fine before!" or "It doesn't integrate well with my workflow!" and then they'll be the special snowflake who edits NEWS on their own, and causes conflicts.

I propose that we design the tool to use the discrete files, and Misc/NEWS is only ever generated from that tool. If, down the road, we decide that it's really okay, we can anoint Misc/NEWS as a second canonical location and delete the discrete files.

What problem does keeping the discrete files cause? Is it just that you don't like it for some aesthetic reason?

@westurner

westurner commented Mar 18, 2017

(I think codelabels are a helpful signal that's relevant to writing release notes)

commit messages and codelabels

API: an (incompatible) API change
BLD: change related to building numpy
BUG: bug fix
DEP: deprecate something, or remove a deprecated object
DEV: development tool or utility
DOC: documentation
ENH: enhancement
MAINT: maintenance commit (refactoring, typos, etc.)
REV: revert an earlier commit
STY: style fix (whitespace, PEP8)
TST: addition or modification of tests
REL: related to releasing numpy
ENH: Feature implementation
BUG: Bug fix
STY: Coding style changes (indenting, braces, code cleanup)
DOC: Sphinx documentation, docstring, or comment changes
CMP: Compiled code issues, regenerating C code with Cython, etc.
REL: Release related commit
TST: Change to a test, adding a test. Only used if not directly related to a bug.
REF: Refactoring changes

Codelabels

Codelabels (code labels) are three-letter codes with which commit messages can be prefixed.

CODE Label          color name      background  text
---- -------------- --------------- ----------  -------
BLD  build          light green     #bfe5bf     #2a332a
BUG  bug            red             #fc2929     #ffffff  (github default)
CLN  cleanup        light yellow    #fef2c0     #333026
DOC  documentation  light blue      #c7def8     #282d33
ENH  enhancement    blue            #84b6eb     #1c2733  (github default)
ETC  config
PRF  performance    deep purple     #5319e7     #ffffff
REF  refactor       dark green      #009800     #ffffff
RLS  release        dark blue       #0052cc     #ffffff
SEC  security       orange          #eb6420     #ffffff
TST  test           light purple    #d4c5f9     #2b2833
UBY  usability      light pink      #f7c6c7     #332829

DAT  data
SCH  schema

REQ  requirement
ANN  announcement

# Workflow Labels   (e.g. for waffle.io kanban board columns)
ready               dark sea green  #006b75     #ffffff
in progress         yellow          #fbca04     #332900

# GitHub Labels
duplicate           darker gray     #cccccc     #333333  (github default)
help wanted         green           #159818     #ffffff  (github default)
invalid             light gray      #e6e6e6     #333333  (github default)
question            fuchsia         #cc317c     #ffffff  (github default)
wontfix             white           #ffffff     #333333  (github default)

Note: All of these color codes (except for fuchsia)
are drawn from the default GitHub palette.

Note: There are 23 labels listed here.

Note
For examples with color swatches in alphabetical order, see https://github.com/westurner/dotfiles/labels

... codelabels are worth the effort because:

  • codelabels are great for free-coding
    • because nobody remembers what "added thing to fix it" from n years
      ago was
    • ENH,DOC,UBY: site.css: decrease header navbar padding (#22)
    • BUG,TST: module.py, test_module.py: handle bytestrings (fixes #21)
    • BUG,TST: handle bytestrings in module.py (fixes #21)
    • handle bytestrings in module.py
    • BLD,TST: Makefile: test: #tox -> pytest --pdb (#)
  • codelabels are great for maintainers
    • merging
    • cherry-picking
    • writing releaselog entries from commit logs
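Splitting such a prefix off a commit subject, for example when drafting release-log entries from commit logs, could be sketched as follows. It assumes labels are the three-letter uppercase codes described above, comma-separated and followed by a colon; `split_codelabels` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# One or more comma-separated three-letter codes, then a colon.
CODELABEL_RE = re.compile(r"^([A-Z]{3}(?:,[A-Z]{3})*):\s*(.*)$")


def split_codelabels(subject: str) -> Tuple[List[str], str]:
    """Return (labels, rest of subject); labels is empty if there is no prefix."""
    subject = subject.strip()
    match = CODELABEL_RE.match(subject)
    if match is None:
        return [], subject
    return match.group(1).split(","), match.group(2)
```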

... Releaselogs and Codelabels are part of
a broader need for Change Management and Requirements Traceability:

@brettcannon
Member Author

@larryhastings yep, I personally don't see a need for anything beyond NEWS section and What's New relevance. To be perfectly honest, I say we drop the NEWS sections and just use the What's New sections, but I would want people like @warsaw and @doko42 who have to care more about the NEWS file to say whether it is useful to them to know a change updated an extension module vs pure Python code.

As for the extra metadata like authors, backports, etc., I don't think that's necessary as that's covered by the git repo data. And I think I have seen security suggested, but I believe that should just be a classification in and of itself and trump all other classifications of the news entry.

@warsaw
Member

warsaw commented Mar 20, 2017

I've found the Misc/NEWS sections (including Library vs. Core/Builtin vs. Extension Modules) to be helpful when trying to sleuth out whether a downstream reported bug was caused by a change in upstream Python or not.

@ncoghlan
Contributor

+1 for what @warsaw noted - if something breaks after an upgrade, it's helpful to be able to go:

  • first check the What's New porting guide
  • then check the rest of What's New
  • then check the relevant section of Misc/NEWS
  • then check the whole of Misc/NEWS
  • then check the commit history

Sometimes "find in file" will short-circuit actually relying on the categorisation, but not always.

@westurner

westurner commented Mar 21, 2017 via email

@ncoghlan
Contributor

@westurner This is not the thread for you to try to tell operating system developers how to do our jobs (no thread is that thread). Reading docs can be done anywhere (and relatively quickly), while writing and running tests (especially under bisect) is far more time consuming and environmentally constrained (so you don't want to do it unless there's some reason to believe an upstream change may be at fault in the first place).

@westurner

westurner commented Mar 21, 2017 via email

@rbtcollins
Member

@westurner they really are not even vaguely related. The issue is that, for a consumer of an externally delivered package like Python, the code we can reasonably expect them to bisect is their own consuming code: the thing that has changed has changed as a large atomic unit comprising thousands of commits, bundled through at least two levels of delivery (upstream -> distro, distro -> user). To successfully bisect in such a situation (and this is ignoring the complexities of interactions with non-stdlib components that also required changes over the evolution of the change), the consumer has to learn how to build Python, how to build it using the distro package rules, and how to adjust those rules for older tree revisions where intermediate commits were never packaged...

It's a huge effort vs 'oh look, the release notes say that the traceback package has been reimplemented, and my failure was in traceback, so I should look closely at that bit of code and maybe the upstream commits for it'.

@berkerpeksag
Member

FWIW, I prefer a custom tool that lives in the Python organization on GitHub. I don't really want to leave comments like "can you please take a look at this?" every two weeks in order to get a simple fix merged.

Since it will be on the Python organization and all core devs will have commit rights, I don't think maintaining such tool will be an issue. I can help Larry with maintaining it if you want to see a list of maintainers.

@dstufft
Member

dstufft commented Mar 22, 2017

I doubt that bugging people to get fixes merged is going to be that big of a problem, and if it ends up being so we can easily fork the thing at that point. Deciding up front that we need our own thing on the off chance we can't get fixes merged seems a bit wasteful when we can, at any time, fork whatever solution we use and start maintaining it if it becomes an issue.

@brettcannon
Member Author

I agree with @dstufft that worrying about maintenance is a bit premature (e.g. we already rely on Sphinx for building our docs which has its own dependencies). Now if Larry makes blurb more attractive because he simply makes it do the exact thing we want and that happens to not be what towncrier does then that's a legitimate reason for having our own tool.

@larryhastings
Contributor

That was always my plan! Now if only we knew exactly what we wanted...

@1st1
Member

1st1 commented Mar 22, 2017

FWIW, I prefer a custom tool that lives in the Python organization on GitHub. I don't really want to leave comments like "can you please take a look at this?" every two weeks in order to get a simple fix merged.

This. Let's agree on the actual workflow and let Larry implement it. FWIW argument clinic's clinic.py is a pleasure to work with and maintaining it is frictionless since it's part of CPython repo.

@brettcannon
Member Author

I've rejected reno as an option. Thanks to @dhellmann and @rbtcollins for the time and effort put into proposing it. In the end, the fact that no one voted positively for it, and that the YAML format leads to more formatting than necessary for the common case compared to towncrier or blurb, makes me think it isn't the best fit for us.

@ncoghlan
Contributor

A thought in regards to blurb as a CPython-specific tool: with it being just-for-us, there'd be more opportunities to make it aware of other workflow tools like bugs.python.org itself (especially if we follow up on @soltysh's efforts to incorporate the GSoC work that added a Roundup REST API: http://psf.upfronthosting.co.za/roundup/meta/issue581 ).

With a CPython-specific tool, CPython-specific service integrations aren't a problem. With a general purpose tool, we'd either need additional scripting around it (effectively creating our own tool anyway), or else come up with configurable solutions, rather than just handling the specific services we care about.

Given how much more complex the CPython development process is than a more typical single-release-stream Python project, that seems like it could be a recipe for future conflict (to put it in relative terms: when it comes to process complexity, CPython is to most other projects as OpenStack is to CPython)

@soltysh
Collaborator

soltysh commented Mar 28, 2017

@ncoghlan I can agree on querying bpo about the issues solved in a past release (using that REST endpoint); is that sufficient for the release notes?

@westurner

westurner commented Mar 30, 2017

@soltysh @WadeFoster @mikeknoop

(modified) From #6 (comment) ("JSON-LD") :

{"@context": {
    "py": "https://schema.python.org/v1#",
    "bpo": "https://bugs.python.org/issue",
    "pr": "https://github.com/python/cpython/pulls/",
    "ver": "https://schema.python.org/v1#releases/",
    "t": "https://schema.python.org/v1#releaselog/tag/",
    "pyreltag": "https://schema.python.org/v1#releaselog/tag/",

    "label": "https://github.com/python/cpython/labels/",

    "cved": "https://cvedetails.com/cve/CVE-",

    "name": { "@id": "schema:name" },
    "description": { "@id": "schema:description" },

    "notes": { "@id": "py:notes", "@container": "@list"},
    "issue": { "@id": "py:issue", "@container": "@list"},
    "mentionedIssue": { "@id": "py:issue", "@container": "@list"},
    "versions": { "@id": "py:versions", "@container": "@list"},
    "pr": { "@id": "py:versions", "@container": "@list"},
    "cve": { "@id": "py:versions", "@container": "@list"},
    "tags": { "@id": "py:versions", "@container": "@list"}
 },
 "@graph": [{
    "@type": "py:ReleaseLog",
    "name": "Python Misc/NEWS.rst",
    "notes": [{
        "@type": "py:ReleaseNote",
        "name": None,
        "description":
            "Add *argument* to function (closes #21) #feature #security #cve-2011-1015 #pr123 ",
        "issue": [ "bpo:22" ],
        "mentionedIssue": [ "bpo:21", "bpo:22" ],
        "cve": [ "cved:2011-1015" ],
        "pullRequests": [ "cpypr:123" ],
        "versions": [ "ver:2.7", "ver:3.6.1" ],
        "tags": [ "t:feature", "t:security" ]
    },
    {
        "@type": "py:ReleaseNote",
        "name": None,
        "description":
            "Fix thing #22 (closes #22) #bugfix #pr124 ",
        "issue": [ "bpo:22" ],
        "mentionedIssue": [ "bpo:22" ],
        "pullRequest": [ "cpypr:124" ],
        "versions": [ "ver:2.7", "ver:3.6.1" ],
        "tags": [ "t:bugfix" ]
    }
    ]
}
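Most of the structured fields in each note repeat information already present in its description string. The sketch below shows how those fields could be derived from the description instead of being typed twice. It assumes the hashtag conventions used in the example descriptions ("closes #NN", "#cve-YYYY-NNNN", "#prNNN", and plain "#word" tags). parse_description is a hypothetical helper, not part of any existing tool.

```python
import re

# Hypothetical conventions, taken from the example descriptions above.
CLOSES_RE = re.compile(r"\(closes #(\d+)\)")
CVE_RE = re.compile(r"#cve-(\d{4}-\d+)")
PR_RE = re.compile(r"#pr(\d+)\b")
# Plain word tags: letters only, ending at a word boundary, so "#pr123"
# and "#21" never match; "#cve-..." matches as "cve" and is filtered out.
TAG_RE = re.compile(r"#([a-z]+)\b")


def parse_description(text):
    """Return compact-IRI lists for issues, CVEs, PRs and tags in text."""
    return {
        "issue": [f"bpo:{n}" for n in CLOSES_RE.findall(text)],
        "cve": [f"cved:{c}" for c in CVE_RE.findall(text)],
        "pullRequests": [f"cpypr:{n}" for n in PR_RE.findall(text)],
        "tags": [f"t:{t}" for t in TAG_RE.findall(text) if t not in ("cve", "pr")],
    }
```

Applied to the first example description, this yields the issue, cve, pullRequests and tags values shown in the first note.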

@westurner

... The builds: matrix is obviously more information than is necessary for a release log.

@brettcannon
Member Author

brettcannon commented Mar 30, 2017

@westurner please stop dumping your personal notes here. They are not contributing anything useful to this thread (e.g. none of us need a link to GitHub's API just pasted in a list as we all know how to use a search engine and there is no relevancy here for JSON-LD).

You have now been warned twice in this thread about your posting habits. I know you mean well, but please keep your posts concise and on-point or else I will block you from this issue tracker for not being respectful of other people's time.

@westurner
Copy link

Or you could store release note links/edges as reStructuredText line blocks:

Entry 1
========
Add *argument* to function (closes #21) #feature #security #cve-2011-1015 #pr123

| issue: bpo-21
| cve: cved:2011-1015
| pullRequests: cpypr:123
| versions: 2.7, 3.6.1
| tags: "feature", "security"

or as a nested bullet list:

- Add *argument* to function (closes #21) #feature #security #cve-2011-1015 #pr123

  - | issue: bpo-21
  - | cve: cved:2011-1015
  - | pullRequests: cpypr:123
  - | versions: 2.7, 3.6.1
  - | tags: "feature", "security"

@larryhastings
Contributor

larryhastings commented Mar 31, 2017

I haven't found the time to sit down and write this up properly, so here's a quick note on this topic before Brett makes up his mind. Sorry if it's a bit long/messy.

First, I don't have a strong opinion about what the input format to blurb should look like. If there was a consensus about "it should look like X", then I'd make it look like X.

We don't seem to have a consensus about what the input format should look like, because I don't think we've reached consensus about what metadata the tool needs. We need to figure that out first.

Obviously necessary:

  1. the Misc/NEWS text
  2. the Misc/NEWS category

I would also like:

  3. some datestamp/nonce that ensures the news entries remain in a stable order (I prefer chronological ordering; sadly, git doesn't preserve file timestamps)

I believe Brett is also asking for:

  4. an optional "please consider for the next What's New document" flag
  5. a suggested category for "What's New"

By the way, towncrier's approach of pre-created directories named for the categories (2.) is a nice idea: it ensures people don't misspell the category name, and blurb could easily switch to it. The only downside I know of is that, if I understand correctly, git doesn't track directories as first-class objects, so we'd need an extra empty file in each directory.
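A minimal sketch of collecting entries from towncrier-style category directories, assuming a hypothetical layout where each category is a subdirectory of some root and each subdirectory holds a placeholder file named ".keep" so that git tracks it. Neither the root path nor the placeholder name comes from blurb or towncrier.

```python
from pathlib import Path

# Hypothetical name for the empty file that keeps each category directory
# in git (git does not track empty directories).
PLACEHOLDER = ".keep"


def collect_by_category(root):
    """Map each category directory under root to its sorted entry filenames."""
    entries = {}
    for category_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        entries[category_dir.name] = sorted(
            f.name
            for f in category_dir.iterdir()
            if f.is_file() and f.name != PLACEHOLDER
        )
    return entries
```

Because the category is read from the directory name, an entry can only end up in a category that already exists as a directory. That is the misspelling protection described above.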

blurb currently supports 1-3. It uses the filename for two bits of metadata (stable sorting order and category), and the contents of the file are simply the news entry. But storing metadata in the filename has about reached my comfort level.

If we only want to add 4, a simple "consider for What's New, please" flag, then I think we could live with sticking that in the filename; for example, we could add .wn just before the extension.

If we want to add 5, then my inclination is to add a simple metadata blob to the contents of the file:

  • simple name=value (or name: value) pairs
  • # is a line comment
  • empty line or some explicit marker line ("--") ends the metadata blob

If we do that, then my inclination is further to move all the metadata into that blob:

  • category=Library
  • nonce=20170513062235.ef4c88a1
  • # what's new = Improved Modules

(uncomment "what's new" to use it)
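A minimal parser sketch for the metadata blob proposed above. The rules it follows are the ones listed: "name=value" or "name: value" pairs, "#" as a line comment, and a blank line or "--" ending the blob. parse_entry is a hypothetical name, not blurb's actual code. The choice of splitting on whichever separator appears first is an assumption.

```python
def parse_entry(text):
    """Split a news entry file into (metadata dict, entry body)."""
    metadata = {}
    lines = text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped in ("", "--"):
            # End of the metadata blob: everything after it is the news text.
            return metadata, "\n".join(lines[i + 1:]).strip()
        if stripped.startswith("#"):
            continue  # commented-out field, e.g. "# what's new = ..."
        # Accept both "name=value" and "name: value"; split on whichever
        # separator occurs first so values may contain the other one.
        positions = [p for p in (stripped.find("="), stripped.find(":")) if p >= 0]
        if not positions:
            raise ValueError(f"not a name=value line: {line!r}")
        pos = min(positions)
        metadata[stripped[:pos].strip()] = stripped[pos + 1:].strip()
    # No terminator found: the file contained only metadata.
    return metadata, ""
```

Uncommenting the "what's new" line simply turns it into one more name/value pair. The parser needs no special case for it.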

The "blurb" tool would make it easy to add these, but users could also create the file by hand using a web page form that formats the output for them. (Making the entry entirely by hand might be tricky, since the nonce should be in a standardized format. Maybe we could give them a short blob of Python they run to generate one?)
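The "short blob of Python" for generating a nonce could look like the sketch below. It assumes the format is a UTC timestamp (YYYYmmddHHMMSS), a dot, and eight random hex digits, which is inferred only from the single example value above. make_nonce is a hypothetical name.

```python
import datetime
import secrets


def make_nonce(now=None):
    """Return a nonce shaped like "20170513062235.ef4c88a1" (assumed format)."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    # Leading timestamp keeps string order chronological; the random
    # suffix (4 bytes -> 8 hex digits) separates entries made in the same second.
    return now.strftime("%Y%m%d%H%M%S") + "." + secrets.token_hex(4)
```

Running print(make_nonce()) prints a fresh value each time, suitable for pasting into a hand-written entry.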

If all the metadata lives inside the file, then we don't care what the filename is, it just needs to be unique.

@ncoghlan
Contributor

For metadata-in-the-file, I quite like the format that the Nikola static blog generator uses:

.. title: The Python Packaging Ecosystem
.. slug: python-packaging-ecosystem
.. date: 2016-09-17 03:46:31 UTC
.. tags: python
.. category: python
.. link: 
.. description: Overview of the Python Packaging Ecosystem
.. type: text

Post starts here...

As an added bonus, when the file has the .rst extension, my editor automatically grays out the metadata as line comments, and assuming we're planning to use ReST in the snippets for ease of Sphinx integration, that would also apply here.
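A sketch of reading Nikola-style ".. name: value" header lines from a snippet, assuming they all appear at the top of the file and stop at the first line that does not match. In reStructuredText these lines are comments, so Sphinx would ignore them when rendering the body. parse_nikola_style is a hypothetical helper, not Nikola's own parser.

```python
import re

# ".. name: value" lines; reST treats them as comments.
META_RE = re.compile(r"^\.\. (\w+):[ \t]*(.*)$")


def parse_nikola_style(text):
    """Return (metadata, body) for a snippet with leading '.. name: value' lines."""
    metadata = {}
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        match = META_RE.match(lines[i])
        if not match:
            break  # first non-metadata line starts the body
        metadata[match.group(1)] = match.group(2).strip()
        i += 1
    return metadata, "\n".join(lines[i:]).strip()
```

Empty fields such as ".. link:" come back as empty strings. A future acknowledgements field would be picked up the same way.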

Using structured metadata like that would also open up future options for acknowledgements that aren't directly tracked in the git metadata - cases where we built on a patch written by someone else, or someone contributed API design ideas that someone else implemented, etc. At the moment we put that in the snippet body ("Initial patch by ..." and so forth), but a metadata field could more easily feed into ideas like auto-generating Misc/ACKS in addition to Misc/NEWS.

As far as a stable sort algorithm for display goes, we could then define that as:

  • a date field in the snippet metadata (e.g. the date string Nikola uses is just datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S UTC"))
  • the filename used for the snippet (since that has to be non-conflicting or git will complain)
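The two-level sort above can be sketched as a key function. It assumes each entry is a (filename, metadata) pair whose "date" field uses the Nikola-style format quoted above. sort_key and stable_order are hypothetical names.

```python
import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S UTC"  # the Nikola-style format quoted above


def sort_key(entry):
    """Order by the snippet's date first, then by filename to break ties."""
    filename, metadata = entry
    when = datetime.datetime.strptime(metadata["date"], DATE_FORMAT)
    return (when, filename)


def stable_order(entries):
    return sorted(entries, key=sort_key)
```

Since git guarantees filenames in one directory are unique, the (date, filename) pair always gives a deterministic order, even when two snippets share a timestamp.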

@brettcannon
Member Author

I've decided to go with blurb. The fact that we're still discussing what we want to carry forward in the entries suggests we need as much flexibility in the tooling as possible. Thanks @dstufft for putting the work in to put towncrier forward (and once again to @dhellmann and @rbtcollins for reno). I assume we will check it into Tools/ so it is carried with the repo for easy use by everyone and to make updating it easy (if I thought other teams might use it then I might argue for putting it into its own repo, but I don't think anyone will so I'm not going to suggest that).

I've started #66 for discussing how we want to format the entries, since this issue has gotten rather long and slightly unwieldy.

@ncoghlan
Contributor

ncoghlan commented Apr 1, 2017

I'd suggest putting it in the core-workflow repo or keeping it in its own repo, rather than putting it into Tools.

Tools is OK for things that don't change very often (e.g. reindent.py), and for things where it's OK for new features to only go into new versions (e.g. Argument Clinic), but it's a pain for anything that's still under active development and needs to behave consistently across branches.
