Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyYAML 4.2 Release Plan #193

Closed
ingydotnet opened this issue Jun 29, 2018 · 48 comments
Closed

PyYAML 4.2 Release Plan #193

ingydotnet opened this issue Jun 29, 2018 · 48 comments

Comments

@ingydotnet
Copy link
Member

ingydotnet commented Jun 29, 2018

Synopsis

See the project planning page: https://github.com/yaml/pyyaml/projects/1

The PyYAML Release Situation

The most recent PyYAML, 3.12, was released Aug 2016. At that time, Kirill
turned over maintenance of PyYAML and LibYAML to @sigmavirus24 and @ingy .
Since then about 20 PRs have been applied to PyYAML and about 40 to LibYAML.

PyYAML has a release builder: https://github.com/yaml/pyyaml-build
It builds PyYAML wheels against specific versions combinations of
(Python, PyYAML, LibYAML).

This builder no longer works and it's complicated by the fact that the build
process for libyaml has been changed. The PyYAML team is working hard to fix it.

The 4.1 release attempt was rushed out because we knew that PyYAML 3.12 doesn't
work with Python 3.7 which went out this week. We had a fix for that in master,
and so we tried to get it out in time for 3.7.

We thought we had a Jenkins build system that would build the wheels as soon as
the sdist was uploaded. So we pushed the release only to find out minutes later
that this build system wasn't set up to build with libyaml. We were going to
have to use the pyyaml-build system.

After 48 hours of work on the windows/wheels system we decided to pull the plug
on 4.1. We didn't have wheels and we were getting reports of other things that
were wrong. We didn't have a sense that the build system was going to get fixed
soon, and we are all volunteers with limited time.

Soon after the release I learned about PR #74 and was completely surprised to
find that something this big went in without my seeing it. Looking back now I
remember that I had a lot going on in my life at that specific time.

#74 is a non backwards compatible change at the most basic level. It changes
how the dump and load functions behave.

The intent of the change is a good one:

  • Currently PyYAML has the sugar-API: dump, load, safe_dump, safe_load
  • PyYAML has had that API since version 3.1 (April 2006)
  • load() is trivial to exploit on untrusted data
  • Change load and dump to be aliases to safe_load and safe_dump
  • Add alarming (danger_*) new functions for the old load and dump

But this change has contentions:

  • It's non-back-compat and is going to affect a ton of existing code
  • The name danger is misleading when used in completely safe ways
    • In addition, danger_dump is not known to be exploitable in any way
  • PyYAML has known about this and had a safe_ solution in place from the start
    • People are just not feeling comfortable with the defaults

The change is important, worthy of a major release, but is not ready to
be part of PyYAML in its current form. A new PR, building from #74 and #189
should be worked on.

The Current Plan Forward

We need to get PyYAML released soon, if only for the Python 3.7 release. We
can't make any release at all until the build system works again. IOW, we
couldn't even re-release 3.12 right now.

The #74 API change is big and it is more important to get it right than to rush
it out. ie It load() may be a big can of gasoline, but nothing's on fire. ie
#74 doesn't "fix" anything. It just changes a default to something that's
always been safe and available.

There are 60 other changes that I'd like to tackle in the next release, while
at the same time taming a broken release process. My hope is that when we
figure this out, it will be easy to put out PyYAML releases on a regular basis.

We went from 3.12 to 4.x because this was a big release. It's big with or
without #74. I would like to see PyYAML 4.2 get out in the next few days.

If the successor to #74 / #189 is ready and approved by the time we are ready
to upload 4.2, it can go in.

If not, then I think it should be the focus of a 5.1 release. It's a big enough
change to trigger a major release. It should be in the first release of either
4.x or 5.x.

@asottile
Copy link
Contributor

imo #74 is the most important change to keep in the 4.x release -- without it this is still a thing:

>>> yaml.danger_load("!!python/object/apply:os.system\nargs: ['echo I am a shell, wow']")
I am a shell, wow
0

I'd be pretty sad to see load() go back to its dangerous defaults.

@penguinolog
Copy link

Bump of major version number allows compatibility break, but it should be explained for everyone. And removal of release is worse, than just blacklisting release number as buggy (you can see openstack global requirements with a lot blacklisted versions).
P.S.
Today is bench of open non-destructive pull requests with bugfixes, which affect users and I expect, that some of them can be in the next release

@adamcharnock
Copy link

Thank you for working on this @ingydotnet. I really appreciate both the work going into sorting this out, and the clear communication above. I'm very pleased to hear that the project is moving forward again 👍

@cdent
Copy link

cdent commented Jun 30, 2018

This is clearly a tricky situation but it seems like this plan can address it:

So this plan seems okay. It addresses the need for a py3.7 release, avoids breaking people (other than #192) and gives time to figure out how to do safe-by-default well.

@ddanier
Copy link

ddanier commented Jul 2, 2018

Perhaps a bare Python 3.7 compatibility release could go into something like 3.13 and 4.2 could then include #74 and #189 and recent changes?

The missing compatibility with 3.7 currently breaks usage of PyYAML. This is definitely worse than not having #74 and #189 done properly I think.
Note: Homebrew just rolled out the Python 3.7 update. All packages using PyYAML are broken right now.

@asottile
Copy link
Contributor

asottile commented Jul 2, 2018

that would unblock python3.7 for me:

git checkout 3.12 -b 3.12.post1
sed -i 's/3\.12/3.12.post1/g' -- setup.py
virtualenv venv
venv/bin/pip install cython
venv/bin/python setup.py sdist

This produces a source distribution with working .c file that's installable in python3.7:

$ ls dist/
PyYAML-3.12.post1.tar.gz
$ venv/bin/pip uninstall cython -y >& /dev/null
$ venv/bin/pip install dist/PyYAML-3.12.post1.tar.gz
Processing ./dist/PyYAML-3.12.post1.tar.gz
Building wheels for collected packages: PyYAML
  Running setup.py bdist_wheel for PyYAML ... done
Successfully built PyYAML
Installing collected packages: PyYAML
Successfully installed PyYAML-3.12.post1
$ venv/bin/python --version
Python 3.7.0
$ venv/bin/python -c 'from yaml.cyaml import CLoader'
$

EDIT (so it's clear what the above does):

it applies this diff:

diff --git a/setup.py b/setup.py
index 9dc5e8d..0a1db91 100644
--- a/setup.py
+++ b/setup.py
@@ -1,6 +1,6 @@
 
 NAME = 'PyYAML'
-VERSION = '3.12'
+VERSION = '3.12.post1'
 DESCRIPTION = "YAML parser and emitter for Python"
 LONG_DESCRIPTION = """\

(just a version bump)

and then produces a new source distribution with freshly built .c file (due to new-enough cython being installed alongside the setup.py invocation)

@ingydotnet
Copy link
Member Author

@ddanier @asottile your wish has (almost) come true...

@nitzmahone and I just released https://pypi.org/project/PyYAML/3.13b1/#files which is a rebuild of 3.12 with the latest cython. It includes 9 out of the 10 windows wheels we want to ship.

I expect the final 3.13 to be out in 24-48 hours. @nitzmahone is running tons of ansible tests to give us confidence.

We realized today that putting out a 3.13 PyYAML with a 0.1.7 LibYAML could happen a couple days sooner than a 4.2 with 0.2.2, so we've decided that having PyYAML work on Python 3.7 comes first.

This hasn't slowed down our work on 4.2, which we expect out possibly a week after 3.13. There's even a decent chance it can include a proper safe-load-by-default fix. @ddanier, remember that 4.2 is first about 60+ PRs to PyYAML and LibYAML since Aug 2016 (not just the #74 #189 #198 change).

@ddanier
Copy link

ddanier commented Jul 3, 2018

Thanks a lot!

@jakirkham
Copy link

cc @kalefranz @msarahan (for awareness)

@ingydotnet
Copy link
Member Author

Just a follow up to yesterday's comment about the 3.13 release. We are scheduled to upload 3.13rc1 in about 6 hours from now.

We realized the tomorrow is a major US holiday and probably a terrible day to make a final release, so we are hoping to get 3.13 out ASAP on July 5th.

@ingydotnet
Copy link
Member Author

As promised, PyYAML 3.13rc1 is on PyPI now.

We'll try to get a final 3.13 released asap on July 5th.

@asottile
Copy link
Contributor

asottile commented Jul 6, 2018

congrats on the 3.13 release 🎉

can confirm it's functioning on python3.7 for me :D

$ tox -e py37
...
___________________________________ summary ____________________________________
  py37: commands succeeded
  congratulations :)

@ingydotnet
Copy link
Member Author

PyYAML-3.13 was released yesterday.

Onwards and upwards to 4.2

(thanks @asottile :)

@ingydotnet
Copy link
Member Author

FYI. Added this link to the main issue text: https://github.com/yaml/pyyaml/projects/1

JoshData added a commit to GovReady/govready-q that referenced this issue Jul 6, 2018
This reverts commit 1de29c4.

The 4.1 release of pyaml was dropped from pypi because of breakages similar to what we had to fix in 94d646e (which I'm keeping). See yaml/pyyaml#193.
@fantix
Copy link

fantix commented Mar 11, 2019

A quick TL;DR for people like me who was confused in the situation and ended up here:

  • 4.2 is not releasing (this key comment is folded by GitHub)
  • 5.1 is coming soon
  • It will be deprecated to callload() without Loader= in 5.1, the default behavior is changing to FullLoader (or full_load()) which is almost as complete for serialization as UnsafeLoader/Loader, but it avoids arbitrary code execution.

@ingydotnet
Copy link
Member Author

@fantix That is a good summary. Thanks.

The longer version is here: https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation
which is redirected to by https://msg.pyyaml.org/load which is included in the deprecation warning issued by load() in PyYAML 5.1.

The release is expected to land in the next 16 hours.

@ingydotnet
Copy link
Member Author

PyYAML 5.1 has been released.

https://pypi.org/project/PyYAML/5.1/

@wimglenn
Copy link

wimglenn commented Mar 13, 2019

Thanks @ingydotnet. Where can users find the release notes and latest docs? The different default_flow_style is also a breaking change. (https://pyyaml.org/wiki/PyYAMLDocumentation still seems to be 3.13 docs)

@ingydotnet
Copy link
Member Author

@wimglenn Thanks for pointing that out. We'll update the docs soon.

@waylan
Copy link

waylan commented Mar 13, 2019

As the docs are not yet updated, there is something I would like to ask for clarification on.

The change to the API is clear enough from https://msg.pyyaml.org/load, but what is not clear is the actual difference in behavior between SafeLoader, FullLoader, and UnsafeLoader. Which set of YAML tags is accepted by each?

For example, assuming SafeLoader has not changed, it presumably accepts "Standard YAML tags" but not "Python-specific tags" or "Complex Python tags". And presumably, UnsafeLoader accepts all three types of tags. But which subset specifically is accepted by FullLoader? I get the impression it accepts "Standard YAML tags" and "Python-specific tags" but not "Complex Python tags". However, without any clear documentation, I can't be certain.

@perlpunk
Copy link
Member

@waylan yes, the docs need to be updated.
FullLoader accepts the same tags as UnsafeLoader, but it will not import modules, and it will not do function calls.

@waylan
Copy link

waylan commented Mar 14, 2019

FullLoader accepts the same tags as UnsafeLoader, but it will not import modules, and it will not do function calls.

But don't the "Complex Python tags" import modules and call functions, etc?

!!python/name:module.name
!!python/module:package.module
!!python/object:module.cls
!!python/object/new:module.cls
!!python/object/apply:module.f

Are you saying those don't work on both FullLoader and UnsafeLoader. I would have assumed they worked on UnsafeLoader but not FullLoader?

@ingydotnet
Copy link
Member Author

@waylan The commit message here might be useful for you: 0cedb2a

There are simple command line examples you can play with at the end of that message.

All of the interesting stuff happens in this module: https://github.com/yaml/pyyaml/blob/master/lib/yaml/constructor.py (and https://github.com/yaml/pyyaml/blob/master/lib3/yaml/constructor.py for Python 3).

This list of Full (also inherited by Unsafe) tags is here: https://github.com/yaml/pyyaml/blob/master/lib/yaml/constructor.py#L630

@perlpunk
Copy link
Member

Also see #265

@yilmi
Copy link

yilmi commented Sep 7, 2021

For the record, how the early decision to not move faster has affected other projects - https://www.bleepingcomputer.com/news/security/googles-tensorflow-drops-yaml-support-due-to-code-execution-flaw/

I'm sure it's not the only example... something to have in mind for future discussions

@perlpunk
Copy link
Member

perlpunk commented Sep 7, 2021

For the record, how the early decision to not move faster has affected other projects - https://www.bleepingcomputer.com/news/security/googles-tensorflow-drops-yaml-support-due-to-code-execution-flaw/

I think this is a funny example, because the code from TensorFlow deliberately uses unsafe_load, and then complains about its unsafety. But note that I only skimmed the article.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests