Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce startup time by 55% #719

Merged
merged 4 commits into from
Dec 11, 2019
Merged

Conversation

maebert
Copy link
Contributor

@maebert maebert commented Nov 8, 2019

Let's have a look at import times before the changes (visualised with tuna):

poetry run poetry run python -X importtime -m jrnl 2> imports.txt
tuna imports.txt

image

We can see that the main culprits ispkg_resources, taking 54% of our total startup time. So instead of going through the system to ask which version of jrnl is installed, this PR bakes a file with the version number before building it on Travis.

Let's look at it after the change:

image

Note

This only measures the import time for modules, not any execution that happens after the imports. There's a bunch of stuff we can do to optimize that too, for example YAML loading is notoriously slow.

Other minor changes

  • I noticed that in about 10% of runs, asteval takes suspiciously long to import, so I've delayed the import of that too.
  • This PR also includes the version of python in jrnl --version (closes new version output #697)

Checklist

  • The code change is tested and works locally.
  • Tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?

@pspeter
Copy link
Contributor

pspeter commented Nov 8, 2019

Good catch on the startup time!

However, I don't really like the version handling.
I think the standard way is to put it either in a __version__.py file in the base directory of the package and just import that into __init__.py, or just put it directly into the base __init__.py. To pick out some examples, that's how poetry, scikit-learn and pandas (although a bit more complicated) do it.

That way you don't have to mess with absolute paths that might not work if the package is zipped, for example. Also, if the file is only created on travis for the build, then it will not be there for people installing from source.

@maebert
Copy link
Contributor Author

maebert commented Nov 8, 2019

@pspeter I tend to agree with you, actually. The idea that @wren implemented in b4fda4d was essentially to use the git tag as the source of truth for versioning. I think that's clever, but it also feels backwards - the version should depend on the code, not on how it's stored.

@wren What was the rationale for doing it that way? What if we use the pyproject.toml as the source of truth for the version, and the maintainers will bump the version for new releases?

@pspeter
Copy link
Contributor

pspeter commented Nov 8, 2019

As a developer I can understand the need for a single source of truth to avoid mismatches. That's why tools like bump2version exist, I guess.
However, the current situation feels like a bit of a hack.

@maebert
Copy link
Contributor Author

maebert commented Nov 8, 2019

Agreed - let's have @wren weigh in here and decide on a cleaner way to handle version authority.

@wren
Copy link
Member

wren commented Nov 10, 2019

For the versioning question: I strongly believe that the version should have a single source of truth. And since the git tag is easiest to update, I tend to prefer using that.

I completely agree that it seems backwards, but I believe that's because the process was incomplete in this repo. I put 0.0.0-source in the repo as a placeholder, and then filed an issue ( #715 ) to fix the problem. The has always been to keep the version updated in the repo as releases happen, but I prefer to have that sort of repetitive work to be done for me. So, the goal is to be able to tag a release on Github, and have everything else from that point should happen automatically (runs tests, pypi deploy, increment version in repo, and soon brew deploy, choco deploy, update changelog, and so on and so forth).

That being said, we are relying on the toml file to read in the version, which turns out to be pretty slow (great catch on the startup time, by the way, @maebert). But there's really no reason we can't store the version in multiple places as long as nobody has to manually update anything (because then there's the potential to wreck our single source of truth). With this in mind, I added a bot to the build process that will update the version during deployment and commit the change back into the repo (this was my original intent with my partially completed work; I'm sorry I didn't make that more clear to everyone). With this bot in place, we can store the version in multiple places (like the toml file, and a version file) without having to worry about our source of truth.

What this bot doesn't fix, though, is the startup slowness. For that, I prefer the approach of storing the version in a separate file that is easier to access. I personally don't think the difference between VERSION.txt and __version__.py is any different, and it doesn't seem like it would have any performance impact either. I have seen many more projects use the __version__.py approach, though, so it's probably best if we follow the community on that one.

I think this commit in the Poetry repo is a good example of what our bot commits will look like when everything is in place.

@wren
Copy link
Member

wren commented Nov 10, 2019

Oh! Also, for the version output update (#697): I know it's an easy fix, but I want to encourage new contributors to submit more PRs. If we yank the work out from under them, they're less likely to submit more PRs in the future. Can you leave that out of this PR for now? We can always put it in later if the PR doesn't pan out.

@alichtman
Copy link
Contributor

alichtman commented Nov 10, 2019

I strongly believe that the version should have a single source of truth.

Agreed.

And since the git tag is easiest to update, I tend to prefer using that.

I'm not sure I agree here. I don't see why it's any more difficult to have the source of truth be the __version__.py file (or a constant stored in __init__.py). This comes with the added benefit of not having the version be "invisible" (not in the source code). This can be auto-updated by a bot with a sed command or some simple string parsing (or I'm sure someone's written a semver library for python already). I'd strongly argue against using git tags as the source of truth.

I know it's an easy fix, but I want to encourage new contributors to submit more PRs. If we yank the work out from under them, they're less likely to submit more PRs in the future. Can you leave that out of this PR for now?

Great call.

@wren
Copy link
Member

wren commented Nov 10, 2019

I'm not sure I explained what I was saying very well, because reading your response sounds to me like we're saying the same thing. My intent is to keep the version in the source (both in the pyproject.toml file in a __version__.py file). Both of those files will be updated automatically whenever a new tag is created.

This comes with the added benefit of not having the version be "invisible" (not in the source code).

The version won't be (and currently isn't) invisible in the source code (the 0.0.0-source thing was a placeholder, but is already fixed).

This can be auto-updated by a bot with a sed command or some simple string parsing (or I'm sure someone's written a semver library for python already).

This is already what we're doing (although it's only happening in the toml file right now, but we can add a version file pretty easily). What you'll see when a new release comes out, is a commit by our bot account (like this one) that will update the relevant files (along with all of the other release stuff it needs to do).

So, to clarify, when I say the source of truth, it's because that's the only place that I am editing the value. We can't pass the version number by reference without incurring some performance penalties, so we rely on the bot to propagate it for us to the correct locations whenever we tag a release. Therefore, nobody should ever edit the value anywhere else. I realize that isn't the strict definition of SSOT, so maybe that's where our miscommunication lies.

Am I making more sense? Or was I misunderstanding you?

@pspeter
Copy link
Contributor

pspeter commented Nov 10, 2019

This is not really the place to discuss this, but I agree with @alichtman here. One problem of having a bot change the version after the tag is that the tagged commit now always has the wrong version in the source code - it will always be one version behind.
I'm a bigger fan of having a little script that updates the version strings in all necessary files and either have the script also tag the commit with the same version or doing it manually afterwards.

@pspeter
Copy link
Contributor

pspeter commented Nov 10, 2019

I found one script that looks pretty good to me. Maybe you can draw some inspiration from it:

https://github.com/Kwpolska/python-project-template/blob/master/%7B%7Bcookiecutter.repo_name%7D%7D/release

@alichtman
Copy link
Contributor

alichtman commented Nov 10, 2019

I realize that isn't the strict definition of SSOT, so maybe that's where our miscommunication lies.

Ah, ok. Makes more sense now. I do think it's important for there to be a way to bump the version manually for arbitrary test installs.

@C0DK C0DK mentioned this pull request Nov 11, 2019
7 tasks
@maebert maebert force-pushed the maebert-fast-import branch from 9db17eb to 80b376b Compare November 11, 2019 19:48
@maebert
Copy link
Contributor Author

maebert commented Nov 11, 2019

One thought on VERSION.txt vs __version__.py:

In general, I would prefer the latter, however in this case, the contents are written as part of the production pipeline; if we put that into a file that gets imported, that means we're writing potentially arbitrary code into the release - at best that might break the release (eg. if we use a git tag that includes a ", at worst it could be an attack vector for code injection -- either way it's hard to spot because it's not visible in the source. Hence, as long as we're dealing with the current flow, I strongly support reading from a text file, not executing a file that we don't have source control over.

On pkg_resouces

Right now on master, we don't actually read the version from pyproject.toml (that doesn't even get included in the wheel), instead we ask the python package manager to find the installed distribution of jrnl for us and get the metadata from it. That also means that if we're developing in an environment where we don't have an installation of jrnl and running it with python -m jrnl instead, the whole thing fails. I know that it was me who put it in there, and I honestly didn't know better then :)

Suggestion

@wren, what about a workflow where we add a makefile command that both updates the version in pyproject.toml and a __version__.py (since it's now part of the source), and automatically creates a git tag? In the pipeline, we can test that those three versions are congruent too to avoid any accidental version changes.

Finally...

Oh! Also, for the version output update (#697): [...] Can you leave that out of this PR for now?

100% agree, done.

@wren
Copy link
Member

wren commented Nov 13, 2019

Okay, it sounds like we need to have more talks about this. How about we take it to the original ticket (#715)?

@wren
Copy link
Member

wren commented Nov 30, 2019

@maebert Did you have a chance to look at my comments in (#715)? I'm good with keeping VERSION.txt if you think still it's more secure given the checks I mentioned. Just let me know what you think and we can get this merged in.

@wren
Copy link
Member

wren commented Dec 9, 2019

Hi! This is good. I'll get this tested on and merged in on Tuesday. Thank you!

@wren wren changed the base branch from master to develop December 11, 2019 04:39
Copy link
Member

@wren wren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thank you!

@wren wren merged commit f5e47f6 into jrnl-org:develop Dec 11, 2019
@wren wren added the enhancement New feature or request label Jan 10, 2020
@lock
Copy link

lock bot commented May 21, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot added the 🔒 Outdated label May 21, 2020
@lock lock bot locked as resolved and limited conversation to collaborators May 21, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request 🔒 Outdated
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants