infra: spring software updates #1222
Comments
Sounds like a great idea to me.
In my opinion the biggest source of bitrot is that we don't run our Ansible scripts on the machines regularly, so we can't trust that they'll work on the machines (and then we just update things manually because we don't have time, etc.). My ideal update scenario is a weekly job that runs the scripts against all the machines. If that is how we want to progress, the first step is to document the list of machines we can't use Ansible on (@rvagg has more info here), either because the scripts haven't been implemented/ported yet, or because updating the machines would break custom things we've done to them. |
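A rough sketch of what such a weekly job could invoke, assuming the "can't use Ansible" machines are kept in a documented exclusion list. The inventory path, playbook path and host names below are hypothetical placeholders, not paths from this repo; only the `ansible-playbook` flags `-i`, `--limit` and `--check` and the `!host` exclusion pattern are standard Ansible.

```python
import shlex


def build_weekly_command(inventory, playbook, excluded_hosts, dry_run=True):
    """Return the ansible-playbook argv for one scheduled run.

    excluded_hosts is the documented list of machines Ansible must not touch.
    """
    # Ansible host patterns accept "!name" to exclude a host from the match.
    limit = ",".join(["all"] + ["!" + host for host in sorted(excluded_hosts)])
    argv = ["ansible-playbook", "-i", inventory, "--limit", limit]
    if dry_run:
        # --check reports what would change without applying anything.
        argv.append("--check")
    argv.append(playbook)
    return argv


if __name__ == "__main__":
    cmd = build_weekly_command(
        inventory="inventory.yml",          # hypothetical path
        playbook="playbooks/example.yml",   # hypothetical path
        excluded_hosts={"example-host-1"},  # hypothetical host name
    )
    print(shlex.join(cmd))
```

Starting with `dry_run=True` would let the job surface drift for a few weeks before anyone trusts it to apply changes.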
As per today's meeting, "Error fetching remote repo" errors seem to be fixed by upgrading The other error relates to git but the stacktrace suggests it's more to do with the remote call mechanism of Jenkins. We've had these errors for a long time and they seem to have been solved variously by: restarting jenkins, restarting machines, clearing workspaces, upgrading slave.jar and upgrading java. As I've already mentioned, I can't solve the error on one of the two smartos16 machines so I took it offline this week: https://ci.nodejs.org/computer/test-joyent-smartos16-x64-2/, the only thing I haven't tried is changing the Java version that is used, but I'm not sure I can even do that on SmartOS. |
Oh and re updating slave.jar, I'd be happy to see that done as part of the init/upstart/systemd scripting. It used to be built in to start.sh on the Raspberry Pis and a bunch of other machines but we've stripped that out of most builds. That requires a bit of work of course but it wouldn't be hard to deploy. Btw, there is also ansible/playbooks/jenkins/worker/upgrade-jar.yml that you could try using. I haven't used it myself but it's worth playing with because it could be run across most of our infra. |
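For illustration, the "refresh slave.jar at service start" idea could look roughly like the sketch below. It is not the contents of start.sh or upgrade-jar.yml. It assumes the Jenkins master serves the agent jar at `/jnlpJars/slave.jar`, and the function names are made up for this example.

```python
import hashlib
from pathlib import Path
from urllib.request import urlopen


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def fetch_jar(jenkins_url: str) -> bytes:
    # Assumption: the master exposes the agent jar under /jnlpJars/slave.jar.
    with urlopen(jenkins_url.rstrip("/") + "/jnlpJars/slave.jar") as resp:
        return resp.read()


def refresh_jar(local_jar: Path, remote_bytes: bytes) -> bool:
    """Replace local_jar if its content differs; return True when replaced."""
    if local_jar.exists() and sha256(local_jar.read_bytes()) == sha256(remote_bytes):
        return False
    # Write next to the target first so a failed write never leaves a
    # half-written jar in place, then swap it in.
    tmp = local_jar.with_suffix(".tmp")
    tmp.write_bytes(remote_bytes)
    tmp.replace(local_jar)
    return True
```

An init script would call something like `refresh_jar(Path("slave.jar"), fetch_jar(url))` before launching the agent, so a restart always picks up the jar the master currently ships.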
https://ci.nodejs.org/computer/test-joyent-smartos16-x64-2/ fixed by restarting Another assumption I had as to the cause of the failures was related to the owner of |
+1, this has been part of the Windows script for a few years now and works great. The only drawback is that this is not straightforward for ci-release because it is locked, but this shouldn't stop us for test ci. |
@joaocgreis I would recommend using https://adoptopenjdk.net/ java binaries if we plan to upgrade all of our machines. There is a nice API (https://api.adoptopenjdk.net/README) detailing how it can be used. I'd be happy to work through the playbooks and switch out the java sections to use this if everyone is happy with that? |
@gdams we started using Oracle Java at some point because it seemed to have better performance than the Open JDK that was installed in the machines. This was noticeable in the Jenkins server that is frequently under heavy load, and in the Raspberry Pis. However, this was only one of the things we did at the time and I'm not completely sure it was the cause of the improvement. If you feel sure about Open JDK performance, I wouldn't object to try it again (provided @rvagg is ok with that as well). To be clear, when I mentioned updating |
I wouldn't expect it to have different performance characteristics since it's fundamentally the same code. If there are scenarios in which the performance isn't the same, that would be useful for adoptopenjdk to be aware of, so I would be in favour of giving it another shot. |
The java code is mostly the same, but there are differences in the VM performance. Check out [1] for some more information about OpenJDK with OpenJ9, including some performance advantages that come with the OpenJ9 VM. |
Yes thanks @keithc-ca! It's worth pointing out that you can also fetch OpenJ9 binaries from AdoptOpenJDK! https://adoptopenjdk.net/releases.html?variant=openjdk8-openj9 |
Yes, openjdk+openj9 will have different performance characteristics, as @keithc-ca says, but the openjdk+hotspot builds from adoptopenjdk should be pretty much the same as Oracle's current ones |
I have no objections to switching to openjdk, I don't know if it buys us anything here but being able to get on to Java 9 might be helpful I suppose? |
What's the status here? Should this stay open or is this resolved? |
This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made. |
I'm not sure how to coordinate this, but IMHO we should do some systematic updates to the software on our infra. I'm referring to peripheral software such as Java, slave.jar and git (not OS or compilers).
Besides minimizing potential bit-rot, and making us feel better in general, I have an intuition that it is already causing failures and blocking process improvements. For example:
Jenkins java.io.IOException: remote file operation failed #173 (comment): Jenkins failing to communicate with workers, which might be related to a stale slave.jar:
on failing machines there was an old agent running
after restart it bumps to:
"Log" shows this warning:
Old git (I mean 1.8 when the latest is 2.17) doesn't handle sparse checkouts, which degrades the overall performance of the cluster:
My estimation is that on some platforms we also have an outdated sshd with potential security issues (also, we should disable plain-text password login where possible, RE: aix/ drive overflowing #866).
Since I now have time for such tasks, I'm seeking feedback / pitfalls / warnings, and also ideas on how to coordinate such efforts (RE @gibfahn and the Java8 project).
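As a starting point for the audit, a minimal sketch of how a machine could be flagged as running an outdated git. The helper names are made up for this example, and the `(2, 17, 0)` threshold is only an illustrative assumption based on the "latest is 2.17" remark above, not an agreed minimum.

```python
import re
import subprocess


def parse_git_version(output: str) -> tuple:
    """Extract (major, minor, patch) from `git --version` output."""
    match = re.search(r"git version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError(f"unrecognized output: {output!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def is_outdated(output: str, minimum: tuple) -> bool:
    # Tuples compare element by element, so (1, 8, 3) < (2, 17, 0).
    return parse_git_version(output) < minimum


def local_git_version_output() -> str:
    """Run `git --version` on the current machine and return its stdout."""
    result = subprocess.run(
        ["git", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Running `is_outdated(local_git_version_output(), (2, 17, 0))` on each worker would give a quick list of the machines that need attention. The same pattern could be reused for Java or sshd by swapping the command and the regular expression.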