pyscf: 1.7.6.post1 -> 2.0.1 #144253
Conversation
As written, I've enabled the test suite. However, it is huge and runs for at least 2 h on my 10 cores, and some individual tests are failing and need to be disabled. I guess it might be better to disable the full test suite? What do you think?
Let's disable the test suite for now. It is too big and not reliable. We still have a test in NixOS-QChem to catch basic errors.
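A minimal sketch of what that could look like in the derivation, assuming the usual buildPythonPackage attributes (the import list below is illustrative, not the actual one used in this PR):

```nix
buildPythonPackage rec {
  pname = "pyscf";
  version = "2.0.1";
  # ... src, nativeBuildInputs (cmake), propagatedBuildInputs, etc.

  # Skip the multi-hour pytest run; NixOS-QChem and downstream users
  # still exercise the package and catch basic errors later.
  doCheck = false;

  # Cheap sanity check so gross packaging problems (missing shared
  # libraries, broken install layout) are still caught at build time.
  pythonImportsCheck = [ "pyscf" "pyscf.gto" "pyscf.scf" ];
}
```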
Even PySCF mainline doesn't run all tests that they include in their source: https://github.com/pyscf/pyscf/blob/master/.github/workflows/run_tests.sh Here are the tests that I excluded, feel free to use them as a starting point. I think the tests typically ran for about 20-30 mins with all of these disabled? https://github.com/drewrisinger/nur-packages/blob/77d8267ff1b432c76ae90fc57db9e9787640137f/pkgs/python-modules/pyscf/default.nix#L94-L255
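For reference, with pytestCheckHook such exclusions go into disabledTests / disabledTestPaths, roughly like this (the entries below are placeholders; the real list would come from the file linked above):

```nix
{
  checkInputs = [ pytestCheckHook ];

  # Placeholder entries -- the actual exclusions are in the linked default.nix.
  disabledTests = [
    "test_ccsd_t"            # hypothetical long-running test
  ];
  disabledTestPaths = [
    "pyscf/pbc/cc/test"      # hypothetical slow periodic-boundary tests
  ];
}
```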
General comments about the structure of the cppe derivation, and a few details on the hash types & build variables
That might be true, but the issue is that any packaging errors should ideally be caught before merge, not after. Do you expect a potential reviewer of a future update to run the external NixOS-QChem tests?
I agree with you, but an unreliable test suite is worse than (or at best as good as) having no test at all. I would not expect a reviewer to use an external test. I look at the NixOS-QChem tests as a fallback option, which catches errors at some point later.
Force-pushed from b7126e3 to f5d9be5 (compare)
Thank you for the valuable feedback @drewrisinger, this solved some problems. I am doing final tuning on the test suite to get it running properly, and then it should be done.
FWIW, release 2.0.1 was posted a few hours ago. Might want to roll that into this PR if possible, since it's not finalized? https://github.com/pyscf/pyscf/releases/tag/v2.0.1
A few minor suggestions
I am afraid I cannot reproduce any of those 😐 I have it running now on all three of my different CPUs (i7-7700K, Xeon W2155 and the old Xeon E5420). Unfortunately I do not have access to others, such as the AMD Epyc machines of @markuskowa. In general, numerical effects this large are worrying, but I don't see what we could change about it except disabling those tests. I am forcing the tests to always run on a single core.

Regarding your failing NWChem parser test: I suppose this might actually depend on how many other files you have open during your build? I don't think we can change this. Upstream apparently builds with netlib BLAS in the CI; I mean, we could force that as well.
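For what it's worth, pinning the tests to a single core presumably boils down to setting the usual threading environment variables before pytest runs; a sketch (the exact variables depend on how the BLAS backend and libcint are threaded):

```nix
{
  preCheck = ''
    # Keep the test run single-threaded: pyscf/libcint parallelise via
    # OpenMP, and OpenBLAS otherwise spawns its own worker threads.
    export OMP_NUM_THREADS=1
    export OPENBLAS_NUM_THREADS=1
  '';
}
```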
@ofborg build python3.pkgs.pyscf |
ofBorg built the current version just fine without any errors. 🙂 |
Ok, I'm willing to chalk the errors up to a local build issue; I'd never seen them in my previous builds of my NUR package version.
libcint: formatting and features
libcint: platforms
libxc: formatting
libxc: platforms
fields: expose package
fields: formatting
fields: platforms
fields: platforms
fields: remove redundant platform
polarizationsolver: expose
polarizationsolver: formatting
polarizationsolver: platforms
polarizationsolver: platforms
polarizationsolver: license
polarizationsolver: remove redundant platform
cppe: move pytestCheckHook to checkInputs
cppe: hash
cppe: license and hash
cppe: formatting
python3.pkgs.cppe: more tests
cppe: formatting
cppe: formatting
cppe: platforms
cppe: platforms
pyscf: hash
pyscf: limit test suite to single core
pyscf: adapting test suite
pyscf: fix pythonpath for tests
pyscf: formatting
pyscf: platforms
remove log
pyscf: enable uadc module
pyscf: platforms
pyscf: formatting
pyscf: disable unstable N3 CI test
pyscf: formatting
pyscf: increase ulimit
pyscf: ulimit files
pyscf: remove ulimit -n
I am still getting random errors from the test suite:
Note that I ran this on a machine that had over 100 GB of free RAM!
We need to come up with a solution for the test suite.
I can't reproduce the errors anymore. Let's merge it and give it a try.
Successfully created backport PR #147740.
Hydra cannot build it at all (pyscf shows multiple test failures):
Hm, this is similar to this problem. Neither my machines nor ofBorg had problems with those. The number of open files unfortunately cannot be increased from within the job; it is a kernel setting. Interestingly, they all fail in the NWChem parser. I will take a look and see if I can spot something there.
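For context, the soft open-files limit can only be raised up to whatever hard limit the build daemon grants; the hard limit itself is the external setting mentioned above. The ulimit commits in the list above presumably experimented with something along these lines (shown only as an illustration, not necessarily what was committed):

```nix
{
  preCheck = ''
    # Raise the soft limit for open file descriptors as far as the hard
    # limit of the build sandbox allows; the hard limit itself cannot be
    # changed from inside the build.
    ulimit -n "$(ulimit -Hn)" || true
  '';
}
```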
I've tried to figure out what's going on in the code there. The NWChem basis parser opens a file for each atom (or at least each element) for each basis set, but it does so safely in a bracket pattern and closes them immediately. So there is no apparent reason why the NWChem basis set parser in particular (with which one of the Hydra jobs fails) should end up with too many open files. I have no idea why this is not working on Hydra for this build.

The other Hydra job fails with numerical issues (huge ones), which looks very much like some faulty behaviour in the BLAS/LAPACK-numpy interaction or something like that. I still can't reproduce any of these errors on any of my Intel machines. Hydra has some AMD Epyc instances, correct? I've also built with MKL as the BLAS and LAPACK provider for all packages (global overlay) and also cannot reproduce any of the errors with MKL. If MKL also does not solve the problem, I will open an issue upstream and try to get an idea from the developers.
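The global MKL build mentioned here presumably uses the standard nixpkgs BLAS/LAPACK provider switch; a sketch of such an overlay (see the nixpkgs manual section on switching BLAS/LAPACK implementations):

```nix
# overlay.nix -- make MKL the default BLAS/LAPACK provider, so that
# numpy, scipy and pyscf all link against it.
self: super: {
  blas = super.blas.override { blasProvider = self.mkl; };
  lapack = super.lapack.override { lapackProvider = self.mkl; };
}
```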
I can give it a try with MKL, but my suspicion is that this will not solve all the problems. It also fails on aarch64 (which we could deactivate?), where MKL does not work. I think the problem is somewhere else:
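If aarch64 were deactivated, the usual mechanism would be to narrow meta.platforms, or to mark the platform as broken; a hypothetical sketch:

```nix
{
  meta = {
    # Hypothetical restriction: only advertise platforms where the test
    # suite is known to pass; aarch64-linux would be dropped here.
    platforms = [ "x86_64-linux" ];
    # Alternative: keep aarch64 in the platform list but mark it broken:
    # broken = stdenv.isAarch64;
  };
}
```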
FYI: building pyscf with MKL leads to the following errors. Note that they fail only just above the tests' thresholds.
Motivation for this change
This updates PySCF to its latest release. The build system has changed somewhat and now uses a lot of CMake. I've enabled the tests, inspired by an older commit of @drewrisinger.
The CPPE library for polarisable embedding has been added, which is an optional dependency of PySCF. I am not entirely sure if I did the Python binding stuff correctly there; CPPE is at least not in the PYTHONPATH from within a nix-shell.

Things done
- sandbox = true set in nix.conf? (See Nix manual)
- nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
- ./result/bin/