SRW - Discrepancies in the animated flux report on different servers #335
Comments
The calculation can be found here - https://alpha.sirepo.com/srw#/source/nJvy1IIy. |
This is weird. I tried to run it with and without |
$ mpiexec -version
mpiexec (OpenRTE) 1.8.3
Report bugs to http://www.open-mpi.org/community/help/ |
On mine, mpirun (Open MPI) 1.8.3, which should be the same as alpha.
|
Did you try to run it with mpi in Sirepo on your machine this morning? |
Here is test data to check on different servers: |
With HYDRA mpiexec I got the same wrong results.
|
|
Tried it on different OS and with different mpi installations. Conclusion - the calculation is always wrong when |
Confirmed: I'm running with mpirun with 4 cores, and it is showing ~8e15. This morning I was running with one core. |
I created an issue in mrakitin/SRW to debug this issue - mrakitin/SRW#6. |
I debugged the issue and found out that incorrect values appear when @robnagler, what do you think? |
Fixed in 813df49. Now 5
@robnagler, it looks strange that
import srwl_bl
v = srwl_bl.srwl_uti_parse_options(varParam, use_sys_argv=False)
source_type, mag = srwl_bl.setup_source(v)
v.wm_na = v.sm_na = 1
# Number of "iterations" per save is best set to num processes
v.wm_ns = v.sm_ns = 24
op = set_optics()
srwl_bl.SRWLBeamline(_name=v.name).calc_all(v, op)
The following condition should be False on these systems in https://github.com/radiasoft/sirepo/blob/master/sirepo/pkcli/srw.py#L57:
if pkconfig.channel_in('dev'):
    p['particles_per_core'] = 1
Are we on the
Anyway, according to the information from Oleg, |
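For readers following the channel discussion, here is a minimal sketch of how a channel check like the one quoted above could change a generated parameter. It does not use pykern. `channel_in` below is a hypothetical stand-in for `pkconfig.channel_in`, the fallback to 'dev' when the variable is unset is an assumption, and the default of 5 particles per core is only an illustrative value.

```python
import os


def channel_in(channels, environ=None):
    """Hypothetical stand-in for pkconfig.channel_in (not the real implementation)."""
    env = os.environ if environ is None else environ
    # Assumption: an unset PYKERN_PKCONFIG_CHANNEL is treated as 'dev'
    current = env.get('PYKERN_PKCONFIG_CHANNEL', 'dev')
    return current in channels


def build_params(environ=None):
    """Return run parameters, overriding particles_per_core on the dev channel."""
    p = {'particles_per_core': 5}  # illustrative default, not taken from Sirepo
    if channel_in(('dev',), environ=environ):
        p['particles_per_core'] = 1
    return p
```

With this sketch, a server whose environment reports 'alpha' keeps the default, while one that falls back to 'dev' gets the override. That would be one way for two installations with the same inputs to generate different scripts.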
I think it should be "dev" if you are starting the server using "service".
https://github.com/radiasoft/devops/blob/master/debian/sirepo/root/etc/default/bivio-service#L2
Rob
|
Thanks Rob, I see about "service". However on cpu-001 I have the same configuration as on alpha:
However the generated file contained |
I think I fixed this in radiasoft/devops@7106b33. Download and install grep PYKERN_PKCONFIG_CHANNEL /var/lib/sirepo/init.log |
Restart both celery and sirepo |
Done, thanks. |
Strange, the number of macroelectrons is not the limitation for the calculation:
fluxAnimation# cat run.log
../../../../../../../../home/vagrant/.pyenv/versions/2.7.10/lib/python2.7/site-packages/pykern-20160902.164834-py2.7.egg/pykern/pksubprocess.py:63:check_call_with_signals 1078: started: ['mpiexec', '--bind-to', 'none', '-n', '24', '/home/vagrant/.pyenv/versions/2.7.10/bin/python', 'mpi_run.py']
../../../../../../../../home/vagrant/.pyenv/versions/2.7.10/lib/python2.7/site-packages/pykern-20160902.164834-py2.7.egg/pykern/pksubprocess.py:72:check_call_with_signals 1078: exception: ['mpiexec', '--bind-to', 'none', '-n', '24', '/home/vagrant/.pyenv/versions/2.7.10/bin/python', 'mpi_run.py'] RuntimeError: error exit(1)
File "/home/vagrant/.pyenv/versions/2.7.10/bin/sirepo", line 9, in <module>
load_entry_point('sirepo==20160907.133350', 'console_scripts', 'sirepo')()
File "/home/vagrant/.pyenv/versions/2.7.10/lib/python2.7/site-packages/sirepo-20160907.133350-py2.7.egg/sirepo/sirepo_console.py", line 18, in main
return pkcli.main('sirepo')
File "/home/vagrant/.pyenv/versions/2.7.10/lib/python2.7/site-packages/pykern-20160902.164834-py2.7.egg/pykern/pkcli/__init__.py", line 131, in main
argh.dispatch(parser, argv=argv)
File "/home/vagrant/.pyenv/versions/2.7.10/lib/python2.7/site-packages/argh/dispatching.py", line 174, in dispatch
for line in lines:
File "/home/vagrant/.pyenv/versions/2.7.10/lib/python2.7/site-packages/argh/dispatching.py", line 277, in _execute_command
for line in result:
File "/home/vagrant/.pyenv/versions/2.7.10/lib/python2.7/site-packages/argh/dispatching.py", line 260, in _call
result = function(*positional, **keywords)
File "/home/vagrant/.pyenv/versions/2.7.10/lib/python2.7/site-packages/sirepo-20160907.133350-py2.7.egg/sirepo/pkcli/srw.py", line 71, in run_background
mpi.run_script(script)
File "/home/vagrant/.pyenv/versions/2.7.10/lib/python2.7/site-packages/sirepo-20160907.133350-py2.7.egg/sirepo/mpi.py", line 67, in run_script
return run_program([sys.executable or 'python', fn])
File "/home/vagrant/.pyenv/versions/2.7.10/lib/python2.7/site-packages/sirepo-20160907.133350-py2.7.egg/sirepo/mpi.py", line 41, in run_program
env=env,
File "/home/vagrant/.pyenv/versions/2.7.10/lib/python2.7/site-packages/pykern-20160902.164834-py2.7.egg/pykern/pksubprocess.py", line 67, in check_call_with_signals
raise RuntimeError('error exit({})'.format(rc))
Traceback (most recent call last):
File "/home/vagrant/.pyenv/versions/2.7.10/bin/sirepo", line 9, in <module>
load_entry_point('sirepo==20160907.133350', 'console_scripts', 'sirepo')()
File "/home/vagrant/.pyenv/versions/2.7.10/lib/python2.7/site-packages/sirepo-20160907.133350-py2.7.egg/sirepo/sirepo_console.py", line 18, in main
return pkcli.main('sirepo')
File "/home/vagrant/.pyenv/versions/2.7.10/lib/python2.7/site-packages/pykern-20160902.164834-py2.7.egg/pykern/pkcli/__init__.py", line 131, in main
argh.dispatch(parser, argv=argv)
File "/home/vagrant/.pyenv/versions/2.7.10/lib/python2.7/site-packages/argh/dispatching.py", line 174, in dispatch
for line in lines:
File "/home/vagrant/.pyenv/versions/2.7.10/lib/python2.7/site-packages/argh/dispatching.py", line 277, in _execute_command
for line in result:
File "/home/vagrant/.pyenv/versions/2.7.10/lib/python2.7/site-packages/argh/dispatching.py", line 260, in _call
result = function(*positional, **keywords)
File "/home/vagrant/.pyenv/versions/2.7.10/lib/python2.7/site-packages/sirepo-20160907.133350-py2.7.egg/sirepo/pkcli/srw.py", line 71, in run_background
mpi.run_script(script)
File "/home/vagrant/.pyenv/versions/2.7.10/lib/python2.7/site-packages/sirepo-20160907.133350-py2.7.egg/sirepo/mpi.py", line 67, in run_script
return run_program([sys.executable or 'python', fn])
File "/home/vagrant/.pyenv/versions/2.7.10/lib/python2.7/site-packages/sirepo-20160907.133350-py2.7.egg/sirepo/mpi.py", line 41, in run_program
env=env,
File "/home/vagrant/.pyenv/versions/2.7.10/lib/python2.7/site-packages/pykern-20160902.164834-py2.7.egg/pykern/pksubprocess.py", line 67, in check_call_with_signals
raise RuntimeError('error exit({})'.format(rc))
RuntimeError: error exit(1) |
It wasn't reported correctly in the |
When I try to perform a calculation within the animated flux report on different installations of Sirepo (alpha vs. my dev installation) with the same input JSON file and related magn_meas_esm.zip, I get different results on different servers:
Note that on alpha the maximum flux value is ~8e15 [ph/s/.1%bw], but the correct result should be ~8e14 [ph/s/.1%bw].
I checked the folders with calculations from both servers and didn't find any differences in the .py files; all the parameters look the same except:
v.wm_ns = v.sm_ns = 2 on alpha;
v.wm_ns = v.sm_ns = 1 on localhost.
Here are the corresponding archives:
It's hard to reproduce the situation. Originally @ochubar noticed this issue on our internal installation at BNL (nsls2expdev1 server), then I reproduced it on alpha. During our meeting @moellep reproduced it on his dev installation, but @robnagler didn't see the wrong result with exactly the same inputs. We need to find the cause of this strange bug.
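When comparing the generated .py files from two servers by hand, a small helper can list only the `v.<name>` assignments that differ. The following is a rough sketch, not part of Sirepo or SRW. The function names are hypothetical, and it only handles simple one-line assignments of the form `v.a = v.b = value`.

```python
import re

# One or more chained "v.<name> =" targets followed by the assigned value.
ASSIGN = re.compile(r'^\s*((?:v\.\w+\s*=(?!=)\s*)+)(.+?)\s*$')


def parse_assignments(text):
    """Map each v.<name> to the source text of its assigned value."""
    params = {}
    for line in text.splitlines():
        m = ASSIGN.match(line)
        if not m:
            continue
        for name in re.findall(r'v\.(\w+)', m.group(1)):
            params[name] = m.group(2)
    return params


def diff_params(a_text, b_text):
    """Return {name: (value_in_a, value_in_b)} for parameters that differ."""
    a, b = parse_assignments(a_text), parse_assignments(b_text)
    return {
        k: (a.get(k), b.get(k))
        for k in sorted(set(a) | set(b))
        if a.get(k) != b.get(k)
    }


alpha_script = "v.wm_na = v.sm_na = 1\nv.wm_ns = v.sm_ns = 2\n"
local_script = "v.wm_na = v.sm_na = 1\nv.wm_ns = v.sm_ns = 1\n"
print(diff_params(alpha_script, local_script))
```

For the two fragments above, only `wm_ns` and `sm_ns` are reported, matching the difference noted in the issue description.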