mpirun -np 2 python setup.py test --no-pysparse hangs on bunter #275

Closed
wd15 opened this issue Sep 19, 2014 · 15 comments

Comments

@wd15
Contributor

wd15 commented Sep 19, 2014

It hangs at examples.chemotaxis.input2D.

  • When the chemotaxis tests are removed it runs through and all tests pass.
  • When the chemotaxis tests are run alone they pass.
  • When examples/test.py is run (including the chemotaxis tests) it works.
  • When the complete test suite is run with some of the module tests excluded, it works. The module tests have the following exclusions:
def _suite():
    return _LateImportTestSuite(testModuleNames = (
        'solvers.test',
        'models.test',
##        'terms.test',
##        'tools.test',
##        'matrices.test',
##        'meshes.test',
##        'variables.test',
##        'viewers.test',
##        'boundaryConditions.test',
        ), base = __name__)

Imported from trac ticket #371, created by wd15 on 11-01-2011 at 11:31, last modified: 06-12-2013 at 14:14

@wd15
Contributor Author

wd15 commented Sep 19, 2014

  • Running with
def _suite():
    return _LateImportTestSuite(testModuleNames = (
        'solvers.test',
        'models.test',
        'terms.test',
        'tools.test',
        'matrices.test',
##        'meshes.test',
##        'variables.test',
##        'viewers.test',
##        'boundaryConditions.test',
        ), base = __name__)

passes.

Trac comment by wd15 on 11-01-2011 at 11:42

@wd15
Contributor Author

wd15 commented Sep 19, 2014

  • Running with
def _suite():
    return _LateImportTestSuite(testModuleNames = (
        'solvers.test',
        'models.test',
        'terms.test',
        'tools.test',
        'matrices.test',
        'meshes.test',
        'variables.test',
##        'viewers.test',
##        'boundaryConditions.test',
        ), base = __name__)

passes.

Trac comment by wd15 on 11-01-2011 at 11:45

@wd15
Contributor Author

wd15 commented Sep 19, 2014

  • Running with
def _suite():
    return _LateImportTestSuite(testModuleNames = (
        'solvers.test',
        'models.test',
        'terms.test',
        'tools.test',
        'matrices.test',
        'meshes.test',
        'variables.test',
        'viewers.test',
##        'boundaryConditions.test',
        ), base = __name__)

hangs!

Trac comment by wd15 on 11-01-2011 at 11:53
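
Telling "hangs" apart from "passes" by watching the terminal gets tedious across many configurations. The following is a minimal sketch, not part of the FiPy test machinery, of classifying a run by timeout. The function name `classify_run` and the timeout value are assumptions for illustration. Note that killing the launcher on timeout may not clean up every MPI rank on every platform.

```python
import subprocess
import sys


def classify_run(cmd, timeout_s):
    """Run cmd and return 'pass', 'fail', or 'hang'.

    'hang' only means the command did not finish within timeout_s seconds;
    a slow but healthy run is indistinguishable, so pick a generous limit.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # subprocess.run kills the child on expiry
        )
    except subprocess.TimeoutExpired:
        return "hang"
    return "pass" if result.returncode == 0 else "fail"


# Command taken from the issue title; the timeout is a guess.
TEST_CMD = ["mpirun", "-np", "2", sys.executable, "setup.py", "test", "--no-pysparse"]
```

Calling `classify_run(TEST_CMD, 3600)` from the source directory would then report one of the three outcomes for the configuration currently in `_suite()`.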

@wd15
Contributor Author

wd15 commented Sep 19, 2014

Running with

def _suite():
    return _LateImportTestSuite(testModuleNames = (
##        'solvers.test',
#         'models.test',
#         'terms.test',
#         'tools.test',
#         'matrices.test',
#         'meshes.test',
#         'variables.test',
        'viewers.test',
##        'boundaryConditions.test',
        ), base = __name__)

passes :-(

Trac comment by wd15 on 11-01-2011 at 11:56
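
The manual bisection in the comments above could be automated roughly as follows. This is a sketch under stated assumptions: `smallest_hanging_subsets` and `fake_hangs` are hypothetical names, and `fake_hangs` is a stand-in predicate, not a measured result. A real predicate would rewrite the module list and launch the suite under `mpirun` with a timeout.

```python
from itertools import combinations

# Module names as they appear in _suite().
MODULES = (
    'solvers.test', 'models.test', 'terms.test', 'tools.test',
    'matrices.test', 'meshes.test', 'variables.test', 'viewers.test',
    'boundaryConditions.test',
)


def smallest_hanging_subsets(modules, hangs):
    """Return every smallest combination of modules for which hangs() is true.

    hangs is a callable taking a tuple of module names. The search is
    exhaustive by size, so it can need up to 2**len(modules) - 1 runs.
    """
    for size in range(1, len(modules) + 1):
        found = [combo for combo in combinations(modules, size) if hangs(combo)]
        if found:
            return found
    return []


def fake_hangs(combo):
    # Hypothetical stand-in: pretend the hang needs 'viewers.test'
    # plus at least two other modules.
    return 'viewers.test' in combo and len(combo) >= 3
```

Because the hang in this ticket was not reproducible from run to run, a real predicate would probably need to repeat each configuration several times before declaring it clean.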

@wd15
Contributor Author

wd15 commented Sep 19, 2014

I closed this out with r4818. This isn't the most satisfactory fix. I settled for it without fully understanding the issue because of the very strange patterns:

  • Requiring three module tests to be included to induce hanging.
  • Requiring 28 steps to eventually hang.
  • Changing solver to LU also fixed the problem.
  • Not being able to use the debugger in parallel.

Actually, I might not close this out and instead try to see where it's failing.

Trac comment by wd15 on 11-01-2011 at 16:22

@wd15
Contributor Author

wd15 commented Sep 19, 2014

Closing it.

Trac comment by wd15 on 11-01-2011 at 16:23

@wd15
Contributor Author

wd15 commented Sep 19, 2014

I've spent way too long trying to set up a debug environment, see blog:DebugEnvi. I'm going to use print statements now to try to debug.

Trac comment by wd15 on 11-08-2011 at 16:54
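
For print-statement debugging of a parallel run, it helps if each line identifies its process and is flushed immediately, since buffered output can be lost when a process hangs. The sketch below is illustrative only. The helper names are hypothetical, and reading the rank from `OMPI_COMM_WORLD_RANK` assumes the job was launched by Open MPI's `mpirun`; with another MPI implementation the rank would have to be passed in explicitly.

```python
import os
import sys


def mpi_rank_from_env(default=0):
    """Best-effort rank lookup; assumes Open MPI, which sets this variable."""
    return int(os.environ.get("OMPI_COMM_WORLD_RANK", default))


def debug_print(message, step=None, rank=None, stream=None):
    """Write one rank-tagged line and flush it right away."""
    rank = mpi_rank_from_env() if rank is None else rank
    stream = sys.stderr if stream is None else stream
    if step is None:
        prefix = "[rank %d]" % rank
    else:
        prefix = "[rank %d step %d]" % (rank, step)
    stream.write("%s %s\n" % (prefix, message))
    stream.flush()  # make sure the line is out before any later hang
```

Placing a call such as `debug_print("before solve", step=step)` on either side of a suspect call shows, per rank, the last point each process reached.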

@wd15
Contributor Author

wd15 commented Sep 19, 2014

I'm trying to get a stack trace, but for some reason gdb just hangs when I call pystack. I know I've attached to the process correctly because the tests stop when the process is attached and resume after entering "c". It hangs at a different place, though: 39 steps rather than 28. That may be because of the print statements; adding debug print statements changed the number of steps before the hang.

Trac comment by wd15 on 11-08-2011 at 16:54
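
When attaching gdb is not workable, one alternative on a modern interpreter is Python's standard `faulthandler` module, which can dump the Python stack of a process that is still running after a deadline. This is a hedged sketch, not what was done for this ticket: `faulthandler` is in the standard library only from Python 3.3, and the wrapper names here are hypothetical.

```python
import faulthandler


def arm_hang_dump(log_file, seconds):
    """Dump all thread stacks to log_file if still running after `seconds`.

    log_file must be a real file with a file descriptor and must stay open
    until the dump has happened or has been cancelled.
    """
    faulthandler.dump_traceback_later(
        seconds, repeat=False, file=log_file, exit=False
    )


def disarm_hang_dump():
    """Cancel a pending dump, for example after the run finished normally."""
    faulthandler.cancel_dump_traceback_later()
```

Each rank would open its own log file, call `arm_hang_dump` before the time-stepping loop, and call `disarm_hang_dump` afterwards. If a rank hangs, its file shows the Python frames it was in at the deadline, although a hang inside compiled solver code only appears as the Python call that entered it.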

@wd15
Contributor Author

wd15 commented Sep 19, 2014

Also the hang isn't occurring in the same place as Jon's hang in issue #264. I put print statements around the map calls in trilinosMatrix.py and it doesn't hang there.

Trac comment by wd15 on 11-08-2011 at 16:54

@wd15
Contributor Author

wd15 commented Sep 19, 2014

Crazy, but this no longer hangs on bunter.

Trac comment by wd15 on 11-14-2011 at 10:46

@wd15
Contributor Author

wd15 commented Sep 19, 2014

This no longer occurs on bunter, so I'm closing it out. Another ticket is still open for a similar issue on the Macs: issue #288.

Trac comment by wd15 on 11-16-2011 at 17:06

@wd15
Contributor Author

wd15 commented Sep 19, 2014

I take that back, it still hangs :-(

Trac comment by wd15 on 11-16-2011 at 17:22

@guyer
Member

guyer commented Sep 19, 2014

Replying to wd15:

This no longer occurs on bunter, so I'm closing it out. Another ticket is still open for a similar issue on the Macs: issue #288.

That should be issue #264.

Trac comment by guyer on 12-05-2011 at 13:16

@guyer
Member

guyer commented Sep 19, 2014

The details of the deadlock seem very inconsistent, but this is clearly the same issue as the one reported in issue #264.

Trac comment by guyer on 12-08-2011 at 09:41

@fipymigrate

In 88a1acd:

#CommitTicketReference repository="fipy" revision="88a1acde7f853deceed9892709c911b60430ba82"
* Hacked a fix for a weird bug with the no-pysparse tests on bunter. See issue #275 for more info.



git-svn-id: svn+ssh://code.matforge.org/fipy/branches/bunter@4818 d80e17d7-ff13-0410-a124-85740d801063

Trac comment by Daniel Wheeler daniel.wheeler@nist.gov on 06-12-2013 at 14:14

@wd15 wd15 closed this as completed Sep 19, 2014