Fix NN for remote jobs #1089

Merged: 4 commits, Aug 18, 2014

matthewrmshin (Contributor):

Also adds tests to ensure NN behaves correctly when the submit number reaches 100.

(For some reason, log/job/${CYCLE}/${TASK}/NN was not created for remote jobs when I submitted #1069. I am certain I tested this sort of thing at various points while I maintained the branch, but the fix may have been lost after so many merge and rebase operations.)
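The thread assumes a per-task log layout `log/job/CYCLE/TASK/<submit number>`, with `NN` a symlink to the latest submit directory (the listings later in this thread show `01` to `04`). The sketch below illustrates that naming, assuming two-digit zero padding that simply widens at 100. The helper names and the `run_dir` argument are hypothetical, not cylc's API. It also shows one plausible reason a test at submit number 100 is worthwhile: anything that picks the latest submit by sorting directory names breaks once `100` sorts before `99`.

```python
import os

def submit_dir_name(submit_num: int) -> str:
    """Zero-pad to two digits (assumed scheme): 1 -> '01', 100 -> '100'."""
    if submit_num < 1:
        raise ValueError("submit numbers start at 1")
    return "%02d" % submit_num

def job_log_dir(run_dir: str, cycle: str, task: str, submit_num: int) -> str:
    """Hypothetical helper: log/job/CYCLE/TASK/<submit> path for one submit."""
    return os.path.join(run_dir, "log", "job", cycle, task,
                        submit_dir_name(submit_num))

def latest_by_name(names):
    """Naive 'latest submit' lookup by string sort; wrong once '100' appears."""
    return sorted(names)[-1]

def latest_by_number(names):
    """Compare numerically instead, so '100' beats '99'."""
    return max(names, key=int)
```

With names `98`, `99` and `100`, the string sort picks `99` while the numeric comparison picks `100`.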

matthewrmshin added this to the cylc-6 milestone on Aug 14, 2014
matthewrmshin self-assigned this on Aug 14, 2014
matthewrmshin (Contributor, PR author):

(Not yet ready for merge, as I am unable to test this until I am at work tomorrow morning.)

matthewrmshin (Contributor, PR author):

@hjoliver please review.

N.B. the following are failing on master and on this branch, and I believe they are independent of this fix:

  • tests/cyclers/25-r1_initial_immortal.t
  • tests/cyclers/27-no_initial_but_final_cycle_point.t

```
@@ -40,6 +42,7 @@ class background( JobSubmit ):
"; " +
# Retry "mkdir" once to avoid race to create log/job/CYCLE/
" (mkdir -p %(jobfile_dir)s || mkdir -p %(jobfile_dir)s)" +
" && ln -fs $(basename %(jobfile_dir)s) $(dirname %(jobfile_dir)s)/NN"
```
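The hunk above builds a shell command with Python `%`-style dictionary formatting. To show what the shell actually receives, the sketch below expands only the quoted fragment with a hypothetical `jobfile_dir`. The start of the real template (before `"; "`) is not shown in the hunk and is omitted, and `expand` is an illustrative name, not part of cylc.

```python
# The fragment quoted in the hunk, reproduced as a Python %-template.
# The part of the command before "; " is not shown in the hunk and is omitted.
TEMPLATE = (
    "; " +
    # Retry "mkdir" once to avoid race to create log/job/CYCLE/
    " (mkdir -p %(jobfile_dir)s || mkdir -p %(jobfile_dir)s)" +
    " && ln -fs $(basename %(jobfile_dir)s) $(dirname %(jobfile_dir)s)/NN"
)

def expand(jobfile_dir: str) -> str:
    """Substitute a job log directory into the fragment, as %-formatting would."""
    return TEMPLATE % {"jobfile_dir": jobfile_dir}

if __name__ == "__main__":
    # Hypothetical path for submit 02 of task foo at cycle 1.
    print(expand("/tmp/junk/log/job/1/foo/02"))
```

After command substitution, the last step becomes `ln -fs 02 /tmp/junk/log/job/1/foo/NN`. The link target is the relative name `02`, resolved from the task directory that contains `NN`.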
Member (review comment on this hunk):

According to my tests, this does not have the intended effect after the first submit, e.g. for a task that was submitted 4 times:

```
$ ls -l junk/log/job/1/foo/
total 320
drwxr-xr-x 2 oliverh niwa-users 32768 Aug 17 21:47 01
drwxr-xr-x 2 oliverh niwa-users 32768 Aug 17 21:22 02
drwxr-xr-x 2 oliverh niwa-users 32768 Aug 17 21:22 03
drwxr-xr-x 2 oliverh niwa-users 32768 Aug 17 21:43 04
lrwxrwxrwx 1 oliverh niwa-users     2 Aug 17 21:45 NN -> 01
```

and

```
$ ls -l junk/log/job/1/foo/NN/
total 320
lrwxrwxrwx 1 oliverh niwa-users    2 Aug 17 21:46 02 -> 02
lrwxrwxrwx 1 oliverh niwa-users    2 Aug 17 21:47 03 -> 03
lrwxrwxrwx 1 oliverh niwa-users    2 Aug 17 21:47 04 -> 04
-rwxr-xr-x 1 oliverh niwa-users 3864 Aug 17 21:30 job
-rw-r--r-- 1 oliverh niwa-users  217 Aug 17 21:30 job.err
-rw-r--r-- 1 oliverh niwa-users  324 Aug 17 21:30 job.out
-rw-r--r-- 1 oliverh niwa-users  115 Aug 17 21:30 job.status
```
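The pattern in this listing matches what `ln -s` does when its destination is an existing symlink to a directory. Unless told otherwise with `-n` (GNU `--no-dereference`), `ln` follows the link, treats `NN` as a directory, and creates the new link inside it (here `01/02 -> 02`). The sketch below is a minimal local reproduction, assuming a POSIX system with GNU coreutils or BSD `ln` on the PATH. The helpers `submit` and `run` are hypothetical and only mimic the `ln -fs` step from the diff.

```python
import os
import subprocess
import tempfile

def submit(task_dir: str, num: int, no_deref: bool) -> None:
    """Hypothetical helper: make the submit dir, then repoint NN with ln."""
    name = "%02d" % num
    os.makedirs(os.path.join(task_dir, name), exist_ok=True)
    # Same shape as the diff: ln -fs 02 <task_dir>/NN (relative target).
    # With no_deref, add -n so an existing NN link is replaced, not followed.
    flags = "-sfn" if no_deref else "-fs"
    subprocess.run(["ln", flags, name, os.path.join(task_dir, "NN")], check=True)

def run(no_deref: bool):
    """Submit 4 times in a scratch dir; return (NN target, entries in 01/)."""
    with tempfile.TemporaryDirectory() as task_dir:
        for num in range(1, 5):
            submit(task_dir, num, no_deref)
        target = os.readlink(os.path.join(task_dir, "NN"))
        inside_01 = sorted(os.listdir(os.path.join(task_dir, "01")))
        return target, inside_01

if __name__ == "__main__":
    print(run(no_deref=False))  # NN stays on 01; links 02, 03, 04 land inside 01/
    print(run(no_deref=True))   # NN follows the latest submit; 01/ stays empty
```

Passing `-n` is one common way to make the repoint behave. This thread does not show whether the merged commit used it, removed `NN` before relinking, or did something else.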

Contributor Author:

Ouch. Is this on a shared file system between the suite host and the job host?

Member:

(No, it wasn't, but I should check that too...)

matthewrmshin (Contributor, PR author):

Problem should now be fixed. New test added.
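Another way to repoint `NN`, which does not depend on `ln` option semantics at all, is to create the new link under a temporary name and rename it over `NN`. `rename(2)` replaces an existing symlink rather than following it, so readers never see a missing `NN`. The sketch below assumes a POSIX file system. `update_nn` and `demo` are hypothetical names, and this is an illustration, not the change merged in this PR. The demo includes submit number 100 to mirror the test case this PR describes.

```python
import os
import tempfile

def update_nn(task_dir: str, submit_num: int) -> str:
    """Hypothetical helper: point task_dir/NN at the submit dir atomically."""
    name = "%02d" % submit_num
    os.makedirs(os.path.join(task_dir, name), exist_ok=True)
    tmp_link = os.path.join(task_dir, "NN.tmp.%d" % os.getpid())
    if os.path.lexists(tmp_link):  # leftover from an interrupted update
        os.remove(tmp_link)
    os.symlink(name, tmp_link)  # relative target, e.g. NN.tmp -> 02
    # os.replace uses rename(2): an existing NN symlink is replaced, never followed.
    os.replace(tmp_link, os.path.join(task_dir, "NN"))
    return name

def demo(submit_nums):
    """Apply update_nn in a scratch dir; return (NN target, task dir, 01/ contents)."""
    with tempfile.TemporaryDirectory() as task_dir:
        for num in submit_nums:
            update_nn(task_dir, num)
        return (
            os.readlink(os.path.join(task_dir, "NN")),
            sorted(os.listdir(task_dir)),
            os.listdir(os.path.join(task_dir, "01")),
        )

if __name__ == "__main__":
    print(demo([1, 2, 3, 4, 100]))
```

After submits 1 to 4 and 100, `NN` points at `100` and no stray links end up inside `01/`.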

hjoliver (Member):

As noted above, my original problem did not involve a shared filesystem... maybe a different or buggy version of ln on that host? This definitely fixes the problem, though, so I have not investigated further.

hjoliver added a commit that referenced this pull request on Aug 18, 2014
hjoliver merged commit 2a3fadd into cylc:master on Aug 18, 2014
matthewrmshin deleted the fix-remote-nn branch on August 19, 2014 07:34
Labels: bug