Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AIX: pseudo-tty/no_dropped_stdio failures on new machine #7973

Closed
mhdawson opened this issue Aug 4, 2016 · 24 comments
Closed

AIX: pseudo-tty/no_dropped_stdio failures on new machine #7973

mhdawson opened this issue Aug 4, 2016 · 24 comments
Assignees
Labels
aix Issues and PRs related to the AIX platform. ppc Issues and PRs related to the Power architecture. test Issues and PRs related to the tests. tty Issues and PRs related to the tty subsystem.

Comments

@mhdawson
Copy link
Member

mhdawson commented Aug 4, 2016

  • Version: master
  • Platform: AIX
  • Subsystem: io

Failures on new AIX machine: https://ci.nodejs.org/job/node-test-commit-aix/nodes=aix61-ppc64/283/console

# 
# assert.js:89
#   throw new assert.AssertionError({
#   ^
# AssertionError: 1 == 42
#     at ChildProcess. (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/parallel/test-stdio-closed.js:25:10)
#     at ChildProcess. (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/common.js:411:15)
#     at emitTwo (events.js:106:13)
#     at ChildProcess.emit (events.js:191:7)
#     at Process.ChildProcess._handle.onexit (internal/child_process.js:204:12)
  ---


  ...
length differs.
expect=21
actual=0
patterns:
pattern = ^oooooooooooooooooooooooooooooooooooooooooooooooo$
pattern = ^oooooooooooooooooooooooooooooooooooooooooooooooo$
pattern = ^oooooooooooooooooooooooooooooooooooooooooooooooo$
pattern = ^oooooooooooooooooooooooooooooooooooooooooooooooo$
pattern = ^oooooooooooooooooooooooooooooooooooooooooooooooo$
pattern = ^oooooooooooooooooooooooooooooooooooooooooooooooo$
pattern = ^oooooooooooooooooooooooooooooooooooooooooooooooo$
pattern = ^oooooooooooooooooooooooooooooooooooooooooooooooo$
pattern = ^oooooooooooooooooooooooooooooooooooooooooooooooo$
pattern = ^oooooooooooooooooooooooooooooooooooooooooooooooo$
pattern = ^oooooooooooooooooooooooooooooooooooooooooooooooo$
pattern = ^oooooooooooooooooooooooooooooooooooooooooooooooo$
pattern = ^oooooooooooooooooooooooooooooooooooooooooooooooo$
pattern = ^oooooooooooooooooooooooooooooooooooooooooooooooo$
pattern = ^oooooooooooooooooooooooooooooooooooooooooooooooo$
pattern = ^oooooooooooooooooooooooooooooooooooooooooooooooo$
pattern = ^oooooooooooooooooooooooooooooooooooooooooooooooo$
pattern = ^oooooooooooooooooooooooooooooooooooooooooooooooo$
pattern = ^oooooooooooooooooooooooooooooooooooooooooooooooo$
pattern = ^oooooooooooooooooooooooooooooooooooooooooooooooo$
pattern = ^ooooooooooooooooooooooooO$
outlines:
not ok 1170 pseudo-tty/no_dropped_stdio
  ---
  duration_ms: 0.186
@mscdex mscdex added test Issues and PRs related to the tests. tty Issues and PRs related to the tty subsystem. labels Aug 4, 2016
@Fishrock123
Copy link
Contributor

@mhdawson Anything different about this new machine? Didn't it pass on the old ones?

@jbergstroem
Copy link
Member

Related to JOBS= ?

@mhdawson mhdawson assigned Fishrock123 and mhdawson and unassigned Fishrock123 and mhdawson Aug 9, 2016
@mhdawson
Copy link
Member Author

mhdawson commented Aug 9, 2016

It did pass on the old one. The previous machine was a vm from siteox with different levels of the OS as opposed to the new ones that are from osusol. I have asked @gireeshpunathil to investigate.

@gireeshpunathil
Copy link
Member

The test case want to make sure every byte that is written to the standard stream is drained before the process exits, basically 1025 characters with 21 new lines.

The test case seem to have devised to validate an OS X behavior on the 1025th character. However in AIX case, at times it receives none while the python parent expects 21 lines.

The stdout of the child is piped into a tempfile, which is later read by the python parent.

I tried to isolate the issue in a number of manners without success: (though many of these different variant of one or two discrete scenarios)

  1. node no_dropped_stdio.js | xargs echo
  2. node no_dropped_stdio.js | wc -c
  3. node no_dropped_stdio.js > out.txt 2>&1
  4. A ndoe parent with piped streams for the child
  5. A node parent with inherited streams for the child
  6. A node parent with file fd passed for child streams and later read from the file

So it looks like the python parent in the test framework has something peculiar or the machine I am running is different from the new CI one.

@mhdawson how easy it is to get hold of the failing CI system for debugging?

@mhdawson
Copy link
Member Author

Created this request to get Gireesh access nodejs/build#463

@Fishrock123
Copy link
Contributor

@gireeshpunathil None of those will work correctly. Your examples are all pipes.

You need to use a faked TTY. Either via pty.js or via the python test runner in pseudo-tty, which uses a faked TTY. (Code mostly in tools/test.py.)

To be clear: if AIX actually fails this then there is a big problem on AIX even if the test was designed for OS X.

@gireeshpunathil
Copy link
Member

Thanks @Fishrock123 - yes, I agree that all the above cases are variants of pipes.

I actually debugged the python code, and validated that the spawing code does not use faketty in this specific scenario, rather creates a temp file, its fd piped into the child node's streams, and then upon child termination, close the fd and read from the file - hence my attempt to mimic the behavior through case 6 above.

However, this is really useful information, I will study the faketty and debug further by taking lead from that. Thanks for the suggestion.

@Fishrock123
Copy link
Contributor

@gireeshpunathil it is possible that the python fake tty does not work on AIX -- in which case these tests should be disabled there.

@mhdawson
Copy link
Member Author

@gireeshpunathil I just want to make sure you are investigating both failures

parallel/test-stdio-closed
pseudo-tty/no_dropped_stdio

parallel/test-stdio-closed - fails in every run
pseudo-tty/no_dropped_stdio - I've not seen fail in the community CI again, however, internally (IBM) we've reproduced with 316/1000 if you need help to reproduce follow up with @sxa555 or Jeremiah Akpotohwo in hursley

@mhdawson
Copy link
Member Author

Have provided @gireeshpunathil with access to community machine

mhdawson added a commit to mhdawson/io.js that referenced this issue Aug 11, 2016
We now have adequate AIX hardware to add AIX to
the regular regression runs.

However, there are a couple of failing tests even
though AIX was green at one point.  This PR
marks those tests as flaky so that we can add AIX
so that we can spot any new regressions without making
the builds RED

The tests are being worked under the following PRs

- being worked under nodejs#7564
test-async-wrap-post-did-throw
test-async-wrap-throw-from-callback
test-crypto-random

- being worked under nodejs#7973
test-stdio-closed

- covered by nodejs#3796
test-debug-signal-cluster
mhdawson added a commit that referenced this issue Aug 11, 2016
We now have adequate AIX hardware to add AIX to
the regular regression runs.

However, there are a couple of failing tests even
though AIX was green at one point.  This PR
marks those tests as flaky so that we can add AIX
so that we can spot any new regressions without making
the builds RED

The tests are being worked under the following PRs

- being worked under #7564
test-async-wrap-post-did-throw
test-async-wrap-throw-from-callback
test-crypto-random

- being worked under #7973
test-stdio-closed

- covered by #3796
test-debug-signal-cluster

PR-URL: #8065
Reviewed-By: joaocgreis - João Reis <reis@janeasystems.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
cjihrig pushed a commit that referenced this issue Aug 11, 2016
We now have adequate AIX hardware to add AIX to
the regular regression runs.

However, there are a couple of failing tests even
though AIX was green at one point.  This PR
marks those tests as flaky so that we can add AIX
so that we can spot any new regressions without making
the builds RED

The tests are being worked under the following PRs

- being worked under #7564
test-async-wrap-post-did-throw
test-async-wrap-throw-from-callback
test-crypto-random

- being worked under #7973
test-stdio-closed

- covered by #3796
test-debug-signal-cluster

PR-URL: #8065
Reviewed-By: joaocgreis - João Reis <reis@janeasystems.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
@mhdawson
Copy link
Member Author

@gireeshpunathil have you had a change to look at this ?

@gireeshpunathil
Copy link
Member

@mhdawson - yes, but is a slow progress. Any major findings, I will post here.

@gireeshpunathil
Copy link
Member

just a status update: no major insights so far. Issue seen as consistent, trying to localize it into a python / python + C test case: to eliminate node's role in it.

@mhdawson
Copy link
Member Author

mhdawson commented Sep 1, 2016

Does this investigation apply to one or both of:

  1. parallel/test-stdio-closed
  2. pseudo-tty/no_dropped_stdio

?

@gireeshpunathil
Copy link
Member

This is only for 2) pseudo-tty/no_dropped_stdio

@gireeshpunathil
Copy link
Member

ok, came up with a small python code which works well in Linux but hangs in AIX. This is detached from node, which means that the problem is outside the node scope. While the code comes out immediately in Linux, the usage of fake_out for the stdout causes the python thread to hang in close() call.

import os
import subprocess
import pty

arg = ['ls', '-lrt']
(out_master, fd_out) = pty.openpty()
process = subprocess.Popen(
  args = arg,
  stdout = fd_out
)

os.close(fd_out)

I think the psuedo-tty tests should be omitted for AIX, while the python behavioral difference is understood and or addressed - as they don't reveal any issues in node instead blocks the test logic due to the python bug.

@gireeshpunathil
Copy link
Member

parallel/test-stdio-closed failure - test case performs the following:

  1. spawns a child (shell) with inhertied streams from parent.
  2. the shell executes a child node, whose streams are inherently closed.
  3. Child writes to stdout, and then onto stderr, and then exits with a specific error code.
  4. Parent asserts on the error code.

In Linux, the child executes the stream writes and then exits with the specified error code (42).
In AIX, the child exits on the first attempt to write on to the closed stream (error code:1)

And hence the failure:
AssertionError: 1 == 42

Investigating further.

@Fishrock123
Copy link
Contributor

Fishrock123 commented Sep 1, 2016

Should we be marking pseudo-tty/no_dropped_stdio as flaky on AIX? Or is it already and I missed it?

@mhdawson
Copy link
Member Author

mhdawson commented Sep 2, 2016

Pulled out stdio failure from description in initial text for report since it is not being worked in this issue and does not seem related. #8375

@mhdawson mhdawson changed the title AIX: parallel/test-stdio-closed and pseudo-tty/no_dropped_stdio failures on new machine AIX: pseudo-tty/no_dropped_stdio failures on new machine Sep 2, 2016
@mhdawson
Copy link
Member Author

mhdawson commented Sep 2, 2016

Just talking with @gireeshpunathil , I'll submit a PR to mark as flaky

@mhdawson
Copy link
Member Author

mhdawson commented Sep 2, 2016

PR to mark as flaky https://github.com/nodejs/node/pull/8385/files

@mscdex mscdex added aix Issues and PRs related to the AIX platform. ppc Issues and PRs related to the Power architecture. labels Nov 16, 2016
@mhdawson
Copy link
Member Author

Make sure to un-exclude this one as well once resolved: #9765

mhdawson added a commit to mhdawson/io.js that referenced this issue Nov 23, 2016
pseudo-tty/no_interleaved_stdio has hung a few times
in the last couple of days on AIX.  We believe
it is not a Node.js issue but an issue with python
on AIX. Its being investigated under:
nodejs#7973.
Excluding this additional test until we can
resolve the python issue.
mhdawson added a commit that referenced this issue Nov 25, 2016
pseudo-tty/no_interleaved_stdio has hung a few times
in the last couple of days on AIX.  We believe
it is not a Node.js issue but an issue with python
on AIX. Its being investigated under:
#7973.
Excluding this additional test until we can
resolve the python issue.

Fixes #9765
PR-URL: #9772
Reviewed-By: Sam Roberts <sam@strongloop.com>
Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com>
addaleax pushed a commit that referenced this issue Dec 5, 2016
pseudo-tty/no_interleaved_stdio has hung a few times
in the last couple of days on AIX.  We believe
it is not a Node.js issue but an issue with python
on AIX. Its being investigated under:
#7973.
Excluding this additional test until we can
resolve the python issue.

Fixes #9765
PR-URL: #9772
Reviewed-By: Sam Roberts <sam@strongloop.com>
Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com>
addaleax pushed a commit to addaleax/node that referenced this issue Dec 8, 2016
pseudo-tty/no_interleaved_stdio has hung a few times
in the last couple of days on AIX.  We believe
it is not a Node.js issue but an issue with python
on AIX. Its being investigated under:
nodejs#7973.
Excluding this additional test until we can
resolve the python issue.

Fixes nodejs#9765
PR-URL: nodejs#9772
Reviewed-By: Sam Roberts <sam@strongloop.com>
Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com>
@Fishrock123 Fishrock123 assigned mhdawson and unassigned Fishrock123 and mhdawson Dec 16, 2016
MylesBorins pushed a commit that referenced this issue Dec 20, 2016
pseudo-tty/no_interleaved_stdio has hung a few times
in the last couple of days on AIX.  We believe
it is not a Node.js issue but an issue with python
on AIX. Its being investigated under:
#7973.
Excluding this additional test until we can
resolve the python issue.

Fixes #9765
PR-URL: #9772
Reviewed-By: Sam Roberts <sam@strongloop.com>
Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com>
MylesBorins pushed a commit that referenced this issue Dec 21, 2016
pseudo-tty/no_interleaved_stdio has hung a few times
in the last couple of days on AIX.  We believe
it is not a Node.js issue but an issue with python
on AIX. Its being investigated under:
#7973.
Excluding this additional test until we can
resolve the python issue.

Fixes #9765
PR-URL: #9772
Reviewed-By: Sam Roberts <sam@strongloop.com>
Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com>
MylesBorins pushed a commit that referenced this issue Dec 21, 2016
pseudo-tty/no_interleaved_stdio has hung a few times
in the last couple of days on AIX.  We believe
it is not a Node.js issue but an issue with python
on AIX. Its being investigated under:
#7973.
Excluding this additional test until we can
resolve the python issue.

Fixes #9765
PR-URL: #9772
Reviewed-By: Sam Roberts <sam@strongloop.com>
Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com>
italoacasas pushed a commit to italoacasas/node that referenced this issue Mar 20, 2017
The tests in pseudo-tty takes the form of child node writing some data
and exiting, while parent python consume them through pseudo tty
implementations, and validate the result.

While there is no synchronization between child and parent, this works
for most platforms, except AIX, where the child exits even before the
parent could setup the read loop, under race conditions

Fixing the race condition is ideally done through sending ACK messages
to and forth, but involves massive changes and have side effect. The
workaround is to address them in AIX alone, by adding a reasonable
delay.

PR-URL: nodejs#11715
Fixes: nodejs#7973
Fixes: nodejs#9765
Fixes: nodejs#11541
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com>
jungx098 pushed a commit to jungx098/node that referenced this issue Mar 21, 2017
The tests in pseudo-tty takes the form of child node writing some data
and exiting, while parent python consume them through pseudo tty
implementations, and validate the result.

While there is no synchronization between child and parent, this works
for most platforms, except AIX, where the child exits even before the
parent could setup the read loop, under race conditions

Fixing the race condition is ideally done through sending ACK messages
to and forth, but involves massive changes and have side effect. The
workaround is to address them in AIX alone, by adding a reasonable
delay.

PR-URL: nodejs#11715
Fixes: nodejs#7973
Fixes: nodejs#9765
Fixes: nodejs#11541
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aix Issues and PRs related to the AIX platform. ppc Issues and PRs related to the Power architecture. test Issues and PRs related to the tests. tty Issues and PRs related to the tty subsystem.
Projects
None yet
Development

No branches or pull requests

5 participants