Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[subprocess][go] run the subprocesses from go-land #843

Merged
merged 19 commits into from
Nov 24, 2017
Merged

Conversation

truthbk
Copy link
Member

@truthbk truthbk commented Nov 23, 2017

What does this PR do?

Runs the subprocess from go-land as opposed to forking from within the python interpreter. This should hopefully address memory leak issues we have experienced, as well as resolve the zombie fork issue (triggered by an obscure race-condition) we have experienced.

Motivation

The subprocess32 approach in #830 has helped with the memory leak, but wasn't a truly satisfactory fix as it didn't quite fix the issue on windows. The alternative, setting a global lock on windows would likely be less desirable than a pure-go cross-platform fix. Hopefully this addresses that.

Additional Notes

I've been unable to get the tests to fly on gitlab, they seem fine locally. Let's see what happens on CircleCI. Let's be thorough with the GIL, reference count during the review 😅

@truthbk truthbk requested a review from a team as a code owner November 23, 2017 11:14
@truthbk
Copy link
Member Author

truthbk commented Nov 23, 2017

We have a data race when reading stdout/stderr. Must address that. Addressed.

Found another race condition on CircleCI (not locally) fixing that.... In this case the race only surfaces with golang 1.8.3 (and I presume others), but not with our default 1.9.1 - I believe it's related to the implementation of ioutil.ReadAll.

Data race on 1.8.3:

=================                                                                                                                                                                                                                                   [89/4566]
WARNING: DATA RACE
Write at 0x00c4201d7fa0 by goroutine 43:
  os.(*file).close()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/os/file_unix.go:143 +0x10a
  os.(*File).Close()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/os/file_unix.go:132 +0x55
  os/exec.(*Cmd).closeDescriptors()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/os/exec/exec.go:263 +0x67
  os/exec.(*Cmd).Wait()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/os/exec/exec.go:448 +0x2b8
  github.com/DataDog/datadog-agent/pkg/collector/py.GetSubprocessOutput()
      /home/ubuntu/Development/go/src/github.com/DataDog/datadog-agent/pkg/collector/py/datadog_agent.go:170 +0x777
  github.com/DataDog/datadog-agent/pkg/collector/py._cgoexpwrap_53af0d9533cd_GetSubprocessOutput()
      github.com/DataDog/datadog-agent/pkg/collector/py/_test/_obj_test/_cgo_gotypes.go:969 +0x7f
  runtime.call32()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/runtime/asm_amd64.s:514 +0x47
  github.com/DataDog/datadog-agent/vendor/github.com/sbinet/go-python.(*PyObject).Call.func1()
      /home/ubuntu/Development/go/src/github.com/DataDog/datadog-agent/vendor/github.com/sbinet/go-python/object.go:312 +0xef
  github.com/DataDog/datadog-agent/vendor/github.com/sbinet/go-python.(*PyObject).Call()
      /home/ubuntu/Development/go/src/github.com/DataDog/datadog-agent/vendor/github.com/sbinet/go-python/object.go:312 +0x8c
  github.com/DataDog/datadog-agent/pkg/collector/py.TestSubprocessBindings()
      /home/ubuntu/Development/go/src/github.com/DataDog/datadog-agent/pkg/collector/py/utils_test.go:114 +0x4fa
  testing.tRunner()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/testing/testing.go:657 +0x107

Previous read at 0x00c4201d7fa0 by goroutine 44:
  os.(*File).Read()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/os/file.go:98 +0x70
  bytes.(*Buffer).ReadFrom()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/bytes/buffer.go:179 +0x1bc
  io/ioutil.readAll()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/io/ioutil/ioutil.go:33 +0x1d1
  io/ioutil.ReadAll()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/io/ioutil/ioutil.go:42 +0x52
  github.com/DataDog/datadog-agent/pkg/collector/py.GetSubprocessOutput.func3()
      /home/ubuntu/Development/go/src/github.com/DataDog/datadog-agent/pkg/collector/py/datadog_agent.go:149 +0x94

Goroutine 43 (running) created at:
  testing.(*T).Run()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/testing/testing.go:697 +0x543
  testing.runTests.func1()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/testing/testing.go:882 +0xaa
  testing.tRunner()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/testing/testing.go:657 +0x107
  testing.runTests()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/testing/testing.go:888 +0x4e0
  testing.(*M).Run()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/testing/testing.go:822 +0x1c3
  github.com/DataDog/datadog-agent/pkg/collector/py.TestMain()
      /home/ubuntu/Development/go/src/github.com/DataDog/datadog-agent/pkg/collector/py/utils_test.go:37 +0x1ab
  main.main()
      github.com/DataDog/datadog-agent/pkg/collector/py/_test/_testmain.go:106 +0x20f

Goroutine 44 (finished) created at:
  github.com/DataDog/datadog-agent/pkg/collector/py.GetSubprocessOutput()
      /home/ubuntu/Development/go/src/github.com/DataDog/datadog-agent/pkg/collector/py/datadog_agent.go:150 +0x502
  github.com/DataDog/datadog-agent/pkg/collector/py._cgoexpwrap_53af0d9533cd_GetSubprocessOutput()
      github.com/DataDog/datadog-agent/pkg/collector/py/_test/_obj_test/_cgo_gotypes.go:969 +0x7f
  runtime.call32()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/runtime/asm_amd64.s:514 +0x47
  github.com/DataDog/datadog-agent/vendor/github.com/sbinet/go-python.(*PyObject).Call.func1()
      /home/ubuntu/Development/go/src/github.com/DataDog/datadog-agent/vendor/github.com/sbinet/go-python/object.go:312 +0xef
  github.com/DataDog/datadog-agent/vendor/github.com/sbinet/go-python.(*PyObject).Call()
      /home/ubuntu/Development/go/src/github.com/DataDog/datadog-agent/vendor/github.com/sbinet/go-python/object.go:312 +0x8c
  github.com/DataDog/datadog-agent/pkg/collector/py.TestSubprocessBindings()
      /home/ubuntu/Development/go/src/github.com/DataDog/datadog-agent/pkg/collector/py/utils_test.go:114 +0x4fa
  testing.tRunner()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/testing/testing.go:657 +0x107
==================
==================
WARNING: DATA RACE
Write at 0x00c42034a000 by goroutine 43:
  os.(*file).close()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/os/file_unix.go:143 +0x10a
  os.(*File).Close()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/os/file_unix.go:132 +0x55
  os/exec.(*Cmd).closeDescriptors()
  /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/os/exec/exec.go:263 +0x67
  os/exec.(*Cmd).Wait()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/os/exec/exec.go:448 +0x2b8
  github.com/DataDog/datadog-agent/pkg/collector/py.GetSubprocessOutput()
      /home/ubuntu/Development/go/src/github.com/DataDog/datadog-agent/pkg/collector/py/datadog_agent.go:170 +0x777
  github.com/DataDog/datadog-agent/pkg/collector/py._cgoexpwrap_53af0d9533cd_GetSubprocessOutput()
      github.com/DataDog/datadog-agent/pkg/collector/py/_test/_obj_test/_cgo_gotypes.go:969 +0x7f
  runtime.call32()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/runtime/asm_amd64.s:514 +0x47
  github.com/DataDog/datadog-agent/vendor/github.com/sbinet/go-python.(*PyObject).Call.func1()
      /home/ubuntu/Development/go/src/github.com/DataDog/datadog-agent/vendor/github.com/sbinet/go-python/object.go:312 +0xef
  github.com/DataDog/datadog-agent/vendor/github.com/sbinet/go-python.(*PyObject).Call()
      /home/ubuntu/Development/go/src/github.com/DataDog/datadog-agent/vendor/github.com/sbinet/go-python/object.go:312 +0x8c
  github.com/DataDog/datadog-agent/pkg/collector/py.TestSubprocessBindings()
      /home/ubuntu/Development/go/src/github.com/DataDog/datadog-agent/pkg/collector/py/utils_test.go:114 +0x4fa
  testing.tRunner()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/testing/testing.go:657 +0x107

Previous read at 0x00c42034a000 by goroutine 45:
  os.(*File).Read()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/os/file.go:98 +0x70
  bytes.(*Buffer).ReadFrom()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/bytes/buffer.go:179 +0x1bc
  io/ioutil.readAll()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/io/ioutil/ioutil.go:33 +0x1d1
  io/ioutil.ReadAll()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/io/ioutil/ioutil.go:42 +0x52
  github.com/DataDog/datadog-agent/pkg/collector/py.GetSubprocessOutput.func6()
      /home/ubuntu/Development/go/src/github.com/DataDog/datadog-agent/pkg/collector/py/datadog_agent.go:164 +0x94

Goroutine 43 (running) created at:
  testing.(*T).Run()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/testing/testing.go:697 +0x543
  testing.runTests.func1()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/testing/testing.go:882 +0xaa
  testing.tRunner()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/testing/testing.go:657 +0x107
  testing.runTests()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/testing/testing.go:888 +0x4e0
  testing.(*M).Run()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/testing/testing.go:822 +0x1c3
  github.com/DataDog/datadog-agent/pkg/collector/py.TestMain()
      /home/ubuntu/Development/go/src/github.com/DataDog/datadog-agent/pkg/collector/py/utils_test.go:37 +0x1ab
  main.main()
      github.com/DataDog/datadog-agent/pkg/collector/py/_test/_testmain.go:106 +0x20f

Goroutine 45 (finished) created at:
  github.com/DataDog/datadog-agent/pkg/collector/py.GetSubprocessOutput()
      /home/ubuntu/Development/go/src/github.com/DataDog/datadog-agent/pkg/collector/py/datadog_agent.go:165 +0x755
  github.com/DataDog/datadog-agent/pkg/collector/py._cgoexpwrap_53af0d9533cd_GetSubprocessOutput()
      github.com/DataDog/datadog-agent/pkg/collector/py/_test/_obj_test/_cgo_gotypes.go:969 +0x7f
  runtime.call32()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/runtime/asm_amd64.s:514 +0x47
  github.com/DataDog/datadog-agent/vendor/github.com/sbinet/go-python.(*PyObject).Call.func1()
      /home/ubuntu/Development/go/src/github.com/DataDog/datadog-agent/vendor/github.com/sbinet/go-python/object.go:312 +0xef
  github.com/DataDog/datadog-agent/vendor/github.com/sbinet/go-python.(*PyObject).Call()
      /home/ubuntu/Development/go/src/github.com/DataDog/datadog-agent/vendor/github.com/sbinet/go-python/object.go:312 +0x8c
  github.com/DataDog/datadog-agent/pkg/collector/py.TestSubprocessBindings()
      /home/ubuntu/Development/go/src/github.com/DataDog/datadog-agent/pkg/collector/py/utils_test.go:114 +0x4fa
  testing.tRunner()
      /home/ubuntu/.gimme/versions/go1.8.3.linux.amd64/src/testing/testing.go:657 +0x107
==================
--- FAIL: TestSubprocessBindings (0.00s)

No trace of the problem on 1.9.1

Related material:

@truthbk
Copy link
Member Author

truthbk commented Nov 23, 2017

It looks like there's still a GIL related bug, it could also be a "sticky" thread issue - will address tomorrow.

@truthbk
Copy link
Member Author

truthbk commented Nov 24, 2017

It may be fixed now after locking the go routine to the thread. Haven't experienced the weird crashes for a while now but I will need to verify tomorrow.

@olivielpeau
Copy link
Member

Feel free to merge now to test this a bit with a packaged agent, I went through the code quickly and it looks good.

I'll make comments from a thorough review on this PR, a bit later :)

@truthbk truthbk merged commit fbcb0ab into master Nov 24, 2017
Copy link
Member

@olivielpeau olivielpeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a bunch of comments, but this looks great overall :) Let me know what your thoughts are

One important thing we need to check is the safety of calling UnlockOSThread from GetSubprocessOutput.

Py_RETURN_NONE;
}

if (!PyList_Check(cmd_args)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for subprocess.Popen, cmd_args could also be a sequence (i.e. not strictly a list) or a string. We don't have to handle this here (we can handle the conversion in the python get_subprocess_output), but we should if we want to preserve the api.

if (!PyArg_ParseTuple(args, "O|O:get_subprocess_output", &cmd_args, &cmd_raise_on_empty)) {
PyGILState_Release(gstate);
PyErr_SetString(PyExc_TypeError, "unable to parse arguments");
Py_RETURN_NONE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use return NULL instead of Py_RETURN_NONE when we set an error on the interpreter (otherwise, in python land, the function returns None and doesn't raise an exception).

if (!PyList_Check(cmd_args)) {
PyGILState_Release(gstate);
PyErr_SetString(PyExc_TypeError, "command args not a list");
Py_RETURN_NONE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same thing here

if (cmd_raise_on_empty != NULL && !PyBool_Check(cmd_raise_on_empty)) {
PyGILState_Release(gstate);
PyErr_SetString(PyExc_TypeError, "bad raise on empty argument - should be bool");
Py_RETURN_NONE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

if(!(subprocess_args = (char **)malloc(sizeof(char *)*subprocess_args_sz))) {
PyGILState_Release(gstate);
PyErr_SetString(PyExc_MemoryError, "unable to allocate memory, bailing out");
Py_RETURN_NONE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

cErr := C.CString("unable to import subprocess empty output exception")
C.PyErr_SetString(C.PyExc_Exception, cErr)
C.free(unsafe.Pointer(cErr))
return C._none()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return nil

cErr := C.CString("get_subprocess_output expected output but had none.")
C.PyErr_SetString((*C.PyObject)(unsafe.Pointer(excClass)), cErr)
C.free(unsafe.Pointer(cErr))
return C._none()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return nil

C.free(unsafe.Pointer(cModuleName))
if utilModule == nil {
cErr := C.CString("unable to import subprocess empty output exception")
C.PyErr_SetString(C.PyExc_Exception, cErr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure we should overwrite the original exception

C.free(unsafe.Pointer(cExcName))
if excClass == nil {
cErr := C.CString("unable to import subprocess empty output exception")
C.PyErr_SetString(C.PyExc_Exception, cErr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure we should overwrite the original exception

@@ -75,6 +75,67 @@ func TestFindSubclassOf(t *testing.T) {
assert.Equal(t, 1, sclass.RichCompareBool(barClass, python.Py_EQ))
}

func TestSubprocessBindings(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you also write a test that executes a python check that uses get_subprocess_output? (a bit like what the tests in check_test.go do)

@masci masci deleted the jaime/subprocessgo branch December 1, 2017 08:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants