191: Update existing memory triggers to be able to take a list of mem… #192

Merged
merged 6 commits into from
Jun 30, 2023
2 changes: 1 addition & 1 deletion BUILD.md
Expand Up @@ -3,7 +3,7 @@
## Prerequisites
- clang/llvm v10+
- gcc v10+
- zlib
- zlib (Debian: zlib1g-dev, Rocky: zlib-devel)
- make

## Build
Expand Down
36 changes: 20 additions & 16 deletions README.md
Expand Up @@ -22,33 +22,33 @@ Please see build instructions [here](BUILD.md).
**BREAKING CHANGE** With the release of ProcDump 1.3 the switches are now aligned with the Windows ProcDump version.
```
procdump [-n Count]
[-s Seconds]
[-c|-cl CPU_Usage]
[-m|-ml Commit_Usage]
[-tc Thread_Threshold]
[-fc FileDescriptor_Threshold]
[-sig Signal_Number]
[-e]
[-f Include_Filter,...]
[-pf Polling_Frequency]
[-o]
[-log]
{
[-s Seconds]
[-c|-cl CPU_Usage]
[-m|-ml Commit_Usage1[,Commit_Usage2,...]]
[-tc Thread_Threshold]
[-fc FileDescriptor_Threshold]
[-sig Signal_Number]
[-e]
[-f Include_Filter,...]
[-pf Polling_Frequency]
[-o]
[-log]
{
{{[-w] Process_Name | [-pgid] PID} [Dump_File | Dump_Folder]}
}
}

Options:
-n Number of dumps to write before exiting.
-s Consecutive seconds before dump is written (default is 10).
-c CPU threshold above which to create a dump of the process.
-cl CPU threshold below which to create a dump of the process.
-m Memory commit threshold in MB at which to create a dump.
-ml Trigger when memory commit drops below specified MB value.
-m Memory commit thresholds (MB) above which to create dumps.
-ml Memory commit thresholds (MB) below which to create dumps.
-tc Thread count threshold above which to create a dump of the process.
-fc File descriptor count threshold above which to create a dump of the process.
-sig Signal number to intercept to create a dump of the process.
-e [.NET] Create dump when the process encounters an exception.
-f [.NET] Filter (include) on the (comma separated) exception name(s) and exception message(s).
-f [.NET] Filter (include) on the (comma separated) exception name(s) and exception message(s). Supports wildcards.
-pf Polling frequency.
-o Overwrite existing dump file.
-log Writes extended ProcDump tracing to syslog.
Expand Down Expand Up @@ -86,6 +86,10 @@ The following will create a core dump when CPU usage is >= 65% or memory usage i
```
sudo procdump -c 65 -m 100 1234
```
The following will create a core dump when memory usage is >= 100 MB, followed by another dump when memory usage is >= 200 MB.
```
sudo procdump -m 100,200 1234
```
The following will create a core dump in the `/tmp` directory immediately.
```
sudo procdump 1234 /tmp
Expand Down
1 change: 1 addition & 0 deletions include/GenHelpers.h
Expand Up @@ -96,6 +96,7 @@ static inline void cancel_pthread(unsigned long* val)
#define auto_free_file __attribute__ ((__cleanup__(cleanup_file)))
#define auto_cancel_thread __attribute__ ((__cleanup__(cancel_pthread)))

int* GetSeparatedValues(char* src, char* separator, int* numValues);
bool ConvertToInt(const char* src, int* conv);
bool IsValidNumberArg(const char *arg);
bool CheckKernelVersion();
Expand Down
5 changes: 3 additions & 2 deletions include/ProcDumpConfiguration.h
Expand Up @@ -95,8 +95,9 @@ struct ProcDumpConfiguration {
// Options
int CpuThreshold; // -c
bool bCpuTriggerBelowValue; // -cl
int MemoryThreshold; // -m
bool bMemoryTriggerBelowValue; // -m
int* MemoryThreshold; // -m
int MemoryCurrentThreshold;
bool bMemoryTriggerBelowValue; // -m or -ml
int ThresholdSeconds; // -s
bool bTimerThreshold; // -s
int NumberOfDumpsToCollect; // -n
Expand Down
6 changes: 3 additions & 3 deletions procdump.1
Expand Up @@ -6,7 +6,7 @@ procdump \- generate coredumps based off performance triggers.
procdump [-n Count]
[-s Seconds]
[-c|-cl CPU_Usage]
[-m|-ml Commit_Usage]
[-m|-ml Commit_Usage1[,Commit_Usage2,...]]
[-tc Thread_Threshold]
[-fc FileDescriptor_Threshold]
[-sig Signal_Number]
Expand All @@ -24,8 +24,8 @@ Options:
-s Consecutive seconds before dump is written (default is 10).
-c CPU threshold above which to create a dump of the process.
-cl CPU threshold below which to create a dump of the process.
-m Memory commit threshold in MB at which to create a dump.
-ml Trigger when memory commit drops below specified MB value.
-m Memory commit thresholds (MB) above which to create dumps.
-ml Memory commit thresholds (MB) below which to create dumps.
-tc Thread count threshold above which to create a dump of the process.
-fc File descriptor count threshold above which to create a dump of the process.
-sig Signal number to intercept to create a dump of the process.
Expand Down
55 changes: 34 additions & 21 deletions src/CoreDumpWriter.c
Expand Up @@ -175,14 +175,16 @@ int WriteCoreDumpInternal(struct CoreDumpWriter *self, char* socketName)
gcorePrefixName = GetCoreDumpName(self->Config->ProcessId, name, self->Config->CoreDumpPath, self->Config->CoreDumpName, self->Type);

// assemble the command
if(snprintf(command, BUFFER_LENGTH, "gcore -o %s %d 2>&1", gcorePrefixName, pid) < 0){
if(snprintf(command, BUFFER_LENGTH, "gcore -o %s %d 2>&1", gcorePrefixName, pid) < 0)
{
Log(error, INTERNAL_ERROR);
Trace("WriteCoreDumpInternal: failed sprintf gcore command");
exit(-1);
}

// assemble filename
if(snprintf(coreDumpFileName, PATH_MAX, "%s.%d", gcorePrefixName, pid) < 0){
if(snprintf(coreDumpFileName, PATH_MAX, "%s.%d", gcorePrefixName, pid) < 0)
{
Log(error, INTERNAL_ERROR);
Trace("WriteCoreDumpInternal: failed sprintf core file name");
exit(-1);
Expand All @@ -196,7 +198,8 @@ int WriteCoreDumpInternal(struct CoreDumpWriter *self, char* socketName)
}

// check if we're allowed to write into the target directory
if(access(self->Config->CoreDumpPath, W_OK) < 0) {
if(access(self->Config->CoreDumpPath, W_OK) < 0)
{
Log(error, INTERNAL_ERROR);
Trace("WriteCoreDumpInternal: no write permission to core dump target file %s",
coreDumpFileName);
Expand All @@ -216,17 +219,14 @@ int WriteCoreDumpInternal(struct CoreDumpWriter *self, char* socketName)
Log(info, "Core dump %d generated: %s", self->Config->NumberOfDumpsCollected, coreDumpFileName);

self->Config->NumberOfDumpsCollected++; // safe to increment in crit section
if (self->Config->NumberOfDumpsCollected >= self->Config->NumberOfDumpsToCollect) {
SetEvent(&self->Config->evtQuit.event); // shut it down, we're done here
rc = 1;
}
}
}
else
{
// allocate output buffer
outputBuffer = (char**)malloc(sizeof(char*) * MAX_LINES);
if(outputBuffer == NULL){
if(outputBuffer == NULL)
{
Log(error, INTERNAL_ERROR);
Trace("WriteCoreDumpInternal: failed gcore output buffer allocation");
exit(-1);
Expand All @@ -237,22 +237,26 @@ int WriteCoreDumpInternal(struct CoreDumpWriter *self, char* socketName)
commandPipe = popen2(command, "r", &gcorePid);
self->Config->gcorePid = gcorePid;

if(commandPipe == NULL){
if(commandPipe == NULL)
{
Log(error, "An error occurred while generating the core dump");
Trace("WriteCoreDumpInternal: Failed to open pipe to gcore");
exit(1);
}

// read all output from gcore command
for(i = 0; i < MAX_LINES && fgets(lineBuffer, sizeof(lineBuffer), commandPipe) != NULL; i++) {
for(i = 0; i < MAX_LINES && fgets(lineBuffer, sizeof(lineBuffer), commandPipe) != NULL; i++)
{
lineLength = strlen(lineBuffer) + 1; // get # of characters read

outputBuffer[i] = (char*)malloc(sizeof(char) * lineLength);
if(outputBuffer[i] != NULL) {
if(outputBuffer[i] != NULL)
{
strcpy(outputBuffer[i], lineBuffer);
outputBuffer[i][lineLength-1] = '\0'; // append null character
}
else {
else
{
Log(error, INTERNAL_ERROR);
Trace("WriteCoreDumpInternal: failed to allocate gcore error message buffer");
exit(-1);
Expand All @@ -271,7 +275,8 @@ int WriteCoreDumpInternal(struct CoreDumpWriter *self, char* socketName)
bool gcoreFailedMsg = false; // in case error sneaks through the message output

// check if gcore was able to generate the dump
if(gcoreStatus != 0 || pcloseStatus != 0 || (gcoreFailedMsg = (strstr(outputBuffer[i-1], "gcore: failed") != NULL))){
if(gcoreStatus != 0 || pcloseStatus != 0 || (gcoreFailedMsg = (strstr(outputBuffer[i-1], "gcore: failed") != NULL)))
{
Log(error, "An error occurred while generating the core dump:");
if (gcoreStatus != 0)
Log(error, "\tDump exit status = %d", gcoreStatus);
Expand All @@ -281,8 +286,10 @@ int WriteCoreDumpInternal(struct CoreDumpWriter *self, char* socketName)
Log(error, "\tgcore failed");

// log gcore message
for(int j = 0; j < i; j++){
if(outputBuffer[j] != NULL){
for(int j = 0; j < i; j++)
{
if(outputBuffer[j] != NULL)
{
Log(error, "GCORE - %s", outputBuffer[j]);
}
}
Expand All @@ -295,29 +302,35 @@ int WriteCoreDumpInternal(struct CoreDumpWriter *self, char* socketName)
sleep(1);

// validate that core dump file was generated
if(access(coreDumpFileName, F_OK) != -1) {
if(self->Config->nQuit){
if(access(coreDumpFileName, F_OK) != -1)
{
if(self->Config->nQuit)
{
// if we are in a quit state from interrupt delete partially generated core dump file
int ret = unlink(coreDumpFileName);
if (ret < 0 && errno != ENOENT) {
if (ret < 0 && errno != ENOENT)
{
Trace("WriteCoreDumpInternal: Failed to remove partial core dump");
exit(-1);
}
}
else{
else
{
// log that the core dump was generated successfully
Log(info, "Core dump %d generated: %s", self->Config->NumberOfDumpsCollected, coreDumpFileName);

self->Config->NumberOfDumpsCollected++; // safe to increment in crit section
if (self->Config->NumberOfDumpsCollected >= self->Config->NumberOfDumpsToCollect) {
if (self->Config->NumberOfDumpsCollected >= self->Config->NumberOfDumpsToCollect)
{
SetEvent(&self->Config->evtQuit.event); // shut it down, we're done here
rc = 1;
}
}
}
}

for(int j = 0; j < i; j++) {
for(int j = 0; j < i; j++)
{
free(outputBuffer[j]);
}
free(outputBuffer);
Expand Down
68 changes: 68 additions & 0 deletions src/GenHelpers.c
Expand Up @@ -8,6 +8,74 @@
//--------------------------------------------------------------------
#include "Includes.h"

//--------------------------------------------------------------------
//
// GetSeparatedValues -
// Returns a list of values separated by the specified separator.
//
//--------------------------------------------------------------------
int* GetSeparatedValues(char* src, char* separator, int* numValues)
{
int* ret = NULL;
int i = 0;

if(src == NULL || numValues == NULL)
{
return NULL;
}

char* dup = strdup(src); // Duplicate to avoid changing the original using strtok
if(dup == NULL)
{
return NULL;
}

char* token = strtok((char*)dup, separator);
while (token != NULL)
{
i++;
token = strtok(NULL, separator);
}

free(dup);

if(i > 0)
{
ret = malloc(i*sizeof(int));
if(ret)
{
i = 0;
dup = strdup(src);
if(dup == NULL)
{
free(ret);
ret = NULL;
return NULL;
}

token = strtok((char*)dup, separator);
while (token != NULL)
{
if(!ConvertToInt(token, &ret[i]))
{
free(dup); // release the strtok working copy before bailing out
free(ret);
ret = NULL;
return NULL;
}

i++;
token = strtok(NULL, separator);
}

free(dup);
}
}

*numValues = i;
return ret;
}
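
For comparison, the same parsing can be done in one pass. The sketch below is not part of this PR: `parse_int_list` is a hypothetical name, and it uses `strtok_r` and `strtol` in place of the project's `strtok` and `ConvertToInt`. It assumes a POSIX environment and sizes the result from the input length, so no counting pass is needed and the working copy is freed on every exit path.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper (not in the PR): parses a separator-delimited list of
// decimal integers such as "100,200". On success returns a malloc'd array and
// stores its length in *count. On any failure returns NULL with *count == 0.
int* parse_int_list(const char* src, const char* separators, int* count)
{
    assert(count != NULL);
    *count = 0;
    if (src == NULL || separators == NULL)
    {
        return NULL;
    }

    char* copy = strdup(src);               // strtok_r modifies its input
    if (copy == NULL)
    {
        return NULL;
    }

    // Every token needs at least one character and one separator between
    // neighbours, so len/2 + 1 is an upper bound on the token count.
    size_t capacity = strlen(src) / 2 + 1;
    int* values = malloc(capacity * sizeof(int));
    if (values == NULL)
    {
        free(copy);
        return NULL;
    }

    int n = 0;
    bool ok = true;
    char* state = NULL;
    for (char* tok = strtok_r(copy, separators, &state);
         tok != NULL;
         tok = strtok_r(NULL, separators, &state))
    {
        char* end = NULL;
        errno = 0;
        long v = strtol(tok, &end, 10);
        if (errno != 0 || end == tok || *end != '\0' || v < INT_MIN || v > INT_MAX)
        {
            ok = false;                     // not a clean int: reject the whole list
            break;
        }
        values[n++] = (int)v;
    }

    free(copy);
    if (!ok || n == 0)
    {
        free(values);
        return NULL;
    }

    *count = n;
    return values;
}
```

The caller owns the returned array and releases it with `free`. Rejecting the whole list on the first bad token matches the all-or-nothing behavior of `GetSeparatedValues`.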


//--------------------------------------------------------------------
//
// ConvertToInt - Helper to convert from a char* to int
Expand Down
20 changes: 13 additions & 7 deletions src/Monitor.c
Expand Up @@ -552,7 +552,7 @@ int CreateMonitorThreads(struct ProcDumpConfiguration *self)
}
}

if (self->MemoryThreshold != -1 && !tooManyTriggers)
if (self->MemoryThreshold != NULL && !tooManyTriggers)
{
if (self->nThreads < MAX_TRIGGERS)
{
Expand Down Expand Up @@ -817,25 +817,29 @@ int SetQuit(struct ProcDumpConfiguration *self, int quit)
bool ContinueMonitoring(struct ProcDumpConfiguration *self)
{
// Have we reached the dump limit?
if (self->NumberOfDumpsCollected >= self->NumberOfDumpsToCollect) {
if (self->NumberOfDumpsCollected >= self->NumberOfDumpsToCollect)
{
return false;
}

// Do we already know the process is terminated?
if (self->bTerminated) {
if (self->bTerminated)
{
return false;
}

// check if any processes are running with PGID
if(self->bProcessGroup && kill(-1 * self->ProcessGroup, 0)) {
if(self->bProcessGroup && kill(-1 * self->ProcessGroup, 0))
{
self->bTerminated = true;
return false;
}

// Let's check to make sure the process is still alive then
// note: kill([pid], 0) doesn't send a signal but does perform error checking
// therefore, if it returns 0, the process is still alive, -1 means it errored out
if (self->ProcessId != NO_PID && kill(self->ProcessId, 0)) {
if (self->ProcessId != NO_PID && kill(self->ProcessId, 0))
{
self->bTerminated = true;
Log(warn, "Target process %d is no longer alive", self->ProcessId);
return false;
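
The `kill(pid, 0)` probe described in the comment above can be isolated as a small helper. This is a sketch, not code from the PR: `process_exists` is a hypothetical name, and unlike `ContinueMonitoring` it treats `EPERM` as "alive", since POSIX reports that error when the target exists but the caller may not signal it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not in the PR): reports whether a process with the
// given PID currently exists. Signal 0 delivers nothing; the kernel only
// performs the existence and permission checks.
bool process_exists(pid_t pid)
{
    assert(pid > 0);            // 0 and negative values address process groups

    if (kill(pid, 0) == 0)
    {
        return true;            // process exists and we may signal it
    }

    // EPERM: the process exists but belongs to someone we cannot signal.
    // ESRCH (or anything else): treat the process as gone.
    return errno == EPERM;
}
```

Calling `process_exists(getpid())` returns true for the current process. Whether the `EPERM` distinction matters depends on whether the monitor runs with enough privilege to signal its target.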
Expand Down Expand Up @@ -889,8 +893,8 @@ void *CommitMonitoringThread(void *thread_args /* struct ProcDumpConfiguration*
memUsage += (proc.nswap * pageSize_kb) >> 10; // get Swap size

// Commit Trigger
if ((config->bMemoryTriggerBelowValue && (memUsage < config->MemoryThreshold)) ||
(!config->bMemoryTriggerBelowValue && (memUsage >= config->MemoryThreshold)))
if ((config->bMemoryTriggerBelowValue && (memUsage < config->MemoryThreshold[config->MemoryCurrentThreshold])) ||
(!config->bMemoryTriggerBelowValue && (memUsage >= config->MemoryThreshold[config->MemoryCurrentThreshold])))
{
Log(info, "Trigger: Commit usage:%ldMB on process ID: %d", memUsage, config->ProcessId);
rc = WriteCoreDump(writer);
Expand All @@ -899,6 +903,8 @@ void *CommitMonitoringThread(void *thread_args /* struct ProcDumpConfiguration*
SetQuit(config, 1);
}

config->MemoryCurrentThreshold++;

if ((rc = WaitForQuit(config, config->ThresholdSeconds * 1000)) != WAIT_TIMEOUT)
{
break;
Expand Down
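
The hunk above indexes `MemoryThreshold` with `MemoryCurrentThreshold` and then increments that index. Whether the surrounding code bounds the index, for example through the `-n` dump count, is not visible in this excerpt. The sketch below shows one way to keep the lookup in range. `struct MemoryTrigger`, its `count` field, and `memory_trigger_fires` are assumptions made for illustration and do not exist in the PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical mirror of the memory-trigger fields in ProcDumpConfiguration.
struct MemoryTrigger
{
    const int* thresholds;  // MemoryThreshold in the diff
    int count;              // assumed: number of entries in thresholds
    int current;            // MemoryCurrentThreshold in the diff
    bool triggerBelow;      // bMemoryTriggerBelowValue in the diff
};

// Returns true when memUsageMb crosses the active threshold, then moves on to
// the next one. Returns false once every threshold has been consumed, so the
// array is never read past its end.
bool memory_trigger_fires(struct MemoryTrigger* t, long memUsageMb)
{
    assert(t != NULL);

    if (t->thresholds == NULL || t->current >= t->count)
    {
        return false;
    }

    int limit = t->thresholds[t->current];
    bool fired = t->triggerBelow ? (memUsageMb < limit)
                                 : (memUsageMb >= limit);
    if (fired)
    {
        t->current++;       // the next dump waits for the next threshold
    }
    return fired;
}
```

With thresholds `{100, 200}` and `triggerBelow` false, a sample of 120 MB fires once, 150 MB does not fire, and 250 MB fires the second and final time.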