WIP: logging behavior policies and rate-limited logging #92

syedriko · 2019-12-09T17:45:22Z

A PR to address #84

lgtm-com · 2019-12-09T18:19:32Z

This pull request introduces 1 alert when merging 81c8b75 into fd5ac47 - view on LGTM.com

new alerts:

1 for Local variable hides global variable

portante

Great start. It would be better if the periods were computed arithmetically from an absolute (monotonic) start time rather than relative to the previous period. We should also make sure that after a write system call (which can block for a very long time) we re-evaluate which period we are currently in.

portante · 2019-12-09T20:23:38Z

src/log_rate.c

+		nexit("negative sleep");
+	do {
+		ret = nanosleep(&sleep, &sleep);
+	} while (ret == -1 && errno == EINTR);


Not sure you want a loop that doesn't check the time change. Shouldn't we be checking to see if the absolute time has reached or passed the calculated absolute time for the sleep?

My reading of nanosleep(2) is that it sleeps for relative time intervals and when interrupted by a signal, it will store the remaining time to sleep in the timespec pointed to by the second argument, which we will use on the next pass through the loop as the relative time to sleep.

And I think I'm not doing the right thing for the negative sleep - if it's negative, it means we're in a new period and we should just return without sleeping. We start a new period after every sleep anyway.
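
For comparison, here is a minimal sketch of the "sleep until an absolute time" variant raised above. It is not code from this PR: `sleep_until` is a hypothetical name, and it assumes the end of the period is already available as a CLOCK_MONOTONIC timestamp. With TIMER_ABSTIME, clock_nanosleep(2) returns immediately if the deadline has already passed, which also covers the "negative sleep" case without a special branch.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Hypothetical helper: sleep until `deadline`, an absolute CLOCK_MONOTONIC time.
 * Returns 0 on success or an errno-style error number on failure. */
static int sleep_until(const struct timespec *deadline)
{
	int ret;
	do {
		/* clock_nanosleep() returns the error number directly instead of
		 * setting errno. Because the deadline is absolute, retrying after
		 * a signal needs no "remaining time" bookkeeping. */
		ret = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, deadline, NULL);
	} while (ret == EINTR);
	return ret;
}
```

A deadline that is already in the past makes the call return 0 right away, so the caller can start the next period without sleeping.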

portante · 2019-12-09T20:24:46Z

src/log_rate.c

+
+void start_new_period() {
+	bytes_written_this_period = 0;
+	clock_gettime(CLOCK_MONOTONIC, &start_of_this_period);


Isn't this going to cause the periods to drift? Instead, could we just use math to calculate the periods from some initial startup time?

Could you illustrate with a bit of code how you propose to calculate the periods?
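
One possible reading of the suggestion, as a sketch rather than anything taken from this PR: record a single start time at startup and derive the period index by integer division, so period boundaries never move regardless of how long writes or sleeps take. The names `ts_to_ns`, `period_index` and `next_period_start` are made up for illustration, and all times are assumed to be CLOCK_MONOTONIC values in nanoseconds.

```c
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Convert a timespec to a nanosecond count. */
static int64_t ts_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/* Index of the period that contains now_ns, counting from start_ns.
 * Assumes now_ns >= start_ns and period_ns > 0. */
static int64_t period_index(int64_t start_ns, int64_t now_ns, int64_t period_ns)
{
	return (now_ns - start_ns) / period_ns;
}

/* Absolute start time of the period following the one that contains now_ns. */
static int64_t next_period_start(int64_t start_ns, int64_t now_ns, int64_t period_ns)
{
	return start_ns + (period_index(start_ns, now_ns, period_ns) + 1) * period_ns;
}
```

Because every boundary is `start_ns + k * period_ns`, a late wakeup shortens the current period's remaining time instead of shifting all later periods.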

Cautionary note: clock_gettime() is relatively expensive in tight inner loops. I'm not saying this code is calling it needlessly, but keep it in mind as you fiddle with the timing calculations. A common strategy is to cache the time for re-use and only refresh it after blocking IO calls.

I think this code is pretty close to that idea. It reads the time first thing on entry, and then the usual sequence is:
write()
clock_gettime() // how long to sleep for?
nanosleep() // sleep
clock_gettime() // when does the new period start?
I'm not sure whether it's worth eliminating the last clock_gettime() and just adding the sleep time to the previous timestamp. That seems like a topic for a profiling session once we get to that point.

portante · 2019-12-09T20:29:44Z

src/log_rate.c

+	if (diff_nano < SECS_PER_PERIOD * BILLION) {
+		ssize_t bytes_we_can_write = BYTES_PER_PERIOD - bytes_written_this_period;
+		if (num_read <= bytes_we_can_write) {
+			write_io_bufs(pipe, buf_start, num_read);


Does this assume that after a write we are still in the same period?

It seems that we have to re-evaluate what period we are in following each write_to_logs() call.
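
A rough sketch of what re-evaluating the period after each write could look like. This is an illustration, not the PR's code: `struct rate_state` and the `rate_*` functions are hypothetical, and `now_ns` is assumed to be a CLOCK_MONOTONIC reading taken after the write returned. The sketch charges the bytes to the period in which the write completed, which is one possible policy among several.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rate-limiter state; all times are in nanoseconds. */
struct rate_state {
	int64_t start_ns;         /* fixed reference time taken at startup */
	int64_t period_ns;        /* length of one period, must be > 0 */
	int64_t cur_period;       /* index of the period the counter belongs to */
	size_t bytes_this_period; /* bytes charged to cur_period so far */
	size_t bytes_per_period;  /* budget per period */
};

/* Move to the period containing now_ns; reset the counter if it changed. */
static void rate_refresh(struct rate_state *rs, int64_t now_ns)
{
	int64_t idx = (now_ns - rs->start_ns) / rs->period_ns;
	if (idx != rs->cur_period) {
		rs->cur_period = idx;
		rs->bytes_this_period = 0;
	}
}

/* Bytes that may still be written in the current period. */
static size_t rate_budget(const struct rate_state *rs)
{
	return rs->bytes_per_period - rs->bytes_this_period;
}

/* Record a completed write of `written` bytes that finished at now_ns. */
static void rate_account(struct rate_state *rs, int64_t now_ns, size_t written)
{
	rate_refresh(rs, now_ns); /* the write may have blocked across a boundary */
	rs->bytes_this_period += written;
	if (rs->bytes_this_period > rs->bytes_per_period)
		rs->bytes_this_period = rs->bytes_per_period; /* keep the budget non-negative */
}
```

With this shape, a write that blocks past the end of a period simply lands in the new period's budget, and the caller never sleeps for a period that has already ended.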

giuseppe · 2019-12-09T20:34:13Z

src/log_rate.c

+			bytes_written_this_period += bytes_we_can_write;
+			buf_start += bytes_we_can_write;
+			bytes_remaining = num_read - bytes_we_can_write;
+			sleep_for_the_rest_of_this_period();


I think we shouldn't block the thread execution here.

We should either propagate the condition back (with the remaining time) and set a glibc timer, or use a much shorter period so that it is acceptable, in the worst case, to hang for that duration.

This code implements the backpressure policy; I'm still adding the others.
I'm not sure I follow your concern, or how backpressure is going to be maintained if we return from here without blocking. If we do, won't we very quickly be reading from the container's pipe again?

My first reaction to the sleep was also "yeuch" but given that conmon is written as a sync read()/write() loop there's no other way to idle but to sleep.

If conmon was written as a poll/epoll loop then you could set a timer and flip the read fd out of the loop till it fired. That would be neater and more efficient, but a bigger overhaul of conmon.

Added implementations of the drop, ignore and passthrough log policies.

portante

Good start, but seems like we need to be sure the rate limit is applied to what is being written to disk and to journald, since conmon adds metadata to it.

portante · 2019-12-12T17:38:16Z

src/log_rate.c

+		return true;
+	}
+	char* endptr;
+	errno = 0;


Why do we have to set errno to zero here? We don't before nanosleep below.

It's a shortcut, also used at conmon/src/conmon.c, line 247 in c8f7443 (errno = 0;). I'll fix it according to strtol(3).

Oh, and with nanosleep() we don't because we only look at errno if nanosleep() returns an error.
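
For reference, a sketch of the error-handling pattern that strtol(3) describes. `parse_positive_long` is a hypothetical helper, not the function in this PR. The point is that strtol() can legitimately return 0, LONG_MAX or LONG_MIN, so the return value alone cannot signal failure; errno has to be cleared before the call and inspected afterwards.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse a strictly positive decimal number.
 * Rejects empty input, trailing text, out-of-range values and values <= 0. */
static bool parse_positive_long(const char *s, long *out)
{
	char *endptr;

	errno = 0; /* must be cleared: strtol() only sets errno on failure */
	long val = strtol(s, &endptr, 10);

	if (errno != 0)
		return false; /* e.g. ERANGE on overflow */
	if (endptr == s || *endptr != '\0')
		return false; /* no digits at all, or extra text after the number */
	if (val <= 0)
		return false;

	*out = val;
	return true;
}
```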

portante · 2019-12-12T17:40:22Z

src/log_rate.c

+	case 'G':
+		scale = (size_t)1024 * 1024 * 1024;
+		break;
+	case 'T':


Do we really want to allow somebody to specify a terabyte-level logging rate? Eek.

No need for passthrough default if we just make the default rate-limit essentially infinite.

Ideally, there would be a sysconfig value for the maximum allowable logging rate for an individual container. That way, nobody could cause problems for the SRE.

Is that possible to do?

For the rate limit suffixes, I was going by what pv(1) does. We can call it future-proofing. Or I can get rid of T.

As far as a config file goes, conmon receives all its config on the command line. I think this is something that needs to be looked at at the podman and cri-o level, which I don't have a good grasp of yet.
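
A sketch of how the suffix scaling and a ceiling could fit together. This is illustrative only: `suffix_scale` and `scale_rate` are hypothetical names, the K and M cases are assumed to mirror the G and T cases shown in the diff, and `max_rate` stands in for whatever maximum the caller (for example podman or cri-o) might pass down. No such option exists in this PR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Map a suffix character to a multiplier. Returns false for an unknown suffix. */
static bool suffix_scale(char suffix, uint64_t *scale)
{
	switch (suffix) {
	case '\0': *scale = 1ULL; return true;
	case 'K': *scale = 1024ULL; return true;
	case 'M': *scale = 1024ULL * 1024; return true;
	case 'G': *scale = 1024ULL * 1024 * 1024; return true;
	case 'T': *scale = 1024ULL * 1024 * 1024 * 1024; return true;
	default: return false;
	}
}

/* Apply the suffix to `value`, refusing zero, overflow, and anything
 * above max_rate (a hypothetical administrator-defined ceiling). */
static bool scale_rate(uint64_t value, char suffix, uint64_t max_rate, uint64_t *out)
{
	uint64_t scale;

	if (!suffix_scale(suffix, &scale))
		return false;
	/* Dividing instead of multiplying avoids overflow in the check itself. */
	if (value == 0 || value > max_rate / scale)
		return false;

	*out = value * scale;
	return true;
}
```

With a ceiling in place, keeping the T suffix costs nothing: an oversized rate is rejected by the limit rather than by the parser.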

portante · 2019-12-12T17:47:21Z

src/log_rate.c

+		}
+		break;
+	case IGNORE:
+		return true;


It would be great if there were metrics collected for how many bytes were ignored.

We can collect the metrics, but what should we do with them?

To be returned up the stack, they'd have to be collected and written to a file here, and read by the caller. Alternatively, a pipe fd could be sent from the parent, similar to how sync_fd is in main. This seems like a reasonable extension, but a bit out of scope here.
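
As a sketch of the pipe-based option: the counters could live in a small struct and be written to an fd handed over by the parent. Everything here is hypothetical, including `struct log_metrics`, the function names, the `metrics_fd` parameter and the line format; none of it is part of this PR or of conmon's interface.

```c
#define _POSIX_C_SOURCE 200809L
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-container counters. */
struct log_metrics {
	uint64_t bytes_written;
	uint64_t bytes_ignored;
};

/* Charge nbytes to the written or ignored counter. */
static void metrics_count(struct log_metrics *m, size_t nbytes, int ignored)
{
	if (ignored)
		m->bytes_ignored += nbytes;
	else
		m->bytes_written += nbytes;
}

/* Write one text line with the current counters to metrics_fd.
 * Returns the number of bytes written, or a negative value on error. */
static int metrics_report(int metrics_fd, const struct log_metrics *m)
{
	return dprintf(metrics_fd,
	               "bytes_written=%" PRIu64 " bytes_ignored=%" PRIu64 "\n",
	               m->bytes_written, m->bytes_ignored);
}
```

The IGNORE branch would call `metrics_count(&m, num_read, 1)` before returning, and the report could be emitted periodically or at exit.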

portante · 2019-12-12T22:10:50Z

src/log_rate.c

+	start_new_period();
+}
+
+bool log_rate_write_to_logs(stdpipe_t pipe, char *buf, ssize_t num_read) {


Perhaps it'd be worth a comment here stating that this function's call signature must be kept the same as write_to_logs()?

Also, it is not clear this is the interface we want to rate limit. The SRE needs to be able to control the final behavior of conmon, after it has added all the metadata to the logs. So it would seem we need to add the rate limit to the file writes to disk, or the journald writes.

Yep, that's true. Stay tuned...

Added a check for extra text after the suffix.

syedriko · 2020-05-04T14:59:23Z

The work on making log collection more reliable is ongoing, but this branch as it stands is unlikely to be directly usable.

First stab at rated-limited logging

fb4ef6a

This was referenced Dec 9, 2019

Provide logging behavior policies applied by conmon to stdout/stderr #84

Open

WIP: Added support for log policy and log rate limit in conmon containers/podman#4663

Closed

added log_rate to meson build

81c8b75

fixed a build warning

b4a7d17

portante suggested changes Dec 9, 2019

giuseppe reviewed Dec 9, 2019

Added the --log-policy and --log-rate-limit CLI options

fec2795

Added implementations of the drop, ignore and passthrough log policies.

syedriko force-pushed the issue_84 branch from 8834f75 to fec2795 Compare December 10, 2019 03:17

syedriko added 2 commits December 10, 2019 12:50

corrected a call to strtol(3)

755d9a2

enforce positive log-rate

18add6e

portante suggested changes Dec 12, 2019

syedriko added 2 commits December 12, 2019 23:34

In rate limit parsing, corrected strtol() error handling.

82705c6

Added a check for extra text after the suffix.

Corrected the style of opening brackets in function definitions

56b2f8d

syedriko closed this May 4, 2020

syedriko changed the title ~~WIP: logging behavior policies and rated-limited logging~~ WIP: logging behavior policies and rate-limited logging Jan 13, 2021


syedriko commented Dec 9, 2019

lgtm-com bot commented Dec 9, 2019

portante left a comment

portante Dec 9, 2019

syedriko Dec 9, 2019

portante Dec 9, 2019

syedriko Dec 9, 2019

alanconway Dec 13, 2019

syedriko Dec 13, 2019 (edited)

portante Dec 9, 2019

giuseppe Dec 9, 2019

syedriko Dec 9, 2019

alanconway Dec 13, 2019

portante left a comment

portante Dec 12, 2019

syedriko Dec 13, 2019

syedriko Dec 13, 2019

portante Dec 12, 2019

portante Dec 12, 2019

syedriko Dec 13, 2019

portante Dec 12, 2019

syedriko Dec 13, 2019

haircommander Dec 13, 2019

portante Dec 12, 2019

portante Dec 12, 2019

syedriko Dec 13, 2019

syedriko commented May 4, 2020
