monitor: reduce overhead #2037
Conversation
This is a good idea, thanks!
monitor/monitor.go (outdated)
@@ -77,14 +77,20 @@ func (m *Monitor) Run(npages int) {
			break
		}
	}
-	if todo > 0 {
+	if todo > 0 && numListeners() > 0 {
I would revert this change. Even if you have no listeners you must keep reading from the perf buffer, otherwise you'll start losing events and incorrectly report a high lost event count when the first reader connects.
Plus, it's always interesting to collect and report the "lost event" metric even when there are no readers.
However, you could optimize the no-listener case by not decoding the monitor events, just read and discard them.
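A minimal sketch of that no-listener optimization (the receiveEvent signature and the m.send helper here are assumptions for illustration, not the actual code in this PR):

func (m *Monitor) receiveEvent(data []byte, cpu int) {
	// The sample has already been consumed from the perf ring buffer at
	// this point, so the kernel side cannot back up; we only skip the
	// decode/fan-out work when nobody is connected.
	if numListeners() == 0 {
		return
	}
	pl := payload.Payload{Data: data, CPU: cpu, Type: payload.EventSample}
	m.send(&pl) // hypothetical helper that fans the payload out to listeners
}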
You want to disconnect from the perf ring buffer to stop triggering event generation in the data path. We don't want to pay the datapath cost as a penalty if no-one is listening.
api/v1/openapi.yaml (outdated)
@@ -1058,6 +1058,9 @@ definitions:
      unknown:
        description: Number of unknown samples.
        type: integer
+      numlisteners:
How about renaming it to nlisteners for consistency with npages above, or just listeners for consistency with cpus, lost, and unknown above?
nlisteners sounds good.
monitor/monitor.go (outdated)
	log.WithError(err).Warn("Error received while reading from perf buffer")

	if numListeners() > 0 {
		if todo > 0 {
epoll is used in its default edge-triggered mode here. Therefore, if you don't call monitorEvents.ReadAll every time monitorEvents.Poll returns todo > 0, the next call to monitorEvents.Poll will not block, because there's still data to be read from the file descriptor. Therefore, this loop will spin and consume 100% of one CPU, and will also result in the perf ring buffer filling up (a sketch of the required drain loop follows after the list below).
In short, this if todo > 0 test CAN'T be conditioned on anything, e.g. numListeners() > 0.
You must revert this change.
You should either:
- Move your optimization into m.receiveEvent, where you could just ignore read events (don't decode them) if there are no listeners, as I previously suggested.
- Or close the file descriptor until numListeners() > 0, as @tgraf suggested.
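For reference, the loop shape this implies: once Poll reports readiness, ReadAll must always drain the buffer; only the per-event handling may be made cheaper. A sketch, where pollTimeout and the Poll/ReadAll signatures are assumed from the snippets in this thread:

for {
	todo, err := monitorEvents.Poll(pollTimeout)
	if err != nil {
		log.WithError(err).Error("Error in Poll")
	}
	// Edge-triggered epoll: once readiness is reported, the ring buffer
	// must be drained unconditionally, or Poll will keep returning
	// immediately and this loop will spin.
	if todo > 0 {
		if err := monitorEvents.ReadAll(m.receiveEvent, m.lostEvent); err != nil {
			log.WithError(err).Warn("Error received while reading from perf buffer")
		}
	}
}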
Please wait till I request a new review. Thanks!
monitor/monitor.go (outdated)
	eventConfig.NumPages = npages

main_event_loop:
	for ok := <-waitForListeners; !ok; {
What is the purpose of this channel?
To wait for listeners, which are added in handleConnection. Am I using it wrong?
There are a few patterns you might use that would be more canonical. The most useful would be to make a buffered channel (make(chan bool, 1000)); then the writes you have to it below wouldn't need the goroutine wrappers. Since you're using main_event_loop as a normal label, instead of one for the loop, you can replace the for ok := <-waitForListeners; !ok; with a simple for <-waitForListeners, but then only write to the channel when you would have written a true. You currently always write true in handleConnection, so that doesn't change, and you would have to use an if statement in the send case.
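A compressed sketch of that pattern (the channel name and label follow the diff above; everything else is illustrative):

// Buffered, so handleConnection can signal without a goroutine wrapper.
waitForListeners := make(chan bool, 1000)

// In handleConnection, when a listener is registered:
waitForListeners <- true

// In Run: the condition is re-evaluated every iteration, so the loop
// blocks here whenever nothing has been sent, and runs the body once
// per received true.
for <-waitForListeners {
	// poll and read the perf buffer
}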
Looks cleaner. The only other change I made was to drop the loop because it seemed to block forever. Please take one more pass in case I misunderstood you.
Or wait, it looks like I missed something.
Sort of. My goal was to avoid the busy loop on the waitForListeners channel. The thing is, we're mixing concerns a bit: we want the loop to pause until there are listeners, but the only way we have to do that is by blocking on the channel every iteration, and thus we also need to bypass this channel read :/
Would it be simpler to coordinate this a different way?
- Spawn a goroutine that runs the logic in main_event_loop when we handleConnection and we go from numListeners==0 to numListeners==1 (see the sketch after this list).
- In this main_event_loop goroutine, exit it completely when there are no more listeners after purgeInactiveListeners (or just check at the end/start of the loop).
- We can also defer closing sockets and things in this goroutine, so we know the cleanups happen at the right time.
- This version would not have the waitForListeners channel at all.
My reasoning is that it is simple to not run the perf reading parts at all if there are no listeners. I think this was the motivation for these changes, right?
If you want to stick with what you have now, then it might be easier to split the gotos and labels:
- Add a new label waiting_for_listeners. Jump to it when we have no more listeners (so, where we currently jump to main_event_loop).
- main_event_loop is moved to after the waitForListeners check. We jump to it in the case where there are listeners still, and we need to poll again and keep reading.
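A rough sketch of that goroutine-based alternative (handleConnection details, the eventLoop name, and the bpf helper signatures are assumptions based on the snippets in this thread, not the final code):

// handleConnection side: the 0 -> 1 transition starts the loop.
func (m *Monitor) handleConnection(conn net.Conn) {
	// Must be checked before the new listener is appended.
	first := listeners.Len() == 0
	listeners.PushBack(conn)
	log.WithField("count.listener", listeners.Len()).Info("New monitor connected.")
	if first {
		go m.eventLoop()
	}
}

// eventLoop exits (and cleans up) as soon as there are no listeners left.
func (m *Monitor) eventLoop() {
	me, err := bpf.NewCpuEvents(eventConfig)
	if err != nil {
		log.WithError(err).Error("Error while starting monitor")
		return
	}
	monitorEvents = me
	defer monitorEvents.CloseAll()

	for {
		purgeInactiveListeners()
		if numListeners() == 0 {
			return
		}
		todo, err := monitorEvents.Poll(pollTimeout)
		if err != nil {
			log.WithError(err).Error("Error in Poll")
		}
		if todo > 0 {
			monitorEvents.ReadAll(m.receiveEvent, m.lostEvent)
		}
	}
}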
You are right. Will do it as proposed, just need to test it a little bit locally and then I will push it out. Thanks, Ray.
cilium/cmd/monitor.go (outdated)
@@ -287,7 +287,9 @@ start:
		}
	}

	if pl.Type == payload.EventSample {
	if meta.Size == 0 {
When there are no monitors, is that what produces an API response with size 0 here?
The intention of writing an empty meta is so that we can detect a write failure when any of the known connections are gone / inactive. If we are not aware of any connections we never write anything and these lines are not reached. The connections are added to a list of listeners, but the Go API has no good way for us to know when a connection is inactive. So we need to try to write in order to detect a failure. Does this answer your question?
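A sketch of that detection logic (the listener bookkeeping is assumed here to be a container/list of net.Conn, as the other snippets in this thread suggest):

func purgeInactiveListeners() {
	// A zero-size Meta acts as a liveness probe: readers ignore it, and a
	// failed write tells us the peer has gone away.
	metaBuf := generateHealthCheckMsg()
	var next *list.Element
	for e := listeners.Front(); e != nil; e = next {
		next = e.Next()
		conn := e.Value.(net.Conn)
		if _, err := conn.Write(metaBuf); err != nil {
			conn.Close()
			listeners.Remove(e)
		}
	}
}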
The explanation makes sense, can you add it as a comment to the code? It's not obvious otherwise.
sure
monitor/monitor.go (outdated)
	log.WithError(err).Error("Error in Poll")
	if err == syscall.EBADF {
		break
	select {
t := time.NewTicker(5 * time.Second)
for range t.C {
	if monitorEvents != nil {
		m.dumpState()
	}
}
Fixed
@@ -295,21 +297,19 @@ func (e *PerfEvent) Disable() error {
	return nil
}

	C.free(e.state)
Is e.state guaranteed to never be NULL?
No. We are using malloc, so there is a chance it can return NULL. What is the proper way to handle this? Do you want to do a null check first?
I guess we'd have to check on line 293 and return an error there. To be safe, we could make this a no-op on e.state being nil too.
Will leave this as is unless @tgraf really wants it to change.
I'm taking my comment back: calling free() on NULL is perfectly fine, so we can keep this.
cilium/cmd/status.go (outdated)
	if nm := sr.NodeMonitor; nm != nil {
		fmt.Fprintf(w, "NodeMonitor:\tListening for events on %d CPUs with %dx%d of shared memory\n",
			nm.Cpus, nm.Npages, nm.Pagesize)
		/* If there are no listeners, let it appear as if the node
Convert this to // comments.
monitor/monitor.go (outdated)
// dumpStat prints out the monitor status in JSON.
func (m *Monitor) dumpStat() {
	c := int64(monitorEvents.Cpus)
	n := int64(monitorEvents.Npages)
	p := int64(monitorEvents.Pagesize)
	l, u := monitorEvents.Stats()
-	ms := models.MonitorStatus{Cpus: c, Npages: n, Pagesize: p, Lost: int64(l), Unknown: int64(u)}
+	num := int64(numListeners())
+	ms := models.MonitorStatus{Cpus: c, Npages: n, Pagesize: p, Lost: int64(l), Unknown: int64(u), Nlisteners: num}
Expand this to multiple lines; it will become easier to read, and you don't need to store num first.
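Something along these lines (field names from the snippet above; only the formatting changes):

lost, unknown := monitorEvents.Stats()
ms := models.MonitorStatus{
	Cpus:       int64(monitorEvents.Cpus),
	Npages:     int64(monitorEvents.Npages),
	Pagesize:   int64(monitorEvents.Pagesize),
	Lost:       int64(lost),
	Unknown:    int64(unknown),
	Nlisteners: int64(numListeners()),
}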
monitor/monitor.go (outdated)
	listeners.PushBack(conn)
	log.WithField("count.listener", listeners.Len()).Info("New monitor connected.")
	/* Since this is our first listener that means the event loop
Convert to //
monitor/monitor.go (outdated)
}

func purgeInactiveListeners() {
	meta := &payload.Meta{Size: uint32(0)}
Move this into a function:
func generateHealthCheckMsg() []byte {
	meta := &payload.Meta{Size: uint32(0)}
	metaBuf, err := meta.MarshalBinary()
	if err != nil {
		log.WithError(err).Fatal("meta encode")
	}
	return metaBuf
}
monitor/monitor.go (outdated)
	meta := &payload.Meta{Size: uint32(0)}
	metaBuf, err := meta.MarshalBinary()
	if err != nil {
		log.WithError(err).Fatal("meta encode")
Can we safely continue here?
We should continue; I will change it to an Info log.
Mostly small things that you can ignore, with one logic order thing that might be worth revisiting.
cilium/cmd/monitor.go (outdated)
@@ -287,7 +287,9 @@ start:
		}
	}

	if pl.Type == payload.EventSample {
	if meta.Size == 0 {
This if-else chain can be a switch statement, if you want. That would allow you to handle the last pl.Type == payload.RecordLost case explicitly and then have default report an error or whatever.
switch {
case meta.Size == 0:
	continue
case pl.Type == payload.EventSample:
	receiveEvent(pl.Data, pl.CPU)
case pl.Type == payload.RecordLost:
	lostEvent(pl.Lost, pl.CPU)
default:
	log.WithError(err).Fatal("unsupported type in meta object")
}
Looks better
monitor/monitor.go (outdated)
	c.NumPages = npages
	eventConfig.NumPages = npages
	t := time.NewTicker(5 * time.Second)
	for range t.C {
I personally prefer to create the Ticker as you have here, but if you wanted a more compact form you can do: for range time.Tick(5 * time.Second). The assumption here is that this function is never intended to return (which it can't, since we loop infinitely and have no returns or breaks), and so the Ticker providing the channel returned by Tick does not need Stop called on it (this would normally cause it to leak: https://golang.org/pkg/time/#Tick).
If you keep the explicit t := thing, then could you also add a defer t.Stop()? It isn't necessary, but if someone ever refactors this code so that it returns, we won't leak the timer.
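A sketch of the Ticker variant with that cleanup added (dumpStat here stands for the status-dumping method shown earlier in this thread):

t := time.NewTicker(5 * time.Second)
// Not strictly needed while this function never returns, but it keeps
// the ticker from leaking if the code is ever refactored to return.
defer t.Stop()
for range t.C {
	if monitorEvents != nil {
		m.dumpStat()
	}
}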
Will add a comment but otherwise fine as is.
monitor/monitor.go (outdated)
	/* Since this is our first listener that means the event loop
	 * is not running and has to be started.
	 */
	if first {
This could be `if first := listeners.Len() == 0; first {` since we don't use first otherwise.
This does not do what you think it does. We only want to start the event loop if this is really our first listener. If you move the initialization further down, after the list is being modified, the meaning of first is not the same anymore.
Oh! I see, yes. We also can't start the main loop since it would then exit because there are no listeners. You are totally right!
monitor/monitor.go (outdated)
	monitorEvents = me

	for {
		purgeInactiveListeners()
Should we do the numListeners() == 0 check (the else on line 151 below) here? This would mean we can return before the Poll if there are no listeners, and also that whenever we do call Poll we always read the data in the todo > 0 case (I think this was rlenglet's concern earlier).
The general structure of the function would then look like:
bpf.NewCpuEvents()
for {
	purgeInactiveListeners()
	if numListeners == 0 {
		CloseAll()
		return
	}
	Poll()
	if todo > 0 {
		ReadAll()
	}
}
An alternative would be to defer monitorEventsCloseAll() and the error handling for it, and if numListeners == 0 simply return from the function/break the loop to allow the function to exit.
Will change it to the suggested structure, but deferring the closing of the perf event side might lead to an invalid memory address or nil pointer dereference, so we can't do that.
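Putting the pieces together, the resulting loop could look roughly like this, with an explicit (non-deferred) close when the last listener disappears; the helper names and signatures follow the snippets above and are otherwise assumptions:

for {
	purgeInactiveListeners()
	if numListeners() == 0 {
		// Close explicitly rather than via defer, per the note above
		// about tearing down the perf event state at the wrong time.
		if err := monitorEvents.CloseAll(); err != nil {
			log.WithError(err).Error("Error closing perf event reader")
		}
		return
	}
	todo, err := monitorEvents.Poll(pollTimeout)
	if err != nil {
		log.WithError(err).Error("Error in Poll")
		if err == syscall.EBADF {
			break
		}
	}
	if todo > 0 {
		if err := monitorEvents.ReadAll(m.receiveEvent, m.lostEvent); err != nil {
			log.WithError(err).Warn("Error received while reading from perf buffer")
		}
	}
}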
monitor/monitor.go (outdated)
		log.WithError(err).Error("Error while starting monitor")
		return
	}
	monitorEvents = me
A nit, but you could assign monitorEvents directly on line 128 (I suspect this is a holdover from a complicated if statement that didn't create monitorEvents in the outer scope).
@@ -105,6 +105,11 @@ func ReadMetaPayload(r io.Reader, meta *Meta, pl *Payload) error {
		return err
	}

	// If there is no payload just abort
	if meta.Size == 0 {
		return nil
The implied behaviour here is that the payload's contents are undefined in the meta.Size == 0 case. We can zero out the payload or add a comment about this in the function comment (line 102).
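One way to make that explicit in ReadMetaPayload (a sketch; zeroing the payload is just one of the two options mentioned above):

// If there is no payload, reset pl so callers never see stale contents,
// and abort early.
if meta.Size == 0 {
	*pl = Payload{}
	return nil
}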
		}
	}
}

func generateHealthCheckMsg() []byte {
Oh, it just occurred to me that since we just keep sending this same empty meta, and only in purgeInactiveListeners, we can store the []byte and just keep reusing that. It's not a big deal, though; this isn't a big memory use or anything.
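A sketch of that reuse (not the code in this PR; purgeInactiveListeners is assumed to be the only caller, so no locking is shown):

// Built once on first use and then reused for every health-check write.
var healthCheckMsg []byte

func generateHealthCheckMsg() []byte {
	if healthCheckMsg == nil {
		meta := &payload.Meta{Size: uint32(0)}
		buf, err := meta.MarshalBinary()
		if err != nil {
			log.WithError(err).Fatal("meta encode")
		}
		healthCheckMsg = buf
	}
	return healthCheckMsg
}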
Dismissing review, since there have been changes since last time.
The `TotalAlloc` average was `824380337.5` and after this change it is `685498361.9`. The numbers were retrieved by sending a signal to the node monitor every 10 seconds while running the connection tracking tests (tests/01-ct.sh).

What is `TotalAlloc`? From the Go docs[0]:

// TotalAlloc is cumulative bytes allocated for heap objects.
//
// TotalAlloc increases as heap objects are allocated, but
// unlike Alloc and HeapAlloc, it does not decrease when
// objects are freed.
TotalAlloc uint64

[0]: https://golang.org/pkg/runtime/#MemStats

Related: #1926 (monitor: Reduce overhead)

Suggested-by: Ray Bejjani <ray@covalent.io>
Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
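For reference, a minimal way to read this counter with the standard library (not the exact instrumentation used for these measurements):

package main

import (
	"fmt"
	"runtime"
)

func main() {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	// TotalAlloc only ever grows, so comparing two snapshots gives the
	// number of bytes allocated in between.
	fmt.Println("TotalAlloc (bytes):", ms.TotalAlloc)
}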
Before, the total allocation was `778696759`; now it's `20105978.88`. These numbers are after reducing state allocations.

Related: #1926 (monitor: Reduce overhead)

Suggested-by: Ray Bejjani <ray@covalent.io>
Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
This will help with debugging the monitor and is also useful for seeing other info messages.

Closes: #1931 (agent: runtime option to show monitor logs in the agent logs)

Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
This should reduce the overhead on small instances that need all the CPU and RAM available when not actively monitoring. Reading the perf events is now managed dynamically: we enable it when a new monitor connection appears and stop as soon as there are no more connections. We now routinely check for inactive connections and remove them. Before, the write failure would only occur if we got an event while trying to write the meta or payload. Readers should check the meta size and ignore the message if the size is 0. This way we can update the listener count earlier, but the delay for the polling timeout is still there. Also update the output when listening and in status.

Related: #1926 (monitor: Reduce overhead)
Closes: #1927 (monitor: improve handling of write failure)

Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Reduce the number of allocations from the monitor.