
Error logs when using experiments: "Error publishing event to queue" #472

Closed
SoerenSilkjaer opened this issue Dec 9, 2020 · 7 comments
SoerenSilkjaer commented Dec 9, 2020

Since we started using experiments in our izanami cluster, we started getting spammed by the logs attached below.

We are running izanami 1.8.2 in a cluster of two izanami servers, with Redis as both data store and event store. We run only one experiment at a time, always with two variants, and we use the Scala client.

I believe this occurs either when we mark variants as displayed or when we mark variants as won. Maybe both.

It comes in waves on a daily basis and sometimes lasts for hours.

In the last month, our izanami cluster got ~10M requests; ~235K of those requests logged that error. I don't know how many of the ~10M requests were to the mark variant displayed and mark variant won endpoints.

Should I consider all these events as not tracked in our experiments statistics?

[ERROR] from izanami in application-akka.actor.default-dispatcher-14 - Error publishing event to queue
java.lang.IllegalStateException: You have to wait for the previous offer to be resolved to send another request
at akka.stream.impl.QueueSource$$anon$1.bufferElem(QueueSource.scala:108)
at akka.stream.impl.QueueSource$$anon$1.$anonfun$callback$1(QueueSource.scala:121)
at akka.stream.impl.QueueSource$$anon$1.$anonfun$callback$1$adapted(QueueSource.scala:115)
at akka.stream.impl.fusing.GraphInterpreter.runAsyncInput(GraphInterpreter.scala:466)
at akka.stream.impl.fusing.GraphInterpreterShell$AsyncInput.execute(ActorGraphInterpreter.scala:497)
at akka.stream.impl.fusing.GraphInterpreterShell.processEvent(ActorGraphInterpreter.scala:599)
at akka.stream.impl.fusing.ActorGraphInterpreter.akka$stream$impl$fusing$ActorGraphInterpreter$$processEvent(ActorGraphInterpreter.scala:768)
at akka.stream.impl.fusing.ActorGraphInterpreter$$anonfun$receive$1.applyOrElse(ActorGraphInterpreter.scala:783)
at akka.actor.Actor.aroundReceive(Actor.scala:534)
at akka.actor.Actor.aroundReceive$(Actor.scala:532)
at akka.stream.impl.fusing.ActorGraphInterpreter.aroundReceive(ActorGraphInterpreter.scala:690)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:573)
at akka.actor.ActorCell.invoke(ActorCell.scala:543)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:269)
at akka.dispatch.Mailbox.run(Mailbox.scala:230)
at akka.dispatch.Mailbox.exec(Mailbox.scala:242)
at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source)
at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
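For context, this exception comes from an Akka Streams `Source.queue` running with a backpressure overflow strategy: once the buffer is full, an `offer` stays unresolved until a consumer pulls, and submitting another `offer` in the meantime is rejected. A minimal, self-contained Scala model of that contract (a hypothetical class for illustration, not the real Akka API):

```scala
// Simplified model of the contract behind Akka's Source.queue with
// OverflowStrategy.backpressure (hypothetical class, NOT the real API).
// Once the buffer is full, an offer is deferred; a second offer arriving
// before the deferred one resolves is rejected, which is what surfaces
// as the IllegalStateException logged above.
final class BackpressureQueueModel[A](bufferSize: Int) {
  private val buffer = scala.collection.mutable.Queue.empty[A]
  private var offerPending = false // a deferred offer is awaiting a consumer

  def offer(a: A): Either[String, Unit] =
    if (offerPending)
      Left("You have to wait for the previous offer to be resolved")
    else if (buffer.size < bufferSize) {
      buffer.enqueue(a)
      Right(())
    } else {
      offerPending = true
      Left("buffer full: offer deferred until a client consumes")
    }

  // A consumer (an SSE / WebHook client) pulling an element drains the
  // buffer and clears the deferred-offer flag.
  def pull(): Option[A] = {
    offerPending = false
    if (buffer.nonEmpty) Some(buffer.dequeue()) else None
  }
}
```

With a buffer of 2, a third `offer` is deferred and a fourth hits the "previous offer" error, mirroring the log above.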
@pierrebruninmaif pierrebruninmaif self-assigned this Dec 11, 2020
@SoerenSilkjaer
Author

#286

@pierrebruninmaif
Collaborator

pierrebruninmaif commented Dec 15, 2020

The problem is located between the EventStore and the notification layer (SSE, WebHook):

  • won or displayed is called
  • The server stores the event
    • in the DataStore (for metrics)
    • in the EventStore, to notify clients (via SSE, WebHook).

Once the event is stored in the EventStore, it is consumed in order to be sent to clients.
To manage the pressure on clients, a backpressure strategy is used.
This has a buffer, which can become full if the flow of events stored in the EventStore is large.

To limit the problem, the size of the buffer will be configurable in the next release (I hope this week).

To summarize:

  • The events are correctly stored in the DataStore and the EventStore, so there is no loss of data
  • When the buffer is full, the Scala client is not notified of all events, so there is a loss of info
  • The problem can be worked around by increasing the buffer size

The ultimate solution would be to use Redis streams (#483).
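One producer-side way to avoid overlapping offers (a hypothetical sketch, not Izanami's actual code) is to serialize them: chain each publish onto the previous offer's Future, so the queue only ever sees one offer in flight at a time:

```scala
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical sketch (not Izanami's actual code): a publisher that never
// overlaps offers. Each publish is chained onto the previous offer's
// Future, so at most one offer is in flight at any time.
final class SerializedPublisher[A](offer: A => Future[Unit])(implicit ec: ExecutionContext) {
  private var last: Future[Unit] = Future.unit

  def publish(a: A): Future[Unit] = synchronized {
    // Note: if one offer fails, later publishes inherit the failure;
    // real code should recover the chain.
    last = last.flatMap(_ => offer(a))
    last
  }
}
```

The trade-off is that a slow consumer now delays the producer instead of triggering the "previous offer" error.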

@SoerenSilkjaer
Author

Thanks @pierrebruninmaif that will be very useful in fixing this issue.

Can you take a guess as to where the bottleneck is that causes the buffer to not empty in time?

@pierrebruninmaif
Collaborator

The buffer fills up when:

  • a lot of events are stored,
    and / or
  • clients are too slow.

@pierrebruninmaif
Collaborator

There is another case: when no client is subscribed at all.

@SoerenSilkjaer
Author

That might be the case for us.

pierrebruninmaif added a commit that referenced this issue Dec 17, 2020
🐛 Add timezone in configuration (#468) and add back-pressure buffer in configuration (#472)
@pierrebruninmaif
Collaborator

If you reproduce the issue with 1.8.3, you can progressively increase the buffer with the environment variable REDIS_BACKPRESSURE_BUFFER_SIZE.
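For example (the variable name comes from the comment above; the value 5000 is purely illustrative, not a recommended default):

```shell
# Raise the back-pressure buffer before starting Izanami; start from your
# current default and increase progressively while watching for the error.
export REDIS_BACKPRESSURE_BUFFER_SIZE=5000
```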
