
WIP: Add publisher confirms to ingress #543

Closed
wants to merge 3 commits

Conversation

Contributor

@ikavgo ikavgo commented Dec 8, 2021

Changes

/kind

Fixes #334

Release Note


Docs


@knative-prow-robot

@ikvmw: The label(s) kind/<kind> cannot be applied, because the repository doesn't have them.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@knative-prow-robot knative-prow-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 8, 2021
@knative-prow-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ikvmw
To complete the pull request process, please assign nak3 after the PR has been reviewed.
You can assign the PR to them by writing /assign @nak3 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Dec 8, 2021

codecov bot commented Dec 8, 2021

Codecov Report

Merging #543 (4d67b20) into main (23a3926) will not change coverage.
The diff coverage is n/a.

❗ Current head 4d67b20 differs from pull request most recent head a9ef4ed. Consider uploading reports for the commit a9ef4ed to get more accurate results
Impacted file tree graph

@@           Coverage Diff           @@
##             main     #543   +/-   ##
=======================================
  Coverage   75.78%   75.78%           
=======================================
  Files          47       47           
  Lines        2783     2783           
=======================================
  Hits         2109     2109           
  Misses        549      549           
  Partials      125      125           

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 23a3926...a9ef4ed. Read the comment docs.

@knative-prow-robot

@ikvmw: PR needs rebase.


@knative-prow-robot knative-prow-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 8, 2021
Contributor

@benmoss benmoss left a comment

I think we need to step back and improve the way the amqp library handles confirms rather than working around it like this. It's pretty hard to reason about this solution with all the levels of concurrency that we're working with here.

I'm thinking that if channel.Publish returned a promise-type object, we would be able to push a lot of this complexity into the library and hide it from users.

@@ -1,27 +1,29 @@
/*
Copyright 2020 The Knative Authors
Copyright 2020 The Knative Authors
Contributor

Let's remove these whitespace changes

channels []*Channel
logger *zap.SugaredLogger

rc uint64
Contributor

These variable names could be a little more clear

Comment on lines +167 to +169
crc := atomic.AddUint64(&env.rc, 1)

channel := env.channels[crc%uint64(env.cc)]
Contributor

We should use a Go channel and have all these Channels select on it rather than trying to do this load balancing logic

}

func (channel *Channel) confirmsHandler() func() error {
return func() error {
Contributor

I don't think you need to make this an anonymous function; just make this the signature of confirmsHandler.
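A sketch of the suggested refactor, with stub types standing in for the PR's Channel and amqp.Confirmation; the method carries the `error` signature directly instead of returning a closure:

```go
package main

import "fmt"

// Minimal stand-ins for the PR's types (hypothetical stubs).
type Confirmation struct{ DeliveryTag uint64 }

type Channel struct {
	pconfirms chan Confirmation
	handled   int
}

func (c *Channel) handleConfirm(Confirmation) { c.handled++ }

// confirmsHandler with the plain signature suggested in the review: no
// wrapping closure, the method itself returns error.
func (c *Channel) confirmsHandler() error {
	for confirm := range c.pconfirms {
		c.handleConfirm(confirm)
	}
	return nil
}

func main() {
	c := &Channel{pconfirms: make(chan Confirmation, 2)}
	c.pconfirms <- Confirmation{DeliveryTag: 1}
	c.pconfirms <- Confirmation{DeliveryTag: 2}
	close(c.pconfirms)
	_ = c.confirmsHandler()
	fmt.Println("handled:", c.handled)
}
```

Callers that previously invoked `channel.confirmsHandler()()` would pass `channel.confirmsHandler` as the function value instead.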


if !present {
// really die?
log.Fatalf("Received confirm for unknown send. DeliveryTag: %d", confirm.DeliveryTag)
Contributor

I would Warn but not Fatal here; this shouldn't be possible unless there's a bug in RabbitMQ.

Contributor

@embano1 embano1 left a comment

Overall I found the changes hard to review for correctness due to the concurrency and non-transactional locking at play. I also don't see how this code is more performant than the sync.Pool take/return Channel approach proposed by @benmoss.

Can you please explain why this complexity is justified? Perhaps I'm not understanding the optimizations at play.
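For reference, the sync.Pool take/return approach could look roughly like this; Channel and newChannelPool are illustrative stubs, not the PR's code:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Channel is a hypothetical stand-in for a wrapped AMQP channel.
type Channel struct{ id int64 }

// newChannelPool builds a pool of channels; a real New would open a fresh
// AMQP channel on the shared connection instead of tagging a stub.
func newChannelPool() *sync.Pool {
	var next int64
	return &sync.Pool{
		New: func() interface{} {
			return &Channel{id: atomic.AddInt64(&next, 1)}
		},
	}
}

func main() {
	pool := newChannelPool()

	// Take a channel, publish, return it: no shared counter, no
	// mutex-guarded slice, and idle channels are reused across requests.
	ch := pool.Get().(*Channel)
	fmt.Println("publishing on channel", ch.id)
	pool.Put(ch)
}
```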

@@ -36,34 +38,44 @@ import (
const (
defaultMaxIdleConnections = 1000
defaultMaxIdleConnectionsPerHost = 1000
defaultMaxAmqpChannels = 10
)

type envConfig struct {
Contributor

nit: can we create a separate ingress server object and use envConfig only for what it actually does? It's a bit confusing in the code and method signatures
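A sketch of the suggested split, using hypothetical names (ingressServer, newIngressServer): envConfig stays a plain environment-variable struct, and runtime state moves to a separate server object that methods hang off of:

```go
package main

import "fmt"

// envConfig keeps only environment-derived settings, as in the PR.
type envConfig struct {
	BrokerURL       string `envconfig:"BROKER_URL" required:"true"`
	MaxAmqpChannels int    `envconfig:"MAX_AMQP_CHANNELS" default:"10"`
}

// ingressServer is a hypothetical separate object holding runtime state,
// so method receivers stop living on the config struct.
type ingressServer struct {
	cfg envConfig
	// conn *amqp.Connection, channels []*Channel, logger, etc. would live here
}

func newIngressServer(cfg envConfig) *ingressServer {
	return &ingressServer{cfg: cfg}
}

func main() {
	s := newIngressServer(envConfig{BrokerURL: "amqp://localhost", MaxAmqpChannels: 10})
	fmt.Println(s.cfg.MaxAmqpChannels)
}
```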

}

func main() {
var env envConfig

var err error
Contributor

If envConfig is separated from the server implementation, this line is not needed anymore (and addresses err shadowing in line 62)

if err != nil {
log.Fatalf("failed to connect to RabbitMQ: %s", err)
}
defer conn.Close()
defer env.conn.Close()
Contributor

I'd at least log the error here or use the run pattern where main only does initialization logic, e.g. env vars, and then calls a blocking run(context.Context) error on the ingress server. defer can then be handled inside run and any error easily returned to main if needed.


env.openAmqpChannels()

defer env.closeAmqpChannels()

env.logger = logging.FromContext(context.Background())
Contributor

we could use https://pkg.go.dev/knative.dev/pkg/signals to handle SIGTERM etc

if err := envconfig.Process("", &env); err != nil {
log.Fatal("Failed to process env var", zap.Error(err))
}

conn, err := amqp.Dial(env.BrokerURL)
env.conn, err = amqp.Dial(env.BrokerURL)

if err != nil {
log.Fatalf("failed to connect to RabbitMQ: %s", err)
Contributor

I'd use the pkg logger (move its construction up so you can use it) and thus allow for structured logging (see also other places where we still use log.)

log.Fatalf("failed to open a channel: %s", err)
}

channel := &Channel{channel: a_channel, counter: 1, pconfirms_chan: make(chan amqp.Confirmation)}
Contributor

We use an unbuffered channel for confirmations here. The docs for the confirmation channel in NotifyPublish state that it should be buffered to accommodate multiple in-flight publishes, otherwise risking deadlocks. Could we run into that scenario with an unbuffered channel?
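The difference can be sketched with a stand-in Confirmation type: a buffer sized to the in-flight window lets the library side hand over every confirm without blocking, even before a reader is scheduled; with an unbuffered channel the first send would block until someone receives:

```go
package main

import "fmt"

// Confirmation is a stand-in for amqp.Confirmation (hypothetical stub).
type Confirmation struct{ DeliveryTag uint64 }

// notifyPublish imitates the library delivering confirms. With an
// unbuffered channel these sends would block until a reader runs; a buffer
// sized to the in-flight window keeps the sender from blocking.
func notifyPublish(confirms chan Confirmation, inFlight int) {
	for i := 1; i <= inFlight; i++ {
		confirms <- Confirmation{DeliveryTag: uint64(i)}
	}
	close(confirms)
}

func main() {
	const maxInFlight = 8
	confirms := make(chan Confirmation, maxInFlight) // buffered
	notifyPublish(confirms, maxInFlight)             // completes with no concurrent reader
	for c := range confirms {
		fmt.Println("confirmed tag", c.DeliveryTag)
	}
}
```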

for confirm := range channel.pconfirms_chan {
channel.handleConfirm(confirm)
}
return nil
Contributor

A debug log line couldn't hurt here, to make it clear that the chan was closed, e.g. because the underlying RMQ channel was closed.

}

func (env *envConfig) sendMessage(headers amqp.Table, bytes []byte) (chan amqp.Confirmation, error) {
crc := atomic.AddUint64(&env.rc, 1)
Contributor

Looks like this means the first message bumps the counter to 1, so env.channels[0] is only reached when the counter hits a multiple of env.cc? If so, you might change your slice indexing in line 175 accordingly.
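If the intent is an even rotation starting at channel 0, one possible fix is to subtract one before taking the modulus (channelIndex is an illustrative helper, assuming the counter starts at 1 as atomic.AddUint64 returns the incremented value):

```go
package main

import "fmt"

// channelIndex maps a 1-based message counter onto 0-based channel slots:
// (crc-1)%n makes the very first message land on channel 0 and keeps the
// rotation even across all n channels.
func channelIndex(crc, n uint64) uint64 {
	return (crc - 1) % n
}

func main() {
	// For n=3 channels, messages 1..6 land on channels 0,1,2,0,1,2.
	for crc := uint64(1); crc <= 6; crc++ {
		fmt.Println("message", crc, "-> channel", channelIndex(crc, 3))
	}
}
```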

Body: bytes,
}); err != nil {

ch, err := env.sendMessage(headers, bytes)
Contributor

nit: confirmCh

}

func (channel *Channel) publish(exchange string, headers amqp.Table, bytes []byte) (chan amqp.Confirmation, error) {
channel.mutex.Lock()
Contributor

This basically means that every message published will lock the particular channel, i.e. serialize the logic and constrain parallelism to the number of defined channels, correct?

Development

Successfully merging this pull request may close these issues.

Ensure Broker Ingress uses Publisher-Confirm.
4 participants