-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ECI-466] Metric Forwarding In Go #40
base: master
Are you sure you want to change the base?
Conversation
COPY . . | ||
|
||
# Enable Go modules & build with optimizations | ||
RUN go mod tidy && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to support X86 as well?
I can't remember if OCI in Japan supports only ARM or only X86?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Japan only supports x86 but I think we said with Kanishk's stack improvements we won't need to worry about it later on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well... we don't need to worry about it if we build for X86 and have multi arch images.
So I would say here we do need to build it as both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the image to build for both platforms. Also updated the PR description to have instructions on how to build such an image, and the testing I have done to ensure the same image works on function application of both architectures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, the multi-arch image build is managed outside the docker file using buildx
so that will be a separate step
metrics-forwarder/forwarder.go
Outdated
} | ||
|
||
func sendMetricsToDatadog(metricsMessage []byte, client HTTPClient) error { | ||
endpoint := os.Getenv("DD_INTAKE_HOST") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that this was the way in the Python version as well, so we can go with this for now...
Should we consider the DD API key a secret?
From our perspective maybe we don't care (i.e. we charge more if someone sends more stuff to us) but from the customer's standpoint?
Anyway not a blocker for this PR by any means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think DD API Key should be considered a secret, but I don't fully follow what changes are you recommending based on that. Are you suggesting that this should not be a configuration for function application, but sourced from somewhere else (OCI vault etc.) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is what I am saying, but of course it is not for this PR and maybe not even something we can/should do at all.
I am not recommending any changes today, but I think we should think about it as a team for the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I definitely agree that this may require some discussion, as well as some experimentation. In my opinion, it is certainly a good idea to add API-KEY and APP-KEY to the customer's vault during initial stack execution, and replicate it in all regions at that time. We can then use it everywhere we need this.
I think this will be really nice for longer term. We do have to think about how to handle key rotations on expiration.
metrics-forwarder/forwarder.go
Outdated
if shouldCompressPayload() { | ||
compressedPayload, err := compressPayload(metricsMessage) | ||
if err != nil { | ||
log.Printf("Error compressing payload: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also not a blocker for this version, but ultimately I think we should report this kind of error to us via the EvP endpoint, so that we can monitor when errors occur in the fleet and fix them proactively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes complete sense, but I think that should be a separate PR with focus on "relaying information from forwarders to Datadog", so that we can think about holistically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall lgtm just had some nitpicks
|
||
func CreateDatadogClient(site, apiKey string) DatadogClient { | ||
configuration := datadog.NewConfiguration() | ||
configuration.RetryConfiguration.EnableRetry = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to set max Retries and the retry timeout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default setting for these values are as follows (Link):
RetryConfiguration: RetryConfiguration{
EnableRetry: false,
BackOffMultiplier: 2,
BackOffBase: 2,
HTTPRetryTimeout: 60 * time.Second,
MaxRetries: 3,
}
If we want to change them, we can override, but they looked fine to me as is.
return err | ||
} | ||
if resp.StatusCode < 200 || resp.StatusCode >= 300 { | ||
log.Printf("Error: Received non-200 response from Datadog: %d", resp.StatusCode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we use log
or fmt
here considering we're using fmt
to print the logs in other places. Same in handler.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great question. I experimented with both. Here is how these different log packages are treated in OCI.
- Log package : These are written to STDERR by default. You will see
"src" : "STDERR"
in the log json. - fmt package : These are written to STDOUT by default. You will see
"src" : "STDOUT"
in the log json.
If a log is an error, I used the log package, else I used the fmt package.
"metrics-forwarder/internal/formatter" | ||
) | ||
|
||
var sendMetricsFunc = client.SendMetricsToDatadog |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can call datadog.NewAPIClient
since this client is what is used for connection and as var
can be re-used if the container is not destroyed.
Internally we can also decide the size of connection pool with this which is again created once
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I fully understand this comment. I am happy to chat more about this.
if status == "success" { | ||
fdk.WriteStatus(out, 200) | ||
} else { | ||
fdk.WriteStatus(out, 500) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
500 status may not be accurate for scenarios like:
- Invalid API key when getting a 403 response from the endpoint
- missing API key, site or tenancy OCID (more closer to a
400
error)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I am considering the response from function invocations as a pass or fail only. The status code from Datadog itself will be printed in OCI logs.
This particular status code becomes relevant if the function itself is placed behind an API Gateway, and then invoked through an API call. Now, for the client of the function itself, if an internal API call is failing, should we mask that or relay that information as is, could be debated.
Let me know if you want to have a discussion about this.
// Returns: | ||
// - error: An error if the request preparation or API call fails, or if the | ||
// response status code indicates a failure. | ||
func SendMetricsToDatadog(client DatadogClient, metricsMessage []byte) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be a method of DatadogClient
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I fully understand the intent behind making this a function on struct, instead of passing the struct. This way, it helps me test the send function by creating a mock client.
What
-Write Metrics Forwarder in Go
Steps to build multi-architecture image
docker buildx create --use
docker buildx build --platform linux/amd64,linux/arm64 -t iad.ocir.io/iduhc9hzgn3o/metrics-forwarder-function/metrics-forwarder:0.0.3 . --no-cache --push
Why
Testing