
GCPlog job - entry out of order. #3492

Closed
Whyeasy opened this issue Mar 16, 2021 · 9 comments · Fixed by #3501

Comments

@Whyeasy (Contributor) commented Mar 16, 2021

Describe the bug

With the 2.2 release I started using the GCPLog job to ingest GCP logs from Pub/Sub. When I enable the sink in GCP and ingestion starts, Promtail and the ingesters start throwing entry out of order warnings and errors. Some logs are ingested and show up correctly. The metric promtail_gcplog_parsing_errors_total is empty.

I tried several options with different timestamps to see if the entries could be put in order.

I tried both true and false for use_incoming_timestamp in the job specification. I also tried to overwrite the timestamp in a pipeline_stage with timestamp and receiveTimestamp, without success.
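
For reference, the pipeline attempt looked roughly like the sketch below (not my exact config; it assumes Promtail's json and timestamp stages, pulling receiveTimestamp out of the GCP log entry payload):

pipeline_stages:
  - json:
      expressions:
        # extract the receive timestamp from the Pub/Sub log entry JSON
        ts: receiveTimestamp
  - timestamp:
      # use the extracted value as the entry's timestamp
      source: ts
      format: RFC3339Nano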

I also added the cluster_name label so Loki can better distinguish the logs. (Don't know if this even helps 😄).

To Reproduce
Steps to reproduce the behavior:

  1. Using Loki and Promtail 2.2; configured the gcplog job following the available documentation.
  2. Ingesting logs from one GCP project and consuming only resource="k8s_cluster". This project contains 3 different clusters.
  3. Enable the sink.

Expected behavior
I would expect that there are no entry out of order errors/warnings and all logs are ingested.

Environment:

  • Using GKE (v1.18.12-gke.1210) in GCP.
  • Both Loki and Promtail are deployed via Helm. Running the Loki-distributed version/chart.
@kavirajk (Contributor)

Hi @Whyeasy, thanks for trying out gcplog! Love to help :)

May I know how many Promtail pods you are running? Are you running it as a DaemonSet? Can you tell me which additional labels you ingest into the log stream?

I suspect the problem may be that Promtail is not ingesting its pod label (if you are running more than one Promtail).

Why do you need the pod label? To make the log stream unique per Promtail instance.

Just to give you some context: Loki doesn't accept out-of-order entries within a single log stream (this is a known issue and will be fixed soon; there is a design doc).
In gcplog, we work around this by rewriting the timestamp to the time the log was processed by Promtail (we still keep the original timestamp for querying, though).

So in your case, if there is no pod label to distinguish the log streams, two Promtail instances may stamp the same processed time, and one may send its batch to Loki much later than the other, causing Loki to reject the later entries as out of order.

@Whyeasy (Contributor, Author) commented Mar 16, 2021

Hi @kavirajk,

Thanks for adding the feature 😄. Indeed, it's running as a DaemonSet and I don't have the pod label added to the stream. Let me try it out.

@kavirajk (Contributor)

@Whyeasy you can add the k8s pod name via relabel_configs. Something like this in your Promtail scrape_configs:

        relabel_configs:
          - action: replace
            source_labels:
              - __meta_kubernetes_pod_name
            target_label: pod

@Whyeasy (Contributor, Author) commented Mar 17, 2021

Yes, but to get these labels I need to add kubernetes_sd_configs to the job as well. Is this supported for the gcplog job?

@Whyeasy (Contributor, Author) commented Mar 17, 2021

@kavirajk No luck. This is my current config:

- job_name: gcplog
  gcplog:
    project_id: <project_id>
    subscription: <subscription_name>
    use_incoming_timestamp: false
    labels:
      job: gcplog
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - action: replace
      source_labels: 
        - __cluster_name
      target_label: cluster
      regex: '(.+)-.+-.+'
      replacement: tools-$1
    - action: replace
      source_labels: 
        - __project_id
      target_label: project
      replacement: $1
    - action: replace
      source_labels: 
        - __location
      target_label: region
      replacement: $1
    - action: replace
      source_labels:
        - __meta_kubernetes_pod_name
      target_label: pod

I tried it with and without the kubernetes_sd_configs. With it, no logs come in from the sink. If I remove it, the pod label is not added and we get the out of order error.

@kavirajk (Contributor)

@Whyeasy Sorry, my bad! Yes, you cannot use kubernetes_sd with gcplog. You need some unique ID as a label on each Promtail instance.

For example, we use:

gcplog:
  project_id: <project_id>
  subscription: <subscription_name>
  use_incoming_timestamp: false
  labels:
    job: gcplog
    promtail_instance: '${POD_NAME}'

and use --config.expand-env=true in the Promtail flags. We set POD_NAME in our kube manifests (we use jsonnet).
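
For anyone not using jsonnet, exposing POD_NAME with the Kubernetes downward API would look roughly like this (a sketch of a plain DaemonSet container spec, not our exact manifest):

containers:
  - name: promtail
    args:
      # expand ${POD_NAME} and other env vars inside the config file
      - -config.expand-env=true
      - -config.file=/etc/promtail/promtail.yaml
    env:
      # downward API: inject this pod's own name
      - name: POD_NAME
        valueFrom:
          fieldRef:
            fieldPath: metadata.name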

Given all that, we now think it's better to have this label set by gcplog itself rather than setting it manually in the Promtail config (it's also useful in non-k8s environments).

So I added that in PR #3501.

@Whyeasy (Contributor, Author) commented Mar 17, 2021

It works without errors 😄 Thanks! Shall we close this issue, or keep it open till the PR is merged?

@kavirajk (Contributor)

@Whyeasy glad it worked!

Did you set promtail_instance manually, or did you try the UUID fix?

I would prefer to keep the issue open till the PR is closed :)

@Whyeasy (Contributor, Author) commented Mar 17, 2021

@kavirajk I've set the promtail_instance manually.
