Distributor - Number of ingesters #1488

Closed

Serrvosky opened this issue Jul 2, 2019 · 2 comments

@Serrvosky

Hello everyone,

Can anyone tell me how many ingesters are necessary?

This is my distributor deployment file:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: distributor
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: distributor
    spec:
      containers:
      - name: distributor
        image: quay.io/cortexproject/cortex:master-6d684f65
        imagePullPolicy: IfNotPresent
        args:
        - -target=distributor
        - -log.level=debug
        - -server.http-listen-port=80
        - -consul.hostname=consul.default.svc.cluster.local:8500
        - -distributor.replication-factor=2
        ports:
        - containerPort: 80

and this is my ingester deployment file:

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ingester
spec:
  replicas: 5

  # Ingesters are not ready for at least 1 min
  # after creation.  This has to be in sync with
  # the ring timeout value, as this will stop a
  # stampede of new ingesters if we should lose
  # some.
  minReadySeconds: 60

  # Having maxSurge 0 and maxUnavailable 1 means
  # the deployment will update one ingester at a time
  # as it will have to stop one (making one unavailable)
  # before it can start one (surge of zero)
  strategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1

  template:
    metadata:
      labels:
        name: ingester
    spec:
      # Give ingesters 40 minutes grace to flush chunks and exit cleanly.
      # Service is available during this time, as long as we don't stop
      # too many ingesters at once.
      terminationGracePeriodSeconds: 2400

      containers:
      - name: ingester
        image: quay.io/cortexproject/cortex:master-6d684f65
        imagePullPolicy: IfNotPresent
        args:
        - -target=ingester
        - -ingester.join-after=30s
        - -ingester.claim-on-rollout=true
        - -consul.hostname=consul.default.svc.cluster.local:8500
        - -s3.url=s3://abc:123@s3.default.svc.cluster.local:4569
        - -dynamodb.original-table-name=cortex
        - -dynamodb.url=dynamodb://user:pass@dynamodb.default.svc.cluster.local:8000
        - -dynamodb.periodic-table.prefix=cortex_weekly_
        - -dynamodb.periodic-table.from=2019-06-01
        - -dynamodb.daily-buckets-from=2019-06-01
        - -dynamodb.base64-buckets-from=2019-06-01
        - -dynamodb.v4-schema-from=2019-06-01
        - -dynamodb.v5-schema-from=2019-06-01
        - -dynamodb.v6-schema-from=2019-06-01
        - -dynamodb.chunk-table.from=2019-06-01
        - -memcached.hostname=memcached.default.svc.cluster.local
        - -memcached.timeout=100ms
        - -memcached.service=memcached
        ports:
        - containerPort: 80
        #readinessProbe:
        #  httpGet:
        #    path: /ready
        #    port: 80
        #  initialDelaySeconds: 15
        #  timeoutSeconds: 1

As you can see, I spin up 5 ingesters first, then I wait some time for everything to come up (pods, registration in Consul, health checks, etc.), and then I deploy the distributors.

However, I'm getting a lot of logs like this:

level=warn ts=2019-07-02T10:07:48.719546323Z caller=logging.go:49 traceID=5909d18c1b0598b0 msg="POST /api/prom/push (500) 415.597µs Response: \"at least 4 live ingesters required, could only find 2\\n\" ws: false; Connection: close; Content-Encoding: snappy; Content-Length: 4990; Content-Type: application/x-protobuf; User-Agent: Go-http-client/1.1; X-Prometheus-Remote-Write-Version: 0.1.0; X-Scope-Orgid: 0; "
level=warn ts=2019-07-02T10:07:48.720214262Z caller=logging.go:49 traceID=42cb99b7e4d6403 msg="POST /api/prom/push (500) 404.171µs Response: \"at least 4 live ingesters required, could only find 2\\n\" ws: false; Connection: close; Content-Encoding: snappy; Content-Length: 4306; Content-Type: application/x-protobuf; User-Agent: Go-http-client/1.1; X-Prometheus-Remote-Write-Version: 0.1.0; X-Scope-Orgid: 0; "
level=warn ts=2019-07-02T10:07:48.720461758Z caller=logging.go:49 traceID=2ba6ed33a0136724 msg="POST /api/prom/push (500) 1.888748ms Response: \"at least 3 live ingesters required, could only find 2\\n\" ws: false; Connection: close; Content-Encoding: snappy; Content-Length: 5861; Content-Type: application/x-protobuf; User-Agent: Go-http-client/1.1; X-Prometheus-Remote-Write-Version: 0.1.0; X-Scope-Orgid: 0; "
level=warn ts=2019-07-02T10:07:48.724230653Z caller=logging.go:49 traceID=5abb13d5575ad94b msg="POST /api/prom/push (500) 620.785µs Response: \"at least 3 live ingesters required, could only find 2\\n\" ws: false; Connection: close; Content-Encoding: snappy; Content-Length: 5443; Content-Type: application/x-protobuf; User-Agent: Go-http-client/1.1; X-Prometheus-Remote-Write-Version: 0.1.0; X-Scope-Orgid: 0; "
level=warn ts=2019-07-02T10:07:48.808633897Z caller=logging.go:49 traceID=121b69b5928e939a msg="POST /api/prom/push (500) 1.934723ms Response: \"at least 3 live ingesters required, could only find 2\\n\" ws: false; Connection: close; Content-Encoding: snappy; Content-Length: 5956; Content-Type: application/x-protobuf; User-Agent: Go-http-client/1.1; X-Prometheus-Remote-Write-Version: 0.1.0; X-Scope-Orgid: 0; "
level=warn ts=2019-07-02T10:07:48.811379216Z caller=logging.go:49 traceID=80038b1e98b8644 msg="POST /api/prom/push (500) 5.421684ms Response: \"at least 3 live ingesters required, could only find 2\\n\" ws: false; Connection: close; Content-Encoding: snappy; Content-Length: 5718; Content-Type: application/x-protobuf; User-Agent: Go-http-client/1.1; X-Prometheus-Remote-Write-Version: 0.1.0; X-Scope-Orgid: 0; "
level=warn ts=2019-07-02T10:07:48.819939803Z caller=logging.go:49 traceID=1902ec7d178c424e msg="POST /api/prom/push (500) 521.814µs Response: \"at least 5 live ingesters required, could only find 2\\n\" ws: false; Connection: close; Content-Encoding: snappy; Content-Length: 5924; Content-Type: application/x-protobuf; User-Agent: Go-http-client/1.1; X-Prometheus-Remote-Write-Version: 0.1.0; X-Scope-Orgid: 0; "
level=warn ts=2019-07-02T10:07:48.824741619Z caller=logging.go:49 traceID=7b7a37191590c41d msg="POST /api/prom/push (500) 358.946µs Response: \"at least 3 live ingesters required, could only find 2\\n\" ws: false; Connection: close; Content-Encoding: snappy; Content-Length: 5833; Content-Type: application/x-protobuf; User-Agent: Go-http-client/1.1; X-Prometheus-Remote-Write-Version: 0.1.0; X-Scope-Orgid: 0; "
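From the error text, it looks like the distributor wants a majority of the replica set for each write, and that the set can grow beyond the replication factor while ingesters are joining or unhealthy. A rough Go sketch of that arithmetic (an illustration of the idea only, not the actual Cortex code):

package main

import "fmt"

// requiredLive sketches the quorum rule suggested by the error messages:
// a write must reach a majority (n/2 + 1) of the replica set chosen for a
// series. The set normally has replication-factor members, but it appears
// to be extended while ingesters are joining or unhealthy, which raises
// the quorum. Illustration only; not Cortex's real implementation.
func requiredLive(replicaSetSize int) int {
	return replicaSetSize/2 + 1
}

func main() {
	fmt.Println(requiredLive(2)) // 2: replication-factor=2 with a fully healthy ring
	fmt.Println(requiredLive(5)) // 3: extended set, matches "at least 3 live ingesters required"
	fmt.Println(requiredLive(7)) // 4
	fmt.Println(requiredLive(9)) // 5
}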
@bboreham
Contributor

The best way to troubleshoot this is to look at the status page by making a browser (HTTP) request to /ring on one of your distributors. This shows Cortex's internal view of which ingesters are active, unhealthy, etc.
If it shows outdated information, press 'forget' on that line.
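If the distributor is only reachable from inside the cluster, something like the sketch below can dump that page (it assumes the distributor Service is named distributor and listens on port 80, as in the deployment above; adjust the address, or use kubectl port-forward to reach it from outside):

package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	// Assumed in-cluster address of the distributor Service; adjust as needed.
	resp, err := http.Get("http://distributor.default.svc.cluster.local:80/ring")
	if err != nil {
		fmt.Fprintln(os.Stderr, "fetching /ring:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	// The page is HTML; printing it raw should be enough to see each
	// ingester's state (ACTIVE, JOINING, LEAVING, unhealthy) and heartbeat.
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading response:", err)
		os.Exit(1)
	}
	fmt.Println(string(body))
}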

@bboreham
Contributor

bboreham commented Sep 2, 2019

I think the basic question about sizing is now answered in docs/running.md.

bboreham closed this as completed Sep 2, 2019