Distributor - Number of ingesters #1488

Closed

Serrvosky opened this issue Jul 2, 2019 · 2 comments

@Serrvosky

Hello everyone,

Can anyone tell me how many ingesters are necessary?

This is my distributor deployment file:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: distributor
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: distributor
    spec:
      containers:
      - name: distributor
        image: quay.io/cortexproject/cortex:master-6d684f65
        imagePullPolicy: IfNotPresent
        args:
        - -target=distributor
        - -log.level=debug
        - -server.http-listen-port=80
        - -consul.hostname=consul.default.svc.cluster.local:8500
        - -distributor.replication-factor=2
        ports:
        - containerPort: 80

and this is my ingester deployment file:

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ingester
spec:
  replicas: 5

  # Ingesters are not ready for at least 1 min
  # after creation.  This has to be in sync with
  # the ring timeout value, as this will stop a
  # stampede of new ingesters if we should lose
  # some.
  minReadySeconds: 60

  # Having maxSurge 0 and maxUnavailable 1 means
  # the deployment will update one ingester at a time
  # as it will have to stop one (making one unavailable)
  # before it can start one (surge of zero)
  strategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1

  template:
    metadata:
      labels:
        name: ingester
    spec:
      # Give ingesters 40 minutes grace to flush chunks and exit cleanly.
      # Service is available during this time, as long as we don't stop
      # too many ingesters at once.
      terminationGracePeriodSeconds: 2400

      containers:
      - name: ingester
        image: quay.io/cortexproject/cortex:master-6d684f65
        imagePullPolicy: IfNotPresent
        args:
        - -target=ingester
        - -ingester.join-after=30s
        - -ingester.claim-on-rollout=true
        - -consul.hostname=consul.default.svc.cluster.local:8500
        - -s3.url=s3://abc:123@s3.default.svc.cluster.local:4569
        - -dynamodb.original-table-name=cortex
        - -dynamodb.url=dynamodb://user:pass@dynamodb.default.svc.cluster.local:8000
        - -dynamodb.periodic-table.prefix=cortex_weekly_
        - -dynamodb.periodic-table.from=2019-06-01
        - -dynamodb.daily-buckets-from=2019-06-01
        - -dynamodb.base64-buckets-from=2019-06-01
        - -dynamodb.v4-schema-from=2019-06-01
        - -dynamodb.v5-schema-from=2019-06-01
        - -dynamodb.v6-schema-from=2019-06-01
        - -dynamodb.chunk-table.from=2019-06-01
        - -memcached.hostname=memcached.default.svc.cluster.local
        - -memcached.timeout=100ms
        - -memcached.service=memcached
        ports:
        - containerPort: 80
        #readinessProbe:
        #  httpGet:
        #    path: /ready
        #    port: 80
        #  initialDelaySeconds: 15
        #  timeoutSeconds: 1

As you can see, I spin up 5 ingesters first, then I wait some time for everything to come up (pods, registration in Consul, health checks, etc.), and then I deploy the distributors.

However, I'm getting a lot of logs like this:

level=warn ts=2019-07-02T10:07:48.719546323Z caller=logging.go:49 traceID=5909d18c1b0598b0 msg="POST /api/prom/push (500) 415.597µs Response: \"at least 4 live ingesters required, could only find 2\\n\" ws: false; Connection: close; Content-Encoding: snappy; Content-Length: 4990; Content-Type: application/x-protobuf; User-Agent: Go-http-client/1.1; X-Prometheus-Remote-Write-Version: 0.1.0; X-Scope-Orgid: 0; "
level=warn ts=2019-07-02T10:07:48.720214262Z caller=logging.go:49 traceID=42cb99b7e4d6403 msg="POST /api/prom/push (500) 404.171µs Response: \"at least 4 live ingesters required, could only find 2\\n\" ws: false; Connection: close; Content-Encoding: snappy; Content-Length: 4306; Content-Type: application/x-protobuf; User-Agent: Go-http-client/1.1; X-Prometheus-Remote-Write-Version: 0.1.0; X-Scope-Orgid: 0; "
level=warn ts=2019-07-02T10:07:48.720461758Z caller=logging.go:49 traceID=2ba6ed33a0136724 msg="POST /api/prom/push (500) 1.888748ms Response: \"at least 3 live ingesters required, could only find 2\\n\" ws: false; Connection: close; Content-Encoding: snappy; Content-Length: 5861; Content-Type: application/x-protobuf; User-Agent: Go-http-client/1.1; X-Prometheus-Remote-Write-Version: 0.1.0; X-Scope-Orgid: 0; "
level=warn ts=2019-07-02T10:07:48.724230653Z caller=logging.go:49 traceID=5abb13d5575ad94b msg="POST /api/prom/push (500) 620.785µs Response: \"at least 3 live ingesters required, could only find 2\\n\" ws: false; Connection: close; Content-Encoding: snappy; Content-Length: 5443; Content-Type: application/x-protobuf; User-Agent: Go-http-client/1.1; X-Prometheus-Remote-Write-Version: 0.1.0; X-Scope-Orgid: 0; "
level=warn ts=2019-07-02T10:07:48.808633897Z caller=logging.go:49 traceID=121b69b5928e939a msg="POST /api/prom/push (500) 1.934723ms Response: \"at least 3 live ingesters required, could only find 2\\n\" ws: false; Connection: close; Content-Encoding: snappy; Content-Length: 5956; Content-Type: application/x-protobuf; User-Agent: Go-http-client/1.1; X-Prometheus-Remote-Write-Version: 0.1.0; X-Scope-Orgid: 0; "
level=warn ts=2019-07-02T10:07:48.811379216Z caller=logging.go:49 traceID=80038b1e98b8644 msg="POST /api/prom/push (500) 5.421684ms Response: \"at least 3 live ingesters required, could only find 2\\n\" ws: false; Connection: close; Content-Encoding: snappy; Content-Length: 5718; Content-Type: application/x-protobuf; User-Agent: Go-http-client/1.1; X-Prometheus-Remote-Write-Version: 0.1.0; X-Scope-Orgid: 0; "
level=warn ts=2019-07-02T10:07:48.819939803Z caller=logging.go:49 traceID=1902ec7d178c424e msg="POST /api/prom/push (500) 521.814µs Response: \"at least 5 live ingesters required, could only find 2\\n\" ws: false; Connection: close; Content-Encoding: snappy; Content-Length: 5924; Content-Type: application/x-protobuf; User-Agent: Go-http-client/1.1; X-Prometheus-Remote-Write-Version: 0.1.0; X-Scope-Orgid: 0; "
level=warn ts=2019-07-02T10:07:48.824741619Z caller=logging.go:49 traceID=7b7a37191590c41d msg="POST /api/prom/push (500) 358.946µs Response: \"at least 3 live ingesters required, could only find 2\\n\" ws: false; Connection: close; Content-Encoding: snappy; Content-Length: 5833; Content-Type: application/x-protobuf; User-Agent: Go-http-client/1.1; X-Prometheus-Remote-Write-Version: 0.1.0; X-Scope-Orgid: 0; "
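From the error text, it looks like the distributor wants a majority of the replica set for each write, and that the set can grow beyond the replication factor while ingesters are joining or unhealthy. A rough Go sketch of that arithmetic (an illustration of the idea only, not the actual Cortex code):

package main

import "fmt"

// requiredLive sketches the quorum rule suggested by the error messages:
// a write must reach a majority (n/2 + 1) of the replica set chosen for a
// series. The set normally has replication-factor members, but it appears
// to be extended while ingesters are joining or unhealthy, which raises
// the quorum. Illustration only; not Cortex's real implementation.
func requiredLive(replicaSetSize int) int {
	return replicaSetSize/2 + 1
}

func main() {
	fmt.Println(requiredLive(2)) // 2: replication-factor=2 with a fully healthy ring
	fmt.Println(requiredLive(5)) // 3: extended set, matches "at least 3 live ingesters required"
	fmt.Println(requiredLive(7)) // 4
	fmt.Println(requiredLive(9)) // 5
}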
@bboreham
Contributor

The best way to troubleshoot this is to look at the status page by making a browser (HTTP) request to /ring on one of your distributors. This shows Cortex's internal view of which ingesters are active, unhealthy, etc.
If it shows outdated information, press 'forget' on that line.
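If the distributor is only reachable from inside the cluster, something like the sketch below can dump that page (it assumes the distributor Service is named distributor and listens on port 80, as in the deployment above; adjust the address, or use kubectl port-forward to reach it from outside):

package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	// Assumed in-cluster address of the distributor Service; adjust as needed.
	resp, err := http.Get("http://distributor.default.svc.cluster.local:80/ring")
	if err != nil {
		fmt.Fprintln(os.Stderr, "fetching /ring:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	// The page is HTML; printing it raw should be enough to see each
	// ingester's state (ACTIVE, JOINING, LEAVING, unhealthy) and heartbeat.
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading response:", err)
		os.Exit(1)
	}
	fmt.Println(string(body))
}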

@bboreham
Contributor

bboreham commented Sep 2, 2019

I think the basic question about sizing is now answered in docs/running.md.

bboreham closed this as completed Sep 2, 2019