wrong users generation causing clickhouse crash #1332

salimidruide · 2024-02-05T11:31:41Z

Hello there,

After the upgrade to clickhouse-operator 0.23.0, we noticed that the ClickHouseInstallation object contains some erroneous, operator-injected users configuration:

```yaml
users:
  /networks/ip:
    - 10.2.2.100
  /profile: clickhouse_operator
  default/networks/host_regexp: >-
    (chi-clickhouse-[^.]+\d+-\d+|clickhouse\-clickhouse)\.analytics\.svc\.cluster\.local$
  default/networks/ip:
    - '::1'
    - 127.0.0.1
    - 10.2.14.136
  default/profile: default
```

It seems that the operator is injecting keys with an empty user name: `/networks/ip` and `/profile` have nothing before the first slash.
clickhouse-server then tries to parse the generated configuration and crashes with the error:

<Error> Application: Code: 36. DB::Exception: Either 'password' or 'password_sha256_hex' or 'password_double_sha1_hex' or 'no_password' or 'ldap' or 'kerberos or 'ssl_certificates' or 'ssh_keys' or 'http_authentication' must be specified for user networks.: while parsing user 'networks' in users configuration file: while loading configuration file '/etc/clickhouse-server/users.xml'. (BAD_ARGUMENTS), Stack trace (when copying this message, always include the lines below):
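The crash can be reproduced outside the operator with a minimal Python sketch. It assumes, hypothetically, that each flat `user/setting/...` key is rendered into nested XML under `<users>` and that empty path segments are silently dropped; `keys_with_empty_user` and `render_users_xml` are illustrative names, not the operator's actual code. Under that assumption, `/networks/ip` and `/profile` end up as direct children of `<users>`, and since ClickHouse treats every child of `<users>` as a user, it looks for a user named `networks`, which is consistent with the error above.

```python
import xml.etree.ElementTree as ET

# Flat `users` keys as they appear in the generated configuration above.
GENERATED_USERS = {
    "/networks/ip": ["10.2.2.100"],
    "/profile": "clickhouse_operator",
    "default/networks/host_regexp": r"(chi-clickhouse-[^.]+\d+-\d+|clickhouse\-clickhouse)\.analytics\.svc\.cluster\.local$",
    "default/networks/ip": ["::1", "127.0.0.1", "10.2.14.136"],
    "default/profile": "default",
}


def keys_with_empty_user(users: dict) -> list:
    """Return keys whose first path segment (the user name) is empty."""
    return [key for key in users if key.split("/", 1)[0] == ""]


def render_users_xml(users: dict) -> ET.Element:
    """Render flat `user/setting/...` keys as nested XML under <users>.

    Hypothetical renderer: empty path segments are dropped, so a key with
    an empty user name is attached directly to <users>.
    """
    root = ET.Element("users")
    for key, value in users.items():
        *parents, leaf = [part for part in key.split("/") if part]
        node = root
        # Reuse existing intermediate elements so all "default/..." keys merge.
        for part in parents:
            child = node.find(part)
            if child is None:
                child = ET.SubElement(node, part)
            node = child
        # List values become repeated leaf elements, e.g. several <ip>.
        for item in (value if isinstance(value, list) else [value]):
            ET.SubElement(node, leaf).text = item
    return root


if __name__ == "__main__":
    print("keys with an empty user name:", keys_with_empty_user(GENERATED_USERS))
    users_xml = render_users_xml(GENERATED_USERS)
    # ClickHouse treats every direct child of <users> as a user definition.
    print("users seen by ClickHouse:", [child.tag for child in users_xml])
```

With these inputs the sketch reports `/networks/ip` and `/profile` as keys without a user name and yields the users `networks`, `profile` and `default`; `networks` carries only an `<ip>` entry and no authentication setting.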

I am using the latest clickhouse-server version: clickhouse/clickhouse-server:23.11.5-alpine (self-hosted on Kubernetes).

Is there a way to fix it, or a workaround?
Thank you in advance.

salimidruide · 2024-02-05T12:38:05Z

For more information:

My operator config:

```yaml
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: operatorgroup
  namespace: mynamespace
spec:
  targetNamespaces:
  - mynamespace
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: clickhouse
  namespace: mynamespace
spec:
  channel: latest
  name: clickhouse
  source: operatorhubio-catalog
  sourceNamespace: olm
  installPlanApproval: Automatic

---

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseOperatorConfiguration"
metadata:
  name: "chop-config-01"
  namespace: "mynamespace"
spec:
  ################################################
  ##
  ## Watch Section
  ##
  ################################################
  watch:
    # List of namespaces where clickhouse-operator watches for events.
    # Concurrently running operators should watch on different namespaces
    #namespaces: ["dev", "test"]
    namespaces: []

  clickhouse:
    configuration:
      ################################################
      ##
      ## Configuration Files Section
      ##
      ################################################
      file:
        path:
          # Path to the folder where ClickHouse configuration files common for all instances within a CHI are located.
          common: config.d
          # Path to the folder where ClickHouse configuration files unique for each instance (host) within a CHI are located.
          host: conf.d
          # Path to the folder where ClickHouse configuration files with users settings are located.
          # Files are common for all instances within a CHI.
          user: users.d
      ################################################
      ##
      ## Configuration Users Section
      ##
      ################################################
      user:
        default:
          # Default values for ClickHouse user configuration
          # 1. user/profile - string
          # 2. user/quota - string
          # 3. user/networks/ip - multiple strings
          # 4. user/password - string
          profile: default
          quota: default
          networksIP:
            - "::1"
            - "127.0.0.1"
          password: "default"
      ################################################
      ##
      ## Configuration Network Section
      ##
      ################################################
      network:
        # Default host_regexp to limit network connectivity from outside
        hostRegexpTemplate: "(chi-{chi}-[^.]+\\d+-\\d+|clickhouse\\-{chi})\\.{namespace}\\.svc\\.cluster\\.local$"
    ################################################
    ##
    ## Access to ClickHouse instances
    ##
    ################################################
    access:
      # ClickHouse credentials (username, password and port) to be used by operator to connect to ClickHouse instances
      # for:
      # 1. Metrics requests
      # 2. Schema maintenance
      # 3. DROP DNS CACHE
      # User with such credentials can be specified in additional ClickHouse .xml config files,
      # located in `chUsersConfigsPath` folder
      username: "clickhouse_operator"
      password: "clickhouse_operator_password"
      secret:
        # Location of k8s Secret with username and password to be used by operator to connect to ClickHouse instances
        # Can be used instead of explicitly specified username and password
        namespace: ""
        name: ""
      # Port where to connect to ClickHouse instances to
      port: 8123

  ################################################
  ##
  ## Templates Section
  ##
  ################################################
  template:
    chi:
      # Path to the folder where ClickHouseInstallation .yaml manifests are located.
      # Manifests are applied in sorted alpha-numeric order.
      path: templates.d

  ################################################
  ##
  ## Reconcile Section
  ##
  ################################################
  reconcile:
    runtime:
      # Max number of concurrent CHI reconciles in progress
      reconcileCHIsThreadsNumber: 10
      # Max number of concurrent shard reconciles in progress
      reconcileShardsThreadsNumber: 1
      # The maximum percentage of cluster shards that may be reconciled in parallel
      reconcileShardsMaxConcurrencyPercent: 50

    statefulSet:
      create:
        # What to do in case created StatefulSet is not in Ready after `statefulSetUpdateTimeout` seconds
        # Possible options:
        # 1. abort - do nothing, just break the process and wait for admin
        # 2. delete - delete newly created problematic StatefulSet
        # 3. ignore - ignore error, pretend nothing happened and move on to the next StatefulSet
        onFailure: ignore

      update:
        # How many seconds to wait for created/updated StatefulSet to be Ready
        timeout: 300
        # How many seconds to wait between checks for created/updated StatefulSet status
        pollInterval: 5
        # What to do in case updated StatefulSet is not in Ready after `statefulSetUpdateTimeout` seconds
        # Possible options:
        # 1. abort - do nothing, just break the process and wait for admin
        # 2. rollback - delete Pod and rollback StatefulSet to previous Generation.
        # Pod would be recreated by StatefulSet based on rollback-ed configuration
        # 3. ignore - ignore error, pretend nothing happened and move on to the next StatefulSet
        onFailure: rollback

    host:
      wait:
        exclude: "true"
        include: "false"

  ################################################
  ##
  ## Annotations management
  ##
  ################################################
  annotation:
    # Applied when:
    #  1. Propagating annotations from the CHI's `metadata.annotations` to child objects' `metadata.annotations`,
    #  2. Propagating annotations from the CHI Template's `metadata.annotations` to CHI's `metadata.annotations`,
    # Include annotations from the following list:
    # Applied only when not empty. Empty list means "include all, no selection"
    include: []
    # Exclude annotations from the following list:
    exclude: []

  ################################################
  ##
  ## Labels management
  ##
  ################################################
  label:
    # Applied when:
    #  1. Propagating labels from the CHI's `metadata.labels` to child objects' `metadata.labels`,
    #  2. Propagating labels from the CHI Template's `metadata.labels` to CHI's `metadata.labels`,
    # Include labels from the following list:
    # Applied only when not empty. Empty list means "include all, no selection"
    include: []
    # Exclude labels from the following list:
    exclude: []
    # Whether to append *Scope* labels to StatefulSet and Pod.
    # Full list of available *scope* labels check in labeler.go
    #  LabelShardScopeIndex
    #  LabelReplicaScopeIndex
    #  LabelCHIScopeIndex
    #  LabelCHIScopeCycleSize
    #  LabelCHIScopeCycleIndex
    #  LabelCHIScopeCycleOffset
    #  LabelClusterScopeIndex
    #  LabelClusterScopeCycleSize
    #  LabelClusterScopeCycleIndex
    #  LabelClusterScopeCycleOffset
    appendScope: "no"

  ################################################
  ##
  ## StatefulSet management
  ##
  ################################################
  statefulSet:
    revisionHistoryLimit: 0

  ################################################
  ##
  ## Pod management
  ##
  ################################################
  pod:
    # Grace period for Pod termination.
    # How many seconds to wait between sending
    # SIGTERM and SIGKILL during Pod termination process.
    # Increase this number in case of slow shutdown.
    terminationGracePeriod: 30

  ################################################
  ##
  ## Log parameters
  ##
  ################################################
  logger:
    logtostderr: "true"
    alsologtostderr: "false"
    v: "1"
    stderrthreshold: ""
    vmodule: ""
    log_backtrace_at: ""
```
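The `hostRegexpTemplate` in the operator configuration appears to be expanded per installation: the generated `users` block at the top of the issue shows it with `{chi}` replaced by `clickhouse` and `{namespace}` by `analytics`. A minimal sketch, assuming plain string substitution of the two macros and regex search semantics, reproduces that value and checks a few hostnames. `expand_host_regexp` and `host_allowed` are hypothetical helpers, and the hostnames are made-up examples.

```python
import re

# Value of `hostRegexpTemplate` from the operator config, with the YAML
# double-quote escaping ("\\d" -> "\d") already undone.
HOST_REGEXP_TEMPLATE = r"(chi-{chi}-[^.]+\d+-\d+|clickhouse\-{chi})\.{namespace}\.svc\.cluster\.local$"


def expand_host_regexp(template: str, chi: str, namespace: str) -> str:
    """Replace the {chi} and {namespace} macros with concrete values (assumed behavior)."""
    return template.replace("{chi}", chi).replace("{namespace}", namespace)


def host_allowed(pattern: str, hostname: str) -> bool:
    """Return True if the hostname matches the expanded host_regexp."""
    return re.search(pattern, hostname) is not None


if __name__ == "__main__":
    pattern = expand_host_regexp(HOST_REGEXP_TEMPLATE, chi="clickhouse", namespace="analytics")
    print(pattern)
    # Hypothetical hostnames, used only to exercise the pattern.
    for host in (
        "chi-clickhouse-clickhouse-0-0.analytics.svc.cluster.local",
        "clickhouse-clickhouse.analytics.svc.cluster.local",
        "clickhouse-clickhouse.other.svc.cluster.local",
    ):
        print(host, host_allowed(pattern, host))
```

The expanded pattern is identical to the `default/networks/host_regexp` value in the generated config, so the regular users are built as expected; only the keys with an empty user name are broken.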
    
My ClickHouse configuration:

```yaml
---
## https://github.com/Altinity/clickhouse-operator/blob/master/deploy/operator/clickhouse-operator-install-bundle.yaml
apiVersion: clickhouse.altinity.com/v1
kind: ClickHouseInstallation
metadata: 
  name: clickhouse
  namespace: mynamespace
  annotations: 
    prometheus.io/scrape: "false"
    prometheus.io/port: "8888"  
    ad.datadoghq.com/clickhouse.checks: |
              {
                "openmetrics": { 
                  "init_config": {},
                  "instances": [
                    {
                      "openmetrics_endpoint": "http://%%host%%:8888/metrics",
                      "use_openmetrics": "false"
                    }
                  ]
                }
              }
    ad.datadoghq.com/clickhouse.logs: '[{"source":"clickhouse","service":"clickhouse_cluster"}]'
    argocd.argoproj.io/sync-wave: "3" 
spec:
  configuration:
    settings:
      logger/level: "error"
      # to allow scrape metrics via embedded prometheus protocol
      prometheus/endpoint: /metrics
      prometheus/port: 8888
      prometheus/metrics: false
      prometheus/events: false
      prometheus/asynchronous_metrics: false
    users:
      superset/networks/ip: "::/0"
      superset/password: apassword
      superset/profile: default
    clusters:
      - name: clickhouse
        layout:
          replicasCount: 1
          shardsCount: 1
        templates:
          volumeClaimTemplate: storage-clickhouse
          podTemplate: pod-template 
          serviceTemplate: svc-template
    files:
      config.d/01-clickhouse-02-logger.xml: |
        <yandex>
            <logger>
                <!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
                <level>error</level>
                <log>/var/log/clickhouse-server/clickhouse-server.log</log>
                <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
                <size>1000M</size>
                <count>10</count>
                <!-- Default behavior is autodetection (log to console if not daemon mode and is tty) -->
                <console>1</console>
            </logger>
        </yandex>
  templates:
    serviceTemplates:
    - name: svc-template
      spec:
        type: ClusterIP        
    podTemplates:
      - name: pod-template
        spec:
          containers:
            - name: clickhouse
              image: clickhouse/clickhouse-server:23.11.5-alpine
              resources:
                limits:
                  cpu: "1"
                  memory: 7Gi
                requests:
                  cpu: "1"
                  memory: 7Gi      

    volumeClaimTemplates:
      - name: storage-clickhouse
        spec:
          storageClassName: "csi-cinder-classic"
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 500Gi
```

The erroneous generated config

sunsingerus · 2024-02-07T14:09:41Z

Found the reason; work in progress. Please keep an eye on OperatorHub releases, we will publish 0.23.1 soon.

sunsingerus · 2024-02-07T14:16:40Z

@salimidruide thanks for the detailed description of the case; it really helped us reproduce and catch the issue.

alex-zaitsev · 2024-02-12T15:49:02Z

Fixed in https://github.com/Altinity/clickhouse-operator/releases/tag/release-0.23.1

salimidruide mentioned this issue on Feb 5, 2024 in "Clickhouse fails parsing users.xml with 0.23.0" #1324 (closed).

alex-zaitsev closed this as completed Feb 12, 2024
