wrong users generation causing clickhouse crash #1332

Closed
salimidruide opened this issue Feb 5, 2024 · 4 comments

Hello there,

After upgrading to clickhouse-operator 0.23.0, we noticed that the ClickHouseInstallation object contains erroneous injected users configuration:

users:
  /networks/ip:
    - 10.2.2.100
  /profile: clickhouse_operator
  default/networks/host_regexp: >-
    (chi-clickhouse-[^.]+\d+-\d+|clickhouse\-clickhouse)\.analytics\.svc\.cluster\.local$
  default/networks/ip:
    - '::1'
    - 127.0.0.1
    - 10.2.14.136
  default/profile: default
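
The suspicious entries in the listing above can be picked out mechanically. The following is a minimal sketch, assuming the `users` section is available as a flat mapping of `user/setting` keys; `keys_with_empty_user` is a hypothetical helper written for this illustration, not part of the operator.

```python
def keys_with_empty_user(users: dict) -> list:
    """Return the keys of a flat 'user/setting' mapping whose user part is empty."""
    # Everything before the first "/" is treated as the user name.
    return [key for key in users if key.split("/", 1)[0] == ""]


# Values copied from the reported users section.
reported_users = {
    "/networks/ip": ["10.2.2.100"],
    "/profile": "clickhouse_operator",
    "default/networks/ip": ["::1", "127.0.0.1", "10.2.14.136"],
    "default/profile": "default",
}

print(keys_with_empty_user(reported_users))  # ['/networks/ip', '/profile']
```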

It seems that the operator is injecting entries with an empty user name: "/networks/..." and "/profile".
clickhouse-server then tries to parse the generated configuration and crashes with this error:

<Error> Application: Code: 36. DB::Exception: Either 'password' or 'password_sha256_hex' or 'password_double_sha1_hex' or 'no_password' or 'ldap' or 'kerberos or 'ssl_certificates' or 'ssh_keys' or 'http_authentication' must be specified for user networks.: while parsing user 'networks' in users configuration file: while loading configuration file '/etc/clickhouse-server/users.xml'. (BAD_ARGUMENTS), Stack trace (when copying this message, always include the lines below):
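
One possible reading of this error, offered as a sketch and not as a description of the operator's actual code: if the flat `user/setting` keys are expanded into nested sections and empty path segments are dropped, then `/networks/ip` and `/profile` land directly under `users`, so `networks` and `profile` sit where user names belong. That would be consistent with the message about user 'networks'. `expand_flat_settings` is a hypothetical helper, and dropping empty segments is an assumption.

```python
from typing import Any, Dict


def expand_flat_settings(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Expand 'a/b/c': value keys into nested dicts.

    Assumption: empty path segments (as in '/networks/ip') are skipped.
    """
    tree: Dict[str, Any] = {}
    for path, value in flat.items():
        parts = [p for p in path.split("/") if p]
        if not parts:
            continue  # nothing left after dropping empty segments
        node = tree
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return tree


# Values copied from the reported users section.
reported_users = {
    "/networks/ip": ["10.2.2.100"],
    "/profile": "clickhouse_operator",
    "default/networks/ip": ["::1", "127.0.0.1", "10.2.14.136"],
    "default/profile": "default",
}

tree = expand_flat_settings(reported_users)
# Top-level names are what a users file would treat as user names.
print(sorted(tree))  # ['default', 'networks', 'profile']
```

Under this assumption only `default` is a real user; `networks` and `profile` are settings that lost their owner.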

I am using the latest clickhouse-server version: clickhouse/clickhouse-server:23.11.5-alpine (self-hosted on Kubernetes).

Is there a way to fix it, or a workaround?
Thank you in advance.

salimidruide (Author) commented Feb 5, 2024

For more information:

My operator config:

---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: operatorgroup
  namespace: mynamespace
spec:
  targetNamespaces:
  - mynamespace
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: clickhouse
  namespace: mynamespace
spec:
  channel: latest
  name: clickhouse
  source: operatorhubio-catalog
  sourceNamespace: olm
  installPlanApproval: Automatic

---

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseOperatorConfiguration"
metadata:
  name: "chop-config-01"
  namespace: "mynamespace"
spec:
  ################################################
  ##
  ## Watch Section
  ##
  ################################################
  watch:
    # List of namespaces where clickhouse-operator watches for events.
    # Concurrently running operators should watch on different namespaces
    #namespaces: ["dev", "test"]
    namespaces: []

  clickhouse:
    configuration:
      ################################################
      ##
      ## Configuration Files Section
      ##
      ################################################
      file:
        path:
          # Path to the folder where ClickHouse configuration files common for all instances within a CHI are located.
          common: config.d
          # Path to the folder where ClickHouse configuration files unique for each instance (host) within a CHI are located.
          host: conf.d
          # Path to the folder where ClickHouse configuration files with users settings are located.
          # Files are common for all instances within a CHI.
          user: users.d
      ################################################
      ##
      ## Configuration Users Section
      ##
      ################################################
      user:
        default:
          # Default values for ClickHouse user configuration
          # 1. user/profile - string
          # 2. user/quota - string
          # 3. user/networks/ip - multiple strings
          # 4. user/password - string
          profile: default
          quota: default
          networksIP:
            - "::1"
            - "127.0.0.1"
          password: "default"
      ################################################
      ##
      ## Configuration Network Section
      ##
      ################################################
      network:
        # Default host_regexp to limit network connectivity from outside
        hostRegexpTemplate: "(chi-{chi}-[^.]+\\d+-\\d+|clickhouse\\-{chi})\\.{namespace}\\.svc\\.cluster\\.local$"
    ################################################
    ##
    ## Access to ClickHouse instances
    ##
    ################################################
    access:
      # ClickHouse credentials (username, password and port) to be used by operator to connect to ClickHouse instances
      # for:
      # 1. Metrics requests
      # 2. Schema maintenance
      # 3. DROP DNS CACHE
      # User with such credentials can be specified in additional ClickHouse .xml config files,
      # located in `chUsersConfigsPath` folder
      username: "clickhouse_operator"
      password: "clickhouse_operator_password"
      secret:
        # Location of k8s Secret with username and password to be used by operator to connect to ClickHouse instances
        # Can be used instead of explicitly specified username and password
        namespace: ""
        name: ""
      # Port to use when connecting to ClickHouse instances
      port: 8123

  ################################################
  ##
  ## Templates Section
  ##
  ################################################
  template:
    chi:
      # Path to the folder where ClickHouseInstallation .yaml manifests are located.
      # Manifests are applied in sorted alpha-numeric order.
      path: templates.d

  ################################################
  ##
  ## Reconcile Section
  ##
  ################################################
  reconcile:
    runtime:
      # Max number of concurrent CHI reconciles in progress
      reconcileCHIsThreadsNumber: 10
      # Max number of concurrent shard reconciles in progress
      reconcileShardsThreadsNumber: 1
      # The maximum percentage of cluster shards that may be reconciled in parallel
      reconcileShardsMaxConcurrencyPercent: 50

    statefulSet:
      create:
        # What to do in case created StatefulSet is not in Ready after `statefulSetUpdateTimeout` seconds
        # Possible options:
        # 1. abort - do nothing, just break the process and wait for admin
        # 2. delete - delete newly created problematic StatefulSet
        # 3. ignore - ignore error, pretend nothing happened and move on to the next StatefulSet
        onFailure: ignore

      update:
        # How many seconds to wait for created/updated StatefulSet to be Ready
        timeout: 300
        # How many seconds to wait between checks for created/updated StatefulSet status
        pollInterval: 5
        # What to do in case updated StatefulSet is not in Ready after `statefulSetUpdateTimeout` seconds
        # Possible options:
        # 1. abort - do nothing, just break the process and wait for admin
        # 2. rollback - delete Pod and rollback StatefulSet to previous Generation.
        # Pod would be recreated by StatefulSet based on rollback-ed configuration
        # 3. ignore - ignore error, pretend nothing happened and move on to the next StatefulSet
        onFailure: rollback

    host:
      wait:
        exclude: "true"
        include: "false"

  ################################################
  ##
  ## Annotations management
  ##
  ################################################
  annotation:
    # Applied when:
    #  1. Propagating annotations from the CHI's `metadata.annotations` to child objects' `metadata.annotations`,
    #  2. Propagating annotations from the CHI Template's `metadata.annotations` to CHI's `metadata.annotations`,
    # Include annotations from the following list:
    # Applied only when not empty. Empty list means "include all, no selection"
    include: []
    # Exclude annotations from the following list:
    exclude: []

  ################################################
  ##
  ## Labels management
  ##
  ################################################
  label:
    # Applied when:
    #  1. Propagating labels from the CHI's `metadata.labels` to child objects' `metadata.labels`,
    #  2. Propagating labels from the CHI Template's `metadata.labels` to CHI's `metadata.labels`,
    # Include labels from the following list:
    # Applied only when not empty. Empty list means "include all, no selection"
    include: []
    # Exclude labels from the following list:
    exclude: []
    # Whether to append *Scope* labels to StatefulSet and Pod.
    # Full list of available *scope* labels check in labeler.go
    #  LabelShardScopeIndex
    #  LabelReplicaScopeIndex
    #  LabelCHIScopeIndex
    #  LabelCHIScopeCycleSize
    #  LabelCHIScopeCycleIndex
    #  LabelCHIScopeCycleOffset
    #  LabelClusterScopeIndex
    #  LabelClusterScopeCycleSize
    #  LabelClusterScopeCycleIndex
    #  LabelClusterScopeCycleOffset
    appendScope: "no"

  ################################################
  ##
  ## StatefulSet management
  ##
  ################################################
  statefulSet:
    revisionHistoryLimit: 0

  ################################################
  ##
  ## Pod management
  ##
  ################################################
  pod:
    # Grace period for Pod termination.
    # How many seconds to wait between sending
    # SIGTERM and SIGKILL during Pod termination process.
    # Increase this number in case of slow shutdown.
    terminationGracePeriod: 30

  ################################################
  ##
  ## Log parameters
  ##
  ################################################
  logger:
    logtostderr: "true"
    alsologtostderr: "false"
    v: "1"
    stderrthreshold: ""
    vmodule: ""
    log_backtrace_at: ""

My ClickHouse configuration:

---
## https://github.com/Altinity/clickhouse-operator/blob/master/deploy/operator/clickhouse-operator-install-bundle.yaml
apiVersion: clickhouse.altinity.com/v1
kind: ClickHouseInstallation
metadata: 
  name: clickhouse
  namespace: mynamespace
  annotations: 
    prometheus.io/scrape: "false"
    prometheus.io/port: "8888"  
    ad.datadoghq.com/clickhouse.checks: |
              {
                "openmetrics": { 
                  "init_config": {},
                  "instances": [
                    {
                      "openmetrics_endpoint": "http://%%host%%:8888/metrics",
                      "use_openmetrics": "false"
                    }
                  ]
                }
              }
    ad.datadoghq.com/clickhouse.logs: '[{"source":"clickhouse","service":"clickhouse_cluster"}]'
    argocd.argoproj.io/sync-wave: "3" 
spec:
  configuration:
    settings:
      logger/level: "error"
      # to allow scrape metrics via embedded prometheus protocol
      prometheus/endpoint: /metrics
      prometheus/port: 8888
      prometheus/metrics: false
      prometheus/events: false
      prometheus/asynchronous_metrics: false
    users:
      superset/networks/ip: "::/0"
      superset/password: apassword
      superset/profile: default
    clusters:
      - name: clickhouse
        layout:
          replicasCount: 1
          shardsCount: 1
        templates:
          volumeClaimTemplate: storage-clickhouse
          podTemplate: pod-template 
          serviceTemplate: svc-template
    files:
      config.d/01-clickhouse-02-logger.xml: |
        <yandex>
            <logger>
                <!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
                <level>error</level>
                <log>/var/log/clickhouse-server/clickhouse-server.log</log>
                <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
                <size>1000M</size>
                <count>10</count>
                <!-- Default behavior is autodetection (log to console if not daemon mode and is tty) -->
                <console>1</console>
            </logger>
        </yandex>
  templates:
    serviceTemplates:
    - name: svc-template
      spec:
        type: ClusterIP        
    podTemplates:
      - name: pod-template
        spec:
          containers:
            - name: clickhouse
              image: clickhouse/clickhouse-server:23.11.5-alpine
              resources:
                limits:
                  cpu: "1"
                  memory: 7Gi
                requests:
                  cpu: "1"
                  memory: 7Gi      

    volumeClaimTemplates:
      - name: storage-clickhouse
        spec:
          storageClassName: "csi-cinder-classic"
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 500Gi
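
For reference, the `default/networks/host_regexp` value in the reported users section looks like the operator's `hostRegexpTemplate` with its placeholders filled in. The sketch below assumes plain string substitution of `{chi}` and `{namespace}`; that mechanism is a guess, not taken from the operator source. The CHI name `clickhouse` comes from the manifest, the namespace `analytics` from the reported value, and both host names are made up for illustration. Python's `re.search` is used only to show the effect; ClickHouse's own matching rules may differ.

```python
import re

# Template from the operator config, after YAML unescaping of "\\".
HOST_REGEXP_TEMPLATE = (
    r"(chi-{chi}-[^.]+\d+-\d+|clickhouse\-{chi})\.{namespace}\.svc\.cluster\.local$"
)


def expand_host_regexp(template: str, chi: str, namespace: str) -> str:
    """Fill the {chi} and {namespace} placeholders by plain substitution (assumption)."""
    return template.replace("{chi}", chi).replace("{namespace}", namespace)


pattern = expand_host_regexp(HOST_REGEXP_TEMPLATE, chi="clickhouse", namespace="analytics")
print(pattern)

# Hypothetical host names, for illustration only.
inside = "chi-clickhouse-clickhouse-0-0.analytics.svc.cluster.local"
outside = "client.example.com"
print(bool(re.search(pattern, inside)))   # True
print(bool(re.search(pattern, outside)))  # False
```

The expanded pattern is identical to the `host_regexp` shown in the report, which suggests that the `default` user entries were generated as intended and only the entries with an empty user name are wrong.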

The erroneous generated config:

[Image: Screenshot 2024-02-05 at 13 39 41]

sunsingerus (Collaborator) commented:

Found the cause; a fix is in progress. Please keep an eye on the OperatorHub releases. We will publish 0.23.1 soon.

sunsingerus (Collaborator) commented:

@salimidruide thanks for the detailed description of the case; it really helped us reproduce and catch the issue.

alex-zaitsev (Member)
