# Airflow: Enabling Kerberos (#27339)
## Environment

**Keytab file permission**

```
chmod 600 /opt/airflow/custom/conf/kerberos/ai.keytab
```

**Kerberos config in docker-compose.yaml**

```yaml
AIRFLOW__CORE__SECURITY: kerberos
AIRFLOW__KERBEROS__REINIT_FREQUENCY: 3600
AIRFLOW__KERBEROS__PRINCIPAL: hive/hh1.elf.com@EXAMPLE.COM
AIRFLOW__KERBEROS__KEYTAB: /opt/airflow/custom/conf/kerberos/ai.keytab
AIRFLOW__KERBEROS__CCACHE: /opt/airflow/custom/conf/kerberos/airflow_krb5_ai_ccache
AIRFLOW__KERBEROS__FORWARDABLE: True
AIRFLOW__KERBEROS__INCLUDE_IP: True
```

**krb5.conf**

```ini
[libdefaults]
dns_lookup_kdc = true
dns_lookup_realm = false
default_ccache_name = /opt/airflow/custom/conf/kerberos/airflow_krb5_ai_ccache
udp_preference_limit = 1

[realms]
EXAMPLE.COM = {
  kdc = nn1.elf.com
  admin_server = nn1.elf.com
}

[domain_realm]
.example.com = EXAMPLE.COM
```
(The non-dotted mapping `example.com = EXAMPLE.COM` is left commented out in `[domain_realm]`.)

**Configure hosts and krb5.conf in docker-compose.yaml**

```yaml
services:
  airflow-worker:
    <<: *airflow-common
    command: celery worker
    container_name: airflow-worker
    environment:
      <<: *airflow-common-env
      DUMB_INIT_SETSID: "0"
      KRB5_CONFIG: /opt/airflow/custom/conf/kerberos/krb5.conf
    extra_hosts:
      # - "host:IP"
      - "nn1.elf.com:192.168.9.51"
      - "nn2.elf.com:192.168.9.52"
      - "hh1.elf.com:192.168.9.53"
      - "hh2.elf.com:192.168.9.54"
      - "cd1.elf.com:192.168.9.55"
      - "cd2.elf.com:192.168.9.56"
      - "cd3.elf.com:192.168.9.57"
      - "cd4.elf.com:192.168.9.58"
      - "cd5.elf.com:192.168.9.59"
      - "edge1.elf.com:192.168.9.60"
      - "edge2.elf.com:192.168.9.62"
```

**Configure Hadoop**

```shell
ssh 'root@192.168.9.51'
```
```shell
find / -name core-site.xml
vim /etc/hadoop/conf.cloudera.hdfs/core-site.xml
```

Add the proxy-user properties to core-site.xml:

```xml
<property>
  <name>hadoop.proxyuser.airflow.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.airflow.users</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.airflow.hosts</name>
  <value>*</value>
</property>
```
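As a sanity check outside Airflow, Kerberos access can be verified with the plain Python clients that the questions below say do work (pyhive for Hive, hdfs with pykerberos for HDFS). A minimal sketch: the HiveServer2 host/port and WebHDFS URL are assumptions based on the host list above, and the calls that need the live cluster are left commented:

```python
import json

# Connection extras as used for the Hive connection (see Question One below).
extras = json.loads(
    '{"auth_mechanism": "KERBEROS", "kerberos_service_name": "hive", '
    '"run_set_variable_statements": "false"}'
)

# Keyword arguments for pyhive's hive.connect(); host and port are assumptions.
hive_kwargs = {
    "host": "hh1.elf.com",
    "port": 10000,  # default HiveServer2 port
    "auth": extras["auth_mechanism"],
    "kerberos_service_name": extras["kerberos_service_name"],
}
print(hive_kwargs["auth"], hive_kwargs["kerberos_service_name"])

# With a valid ticket in the cache (kinit already done), the direct clients:
# from pyhive import hive
# conn = hive.connect(**hive_kwargs)
#
# from hdfs.ext.kerberos import KerberosClient  # from the hdfs[kerberos] extra
# client = KerberosClient("http://nn1.elf.com:9870")  # WebHDFS URL is an assumption
# print(client.list("/"))
```

If these work but the Airflow providers do not, the problem is in how the providers pick up `krb5.conf` and the ticket cache, not in the cluster-side Kerberos setup.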
## Questions

### Question One: connecting to Hive fails

Hive connection extras:

```json
{"auth_mechanism": "KERBEROS", "kerberos_service_name": "hive", "run_set_variable_statements": "false"}
```

```
Could not start SASL: b'Error in sasl_client_start (-1) SASL(-1):
generic failure: GSSAPI Error: Unspecified GSS failure.
Minor code may provide more information (Cannot find KDC for realm "EXAMPLE.COM")'
```

When I connect to Hive directly via pyhive, rather than through apache-airflow-providers-hive, it works.

### Question Two: apache-airflow-providers-hdfs fails at startup with an encoding error

```
airflow-init_1 | from airflow.__main__ import main
airflow-init_1 | File "/home/airflow/.local/lib/python3.9/site-packages/airflow/__main__.py", line 28, in <module>
airflow-init_1 | from airflow.cli import cli_parser
airflow-init_1 | File "/home/airflow/.local/lib/python3.9/site-packages/airflow/cli/cli_parser.py", line 621, in <module>
airflow-init_1 | type=argparse.FileType('w', encoding='UTF-8'),
airflow-init_1 | TypeError: __init__() got an unexpected keyword argument 'encoding'
```

Connecting to HDFS directly with hdfs and pykerberos also works:

```dockerfile
RUN pip3 install --no-cache-dir pykerberos==1.2.4
RUN pip3 install --no-cache-dir hdfs[kerberos,dataframe]==2.7.0
```

### Question Three: installing extras in the Dockerfile

https://airflow.incubator.apache.org/docs/apache-airflow/2.4.1/extra-packages-ref.html

```dockerfile
RUN pip3 install 'apache-airflow[kerberos]'
```

### Question Four: Kerberos ticket renewer

```
$ airflow kerberos
```
```
/home/airflow/.local/lib/python3.9/site-packages/airflow/configuration.py:363: FutureWarning: The auth_backends setting in [api] has had airflow.api.auth.backend.session added in the running config, which is needed by the UI. Please update your config before Apache Airflow 3.0.
  warnings.warn(
[2022-10-28 10:41:01,135] {settings.py:263} DEBUG - Setting up DB connection pool (PID 126)
[2022-10-28 10:41:01,136] {settings.py:365} DEBUG - settings.prepare_engine_args(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=126
[2022-10-28 10:41:01,188] {cli_action_loggers.py:39} DEBUG - Adding <function default_action_log at 0x7f9ea1203160> to pre execution callback
[2022-10-28 10:41:01,346] {cli_action_loggers.py:65} DEBUG - Calling callbacks: [<function default_action_log at 0x7f9ea1203160>]
 ____________ _____________
____ |__( )_________ __/__ /________ __
____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
_/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
[2022-10-28 10:41:03,876] {kerberos.py:90} INFO - Re-initialising kerberos from keytab: kinit -f -a -r 3600m -k -t /opt/airflow/custom/conf/kerberos/ai.keytab -c /opt/airflow/custom/conf/kerberos/airflow_krb5_ai_ccache ai@EXAMPLE.COM
[2022-10-28 10:41:05,704] {kerberos.py:142} INFO - Renewing kerberos ticket to work around kerberos 1.8.1: kinit -c /opt/airflow/custom/conf/kerberos/airflow_krb5_ai_ccache -R
kinit: Incorrect net address while renewing credentials
[2022-10-28 10:41:09,143] {kerberos.py:149} ERROR - Couldn't renew kerberos ticket in order to work around Kerberos 1.8.1 issue. Please check that the ticket for 'ai@EXAMPLE.COM/12ec8c2ccb62' is still renewable:
$ kinit -f -c /opt/airflow/custom/conf/kerberos/airflow_krb5_ai_ccache
If the 'renew until' date is the same as the 'valid starting' date, the ticket cannot be renewed. Please check your KDC configuration, and the ticket renewal policy (maxrenewlife) for the 'ai@EXAMPLE.COM/12ec8c2ccb62' and `krbtgt' principals.
[2022-10-28 10:41:09,144] {cli_action_loggers.py:83} DEBUG - Calling callbacks: []
[2022-10-28 10:41:09,146] {settings.py:401} DEBUG - Disposing DB connection pool (PID 126)
```

Running `kinit` manually inside the container succeeds, and `klist` shows the ticket is renewable:

```
airflow@12ec8c2ccb62:/opt/airflow/custom/conf/kerberos$ /usr/bin/kinit -kt /opt/airflow/custom/conf/kerberos/ai.keytab -c /opt/airflow/custom/conf/kerberos/airflow_krb5_ai_ccache ai@EXAMPLE.COM
airflow@12ec8c2ccb62:/opt/airflow/custom/conf/kerberos$ klist
Ticket cache: FILE:/opt/airflow/custom/conf/kerberos/airflow_krb5_ai_ccache
Default principal: ai@EXAMPLE.COM
Valid starting       Expires              Service principal
10/28/22 10:41:20    10/29/22 10:41:20    krbtgt/EXAMPLE.COM@EXAMPLE.COM
        renew until 11/04/22 10:41:20
```

Any pointers would be appreciated, thanks!
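The error message itself states the renewability check: a ticket cannot be renewed when its "renew until" date equals its "Valid starting" date. A small sketch that applies that check to klist output like the above (pure string parsing; the sample text is pasted from the session):

```python
import re

# klist output pasted from the container session above.
klist_output = """\
Default principal: ai@EXAMPLE.COM
Valid starting       Expires              Service principal
10/28/22 10:41:20    10/29/22 10:41:20    krbtgt/EXAMPLE.COM@EXAMPLE.COM
        renew until 11/04/22 10:41:20
"""

DATE = r"\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}"

def is_renewable(output: str) -> bool:
    """Renewable iff 'renew until' differs from the first ('Valid starting') timestamp."""
    start = re.search(DATE, output)                 # first timestamp on the ticket line
    renew = re.search(rf"renew until ({DATE})", output)
    if start is None or renew is None:
        return False                                # no 'renew until' line: not renewable
    return renew.group(1) != start.group(0)

print(is_renewable(klist_output))  # True for the ticket above
```

Since the ticket above passes this check, the "Incorrect net address" failure is not a renewability problem; that narrows it down to the addresses embedded in the ticket versus the address the container renews from.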
Replies: 4 comments
-
Per https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#networking:

```yaml
services:
  airflow-worker:
    extra_hosts:
      - "host.docker.internal:host-gateway"
```

That approach cannot add the cluster host entries, so I write them explicitly:

```yaml
services:
  airflow-worker:
    command: celery worker
    environment:
    extra_hosts:
      # - "host:IP"
```
-
What unit is that? Is `3600m` equal to 3600 minutes?
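For the `-r 3600m` seen in the log: `kinit` takes MIT Kerberos duration strings, where a trailing `m` means minutes, so `3600m` is 3600 minutes (60 hours). A tiny parser for the common `NdNhNmNs` form, as a sketch (it does not cover the full krb5 duration syntax):

```python
import re

# Seconds per unit in MIT krb5's "NdNhNmNs" duration form.
UNITS = {"d": 86400, "h": 3600, "m": 60, "s": 1}

def parse_krb5_duration(text: str) -> int:
    """Parse durations like '3600m', '7d', or '1d12h' into seconds."""
    matches = re.findall(r"(\d+)([dhms])", text)
    if not matches:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(int(n) * UNITS[u] for n, u in matches)

print(parse_krb5_duration("3600m") // 3600)  # 60 (hours)
```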
-
I think you should start with studying the documentation (and if you cannot find things in the docs, go to the code and search for "kerberos"; looking at the code might help you). Since very few of our users use Kerberos, the documentation there is sparse, but it is there, and you will find answers to some of the questions.

BTW, since this is open-source software and it is the users who help make our documentation better, I would really appreciate it (as a kind of "pay-back" for the absolutely free software you get) if, once you find answers to those questions, you update the docs with what you think is missing or might be helpful to others using Kerberos. It is very simple: the "Suggest a change on this Page" button opens a PR and you can add docs directly.
-
Yes, of course. Once the task is complete, I will write documentation on integrating Airflow with Kerberos.