
[BUG] user_tools fails to pick up hadoop_conf_dir when reading from hdfs #1302

Closed
Tracked by #1304
kuhushukla opened this issue Aug 20, 2024 · 1 comment · Fixed by #1308
Labels: bug (Something isn't working), user_tools (Scope the wrapper module running CSP, QualX, and reports (python))
kuhushukla (Collaborator) commented Aug 20, 2024

Describe the bug
The Python qualification tool does not honor `hadoop_conf_dir` when it is set, and instead uses default configuration values to read from the HDFS location. This can cause failures, depending on which non-default configs the cluster uses.
Steps/Code to reproduce bug

spark_rapids qualification --verbose --eventlogs=hdfs://nn:8020/spark2-history/application_1234_5678 --platform=onprem

If core-site.xml configures the cluster authentication as kerberos, this command will fail because the tool picks up the default value (simple) instead of the appropriate confs from the directory pointed to by the environment variable.
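For context, the relevant setting uses the standard Hadoop property `hadoop.security.authentication` (whose default is `simple`); an illustrative core-site.xml fragment:

```xml
<!-- Illustrative core-site.xml fragment; only the authentication property is shown -->
<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
</configuration>
```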
Expected behavior
HADOOP_CONF_DIR should be passed by the Python tool to the jar on the classpath:
-cp <jars>:<tools_jar>:$HADOOP_CONF_DIR/*
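A minimal sketch of how the Python wrapper could build such a classpath. This is a hypothetical helper for illustration, not the actual user_tools code; it appends the conf directory itself when the env var points at a valid local directory, so that core-site.xml and friends are visible as classpath resources:

```python
import os

def build_classpath(jars, tools_jar):
    """Build the java -cp string, appending $HADOOP_CONF_DIR when set.

    Hypothetical helper for illustration; not the merged implementation.
    """
    parts = list(jars) + [tools_jar]
    conf_dir = os.environ.get("HADOOP_CONF_DIR")
    if conf_dir and os.path.isdir(conf_dir):
        # Append the directory itself so the JVM can load the
        # *-site.xml files in it as classpath resources.
        parts.append(conf_dir)
    return os.pathsep.join(parts)
```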

Environment details (please complete the following information)
HDFS with some non-default cluster config like kerberos authentication (as an example)

Additional context
This issue would help catch such bugs earlier.

@kuhushukla kuhushukla added bug Something isn't working ? - Needs Triage labels Aug 20, 2024
tgravescs (Collaborator) commented:
When HDFS is the filesystem, the Java tool's output will default to HDFS, but the Python tool still writes to the local filesystem. This can be confusing to users.

@amahussein amahussein self-assigned this Aug 20, 2024
@amahussein amahussein added user_tools Scope the wrapper module running CSP, QualX, and reports (python) and removed ? - Needs Triage labels Aug 20, 2024
amahussein added a commit to amahussein/spark-rapids-tools that referenced this issue Aug 21, 2024
Signed-off-by: Ahmed Hussein <ahussein@nvidia.com>

Fixes NVIDIA#1253
Fixes NVIDIA#1302

This change includes the following:

- The python wrapper pulls the hadoop configuration directory from the `$HADOOP_CONF_DIR` env var. If the latter is not defined, the wrapper falls back to `$HADOOP_HOME/etc/hadoop`.
- If the resulting `hadoop_conf_dir` is defined, it is appended to the java CLASSPATH iff it is a valid local directory path.
- If none of the above applies, the classpath is left unchanged.
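The resolution steps above can be sketched as follows. This is an illustrative function (name and structure are assumptions, not the merged code): prefer `$HADOOP_CONF_DIR`, fall back to `$HADOOP_HOME/etc/hadoop`, and return nothing unless the result is a valid local directory:

```python
import os

def resolve_hadoop_conf_dir():
    """Resolve the hadoop config directory per the fix description.

    Illustrative sketch, not the merged implementation:
    1. Prefer $HADOOP_CONF_DIR.
    2. Otherwise try $HADOOP_HOME/etc/hadoop.
    3. Return None unless the candidate is a valid local directory.
    """
    conf_dir = os.environ.get("HADOOP_CONF_DIR")
    if not conf_dir:
        hadoop_home = os.environ.get("HADOOP_HOME")
        if hadoop_home:
            conf_dir = os.path.join(hadoop_home, "etc", "hadoop")
    if conf_dir and os.path.isdir(conf_dir):
        return conf_dir
    return None
```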