
[BUG] user_tools fails to pick up hadoop_conf_dir when reading from hdfs #1302

Closed
Tracked by #1304
kuhushukla opened this issue Aug 20, 2024 · 1 comment · Fixed by #1308
Labels: bug (Something isn't working), user_tools (Scope the wrapper module running CSP, QualX, and reports (python))
kuhushukla (Collaborator) commented Aug 20, 2024

Describe the bug
The Python qualification tool does not honor `hadoop_conf_dir` when it is set, and instead uses default configuration values to read from the HDFS location. This can cause failures, depending on which non-default configs the cluster uses.
Steps/Code to reproduce bug

spark_rapids qualification --verbose --eventlogs=hdfs://nn:8020/spark2-history/application_1234_5678 --platform=onprem

If core-site.xml configures the cluster authentication as kerberos, this command will fail because the tool picks up the default value (simple) instead of the appropriate confs from the directory pointed to by the environment variable.
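For context, the relevant setting uses the standard Hadoop property `hadoop.security.authentication` (whose default is `simple`); an illustrative core-site.xml fragment:

```xml
<!-- Illustrative core-site.xml fragment; only the authentication property is shown -->
<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
</configuration>
```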
Expected behavior
HADOOP_CONF_DIR should be passed by the Python tool to the jar on the classpath:
-cp <jars>:<tools_jar>:$HADOOP_CONF_DIR/*
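A minimal sketch of how the Python wrapper could build such a classpath. This is a hypothetical helper for illustration, not the actual user_tools code; it appends the conf directory itself when the env var points at a valid local directory, so that core-site.xml and friends are visible as classpath resources:

```python
import os

def build_classpath(jars, tools_jar):
    """Build the java -cp string, appending $HADOOP_CONF_DIR when set.

    Hypothetical helper for illustration; not the merged implementation.
    """
    parts = list(jars) + [tools_jar]
    conf_dir = os.environ.get("HADOOP_CONF_DIR")
    if conf_dir and os.path.isdir(conf_dir):
        # Append the directory itself so the JVM can load the
        # *-site.xml files in it as classpath resources.
        parts.append(conf_dir)
    return os.pathsep.join(parts)
```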

Environment details (please complete the following information)
HDFS with some non-default cluster config like kerberos authentication (as an example)

Additional context
This issue would help catch such bugs earlier.

@kuhushukla kuhushukla added bug Something isn't working ? - Needs Triage labels Aug 20, 2024
tgravescs (Collaborator) commented:
When HDFS is the filesystem, the Java tool's output will default to HDFS, but the Python tool still writes to the local filesystem. This can be confusing to users.

@amahussein amahussein self-assigned this Aug 20, 2024
@amahussein amahussein added user_tools Scope the wrapper module running CSP, QualX, and reports (python) and removed ? - Needs Triage labels Aug 20, 2024
amahussein added a commit to amahussein/spark-rapids-tools that referenced this issue Aug 21, 2024
Signed-off-by: Ahmed Hussein <ahussein@nvidia.com>

Fixes NVIDIA#1253
Fixes NVIDIA#1302

This change includes the following:

- The python wrapper pulls the hadoop configuration directory from the `$HADOOP_CONF_DIR` env var. If the latter is not defined, the wrapper falls back to `$HADOOP_HOME/etc/hadoop`.
- If the resulting `hadoop_conf_dir` is defined, it is appended to the java CLASSPATH iff it is a valid local directory path.
- If none of the above applies, the classpath is left unchanged.
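The resolution steps above can be sketched as follows. This is an illustrative function (name and structure are assumptions, not the merged code): prefer `$HADOOP_CONF_DIR`, fall back to `$HADOOP_HOME/etc/hadoop`, and return nothing unless the result is a valid local directory:

```python
import os

def resolve_hadoop_conf_dir():
    """Resolve the hadoop config directory per the fix description.

    Illustrative sketch, not the merged implementation:
    1. Prefer $HADOOP_CONF_DIR.
    2. Otherwise try $HADOOP_HOME/etc/hadoop.
    3. Return None unless the candidate is a valid local directory.
    """
    conf_dir = os.environ.get("HADOOP_CONF_DIR")
    if not conf_dir:
        hadoop_home = os.environ.get("HADOOP_HOME")
        if hadoop_home:
            conf_dir = os.path.join(hadoop_home, "etc", "hadoop")
    if conf_dir and os.path.isdir(conf_dir):
        return conf_dir
    return None
```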