[FEEDBACK] Request for response to contributor/user survey #685

Closed
ShubhamGupta29 opened this issue Apr 29, 2020 · 13 comments
@ShubhamGupta29
Contributor

We are increasing focus on Dr. Elephant and the community of contributors and users. Our immediate goals are to support the latest versions of Spark. Over the last year, we have noticed that Dr. Elephant is being used more in cloud platforms like AWS and Azure. The following issues track these efforts:

  1. Support for Spark 2.3/2.4 in Dr. Elephant
  2. Support for Hadoop 3
  3. Installation instructions for AWS EMR and Azure HDInsight

We also want to know from you how you are using Dr. Elephant as well as how Dr. Elephant and the community can be improved. Can you please respond to this survey? Your responses will help us prioritize features.

If we have missed something in the survey, please let us know in this issue or in the last question of the survey.

@astahlman @xglv1985 @mareksimunek @sri840 @tooptoop4

@ShubhamGupta29
Contributor Author

@mareksimunek @xglv1985 Thanks for filling out the survey. We have added one more question to it: do you prefer to install Dr. Elephant using Docker containers? Kindly share your preference regarding a Docker installation of Dr. Elephant.

@tooptoop4

@ShubhamGupta29 Will there be support for Spark standalone (no YARN)?

@xglv1985

xglv1985 commented May 6, 2020

> @mareksimunek @xglv1985 Thanks for filling out the survey. We have added one more question to it: do you prefer to install Dr. Elephant using Docker containers? Kindly share your preference regarding a Docker installation of Dr. Elephant.

Yes, I prefer Docker containers, especially when I have more than one YARN cluster to be tracked by Dr. Elephant. But I also hope the non-Docker Dr. Elephant can be kept, so that it stays compatible with the functionality of our online Dr. Elephant.

@mareksimunek

@ShubhamGupta29 Yes, Docker is the preferred way. Or an Ansible playbook.
How are you installing Dr. Elephant at LinkedIn?

@ShubhamGupta29
Contributor Author

> @ShubhamGupta29 Will there be support for Spark standalone (no YARN)?

@tooptoop4 Can you provide details about your use case? Using Dr. Elephant for standalone jobs seems like overkill.

@ShubhamGupta29
Contributor Author

> @ShubhamGupta29 Yes, Docker is the preferred way. Or an Ansible playbook.
> How are you installing Dr. Elephant at LinkedIn?

@mareksimunek We install it the same way as described in the documentation.

@tooptoop4

@ShubhamGupta29 I don't have EMR/Cloudera, just Hive/Spark/S3.

@shkhrgpt
Contributor

@ShubhamGupta29 Thanks for starting this thread. Improving Spark support is much needed. However, I have one concern: are we still going to have the customSHSWork branch? I am asking because changes made in customSHSWork are not useful to the rest of the community, since we don't have access to the custom Spark history server that LinkedIn uses.

@ShubhamGupta29
Contributor Author

> @ShubhamGupta29 Thanks for starting this thread. Improving Spark support is much needed. However, I have one concern: are we still going to have the customSHSWork branch? I am asking because changes made in customSHSWork are not useful to the rest of the community, since we don't have access to the custom Spark history server that LinkedIn uses.

For LinkedIn, we will continue to use customSHSWork. For open source, we will add all new changes to the master branch itself, since master will now be the branch in focus for all new development, keeping the open-source community in mind.

@ShubhamGupta29
Contributor Author

@shkhrgpt and @tooptoop4 There is a survey for feedback on Dr. Elephant. Kindly fill it out and list any features you would like to have in Dr. Elephant.
Survey: https://forms.gle/Fb956VQuyXREvfmM6

@ShubhamGupta29
Contributor Author

Closing this thread, as a similar thread is in place: #687

@ShubhamGupta29
Contributor Author

> @ShubhamGupta29 I don't have EMR/Cloudera, just Hive/Spark/S3.

Which RM (resource manager) are you using?
@tooptoop4 It would be better to mention this in the survey. You can also create a separate thread for it, so that we can understand the requirements well and provide the support you need on the Dr. Elephant side.

@theyaa

theyaa commented May 12, 2020

I am trying to integrate Dr. Elephant with HDP 3 (Hadoop 3 and Hive 3), and I am looking for the following features:

  • Support for Hadoop 3
  • Support for Spark 2.3, 2.4, and 3
  • Support for Hive 3, including LLAP
  • Support for YARN Timeline Server v2, including reading Hive query metadata from the Hive sys db, since Tez writes data to ATSv2 in HDP 3
  • Support for user authentication: connecting to AD, PAM, or others
