Consider what to do with nbgrader support #174

Closed
yuvipanda opened this issue Sep 19, 2017 · 22 comments

Comments

@yuvipanda
Collaborator

Moving jupyterhub/helm-chart#47 here.

Lots of people want nbgrader support, but nbgrader requires a filesystem shared among all users in order to function. Per jupyterhub/helm-chart#47 (comment), I think it'll have to wait for hubshare to happen.

@willingc
Collaborator

For now (at least until Jess resurfaces), let's put this in a holding pattern.

@labarba

labarba commented Sep 25, 2017

@yuvipanda The links above throw 404. Did you move the comments from that closed issue somewhere? I wanted to review the problem description and discussion.

@yuvipanda
Collaborator Author

@labarba apparently closing issues makes things 404 :( I've enabled them back again, and the link works!

@Sefriol

Sefriol commented May 22, 2018

We are currently trying to implement a Kubernetes + JupyterHub + nbgrader combo at our university as a test run for a course starting this fall. Are there guides on how to integrate NFS into this, if it is required to make nbgrader work?
Or have any elegant solutions appeared since this was originally posted? HubShare seems to be at a stalemate.

@vishwesh5

Hi! Is there any update regarding this? If we manage to set up NFS (#421), can nbgrader be used?

@Sefriol

Sefriol commented Aug 2, 2018

We have a WIP prototype of the system (Kubernetes + JupyterHub + nbgrader) with NFS and it "works", but you really have to put in some extra hours to iron out the bugs.

Most of our problems relate to access rights, and there are some security concerns as well. We are going to run a big course during the fall, where this system would be a key part. No idea if we will manage to get it "ready" before then, but currently we are able to execute the basic nbgrader functionality.

So if you have the time to debug nbgrader and put in the extra work to get it done... Sure, it works.

@vishwesh5

Thanks, Sefriol, for your reply. Can you please detail how you managed to get it up and working? Which functionality are you talking about? To give you an idea, I just need the basics:

  1. Auto and manual grading should be there.
  2. It should read a student database and release and collect the assignments.

That's it.

@betatim
Member

betatim commented Aug 3, 2018

@Sefriol can you link to/share your setup? I think with these things that aren't part of the official guide it would be good to share links so those that are interested can share their experience. The hope being that out of this a set of instructions/guide will emerge that we can link to from the Zero2JupyterHub guide.

@Sefriol

Sefriol commented Aug 3, 2018

We are aiming at running the course from September to October, so maybe I can share some resources by then.

Most of the required changes were configured by our university's IT department, so I wasn't part of that process. All in all, when I created some test cases, we were able to run all the common features provided by nbgrader.

Since it's still WIP, there are some hacks that are specific to our setup, but when we have a "finished product" (and I would use finished very loosely here), I'm pretty sure we can share our experience and changes we did. I'll mention this conversation and interest to our IT department. I think they will be thrilled to share it when most of the work is done.

@rkdarst

rkdarst commented Aug 3, 2018

@murhum1 and I are the ones setting up the system as part of Aalto University's CS-IT group. I'll try to put our internal repo public soon... we've almost split out all the secrets. We'll try to describe it some:

Our goal is a secure, multi-course system (usable for any number of courses) which integrates into existing university systems. By the way, we are using kubernetes, but not zero-to-jupyterhub. I have a feeling that our requirement to integrate with an existing university framework means that there are lots of things that can't be easily standardized for everyone.

Overall, the base has been really good and most of the pieces are there. If you know how NFS, k8s, kubespawner, and the nbgrader common directories work, most things are quite straightforward.

However, there are certain small things which make small details very, very difficult and I am making PRs to address them whenever I can. If you jupyter people want to make this even easier, I can make more PRs even for things which are easy to hack around.

Our initial state / prerequisites:

  • existing NFS server provided by others. Meaningful uid/gids which we want to match with university systems. Users should have ability to mount their files separately from our system.
  • In particular, one gid per course which is used to share files among instructors.
  • existing user ids and infrastructure to integrate with.
  • Ideally, users should be able to mount their own data via NFS on their own computers. Our NFS server gives us this for free if we can make the uids match up properly.
  • Multi-course capable: can easily scale to any number of courses with minimal manual effort. Also allows for generic instances not tied to any course.

Our setup

  • PAM authentication (join hub server to AD domain to make this possible) (easy)
  • collection of yaml files which serve to define courses (easy)
    • instructors defined by username. Anyone can run the container as a student. If an instructor tries to start an image, they go into instructor mode (main differences are mounting /course and setting primary gid)
    • pre spawn hook can do extensive per-course customization based on this.
    • Dynamically create the profile list.
  • The kubespawner profile list is used to select a course. (easy)
  • changes to docker singleuser image to make uid/gid handling better: "Update group handling: set primary gid, leave suplemental with group users" jupyter/docker-stacks#687 (make primary gids match what we want) and "Add a posibliity for a pre-start hook" jupyter/docker-stacks#688 (pre-start hook) (easy with changes, hard without)
  • Extensive use of a pre-spawn hook
    • set uid to the user's university uid. For instructors only, set primary gid to the course's gid. (so far I can do this either for root running or non-root running.) (This is relatively straightforward using either pure k8s or NB_UID support in the docker images, but required fixes to make the primary gid set correctly, see link above.) (easy)
    • NFS mount per-course /course only for instructors. (all NFS mounts are straightforward using k8s). (easy)
    • NFS mount per-course /srv/nbgrader/exchange for everyone because this is the default path. (easy)
    • NFS mount per-user /notebooks to the NFS user home directory (shared among all a user's courses). Add some hacks to make this the starting directory for shell and notebooks (notebook_dir=/ and default_url=/tree/notebooks) while giving filesystem access to the whole system. (The home directory is not shared among instances). (mounting easy, config easy)
    • mount per-course /coursedata directory from NFS if it exists for the course. (easy)
    • hook to create user's initial notebooks directory. (easy)
    • disable formgrader for students. (medium, could be improved upstream but hardly worth it).
    • Write a custom /etc/jupyter/nbgrader_config.py. This is minimal because we can make everything use defaults. We have to specify CourseDirectory=/course and course_id=course_slug (even though by design courses will never conflict). I have to add some groupshared=True options to enable my multiuser nbgrader changes (see below)
  • set umask to 0007 for instructors via different hacks (hard, can be improved upstream)
  • pre-create gradebook.db and chmod 664. sqlite hardcodes 644 when creating new files and this can't be changed (though there are some fancy URI arguments which might be able to do something....) (hard)
  • Custom nbgrader changes so that instructors can share files after all the above has been taken care of: Allow instructors to share files via shared group id jupyter/nbgrader#1000 (hard, can be improved upstream)
  • SMB/NFS mounting so that students/instructors can access their data from their own computers easily.
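
For readers who want a concrete picture of the pre-spawn hook items above, here is a minimal sketch, not our actual code. The course table, uid lookup, NFS host name, export paths, and the `course_slug` default argument are all hypothetical placeholders; `uid`, `gid`, `volumes`, and `volume_mounts` are the KubeSpawner attributes the hook sets. A stand-in object replaces the real spawner so the logic can be tried outside a hub.

```python
from types import SimpleNamespace

# Hypothetical course table; in the setup described above this information
# would come from the per-course YAML files.
COURSES = {
    "testcourse": {"gid": 70000, "instructors": {"alice"}},
}

# Hypothetical uid lookup; in practice this would come from the university
# directory so that uids match the NFS server.
UIDS = {"alice": 1001, "bob": 1002}

NFS_SERVER = "nfs.example.org"  # assumption: placeholder host name


def nfs_volume(name, path):
    """Return a Kubernetes NFS volume definition as a plain dict."""
    return {"name": name, "nfs": {"server": NFS_SERVER, "path": path}}


def pre_spawn_hook(spawner, course_slug="testcourse"):
    """Set uid/gid and NFS mounts on a KubeSpawner-like object."""
    username = spawner.user.name
    course = COURSES[course_slug]
    is_instructor = username in course["instructors"]

    # Everyone runs with their own university uid.
    spawner.uid = UIDS[username]

    # Everyone gets the course exchange and their own notebooks directory.
    volumes = [
        nfs_volume("exchange", f"/export/{course_slug}/exchange"),
        nfs_volume("notebooks", f"/export/home/{username}"),
    ]
    mounts = [
        {"name": "exchange", "mountPath": "/srv/nbgrader/exchange"},
        {"name": "notebooks", "mountPath": "/notebooks"},
    ]

    # Instructors only: primary gid of the course, plus the /course mount.
    if is_instructor:
        spawner.gid = course["gid"]
        volumes.append(nfs_volume("course", f"/export/{course_slug}/course"))
        mounts.append({"name": "course", "mountPath": "/course"})

    spawner.volumes = volumes
    spawner.volume_mounts = mounts
    return spawner


# Stand-in for a real spawner, just to exercise the hook.
fake = SimpleNamespace(user=SimpleNamespace(name="alice"))
pre_spawn_hook(fake)
```

In a real `jupyterhub_config.py` the hook receives only the spawner and would be registered with something like `c.KubeSpawner.pre_spawn_hook = pre_spawn_hook`; the selected course would then have to be read from the chosen profile rather than from a default argument.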

Problems

I will update this comment in place as things change.

Please comment on which of these you would like me to contribute to upstream.
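
To illustrate the gradebook.db item in the list above: one way to get a group-writable database is to create the file before nbgrader ever touches it and then chmod it explicitly. The sketch below is an assumption about how that could be scripted, not our actual hook; the function name and path are hypothetical. It relies on the fact that an empty file is accepted by SQLite as an empty database.

```python
import os
import sqlite3
import stat
import tempfile


def precreate_gradebook(path, mode=0o664):
    """Create an empty, group-writable database file at `path`.

    Returns the resulting permission bits so the caller can check them.
    """
    # Touch the file; an empty file is a valid (empty) SQLite database.
    with open(path, "a"):
        pass
    # An explicit chmod is not affected by the process umask.
    os.chmod(path, mode)
    # Sanity check: SQLite can open the pre-created file.
    sqlite3.connect(path).close()
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    # Demo in a temporary directory instead of a real /course mount.
    demo_dir = tempfile.mkdtemp()
    demo_path = os.path.join(demo_dir, "gradebook.db")
    print(oct(precreate_gradebook(demo_path)))
```

Because the file already exists when SQLite first writes to it, the 644 default never applies and the 664 mode set here is kept.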

@willingc
Collaborator

willingc commented Aug 8, 2018

Thanks for the detailed writeup @rkdarst. @jhamrick I wasn't sure if you had seen this, but I figure that you would have some input/priority from an nbgrader viewpoint.

@rabernat

👍 to NBGrader + jupyterhub-k8s. This would be a killer app for so many data science courses.

I understand the technical limitations that prevent this from working today. Hopefully some of these issues can be solved by next semester or next year.

@consideRatio
Member

@rkdarst thank you for the writeup, it was great!

@rkdarst

rkdarst commented Aug 13, 2018

Hi,

Thanks for the good feedback! I made several updates above: several problems have been worked on, we provide access to files via SMB/NFS, and we want a script to return feedback.

From what you all know, has anyone made comparable nbgrader setups? I'm wondering how unique our setup is, and thus what our future actions should be. How should we try to contribute these things?

I guess our particular part should be moved to some other issue. What we are doing is too complex for typical zero-to-jupyterhub setups. A lot of our stuff could be used, but it would require careful design and planning by someone who has the big picture of z2jh.

@rabernat

At JupyterCon, @yuvipanda mentioned that some recent developments in terms of user home spaces now make it much easier to use nbgrader with jupyterhub-k8s. Can someone give us an update on the status of this?

@jhamrick

jhamrick commented Oct 6, 2018

@rkdarst Thank you so much for all your work on this! This is really, really awesome! I'm sorry I've been slow to get to looking at this, but I am going to try to look at all your PRs carefully over the next few days. As a whole though this sounds really great, and I'm excited to get your changes merged!

@rkdarst

rkdarst commented Oct 6, 2018 via email

@frouzbeh

Hi @rkdarst,
I've been looking for the steps to use PAM authentication and have not been successful. Would you please tell me how you did that?

"PAM authentication (join hub server to AD domain to make this possible) (easy) "

@moorepants

We have a Kubernetes-based JupyterHub deployment at UC Davis (on bare metal) and would like to get nbgrader running on the system for instructors to use. I have a team of four CS seniors who will spend about 40 collective hours a week for 5 months with the goal of developing a community-usable solution for this issue. They need some help getting oriented: what the current state of affairs is, what the issues are, and what solution paths people have considered. Any suggestions on how to move forward? They start this week :)

@consideRatio
Member

I have not set it up successfully myself, but my understanding is that the most challenging part is connecting the relevant storage with the relevant permissions, and the need to read from and write to this storage from multiple connected pods, all while not granting permissions to users who shouldn't have them.

I think that only describes the key challenge to prepare for.

@scivm

scivm commented Jan 16, 2020

I have been using jupyterhub for about 3 weeks now to deliver analytics workspace functionality to a client. I already had good kubernetes, helm and docker experience. I was able to put up the basic system in a few days after reading the docs, and then started on storage, ssl, and oauth integrations, which took another week. The documentation is about 90% there, which is understandable because users have on-prem, azure, aws and gce installations. I read every post and every issue from the last few years in the project, discourse and gitter chats.

They need a sandbox to play in and should try installing apps on kubernetes using helm3, including jupyterhub. They should also google nbgrader and read everything related to it.

The current zero to jupyterhub helm chart production release, 0.8.2, is old, and they should use 0.9.0-beta.2.

@manics
Member

manics commented Jun 10, 2020

Solved by #1556!

@manics manics closed this as completed Jun 10, 2020