Expand user group to XArray mailing list #130
@mrocklin - how hard would it be to add some documentation on the JupyterHub landing page? I have shown our deployment to a number of people and, almost universally, I get questions like: "what is this?" or "how does this differ from my JupyterHub?". It would be great if, up front, we could tell people that this is running on GCP, that it comes from the pangeo project (with a link to our website), and explain how this deployment is different (autoscaling, integration with distributed).
I don't know, but I agree that it would be a good idea.
Perhaps @yuvipanda can comment on the potential to do this.
Great points @jhamman. I think it would be great to link to our other project sites and acknowledge our funders on the landing page.
To @mrocklin's list I would add:
Having fine-grained and long-term timeseries could be very valuable in the long run (e.g. when it's time to get more funding).
It has been about a month. Checking in on this. It looks like @jhamman has made progress on adding documentation to the landing page, but most of the other things mentioned have not happened.
I've added a couple of examples here for this item, and I would encourage others to add more; this can be done by submitting a PR.
@rabernat's desire to have fine-grained metrics and long-term timeseries also has not happened. Do we go ahead and expand to the xarray mailing list anyway?
I'm fine with publicizing this as is. I have two talks in the next few weeks where I'll be using or mentioning this deployment, so it would be nice to keep these things moving to whatever extent possible.
It looks like there is already a lot of kubernetes log info dumped into stackdriver (see the GCE docs on kubernetes logging). Browsing through the logs, I can see lots of details on how individual users create and delete different resources. This should be sufficient to retroactively create the metrics we want at a later date, so I am satisfied in the sense that the data is fundamentally there; we just haven't analyzed it yet.
Are we concerned at all about security? What about RBAC (discussion in #167)? I am somewhat paranoid that a user could easily, and even accidentally, delete data that I have worked very hard to upload (see e.g. #150, #166). If @mrocklin and @jacobtomlinson feel that these are not serious concerns, I will defer to their expertise.
I also have a few more example notebooks that I would really like to add to the default docker image. Is the current notebook dockerfile in this repo up to date? If so, I will add my examples today and rebuild the docker image.
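For anyone who wants to poke at those logs programmatically rather than browsing the console, a minimal sketch with the google-cloud-logging Python client might look like the following. The project id and the resource-type filter are assumptions and would need to match our actual GCP project and GKE logging configuration.

```python
from google.cloud import logging

# Hypothetical project id; replace with the project backing pangeo.pydata.org.
client = logging.Client(project="my-gcp-project")

# Stackdriver advanced-filter syntax; the resource type for GKE container logs
# may differ depending on the cluster's logging setup.
log_filter = 'resource.type="container" AND timestamp>="2018-03-01T00:00:00Z"'

for entry in client.list_entries(filter_=log_filter, order_by=logging.DESCENDING):
    # Each entry carries a timestamp and a payload that could be aggregated
    # into per-user pod creation/deletion metrics after the fact.
    print(entry.timestamp, entry.payload)
```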
I'm concerned about security, but more from a misuse-of-resources perspective than a delete-data perspective. Data is stored on GCS, which is separate from our kubernetes deployment here. You can (and should) view permissions on GCS by going to the cloud console and navigating to the storage tab. I don't think that people we don't know can delete data there, but you shouldn't trust me on this.
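Checking this doesn't strictly require the console; a rough sketch with the google-cloud-storage client is below. The project id and bucket name are hypothetical placeholders.

```python
from google.cloud import storage

client = storage.Client(project="my-gcp-project")  # assumed project id
bucket = client.bucket("pangeo-data")               # hypothetical bucket name

# The IAM policy maps roles (e.g. roles/storage.objectAdmin) to their members,
# which is what actually determines who can write or delete objects.
policy = bucket.get_iam_policy()
for role, members in policy.items():
    print(role, sorted(members))
```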
Yeah, this would be great. Unfortunately it looks like no one is doing it.
I am not an expert on cloud deployments and you should not trust me. I do not accept this responsibility and instead push it back up to you as PI.
I'll be rebuilding the image sometime in the next day or two. If you just push things to the …
@tjcrone: would it be feasible for you to apply what you have learned about RBAC to pangeo.pydata.org? It sounds like this is an important security feature we should have enabled, but few people around here have the necessary expertise to make it work.
I have some large concerns about the security of the current platform. The two main issues are down to RBAC not being enabled and the notebooks being run in privileged containers. If data on GCS is writable by any single user on the cluster, then it is technically writable by anyone via privilege escalation. It would also be reasonably trivial to begin crypto mining or do other things on this platform; this is unavoidable, as the whole platform is intended to allow people to execute arbitrary code. Perhaps some monitoring of resources would be useful so maintainers can be notified of large-scale usage.
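For concreteness, enabling RBAC mostly comes down to granting narrowly scoped roles instead of implicit cluster-wide access. Below is a minimal sketch using the official kubernetes Python client; the namespace, role name, and rules are placeholders, not what pangeo.pydata.org actually needs.

```python
from kubernetes import client, config

config.load_kube_config()
rbac = client.RbacAuthorizationV1Api()

# Placeholder names and rules: a Role that only allows reading pods and their
# logs within a single namespace.
role = client.V1Role(
    metadata=client.V1ObjectMeta(name="notebook-reader", namespace="pangeo"),
    rules=[
        client.V1PolicyRule(
            api_groups=[""],
            resources=["pods", "pods/log"],
            verbs=["get", "list", "watch"],
        )
    ],
)
rbac.create_namespaced_role(namespace="pangeo", body=role)
```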
@jacobtomlinson I hear you! The problem is that we are having a hard time finding someone with the necessary expertise to actually fix these issues. Is there any chance you would have some time to take on the RBAC issue? It sounds like you have already implemented this in your own deployments, so perhaps it would not be too heavy a burden to enable it for pangeo.pydata.org. We would be sincerely grateful.
As for crypto mining and other misuse, I'm slightly less concerned about that right now. How does mybinder deal with that question? Hopefully we can eventually find a way to use the stackdriver logs for both retroactive analysis and real-time monitoring of resource usage.
We need to resolve #176 before we can share with the XArray mailing list. It looks like this will require a new version of gcsfs.
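For context, the pattern that #176 affects is roughly the following; the bucket path and anonymous access are assumptions for illustration.

```python
import gcsfs
import xarray as xr

# Hypothetical public bucket/path; token="anon" assumes anonymous read access.
fs = gcsfs.GCSFileSystem(token="anon")
store = gcsfs.mapping.GCSMap("pangeo-data/example.zarr", gcs=fs, check=False)

# Open the zarr store lazily as an xarray Dataset.
ds = xr.open_zarr(store)
print(ds)
```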
We're already running on a branch. This just requires someone to make a PR with an appropriate fix. It would be good to get more hands in that project if anyone is around. This sounds like a perfect first-PR opportunity.
Currently, you should assume that anyone with any access to this hub can impersonate anyone else (covertly or not) and do anything they want with it. As others have already mentioned, this is primarily because RBAC is not enabled and the notebooks run in privileged containers.
I'm unfortunately swamped with a course launch happening in the first week of April here in Berkeley, and will not be able to help at least until sometime after mid-April :( I am happy to answer specific questions people have in the meantime though!
People don't seem to be using fuse at the moment, so we can tear that down. Hopefully, with @tjcrone's RBAC implementation, that might resolve all of the critical issues?
I have been using fuse recently to provide read access to non-zarr file types. I may not be following, but what is the current motivation for tearing down fuse?
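To make the trade-off concrete, here is a rough sketch of the two ways a non-zarr file can be read; the mount path, bucket, and file name are made up, and the non-FUSE route assumes the h5netcdf engine is installed.

```python
import gcsfs
import xarray as xr

# With FUSE: the bucket appears as an ordinary filesystem path in the pod.
ds_fuse = xr.open_dataset("/gcs/pangeo-data/example.nc")

# Without FUSE: stream the file through a gcsfs file-like object instead.
fs = gcsfs.GCSFileSystem(token="anon")
with fs.open("pangeo-data/example.nc", "rb") as f:
    # Load eagerly so the data is in memory before the file handle closes.
    ds = xr.open_dataset(f, engine="h5netcdf").load()
```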
As we currently have it configured, FUSE is implemented with elevated permissions on the docker containers. Tearing it down would increase security. We could also mount FUSE using Kubernetes FlexVolumes; I think that the UK Met Office solution does this currently with S3.
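For readers less familiar with the kubernetes side, "elevated permissions" here means the notebook containers carry a security context along these lines. This is an illustration of the general pattern, not our exact pod spec.

```python
from kubernetes import client

# Illustrative only: FUSE mounts inside a container typically need a privileged
# security context (or at least the SYS_ADMIN capability). The FlexVolume
# approach moves the mount to the node so notebook pods can drop this.
fuse_security_context = client.V1SecurityContext(
    privileged=True,
    capabilities=client.V1Capabilities(add=["SYS_ADMIN"]),
)
```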
I've been in and out of the office the last few days for personal reasons. But it looks like @tjcrone has done a good job of RBAC stuff in #172. We don't use privileged containers and we give FUSE access using our S3 FUSE flex volume drivers (https://github.com/met-office-lab/s3-fuse-flex-volume). These are based on @yuvipanda's NFS flex volume driver (https://github.com/yuvipanda/nfs-flex-volume). It does make some assumptions about the cluster (it's running ubuntu, for example) and it installs packages on the hosts; I'm unsure whether that's possible on your GCE clusters. If it is possible, then we could definitely adapt this package to run on your cluster. If the use of golang puts you off, that is also easy to change to something else (but that thing needs to be available on the host).
My guess is that people will not be picky :) We're sufficiently constrained by expertise and hours that language preferences seem extravagant :)
Where do we stand on this? Does FUSE still work with #172 merged? Once we have the cluster re-deployed with RBAC, I think we are ready to share with the xarray mailing list.
According to @jacobtomlinson, FUSE still represents a serious security issue. @jhamman still finds fuse useful. I see three options:
1. Someone implements FUSE more cleanly using FlexVolumes. UK Met has a fairly clean example doing this that they use in production.
2. We remove FUSE.
3. We publish anyway.
Option 1 is obviously the best choice. Who else might be qualified to do that? We are already leaning on @jacobtomlinson pretty hard...
I don't know if I will have time to look at implementing this for GCE, but I would be keen for my flex volume driver to be expanded and made more generic. There is no reason why it has to be S3-specific; we could make it an "object store FUSE flex driver" instead. We could add more FUSE applications and create drivers for them; for example, we could add gcsfuse. This requires changes to two sections.
The first installs system requirements on the kubernetes nodes themselves. This is currently done here using a privileged container and assumes the node is running Debian. This could be abstracted out into a custom container with logic for …
The second step builds and drops the (currently two) flex volume drivers onto the node. The drivers are golang cli applications which wrap the FUSE cli applications to conform to the Kubernetes flex volume API. They are packaged as a docker image, which builds the binaries when you build the docker image and then copies them to a volume when you run the container. It is reasonably straightforward to copy one of the existing go applications and adapt it to the gcsfuse command line application. The reason the drivers are written in golang is portability; however, you could probably write a short script which achieves the same task (see the sketch below). The only blocker is that it requires some json munging, which would probably require …
Once you have installed the helm chart, and therefore the FUSE applications and drivers, you can use them (or not) in any pod on your cluster. So an AWS cluster would have the gcsfuse driver available; you just wouldn't use it on your Pangeo, as you will get charged for data transit. And vice versa for GCE clusters.
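To illustrate the "short script" idea while sidestepping the json-munging problem, here is a rough sketch of what a FlexVolume driver wrapping gcsfuse could look like in Python. The option names and status strings are assumptions about the flex volume call convention, and this is not code from the Met Office driver.

```python
#!/usr/bin/env python
"""Sketch of a FlexVolume driver wrapping gcsfuse (illustrative only)."""
import json
import subprocess
import sys


def respond(payload):
    # FlexVolume drivers report results to the kubelet as JSON on stdout.
    print(json.dumps(payload))


def main():
    cmd = sys.argv[1]
    if cmd == "init":
        # No separate attach/detach step is needed for a FUSE mount.
        respond({"status": "Success", "capabilities": {"attach": False}})
    elif cmd == "mount":
        mount_dir, options = sys.argv[2], json.loads(sys.argv[3])
        bucket = options["bucket"]  # assumed option name from the volume spec
        subprocess.check_call(["gcsfuse", bucket, mount_dir])
        respond({"status": "Success"})
    elif cmd == "unmount":
        subprocess.check_call(["fusermount", "-u", sys.argv[2]])
        respond({"status": "Success"})
    else:
        respond({"status": "Not supported"})


if __name__ == "__main__":
    main()
```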
#190 has a way forward on the FUSE issue. I no longer use the NFS Flex Volume I built; I instead use the approach mentioned in that issue.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it had not seen recent activity. The issue can always be reopened at a later date. |
So far, knowledge of the pangeo.pydata.org website has mostly spread socially. We might also push it out more deliberately once we feel we're ready. I suspect that the next logical group would be the XArray mailing list. What do we want to accomplish before we are comfortable with this?
Some options:
How much do we care about these? What are other possible blockers that we might care about before releasing to the XArray mailing list?