Utilize the 'zoned' property in Linux #3159

Open
FransUrbo opened this issue Mar 7, 2015 · 18 comments
Labels
Type: Feature Feature request or new feature

Comments

@FransUrbo
Contributor

I've revitalized an idea I had about a year ago (or there about :) - I'm sure I had a discussion with @behlendorf about this, but I can't find that discussion...

I think the basic idea was to use "lightweight VMs" (i.e. "Linux Containers") for this. But trying to understand the zoned property is somewhat difficult, since I haven't used Solaris in years (and the *BSD jails are basically chroots, from what I understood when using them).

Creating a filesystem with zoned=on doesn't mount the filesystem automatically, nor can it be mounted:

# zfs create -o zoned=on rpool/test/test1
# zfs mount rpool/test/test1
cannot mount 'rpool/test/test1': dataset is exported to a local zone

On IRC, I was told that on Solaris, the container (or the Solaris equivalent) needs to do all the work.

[17:12:12] <d[^_^]b> http://docs.oracle.com/cd/E23824_01/html/821-1448/gayov.html#gbbre
[17:13:13] <d[^_^]b> FransUrbo: it is automatically set
[17:13:13] <FransUrbo> Thanx. Read the first paragraph and I already got a headache! Why,
           oh why, does some manpages NEED to be written in such ... discusting language1!?
[17:13:13] <d[^_^]b> ok here
[17:13:13] <d[^_^]b> The zoned property is a boolean value that is automatically turned on
           when a zone containing a ZFS dataset is first booted. A zone administrator does
           not need to manually turn on this property. If the zoned property is set, the
           dataset cannot be mounted or shared in the global zone. In the following
           example, tank/zone/zion has been delegated to a zone, while tank/zone/global
           has not:
[17:14:14] <d[^_^]b> basically you'd  need to get LXC or something to turn on the 'zoned'
           setting
[17:14:14] <d[^_^]b> it seems like it is a way for solaris to let ZFS know a filesystem is
           in-use by a zone so it can not be shared or mounted in the host
[17:14:14] <FransUrbo> Why?
[17:15:15] <FransUrbo> Then how is it accessed?
[17:15:15] <d[^_^]b> probably using a simple set ioctl
[17:15:15] <d[^_^]b> just like 'zfs set zoned=on pool/filesystem'
[17:15:15] <FransUrbo> But the filesystem MUST (?) be mounted (somewhere) to be accessed,
           right?
[17:16:16] <d[^_^]b> in solaris zones behave oddly, i'm sure it has direct hooks to access
           ZFS
[17:16:16] <d[^_^]b> it provides ZFS administration inside zones, too, i think
[17:16:16] <d[^_^]b> like how FreeBSD does in a jail - but i think freebsd has security
           pieces missing that solaris doesn't
[17:16:16] <d[^_^]b> in other words you can administer a single fileset in a solaris zone

But hacking every container software on Linux (and apparently there are a few: https://linuxcontainers.org/) isn't an option!

I haven't used Linux containers in years (ever since I realized they were a little too lightweight for my use and I started to investigate 'real' VMs), but I got a quick crash course from @DeHackEd and, from my understanding, combining that with a jail (like *BSD does it) seems to be a much more reasonable thing for Linux...

Searching for zoned in the issue tracker gives a couple of code matches, the most notable ones being:

https://github.com/zfsonlinux/zfs/blob/d14cfd83dae0b1a261667acd416dba17a98d15fa/module/zfs/zfs_ioctl.c#L440-447

and

https://github.com/zfsonlinux/zfs/blob/1e8db7710220332808920a582e5794d6fc37b109/lib/libzfs/libzfs_mount.c#L752-757

So the big question is: How to proceed with this?

Update: It seems like mounting it the same way as a legacy filesystem worked:

# mount -o zfsutil -t zfs rpool/test/test1 /mnt
# mount | grep test/test1
rpool/test/test1 on /mnt type zfs (rw,relatime,xattr,noacl)
@FransUrbo
Contributor Author

But it's still possible for a 'normal' user to both list and cd into the directory:

# mount | grep test/test1
rpool/test/test1 on /mnt type zfs (rw,relatime,xattr,noacl)
# id
uid=0(root) gid=0(root) groups=0(root)
# ls /mnt/
# touch /mnt/test_file
# ls /mnt/
test_file
# cd /mnt/
# ls
test_file
# cd
# su - turbo
$ id
uid=1000(turbo) gid=1000(turbo) groups=1000(turbo),24(cdrom),25(floppy),29(audio),30(dip),44(video),46(plugdev)
$ ls /mnt/
test_file
$ cd /mnt/
$ ls -l
total 1
-rw-r--r-- 1 root root 0 Mar  7 18:26 test_file

@DeHackEd
Contributor

DeHackEd commented Mar 7, 2015

Once again Linux finds itself in the situation where there isn't a single integrated piece of software for a feature. Containers are made by leveraging several features in tandem by what is effectively third party software. SMB and iSCSI sharing are in much the same boat (even though Samba is the de facto standard). As far as the kernel is concerned, containers are made on demand and mount namespaces are identified only by inode number (see /proc/self/ns/mnt on a sufficiently up-to-date kernel - the latest CentOS 6 is adequate).

From the man page and documentation it appears that from a Linux standpoint the behaviour would simply be, loosely,

extern struct mount_namespace *root_mnt;
int on_mount_filesystem(struct zfs_dataset *dataset) {
  /* A zoned dataset belongs to a container, so refuse it in the root namespace. */
  if (dataset->property_zoned == B_TRUE && current->mnt_ns == root_mnt) {
    return -EPERM;
  } else {
    // Do allow mount
    return 0;
  }
}

...which has the disadvantage that it doesn't specify WHICH zones allow mounting.
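
One way around that limitation, sketched here purely as an illustration, would be to record which namespace a dataset has been delegated to and compare it with the caller's. The struct, the may_mount() helper and the use of a namespace inode number as the identifier are all hypothetical; nothing like this exists in ZoL today.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-dataset state: 'zoned' plus the namespace it was delegated to. */
struct dataset_zone_info {
	bool     zoned;     /* value of the 'zoned' property */
	uint64_t owner_ns;  /* inode number of the owning namespace (assumption) */
};

/*
 * Return 0 if the caller may mount the dataset, -EPERM otherwise.
 * root_ns identifies the initial ("global") namespace.
 */
int
may_mount(const struct dataset_zone_info *ds, uint64_t caller_ns,
    uint64_t root_ns)
{
	if (ds->zoned) {
		/* Delegated datasets are mountable only by their owner. */
		if (caller_ns != root_ns && caller_ns == ds->owner_ns)
			return (0);
		return (-EPERM);
	}
	/* Non-delegated datasets stay with the root namespace. */
	return (caller_ns == root_ns ? 0 : -EPERM);
}
```

With something like this, a dataset delegated to namespace 42 could be mounted from namespace 42 but from neither the root namespace nor any other container.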

Furthermore under Linux it's typical that containers are restricted to mounting 'safe' filesystems (such as tmpfs, proc and sysfs) which may render this obsolete anyway.

@FransUrbo
Contributor Author

Digging through the code (and comparing with the illumos code), I noticed that ZoL is always in the global zone. This is because lib/libspl/zone.c:getzoneid() always returns the name/id of the global zone:

zoneid_t
getzoneid()
{
        return (GLOBAL_ZONEID);
}

zoneid_t
getzoneidbyname(const char *name)
{
        if (name == NULL)
                return (GLOBAL_ZONEID);

        if (strcmp(name, GLOBAL_ZONEID_NAME) == 0)
                return (GLOBAL_ZONEID);

        return (EINVAL);
}

whereas illumos makes a syscall in usr/src/lib/libc/port/sys/zone.c:

static zoneid_t
zone_lookup(const char *name)
{
        return ((zoneid_t)syscall(SYS_zone, ZONE_LOOKUP, name));
}

zoneid_t
getzoneid(void)
{
        return (zone_lookup(NULL));
}

zoneid_t
getzoneidbyname(const char *zonename)
{
        return (zone_lookup(zonename));
}

This does not seem portable, but on the other hand, I've been unable to figure out exactly what the zoned property DOES...

@FransUrbo
Contributor Author

It also seems like secpolicy_zfs() isn't implemented. That's the part in module/zfs/zfs_ioctl.c:zfs_dozonecheck_impl() that would refuse to let a non-root user enter the zone fs:

        if (INGLOBALZONE(curproc)) {
                /*
                 * If the fs is zoned, only root can access it from the
                 * global zone.
                 */
                if (secpolicy_zfs(cr) && zoned)
                        return (SET_ERROR(EPERM));
        } else {

In illumos this is defined in usr/src/uts/common/os/policy.c:

/*
 * secpolicy_zfs
 *
 * Determine if the subject has permission to manipulate ZFS datasets
 * (not pools).  Equivalent to the SYS_MOUNT privilege.
 */
int
secpolicy_zfs(const cred_t *cr)
{
        return (PRIV_POLICY(cr, PRIV_SYS_MOUNT, B_FALSE, EPERM, NULL));
}

None of the secpolicy_*() functions are implemented in ZoL at the moment. They're all defined as (0) in https://github.com/zfsonlinux/spl/blob/master/include/sys/policy.h.
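
On Linux the closest counterpart to PRIV_SYS_MOUNT is probably CAP_SYS_ADMIN, which mount(2) already requires. As a rough userspace sketch only (the function names are made up, and a kernel implementation would presumably call capable() instead of parsing /proc), the same yes/no answer could be derived from the effective capability mask:

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_CAP_SYS_ADMIN 21  /* value of CAP_SYS_ADMIN in <linux/capability.h> */

/* Read the effective capability mask ("CapEff:") of the calling process. */
int
read_cap_eff(uint64_t *mask)
{
	char line[256];
	FILE *fp = fopen("/proc/self/status", "r");

	if (fp == NULL)
		return (-1);
	while (fgets(line, sizeof (line), fp) != NULL) {
		if (sscanf(line, "CapEff: %" SCNx64, mask) == 1) {
			fclose(fp);
			return (0);
		}
	}
	fclose(fp);
	return (-1);
}

/* Same convention as illumos secpolicy_zfs(): 0 when permitted, EPERM otherwise. */
int
secpolicy_zfs_sketch(uint64_t cap_eff)
{
	return ((cap_eff & (UINT64_C(1) << SKETCH_CAP_SYS_ADMIN)) ? 0 : EPERM);
}
```

A caller would first fill the mask with read_cap_eff() and then pass it to secpolicy_zfs_sketch(); an unprivileged user normally has an empty effective set and so gets EPERM.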

@FransUrbo
Contributor Author

Depends on issue #228 (superseded by #434, which has a possible future fix in PR #619, closed as stale and in need of an update).

@FransUrbo
Contributor Author

Trying out the secpolicy branches (https://github.com/maxximino/spl/tree/secpolicy and https://github.com/maxximino/zfs/tree/secpolicy) did (of course, in retrospect) not help because of getzoneid() in ZoL being a 'dummy' function...

Anyone have an idea on how to create one that can detect global/local zone (without using the SYS_zone syscall that doesn't exist in Linux anyway)?
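
One possibility, offered only as a sketch: treat the initial mount namespace as the global zone and derive a zone id from the namespace inode number that @DeHackEd mentioned (/proc/self/ns/mnt). The function names and the choice of /proc/1/ns/mnt as the reference are assumptions; reading init's namespace link normally requires root, and an in-kernel version would more likely compare namespace pointers directly.

```c
#include <stdint.h>
#include <sys/stat.h>

#define SKETCH_GLOBAL_ZONEID 0

/* Fetch the inode number that identifies a namespace link such as /proc/self/ns/mnt. */
int
ns_inode(const char *path, uint64_t *ino)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return (-1);
	*ino = (uint64_t)st.st_ino;
	return (0);
}

/* Map a namespace inode to a zone id: the reference namespace counts as "global". */
int64_t
zoneid_from_ns(uint64_t self_ino, uint64_t global_ino)
{
	if (self_ino == global_ino)
		return (SKETCH_GLOBAL_ZONEID);
	return ((int64_t)self_ino);
}

/* Hypothetical replacement for the dummy getzoneid(); returns -1 on error. */
int64_t
getzoneid_sketch(void)
{
	uint64_t self_ino, init_ino;

	if (ns_inode("/proc/self/ns/mnt", &self_ino) != 0 ||
	    ns_inode("/proc/1/ns/mnt", &init_ino) != 0)
		return (-1);
	return (zoneid_from_ns(self_ino, init_ino));
}
```

An INGLOBALZONE() equivalent could then be written as getzoneid_sketch() == SKETCH_GLOBAL_ZONEID, at the cost of two stat() calls per check.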

@FransUrbo
Contributor Author

INGLOBALZONE() is also a dummy function in ZoL, defined as (1) in include/sys/zfs_context.h but in Illumos there's the added define in usr/src/uts/common/sys/zone.h:

/*
 * Is process in the global zone?
 */
#define INGLOBALZONE(p) \
        ((p)->p_zone == global_zone)

@behlendorf behlendorf added Type: Feature Feature request or new feature Difficulty - Medium labels Mar 9, 2015
@behlendorf
Contributor

@FransUrbo I think you've hit on most of the key issues here, let me just try and add a little more context. Basically the zone functionality for ZoL was never implemented because Linux doesn't provide a single direct analog for it. Instead, as @DeHackEd mentioned, there are a variety of solutions with differing functionality. At the time the most compatible and straightforward thing to do was to disable it entirely and place everything in the global zone. We could then revisit this issue another day.

If today is that day then let's take a step back and decide what problem we're trying to solve. Is it just a feeling that the zoned option should do something under Linux? And if so what exactly should that thing be? Are full blown containers what's needed? Maybe just tighter integration with private namespaces? What's the end goal here and how do we get there?

I think one big step in getting there would be to focus on finalizing the work in #434. Adding support for zfs allow would be a huge improvement in functionality. While people today can (and do) use sudo to accomplish some of this, proper support for delegations would be great. So personally, I'd start with this to lay the groundwork.

@FransUrbo
Contributor Author

Since I don't know exactly what zoned DOES, it's hard to really have an opinion on whether we should do anything about it. From discussions on IRC, FreeBSD doesn't even have that property. They have the jailed property instead. That might be more in line with how Linux works...

@lkateley

lkateley commented Mar 9, 2015

docker is similar

lk

On 3/9/15 3:03 PM, Kash Pande (Jentu) wrote:

FreeBSD jail is closer in its design to Solaris Zones - i'm not sure
Linux has this same idea of a namespace, and if it doesn't, this
property seems less useful. If there's some different namespace
implementations (LXC, OpenVZ, etc) we can probably have a configurator
to have 'zoned' behave different and look to the specified container
format in 'zonetype'.


Linda Kateley
Kateley Company
Skype ID-kateleyco
http://kateleyco.com

@sempervictus
Contributor

Rights delegation is probably the keystone to all of this, as @behlendorf seems to be saying. Far as the concepts of jails/zones/containers, i'd say that LXC based namespacing seems most consistent with Solaris Zones, and should probably be the target implementation for the property @ present. However, LXC is a moving target (doing the sniper dance while having seizures), so we may have to maintain ongoing changes to make this work.

Far as the discrepancy between different container implementations, this looks reminiscent of the iSCSI work @FransUrbo so graciously maintained for so long, in that we're attempting to target multiple Linux userspace and kernel functions which do the same thing. Maybe this is where we start making a case for lib/zfs/linuxmonkeys to keep monkey patched wrappers for all the varied implementations of what the "Do it Right" folks built monolithically into Solaris kernel prior to its fall from grace. Zones, sharing mechanisms, and cryptographic components jump out as things which would probably not be so tightly integrated with ZFS if the original development work was targeted at cross-platform implementation. I've no clue if we can, or even should try to, convince Illumos to decouple storage from services, but barking up that tree with Oracle is not even worth discussing.

Anyone have a taste for punishment, or actually know how many service hooks we have in the code today? If OpenZFS (basically Illumos/Delphix and whoever lurks the BSD world) were to agree to push these into a separate layer which we maintain on a per-OS basis, it would make all of this portability much easier, and with consistent rules for how said shims are written, portability of function should be much simpler than full rewrites in the monolithic codebase we see today (again, look @ shareiscsi as an example of how this gets real hard, real fast). For example, we could use this approach to split the known requirements of any future crypto implementation into services and storage - with ZFS holding the cryptographic algos themselves (to ensure bit-level consistency between platforms) and ZIO pipeline code to actually encrypt, while key management and all the associated rigmarole is handled by the individual crypto APIs of each OS.
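
To make the idea of a per-OS shim layer a bit more concrete, here is a minimal sketch of what a table of service hooks might look like. Every name in it (zfs_os_ops, the hook fields, the Linux stubs) is hypothetical and does not correspond to anything in the ZoL or illumos trees.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical table of OS-specific service hooks kept outside the core code. */
struct zfs_os_ops {
	const char *name;
	int (*in_global_zone)(void);         /* zones / jails / namespaces */
	int (*share_iscsi)(const char *ds);  /* sharing mechanism */
};

/* Linux stubs: everything is "global" and iSCSI sharing is not wired up. */
static int linux_in_global_zone(void) { return (1); }
static int linux_share_iscsi(const char *ds) { (void)ds; return (ENOTSUP); }

static const struct zfs_os_ops linux_ops = {
	.name = "linux",
	.in_global_zone = linux_in_global_zone,
	.share_iscsi = linux_share_iscsi,
};

/* Core code calls through the table and never names an OS facility directly. */
int
zfs_share_iscsi(const struct zfs_os_ops *ops, const char *ds)
{
	if (ops == NULL || ops->share_iscsi == NULL)
		return (ENOTSUP);
	return (ops->share_iscsi(ds));
}
```

Each platform would supply its own table, so supporting a new container or sharing implementation would mean adding hooks rather than touching the shared storage code.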

@FransUrbo
Contributor Author

If OpenZFS (basically Illumos/Delphix and whoever lurks the BSD world) were to agree to push these into a separate layer

I'm not seeing this happening any easier than asking Oracle to do it… They
hate everything Linux, and making ANY amends or simplicity for us, isn't
going to happen.

They simply live in the eighties, with their 'one branch for everything, and
everything in a tree structure that resembles the real thing'. They'll NEVER
(!!) start splitting things up and make porting it to other systems easier.
It's simply against their philosophy and knowledge of how the real world
works.

look @ shareiscsi as an example of how this gets real hard, real fast

Actually, this, the third or fourth incarnation of shareiscsi is VERY simple to
both maintain and extend!

Granted, it was a humongous mess in the beginning (because I didn't really
think this through to its fullest :).

@FransUrbo
Contributor Author

The issue gets somewhat more complicated once we've implemented 'this' (either the zoned or the jailed property). We then need to make provisions to support "the other" property and translate it into a/the Linux variant.

@trisk
Contributor

trisk commented Jul 21, 2015

Because the mnt namespace is not really a good match, I'm working on a prototype that introduces yet another namespace (modeled loosely after netns) specifically for ownership of ZFS datasets. The intent is to leverage as much of the existing paths for zoned datasets as possible.

@trisk
Contributor

trisk commented Aug 21, 2015

These are the SPL-side changes right now: https://github.com/mistifyio/spl/tree/datasetns

@edillmann
Contributor

@trisk
I found this article: https://omniti.com/presents/sandboxing-OpenZFS-on-Linux
Is there any news on your side?

@dalbani
Contributor

dalbani commented May 16, 2023

I don't know if that's what the original poster meant, but I can confirm that it's technically possible, in LXD with the yet-to-be-released ZFS 2.2, to use this zoned property to delegate ZFS dataset management to an LXC container.
See https://github.com/lxc/lxd/issues/4184#issuecomment-1545322553 for a work-in-progress report.

@dalbani
Contributor

dalbani commented Jun 6, 2023

FYI, I have created a feature request for proper support in LXD at https://github.com/lxc/lxd/issues/11796.
