
* Access Control
On the Linux desktop, most people run apps with no access control beyond ordinary user permissions. That means any program you run, the Discord client for example, can ~rm -rf ${HOME}~ or slurp up private data such as your SSH keys. This is not great, but there are several options available to address the problem.

Some of the information here could be incorrect, as I am not an expert on any of these subjects. Please contact me if anything is wrong.

* Example Program
I'm going to use weechat as an example of a program that we want to isolate from the rest of the system. It's simple but not trivial, and it touches a lot of commonly shared directories like ~XDG_CONFIG_HOME~, ~XDG_STATE_HOME~ and similar.

At its core, weechat needs access to a few directories and the network. We will focus on the directories.

Here are the directories/files it needs write access to:

#+BEGIN_SRC
  ${HOME}/.config/weechat
  ${HOME}/.cache/weechat
  ${HOME}/.local/share/weechat
  ${HOME}/.local/state/weechat
  ${XDG_RUNTIME_DIR}/weechat
#+END_SRC

It also needs read-only access to some system files such as (assuming a merged-usr system):
#+BEGIN_SRC
  /etc/ld.so.cache # dynamic loader cache
  /usr/lib{,32,64}
  /usr/bin/weechat # weechat executable itself
  /usr/share
#+END_SRC

* Our Options
There are multiple ways to do access control on Linux. I'm going to cover namespaces (with bubblewrap), AppArmor and SELinux.

** Bubblewrap
Bubblewrap is a small C utility used to set up mount namespaces for sandboxing and container purposes.

Mount namespaces are a way to control processes' views of mount points, meaning processes in different mount namespaces cannot see each other's mounts. A simple example would be mounting a USB drive at ~/mnt/usb~. If you mount the USB drive in a separate mount namespace, processes outside that namespace will not see anything mounted at ~/mnt/usb~ at all.
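
One way to see which mount namespace a process lives in is to look at its ~/proc/PID/ns/mnt~ link. The sketch below is only an illustration (~mount_ns_of~ is a helper name I made up, and it assumes a Linux ~/proc~): two processes that print the same identifier share a view of the mount points, so running it inside and outside a sandbox should print different values.

#+BEGIN_SRC bash
  #!/bin/bash

  # Print the mount namespace identifier of a process, e.g. "mnt:[4026531841]".
  # With no argument it reports on the process doing the lookup.
  mount_ns_of() {
    readlink "/proc/${1:-self}/ns/mnt"
  }

  echo "this shell: $(mount_ns_of "$$")"
#+END_SRC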

Bubblewrap works by creating a new mount namespace, creating a new root mount point on a tmpfs (similar to a chroot), and then bind mounting in the directories and files given on the command line.

Let's see an example of how we actually do this:

#+BEGIN_SRC
  #!/bin/bash

  args=(
    --unshare-all
    --share-net
    --dev /dev
    --proc /proc
    --tmpfs /tmp
    --tmpfs /run
    --tmpfs /var
    --tmpfs /mnt/sandbox
    --ro-bind /etc/ld.so.cache /etc/ld.so.cache
    --ro-bind /usr /usr
    --ro-bind /bin /bin
    --ro-bind /sbin/ /sbin
    --ro-bind /lib /lib
  )
  
  # handle lib32 and lib64 for some systems
  [[ -e /lib32 ]] && args+=(--ro-bind /lib32 /lib32)
  [[ -e /lib64 ]] && args+=(--ro-bind /lib64 /lib64)

  exec bwrap "${args[@]}" /bin/sh
#+END_SRC

Running this script should drop you into a shell in the sandbox.

You won't be able to change much since almost everything is mounted read-only, but there are writable tmpfs mounts. The tmpfs mounts do not persist across runs, and their contents are deleted when the sandbox is destroyed.

This isn't super useful, but it shows a simple example. Now let's adapt this to run weechat!

#+BEGIN_SRC
  #!/bin/bash

  # setup the core bind mounts
  args=(
    --unshare-all
    --share-net
    --dev     /dev
    --proc    /proc
    --tmpfs   /tmp
    --tmpfs   /run
    --tmpfs   /var
    --tmpfs   /mnt/sandbox
    --ro-bind /etc/ld.so.cache  /etc/ld.so.cache
    --ro-bind /usr   /usr
    --ro-bind /bin   /bin
    --ro-bind /sbin/ /sbin
    --ro-bind /lib   /lib
  )

  # handle lib32 and lib64 for some systems
  [[ -e /lib32 ]] && args+=(--ro-bind /lib32 /lib32)
  [[ -e /lib64 ]] && args+=(--ro-bind /lib64 /lib64)

  # weechat specific bind mounts (make sure these exist before running the script)
  args+=(
    --tmpfs "${HOME}"
    --bind  "${HOME}/.config/weechat"      "${HOME}/.config/weechat"
    --bind  "${HOME}/.cache/weechat"       "${HOME}/.cache/weechat"
    --bind  "${HOME}/.local/share/weechat" "${HOME}/.local/share/weechat"
    --bind  "${HOME}/.local/state/weechat" "${HOME}/.local/state/weechat"
  )

  # quote the array expansion so paths with spaces survive intact
  exec bwrap "${args[@]}" /usr/bin/weechat
#+END_SRC

Hopefully weechat starts up. It now has read-only access to most of the system, and cannot access anything else in your ~${HOME}~, such as your SSH keys.

You may want to adapt this script to bind in other things, but this should at least give you a start.
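
The weechat script expects its bind mount sources to exist already, and ~bwrap~ will refuse to start if a ~--bind~ source is missing. A small helper like the following could create them first. This is just a sketch: ~make_weechat_dirs~ is a name I made up, and the optional argument exists only so it can be tried against a scratch directory instead of your real home.

#+BEGIN_SRC bash
  #!/bin/bash

  # Create the directories the weechat sandbox script bind mounts.
  # The optional first argument overrides the home directory.
  make_weechat_dirs() {
    local home=${1:-$HOME}
    mkdir -p \
      "$home/.config/weechat" \
      "$home/.cache/weechat" \
      "$home/.local/share/weechat" \
      "$home/.local/state/weechat"
  }
#+END_SRC

Calling ~make_weechat_dirs~ with no argument before the ~exec bwrap~ line would prepare the real ~${HOME}~.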

There are some caveats with bwrap based sandboxing. The primary issue is that creating mount namespaces requires "root". You might wonder why you were able to run the scripts above without root. This is because bwrap created a user namespace.

User namespaces are similar to mount namespaces, but they unshare IDs rather than mount points. This means you can become UID 0 (root) inside of a sandbox, and perform actions that normally require root access, but outside of the sandbox you are still not-root and have no extra privileges.

User namespaces involve ID mapping. For example, UID 1000 may be mapped to UID 0 inside of the container. Most Linux systems also have a reserved range of IDs for each user, dedicated for mapping into user namespaces. My system has ~notroot:100000:65536~ dedicated for user ~notroot~, so all UIDs from 100000 through 165535 are reserved for this purpose. If you map 1000:0:1 and 100000:1:65535 (written as outside:inside:count), files created inside of the sandbox by root will appear as owned by UID 1000 outside, and files owned by UID 1000 in the sandbox will be seen as UID 100999 outside. IDs that are not mapped will be seen as "nobody" inside of the sandbox.
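
The arithmetic behind that example can be sketched in a few lines of shell. This is only an illustration of the mapping rule, not anything ~bwrap~ or the kernel exposes: ~map_to_host~ is a made-up helper, and it takes mappings in the outside:inside:count notation used above.

#+BEGIN_SRC bash
  #!/bin/bash

  # Translate an ID seen inside a user namespace to the ID seen outside.
  # Each mapping is "outside:inside:count".
  map_to_host() {
    local id=$1; shift
    local mapping outside inside count
    for mapping in "$@"; do
      IFS=: read -r outside inside count <<< "$mapping"
      if (( id >= inside && id < inside + count )); then
        echo $(( outside + id - inside ))
        return 0
      fi
    done
    echo "unmapped"
    return 1
  }

  map_to_host 0    1000:0:1 100000:1:65535   # root inside     -> 1000
  map_to_host 1000 1000:0:1 100000:1:65535   # UID 1000 inside -> 100999
#+END_SRC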

ID mapping is confusing for me personally, but ~bwrap~ has some flags (such as ~--uid~ and ~--gid~) to help you set up trivial mappings that should work for a lot of simple use cases.

~bwrap~ can also unshare the ipc, pid, net, uts and cgroup namespaces. These all work similarly to the namespaces described above and provide isolation for things beyond files, which is also an important aspect of sandboxing.

** AppArmor
AppArmor is a "Linux Security Module" (LSM) and a mandatory access control (MAC) system.

MAC is different from discretionary access control (DAC) in that a central authority controls the rules, instead of the owners of the resource.

AppArmor is a path based LSM. AppArmor profiles define a list of paths that a process can or can't access. The profile syntax supports glob-like "patterns" for matching specific paths that the process might try to access at runtime.

Let's show an example of an AppArmor profile for our IRC client:

#+BEGIN_SRC
  #include <tunables/global>

  profile weechat /usr/bin/weechat {
    #include <abstractions/base>

    # read only shared system resources
    /etc/fonts/** r,
    /usr/share/** r,
  
    owner @{HOME}/.config/weechat/ rw,
    owner @{HOME}/.config/weechat/** rw,

    owner @{HOME}/.cache/weechat/ rw,
    owner @{HOME}/.cache/weechat/** rw,

    owner @{HOME}/.local/share/weechat/ rw,
    owner @{HOME}/.local/share/weechat/** rw,

    owner @{HOME}/.local/state/weechat/ rw,
    owner @{HOME}/.local/state/weechat/** rw,

    owner @{XDG_RUNTIME_DIR}/weechat/ rw,
    owner @{XDG_RUNTIME_DIR}/weechat/** rw,     
  }
#+END_SRC

The first part of the profile includes a file that has "tunables" such as ~@{HOME}~ predefined. Despite the C-like syntax, the ~#include~ is handled by the AppArmor parser itself, not the C preprocessor.

The second part of the profile (the ~profile weechat~ part) defines a profile for the ~/usr/bin/weechat~ executable. AppArmor transitions into confined mode when a process executes an executable that matches the ~/usr/bin/weechat~ pattern (globs are supported here).

The third part of the profile includes the ~base~ abstraction. ~base~ grants access to the basic things nearly every process needs in order to run at all, such as ~/usr/lib~ or ~/dev/null~. You can technically define these all yourself, but it's quite a lot of boilerplate, and ~base~ should work for most use cases.

The rest of the profile defines paths, patterns and access rules for them. With this profile, weechat will only be able to access the paths you defined and the things defined in ~base~.

AppArmor is very simple and easy to get started with, but it does have a few flaws.

The primary flaw is that AppArmor is *path based* rather than *inode based*. Hard links to files could allow bypassing the AppArmor rules, depending on the exact situation. AppArmor disallows creating links by default though, so the hard links would have to be created by something unconfined or something that was explicitly allowed to.
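
To make the path versus inode distinction concrete, here is a small sketch that runs entirely in a scratch directory (the file names are made up, and no AppArmor is involved). It shows that a hard link is just a second path to the same inode, so a rule written against the first path says nothing about the second.

#+BEGIN_SRC bash
  #!/bin/bash

  # Show that a hard link is a second path to the same inode.
  demo_hardlink() {
    local dir
    dir=$(mktemp -d) || return 1
    echo "secret" > "$dir/id_rsa"          # stand-in for a protected file
    ln "$dir/id_rsa" "$dir/innocent.txt"   # second name, same inode
    if [[ "$dir/id_rsa" -ef "$dir/innocent.txt" ]]; then
      echo "same inode, different path"
    fi
    cat "$dir/innocent.txt"                # reads the "protected" content
    rm -rf "$dir"
  }

  demo_hardlink
#+END_SRC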

By default, AppArmor prevents you from executing the paths you gave access to. There are a few ways to grant *execute* permission.

*** Execute Modes
 - ~ix~ starts the subproc under the current profile
 - ~ux~ starts the subproc unconfined
 - ~px~ starts the subproc under a profile that matches the executable path
 - ~cx~ starts the subproc under a subprofile

*** Caveats
On kernels older than Linux 6.17, AppArmor is not fully functional without Ubuntu's kernel patches.

The primary missing feature I am aware of is the ability to restrict access to unix sockets.

** SELinux
SELinux is another MAC based LSM, but it is quite different from AppArmor.

*** Labels
SELinux access control works by labeling subjects (processes) and objects (files etc.) with "types". For files, this information is stored in the file's xattrs. An example label is "~sys.id:sys.role:sys.subj:s0~".

Unlike AppArmor, SELinux is inode based rather than path based, so hard links can't be used as loopholes.

The first part of the label is the *user*, the second is the *role* and the third is the *type*. The trailing ~s0~ is the MLS/MCS level. Mostly we are going to ignore users and roles and focus on types.
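
If you want to pull a label apart in a script, splitting on colons is enough. The helper below is a sketch with a made-up name (~label_field~). It leaves everything after the third colon in the level field, since levels can themselves contain colons.

#+BEGIN_SRC bash
  #!/bin/bash

  # Split an SELinux label of the form user:role:type:level into its fields.
  label_field() {
    local field=$1 label=$2
    local seuser serole setype selevel
    IFS=: read -r seuser serole setype selevel <<< "$label"
    case $field in
      user)  echo "$seuser" ;;
      role)  echo "$serole" ;;
      type)  echo "$setype" ;;
      level) echo "$selevel" ;;
      *)     echo "unknown field: $field" >&2; return 1 ;;
    esac
  }

  label_field type "sys.id:sys.role:sys.subj:s0"   # prints sys.subj
#+END_SRC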

*** Commands & Utils
The SELinux userland comes with many utilities, and it is not easy to figure out what they do and why you would want them.

**** sestatus
~sestatus~ is a simple command that tells you whether SELinux is currently active, and whether it's in permissive or enforcing mode. There isn't much more to it, but it's handy to detect if SELinux is currently active.
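
If you only need the mode from a script, a rough equivalent is to read the ~enforce~ file from selinuxfs, which is normally mounted at ~/sys/fs/selinux~. This is a sketch and not how ~sestatus~ itself is implemented. ~selinux_mode~ is a made-up helper, and it reports "disabled" whenever the file is missing, which also covers selinuxfs simply not being mounted.

#+BEGIN_SRC bash
  #!/bin/bash

  # Report the SELinux mode by reading selinuxfs.
  # The optional argument overrides the selinuxfs mount point.
  selinux_mode() {
    local selinuxfs=${1:-/sys/fs/selinux}
    if [[ ! -r "$selinuxfs/enforce" ]]; then
      echo "disabled"
    elif [[ "$(< "$selinuxfs/enforce")" == "1" ]]; then
      echo "enforcing"
    else
      echo "permissive"
    fi
  }

  selinux_mode
#+END_SRC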

**** restorecon
~restorecon~ applies *filecon* rules to your files. *filecon* is an expression in policy like this:

#+BEGIN_SRC
  (filecon "/home/john/.*")
#+END_SRC

These expressions are compiled into a file called ~file_contexts~, which is normally installed into the policy config (e.g. ~/etc/selinux/${SELINUXTYPE}~).

The modular policy system also keeps track of *filecon* expressions, so you don't need to change the policy config files every time you want to update the rules.

Using ~restorecon~:
#+BEGIN_SRC
  # recursively apply file contexts to the entire filesystem
  restorecon -Rv /

  # restore a single file
  restorecon -v /home/john/foo.txt
#+END_SRC

**** setfiles
~setfiles~ uses the ~file_contexts~ file mentioned before to label mount points. As far as I know, the default context for files is inherited from the mount point.

When using ~setfiles~, you probably want to bind mount your root filesystem somewhere, like ~/mnt/gentoo~. Otherwise the contexts may not be applied to the mount points themselves.

Hint: BTRFS subvolumes also count as mount points, and nested subvolumes can be a little confusing.

This is how I used setfiles for my system:

#+BEGIN_SRC
setfiles -v \
  -r /mnt/gentoo \
  /etc/selinux/${SELINUXTYPE:-dssp5}/contexts/files/file_contexts \
  /mnt/gentoo/{,dev,proc,run,sys,tmp,boot,efi,etc,var,home} \
  /mnt/gentoo/mnt/subvolumes/var/{cache,tmp} \
  /mnt/gentoo/mnt/subvolumes/home/notroot
#+END_SRC

I have the following subvolumes:

#+BEGIN_SRC
  /mnt/subvolumes/etc
  /mnt/subvolumes/var
  /mnt/subvolumes/var.cache
  /mnt/subvolumes/var.tmp
  /mnt/subvolumes/home
  /mnt/subvolumes/home.notroot  
#+END_SRC

Some of the subvolumes end up mounted on top of each other. For example, ~/mnt/subvolumes/home~ is mounted at ~/home~, and ~/mnt/subvolumes/home.notroot~ is mounted at ~/home/notroot~. This means the "raw mount point" is actually ~/mnt/gentoo/mnt/subvolumes/home/notroot~, *not* ~/mnt/gentoo/home/notroot~. This is pretty confusing and easy to get wrong.

**** getpathcon and matchpathcon
~matchpathcon~ reads your ~file_contexts~ and shows you the default label for the paths provided.

#+BEGIN_SRC
  matchpathcon /home/john
  matchpathcon '/var/log/.*'
#+END_SRC

~getpathcon~ just gets the current context for a file.

#+BEGIN_SRC
  getpathcon /home/john
#+END_SRC

**** semodule
SELinux can load policy in two different ways: "monolithic" and "modular". Monolithic loading is mostly designed for embedded systems and can be ignored for now.

~semodule~ is an interface to the "modular" SELinux policy store. You can load modules at runtime, dynamically, and even version control modules.

You can load cil files directly with ~semodule~; each cil file corresponds to a single module. Modules loaded with ~semodule~ are stored at ~/var/lib/selinux/${SELINUXTYPE}/active/modules/~.

Hint: you can't load two cil files with the same name, even if they are in different directories, without one clobbering the other.
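
A quick pre-flight check can catch this before you hand a list of files to ~semodule~. The sketch below assumes the module name is simply the file name without the ~.cil~ suffix. ~duplicate_modules~ and the example paths are made up.

#+BEGIN_SRC bash
  #!/bin/bash

  # Print module names that appear more than once among the given .cil files.
  duplicate_modules() {
    local path
    for path in "$@"; do
      basename "$path" .cil
    done | sort | uniq -d
  }

  duplicate_modules policy/foo.cil local/foo.cil local/bar.cil   # prints foo
#+END_SRC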

List all currently installed modules:

#+BEGIN_SRC
  semodule -l
#+END_SRC

Load modules:

#+BEGIN_SRC
  semodule -i foo.cil bar.cil baz.cil
#+END_SRC

Remove modules:

#+BEGIN_SRC
  semodule --remove foo bar baz 
#+END_SRC

*** Dssp5
This post is going to assume we are basing our policy on [[https://salsa.debian.org/dgrift/dssp5/][dssp5]], a minimal and modular base policy that we create our own types on top of. [[https://salsa.debian.org/dgrift/dssp5/][dssp5]] provides the core types.

*** Built In Types
[[https://salsa.debian.org/dgrift/dssp5/][dssp5]] provides many core types that we will build our policy on top of.

An example of a core type is ~home.file~. This is a type applied to home directories such as ~/home/john~. There are many base types for various parts of the filesystem.

Here are some major built in types:

 - ~conf.file~ for ~/etc~
 - ~lib.file~ for ~/usr/lib~
 - ~exec.file~ for ~/usr/bin~
 - ~run.file~ for ~/run~
 - ~var.file~ for ~/var~

There are also "subtypes" for some of these built in types like ~spool.var.file~ for ~/var/spool~.

*** How Do Files Get Typed
**** Setfiles
Mount points are labeled with ~setfiles~, and any new files created underneath a mount point should inherit its label by default. This is the default label for files that don't have a filecon defined.

**** Filecon
In SELinux policy you will define ~filecon~ expressions like this (ignore the other parts for now):

#+BEGIN_SRC
  (block var
    (blockinherit .file.template)
    (filecon "/var" dir file_context)
    (filecon "/var/.*" file file_context))  
#+END_SRC

After compiling and loading the policy, you would use the built in ~restorecon~ command to apply these labels.

**** Type Transitions
Files can also change types via type transitions at runtime. For example, we want all of the runtime files weechat creates to be labeled ~agent.weechat~ or similar, so we define a type transition in the weechat SELinux module:

#+BEGIN_SRC
  (call .agent.weechat.run.file_type_transition_file (.agent.weechat.subj dir "weechat"))
  (call .agent.weechat.run.file_type_transition_file (.agent.weechat.subj file "*"))  
#+END_SRC

(Don't worry if you don't understand this yet, we will learn more about the *cil* language in a bit.)

Another example would be transitioning from one context to another when executing something. In our later policy, running the weechat executable causes a type transition from ~sys.subj~ to ~weechat.subj~.

*** How Do Processes Get Typed
With dssp5, processes will start in the ~sys.subj~ context which is basically unconfined and has access to everything. Processes change types via type transitions or with ~runcon~. We will go over type transitions a bit more later when we define the weechat module.

#+BEGIN_SRC
  (sidcontext init (sys.id sys.role sys.subj sys.lowlow)) ;; userspace_initial_context
#+END_SRC
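
To check which context a running process actually ended up in, you can read ~/proc/PID/attr/current~ (~ps -eZ~ and ~id -Z~ show the same information). The helper below is a sketch with a made-up name (~process_context~). It strips a possible trailing NUL byte and falls back to "unknown" when the attribute cannot be read, for example on a kernel without an active LSM.

#+BEGIN_SRC bash
  #!/bin/bash

  # Print the security context of a process by reading its attr file.
  # The optional argument overrides the file to read.
  process_context() {
    local attr=${1:-/proc/self/attr/current}
    local ctx
    ctx=$(tr -d '\0' 2>/dev/null < "$attr") || ctx=""
    echo "${ctx:-unknown}"
  }

  process_context
#+END_SRC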

*** Cil Overview
Cil is the language we will write policy in. It's a simple sexpr based language, with namespaces, types, typeattributes (metatypes), macros and templates.

**** Cil Types
We can define types like this:

#+BEGIN_SRC
  (type foo)
#+END_SRC

**** Cil Namespaces
In cil we will almost always be working in a namespace.

We can define a namespace with the block keyword:

#+BEGIN_SRC
  (block foo
    (block bar))
#+END_SRC

If a block has already been created and you want to "enter" it, you use the "in" keyword:

#+BEGIN_SRC
  (in .foo.bar)
#+END_SRC

You access types with the ~.~ operator. A dot at the beginning of the expression starts searching from the "top" namespace rather than looking for a type in the current namespace.

#+BEGIN_SRC
  (in foo.bar
    (macro baz ((type ARG1))
      (do_something_with ARG1))

    ;; define a type
    (type qux)

    ;; call our macro using local lookup
    (call baz (qux))

    ;; call our macro using global lookup
    (call .foo.bar.baz (.foo.bar.qux)))
#+END_SRC

We will make great use of namespaces in our policy!

**** Macros
Macros are sort of like functions. Macros "capture" local types similar to lambdas and interpolate parameters into expressions.

#+BEGIN_SRC
  (block foo
    (type bar)

    ;; define our macro (we will cover typeattributes soon)
    (macro test ((type ARG1))
      (typeattributeset bar ARG1)))

  (block baz
    (type qux)
    ;; call our macro
    (call .foo.test (qux)))
#+END_SRC

**** Templates
Templates are blocks that are inherited by other blocks.

Abstract blocks are blocks which only exist once they are inherited.

You can think of abstract blocks like inheritance and OOP in programming.

#+BEGIN_SRC
  (block foo
    ;; define our abstract block (template)
    (block bar
      (blockabstract bar)
      ;; define a type
      (type t)))

  (block baz
    ;; inherit the bar block, now the t type will be created and in scope
    (blockinherit .foo.bar)
    (dothing t))
#+END_SRC

Hint: abstract blocks are very commonly used to define types, so you will often not be defining ~(type foo)~ directly, but instead letting the templates do the work for you.

We will make great use of the built in templates for almost everything we do.

**** Type Attributes
Type attributes are like "metatypes". They are used to group types together for shared behaviour.

An example here:

#+BEGIN_SRC
  (in file
    (block user
      (macro type ((type ARG1))
        ;; since its a macro we can use things before they are defined
        (typeattributeset typeattr ARG1))
  
      ;; create the type attribute
      (typeattribute typeattr)

      ;; our typeattr can be associated with another one as well
      (call .file.home.type (typeattr))

      (block base_template
        (blockabstract base_template)
        (blockinherit .file.base_template)
        ;; remember the file type is introduced via the template
        ;; associate the file type with .file.user.typeattr
        (call .file.user.type (file)))))

  (block ssh
    (blockinherit .file.user.base_template)
    ;; file and file_context are also introduced via the .file.user.base_template, which inherits
    ;; from .file.base_template (layers of templates like this are important for
    ;; abstracting out boilerplate)
    (filecon "HOME_DIR/\.ssh" dir file_context)
    (filecon "HOME_DIR/\.ssh/.*" file file_context))

  (block gpg
    (blockinherit .file.user.base_template)
    (filecon "HOME_DIR/\.gnupg" dir file_context)
    (filecon "HOME_DIR/\.gnupg/.*" file file_context))

  ;; Now we can give something access to all userfiles instead of listing each type.
  (block userdel
    (blockinherit .subj.template)
    ;; allow access to all userfiles including ssh and gpg files
    (call .file.user.type (subj)))
#+END_SRC

A good example of the usefulness of type attributes is the program ~userdel~, which needs access to ~${HOME}~ and all user files underneath. If each type (ssh, gpg, foo) were not associated with ~file.home.typeattr~ (via ~.file.user.typeattr~), the policy for ~userdel~ would need to manually allow each type for it to do its job.

Typeattributes are one of the most important things for abstracting out behavior. You can create hierarchies of types in a way similar to OOP.

**** Type Transitions
Type transitions are rules in policy that control how types change at runtime. Common uses are having files created by weechat end up with a weechat label, or entering a new context when executing something.

I do not fully understand how these work internally, but I will show an example of how to do this:

#+BEGIN_SRC
  (block weechat
    (block run
      (macro file_type_transition_file ((type ARG1) (class ARG2) (name ARG3))
        (call .user.run.file_type_transition (ARG1 file ARG2 ARG3)))

      ;; inherit the template for files in /run/user/${UID}
      (blockinherit .file.user.run.template)))

  (call .agent.weechat.run.file_type_transition_file (.agent.weechat.subj dir "weechat"))
  (call .agent.weechat.run.file_type_transition_file (.agent.weechat.subj file "*"))
#+END_SRC

This will cause files created in the weechat context under ~/run/user/${UID}~ to be transitioned into the ~agent.weechat.run.file~ type, rather than the "default" ~user.var.file~ (the default type depends on your policy; this is just an example).

*** Policy
Let's write some policy now!

**** XDG Directories

We want to create some new types for the directories weechat requires access to.

#+INCLUDE: "access-control/xdgfile.cil" src

**** Loading Policy

You can load dssp5 policy up with:

#+BEGIN_SRC
  make modular_install
#+END_SRC

Next run ~restorecon~ to apply our new labels (this could take a while):

#+BEGIN_SRC
  restorecon -Rv /
#+END_SRC

If everything went as planned you should be able to use ~ls -alZ ${HOME}~ to see your new labels.

**** Weechat Policy

Define policy for weechat itself:

#+INCLUDE: "access-control/weechat.cil" src

In dssp5 you will notice that we rarely write ~allow~ rules directly. Instead, we use macros and templates to do the heavy lifting when we can. The templates and macros can be a little confusing at first, but they make sense once you start to use them for your modules.

SELinux is by far the most verbose of the options I listed, but also the most powerful and flexible, and IMO the most fun.

**** Todo
For your real policy you want to create abstractions for common behaviour to cut down on the boilerplate.

A large part of the weechat module could be abstracted out into a new .subj.common module: common behaviour like accessing your own files and accessing things that every process will need, such as the dynamic loader and system libraries.

With dssp5 it's up to you to build up abstractions, it only provides a base.

* Questions
If you have any questions or problems you can email me (my contact info is on my front page), or join the ~#selinux~ channel on [[https://irc.libera.chat]].