NFS Encryption: RPC-with-TLS as an Alternative to VPN

Tr0jan_Horse

One day we wondered whether NFS protocol traffic could be protected. Well-known approaches such as VPN tunnels and various proxies did not interest us. It turned out that RFC 9289, which describes RPC-with-TLS, had been published recently, so we decided to figure out what kind of beast it was.

In this article we will look at how to set up encryption of NFS traffic with RPC-with-TLS and what the nuances and limitations are. We will see how to configure the tls and mtls policies, what to do with the tlshd daemon, and why it is important not to leave trailing spaces in its config. Along the way we will check how it all works in practice and what happens when something goes wrong. So let's dive headlong into RPC-with-TLS and see what comes of it.

A few clarifications

For RPC-with-TLS to work in the Linux kernel (hereinafter simply the kernel), two options must be enabled:

  • CONFIG_TLS, which allows the kernel to work with TLS; the TLS handshake (hereinafter simply the handshake), however, requires the tlshd daemon;
  • CONFIG_NET_HANDSHAKE, introduced in kernel 6.4, but backported and included in recent rpm-based distributions such as AlmaLinux 9 and Rocky Linux 9.
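
Both options can be checked before going any further. Below is a minimal sketch, assuming the kernel config is available as a plain-text file; the function name check_rpc_tls_kernel_opts is made up for this example.

```shell
# Report whether the kernel options needed for RPC-with-TLS are enabled.
# Assumption: CONFIG_FILE is a plain-text kernel config (one OPTION=value per line).
check_rpc_tls_kernel_opts() {
    local config_file="$1" opt status=0
    for opt in CONFIG_TLS CONFIG_NET_HANDSHAKE; do
        # "=y" means built into the kernel, "=m" means built as a module.
        if grep -Eq "^${opt}=[ym]\$" "$config_file"; then
            echo "${opt}: enabled"
        else
            echo "${opt}: missing"
            status=1
        fi
    done
    return "$status"
}
```

On many distributions the config of the running kernel is at /boot/config-$(uname -r), so the call would be check_rpc_tls_kernel_opts "/boot/config-$(uname -r)". That path is an assumption about the distribution, not something RPC-with-TLS itself guarantees.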

There are two security policies for RPC-with-TLS: tls and mtls. In short, with tls only the NFS server is authenticated, while with mtls the client is authenticated as well. This leads to the following requirements: with tls, the root certificate (CA) that issued the server certificate must be in the client's list of trusted CAs; with mtls, the CA that issued the client certificate must additionally be in the server's list of trusted CAs.

The Common Name (CN) of the server certificate must match the server name that the client specifies in the mount.nfs command. The CN of the client certificate does not matter to the server when mtls is used.

It is important to set the ownership and permissions of certificates and keys correctly: the owner must be root only, the permissions must be 644 for certificates and 600 for keys.
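
These requirements are easy to verify with a small helper. The sketch below assumes GNU stat (as shipped in coreutils); the function name check_file_perms is hypothetical, and the owner is compared by numeric uid, with 0 meaning root.

```shell
# Check that a file has the expected permission bits and owner.
# Usage: check_file_perms FILE MODE [UID]   (UID defaults to 0, i.e. root)
check_file_perms() {
    local file="$1" want_mode="$2" want_uid="${3:-0}" actual
    # GNU stat: %u is the numeric owner id, %a the permission bits in octal.
    actual=$(stat -c '%u %a' "$file") || return 2
    if [ "$actual" = "$want_uid $want_mode" ]; then
        echo "OK: $file"
    else
        echo "BAD: $file has uid/mode '$actual', expected '$want_uid $want_mode'"
        return 1
    fi
}
```

For the certificate from the example in the next section this would be check_file_perms /root/my_ca/ca1/certificates/worker-1.crt 644, and the same call with 600 for the matching key file.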

The tlshd handshake daemon

The tlshd daemon needs a configuration file to start; by default it is /etc/tlshd.conf. It is very important that the lines in this file that point to certificate and key files have no trailing spaces (as for me, this is a real "childhood disease"), otherwise the daemon will not work. For example, during testing it turned out that one parameter looked like this: "x509.certificate=/root/my_ca/ca1/certificates/worker-1.crt " (note the space before the closing quote), and because of it the handshake did not take place.
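
Trailing spaces are invisible in most editors, so it is worth checking for them explicitly. A minimal sketch (the function name find_trailing_whitespace is made up for this example):

```shell
# Print every line of a config file that ends in whitespace, prefixed with
# its line number. Returns 0 when the file is clean, 1 when lines were found.
find_trailing_whitespace() {
    local conf="$1"
    # grep exits with 0 when at least one line matched, i.e. the file is dirty.
    if grep -nE '[[:space:]]+$' "$conf"; then
        return 1
    fi
    return 0
}
```

Running find_trailing_whitespace /etc/tlshd.conf prints the offending lines, if there are any. With GNU sed they can then be cleaned up in place with sed -i 's/[[:space:]]*$//' /etc/tlshd.conf.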

What happens to access to the NFS folder on the client if, after a successful mount, tlshd crashes or is restarted on the client or on the server? Nothing changes: the folder remains available for reading and writing. Traffic encryption is preserved as well, since tlshd is needed only for the handshake at mount time.

In theory, nothing prevents you from running several instances of this daemon on one host with different configs. However, when mounting an NFS folder you cannot specify which tlshd instance should handle the handshake.

An experiment was conducted: three tlshd instances with different configs (one valid, two invalid) were launched on the client, and an NFS folder from the server was mounted. The results were chaotic: in a series of 10 attempts, sometimes two succeeded, sometimes one, sometimes four, sometimes none. On every mount attempt each tlshd instance wrote something to its log, which indicates that all of them were involved at once. This leads to a limitation: with mtls, only one client certificate can be used on a client. The limitation can be worked around by starting a tlshd instance with the required config (read: with the required client certificate) before each mount and stopping it after a successful mount, but that looks more like a hack than a good solution.

Testing

We built a test bench and manually tested mounting NFS folders with different security policies. As a result, we confirmed that NFS traffic is encrypted and looked at the mount errors.

Description of the test bench

Three virtual machines running Ubuntu 24.04 LTS: two NFS servers (server-1 and server-2) with different CAs and one NFS client (worker-1). Three folders were exported on each NFS server, /mnt/nfs_none, /mnt/nfs_tls and /mnt/nfs_mtls, with the corresponding security policies:

echo '/mnt/nfs_none *(rw,sync,no_subtree_check)' | sudo tee -a /etc/exports
echo '/mnt/nfs_tls *(rw,sync,no_subtree_check,xprtsec=tls)' | sudo tee -a /etc/exports
echo '/mnt/nfs_mtls *(rw,sync,no_subtree_check,xprtsec=mtls)' | sudo tee -a /etc/exports

Testing covered NFSv3, NFSv4.1 and NFSv4.2, with self-signed certificates.

Encrypted NFS traffic

Encryption was verified by capturing traffic on the NFS server: tshark -i eth0 -f 'tcp and port 2049'.

The moment the folder is mounted:

1 0.000000000 10.1.1.13 → 10.1.1.11 TCP 74 674 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM TSval=398680558 TSecr=0 WS=128
2 0.000028999 10.1.1.11 → 10.1.1.13 TCP 74 2049 → 674 [SYN, ACK] Seq=0 Ack=1 Win=65160 Len=0 MSS=1460 SACK_PERM TSval=2420952483 TSecr=398680558 WS=128
3 0.000213423 10.1.1.13 → 10.1.1.11 TCP 66 674 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=398680559 TSecr=2420952483
4 0.000213494 10.1.1.13 → 10.1.1.11 NFS 110 V4 NULL Call
5 0.000248597 10.1.1.11 → 10.1.1.13 TCP 66 2049 → 674 [ACK] Seq=1 Ack=45 Win=65152 Len=0 TSval=2420952483 TSecr=398680559
6 0.000331695 10.1.1.11 → 10.1.1.13 NFS 102 V4 NULL Reply (Call In 4)
7 0.000461213 10.1.1.13 → 10.1.1.11 TCP 66 674 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=398680559 TSecr=2420952483
8 0.015183491 10.1.1.13 → 10.1.1.11 TLSv1 417 Client Hello (SNI=server-1)
9 0.015853138 10.1.1.11 → 10.1.1.13 TLSv1.3 264 Server Hello, Change Cipher Spec
10 0.016537570 10.1.1.13 → 10.1.1.11 TLSv1.3 72 Change Cipher Spec
11 0.029186413 10.1.1.11 → 10.1.1.13 TLSv1.3 7306 Application Data
12 0.029199606 10.1.1.11 → 10.1.1.13 TCP 7306 2049 → 674 [PSH, ACK] Seq=7475 Ack=402 Win=64896 Len=7240 TSval=2420952512 TSecr=398680575 [TCP segment of a reassembled PDU]
13 0.029770565 10.1.1.13 → 10.1.1.11 TCP 66 674 → 2049 [ACK] Seq=402 Ack=7475 Win=78592 Len=0 TSval=398680588 TSecr=2420952512
14 0.029770658 10.1.1.13 → 10.1.1.11 TCP 66 674 → 2049 [ACK] Seq=402 Ack=14715 Win=92160 Len=0 TSval=398680588 TSecr=2420952512
15 0.029790811 10.1.1.11 → 10.1.1.13 TLSv1.3 2255 Application Data, Application Data, Application Data, Application Data
16 0.029978228 10.1.1.13 → 10.1.1.11 TCP 66 674 → 2049 [ACK] Seq=402 Ack=16904 Win=96512 Len=0 TSval=398680588 TSecr=2420952512
17 0.046021972 10.1.1.13 → 10.1.1.11 TLSv1.3 1923 Application Data, Application Data, Application Data
18 0.046070545 10.1.1.11 → 10.1.1.13 TCP 66 2049 → 674 [ACK] Seq=16904 Ack=2259 Win=63104 Len=0 TSval=2420952529 TSecr=398680604
19 0.049060919 10.1.1.13 → 10.1.1.11 TLSv1.3 132 Application Data
20 0.049120160 10.1.1.11 → 10.1.1.13 TLSv1.3 116 Application Data
21 0.049489753 10.1.1.13 → 10.1.1.11 TLSv1.3 348 Application Data
22 0.049569299 10.1.1.11 → 10.1.1.13 TLSv1.3 192 Application Data
...
The moment a line is written to a file located in an NFS folder:

82 24.007203648 10.1.1.13 → 10.1.1.11 TLSv1.3 272 Application Data
83 24.007358680 10.1.1.11 → 10.1.1.13 TLSv1.3 260 Application Data
84 24.007529719 10.1.1.13 → 10.1.1.11 TCP 66 792 → 2049 [ACK] Seq=8269 Ack=23090 Win=94976 Len=0 TSval=398803883 TSecr=2421075807
85 24.007641431 10.1.1.13 → 10.1.1.11 TLSv1.3 372 Application Data
86 24.007724635 10.1.1.11 → 10.1.1.13 TLSv1.3 468 Application Data
87 24.007999216 10.1.1.13 → 10.1.1.11 TLSv1.3 316 Application Data
88 24.011385927 10.1.1.11 → 10.1.1.13 TLSv1.3 276 Application Data
89 24.011745229 10.1.1.13 → 10.1.1.11 TLSv1.3 296 Application Data
90 24.011833565 10.1.1.11 → 10.1.1.13 TLSv1.3 268 Application Data
91 24.052731556 10.1.1.13 → 10.1.1.11 TCP 66 792 → 2049 [ACK] Seq=9055 Ack=23904 Win=94976 Len=0 TSval=398803929 TSecr=2421075812
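
Long captures like the ones above are easier to judge from a per-protocol summary. Below is a minimal sketch that counts packets by the protocol column of tshark's default text output; it assumes the column layout shown above (protocol in the sixth field), and the function name summarize_protocols is made up for this example.

```shell
# Summarize tshark's default one-line-per-packet output read from stdin:
# count packets per protocol, taken from the 6th whitespace-separated field.
summarize_protocols() {
    # Lines with fewer than six fields (such as "...") are skipped.
    awk 'NF >= 6 { count[$6]++ }
         END { for (p in count) print p, count[p] }' | LC_ALL=C sort
}
```

Feeding it the saved capture output, for example summarize_protocols < capture.txt (the file name is hypothetical), should show only the initial NULL call and reply as NFS, as in packets 4 and 6 of the mount capture, with everything after the handshake reported as TLS.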

Mounting errors

If the client does not have the server's CA certificate in its trusted list at mount time, the error looks like this:

# mount.nfs -o xprtsec=mtls,nfsvers=4.2 server-1:/mnt/nfs_mtls /mnt/nfs_mtls
mount.nfs: access denied by server while mounting server-1:/mnt/nfs_mtls

Interesting feature

Suppose a folder on the NFS server is exported with the mtls policy. If the client specifies the tls policy and NFS protocol version 3 when mounting, the mount succeeds, but the folder is not accessible.

Mounting:

# mount.nfs -o xprtsec=tls,nfsvers=3 server-1:/mnt/nfs_mtls /mnt/nfs_mtls
#

No access:

# ls -l /mnt/nfs_mtls/
ls: cannot open directory '/mnt/nfs_mtls/': Permission denied

With NFS 4.1 and 4.2 there is no such behavior; the mount simply fails:

# for i in 4.1 4.2;do mount.nfs -o xprtsec=tls,nfsvers=$i server-1:/mnt/nfs_mtls /mnt/nfs_mtls ;done
mount.nfs: Operation not permitted for server-1:/mnt/nfs_mtls on /mnt/nfs_mtls
mount.nfs: Operation not permitted for server-1:/mnt/nfs_mtls on /mnt/nfs_mtls

Limitation for a single NFS server

If you mount several NFS folders from the same NFS server on the same client with tls or mtls (the example below uses mtls), everything mounts without problems:

# for i in 11 22 33;do mount.nfs -o nfsvers=4.2,xprtsec=mtls server-1:/mnt/nfs_mtls/$i /mnt/nfs_mtls/$i ;done
#

In this case the handshake happens only once, when the first folder is mounted. In the tlshd log it looks like this:

tlshd[7776]: handshake with server-1 (10.1.1.11) was successful

After this, it is no longer possible to mount another folder from the same NFS server with the tls policy (or with mtls, if the first mount used tls):

# for i in 11 22 33;do mount.nfs -o nfsvers=4.2,xprtsec=tls server-1:/mnt/nfs_tls/$i /mnt/nfs_tls/$i ;done
mount.nfs: Operation not permitted for server-1:/mnt/nfs_tls/11 on /mnt/nfs_tls/11
mount.nfs: Operation not permitted for server-1:/mnt/nfs_tls/22 on /mnt/nfs_tls/22
mount.nfs: Operation not permitted for server-1:/mnt/nfs_tls/33 on /mnt/nfs_tls/33

The most interesting part is that the handshake itself succeeds:

tlshd[7807]: handshake with server-1 (10.1.1.11) was successful
tlshd[7815]: handshake with server-1 (10.1.1.11) was successful
tlshd[7820]: handshake with server-1 (10.1.1.11) was successful

Folders without RPC-with-TLS, however, can still be mounted without any problems:

# for i in 11 22 33;do mount.nfs -o nfsvers=4.2 server-1:/mnt/nfs_none/$i /mnt/nfs_none/$i ;done
#

But if you change the mount order, first without RPC-with-TLS and then with either RPC-with-TLS policy, the mounts with RPC-with-TLS fail:

# for i in 11 22 33;do mount.nfs -o nfsvers=4.2 server-1:/mnt/nfs_none/$i /mnt/nfs_none/$i ;done
# for i in 11 22 33;do mount.nfs -o nfsvers=4.2,xprtsec=tls server-1:/mnt/nfs_tls/$i /mnt/nfs_tls/$i ;done
mount.nfs: Operation not permitted for server-1:/mnt/nfs_tls/11 on /mnt/nfs_tls/11
mount.nfs: Operation not permitted for server-1:/mnt/nfs_tls/22 on /mnt/nfs_tls/22
mount.nfs: Operation not permitted for server-1:/mnt/nfs_tls/33 on /mnt/nfs_tls/33
# for i in 11 22 33;do mount.nfs -o nfsvers=4.2,xprtsec=mtls server-1:/mnt/nfs_mtls/$i /mnt/nfs_mtls/$i ;done
mount.nfs: Operation not permitted for server-1:/mnt/nfs_mtls/11 on /mnt/nfs_mtls/11
mount.nfs: Operation not permitted for server-1:/mnt/nfs_mtls/22 on /mnt/nfs_mtls/22
mount.nfs: Operation not permitted for server-1:/mnt/nfs_mtls/33 on /mnt/nfs_mtls/33

This results in a limitation: on one client you cannot have folders from the same NFS server mounted simultaneously with a mix of the tls, mtls and unencrypted policies.
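
Since the resulting "Operation not permitted" error says nothing about the cause, it can help to check for a policy clash before mounting. The sketch below is built on an assumption that was not verified in the tests above: that the policy of an active mount appears as xprtsec=... in the options column of /proc/mounts, and that a mount without that option is unencrypted. The function name check_xprtsec_conflict is made up for this example.

```shell
# Check whether mounting from SERVER with POLICY (none, tls or mtls) would mix
# transport security policies with NFS mounts that already exist.
# Usage: check_xprtsec_conflict SERVER POLICY [MOUNTS_FILE]
# Assumption: active policies show up as "xprtsec=..." in the mount options.
check_xprtsec_conflict() {
    local server="$1" policy="$2" mounts="${3:-/proc/mounts}"
    awk -v srv="$server" -v want="$policy" '
        ($3 == "nfs" || $3 == "nfs4") && index($1, srv ":") == 1 {
            have = "none"                      # no xprtsec option: unencrypted
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] ~ /^xprtsec=/) have = substr(opts[i], 9)
            if (have != want) {
                printf "conflict: %s is mounted with %s\n", $2, have
                bad = 1
            }
        }
        END { exit bad + 0 }                   # 0 = no conflict, 1 = conflict
    ' "$mounts"
}
```

For example, check_xprtsec_conflict server-1 tls returns a non-zero status and lists the existing mounts from server-1 that use a different policy. It matches the server name literally, so a server mounted once by name and once by IP address is not detected.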

Conclusion

RPC-with-TLS is an interesting way to protect NFS traffic, but it has its limitations. For example, you cannot mount folders with different security policies (tls, mtls and no encryption) from one server at the same time, and with mtls you can use only one client certificate. On the other hand, encryption works reliably: even if the tlshd daemon crashes after mounting, access to the folder and traffic protection remain.

The only native alternative is Kerberos in krb5p mode, but it has its own drawbacks: user authentication is complex to set up, and performance drops significantly.

By the way, the csi-nfs module of Deckhouse Kubernetes Platform has supported RPC-with-TLS since version v0.2.0.
 