This is weird and I have run out of places to look. Any advice would be most welcome.
There are three hosts involved on the same VLAN as follows:
trillian (192.168.1.3/24)
- VM running Alma Linux 9.3, installed about a month ago as 9.2 and upgraded. This is an NFS server and DNS server. It cannot access its own NFS shares. This replaced an older trillian (CentOS Stream 8) that didn't have this problem.
marvin (192.168.1.2/24)
- physical host running CentOS Stream 8 that runs the two VMs under libvirt/QEMU/KVM with a bridged network for the VMs. This can access NFS shares on trillian.
agrajag (192.168.1.126/24)
- VM running Alma Linux 9.3 (built today). This cannot access NFS shares on trillian.
There are several other physical and virtual Fedora 39 hosts, none of which has an issue accessing the NFS share.
The NFS server config is unchanged from the default, apart from /etc/exports, which has this entry:
/srv/shares/steve localhost(rw) trillian.purplehayes.uk(rw) marvin.purplehayes.uk(rw) jabberwock.purplehayes.uk(rw) jubjub.purplehayes.uk(rw) dormouse.purplehayes.uk(rw) agrajag.purplehayes.uk(rw) arthur.purplehayes.uk(rw) alice.purplehayes.uk(ro) tweedledum.purplehayes.uk(ro) tweedledee.purplehayes.uk(ro)
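To see at a glance which clients that entry exports read-write and which read-only, a minimal sketch like the following can split it up. It assumes the simple "path client(options) ..." form used here; it does not handle quoted paths, default option blocks or continuation lines, and the function name parse_exports_line is just for illustration.

```python
import re

def parse_exports_line(line):
    """Split one simple /etc/exports line into (path, {client: [options]})."""
    path, *clients = line.split()
    parsed = {}
    for entry in clients:
        # Each entry is expected to look like host(opt1,opt2).
        match = re.fullmatch(r"([^()]+)\(([^()]*)\)", entry)
        if match is None:
            parsed[entry] = []  # client listed without an option list
        else:
            host, options = match.groups()
            parsed[host] = [opt for opt in options.split(",") if opt]
    return path, parsed

if __name__ == "__main__":
    # Shortened copy of the entry above, for illustration only.
    sample = "/srv/shares/steve localhost(rw) agrajag.purplehayes.uk(rw) alice.purplehayes.uk(ro)"
    path, clients = parse_exports_line(sample)
    for host, options in clients.items():
        print(path, host, options)
```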
fstab entries look like:
trillian:/srv/shares/steve /mnt/nfs/steve nfs defaults 0 0
The problem is that the two Alma Linux 9.3s (one of which is the NFS server) cannot mount the export: it always fails with mount.nfs: access denied by server while mounting trillian:/srv/shares/steve
A Wireshark trace shows the same NFSv4 sequence of LOOKUP/GETATTR/ACCESS for /srv, then /srv/shares, then finally /srv/shares/steve. The failure occurs in the response to the third ACCESS command: it should return 0x1df and it returns 0x000, i.e., all access denied.
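For anyone who wants to check those two masks by hand, here is a small sketch that decodes an NFSv4 ACCESS mask into named rights. The bit values are the ones Wireshark prints in the dissection below; EXECUTE (0x020) is taken from the NFSv4 spec and is the one right the client did not request. The names ACCESS_BITS and decode_access are mine, for illustration.

```python
# Bit values as printed in the Wireshark dissection. EXECUTE (0x020) is from
# the NFSv4 spec and was not part of the requested mask in these captures.
ACCESS_BITS = {
    0x001: "READ",
    0x002: "LOOKUP",
    0x004: "MODIFY",
    0x008: "EXTEND",
    0x010: "DELETE",
    0x020: "EXECUTE",
    0x040: "XATTR READ",
    0x080: "XATTR WRITE",
    0x100: "XATTR LIST",
}

def decode_access(mask):
    """Return the names of the rights whose bits are set in the mask."""
    return [name for bit, name in ACCESS_BITS.items() if mask & bit]

if __name__ == "__main__":
    print("good reply:", decode_access(0x1DF))
    print("bad reply: ", decode_access(0x000))
```

The good reply decodes to every requested right; the bad reply decodes to an empty list.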
A diff of a Wireshark dissection shows the difference that causes the error: this is two packets from each of two captures, good first, bad second.
2c2
< 167 15:49:52.104792256 marvin.purplehayes.uk 731 trillian.purplehayes.uk 2049 NFS 278 V4 Call (Reply In 168) ACCESS FH: 0xfb898914, [Check: RD LU MD XT DL XAR XAW XAL]
---
> 114 12:27:07.730078275 agrajag.purplehayes.uk 805 trillian.purplehayes.uk 2049 NFS 266 V4 Call (Reply In 115) ACCESS FH: 0xfb898914, [Check: RD LU MD XT DL XAR XAW XAL]
4,8c4,8
< Frame 167: 278 bytes on wire (2224 bits), 278 bytes captured (2224 bits) on interface br1, id 0
< Ethernet II, Src: 54:52:00:01:02:01 (54:52:00:01:02:01), Dst: RealtekU_01:03:00 (52:54:00:01:03:00)
< Internet Protocol Version 4, Src: marvin.purplehayes.uk (192.168.1.2), Dst: trillian.purplehayes.uk (192.168.1.3)
< Transmission Control Protocol, Src Port: 731, Dst Port: 2049, Seq: 7137, Ack: 7057, Len: 212
< Remote Procedure Call, Type:Call XID:0x791d8bd7
---
> Frame 114: 266 bytes on wire (2128 bits), 266 bytes captured (2128 bits) on interface br1, id 0
> Ethernet II, Src: RealtekU_01:7e:00 (52:54:00:01:7e:00), Dst: RealtekU_01:03:00 (52:54:00:01:03:00)
> Internet Protocol Version 4, Src: agrajag.purplehayes.uk (192.168.1.126), Dst: trillian.purplehayes.uk (192.168.1.3)
> Transmission Control Protocol, Src Port: 805, Dst Port: 2049, Seq: 6510, Ack: 6937, Len: 200
> Remote Procedure Call, Type:Call XID:0x99b4e2d3
16,17c16,17
< sessionid: 5d386f652160d94b0700000000000000
< seqid: 0x00000023
---
> sessionid: 06ee6e65c8a3011bac00000000000000
> seqid: 0x00000242
46c46
< 168 15:49:52.104911178 trillian.purplehayes.uk 2049 marvin.purplehayes.uk 731 NFS 238 V4 Reply (Call In 167) ACCESS, [Allowed: RD LU MD XT DL XAR XAW XAL]
---
> 115 12:27:07.730147846 trillian.purplehayes.uk 2049 agrajag.purplehayes.uk 805 NFS 238 V4 Reply (Call In 114) ACCESS, [Access Denied: RD LU MD XT DL XAR XAW XAL]
48,53c48,52
< Packet comments
< Frame 168: 238 bytes on wire (1904 bits), 238 bytes captured (1904 bits) on interface br1, id 0
< Ethernet II, Src: RealtekU_01:03:00 (52:54:00:01:03:00), Dst: 54:52:00:01:02:01 (54:52:00:01:02:01)
< Internet Protocol Version 4, Src: trillian.purplehayes.uk (192.168.1.3), Dst: marvin.purplehayes.uk (192.168.1.2)
< Transmission Control Protocol, Src Port: 2049, Dst Port: 731, Seq: 7057, Ack: 7349, Len: 172
< Remote Procedure Call, Type:Reply XID:0x791d8bd7
---
> Frame 115: 238 bytes on wire (1904 bits), 238 bytes captured (1904 bits) on interface br1, id 0
> Ethernet II, Src: RealtekU_01:03:00 (52:54:00:01:03:00), Dst: RealtekU_01:7e:00 (52:54:00:01:7e:00)
> Internet Protocol Version 4, Src: trillian.purplehayes.uk (192.168.1.3), Dst: agrajag.purplehayes.uk (192.168.1.126)
> Transmission Control Protocol, Src Port: 2049, Dst Port: 805, Seq: 6937, Ack: 6710, Len: 172
> Remote Procedure Call, Type:Reply XID:0x99b4e2d3
62,63c61,62
< sessionid: 5d386f652160d94b0700000000000000
< seqid: 0x00000023
---
> sessionid: 06ee6e65c8a3011bac00000000000000
> seqid: 0x00000242
70c69
< Opcode: ACCESS (3), [Allowed: RD LU MD XT DL XAR XAW XAL]
---
> Opcode: ACCESS (3), [Access Denied: RD LU MD XT DL XAR XAW XAL]
81,89c80,88
< Access rights (of requested): 0x1df
< .... ...1 = 0x001 READ: allowed
< .... ..1. = 0x002 LOOKUP: allowed
< .... .1.. = 0x004 MODIFY: allowed
< .... 1... = 0x008 EXTEND: allowed
< ...1 .... = 0x010 DELETE: allowed
< .1.. .... = 0x040 XATTR READ: allowed
< 1... .... = 0x080 XATTR WRITE: allowed
< .... .... = 0x100 XATTR LIST: allowed
---
> Access rights (of requested): 0x00
> .... ...0 = 0x001 READ: *Access Denied*
> .... ..0. = 0x002 LOOKUP: *Access Denied*
> .... .0.. = 0x004 MODIFY: *Access Denied*
> .... 0... = 0x008 EXTEND: *Access Denied*
> ...0 .... = 0x010 DELETE: *Access Denied*
> .0.. .... = 0x040 XATTR READ: *Access Denied*
> 0... .... = 0x080 XATTR WRITE: *Access Denied*
> .... .... = 0x100 XATTR LIST: *Access Denied*
94c93
< changeid: 835
---
> changeid: 7305793694093118725
99,100c98,99
< seconds: 1701785659
< nseconds: 338989372
---
> seconds: 1701012648
> nseconds: 850758917
102,103c101,102
< seconds: 1701785659
< nseconds: 338989372
---
> seconds: 1701012648
> nseconds: 850758917
It really looks like a server-side error, even though the issue only happens (as far as I can tell) if the client is Alma Linux 9.
Update: I rebuilt agrajag as Alma Linux 8.9 and it still cannot mount the share.
Using, variously, rpcdebug -m rpc -c all and rpcdebug -m nfsd -c all on the server and rpcdebug -m nfs -c all on the client shows nothing that looks like this error, via systemd journal or dmesg. I've tried sysctl -w sunrpc.nfsd_debug=1023, etc., but that doesn't seem to do anything (I presume because this is under systemd).
Things this is not:
- firewall rules: NFS traffic is flowing
- network related: it happens mounting from localhost on the server
- the NFS insecure option: the port used is < 1024 and there is no NAT
- a problem with /etc/exports: showmount -e gives the expected result
- a problem with /etc/hosts: all name resolution is via DNS (trillian is the DNS server as well as the NFS server; all four hosts have the correct forward and reverse entries)
- selinux: disabling it doesn't change anything
- UID and GID mismatches: mounts are all done under root(0:0); file access is by steve(1000:1000), and the export is root squashed only.
- NFSv4: if I attempt the mount by specifying -o nfsvers=3, the mount works but attempting to ls a file immediately fails. Since NFSv3 and v4 differ a lot, that's not surprising, but it fails for both with the same error reported by the client.
- That specific share: I get the same on another share, too.
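The forward and reverse DNS claim in the list above can be re-checked from any host with a sketch along these lines. I am assuming the server matches hostname entries in /etc/exports against the name returned by a reverse lookup of the client address, so the check compares that name with the one in the export. The function name and the injectable forward/reverse parameters are only there so the logic can be exercised without a live resolver.

```python
import socket

def check_forward_reverse(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve a name to an address, resolve it back, and compare the names."""
    address = forward(hostname)
    reverse_name = reverse(address)[0]  # first element is the primary name
    matches = reverse_name.rstrip(".").lower() == hostname.rstrip(".").lower()
    return address, reverse_name, matches

if __name__ == "__main__":
    # Hostnames from the post; run this on the server and on a failing client.
    for name in ("trillian.purplehayes.uk", "marvin.purplehayes.uk",
                 "agrajag.purplehayes.uk", "arthur.purplehayes.uk"):
        try:
            print(name, check_forward_reverse(name))
        except OSError as err:  # covers socket.gaierror and socket.herror
            print(name, "lookup failed:", err)
```

Running it on both the server and a failing client would show whether they agree on each name and address.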
Update:
- I built a new server, arthur (192.168.1.127), a VM running Alma Linux 8.9, which is a replica of trillian in other respects. It shows the same behaviour (and the same NFSv4 ACCESS reply packet), so this problem is not specific to EL 9.x or to that specific build.