Recently, several customers whose VMs run on our PVE nodes reported the following problems:

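1. The VM has firewall / macfilter / ipfilter enabled and its ipset rules are correct, but it has no network access. Neither reinstalling the OS nor turning the firewall off brings it back, yet disabling the firewall option on the VM's network device restores connectivity immediately.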

2. The VM's network works, but the firewall rules have no effect.

3. pve-firewall logs the error: status update error: iptables_restore_cmdlist: Try `iptables-restore -h' or 'iptables-restore --help' for more information.

For problem 1, basic troubleshooting quickly points at the firewall, yet the PVE GUI shows the firewall as running. Log in via SSH and run:

iptables-save -c | grep 2804    (2804 is the VMID)

This lists the iptables rules of the affected VM, among them:

[0:0] -A tap2804i0-OUT -m mac ! --mac-source bc:24:11:b8:97:88 -j DROP

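In tap2804i0, 2804 is the VMID and i0 is the VM's first network device (net0). Comparing the MAC in this rule with the VM's actual MAC showed that they did not match. The rule drops every frame the VM sends whose source MAC is not bc:24:11:b8:97:88, so while it is stale, all of the VM's outgoing traffic is dropped. This confirms that the firewall was broken and could no longer update the iptables rules. (Tip: pve-firewall is not a standalone packet filter; in the end it generates rules and loads them into iptables.)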
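The MAC comparison above can be scripted so it is quick to repeat on other VMs. The sketch below is a hypothetical helper (config_mac, fw_mac and check_mac are our own names). It assumes qm config <vmid> prints NIC lines of the form net0: virtio=BC:24:11:B8:97:88,bridge=vmbr0,firewall=1, and that the iptables-save output contains the mac-source DROP rule shown above. Both inputs are read from files, so it can also be tried offline.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare the MAC pinned by pve-firewall in the
# tap<vmid>i<idx>-OUT chain with the MAC in the VM's current config.

# Print the MAC of net<idx> from `qm config` output on stdin, lowercased.
config_mac() {
  local idx=$1
  sed -n "s/^net${idx}: [a-z0-9-]*=\([0-9A-Fa-f:]\{17\}\).*/\1/p" | tr 'A-F' 'a-f'
}

# Print the MAC used by the mac-source rule of tap<vmid>i<idx>-OUT
# from `iptables-save -c` output on stdin, lowercased.
fw_mac() {
  local vmid=$1 idx=$2
  grep -F -e "-A tap${vmid}i${idx}-OUT -m mac" \
    | sed -n 's/.*--mac-source \([0-9A-Fa-f:]\{17\}\).*/\1/p' \
    | tr 'A-F' 'a-f' | head -n 1
}

# Usage: check_mac <vmid> <idx> <qm-config-file> <iptables-save-file>
# Returns 0 if the MACs match, 1 on mismatch, 2 if no mac rule was found.
check_mac() {
  local vmid=$1 idx=$2 want got
  want=$(config_mac "$idx" < "$3")
  got=$(fw_mac "$vmid" "$idx" < "$4")
  if [ -z "$got" ]; then
    echo "no mac filter rule for tap${vmid}i${idx}"
    return 2
  fi
  if [ "$want" = "$got" ]; then
    echo "ok $got"
    return 0
  fi
  echo "MISMATCH config=$want firewall=$got"
  return 1
}
```

On a node this would be used as: qm config 2804 > cfg.txt; iptables-save -c > ipt.txt; check_mac 2804 0 cfg.txt ipt.txt. A MISMATCH result corresponds to the broken state described above.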

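Simply running pve-firewall restart brings the VM's network back, and everything seems fine, but then problem 2 appears: none of the firewall rules take effect. Running iptables-save -c again now returns only: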

~# iptables-save -c
# Generated by iptables-save v1.8.9 on Wed Aug 21 19:00:45 2024
*raw
:PREROUTING ACCEPT [348735666585:213325671547273]
:OUTPUT ACCEPT [4591886294:3380972084311]
COMMIT
# Completed on Wed Aug 21 19:00:45 2024
# Generated by iptables-save v1.8.9 on Wed Aug 21 19:00:45 2024
*filter
:INPUT ACCEPT [6813060:3638198461]
:FORWARD ACCEPT [215129855:125354233180]
:OUTPUT ACCEPT [5768979:4211817661]
COMMIT
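On a healthy node the filter table also contains pve-firewall's own chains, whose names start with PVEFW- (for example PVEFW-FORWARD); the output above has none. A minimal check that relies only on that naming convention (the function name is our own):

```shell
#!/usr/bin/env bash
# Count chain declarations created by pve-firewall (":PVEFW-...") in
# `iptables-save` output read from stdin. Prints the count and returns
# non-zero when there are none, i.e. the PVE ruleset is not loaded.
pvefw_loaded() {
  local n
  n=$(grep -c '^:PVEFW-' || true)
  echo "$n"
  [ "$n" -gt 0 ]
}
```

For example: iptables-save | pvefw_loaded || echo "pve-firewall rules are not loaded".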

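This happens because the ruleset generated by pve-firewall contains an invalid rule and therefore cannot be loaded into iptables. iptables-restore commits each table as a whole, so a single bad line makes it reject the entire batch. That is also the likely explanation for problem 1: every update failed, so the old rules (including the stale MAC) simply stayed in place. After the restart the old rules were gone and the new ones could not be loaded, which leaves the empty tables above. Running systemctl status pvefw-logger pve-firewall shows errors like the following: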

root@testnode:~# systemctl status pvefw-logger pve-firewall
● pvefw-logger.service - Proxmox VE firewall logger
     Loaded: loaded (/lib/systemd/system/pvefw-logger.service; enabled; preset: enabled)
     Active: active (running) since Wed 2024-08-21 00:00:09 HKT; 19h ago
   Main PID: 1649446 (pvefw-logger)
      Tasks: 2 (limit: 629145)
     Memory: 444.0K
        CPU: 8.328s
     CGroup: /system.slice/pvefw-logger.service
             └─1649446 /usr/sbin/pvefw-logger

Aug 21 00:00:09 testnode systemd[1]: Starting pvefw-logger.service - Proxmox VE firewall logger...
Aug 21 00:00:09 testnode pvefw-logger[1649446]: starting pvefw logger
Aug 21 00:00:09 testnode systemd[1]: Started pvefw-logger.service - Proxmox VE firewall logger.

● pve-firewall.service - Proxmox VE firewall
     Loaded: loaded (/lib/systemd/system/pve-firewall.service; enabled; preset: enabled)
     Active: active (running) since Wed 2024-08-21 19:07:43 HKT; 16min ago
    Process: 4058179 ExecStartPre=/usr/bin/update-alternatives --set ebtables /usr/sbin/ebtables-legacy (code=exited, status=0/SUCCESS)
    Process: 4058181 ExecStartPre=/usr/bin/update-alternatives --set iptables /usr/sbin/iptables-legacy (code=exited, status=0/SUCCESS)
    Process: 4058182 ExecStartPre=/usr/bin/update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy (code=exited, status=0/SUCCESS)
    Process: 4058183 ExecStart=/usr/sbin/pve-firewall start (code=exited, status=0/SUCCESS)
    Process: 4088640 ExecReload=/usr/sbin/pve-firewall restart (code=exited, status=0/SUCCESS)
   Main PID: 4058255 (pve-firewall)
      Tasks: 1 (limit: 629145)
     Memory: 117.2M
        CPU: 1min 52.846s
     CGroup: /system.slice/pve-firewall.service
             └─4058255 pve-firewall

Aug 21 19:21:32 testnode pve-firewall[4058255]: status update error: iptables_restore_cmdlist: Try `iptables-restore -h' or 'iptables-restore --help' for more information.
Aug 21 19:21:42 testnode pve-firewall[4058255]: status update error: iptables_restore_cmdlist: Try `iptables-restore -h' or 'iptables-restore --help' for more information.
Aug 21 19:21:52 testnode pve-firewall[4058255]: status update error: iptables_restore_cmdlist: Try `iptables-restore -h' or 'iptables-restore --help' for more information.
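Because the daemon keeps running and only logs this error (every 10 seconds in the excerpt above), it is easy to miss. A minimal monitoring sketch that counts these lines in log text piped on stdin; feeding it journalctl -u pve-firewall --since "10 min ago" is our assumption about how the logs are collected, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Count pve-firewall "status update error" lines in log text on stdin and
# print the most recent one. Returns 1 if any were found, 0 otherwise.
fw_update_errors() {
  local matches count
  matches=$(grep -F 'pve-firewall' | grep -F 'status update error' || true)
  if [ -z "$matches" ]; then
    echo "0"
    return 0
  fi
  count=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')
  echo "$count"
  printf '%s\n' "$matches" | tail -n 1
  return 1
}
```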

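We can also start the firewall in debug mode: pve-firewall stop; pve-firewall start -debug
This reports the type of error (e.g. --dport) and a line number, but on its own that is not much help: after going through a lot of documentation I still could not find out what that line contains. The only fallback seemed to be running pve-firewall compile and checking every customer's firewall rules for errors by hand.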

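To pinpoint the faulty rule quickly, note that if -debug can report the failing line, the complete iptables ruleset must be available at that point. Looking at the pve-firewall source (https://github.com/proxmox/pve-firewall/blob/master/src/PVE/Firewall.pm), the rules are passed to iptables-restore in a single sub, so we can patch Firewall.pm directly. Edit /usr/share/perl5/PVE/Firewall.pm with vi and find this sub: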

sub iptables_restore_cmdlist {
    my ($cmdlist, $table) = @_;

    $table = 'filter' if !$table;
    run_command(['iptables-restore', '-T', $table, '-n'], input => $cmdlist, errmsg => "iptables_restore_cmdlist");
}

Right after the $table line, add a line that prints the entire cmdlist (the iptables rules about to be restored):

sub iptables_restore_cmdlist {
    my ($cmdlist, $table) = @_;

    $table = 'filter' if !$table;

    # Print cmdlist
    warn "Restoring iptables rules: $cmdlist\n";  # warn writes to stderr

    run_command(['iptables-restore', '-T', $table, '-n'], input => $cmdlist, errmsg => "iptables_restore_cmdlist");
}
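Since this edits a file shipped by the pve-firewall package, it helps to keep an untouched copy so the debug line can be reverted cleanly. A minimal sketch; the .orig suffix and the function names are our own convention, not part of PVE:

```shell
#!/usr/bin/env bash
# Keep a pristine copy of a file before editing it (only the first time,
# so a second call never overwrites the original with an edited version).
backup_once() {
  local f=$1
  [ -e "$f.orig" ] || cp -p "$f" "$f.orig"
}

# Put the pristine copy back and remove the backup.
restore_orig() {
  local f=$1
  [ -e "$f.orig" ] || { echo "no backup for $f" >&2; return 1; }
  mv "$f.orig" "$f"
}
```

Run backup_once /usr/share/perl5/PVE/Firewall.pm before opening the file in vi, and restore_orig /usr/share/perl5/PVE/Firewall.pm followed by a pve-firewall restart once debugging is finished.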

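Now run pve-firewall stop; pve-firewall start -debug again.
It now prints every iptables rule it is about to apply, together with the error type and line number. With the complete ruleset in front of us, finding the failing line is easy. The culprit turned out to be rules a customer had added using the udplite protocol. From there the fix is straightforward: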
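Once the printed cmdlist is saved to a file and you have the line number iptables-restore reported, a small helper can show that line with some context. This is a hypothetical sketch (show_line is our own name). Keep in mind that iptables_restore_cmdlist is called once per table (the $table argument), so the number refers to the cmdlist of the call that failed, and the first line of the warn output carries the "Restoring iptables rules: " prefix.

```shell
#!/usr/bin/env bash
# Print line <n> of a saved ruleset file with two lines of context on each
# side, marking the reported line with ">>".
# Usage: show_line <file> <n>
show_line() {
  awk -v n="$2" 'NR >= n - 2 && NR <= n + 2 {
    printf "%s%d: %s\n", (NR == n ? ">> " : "   "), NR, $0
  }' "$1"
}
```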

~# pve-firewall compile  | grep udplite
-A tap2744i0-IN -p udplite --dport 19885 -j ACCEPT
-A tap2744i0-IN -p udplite --dport 20715 -j ACCEPT
-A tap2744i0-IN -p udplite --dport 19885 -j ACCEPT
-A tap2744i0-IN -p udplite --dport 20715 -j ACCEPT
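With many customers on one node, it helps to turn these chain names back into VMIDs. A sketch assuming the tap<vmid>i<n>-IN/-OUT naming seen above; the function name is hypothetical, and rules that live in other chains (for example security groups) do not match this pattern and are not reported.

```shell
#!/usr/bin/env bash
# Read `pve-firewall compile` output on stdin and print the unique VMIDs
# whose tap chains contain rules for the given protocol (default: udplite).
vmids_with_proto() {
  local proto=${1:-udplite}
  grep -E -e "-A tap[0-9]+i[0-9]+-(IN|OUT) .*-p ${proto}([[:space:]]|\$)" \
    | sed -E 's/.*-A tap([0-9]+)i[0-9]+-.*/\1/' \
    | sort -un
}
```

For example: pve-firewall compile | vmids_with_proto udplite.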

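Now that the faulty rules are identified, delete the udplite entries from the firewall rules of the customer's VM 2744 and run pve-firewall restart. The message Try `iptables-restore -h' or 'iptables-restore --help' for more information no longer appears, the firewall starts normally, and the problem is solved. Remember to comment out or delete the line warn "Restoring iptables rules: $cmdlist\n" in /usr/share/perl5/PVE/Firewall.pm afterwards.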
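To locate the entries in the configuration itself: the per-VM rule files live under /etc/pve/firewall/ (the 4076.fw path in the pve-firewall log at the end of this post is one example), and their rule lines use the same -p <proto> syntax. A read-only sketch that lists file, line number and rule for every matching entry; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# List every rule using the given protocol (default: udplite) in PVE
# firewall config files under a directory, as "file:line:rule".
# It only reads the files; nothing is modified.
# Usage: find_proto_rules [dir] [proto]
find_proto_rules() {
  local dir=${1:-/etc/pve/firewall} proto=${2:-udplite}
  grep -H -n -E -e "-p ${proto}([[:space:]]|\$)" "$dir"/*.fw 2>/dev/null || true
}
```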

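To avoid this problem for good, stop allowing udplite rules in the customer management platform. If that is not possible for some reason, you can instead edit /usr/share/perl5/PVE/Firewall.pm (not recommended, because every pve-firewall package update may overwrite the change): find the verify_rule sub and add a check. The modified proto block looks like this: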

if ($rule->{proto}) {
    eval { pve_fw_verify_protocol_spec($rule->{proto}); };
    &$add_error('proto', $@) if $@;
    &$set_ip_version(4) if $rule->{proto} eq 'icmp';
    &$set_ip_version(6) if $rule->{proto} eq 'icmpv6';
    &$set_ip_version(6) if $rule->{proto} eq 'ipv6-icmp';
    $is_icmp = $proto_is_icmp->($rule->{proto});

    # Added: reject the udplite protocol
    if ($rule->{proto} eq 'udplite') {
        &$add_error('proto', "'udplite' protocol is not supported for firewall rules.");
    }
}

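After restarting pve-firewall, udplite rules added by customers no longer break iptables; instead, pve-firewall reports an error for the offending rule, as shown below: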

~#  systemctl status pvefw-logger pve-firewall
● pvefw-logger.service - Proxmox VE firewall logger
     Loaded: loaded (/lib/systemd/system/pvefw-logger.service; enabled; preset: enabled)
     Active: active (running) since Thu 2024-08-22 13:13:18 HKT; 17s ago
    Process: 4165930 ExecStart=/usr/sbin/pvefw-logger (code=exited, status=0/SUCCESS)
   Main PID: 4165933 (pvefw-logger)
      Tasks: 2 (limit: 629145)
     Memory: 480.0K
        CPU: 88ms
     CGroup: /system.slice/pvefw-logger.service
             └─4165933 /usr/sbin/pvefw-logger

Aug 22 13:13:18 testnode systemd[1]: Starting pvefw-logger.service - Proxmox VE firewall logger...
Aug 22 13:13:18 testnode pvefw-logger[4165933]: starting pvefw logger
Aug 22 13:13:18 testnode systemd[1]: Started pvefw-logger.service - Proxmox VE firewall logger.

● pve-firewall.service - Proxmox VE firewall
     Loaded: loaded (/lib/systemd/system/pve-firewall.service; enabled; preset: enabled)
     Active: active (running) since Fri 2024-07-05 06:35:42 HKT; 1 month 17 days ago
    Process: 3723 ExecStartPre=/usr/bin/update-alternatives --set ebtables /usr/sbin/ebtables-legacy (code=exited, status=0/SUCCESS)
    Process: 3725 ExecStartPre=/usr/bin/update-alternatives --set iptables /usr/sbin/iptables-legacy (code=exited, status=0/SUCCESS)
    Process: 3727 ExecStartPre=/usr/bin/update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy (code=exited, status=0/SUCCESS)
    Process: 3729 ExecStart=/usr/sbin/pve-firewall start (code=exited, status=0/SUCCESS)
    Process: 4165880 ExecReload=/usr/sbin/pve-firewall restart (code=exited, status=0/SUCCESS)
   Main PID: 3744 (pve-firewall)
      Tasks: 1 (limit: 629145)
     Memory: 123.0M
        CPU: 4d 18h 11min 46.192s
     CGroup: /system.slice/pve-firewall.service
             └─3744 pve-firewall

Aug 22 13:13:15 testnode pve-firewall[3744]: received signal HUP
Aug 22 13:13:15 testnode pve-firewall[3744]: server shutdown (restart)
Aug 22 13:13:15 testnode systemd[1]: Reloaded pve-firewall.service - Proxmox VE firewall.
Aug 22 13:13:15 testnode pve-firewall[3744]: restarting server
Aug 22 13:13:16 testnode pve-firewall[3744]: /etc/pve/firewall/4076.fw (line 37) - errors in rule parameters: IN ACCEPT -i net4 -p udpl>
Aug 22 13:13:16 testnode pve-firewall[3744]:   proto: 'udplite' protocol is not supported for firewall rules.
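For the first option, rejecting the rule in the customer platform before it ever reaches PVE, the check can mirror the verify_rule change above. A minimal sketch of such a validator; the function name is hypothetical, and it additionally rejects the numeric form 136 (udplite's IANA protocol number), on the assumption that your platform lets customers enter numeric protocols.

```shell
#!/usr/bin/env bash
# Hypothetical platform-side check: refuse firewall rules whose protocol
# is UDP-Lite, given by name (any case) or by protocol number (136).
# Returns 0 if the protocol is allowed, 1 (with a message on stderr) otherwise.
validate_proto() {
  local proto
  proto=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
  case "$proto" in
    udplite|136)
      echo "'udplite' protocol is not supported for firewall rules." >&2
      return 1
      ;;
  esac
  return 0
}
```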
