How to Replace a Failed Disk in a Proxmox VE ZFS Pool
During a routine inspection I found that one disk in the PVE host's ZFS raidz2 pool (rpool) had failed. Checking the pool showed:
~# zpool status rpool
  pool: rpool
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 0B in 00:43:00 with 0 errors on Sun May 12 01:07:02 2024
config:

        NAME                                                    STATE     READ WRITE CKSUM
        rpool                                                   DEGRADED     0     0     0
          raidz2-0                                              DEGRADED     0     0     0
            ata-INTEL_SSDSC2KB019TZ_PHYI334000GD1P9DGN-part3    ONLINE       0     0     0
            ata-INTEL_SSDSC2KB019TZ_PHYI334000NS1P9DGN-part3    ONLINE       0     0     0
            ata-INTEL_SSDSC2KB019TZ_PHYI334000Q21P9DGN-part3    ONLINE       0     0     0
            ata-INTEL_SSDSC2KB019TZ_PHYI334000Y91P9DGN-part3    ONLINE       0     0     0
            ata-INTEL_SSDSC2KB019TZ_PHYI334000GE1P9DGN-part3    REMOVED      0     0     0
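With only five members the failed device is easy to spot by eye, but on a larger pool a small filter helps. Below is a minimal bash sketch; list_unhealthy_devices is a hypothetical helper name, and it assumes the standard NAME/STATE/READ/WRITE/CKSUM config table shown above. It reads `zpool status` text on stdin and prints every table row whose STATE is not ONLINE, so degraded pool and vdev rows show up alongside the failed disk.

```shell
#!/usr/bin/env bash
# list_unhealthy_devices (hypothetical helper): read `zpool status` output on
# stdin and print "NAME STATE" for every config-table row that is not ONLINE.
# Assumes the standard NAME STATE READ WRITE CKSUM table; pool and vdev rows
# (e.g. rpool, raidz2-0) are reported too when they are DEGRADED.
list_unhealthy_devices() {
  awk '
    $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }   # table header starts the table
    in_cfg && NF == 0 { in_cfg = 0 }                      # a blank line ends the table
    in_cfg && NF >= 5 && $2 != "ONLINE" { print $1, $2 }  # any row not ONLINE
  '
}

# Usage on a live host:
#   zpool status rpool | list_unhealthy_devices
```

For a quick pool-level check, the built-in `zpool status -x` already limits its output to pools that have problems.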
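The failed disk is already in the REMOVED state. I pulled the bad disk and inserted a new one, but running zpool status rpool again gave exactly the same output: with the pool's autoreplace property at its default of off, ZFS does not start a replacement on its own. However, ls -la /dev/disk/by-id shows a new, unconfigured disk, sdd: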
~# ls -la /dev/disk/by-id
total 0
drwxr-xr-x 2 root root 720 Jun 11 00:30 .
drwxr-xr-x 8 root root 160 Apr 29 11:33 ..
lrwxrwxrwx 1 root root 9 Apr 29 11:33 ata-INTEL_SSDSC2KB019TZ_PHYI334000GD1P9DGN -> ../../sda
lrwxrwxrwx 1 root root 10 Apr 29 11:33 ata-INTEL_SSDSC2KB019TZ_PHYI334000GD1P9DGN-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Apr 29 11:33 ata-INTEL_SSDSC2KB019TZ_PHYI334000GD1P9DGN-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Apr 29 11:33 ata-INTEL_SSDSC2KB019TZ_PHYI334000GD1P9DGN-part3 -> ../../sda3
lrwxrwxrwx 1 root root 9 May 12 00:38 ata-INTEL_SSDSC2KB019TZ_PHYI334000NS1P9DGN -> ../../sdb
lrwxrwxrwx 1 root root 10 Apr 29 11:33 ata-INTEL_SSDSC2KB019TZ_PHYI334000NS1P9DGN-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Apr 29 11:33 ata-INTEL_SSDSC2KB019TZ_PHYI334000NS1P9DGN-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Apr 29 11:33 ata-INTEL_SSDSC2KB019TZ_PHYI334000NS1P9DGN-part3 -> ../../sdb3
lrwxrwxrwx 1 root root 9 Apr 29 11:33 ata-INTEL_SSDSC2KB019TZ_PHYI334000Q21P9DGN -> ../../sdc
lrwxrwxrwx 1 root root 10 Apr 29 11:33 ata-INTEL_SSDSC2KB019TZ_PHYI334000Q21P9DGN-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Apr 29 11:33 ata-INTEL_SSDSC2KB019TZ_PHYI334000Q21P9DGN-part2 -> ../../sdc2
lrwxrwxrwx 1 root root 10 Apr 29 11:33 ata-INTEL_SSDSC2KB019TZ_PHYI334000Q21P9DGN-part3 -> ../../sdc3
lrwxrwxrwx 1 root root 9 May 25 17:43 ata-INTEL_SSDSC2KB019TZ_PHYI334000Y91P9DGN -> ../../sde
lrwxrwxrwx 1 root root 10 Apr 29 11:33 ata-INTEL_SSDSC2KB019TZ_PHYI334000Y91P9DGN-part1 -> ../../sde1
lrwxrwxrwx 1 root root 10 Apr 29 11:33 ata-INTEL_SSDSC2KB019TZ_PHYI334000Y91P9DGN-part2 -> ../../sde2
lrwxrwxrwx 1 root root 10 Apr 29 11:33 ata-INTEL_SSDSC2KB019TZ_PHYI334000Y91P9DGN-part3 -> ../../sde3
lrwxrwxrwx 1 root root 9 Jun 11 00:30 ata-INTEL_SSDSC2KB019TZ_PHYI33840KVB1P9DGN -> ../../sdd
lrwxrwxrwx 1 root root 9 Apr 29 11:33 wwn-0x55cd2e415651c850 -> ../../sda
lrwxrwxrwx 1 root root 10 Apr 29 11:33 wwn-0x55cd2e415651c850-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Apr 29 11:33 wwn-0x55cd2e415651c850-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Apr 29 11:33 wwn-0x55cd2e415651c850-part3 -> ../../sda3
lrwxrwxrwx 1 root root 9 May 12 00:38 wwn-0x55cd2e415651c929 -> ../../sdb
lrwxrwxrwx 1 root root 10 Apr 29 11:33 wwn-0x55cd2e415651c929-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Apr 29 11:33 wwn-0x55cd2e415651c929-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Apr 29 11:33 wwn-0x55cd2e415651c929-part3 -> ../../sdb3
lrwxrwxrwx 1 root root 9 Apr 29 11:33 wwn-0x55cd2e415651c955 -> ../../sdc
lrwxrwxrwx 1 root root 10 Apr 29 11:33 wwn-0x55cd2e415651c955-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Apr 29 11:33 wwn-0x55cd2e415651c955-part2 -> ../../sdc2
lrwxrwxrwx 1 root root 10 Apr 29 11:33 wwn-0x55cd2e415651c955-part3 -> ../../sdc3
lrwxrwxrwx 1 root root 9 May 25 17:43 wwn-0x55cd2e415651ca6c -> ../../sde
lrwxrwxrwx 1 root root 10 Apr 29 11:33 wwn-0x55cd2e415651ca6c-part1 -> ../../sde1
lrwxrwxrwx 1 root root 10 Apr 29 11:33 wwn-0x55cd2e415651ca6c-part2 -> ../../sde2
lrwxrwxrwx 1 root root 10 Apr 29 11:33 wwn-0x55cd2e415651ca6c-part3 -> ../../sde3
lrwxrwxrwx 1 root root 9 Jun 11 00:30 wwn-0x55cd2e4156563eaf -> ../../sdd
In this listing sdd is the only disk without any -partN entries. To confirm its stable by-id name, run:
~# ls -l /dev/disk/by-id/wwn-0x55cd2e4156563eaf
lrwxrwxrwx 1 root root 9 Jun 11 00:30 /dev/disk/by-id/wwn-0x55cd2e4156563eaf -> ../../sdd
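Spotting the disk without partition links can also be scripted. This is a minimal bash sketch, assuming the new disk carries no partition table yet (true at this point, before the GPT initialization step); find_unpartitioned_disks is a hypothetical helper name. It prints every whole-disk link that has no -partN links next to it, together with the device node it resolves to via readlink -f.

```shell
#!/usr/bin/env bash
# find_unpartitioned_disks [DIR] (hypothetical helper): list whole-disk links in
# DIR (default /dev/disk/by-id) that have no "-partN" siblings, i.e. disks
# without a partition table, and show the device node each one resolves to.
find_unpartitioned_disks() {
  local dir=${1:-/dev/disk/by-id} entry name
  for entry in "$dir"/*; do
    [[ -e $entry || -L $entry ]] || continue      # empty directory: nothing to do
    name=${entry##*/}
    [[ $name == *-part[0-9]* ]] && continue       # skip the partition links themselves
    # compgen -G succeeds only if the glob matches at least one partition link.
    if ! compgen -G "$dir/$name-part*" >/dev/null; then
      printf '%s -> %s\n' "$name" "$(readlink -f "$entry")"
    fi
  done
}
```

On the host above it should report both ata-INTEL_SSDSC2KB019TZ_PHYI33840KVB1P9DGN and wwn-0x55cd2e4156563eaf, each resolving to /dev/sdd.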
Next, log in to the PVE web interface and open the node's Disks page. The unused disk sdd is listed there; select it and click "Initialize Disk with GPT".
Then, back on the command line, replace the failed disk:
~# zpool replace rpool ata-INTEL_SSDSC2KB019TZ_PHYI334000GE1P9DGN-part3 /dev/disk/by-id/wwn-0x55cd2e4156563eaf
Here ata-INTEL_SSDSC2KB019TZ_PHYI334000GE1P9DGN-part3 is the old (failed) device, named exactly as it appears in zpool status, and /dev/disk/by-id/wwn-0x55cd2e4156563eaf is the new disk.
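One caveat: rpool is the PVE boot pool, and the command above hands ZFS the whole new disk. ZFS then lays down its own partitions over the GPT created in the web UI, so unlike the other four members the new disk gets no BIOS boot partition or ESP (part1/part2) and cannot boot the host by itself, even though the pool regains full redundancy. For a bootable replacement, the Proxmox VE admin guide ("Changing a failed bootable device") instead copies the partition table from a healthy member, replaces only the ZFS partition, and prepares the new ESP with proxmox-boot-tool. The bash sketch below wraps those documented steps; the helper names run and replace_boot_disk, the DRY_RUN switch and the extra udevadm settle are our own assumptions, and the partition numbers assume the default PVE layout visible in the by-id listing (part2 = ESP, part3 = ZFS). By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the steps from the Proxmox VE admin guide
# ("Changing a failed bootable device"). DRY_RUN=1 (the default) only prints
# each command; set DRY_RUN=0 to actually execute them.
run() {
  if [[ ${DRY_RUN:-1} == 1 ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# replace_boot_disk POOL OLD_ZFS_PART HEALTHY_DISK NEW_DISK
# Assumes the default PVE layout: part2 = ESP, part3 = ZFS partition.
# Stops at the first failing step when DRY_RUN=0.
replace_boot_disk() {
  local pool=$1 old_part=$2 healthy=$3 new=$4
  run sgdisk "$healthy" -R "$new" || return                        # copy partition table from a healthy member
  run sgdisk -G "$new" || return                                   # give the copy new random GUIDs
  run udevadm settle || return                                     # wait for the new -partN links (our addition)
  run zpool replace -f "$pool" "$old_part" "$new-part3" || return  # resilver onto partition 3 only
  run proxmox-boot-tool format "$new-part2" || return              # format the new ESP
  run proxmox-boot-tool init "$new-part2"                          # install the bootloader and add the ESP to the synced set
}
```

For this article's disks the call would be `replace_boot_disk rpool ata-INTEL_SSDSC2KB019TZ_PHYI334000GE1P9DGN-part3 /dev/disk/by-id/ata-INTEL_SSDSC2KB019TZ_PHYI334000GD1P9DGN /dev/disk/by-id/wwn-0x55cd2e4156563eaf`, first as a dry run and then with DRY_RUN=0. The admin guide also shows an optional trailing `grub` argument to `proxmox-boot-tool init` for GRUB-based setups; check the guide for your PVE version before relying on this sketch.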
Checking the pool status again shows that the replacement (resilver) has started:
~# zpool status rpool
  pool: rpool
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Jun 11 00:46:36 2024
        197G / 3.45T scanned at 12.3G/s, 0B / 3.44T issued
        0B resilvered, 0.00% done, no estimated completion time
config:

        NAME                                                    STATE     READ WRITE CKSUM
        rpool                                                   DEGRADED     0     0     0
          raidz2-0                                              DEGRADED     0     0     0
            ata-INTEL_SSDSC2KB019TZ_PHYI334000GD1P9DGN-part3    ONLINE       0     0     0
            ata-INTEL_SSDSC2KB019TZ_PHYI334000NS1P9DGN-part3    ONLINE       0     0     0
            ata-INTEL_SSDSC2KB019TZ_PHYI334000Q21P9DGN-part3    ONLINE       0     0     0
            ata-INTEL_SSDSC2KB019TZ_PHYI334000Y91P9DGN-part3    ONLINE       0     0     0
            replacing-4                                         DEGRADED     0     0     0
              ata-INTEL_SSDSC2KB019TZ_PHYI334000GE1P9DGN-part3  REMOVED      0     0     0
              wwn-0x55cd2e4156563eaf                            ONLINE       0     0     0
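While the resilver runs, it can be followed from a script. This is a minimal bash sketch; resilver_progress is a hypothetical helper that reads `zpool status` text on stdin and prints the "% done" line of a running resilver, or a fallback message when none is in progress. The polling loop in the comments assumes a live host with zpool in PATH.

```shell
#!/usr/bin/env bash
# resilver_progress (hypothetical helper): read `zpool status` output on stdin
# and print the "... % done ..." line of a running resilver, or
# "no resilver in progress" if the scan section does not report one.
resilver_progress() {
  awk '
    /resilver in progress/ { active = 1 }
    active && /% done/ { sub(/^[ \t]+/, ""); print; found = 1; exit }
    END { if (!found) print "no resilver in progress" }
  '
}

# Polling loop for a live host (assumes zpool is in PATH):
#   while zpool status rpool | grep -q 'resilver in progress'; do
#     zpool status rpool | resilver_progress
#     sleep 60
#   done
```

On OpenZFS 2.0 or later, `zpool wait -t resilver rpool` simply blocks until the resilver has finished.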
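As you can see, the whole procedure is simple and quick to carry out. Once the resilver finishes, ZFS detaches the old device, the replacing-4 entry disappears, and the pool should report ONLINE again.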