During a routine inspection I found that one disk in the ZFS raidz2 pool on a PVE node had failed. Querying the pool showed:

~# zpool status rpool
  pool: rpool
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 0B in 00:43:00 with 0 errors on Sun May 12 01:07:02 2024
config:

        NAME                                                  STATE     READ WRITE CKSUM
        rpool                                                 DEGRADED     0     0     0
          raidz2-0                                            DEGRADED     0     0     0
            ata-INTEL_SSDSC2KB019TZ_PHYI334000GD1P9DGN-part3  ONLINE       0     0     0
            ata-INTEL_SSDSC2KB019TZ_PHYI334000NS1P9DGN-part3  ONLINE       0     0     0
            ata-INTEL_SSDSC2KB019TZ_PHYI334000Q21P9DGN-part3  ONLINE       0     0     0
            ata-INTEL_SSDSC2KB019TZ_PHYI334000Y91P9DGN-part3  ONLINE       0     0     0
            ata-INTEL_SSDSC2KB019TZ_PHYI334000GE1P9DGN-part3  REMOVED      0     0     0
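
For routine checks, the non-ONLINE entries can be pulled out of this output automatically. The following is a minimal sketch, not part of the original procedure: the function name unhealthy_vdevs is hypothetical, and it assumes the config table looks like the one above (a NAME/STATE header row, ended by a blank line). Entries in other sections, such as spares in state AVAIL, would also be listed.

```shell
#!/bin/sh
# Print every entry in the config table of `zpool status` whose STATE is not ONLINE.
# Reads the status text on stdin, so it can be tried on saved output first.
unhealthy_vdevs() {
    awk '
        # The header row marks the start of the device table.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # A blank line ends the table.
        in_config && NF == 0 { in_config = 0 }
        # Report name and state for anything that is not ONLINE.
        in_config && NF >= 2 && $2 != "ONLINE" { print $1, $2 }
    '
}

# Typical use on a live system:
#   zpool status rpool | unhealthy_vdevs
```

On the output shown above this would list rpool, raidz2-0 and the REMOVED member, since the degraded state propagates up to the vdev and the pool.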

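The failed disk is already in the REMOVED state. I pulled the bad disk and inserted a new one; running zpool status rpool again returned the same output. However, ls -la /dev/disk/by-id now shows the new disk, sdd, which has no partitions and is not yet part of the pool: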

~# ls -la /dev/disk/by-id
total 0
drwxr-xr-x 2 root root 720 Jun 11 00:30 .
drwxr-xr-x 8 root root 160 Apr 29 11:33 ..
lrwxrwxrwx 1 root root   9 Apr 29 11:33 ata-INTEL_SSDSC2KB019TZ_PHYI334000GD1P9DGN -> ../../sda
lrwxrwxrwx 1 root root  10 Apr 29 11:33 ata-INTEL_SSDSC2KB019TZ_PHYI334000GD1P9DGN-part1 -> ../../sda1
lrwxrwxrwx 1 root root  10 Apr 29 11:33 ata-INTEL_SSDSC2KB019TZ_PHYI334000GD1P9DGN-part2 -> ../../sda2
lrwxrwxrwx 1 root root  10 Apr 29 11:33 ata-INTEL_SSDSC2KB019TZ_PHYI334000GD1P9DGN-part3 -> ../../sda3
lrwxrwxrwx 1 root root   9 May 12 00:38 ata-INTEL_SSDSC2KB019TZ_PHYI334000NS1P9DGN -> ../../sdb
lrwxrwxrwx 1 root root  10 Apr 29 11:33 ata-INTEL_SSDSC2KB019TZ_PHYI334000NS1P9DGN-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  10 Apr 29 11:33 ata-INTEL_SSDSC2KB019TZ_PHYI334000NS1P9DGN-part2 -> ../../sdb2
lrwxrwxrwx 1 root root  10 Apr 29 11:33 ata-INTEL_SSDSC2KB019TZ_PHYI334000NS1P9DGN-part3 -> ../../sdb3
lrwxrwxrwx 1 root root   9 Apr 29 11:33 ata-INTEL_SSDSC2KB019TZ_PHYI334000Q21P9DGN -> ../../sdc
lrwxrwxrwx 1 root root  10 Apr 29 11:33 ata-INTEL_SSDSC2KB019TZ_PHYI334000Q21P9DGN-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  10 Apr 29 11:33 ata-INTEL_SSDSC2KB019TZ_PHYI334000Q21P9DGN-part2 -> ../../sdc2
lrwxrwxrwx 1 root root  10 Apr 29 11:33 ata-INTEL_SSDSC2KB019TZ_PHYI334000Q21P9DGN-part3 -> ../../sdc3
lrwxrwxrwx 1 root root   9 May 25 17:43 ata-INTEL_SSDSC2KB019TZ_PHYI334000Y91P9DGN -> ../../sde
lrwxrwxrwx 1 root root  10 Apr 29 11:33 ata-INTEL_SSDSC2KB019TZ_PHYI334000Y91P9DGN-part1 -> ../../sde1
lrwxrwxrwx 1 root root  10 Apr 29 11:33 ata-INTEL_SSDSC2KB019TZ_PHYI334000Y91P9DGN-part2 -> ../../sde2
lrwxrwxrwx 1 root root  10 Apr 29 11:33 ata-INTEL_SSDSC2KB019TZ_PHYI334000Y91P9DGN-part3 -> ../../sde3
lrwxrwxrwx 1 root root   9 Jun 11 00:30 ata-INTEL_SSDSC2KB019TZ_PHYI33840KVB1P9DGN -> ../../sdd
lrwxrwxrwx 1 root root   9 Apr 29 11:33 wwn-0x55cd2e415651c850 -> ../../sda
lrwxrwxrwx 1 root root  10 Apr 29 11:33 wwn-0x55cd2e415651c850-part1 -> ../../sda1
lrwxrwxrwx 1 root root  10 Apr 29 11:33 wwn-0x55cd2e415651c850-part2 -> ../../sda2
lrwxrwxrwx 1 root root  10 Apr 29 11:33 wwn-0x55cd2e415651c850-part3 -> ../../sda3
lrwxrwxrwx 1 root root   9 May 12 00:38 wwn-0x55cd2e415651c929 -> ../../sdb
lrwxrwxrwx 1 root root  10 Apr 29 11:33 wwn-0x55cd2e415651c929-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  10 Apr 29 11:33 wwn-0x55cd2e415651c929-part2 -> ../../sdb2
lrwxrwxrwx 1 root root  10 Apr 29 11:33 wwn-0x55cd2e415651c929-part3 -> ../../sdb3
lrwxrwxrwx 1 root root   9 Apr 29 11:33 wwn-0x55cd2e415651c955 -> ../../sdc
lrwxrwxrwx 1 root root  10 Apr 29 11:33 wwn-0x55cd2e415651c955-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  10 Apr 29 11:33 wwn-0x55cd2e415651c955-part2 -> ../../sdc2
lrwxrwxrwx 1 root root  10 Apr 29 11:33 wwn-0x55cd2e415651c955-part3 -> ../../sdc3
lrwxrwxrwx 1 root root   9 May 25 17:43 wwn-0x55cd2e415651ca6c -> ../../sde
lrwxrwxrwx 1 root root  10 Apr 29 11:33 wwn-0x55cd2e415651ca6c-part1 -> ../../sde1
lrwxrwxrwx 1 root root  10 Apr 29 11:33 wwn-0x55cd2e415651ca6c-part2 -> ../../sde2
lrwxrwxrwx 1 root root  10 Apr 29 11:33 wwn-0x55cd2e415651ca6c-part3 -> ../../sde3
lrwxrwxrwx 1 root root   9 Jun 11 00:30 wwn-0x55cd2e4156563eaf -> ../../sdd

Based on the listing above, let's confirm the by-id name of sdd:

~# ls -l /dev/disk/by-id/wwn-0x55cd2e4156563eaf
lrwxrwxrwx 1 root root 9 Jun 11 00:30 /dev/disk/by-id/wwn-0x55cd2e4156563eaf -> ../../sdd
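
The same lookup can be done in the other direction, starting from the kernel name and listing every by-id name that points at it. This is a small sketch under assumptions: the helper name by_id_names is hypothetical, and it relies on readlink -f as provided by GNU coreutils on a PVE host.

```shell
#!/bin/sh
# List every name in a by-id style directory that resolves to the given device node.
# $1: device node, e.g. /dev/sdd
# $2: directory to search (defaults to /dev/disk/by-id)
by_id_names() {
    target=$(readlink -f "$1") || return 1
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Only symlinks are of interest here.
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$target" ]; then
            basename "$link"
        fi
    done
}

# Typical use:
#   by_id_names /dev/sdd
```

Using a by-id name rather than sdd in the later zpool command keeps the pool configuration stable if the kernel assigns different sdX letters after a reboot.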

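Next, log in to the PVE web management interface and open Disks under the node. The unconfigured sdd disk is listed there; select it and click "Initialize Disk with GPT".
Then go back to the CLI and replace the failed disk: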

~# zpool replace rpool ata-INTEL_SSDSC2KB019TZ_PHYI334000GE1P9DGN-part3 /dev/disk/by-id/wwn-0x55cd2e4156563eaf

Here ata-INTEL_SSDSC2KB019TZ_PHYI334000GE1P9DGN-part3 is the old device as named in the zpool status output, and /dev/disk/by-id/wwn-0x55cd2e4156563eaf is the new disk.
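
One thing worth checking before running the command above: the surviving pool members are all -part3 partitions, and every existing disk also carries part1 and part2, while the replacement hands the whole new disk to ZFS. If this node boots from rpool (an assumption; it is the usual layout created by the PVE installer), the new disk would end up without bootloader partitions. The Proxmox VE administration guide describes a different sequence for bootable devices. The sketch below only prints that sequence for review and runs nothing. The function name plan_boot_disk_replace is hypothetical, and the partition numbers (2 for the ESP, 3 for ZFS) are assumptions that should be verified with lsblk on a healthy disk.

```shell
#!/bin/sh
# Print (do not run) the steps the Proxmox VE admin guide lists for replacing
# a failed *bootable* ZFS device. Review every line before executing anything.
# $1: healthy disk (by-id path)   $2: new disk (by-id path)
# $3: pool name                   $4: old ZFS partition as shown by zpool status
plan_boot_disk_replace() {
    healthy=$1
    new=$2
    pool=$3
    old_part=$4
    # Copy the partition table from a healthy member to the new disk.
    echo "sgdisk $healthy -R $new"
    # Give the copied table fresh random GUIDs.
    echo "sgdisk -G $new"
    # Resilver onto the new ZFS partition (ASSUMPTION: partition 3).
    echo "zpool replace -f $pool $old_part ${new}-part3"
    # Prepare the new ESP and register it (ASSUMPTION: partition 2).
    echo "proxmox-boot-tool format ${new}-part2"
    echo "proxmox-boot-tool init ${new}-part2"
}

# Example with the names from this post:
#   plan_boot_disk_replace \
#     /dev/disk/by-id/ata-INTEL_SSDSC2KB019TZ_PHYI334000GD1P9DGN \
#     /dev/disk/by-id/wwn-0x55cd2e4156563eaf \
#     rpool ata-INTEL_SSDSC2KB019TZ_PHYI334000GE1P9DGN-part3
```

With this sequence the partition table comes from the healthy disk, so the separate GPT initialization in the web interface would not be needed.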

Querying the pool status again shows that the replacement has started:

~# zpool status rpool

  pool: rpool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Jun 11 00:46:36 2024
        197G / 3.45T scanned at 12.3G/s, 0B / 3.44T issued
        0B resilvered, 0.00% done, no estimated completion time
config:

        NAME                                                    STATE     READ WRITE CKSUM
        rpool                                                   DEGRADED     0     0     0
          raidz2-0                                              DEGRADED     0     0     0
            ata-INTEL_SSDSC2KB019TZ_PHYI334000GD1P9DGN-part3    ONLINE       0     0     0
            ata-INTEL_SSDSC2KB019TZ_PHYI334000NS1P9DGN-part3    ONLINE       0     0     0
            ata-INTEL_SSDSC2KB019TZ_PHYI334000Q21P9DGN-part3    ONLINE       0     0     0
            ata-INTEL_SSDSC2KB019TZ_PHYI334000Y91P9DGN-part3    ONLINE       0     0     0
            replacing-4                                         DEGRADED     0     0     0
              ata-INTEL_SSDSC2KB019TZ_PHYI334000GE1P9DGN-part3  REMOVED      0     0     0
              wwn-0x55cd2e4156563eaf                            ONLINE       0     0     0
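
To follow the resilver without reading the full status each time, the percentage can be extracted from the scan section. This is a minimal sketch that assumes the progress line has the same shape as in the output above; the function name resilver_percent is hypothetical.

```shell
#!/bin/sh
# Extract the resilver percentage from `zpool status` text on stdin.
# Prints nothing if no "resilvered, N% done" line is present.
resilver_percent() {
    sed -n 's/.*resilvered, \([0-9.]*\)% done.*/\1/p'
}

# Typical use on a live system:
#   zpool status rpool | resilver_percent
```

On OpenZFS 2.0 and later, zpool wait -t resilver rpool is an alternative that simply blocks until the resilver has finished.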

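As you can see, the whole process is simple and quick. Once the resilver completes, the replacing-4 entry disappears and the pool returns to ONLINE.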

Tags: RAID, PVE
