RAIDZ + Power Loss

Posted on 2011/04/05(Tue) 00:32 in technical

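I installed a UPS, so now I can ride out a momentary power blip! → Its very first job: rolling blackouts.

After taking that gorgeous combo to the face, I got curious: what happens if the power goes out while RAIDZ is in the middle of a resilver?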

Procedure:

  1. Create files (2300 files of 300 MiB each).
  2. Start the replace.
  3. Yank the power cable at some arbitrary point.
  4. Plug it back in and reboot.
  5. Done.
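Step 1 doesn't say how the files were created. A minimal Python sketch, assuming random (incompressible) contents so that ZFS compression can't shrink the data, and a hypothetical target directory `/replace-test/data` (a pool named `replace-test` mounts at `/replace-test` by default). `make_test_files` is an illustrative name, not the author's tool.

```python
import os
from pathlib import Path

CHUNK = 1024 * 1024  # write in 1 MiB chunks so memory use stays flat


def make_test_files(directory: str, count: int, size_bytes: int) -> list[Path]:
    """Create `count` files of `size_bytes` random bytes each under `directory`."""
    root = Path(directory)
    root.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(count):
        path = root / f"file{i:04d}.bin"
        remaining = size_bytes
        with open(path, "wb") as f:
            while remaining > 0:
                n = min(CHUNK, remaining)
                f.write(os.urandom(n))  # random data defeats compression
                remaining -= n
        paths.append(path)
    return paths
```

Called as `make_test_files("/replace-test/data", 2300, 300 * 1024 * 1024)`, this writes 2300 × 300 MiB = 690,000 MiB, about 674 GiB. With the single parity disk of a 4-disk raidz1 adding roughly a third on top, that is broadly consistent with the 901G total the resilver scan reports.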

Here's the starting state:

~# zpool status replace-test
  pool: replace-test
 state: ONLINE
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        replace-test  ONLINE       0     0     0
          raidz1-0    ONLINE       0     0     0
            c3t1d0    ONLINE       0     0     0
            c3t2d0    ONLINE       0     0     0
            c3t3d0    ONLINE       0     0     0
            c3t4d0    ONLINE       0     0     0

errors: No known data errors

The replacement disk (c3t6d0) is already attached.

~# cfgadm -al | grep -v usb
Ap_Id                          Type         Receptacle   Occupant     Condition
c3                             scsi-sas     connected    configured   unknown
c3::dsk/c3t1d0                 disk         connected    configured   unknown
c3::dsk/c3t2d0                 disk         connected    configured   unknown
c3::dsk/c3t3d0                 disk         connected    configured   unknown
c3::dsk/c3t4d0                 disk         connected    configured   unknown
c3::dsk/c3t6d0                 disk         connected    configured   unknown

Start the replace.

~# zpool replace replace-test c3t4d0 c3t6d0

Everything proceeds smoothly.

~# zpool status replace-test
  pool: replace-test
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Apr 4 19:49:42 2011
    76.3G scanned out of 901G at 238M/s, 0h59m to go
    19.1G resilvered, 8.47% done
config:

        NAME             STATE     READ WRITE CKSUM
        replace-test     ONLINE       0     0     0
          raidz1-0       ONLINE       0     0     0
            c3t1d0       ONLINE       0     0     0
            c3t2d0       ONLINE       0     0     0
            c3t3d0       ONLINE       0     0     0
            replacing-3  ONLINE       0     0     0
              c3t4d0     ONLINE       0     0     0
              c3t6d0     ONLINE       0     0     0  (resilvering)

errors: No known data errors
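The "to go" estimate appears to be simply the unscanned remainder divided by the current scan rate. Assuming G and M here are binary units (GiB, MiB), as ZFS's human-readable numbers usually are, the snapshot above works out as:

```latex
t_{\mathrm{left}}
  = \frac{901\,\mathrm{GiB} - 76.3\,\mathrm{GiB}}{238\,\mathrm{MiB/s}}
  = \frac{824.7 \times 1024\,\mathrm{MiB}}{238\,\mathrm{MiB/s}}
  \approx 3548\,\mathrm{s}
  \approx 59\,\mathrm{min}
```

The same arithmetic on the post-reboot figures, (901 − 1.85) × 1024 / 189 ≈ 4872 s, matches the 1h21m shown there. All inputs are rounded for display, so this is only an approximate check.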

Without further ado, I pull the power cable.

Quick bathroom break.

Plug the power cable back in.

Right after boot:

~# zpool status replace-test
  pool: replace-test
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Apr 4 19:57:58 2011
    1.85G scanned out of 901G at 189M/s, 1h21m to go
    471M resilvered, 0.20% done
config:

        NAME             STATE     READ WRITE CKSUM
        replace-test     ONLINE       0     0     0
          raidz1-0       ONLINE       0     0     0
            c3t1d0       ONLINE       0     0     0
            c3t2d0       ONLINE       0     0     0
            c3t3d0       ONLINE       0     0     0
            replacing-3  ONLINE       0     0     0
              c3t4d0     ONLINE       0     0     0
              c3t6d0     ONLINE       0     0     0  (resilvering)

errors: No known data errors

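Sooo... it's... starting over... The resilver restarted from the beginning: it has a new start time and is back down to 0.20%.

And then it runs to completion.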
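To spot such a restart from a script instead of by eye, one option is to parse the `scan:` section of two captured `zpool status` outputs. A minimal Python sketch; the regexes are written against the output format shown in this post and may not match other ZFS versions, and `parse_resilver` / `restarted` are hypothetical helper names, not part of any ZFS tool.

```python
import re
from dataclasses import dataclass
from typing import Optional

# "scan:" line printed while a resilver is running, e.g.
#   scan: resilver in progress since Mon Apr 4 19:49:42 2011
SINCE_RE = re.compile(r"resilver in progress since (.+)$", re.MULTILINE)
# Progress line, e.g. "19.1G resilvered, 8.47% done"
DONE_RE = re.compile(r"resilvered, ([\d.]+)% done")


@dataclass
class ResilverState:
    since: str      # start time exactly as zpool status prints it
    percent: float  # the "% done" figure


def parse_resilver(status_text: str) -> Optional[ResilverState]:
    """Return the running resilver's state, or None if no resilver is in progress."""
    since = SINCE_RE.search(status_text)
    done = DONE_RE.search(status_text)
    if since is None or done is None:
        return None
    return ResilverState(since.group(1).strip(), float(done.group(1)))


def restarted(before: ResilverState, after: ResilverState) -> bool:
    """Heuristic: a new start time or lower progress means it started over."""
    return after.since != before.since or after.percent < before.percent


# The two snapshots from this post (scan section only).
BEFORE = """scan: resilver in progress since Mon Apr 4 19:49:42 2011
76.3G scanned out of 901G at 238M/s, 0h59m to go
19.1G resilvered, 8.47% done"""
AFTER = """scan: resilver in progress since Mon Apr 4 19:57:58 2011
1.85G scanned out of 901G at 189M/s, 1h21m to go
471M resilvered, 0.20% done"""
```

Here `restarted(parse_resilver(BEFORE), parse_resilver(AFTER))` is `True`. Once the resilver has finished, the `scan:` line reads `resilvered ... with 0 errors`, and `parse_resilver` returns `None`.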

~# zpool status replace-test
  pool: replace-test
 state: ONLINE
  scan: resilvered 225G in 1h5m with 0 errors on Mon Apr 4 21:05:30 2011
config:

        NAME          STATE     READ WRITE CKSUM
        replace-test  ONLINE       0     0     0
          raidz1-0    ONLINE       0     0     0
            c3t1d0    ONLINE       0     0     0
            c3t2d0    ONLINE       0     0     0
            c3t3d0    ONLINE       0     0     0
            c3t6d0    ONLINE       0     0     0

errors: No known data errors

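I was half hoping something worse would happen, but surprisingly nothing did. The end.

Just to be safe, I also took SHA-256 hashes of all the files and compared them before and after: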

# diff -s hash_diff.txt sha256_hashlist.txt
Files hash_diff.txt and sha256_hashlist.txt are identical
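The post doesn't say how the hash lists were generated. As a portable sketch, assuming the test files live under a hypothetical `/replace-test/data`, this Python writes one `digest  relative/path` line per file, sorted by path so that two runs can be compared line by line with `diff -s` as above. The function names are illustrative only.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, bufsize: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size blocks so large files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            h.update(block)
    return h.hexdigest()


def write_hash_list(directory: str, out_file: str) -> None:
    """Write 'digest  relative/path' for every file, sorted by path."""
    root = Path(directory)
    lines = [
        f"{sha256_of(p)}  {p.relative_to(root)}"
        for p in sorted(root.rglob("*"))
        if p.is_file()
    ]
    Path(out_file).write_text("\n".join(lines) + "\n")


def same_hash_lists(a: str, b: str) -> bool:
    """Equivalent of `diff -s` reporting the two lists as identical."""
    return Path(a).read_text() == Path(b).read_text()
```

For example, run `write_hash_list("/replace-test/data", "sha256_hashlist.txt")` before pulling the plug and write a second list to `hash_diff.txt` afterwards. Keep the output files outside the hashed directory, or the list would end up hashing itself.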

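Super boring...

The hashes just matched, and there doesn't seem to be much point in digging further, so I'll call it here.

Unless you get the timing just right, an ordinary power cut doesn't seem to be enough to do real damage: in this test, the pool simply came back and restarted the resilver.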