RAIDZ + Power Loss
Posted on 2011/04/05(Tue) 00:32 in technical
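"I've installed a UPS, so a momentary outage is nothing to worry about!" → Its first real job turned out to be a rolling blackout.
Having taken that lovely combo on the chin, I started wondering: what happens if the power goes out in the middle of a RAIDZ resilver?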
The procedure:
- Create files (2300 files of 300 MiB each)
- Start the replace
- Yank the power cable at some arbitrary point
- Plug it back in and reboot
- Done
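The file-creation step could be sketched roughly like this. The post does not say how the files were actually generated, so the helper name `make_test_files`, the file naming scheme, and the use of `/dev/urandom` as the data source are all assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: create COUNT files of SIZE_MIB MiB each under DIR.
# Random data is an assumption; the original post does not describe the contents.
make_test_files() {
    dir=$1 count=$2 size_mib=$3
    mkdir -p "$dir" || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        # bs is given in bytes (1 MiB); count is the file size in MiB.
        dd if=/dev/urandom of="$dir/file_$i.bin" bs=1048576 count="$size_mib" 2>/dev/null || return 1
        i=$((i + 1))
    done
}

# Example with the numbers from the post (about 674 GiB of data in total);
# the target path is hypothetical:
#   make_test_files /replace-test/data 2300 300
```

Random data is used here so that compression or deduplication, if enabled on the pool, would not shrink what the resilver has to copy.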
This is the initial state.
    ~# zpool status replace-test
      pool: replace-test
     state: ONLINE
     scan: none requested
    config:
            NAME            STATE     READ WRITE CKSUM
            replace-test    ONLINE       0     0     0
              raidz1-0      ONLINE       0     0     0
                c3t1d0      ONLINE       0     0     0
                c3t2d0      ONLINE       0     0     0
                c3t3d0      ONLINE       0     0     0
                c3t4d0      ONLINE       0     0     0
    errors: No known data errors
The replacement disk (c3t6d0) is already installed.
    ~# cfgadm -al | grep -v usb
    Ap_Id                Type         Receptacle   Occupant     Condition
    c3                   scsi-sas     connected    configured   unknown
    c3::dsk/c3t1d0       disk         connected    configured   unknown
    c3::dsk/c3t2d0       disk         connected    configured   unknown
    c3::dsk/c3t3d0       disk         connected    configured   unknown
    c3::dsk/c3t4d0       disk         connected    configured   unknown
    c3::dsk/c3t6d0       disk         connected    configured   unknown
Start the replace.
    ~# zpool replace replace-test c3t4d0 c3t6d0
Things proceed smoothly.
    ~# zpool status replace-test
      pool: replace-test
     state: ONLINE
    status: One or more devices is currently being resilvered. The pool will
            continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
     scan: resilver in progress since Mon Apr 4 19:49:42 2011
        76.3G scanned out of 901G at 238M/s, 0h59m to go
        19.1G resilvered, 8.47% done
    config:
            NAME             STATE     READ WRITE CKSUM
            replace-test     ONLINE       0     0     0
              raidz1-0       ONLINE       0     0     0
                c3t1d0       ONLINE       0     0     0
                c3t2d0       ONLINE       0     0     0
                c3t3d0       ONLINE       0     0     0
                replacing-3  ONLINE       0     0     0
                  c3t4d0     ONLINE       0     0     0
                  c3t6d0     ONLINE       0     0     0  (resilvering)
    errors: No known data errors
Casually pull the power cable.
Quick bathroom break.
Plug the power cable back in.
Right after boot:
    ~# zpool status replace-test
      pool: replace-test
     state: ONLINE
    status: One or more devices is currently being resilvered. The pool will
            continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
     scan: resilver in progress since Mon Apr 4 19:57:58 2011
        1.85G scanned out of 901G at 189M/s, 1h21m to go
        471M resilvered, 0.20% done
    config:
            NAME             STATE     READ WRITE CKSUM
            replace-test     ONLINE       0     0     0
              raidz1-0       ONLINE       0     0     0
                c3t1d0       ONLINE       0     0     0
                c3t2d0       ONLINE       0     0     0
                c3t3d0       ONLINE       0     0     0
                replacing-3  ONLINE       0     0     0
                  c3t4d0     ONLINE       0     0     0
                  c3t6d0     ONLINE       0     0     0  (resilvering)
    errors: No known data errors
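Yep... it is very much starting over. The resilver start time has moved from 19:49:42 to 19:57:58, and progress has dropped from 8.47% back to 0.20%.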
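To spot this kind of restart without eyeballing the output, the two relevant fields can be pulled out of `zpool status` with `sed`. This is only a sketch: the function names `resilver_pct` and `resilver_start` are made up here, and the patterns assume the output wording shown in this post, which may differ in other ZFS versions.

```shell
#!/bin/sh
# Both helpers read `zpool status` text on stdin.
# Names are hypothetical; patterns assume the output format shown in this post.

# Print the "% done" figure of an in-progress resilver.
resilver_pct() {
    sed -n 's/.*resilvered, \([0-9.]*\)% done.*/\1/p'
}

# Print the timestamp that follows "resilver in progress since".
resilver_start() {
    sed -n 's/.*resilver in progress since //p'
}

# Possible usage on a live system:
#   zpool status replace-test | resilver_start
#   zpool status replace-test | resilver_pct
```

If the start timestamp changes between two checks while the percentage drops, the resilver has begun again from the start rather than resumed.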
And then it runs all the way to completion.
    ~# zpool status replace-test
      pool: replace-test
     state: ONLINE
     scan: resilvered 225G in 1h5m with 0 errors on Mon Apr 4 21:05:30 2011
    config:
            NAME            STATE     READ WRITE CKSUM
            replace-test    ONLINE       0     0     0
              raidz1-0      ONLINE       0     0     0
                c3t1d0      ONLINE       0     0     0
                c3t2d0      ONLINE       0     0     0
                c3t3d0      ONLINE       0     0     0
                c3t6d0      ONLINE       0     0     0
    errors: No known data errors
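I was hoping something much nastier would happen, but surprisingly nothing did. The end.
Just to be sure, I also took SHA256 hashes of the files and compared them before and after: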
    # diff -s hash_diff.txt sha256_hashlist.txt
    Files hash_diff.txt and sha256_hashlist.txt are identical
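For reference, a hash list like the ones compared above could be produced along these lines. The post does not show how its lists were generated, so `hash_tree` is a hypothetical helper, the `before.txt`/`after.txt` names are placeholders, and `sha256sum` from GNU coreutils is an assumption (on Solaris-derived systems a different tool such as `digest -a sha256` may be what is available).

```shell
#!/bin/sh
# Hypothetical helper: write a "hash  path" list for every regular file
# under DIR to OUT, sorted by path so two runs can be compared with diff.
# Assumes sha256sum (GNU coreutils) is installed.
hash_tree() {
    dir=$1 out=$2
    ( cd "$dir" && find . -type f -exec sha256sum {} + | sort -k 2 ) > "$out"
}

# Possible usage around the experiment (paths are placeholders):
#   hash_tree /replace-test/data before.txt   # before starting the replace
#   ... replace, pull the plug, wait for the resilver to finish ...
#   hash_tree /replace-test/data after.txt
#   diff -s before.txt after.txt
```

Sorting by path matters because `find` does not guarantee a stable order, and an unsorted list could make `diff` report differences even when every hash matches.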
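Super boring...
The hashes simply matched, and there doesn't seem to be much point in going further, so I'll call it a day.
Unless you get the timing just right, the pool pretty much comes back to life after an ordinary power loss.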