Monday, April 22, 2013

Linux Software RAID-1 Rebuild


Well, I have just received the RMA replacement of my failed Samsung HD204UI 2TB drive.  Now it is time to put the drive back into my Linux server and rebuild the RAID-1 (mirror) array so that the data resumes its highly-available state.  Here are the steps I took:


1) Install the hard drive.  Since I am not using hot-swap drive bays, I will have to shut down the server to attach the drive to the SATA-II controller.

2) Locate the disk (the new disk is /dev/sdd).  I found it by looking at "fdisk -l".  My replacement is the unpartitioned drive that has taken over the device name of my failed unit.
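One way to spot the blank replacement is that fdisk flags any drive without a partition table.  A minimal sketch, run here against sample text since the exact wording varies by fdisk version (on a live system you would pipe in real "fdisk -l" output instead):

```shell
# Sample "fdisk -l" output (abridged, illustrative only); on a live
# system, use: fdisk -l 2>/dev/null
fdisk_out="Disk /dev/sdc: 2000.3 GB, 2000398934016 bytes
Disk /dev/sdd: 2000.3 GB, 2000398934016 bytes
Disk /dev/sdd doesn't contain a valid partition table"

# The blank replacement is the drive fdisk reports as having no table.
printf '%s\n' "$fdisk_out" | awk '/valid partition table/ {print $2}'
```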

3) A Linux RAID (fd) partition must be created on the new drive. Create the partition table on the new drive, /dev/sdd, identical to that of the surviving drive in the array, using the "sfdisk" command:

 
# sfdisk -d /dev/sdc > sdc_partition.out

# sfdisk /dev/sdd < sdc_partition.out

Checking that no-one is using this disk right now ... OK

Disk /dev/sdd: 243201 cylinders, 255 heads, 63 sectors/track

sfdisk: ERROR: sector 0 does not have an msdos signature
 /dev/sdd: unrecognized partition table type
Old situation:
No partitions found
New situation:
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/sdd1            63 3907024064 3907024002  fd  Linux raid autodetect
/dev/sdd2             0         -          0   0  Empty
/dev/sdd3             0         -          0   0  Empty
/dev/sdd4             0         -          0   0  Empty
Warning: no primary partition is marked bootable (active)
This does not matter for LILO, but the DOS MBR will not boot this disk.
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)



4) Verify that the disk was partitioned properly and identically to the surviving drive.  In the example, I am comparing the partition table of the drive currently in the RAID array with that of my new drive.  Notice that the newly created partition is /dev/sdd1 on my replacement drive /dev/sdd.

 
# sfdisk -l /dev/sdc

Disk /dev/sdc: 243201 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
/dev/sdc1          0+ 243200  243201- 1953512001   fd  Linux raid autodetect
/dev/sdc2          0       -       0          0    0  Empty
/dev/sdc3          0       -       0          0    0  Empty
/dev/sdc4          0       -       0          0    0  Empty
# sfdisk -l /dev/sdd

Disk /dev/sdd: 243201 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
/dev/sdd1          0+ 243200  243201- 1953512001   fd  Linux raid autodetect
/dev/sdd2          0       -       0          0    0  Empty
/dev/sdd3          0       -       0          0    0  Empty
/dev/sdd4          0       -       0          0    0  Empty
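Eyeballing the two tables invites mistakes; a stricter check is to diff the sfdisk dumps with the device names normalized, so that identical tables produce no differences.  A sketch, using stand-in dump lines rather than live disks (on a real system the printf commands would be replaced with "sfdisk -d /dev/sdc" and "sfdisk -d /dev/sdd"):

```shell
# Stand-in for "sfdisk -d /dev/sdc" on the surviving drive.
printf '/dev/sdc1 : start=63, size=3907024002, Id=fd\n' |
  sed 's|/dev/sd[a-z]|/dev/DISK|' > old.dump

# Stand-in for "sfdisk -d /dev/sdd" after partitioning the replacement.
printf '/dev/sdd1 : start=63, size=3907024002, Id=fd\n' |
  sed 's|/dev/sd[a-z]|/dev/DISK|' > new.dump

# Identical tables produce no diff output, so the message prints.
diff old.dump new.dump && echo "partition tables match"
```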


My degraded raidset is /dev/md3.  Here is its current status.

 
# mdadm -v -D /dev/md3
/dev/md3:
        Version : 0.90
  Creation Time : Sat Jun  4 21:34:09 2011
     Raid Level : raid1
     Array Size : 1953511936 (1863.01 GiB 2000.40 GB)
  Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 3
    Persistence : Superblock is persistent

    Update Time : Tue Jul 12 20:27:30 2011
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 5de5eb25:d02318e7:da699fd5:65330895
         Events : 0.13444

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       0        0        1      removed


5) Now I need to reconstruct the degraded RAID array with the partition on the new drive.  Since the replacement drive is now properly partitioned, it can simply be added to the array, /dev/md3, using the mdadm --manage command.

 
# mdadm -v --manage /dev/md3 --add /dev/sdd1
mdadm: added /dev/sdd1


Taking a look at the /dev/md3 raidset now, the new partition has been added as a spare and the array has automatically started recovering.  This is a 2TB drive, so it will take a while to come back in sync.

 
# mdadm -D /dev/md3
/dev/md3:
        Version : 0.90
  Creation Time : Sat Jun  4 21:34:09 2011
     Raid Level : raid1
     Array Size : 1953511936 (1863.01 GiB 2000.40 GB)
  Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 3
    Persistence : Superblock is persistent

    Update Time : Tue Jul 12 20:27:30 2011
          State : clean, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

 Rebuild Status : 0% complete

           UUID : 5de5eb25:d02318e7:da699fd5:65330895
         Events : 0.13444

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       2       8       49        1      spare rebuilding   /dev/sdd1

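Rather than rerunning "mdadm -D", /proc/mdstat shows a progress bar, a percentage, and an ETA for the resync.  A minimal sketch of pulling the percentage out, run here against sample text (on a live box you would read the real /proc/mdstat instead):

```shell
# Sample /proc/mdstat text for a RAID-1 mid-rebuild (illustrative only);
# on a real system, use: mdstat=$(cat /proc/mdstat)
mdstat='md3 : active raid1 sdd1[2] sdc1[0]
      1953511936 blocks [2/1] [U_]
      [>....................]  recovery =  3.4% (66403776/1953511936) finish=612.2min speed=51370K/sec'

# Extract just the recovery percentage from the status text.
printf '%s\n' "$mdstat" | grep -o 'recovery = *[0-9.]*%'
```

For a continuously updating view, something like "watch -n 60 cat /proc/mdstat" also works well.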

Here is what the healthy mdadm status of the 2TB mirror raidset looks like (after about 8-10 hours with my /proc/sys/dev/raid/ configurations):

 
# mdadm -D /dev/md3
/dev/md3:
        Version : 0.90
  Creation Time : Sat Jun  4 21:34:09 2011
     Raid Level : raid1
     Array Size : 1953511936 (1863.01 GiB 2000.40 GB)
  Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 3
    Persistence : Superblock is persistent

    Update Time : Wed Jul 13 08:20:22 2011
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 5de5eb25:d02318e7:da699fd5:65330895
         Events : 0.13498

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       49        1      active sync   /dev/sdd1

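The /proc/sys/dev/raid settings mentioned above are, I believe, the min/max resync throttles: the kernel keeps the rebuild rate between the two limits (KB/s per device), and raising the minimum lets a large rebuild finish sooner at the cost of foreground I/O.  A sketch, with example values only, not recommendations:

```shell
# md resync is throttled between these two limits (KB/s per device).
cat /proc/sys/dev/raid/speed_limit_min   # default is typically 1000
cat /proc/sys/dev/raid/speed_limit_max   # default is typically 200000

# Raise the floor (as root) to push the rebuild harder; example value.
echo 50000 > /proc/sys/dev/raid/speed_limit_min
```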

That was all there was to it.  If you follow these steps on your own server, your mileage may vary, so be careful.  Take care of your data first and make a consistent, recoverable backup before you start.  Remember, a backup that has never been restored is not a backup.


