
Restoring a failed disk in Transparent RAID

Pre-requisite

Creating a Log RAID Configuration (http://wiki.flexraid.com/2013/06/27/creating-a-log-raid-configuration/)

Notice

When replacing a failed disk in your Transparent RAID array, you do not need to format the replacement disk, and you should not. The engine will overwrite the replacement disk, so whatever content it holds does not matter. It is also best not to create a volume on the replacement disk, to avoid issues such as the operating system still holding a handle on that volume. The replacement disk should ideally be a raw disk (a disk without a partition or volume).
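If you want to double-check that a replacement disk really is raw before handing it to the engine, a rough heuristic is to look for partition-table signatures in its first two sectors. The sketch below is not part of Transparent RAID. It assumes 512-byte logical sectors, and the function names and device path are hypothetical. It only reads from the disk and never writes to it.

```python
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def partition_table_kind(first_sectors: bytes) -> str:
    """Classify the start of a disk as 'gpt', 'mbr', or 'none'.

    'mbr' only means the 0x55 0xAA boot signature is present; that can be
    a partition table or the boot sector of an unpartitioned volume.
    """
    if len(first_sectors) < 2 * SECTOR_SIZE:
        raise ValueError("need at least the first two sectors")
    # A GPT header sits at LBA 1 and starts with the ASCII text "EFI PART".
    # Check it first, because GPT disks also carry a protective MBR.
    if first_sectors[SECTOR_SIZE:SECTOR_SIZE + 8] == b"EFI PART":
        return "gpt"
    # An MBR (or a volume boot sector) ends with 0x55 0xAA at bytes 510-511.
    if first_sectors[510:512] == b"\x55\xaa":
        return "mbr"
    return "none"


def read_first_sectors(device_path: str) -> bytes:
    """Read the first two sectors of a raw device (needs admin rights)."""
    with open(device_path, "rb") as dev:
        return dev.read(2 * SECTOR_SIZE)
```

A result of "none" suggests the disk carries no partition table. On Windows the device path might look like `\\.\PhysicalDrive1`, but the disk number is only an example, so confirm it in Disk Management before relying on the result.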

Tip: making use of live data reconstruction and recovering by just copying...

One thing that is not always apparent is that tRAID reconstructs a failed disk live, so you can still read and write the data that resided on the failed disk. If you don't have a replacement disk handy but have free space on other disks, one recovery option is to copy the data from the failed tRAID disk to the other tRAID disk(s) with free space. To do this, first disable Storage Pooling caching if it is not already disabled, assign drive letters to the tRAID disks, and then copy the data from the failed disk into a folder on the other tRAID disks. Finally, delete the RAID configuration and re-create it using the surviving disks. Again, use this option only if you don't have a replacement disk available. Similarly, users who want to be extra careful can copy the data off the failed disk to another disk in case something goes wrong during recovery. Note that these are just tips. The proper recovery procedures for tRAID are explained further below.
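The copy step of this tip can be done with any file manager. For those who prefer a script, here is a minimal sketch. It assumes the failed (live-reconstructed) disk and the destination disk are already mounted under drive letters, and the function names and example paths are hypothetical. It checks free space first and then copies files with their timestamps. Empty folders are not re-created.

```python
import shutil
from pathlib import Path


def tree_size(root: Path) -> int:
    """Total size in bytes of all regular files under root."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def copy_disk_contents(source_root: str, dest_folder: str) -> int:
    """Copy every file under source_root into dest_folder.

    Returns the number of files copied. Raises OSError if the
    destination does not have enough free space for the whole tree.
    """
    src = Path(source_root)
    dst = Path(dest_folder)
    dst.mkdir(parents=True, exist_ok=True)

    needed = tree_size(src)
    free = shutil.disk_usage(dst).free
    if needed > free:
        raise OSError(f"need {needed} bytes but only {free} are free")

    copied = 0
    for path in src.rglob("*"):
        if path.is_file():
            target = dst / path.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 keeps modification times
            copied += 1
    return copied
```

For example, `copy_disk_contents("F:/", "G:/recovered_DRU2")` would copy the reconstructed contents of a disk mounted as F: into a folder on G:. Both drive letters are placeholders for your own setup.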

The configuration override feature

It is highly recommended that users test the recovery features in Transparent RAID and get comfortable with the tasks involved. You shouldn't wait for a disk failure to put the system that protects your data to the test. If you already have a failed disk, you can skip this section and go to the next. Otherwise, we will now discuss how you can use the override feature to simulate disk failures.

To aid in simulating data failure and recovery scenarios, the Web UI has an override feature that lets you fail and swap disks that are part of your array.

1. The override feature is engaged through the "Advanced Operations" menu as shown below.

Screenshot 1: Configuration Override

2. When engaged, the override feature shows a menu when you right-click on the grid, with options to fail, un-fail, or swap a disk.

Screenshot 2: Fail a disk

3. A disk can be failed while the array is either offline or online. Both modes should be explored. For this tutorial, we are failing the disk while the array is online. If notifications are set up, you will be notified of the failure. In addition, log entries will appear in the Web UI log file, and the disk will show as failed in the UI.

Screenshot 3: Failed Disk

Restoring data while the array is online vs offline

The Transparent RAID platform is extremely powerful in that almost all operations can be executed while the array is either online or offline. Disks can be failed, un-failed, swapped, verified, and restored while the array is live and being accessed by other applications. Working while the array is online is very convenient and eliminates downtime. Nonetheless, it is best to execute the Restore operation while the array is offline, both to minimize interference and to run the process at the greatest I/O speed possible. There is a performance penalty to executing the restore operation while the array is online. Because the restore operation is the most sensitive of all, you want to execute it in the most sterile environment possible. You should do an online restore only if you really cannot afford any downtime.

Dealing with a disk failure

1. This tutorial showcases an array that is online with a failed disk. Because of the parity protection and live data reconstruction, your OS will be oblivious to the disk failure. As shown in the screenshot below, DRU2 has failed, but it is being reconstructed live, so the system is not impacted whatsoever.

Screenshot 4: Live Reconstructed Disk

2. If the override feature is engaged, the "Un-fail" and "Swap" menu options become available for a failed disk. That is, you can only swap a disk that has failed or has been marked as failed.

Screenshot 5: Swap or Un-fail in Override mode

3. Once a disk has failed, restoring it is very straightforward. Click on the "Restore" button and choose the replacement disk. You also have the option of restoring to the disk already in the array. This choice applies when a disk has dropped out of the array but is later proven to be in working condition, or when you have already swapped the failed disk with a working disk through the override menu.

Screenshot 6: Swap & Restore

4. As documented in http://wiki.flexraid.com/2013/06/22/creating-a-log-raid-configuration/, you can optionally specify a Log RAID (this is really recommended). Please refer to the linked topic and to the tooltips shown in the screenshot below for further details.

Screenshot 7: Restore options

5. For this tutorial, we are choosing to restore the failed disk (DRU2) to DRU3 and to use a Log RAID.

Screenshot 8: Restoring to DRU3 With RAID Log

6. The Restore task will launch, and you will be able to track its progress through the status window.

Screenshot 9: Restore Results

7. After the restore operation has completed, it is a good idea to run the Verify+ operation to double-check the state of the array.

Screenshot 10: Restored

8. Below is the result of the Verify+ task on our demo RAID following an online disk restoration.

Screenshot 11: Verify+ Results
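The first step of the walkthrough relies on parity protection and live reconstruction. This page does not describe how the Transparent RAID engine computes its parity, so the sketch below is only a generic illustration of the idea using classic single-parity XOR, not the engine's actual algorithm. The function names and the tiny byte blocks standing in for disks are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_blocks):
    """Parity block for a set of data blocks (generic XOR parity)."""
    return xor_blocks(list(data_blocks))


def reconstruct_missing(surviving_blocks, parity):
    """Rebuild the one missing data block from the survivors plus parity."""
    return xor_blocks([*surviving_blocks, parity])


# Three tiny stand-ins for data disks; the middle one "fails".
dru1 = b"\x01\x02\x03\x04"
dru2 = b"\x10\x20\x30\x40"
dru3 = b"\xaa\xbb\xcc\xdd"
parity = compute_parity([dru1, dru2, dru3])
rebuilt = reconstruct_missing([dru1, dru3], parity)
print(rebuilt == dru2)
```

Under this scheme, any single missing block can be recomputed on demand from the remaining blocks and the parity. That is the general reason a parity-protected array can keep serving reads for a failed disk before the disk is restored.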

What's next?

Dealing with a dropped disk in Transparent RAID (http://wiki.flexraid.com/2013/06/27/transparent-raid-dealing-with-a-dropped-disk/)

