Understanding the RAID Monitoring Plugin

Analyzing a system with bad performance

Read & Write in Energy Saving Mode

RAID Monitoring

Notice:
You must enable RAID statistics on your RAID configuration for the stats to be captured!

The first capture above shows an SSD array in Energy Saving mode, where the bottleneck is a slower 60GB SSD. The same array performs even better in performance mode.
For the second capture, we staged a virtual system so that it exhibits relatively poor write performance.
The VM was given 5 virtual disks, all on the same physical disk, which we also kept busy with other activity.

Read Performance

Read performance should never be an issue in Transparent RAID, as it operates at near source disk speed.
If you ever experience poor read performance, the source of the issue will be outside of the RAID system. Check the health of the affected disk through SMART.

Lock Override Count

This value is global to the array and should ideally be zero or close to it.

Transparent RAID uses a rather complex concurrency algorithm that does not use locks or traditional synchronization techniques (for convoluted technical reasons).
The algorithm was really developed for NZFS and ensures that the system will never get itself into a deadlock scenario (since it never uses locks ;) ), and it has many other benefits.
What it does is use an intricate scheduling strategy to manage concurrency. One of the RAID configuration parameters is a salt for the Concurrency Queuing setting. The salt is used in the concurrency scheduling strategy.
The algorithm purposely violates concurrency by intelligently playing with timing, but too aggressive a violation has side effects, such as causing blocks to go out of parity sync.
“Lock Override” is a misnomer, as it is more of a scheduling override. The override creates a smoothing effect in the system by cheating on the concurrency scheduling.
There isn’t a one-to-one relationship between the Lock Override count and blocks being out of sync. However, a high override count increases the probability of a block going out of sync.

If you experience a high count on this value, please visit the forum and post on the issue for assistance.

Write Performance

Misalignment

The primary things to analyze are the misalignment values.

  • The Write Update misalignment value matters if you are using more than one PPU. This value should always be less than 1% and close to zero. The Transparent RAID system aligns the parity disk heads such that they operate in sync. Any value other than 0% represents an anomaly in that sync. In the screenshot above, we staged the anomaly by keeping one of the PPUs busier than the other. In a typical system, a value above 1% should never occur.
  • The Read Update misalignment value can be higher, but it should be less than 5% and ideally close to zero. This value represents how well the data disk being operated on is aligned with the parity disk(s). The RAID system does not try to align the data disk and parity disk(s) heads, as the data disk should remain completely independent of the parity disk(s). Nonetheless, better performance is achieved if their random head operations are closer to being in sync.
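
The two limits above can be turned into a quick sanity check. The Python sketch below is only illustrative: the function name is hypothetical and the percentages are assumed to be typed in by hand from the monitoring screen, since the plugin is not known to expose an API for them. The 1% and 5% limits are the ones stated above.

```python
def check_misalignment(write_update_pct, read_update_pct):
    """Return warnings for the two misalignment percentages.

    Hypothetical helper: inputs are read off the monitoring screen by hand.
    Limits (1% for Write Update, 5% for Read Update) come from this article.
    """
    warnings = []
    if write_update_pct >= 1.0:
        warnings.append("Write Update misalignment is 1% or more: "
                        "the parity disks are not operating in sync")
    if read_update_pct >= 5.0:
        warnings.append("Read Update misalignment is 5% or more: "
                        "data and parity disk heads are poorly aligned")
    return warnings


# Example: a healthy reading produces no warnings.
print(check_misalignment(0.0, 2.0))
```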

Compute throughput & percentage

Compute performance should never be a factor in Transparent RAID. The throughput itself is normalized and is not something to get hung up on. It is there purely for informational purposes. The percentage value should ideally be less than 1%.
CPU time is effectively free in Transparent RAID, since the computation usually happens during Read Update misalignment time. So, you should only care about the compute percentage if it is greater than the Read Update misalignment percentage.
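
The rule of thumb above amounts to a single comparison. A minimal sketch, assuming both percentages are copied by hand from the monitoring screen (the function name is hypothetical):

```python
def compute_needs_attention(compute_pct, read_update_misalignment_pct):
    """True only when compute time exceeds the Read Update misalignment time.

    Hypothetical helper illustrating the rule stated in this article:
    compute time that fits inside the misalignment window is treated as free.
    """
    return compute_pct > read_update_misalignment_pct


print(compute_needs_attention(0.5, 2.0))  # compute hidden by misalignment time
print(compute_needs_attention(3.0, 2.0))  # compute exceeds it: worth a look
```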

Read Update throughput & percentage

What really matters here is the percentage value.
This element is the most performance-robbing attribute. The higher the percentage, the worse your write performance will be.
In the above example, 74% of the write time was spent reading for update.
This happens when:

  • the data disk being written to as well as the parity disk(s) are busy servicing other requests
  • the disks involved have a high random access time (latency)

Most spinning disks are designed for sequential operations. As such, most perform very poorly under random operation load.
How badly your disks perform under random operations will directly affect your write performance in Transparent RAID.
For this reason, it is recommended that you use fast 7200 rpm disks for parity disk(s).

Write Update throughput & percentage

The Write Update values are less of an issue than the Read Update values. This is mostly because the data disk and parity disk(s) are better aligned after a Read Update.
In the above example, the Write Update throughput is around 47MB/s. However, since only 25% of the write time is spent actually writing, we end up with an effective write throughput of around 12MB/s (47MB/s × 0.25).
So, the priority should be reducing the Read Update time percentage.
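
The arithmetic behind the effective write throughput can be reproduced with a few lines of Python. This is a simplified sketch, not the plugin's own calculation: the function name is hypothetical, and it assumes the effective figure is simply the Write Update throughput scaled by the share of write time spent writing, as in the example above.

```python
def effective_write_throughput(write_update_mbps, write_time_pct):
    """Scale Write Update throughput by the share of time spent writing.

    Hypothetical helper: both inputs are read off the monitoring screen.
    """
    return write_update_mbps * write_time_pct / 100.0


# Figures from the example above: 47MB/s with 25% of write time spent writing.
print(effective_write_throughput(47, 25))  # 11.75, i.e. around 12MB/s
```

Under the same assumption, lowering the Read Update share raises the Write Update share, and with it the effective throughput, which is why the Read Update percentage is the one to reduce first.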
