Feed on
Posts
Comments

Note: Following this is not for the faint of heart. If you aren’t comfortable with partitioning, then don’t follow the steps here.
I’ve read many posts on how to handle ZFS/Raid-Z on differently sized disks. The goal is to gain the most disk space availability while still retaining the redundancy of surviving a single disk failure. The posts I’ve read either would achieve the theoretical capacity, or be capable of expansion, but not both. I devised a way to get both at the same time, and it’s relatively simple.

The problem is the following: A Raid-Z configuration uses n partitions, giving the user the capacity of n-1 of those partitions, with the nth being the redundant to survive a failure. If the n partitions are not the same size, with the smallest being x, only the first x bytes of each partition is used. One cannot remove a Raid-Z from an active pool without a backup/restore. One cannot add a disk to a Raid-Z without a backup/restore (maybe in the future). The only expansion that can be done is to replace the partitions with larger ones, and once the smallest partition is increased, the available space increases.

For the purposes of demonstrating this technique, I will use an example with 4 disks:

  • 125G
  • 250G
  • 500G
  • 750G

The biggest theoretical capacity of the array, while retaining single disk failure resiliency, is simply the total size of the array minus the largest disk. This means that the capacity of the above example is 875G. So, how does one achieve this capacity? I proposed the following structure using 3 raids:

If you add the capacity in the above, you can see it is 3*125G + 2*125G + 250G = 875G, which is the theoretical capacity.

Suppose I wish to replace the smallest disk, 125G, with a new disk, say 1T. Using the same layout, I should see something like this:


If you add the capacity in the above, you can see it is 3*250G + 2*250G + 250G = 1500G, which again is the theoretical capacity.

The question is, can I migrate from the first configuration to the last without a backup/restore process? The answer is YES, and data is moved/copied at most once.

For the purposes of demonstration, I’m going to show the expansion with 5 drives connected, but the expansion can be done by immediately replacing the smallest drive with the largest and relying on the redundancy to keep things intact in the process.

First, connect the 1T drive and partition it as 250G, 250G, 250G, leaving the rest free. Replace the first 125G partition on 750G disk with the first 250G partition on the 1T disk:

Continue the replacement:

Repeat the same process with the 3 partitions on the 500G disk:

At this point, the move of the blue raid is complete. Depending on the sizes of the disks, the blue raid may be bigger, at which point the pool immediately increases in size. In this example, it does not change size.

Repeat with the 2 partitions on the 250G disk:

At this point, the red raid is complete. In this example, the red raid has changed from 250G to 500G, and the job isn’t complete yet!

Repeat for the final time with partition on the 125G disk:

Now we are done, and can disconnect the 125G disk if not done already. In this example, the green raid has changed from 375G to 750G, yielding a total change of 875G to 1500G.

Note: There is one limitation with this structure. Ordering the disks in increasing size, disk(n) must be at least as big as 2*(disk(n-1))-disk(n-2). This essentially means that if a disk is xG bigger than it’s previous disk, then the next disk must be at least xG bigger than this disk. Since disk sizes tend to grow exponentially, this assumption shouldn’t be much of a problem since the requirement is at least linear growth.

The disk sizes in this example meet the requirement I listed above. Given the requirements of this expansion, the disk added in this example must be at least 1000G, since disk 4 is 250G bigger than disk 3, disk 5 must be at least 250G bigger than disk 4, or 1000G.

This technique also allows adding another disk, under the same conditions. Consider adding a 1T disk and creating another Raid-Z:

I didn’t structure the partitions in nice pretty rows to demonstrate the fact that not all of the data needs to be moved (in this case, 750G of data is not moved). In this example, the capacity increased from 875G to 1625G, again the theoretical capacity.

Anyway, I thought of this while considering using ZFS/Raid-Z in a FreeNAS setup. I haven’t tested any of this; it’s theoretical only. What do you think?

About 3 weeks ago, Kevin (developer of NitoTV) and I decided it was a bit silly how we were each writing playback mechanisms on the AppleTV with little to no collaboration between us. So, we decided to write a Common Media Player Framework, which is licensed using LGPL.

Kevin sent me the code he used for DVD playback inside NitoTV as a place to start. I stripped it down to a smaller piece, and started the framework. After I had it doing basic playback, I worked on overlays to provide feedback to the user. Now, hitting up and down changes the overlays between normal, chapter view, audio/subtitle selection, and zoom.

Chapter Overlay Sceenshot

Also, I added a pretty menu using a blurred image as the resume menu. This is more consistent with Apple’s own playback

Resume Overlay

In addition, I used my work on AC3 Passthrough in Perian to see if I could pull off the same in DVD Playback. The issue is the AppleTV claims to only have a device which can play uncompressed audio, not one that can send Dolby Digital to a decoder. So, I created an audio driver that claims to do exactly that and pipe the data through to the optical/HDMI port. It was a lot of fun getting that working, and then I added support for DTS.

Anyway, now Sapphire uses the common media player framework for DVD playback. In the future, we’ll add other playback mechanisms as well.

Passed My Defense

I haven’t posted here in a while, and with good reason. About two weeks ago, I passed my Ph.D. defense. So, come May, I will officially have a doctorate in Computer Engineering, though I am effectively done with my degree.

One of these days I will learn not to grab the latest and “greatest” software the moment it comes out. Yesterday, I upgraded my iTunes to version 9 and my iPod Touch’s firmware to 3.1.1. Then, I noticed that one of my podcasts was out of order, and furthermore, the release date was no longer showing up on the iPod itself. Strangely, only one podcast was having this issue.

I started to suspect that this was my fault, since this one podcast, Escape Pod was a podcast I was listening by going through its archives. I had written a program which downloads the archived episodes, and modifies the ID3 tags so iTunes will properly recognize the file as a podcast and insert it into its database as if it had downloaded it itself. Getting the release date working correctly was one of the harder points, so I figured that I still didn’t have it quite correct and the new iPod firmware was being more sensitive to it. I got even more annoyed by the fact that iTunes, after upgrading my iPod’s firmware, decided to trash the old firmware versions preventing me from ever downgrading.

Then, I read online discussions where others were having this problem. Now I know that it is not my fault, but rather something that Apple screwed up. I also saw someone who noted that they had upgraded their iTunes, but not their iPod, and had the same issue. So, I trashed iTunes, pulled out my Snow Leopard disk, found the iTunes package on the disk, installed it, reverted my iTunes library to the backup prior to the upgrade, and voila, my podcasts are sorting correctly again.

Now if only I will learn to let others try first, and upgrade myself after reading their reports. This is a really sloppy bug on Apple’s part, and likely means that iTunes 9 was rush. I’m actually happier back on 8, especially since the window zoom still functions as a toggle between the mini player and the normal window, which was changed in 9.

Hopefully someone will benefit from this post, and not upgrade to iTunes 9 until this regression is resolved.

A friend of mine develops on an Bible application for the iPhone, BibleXpress. Since his application includes several translations with the app, he once mentioned to me the possibility of compressing them to save space. Any compression that is done must achieve a good ratio, but more importantly, decompression must be fast. I took it upon myself to find a compression algorithm that could fit the bill.

In my test case, I worked with the NASB translation of the Bible. The raw text of this translation, minus formatting and book/chapter/verse identifiers is 3.965MB. Since the iPhone already has zlib, using gzip compression is an obvious choice. When compressed with gzip, the file size becomes 1.189MB, a significant savings. Even though bzip2 is not readily available on the iPhone (at least not that I could find), I tested its compression which produced a file size of 0.8548MB. While these mechanisms provide a significantly smaller file, when one desires a certain portion of the file, one must first decompress the entire file up to that point. This is an expensive operation on a small device such as the iPhone.

One compromise is to compress each book individually. This yields a file size of 1.242MB. However, some books, such as Psalms, are still quite large, requiring a long decompression operation to read some of it’s chapters. Since the application displays a chapter at a time, a logical compression block would be a single chapter which yields a file size of 1.628MB. While this is a large increase in file size, a single chapter can be decompressed without requiring decompression of any other parts of the file.

I was not happy with these compression ratios and was determined to find a better way. I looked into Huffman encoding, which uses a variable length string of bits to represent a symbol. Its compression ratio does not change with the order of symbols, which typically means it won’t provide as high a compression ratio. However, if you are told which bit is the start of a string, you may begin decompression without examining any of the file which precedes it. In addition, the decompression scheme only requires walking a binary tree, which means it is also quite fast. So, if Huffman encoding can be made to provide a good compression ratio, it will work well for this problem.

When using Huffman encoded, the question becomes how to decompose the string into symbols. One typical decomposition is to make each character into a symbol. While this is an easy representation, it doesn’t provide a good compression ratio. Instead, I chose to make each word into a symbols. After adding in punctuation, the tree contained 16,246 words. This resulted in a dictionary size of 0.1364MB. When compressed, the entire Bible was represented by a Huffman encoding of 0.9947MB, meaning a total of 1.131MB for the compressed stream and dictionary.

After this experiment, I concluded that Huffman encoding of words is the best for a large quantity of text. It yielded a compression ratio that was better than gzip (though not as good as bzip2), but at the same time a scheme that could decompress individual parts of the file without having to read preceding parts. In this scheme, a single chapter can be decompressed very quickly, and still have a high compression ratio.

« Newer Posts - Older Posts »