
One of these days I will learn not to grab the latest and “greatest” software the moment it comes out. Yesterday, I upgraded my iTunes to version 9 and my iPod Touch’s firmware to 3.1.1. Then, I noticed that one of my podcasts was out of order, and furthermore, the release date was no longer showing up on the iPod itself. Strangely, only one podcast was having this issue.

I started to suspect that this was my fault, since this one podcast, Escape Pod, was one I was listening to by going through its archives. I had written a program which downloads the archived episodes and modifies the ID3 tags so iTunes will properly recognize each file as a podcast and insert it into its database as if it had downloaded it itself. Getting the release date working correctly was one of the harder points, so I figured that I still didn’t have it quite correct and the new iPod firmware was being more sensitive to it. I got even more annoyed by the fact that iTunes, after upgrading my iPod’s firmware, decided to trash the old firmware versions, preventing me from ever downgrading.

Then, I read online discussions where others were having this problem. Now I know that it is not my fault, but rather something that Apple screwed up. I also saw someone who noted that they had upgraded their iTunes, but not their iPod, and had the same issue. So, I trashed iTunes, pulled out my Snow Leopard disk, found the iTunes package on the disk, installed it, reverted my iTunes library to the backup prior to the upgrade, and voila, my podcasts are sorting correctly again.

Now if only I would learn to let others try first, and upgrade only after reading their reports. This is a really sloppy bug on Apple’s part, and likely means that iTunes 9 was rushed. I’m actually happier back on 8, especially since the window zoom still functions as a toggle between the mini player and the normal window, which was changed in 9.

Hopefully someone will benefit from this post, and not upgrade to iTunes 9 until this regression is resolved.

A friend of mine develops a Bible application for the iPhone, BibleXpress. Since his application ships with several translations, he once mentioned to me the possibility of compressing them to save space. Any compression that is done must achieve a good ratio, but more importantly, decompression must be fast. I took it upon myself to find a compression algorithm that could fit the bill.

In my test case, I worked with the NASB translation of the Bible. The raw text of this translation, minus formatting and book/chapter/verse identifiers is 3.965MB. Since the iPhone already has zlib, using gzip compression is an obvious choice. When compressed with gzip, the file size becomes 1.189MB, a significant savings. Even though bzip2 is not readily available on the iPhone (at least not that I could find), I tested its compression which produced a file size of 0.8548MB. While these mechanisms provide a significantly smaller file, when one desires a certain portion of the file, one must first decompress the entire file up to that point. This is an expensive operation on a small device such as the iPhone.

One compromise is to compress each book individually. This yields a file size of 1.242MB. However, some books, such as Psalms, are still quite large, requiring a long decompression operation to read some of its chapters. Since the application displays a chapter at a time, a logical compression block would be a single chapter, which yields a file size of 1.628MB. While this is a large increase in file size, a single chapter can be decompressed without requiring decompression of any other parts of the file.
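The per-chapter idea can be sketched roughly as follows. This is only an illustration in Python, not the code used for the measurements above; the function names `pack_chunks` and `read_chunk`, the index layout, and the sample chapters are all assumptions made for the example.

```python
import zlib


def pack_chunks(chunks):
    """Compress each chunk on its own and record where it lands in the blob.

    Returns (blob, index), where index[i] is the (offset, length) of chunk i.
    """
    blob = bytearray()
    index = []
    for text in chunks:
        data = zlib.compress(text.encode("utf-8"), 9)
        index.append((len(blob), len(data)))
        blob += data
    return bytes(blob), index


def read_chunk(blob, index, i):
    """Decompress chunk i only, without touching any other chunk."""
    offset, length = index[i]
    return zlib.decompress(blob[offset:offset + length]).decode("utf-8")


# Hypothetical stand-ins for chapters of text.
chapters = [
    "first chapter text, first chapter text",
    "second chapter text, second chapter text",
    "third chapter text, third chapter text",
]
blob, index = pack_chunks(chapters)
print(read_chunk(blob, index, 1))
```

The cost of this layout is that each chunk starts with no history to refer back to, which is why smaller blocks give a larger total size.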

I was not happy with these compression ratios and was determined to find a better way. I looked into Huffman encoding, which uses a variable-length string of bits to represent each symbol. Because its output size depends only on how often each symbol occurs, and not on the order of the symbols, it typically won’t provide as high a compression ratio. However, if you are told which bit is the start of a symbol, you may begin decompression without examining any of the file which precedes it. In addition, the decompression scheme only requires walking a binary tree, which means it is also quite fast. So, if Huffman encoding can be made to provide a good compression ratio, it will work well for this problem.
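To make the random access property concrete, here is a small Python sketch of decoding by walking a tree from a known bit offset. The tree is hand-built and hypothetical, bits are kept as a string of "0" and "1" characters for readability rather than packed into bytes, and none of the names come from the actual application.

```python
# Hypothetical code tree: a leaf is a one-character string, and an internal
# node is a (zero_branch, one_branch) pair.
TREE = ("e", (("t", "h"), ("n", " ")))


def code_table(tree, prefix=""):
    """Walk the tree once to get the bit string for every symbol."""
    if isinstance(tree, str):
        return {tree: prefix}
    table = code_table(tree[0], prefix + "0")
    table.update(code_table(tree[1], prefix + "1"))
    return table


def encode(symbols, table):
    """Return the bit string plus the bit offset where each symbol starts."""
    bits, offsets, pos = [], [], 0
    for sym in symbols:
        offsets.append(pos)
        bits.append(table[sym])
        pos += len(table[sym])
    return "".join(bits), offsets


def decode(bits, tree, start, count):
    """Decode `count` symbols starting at bit offset `start`."""
    out, pos = [], start
    for _ in range(count):
        node = tree
        while not isinstance(node, str):
            node = node[int(bits[pos])]  # 0 goes left, 1 goes right
            pos += 1
        out.append(node)
    return out


symbols = list("then the")
bits, offsets = encode(symbols, code_table(TREE))
# Jump straight to the sixth symbol; the bits before it are never read.
print("".join(decode(bits, TREE, offsets[5], 3)))
```

Storing one such offset per chapter would be enough to start reading at any chapter boundary.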

When using Huffman encoding, the question becomes how to decompose the string into symbols. One typical decomposition is to make each character into a symbol. While this is an easy representation, it doesn’t provide a good compression ratio. Instead, I chose to make each word into a symbol. After adding in punctuation, the tree contained 16,246 words. This resulted in a dictionary size of 0.1364MB. When compressed, the entire Bible was represented by a Huffman encoding of 0.9947MB, meaning a total of 1.131MB for the compressed stream and dictionary.
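The difference between the two decompositions can be explored with a sketch like the one below. It is written in Python for illustration and is not the program that produced the figures above. The tokenizing rule (words and punctuation marks become symbols, with whitespace dropped) is an assumption, as are the function names and the sample text, and the count ignores the space needed to store the dictionary.

```python
import heapq
import re
from collections import Counter


def tokenize(text):
    """Split text into word and punctuation symbols (assumed rule)."""
    return re.findall(r"\w+|[^\w\s]", text)


def huffman_table(freqs):
    """Map each symbol to a bit string; frequent symbols get shorter codes."""
    # Heap entries are (count, tiebreak, tree). The unique tiebreak means
    # two trees are never compared with each other.
    heap = [(count, i, sym)
            for i, (sym, count) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)
        c2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (c1 + c2, tiebreak, (left, right)))
        tiebreak += 1
    table = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, str):
            table[node] = prefix or "0"  # a lone symbol still needs one bit
        else:
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
    return table


def encoded_bits(symbols):
    """Number of bits needed to Huffman-encode this symbol sequence."""
    freqs = Counter(symbols)
    table = huffman_table(freqs)
    return sum(count * len(table[sym]) for sym, count in freqs.items())


# Hypothetical sample text; real input would be the full translation.
text = "the cat sat on the mat. the dog sat on the log. " * 20
print("bits with characters as symbols:", encoded_bits(list(text)))
print("bits with words as symbols:", encoded_bits(tokenize(text)))
```

With words as symbols the stream gets shorter but the dictionary grows, so both numbers have to be added up before comparing against gzip.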

After this experiment, I concluded that Huffman encoding of words is the best for a large quantity of text. It yielded a compression ratio that was better than gzip (though not as good as bzip2), but at the same time a scheme that could decompress individual parts of the file without having to read preceding parts. In this scheme, a single chapter can be decompressed very quickly, and still have a high compression ratio.

New Car

Well, after much hassle, I now have a new car. Here are some pictures:
[Image: New Car Front]
[Image: New Car Side]

The whole ordeal was mostly due to the CARS program, better known as “Cash for Clunkers”. Anyway, I finally got that resolved and got a lot more for my old truck than I would have ever gotten in trade-in value. So now I have a VW Jetta, and so far I’m happy with it. I just need to get used to driving a car instead of a truck, but otherwise it drives well.

I use MacPorts to get a whole host of utilities. In addition, I’ve always used it to obtain later versions of Subversion than the one provided with the OS. Xcode, on the other hand, is locked into using the version of Subversion which is installed in /usr (currently version 1.4.4, whereas I have 1.6.3). Since I often work in the command line, eventually I will use the newer version of Subversion on a directory which I also use in Xcode. This causes a problem since the command line utility will upgrade the working copy format in such a way that older versions of Subversion (Xcode’s) cannot use it.

So, my options are:

  • Use either the command line utility or Xcode on any given checkout, but never both.
  • Use a horribly out of date version of Subversion.
  • Hack Xcode’s plugin to work with newer versions of Subversion

I chose the last, as it should give me the best of both worlds. If you were to conduct a quick Google search, you would find a whole host of people proposing a solution which involves moving around system libraries! This is a horrible idea, as it breaks other things such as Apache. Buried somewhere in the sea of bad ideas, I ran across a good one, written by Jean-Daniel Dupas and improved by Philippe Casgrain, which only changes the paths of the libraries used by Xcode’s Subversion plugin. This script worked well for Subversion 1.5, but recent versions of Subversion would cause Xcode to crash in libapr. The solution is to add libapr and libaprutil to the script. Here is the corrected script for your use.

#!/bin/sh

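# Both arguments are required: the plugin binary to patch and the
# prefix under which the newer Subversion libraries are installed
if [ "$#" -lt "2" ]
then
echo "Usage: xcode-svn-update.sh <plugin binary> <svn prefix>"
printf "Example: xcode-svn-update.sh /Developer/Library/Xcode/Plug-ins/"
echo "XcodeSubversionPlugin.xcplugin/Contents/MacOS/XcodeSubversionPlugin /opt/local"
exit 1;
fi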

# Save a backup copy, if necessary
if [ -e "$1_Old" ]
then
echo "Backup copy of \"$1\" exists."
else
echo "Saving a backup copy of \"$1\"."
cp "$1" "$1_Old"
fi

echo "Updating install path using svn libraries in \"$2\"..."
# Repoint every library the plugin links against from /usr/lib
# to the copy under the given prefix
for lib in apr aprutil svn_client svn_delta svn_diff svn_fs_fs svn_fs \
    svn_ra_local svn_ra_svn svn_ra svn_repos svn_subr svn_wc
do
    install_name_tool -change \
        "/usr/lib/lib${lib}-1.0.dylib" \
        "$2/lib/lib${lib}-1.0.dylib" "$1"
done
echo "Done!"

Now, how long till Xcode supports Mercurial?

When I redesigned Sapphire, I decided that the metadata back end would be best served by Apple’s Core Data Framework. While the framework has a lot of power, several shortcomings in the implementation hindered its potential.

First, I should start with the many things that Apple did correctly in Core Data.

  • The whole data model with relationships and properties is quite powerful. With this data model, one can represent many data sets in a simple manner, such as the example below:
    [Image: core-data-model]
    This example shows part of the data model within Sapphire pertaining to TV shows, where a TV show contains multiple seasons, each of which contains multiple episodes. Additionally, an episode contains one or more sub-episodes, to handle the case where a single file or DVD contains multiple episodes. Lastly, the show and season objects extend from a superclass, CategoryDirectory, which contains some properties common to all collections.
  • Since the relationships are defined, they can be automatically maintained. In the above example, if an episode’s show relationship is set to a particular TVShow object, that show’s object will automatically have the episode added to its episodes relationship.
  • Delete rules can be set such that if an object is removed, the delete can cascade to remove other objects as well. This is useful in the case of removing a directory, and all the files and directories contained within it.
  • Saving to a file is easy since the details of reading and writing a file are handled by Core Data.
  • While I didn’t use it, undo management is also built into the system.

So, with all these advantages, why is Core Data not used more often? The answer is that it contains numerous shortcomings.

  • The compiler has no knowledge of the data model. One is expected to use setValue:forKey: and valueForKey: to set properties and relationships. This is prone to programmer errors such as assigning a value of the wrong type or even just misspelling a key. While a class name can be provided in the data model, there is no synchronization between the class files and that object. Since this is the most glaring oversight in Core Data, Jonathan Rentzsch wrote mogenerator to resolve it. It creates two sets of class files: those that are machine edited and those that are human edited. The machine edited files contain the correctly typed setters and getters, making programmer errors detectable by the compiler, and thus reducing the debug cycle time. This is the kind of design that Apple should have done when they first made Core Data, or at the least when they redesigned it in Obj-C 2.
  • Lack of concurrency. One cannot have multiple programs edit the same object database at the same time, even when the SQLite format is used. Core Data will allow the edits to take place, but there is no apparent means by which another program can read these changes. Often, this results in a “nested transactions are not supported” exception when one tries to save. While concurrent editing is allowed within separate threads in a single program, this requires a separate context for each thread and synchronization commands sent between them. The only viable means I have found for concurrent editing between programs is to have a single master process which saves all changes sent to it via interprocess communication by the other programs.
  • Unsafe relationship edit times. Editing a relationship is supposed to modify the inverse relationship so the two stay consistent, but there are cases where this does not happen, and the cause is timing. Normally, if an object’s relationship is removed, the inverse relationship is also immediately removed, but if an object is deleted, that change is placed in a pending queue to be done later. Neither behavior would be an issue by itself, but together they cause major headaches.
    I will illustrate this through an example. Say I have a TV show with two seasons, each with one episode, and I choose to delete one episode. The episode is marked for deletion and nothing else occurs until the pending changes are processed. Then, when the episode is deleted, I no longer have any need for its season since it is empty. The natural solution is to override changes to the episodes relationship so that if a season finds itself without any episodes, it deletes itself.
    This would be a great solution, except that it does not work. When Core Data is processing its pending changes, it appears to shut down its change notification, which is how these relationships are maintained. Additionally, objects marked for deletion in this time are not deleted. Due to these issues, I’ve actually managed to save an object model where objects contain relationships to objects that do not exist, and several objects that should have been deleted were not because the delete was ignored. My only solution to this has been to make my own pending queue, where objects are inserted while Core Data is processing its queue, then I process mine, and repeat until both are empty.

After overcoming these issues, I’ve been very happy with Core Data. It’s faster than the model that I could design, and consumes less memory. Furthermore, it has relieved much of the headache associated with inter-object relationships, as well as object lifetime and retain cycles. So, if Apple would resolve these issues, then there should be no excuse to not use Core Data.
