Working With Core Data

Posted by Thoughts and Ramblings on Wednesday, May 20, 2009

When I redesigned Sapphire, I decided that the metadata back end would be best served by Apple’s Core Data Framework. While the framework has a lot of power, several shortcomings in the implementation hindered its potential.

First, I should start with the many things that Apple did correctly in Core Data.

  • The whole data model with relationships and properties is quite powerful. With this data model, one can represent many data sets in a simple manner, such as the example below:
    [Figure: core-data-model]
    This example shows the part of Sapphire’s data model that pertains to TV shows, where a TV show contains multiple seasons, each of which contains multiple episodes. Additionally, an episode contains one or more sub-episodes, to handle the case where a single file or DVD contains multiple episodes. Lastly, the show and season objects inherit from a superclass, CategoryDirectory, which holds the properties common to all collections.
  • Since the relationships are defined, they can be automatically maintained. In the above example, if an episode’s show relationship is set to a particular TVShow object, that show’s object will automatically have the episode added to its episodes relationship.
  • Delete rules can be set such that if an object is removed, the delete cascades to remove other objects as well. This is useful when removing a directory along with all the files and directories contained within it.
  • Saving to a file is easy, since the details of reading and writing the file are handled by Core Data.
  • While I didn’t use it, undo management is also built into the system.
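
To make the inverse-relationship and cascade-delete behavior concrete, here is a minimal toy model in Python. It is not Core Data and uses none of its API; the class names, the `store` set, and the `delete_cascading` helper are all made up for illustration. It only mimics the two behaviors described above: setting one side of a relationship updates the other side, and deleting a container removes what it contains.

```python
# Toy model only: these classes are hypothetical and do not use Core Data.
class Season:
    def __init__(self, number):
        self.number = number
        self.episodes = set()   # to-many side of the relationship


class Episode:
    def __init__(self, title):
        self.title = title
        self._season = None     # to-one side of the relationship

    @property
    def season(self):
        return self._season

    @season.setter
    def season(self, new_season):
        # Keep both directions consistent on every assignment.
        if self._season is not None:
            self._season.episodes.discard(self)
        self._season = new_season
        if new_season is not None:
            new_season.episodes.add(self)


def delete_cascading(season, store):
    """Cascade rule: removing a season also removes its episodes."""
    for episode in list(season.episodes):
        episode.season = None
        store.discard(episode)
    store.discard(season)


season = Season(1)
pilot = Episode("Pilot")
store = {season, pilot}

pilot.season = season              # only one side is set explicitly
print(pilot in season.episodes)    # prints True: the inverse was updated

delete_cascading(season, store)
print(len(store))                  # prints 0: the episode went with the season
```

In Core Data itself, none of this bookkeeping is written by hand; the inverse and the delete rule are declared in the data model and the framework does the rest.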

So, with all these advantages, why is Core Data not used more often? The answer is that it has numerous shortcomings.

  • The compiler has no knowledge of the data model. One is expected to use setValue:forKey: and valueForKey: to set properties and relationships. This is prone to programmer errors, such as assigning a value of the wrong type or simply misspelling a key. While a class name can be provided in the data model, there is no synchronization between the class files and the model. Because this is the most glaring oversight in Core Data, Jonathan Rentzsch wrote mogenerator to resolve it. It creates two sets of class files: machine-edited and human-edited. The machine-edited files contain correctly typed setters and getters, making programmer errors detectable by the compiler and thus reducing debug cycle time. This is the kind of design Apple should have used when they first made Core Data, or at the least when they redesigned it for Obj-C 2.
  • Lack of concurrency. One cannot have multiple programs edit the same object database at the same time, even when the SQLite format is used. Core Data will allow the edits to take place, but there is no apparent means by which another program can read these changes. Often, this results in a “nested transactions are not supported” exception when one tries to save. While concurrent editing is allowed across separate threads within a single program, it requires a separate context for each thread and synchronization messages sent between them. The only viable means I have found for concurrent editing between programs is to have a single master process that saves all changes, which the other programs send to it via interprocess communication.
  • Unsafe relationship edit times. Editing a relationship is supposed to modify the inverse relationship so the two stay consistent, but there are cases where this will not happen, and it has to do with timing. Normally, if an object’s relationship is removed, the inverse relationship is also removed immediately, but if an object is deleted, that change is placed in a pending queue to be done later. Neither behavior would be an issue by itself, but together they cause major headaches. I will illustrate this through an example. Say I have a TV show with two seasons, each with one episode, and I choose to delete one episode. The episode is marked for deletion and nothing else occurs until the pending changes are processed. Then, once the episode is deleted, I no longer have any need for its season, since it is empty. The natural solution is to override the handling of changes to the episodes relationship so that if a season finds itself without any episodes, it deletes itself. This would be a great solution, except that it does not work. While Core Data is processing its pending changes, it appears to shut down its change notification, which is how these relationships are maintained. Additionally, objects marked for deletion during this time are not deleted. Due to these issues, I’ve actually managed to save an object model in which objects contain relationships to objects that do not exist, and in which several objects that should have been deleted were not, because the delete was ignored. My only solution has been to make my own pending queue: objects are inserted into it while Core Data is processing its queue, then I process mine, and repeat until both are empty.
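
The two-queue workaround from the last point can be sketched as a small drain loop. This is a toy model in Python, not Core Data code: `ToyContext` is a hypothetical stand-in that simply ignores deletes requested while it is processing its own queue (mimicking the behavior I observed), and `delete_and_prune`, `parent_of`, and `children_of` are names invented for the example.

```python
# Toy model only: ToyContext is a hypothetical stand-in, not Core Data.
class ToyContext:
    """Deletes requested while the pending queue is being processed
    are silently ignored, mimicking the behavior described above."""

    def __init__(self, objects):
        self.objects = set(objects)
        self.pending = []
        self.processing = False

    def delete(self, obj):
        if self.processing:
            return                    # the ignored delete
        self.pending.append(obj)

    def process_pending(self, on_deleted):
        self.processing = True
        batch, self.pending = self.pending, []
        for obj in batch:
            self.objects.discard(obj)
            on_deleted(obj)           # callback runs while processing
        self.processing = False


def delete_and_prune(context, first, parent_of, children_of):
    """Delete `first`, then any parent left without children."""
    my_queue = []                     # the application's own pending queue

    def on_deleted(obj):
        parent = parent_of.get(obj)
        if parent is None:
            return
        children_of[parent].discard(obj)
        if not children_of[parent]:
            my_queue.append(parent)   # defer instead of deleting right now

    context.delete(first)
    # Alternate between the two queues until both are empty.
    while context.pending or my_queue:
        context.process_pending(on_deleted)
        while my_queue:               # safe now: processing has finished
            context.delete(my_queue.pop())


parent_of = {"S1E1": "Season 1", "S2E1": "Season 2",
             "Season 1": "Show", "Season 2": "Show"}
children_of = {"Season 1": {"S1E1"}, "Season 2": {"S2E1"},
               "Show": {"Season 1", "Season 2"}}
context = ToyContext(["Show", "Season 1", "Season 2", "S1E1", "S2E1"])

delete_and_prune(context, "S1E1", parent_of, children_of)
print(sorted(context.objects))        # ['S2E1', 'Season 2', 'Show']
```

Deleting the only episode of Season 1 removes that season as well, while the show survives because it still has Season 2. Had `on_deleted` called `context.delete` directly, the request would have been dropped and the empty season would have stayed behind.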

After overcoming these issues, I’ve been very happy with Core Data. It’s faster than the model I could have designed myself, and it consumes less memory. Furthermore, it has removed much of the headache associated with inter-object relationships, as well as object lifetime and retain cycles. So, if Apple would resolve these issues, there should be no excuse not to use Core Data.


Legacy Comments:

Push Eject - Jun 26, 2009

I’ve said it before, but *I freaking LOVE Sapphire* Thanks for all you do, Graham. Now – if only it had a “Most recently added…” smart folder. :)