Thursday, September 29, 2011

Another Metadata Framework

The quest for consistent metadata storage


I set out to build a web application for accessing my MP3s and photos remotely.
I wanted a new way to store information: every file should be its own record.
The files themselves should be the database, and all I should need to maintain is a directory.


I needed a consistent metadata framework:

  • ID3 was archaic and essentially specific to MP3s
  • Exif was only for photos

Enter XMP

I found XMP, which could tag both MP4 files and photos and had been developed
by Adobe since 2001. It was XML-based and could store any type of information
in any file.

Code was available in C++ and Java, so I immediately set about writing a
native Ruby extension using Rice. I ran into environment issues and started
exploring how I might implement the specification myself.

I checked out the specification and, o_O, I could smell the stank of corporate governance.
IPTC schemas and namespaces were everywhere; the specification rigidly expected a specific
schema and was full of all the nastiness from the era when XML came to be owned and defined by large corporate bodies.

I took the best parts of the idea:

  • Add metadata to any file
  • Implement a special marker to identify an XMP segment
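For context on the marker idea: as I understand the XMP specification, each XML packet is wrapped in `<?xpacket begin=...?>` and `<?xpacket end=...?>` processing instructions, which lets a generic scanner find the metadata without parsing the host file format. A minimal sketch, assuming the packet is stored as plain bytes (not compressed or re-encoded by the host format); `find_xmp_packet` is a hypothetical helper name, not part of any XMP SDK.

```python
from typing import Optional

# Processing-instruction prefixes that open and close an XMP packet wrapper.
XPACKET_BEGIN = b"<?xpacket begin="
XPACKET_END = b"<?xpacket end="


def find_xmp_packet(data: bytes) -> Optional[bytes]:
    """Return the first XMP packet (wrapper included) in data, or None.

    Hypothetical helper: assumes the packet sits in the file as plain bytes.
    """
    start = data.find(XPACKET_BEGIN)
    if start == -1:
        return None
    end_pi = data.find(XPACKET_END, start)
    if end_pi == -1:
        return None
    close = data.find(b"?>", end_pi)  # end of the closing processing instruction
    if close == -1:
        return None
    return data[start:close + 2]


# Usage: a fake image blob with an XMP packet embedded in the middle.
sample = (
    b"\xff\xd8...image bytes..."
    b"<?xpacket begin='\xef\xbb\xbf' id='W5M0MpCehiHzreSzNTczkc9d'?>"
    b"<x:xmpmeta xmlns:x='adobe:ns:meta/'/>"
    b"<?xpacket end='w'?>"
    b"...more image bytes..."
)
packet = find_xmp_packet(sample)
print(packet)
```

Byte scanning like this is what makes a "marker" approach format-agnostic: the reader only needs to recognize the wrapper, not the container around it.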



And added my own ideas:
  • Use BSON (10gen's Binary JSON format)
  • Provide support for JSON schemas and namespaces
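To make the BSON idea concrete, here is a minimal sketch under an assumed layout made up for illustration: the original file bytes, then a BSON document, then that document's 4-byte length, then a hypothetical trailer marker (`MARKER`). It hand-encodes only BSON's UTF-8 string (0x02) and int32 (0x10) element types, following the BSON specification's document layout. The function names and the trailer layout are hypothetical; this is not Metahash's actual on-disk format or API.

```python
import struct
from typing import Optional, Tuple

# Hypothetical trailer marker; not Metahash's real format.
MARKER = b"\x00MHASH\x01"


def _cstring(text: str) -> bytes:
    raw = text.encode("utf-8")
    if b"\x00" in raw:
        raise ValueError("BSON keys cannot contain NUL bytes")
    return raw + b"\x00"


def bson_encode(doc: dict) -> bytes:
    """Encode a flat dict of str/int32 values as a BSON document."""
    body = b""
    for key, value in doc.items():
        if isinstance(value, str):
            raw = value.encode("utf-8") + b"\x00"
            # 0x02 = string: int32 byte length (including NUL), then the bytes.
            body += b"\x02" + _cstring(key) + struct.pack("<i", len(raw)) + raw
        elif isinstance(value, int) and not isinstance(value, bool) and -2**31 <= value < 2**31:
            # 0x10 = 32-bit little-endian signed integer.
            body += b"\x10" + _cstring(key) + struct.pack("<i", value)
        else:
            raise TypeError(f"unsupported value for {key!r}: {value!r}")
    # Total length counts the 4-byte length prefix and the trailing NUL.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


def bson_decode(data: bytes) -> dict:
    """Decode a document produced by bson_encode (string and int32 only)."""
    (total,) = struct.unpack_from("<i", data, 0)
    if total != len(data) or data[-1] != 0:
        raise ValueError("malformed BSON document")
    doc, pos = {}, 4
    while pos < total - 1:
        kind = data[pos]
        pos += 1
        end = data.index(b"\x00", pos)
        key = data[pos:end].decode("utf-8")
        pos = end + 1
        if kind == 0x02:
            (n,) = struct.unpack_from("<i", data, pos)
            pos += 4
            doc[key] = data[pos:pos + n - 1].decode("utf-8")  # drop trailing NUL
            pos += n
        elif kind == 0x10:
            (doc[key],) = struct.unpack_from("<i", data, pos)
            pos += 4
        else:
            raise ValueError(f"unsupported BSON type 0x{kind:02x}")
    return doc


def attach_metadata(content: bytes, meta: dict) -> bytes:
    """Append [BSON doc][int32 doc length][MARKER] after the original bytes."""
    doc = bson_encode(meta)
    return content + doc + struct.pack("<i", len(doc)) + MARKER


def read_metadata(blob: bytes) -> Tuple[bytes, Optional[dict]]:
    """Split a blob back into (original content, metadata or None)."""
    if not blob.endswith(MARKER):
        return blob, None
    end = len(blob) - len(MARKER)
    (n,) = struct.unpack_from("<i", blob, end - 4)
    start = end - 4 - n
    if start < 0:
        raise ValueError("corrupt metadata trailer")
    return blob[:start], bson_decode(blob[start:end - 4])


song = b"ID3 fake mp3 bytes"
tagged = attach_metadata(song, {"title": "Example", "track": 3})
content, meta = read_metadata(tagged)
print(meta)
```

Appending at the end leaves the original bytes at their usual offsets; whether a given player or viewer ignores trailing bytes depends on the format and the reader, which is the trade-off any marker-based segment has to accept.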


Of course, there is no existing specification for schemas and namespaces, so the namespaces used in the tests point to a Google Group where a namespace implementation is being discussed.

Metahash

Check it out on GitHub.