The quest for consistent metadata storage
I set out to build a web application to access my MP3s and photos remotely.
I wanted a new way to store information: every file should be its own record.
The database should be the files themselves; all I should need to maintain is a directory.
I needed a consistent metadata framework:
- ID3 was archaic and essentially specific to MP3s
- Exif was only for photos
Enter XMP
I found XMP, which could tag both MP4 files and photos and had been developed
by Adobe since 2001. It used XML and could store almost any kind of
information in almost any file.
Code was available in C++ and Java, so I immediately set about
writing a native Ruby extension using Rice. I ran into environment issues
and started exploring how I might implement the specification by hand.
I checked out the specification and, o_O, I could smell the stank of corporate governance.
IPTC schemas and namespaces were everywhere; the specification rigidly expected a specific
schema and was full of all the nastiness of the era when XML came to be owned and defined by large corporate bodies.
I took the best parts of the idea:
- Add metadata to any file
- Implement a special marker to identify an XMP segment
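For reference, the marker XMP itself uses is its packet wrapper: an embedded packet opens with a "<?xpacket begin=...?>" processing instruction and closes with "<?xpacket end="w"?>" (or end="r" for read-only packets). The Python sketch below locates packets by scanning raw bytes for that wrapper. It is a simplified illustration under stated assumptions (UTF-8 packets, no format-specific parsing), not Adobe's XMP Toolkit and not the code behind this project; the function name find_xmp_packets is hypothetical.

```python
import re

# XMP packet wrapper: a "<?xpacket begin=...?>" header and an
# "<?xpacket end='w'?>" (writable) or "end='r'" (read-only) trailer
# surrounding the serialized XML.
_XMP_PACKET = re.compile(
    rb"""<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=["'][rw]["']\?>""",
    re.DOTALL,
)


def find_xmp_packets(data):
    """Hypothetical helper: return the raw XML inside each XMP packet.

    Simplified byte scan: assumes UTF-8 packets; the returned payload may
    include the whitespace padding writers leave for in-place edits.
    """
    return [m.group(1) for m in _XMP_PACKET.finditer(data)]
```

Scanning like this works on any file type, which is what makes the marker idea attractive, but it has to read the whole file to find the packet.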
And added some ideas of my own:
- Use BSON (10gen's Binary JSON format)
- Provide support for JSON schemas and namespaces
Of course, there is no existing specification for these schemas and namespaces, so the namespaces in the tests point to a Google Group where the namespace implementation is being discussed.
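The combination of ideas above (metadata added to any file, located by a marker, serialized as BSON) can be sketched in Python. Everything about the on-disk layout here is an assumption for illustration, not Metahash's actual format: the trailer is laid out as [BSON document][4-byte little-endian length][MARKER], the MARKER bytes are hypothetical, and the hand-rolled encoder covers only the BSON string (0x02), embedded document (0x03) and int32 (0x10) types. It also assumes the host format tolerates trailing bytes, which is not guaranteed for every format or reader.

```python
import os
import struct

# HYPOTHETICAL 8-byte trailer marker, invented for this sketch.
MARKER = b"\x00MHASH\x00\x01"
FOOTER_LEN = 4 + len(MARKER)  # uint32 BSON length + marker


def _cstring(s):
    raw = s.encode("utf-8")
    if b"\x00" in raw:
        raise ValueError("BSON keys may not contain NUL bytes")
    return raw + b"\x00"


def bson_encode(doc):
    """Encode a dict of str, 32-bit int, or nested dict values as BSON."""
    body = b""
    for key, value in doc.items():
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            body += b"\x02" + _cstring(key) + struct.pack("<i", len(data)) + data
        elif isinstance(value, int) and not isinstance(value, bool):
            body += b"\x10" + _cstring(key) + struct.pack("<i", value)
        elif isinstance(value, dict):
            body += b"\x03" + _cstring(key) + bson_encode(value)
        else:
            raise TypeError("unsupported value type: %r" % type(value))
    # int32 total size (counting itself) + elements + 0x00 terminator
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


def bson_decode(buf):
    """Decode the subset of BSON that bson_encode produces."""
    (size,) = struct.unpack_from("<i", buf, 0)
    doc, pos = {}, 4
    while pos < size - 1:  # the final byte is the 0x00 terminator
        etype = buf[pos]
        end = buf.index(b"\x00", pos + 1)
        key = buf[pos + 1:end].decode("utf-8")
        pos = end + 1
        if etype == 0x02:  # string: int32 length (incl. NUL) + bytes
            (n,) = struct.unpack_from("<i", buf, pos)
            doc[key] = buf[pos + 4:pos + 3 + n].decode("utf-8")
            pos += 4 + n
        elif etype == 0x10:  # int32
            (doc[key],) = struct.unpack_from("<i", buf, pos)
            pos += 4
        elif etype == 0x03:  # embedded document
            (n,) = struct.unpack_from("<i", buf, pos)
            doc[key] = bson_decode(buf[pos:pos + n])
            pos += n
        else:
            raise ValueError("unsupported BSON type 0x%02x" % etype)
    return doc


def read_metadata(path):
    """Return (length of the original content, metadata dict or None)."""
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)
        if size >= FOOTER_LEN:
            f.seek(size - FOOTER_LEN)
            footer = f.read(FOOTER_LEN)
            if footer[4:] == MARKER:
                (n,) = struct.unpack("<I", footer[:4])
                start = size - FOOTER_LEN - n
                if start >= 0:
                    f.seek(start)
                    return start, bson_decode(f.read(n))
    return size, None


def write_metadata(path, meta):
    """Drop any existing trailer, then append [BSON][uint32 length][MARKER]."""
    content_len, _ = read_metadata(path)
    blob = bson_encode(meta)
    with open(path, "r+b") as f:
        f.truncate(content_len)
        f.seek(content_len)
        f.write(blob + struct.pack("<I", len(blob)) + MARKER)
```

Putting the marker at the very end means a reader only inspects the last 12 bytes to decide whether a file carries metadata, and rewriting truncates the old trailer instead of stacking a new one. The trade-off is that some formats expect their own data at the end of the file; an ID3v1 tag, for example, must occupy the last 128 bytes of an MP3, so a real implementation would need per-format care.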
Metahash
Check it out on GitHub.