Wednesday, November 23, 2011
While evaluating caching strategies in Rails 3.1, I found existing articles comparing Rails cache store backends to be lacking and/or outdated. The most recent article I could find compares file_store to mem_cache_store. Given that mem_cache_store is being replaced by Dalli, the existing benchmarks no longer reflect the options available today, so I benchmarked the following backends:
- File Store
- Memcached Store
- Mongo Store
- Redis Store
Though it looks like mongo-store demonstrates the best overall performance, it should be noted that a Mongo server is unlikely to be used solely for caching (the same applies to Redis); non-caching queries will likely be running concurrently on a Mongo/Redis server, which could affect the applicability of these benchmarks.
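A harness for this kind of comparison just times reads and writes against the stores' shared interface. Here is a minimal sketch in plain Ruby, with a Hash-backed stand-in store; the class name and iteration count are mine, and a real run would drop in ActiveSupport's FileStore, the Dalli client, and the Mongo/Redis store clients instead:

```ruby
require 'benchmark'

# Stand-in store exposing the read/write interface shared by Rails
# cache backends; substitute FileStore, Dalli, etc. for a real benchmark.
class MemoryStore
  def initialize
    @data = {}
  end

  def write(key, value)
    @data[key] = value
  end

  def read(key)
    @data[key]
  end
end

ITERATIONS = 10_000

# Time ITERATIONS write/read round-trips against the given store
def bench_store(store)
  Benchmark.measure do
    ITERATIONS.times do |i|
      store.write("key:#{i}", "value:#{i}")
      store.read("key:#{i}")
    end
  end
end

result = bench_store(MemoryStore.new)
puts format('memory store: %.4fs', result.real)
```

The same `bench_store` call can be repeated once per backend to produce a side-by-side table.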
Thursday, September 29, 2011
The quest for consistent metadata storage
I sought to create a web application to access my mp3s and photos remotely.
I wanted a new way to store information: every file should be its own record.
The database should be the files themselves; I should just need to maintain a directory.
I needed a consistent metadata framework:
- ID3 was archaic and essentially specific to mp3s
- EXIF was just for photos
I found XMP, which could tag both mp4 files and photos and had been in development by Adobe since 2005. It used XML and was capable of storing any type of information in any file.
Code was available in C++/Java, and I immediately undertook the task of writing a native extension using Rice. I ran into environment issues and explored how I might implement the specification manually.
I checked out the specification and, o_O, I could smell the stank of corporate governance: IPTC schemas and namespaces everywhere. The specification had rigid expectations of using a specific schema and was full of all the nastiness that comes when XML is owned and defined by large corporate bodies.
I took the best parts of the idea:
- Add metadata to any file
- Implement a special marker to identify an XMP segment
And added my own ideas:
- Use BSON (10gen's Binary JSON format)
- Provide support for JSON schemas and namespaces
Of course, there is no existing specification for JSON schemas and namespaces, so the tests' namespaces refer to a Google group where the namespace implementation is being discussed.
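The core mechanism is simple: append a marker followed by a serialized metadata segment to the end of an arbitrary file, then scan for the marker to read it back. A rough sketch, using JSON as a stand-in for BSON and an invented marker byte sequence (the real project defines its own marker and wire format):

```ruby
require 'json'
require 'tempfile'

# Invented marker; the actual project defines its own byte sequence.
MARKER = "\x00METADATA\x00".b

# Append a marker plus a serialized metadata segment (JSON here,
# standing in for BSON) to the end of the file.
def embed_metadata(path, metadata)
  File.open(path, 'ab') do |f|
    f.write(MARKER)
    f.write(JSON.generate(metadata))
  end
end

# Scan the file for the marker and parse whatever follows it.
def read_metadata(path)
  raw = File.binread(path)
  idx = raw.index(MARKER)
  return nil unless idx
  JSON.parse(raw[(idx + MARKER.bytesize)..-1])
end

# Demo against a throwaway file standing in for an mp3
file = Tempfile.new('song.mp3')
file.write('fake audio bytes')
file.close
embed_metadata(file.path, 'artist' => 'Example', 'year' => 2011)
puts read_metadata(file.path)['artist'] # prints "Example"
```

A BSON-backed version would swap `JSON.generate`/`JSON.parse` for the equivalent BSON serialization calls and would need to handle markers occurring more than once.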
Check it out on GitHub.
Monday, August 29, 2011
While watching Food, Inc. the other night, I felt sorry for the soybean farmers who were dominated and regulated by Monsanto's patents. Monsanto produces genetically modified seeds with extremely favorable characteristics and holds patents on its strains. Much has been written on the "evils of patenting food and seeds". I couldn't help but think about how Monsanto's reign over the seed industry resembles Microsoft's domination of the software industry in the 90s, though that monopoly's fall had more to do with the fact that software so ubiquitous could not reap the benefits of competition. I feel that the rise of open-source software in the early millennium had a large part to play in cultivating a revolution against the corporate machine. Open-source software's ability to flourish is due in large part to the internet and its power to dissolve geographic boundaries. The selection of seeds from generation to generation, by contrast, has largely been a locally based operation for millennia. It is not readily possible for a farmer in Georgia to view the strains of farmers in Missouri; there is no coordination that would let a civilization organize mass selection in an effective manner.
Could we imagine a world where there exists a physical repository with a protocol for checking in, checking out, and forking strains of seed? Like a GitHub for organizing mass artificial selection? Could we standardize a method of describing quantifiable measurements of seed quality and strain strength, and index all forks and repositories? Couldn't we even mirror the actual evolution of a genome with source control? Aren't you, in fact, a fork?
Saturday, June 04, 2011
Attempting to upload an image to the refinerycms system yielded a stack trace returned to the user. In this case, refinery's images_controller is picking up an error in dragonfly.
When we try to upload an image in refinery, we get
Let's check out the top file
So it appears that the error has to do with the IO.popen call. Since we know we wouldn't need that call if "use_filesystem" were true, and since line 9 suggests there is a configuration directive for this setting somewhere, we should try to find it.
So we walk down the stack trace to the last point where execution was in another gem. It turns out to be images_controller in the refinerycms gem.
Knowing the name of the controller, I tried some bash-fu and was pleasantly surprised when it worked!
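The bash-fu was something along these lines. The gem tree below is fabricated for the demo; in practice you would point find at your real gem directory, e.g. `$(gem environment gemdir)/gems`:

```shell
# Build a throwaway directory tree mimicking an installed gem, so the
# find invocation has something to hit (stand-in for your real gem path).
gemdir=$(mktemp -d)
mkdir -p "$gemdir/refinerycms-1.0.0/app/controllers/admin"
touch "$gemdir/refinerycms-1.0.0/app/controllers/admin/images_controller.rb"

# The actual bash-fu: locate the controller file by name within the tree
find "$gemdir" -name "images_controller.rb"
```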
Nonetheless, there did not appear to be any configuration in that file. I went to the refinery gem's root directory and ran "grep -R dragonfly ." to flush out any config files. I noticed "lib/refinerycms-images.rb".
We check out the file and see the Dragonfly app initialization at line 22. We google around for the Dragonfly docs looking for a reference to where exactly the "use_filesystem" configuration directive must be set. Our search lands us on docs for Dragonfly::Analysis::FileCommandAnalyser
An example for the config is referenced which includes the directive we are looking for.
We then modify the source of lib/refinerycms-images.rb to include the modifications to the analyser config.
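A sketch of what such a modification might look like, based on Dragonfly's 0.9-era docs for FileCommandAnalyser; the exact registration API and app name depend on the Dragonfly version RefineryCMS bundles, so treat this as illustrative rather than the actual patch:

```ruby
# Hypothetical placement inside lib/refinerycms-images.rb, near the
# Dragonfly app initialization (around line 22 in the post's version).
app_images = ::Dragonfly[:images]

app_images.analyser.register(Dragonfly::Analysis::FileCommandAnalyser) do |a|
  # Hand the `file` command a tempfile on disk instead of piping the
  # data through IO.popen, which was where the error surfaced.
  a.use_filesystem = true
end
```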
We attempt the image upload again and the upload succeeds. Now, how do I get involved in the refinerycms repo to discuss the changes with the leads? Something like this? https://github.com/resolve/refinerycms/pull/738
Wednesday, May 25, 2011
Robust application frameworks include the ability to log all database activity. There are cases, however, where access to this functionality is limited, obscure, or completely absent. This is especially true with Pentaho, where reports may fail with no explanation beyond a long stack trace. In these cases, it is helpful to implement logging on the database side.
MySQL General Log
There is of course the ability to turn on general logging in MySQL through the --general-log and --general-log-file options. There are cases where this isn't very helpful, especially in a development environment with multiple developers and applications, where the volume of queries from applications other than the one you are working on makes this method cumbersome.
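For reference, enabling it looks like this; the log-file path is illustrative, and the runtime toggle (available since MySQL 5.1) avoids a server restart:

```shell
# Startup options (equivalently general_log / general_log_file in my.cnf)
mysqld --general-log --general-log-file=/var/log/mysql/general.log

# Or toggle at runtime without restarting the server:
mysql -u root -p -e "SET GLOBAL general_log = 'ON';
                     SET GLOBAL general_log_file = '/var/log/mysql/general.log';"
```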
mysql-proxy
mysql-proxy is a Lua-based framework for intercepting and manipulating communication between a MySQL client and server. It is capable of rewriting both queries and result sets on the fly. In this case we can use it to audit the queries from Pentaho's MySQL connection.
mysql-proxy scripts are written in Lua and are passed using --proxy-lua-script=. The backend server you are proxying to is specified by --proxy-backend-addresses=.
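So a typical invocation might look like the following; the script path and backend host are placeholders, and mysql-proxy listens on port 4040 by default:

```shell
# Proxy client connections on the default port 4040 through a Lua
# script (hypothetical name) to the real MySQL server behind it.
mysql-proxy \
  --proxy-lua-script=/path/to/log-queries.lua \
  --proxy-backend-addresses=dbhost:3306
```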
The implementation simply involves creating a new JNDI data source through the Pentaho administration console and setting the host to the server running the proxy (mysql-proxy listens on port 4040 by default). The credentials passed to mysql-proxy are forwarded to the backend server. Once the JNDI is set up, implementing the proxy is only a matter of changing the data source for the report being debugged to the new JNDI.