Tuesday, December 27, 2011

Goodbye Blogspot

This serves as a reminder to whoever might actually read this blog (as well as myself) that this will be the last post here. Blogspot kind of stinks, and I am amazed I have lasted this long. Future posts will be found at Neo-sanskrit on tumblr.

Wednesday, November 23, 2011

Rails Benchmarking Reloaded

While evaluating caching strategies in Rails 3.1, I found the existing articles comparing Rails cache store backends to be lacking and/or outdated. The most recent article I could find compares file_store to mem_cache_store. Given that mem_cache_store is being replaced by Dalli, the existing benchmarks no longer reflect the options available today.

  • File Store
  • Memcached Store
  • Dalli
  • Mongo Store
  • Redis Store
Test
require 'benchmark'

task :benchmark => :environment do
  stores = {
    :file_store      => [Rails.root.join("tmp", "cache")],
    :mem_cache_store => ["localhost"],
    :dalli_store     => ["localhost"],
    :redis_store     => [],
    :mongo_store     => [],
  }
  actions = {
    :hit => lambda {
      Rails.cache.fetch("test")
    },
    :miss => lambda {
      Rails.cache.read("test" + rand.to_s[0..10])
    }
  }
  stores.each do |store, args|
    ActionController::Base.cache_store = store, *args
    puts "Cache Store: #{store}"
    Benchmark.bm(15) do |x|
      actions.each do |label, proc|
        puts "Action: #{label}"
        Rails.cache.delete("test")
        Rails.cache.fetch("test") { [Time.now, 1.year.ago] }
        x.report("times:") do
          20000.times do
            proc.call
          end
        end
      end
    end
  end
end

Results


Though it looks like mongo-store demonstrates the best overall performance, it should be noted that a mongo server is unlikely to be used solely for caching (the same applies to redis); non-caching queries will likely be running concurrently on a mongo/redis server, which could affect the applicability of these benchmarks.
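For reference, switching between these backends in a real app is a one-line change in the environment config. A minimal sketch, assuming the dalli gem is in the Gemfile (it provides the :dalli_store symbol) and memcached is listening on the default port:

```ruby
# config/environments/production.rb (Rails 3.1)
# :dalli_store comes from the dalli gem; add `gem 'dalli'` to the Gemfile.
# The same assignment accepts :file_store, :redis_store, etc. with their args.
config.cache_store = :dalli_store, 'localhost:11211'
```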

Thursday, September 29, 2011

Another Metadata Framework

The quest for consistent metadata storage


I sought to create a web application to access my mp3s and photos remotely. I wanted a new way to store information: every file should be its own record. The database should be the files themselves, and I should just need to maintain a directory.


I needed a consistent metadata framework:

  • ID3 was archaic and pretty much specific to mp3s
  • Exif was just for photos

Enter XMP

I found XMP, which could tag both mp4 files and photos and had been in development by Adobe since 2005. It uses XML and is capable of storing any type of information in any file.

Code was available in C++/Java, and I immediately undertook the task of writing a native extension using Rice. I ran into environment issues and explored how I might implement the specification manually.

I checked out the specification and o_O, I could smell the stank of corporate governance: IPTC schemas and namespaces everywhere. The specification had rigid expectations of using a specific schema and was full of all the nastiness that comes when XML is owned and defined by large corporate bodies.

I took the best parts of the idea

  • Add metadata to any file
  • Implement a special marker to identify an XMP segment



And added my own ideas
  • Use BSON (10gen's Binary JSON format)
  • Provide support for JSON schemas and namespaces


Of course there is no existing specification for schemas and namespaces, so the tests' namespaces refer to a google group where namespace implementation is being discussed.
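A rough sketch of the core idea in plain Ruby: metadata lives in an ordinary hash with "namespace:key" style keys and arbitrarily nested values. (The real library serializes with BSON via the bson gem; JSON from the standard library is used here only to illustrate that the structure round-trips through a byte string that could be appended to a file behind a special marker.)

```ruby
require 'json'

# Metadata as a plain hash with "namespace:key" style keys,
# mirroring the Metahash examples below.
metadata = {
  "id3:artist" => "Dave Meowtthews",
  "genres"     => ["rap", "hip-hop technofunk"],
  "album"      => { "title" => "Dark side of the moon", "year" => 2003 }
}

# Serialize the whole hash to a byte string, then read it back.
payload  = JSON.generate(metadata)
restored = JSON.parse(payload)
puts restored["id3:artist"]
```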

Metahash

require 'metahash'

# Writing to a file
mh = Metahash::Metahash.new "path/to/beiber.mp3"
mh["id3:artist"] = "Dave Meowtthews"

# Reading the metadata
mh = Metahash::Metahash.new "path/to/beiber.mp3"
puts mh.to_h

# It acts just like a hash
puts mh["id3:artist"]

# Unlike exif/id3 it can handle more complex structures
mh["genres"] = ["rap", "hip-hop technofunk"]
mh["album"] = {
  :title => "Dark side of the moon",
  :year  => 2003
}
mh["references"] = {
  :soundcloud => "http://www.soundcloud.com/kultiv8tor"
}

# Unlike XMP, it's not made by a corporate monster and full of XML nightmare-fuel
# (No way to demonstrate this in code)

Check it out on GitHub

Monday, August 29, 2011

Thoughts on version-control and agriculture

Could we use version control methods and techniques to create a "physical strain repository" for creating a distributed workflow for genetic selection which could be licensed under open-source and protected from large corporate machines like Monsanto via GNU or similar licenses?

While watching Food, Inc. the other night, I felt sorry for the soybean farmers who were dominated and regulated by Monsanto's patents. Monsanto produces genetically-modified seeds with extremely favorable characteristics and holds patents on its strains. Much has been written on the "evils of patenting food and seeds." I couldn't help but think about how Monsanto's reign over the seed industry resembles Microsoft's domination of the software industry in the 90s (though that monopoly's fall had more to do with the fact that its software was so ubiquitous it could not reap the benefits of competition). I feel that the rise of open-source software in the early millennium had a large part to play in cultivating a revolution against the corporate machine. Open-source software's ability to flourish is due in large part to the internet and its ability to dissolve geographic boundaries. The selection of seeds from generation to generation, by contrast, has been a locally-based operation for millennia. It is not readily possible for a farmer in Georgia to view the strains of farmers in Missouri; there is no coordination that would let civilization organize mass selection in an effective manner.


Could we imagine a world where there exists a physical repository with a protocol for checking in, checking out, and forking strains of seed? A GitHub for organizing mass artificial selection? Could we standardize a method of describing quantifiable measurements of seed quality and strain strength and index all forks and repositories? Couldn't we even mirror the actual evolution of a genome with source control? Aren't you, in fact, a fork?

Saturday, June 04, 2011

Take a look at that gem!

Attempting to upload an image through refinerycms yielded a stack trace returned to the user. In this case, refinery's images_controller is surfacing an error from dragonfly.



When we try to upload an image in refinery, we get



dragonfly (0.8.5) lib/dragonfly/analysis/file_command_analyser.rb:16:in `popen'
dragonfly (0.8.5) lib/dragonfly/analysis/file_command_analyser.rb:16:in `mime_type'
dragonfly (0.8.5) lib/dragonfly/function_manager.rb:37:in `call'
dragonfly (0.8.5) lib/dragonfly/function_manager.rb:37:in `call_last'
dragonfly (0.8.5) lib/dragonfly/function_manager.rb:36:in `catch'
dragonfly (0.8.5) lib/dragonfly/function_manager.rb:36:in `call_last'
dragonfly (0.8.5) lib/dragonfly/function_manager.rb:35:in `each'
dragonfly (0.8.5) lib/dragonfly/function_manager.rb:35:in `call_last'
dragonfly (0.8.5) lib/dragonfly/analyser.rb:26:in `analyse'
dragonfly (0.8.5) lib/dragonfly/job.rb:200:in `analyse'
(eval):3:in `mime_type'
dragonfly (0.8.5) lib/dragonfly/active_model_extensions/attachment.rb:152:in `send'
dragonfly (0.8.5) lib/dragonfly/active_model_extensions/attachment.rb:152:in `set_magic_attributes'
dragonfly (0.8.5) lib/dragonfly/active_model_extensions/attachment.rb:152:in `each'
dragonfly (0.8.5) lib/dragonfly/active_model_extensions/attachment.rb:152:in `set_magic_attributes'
dragonfly (0.8.5) lib/dragonfly/active_model_extensions/attachment.rb:33:in `assign'
dragonfly (0.8.5) lib/dragonfly/active_model_extensions/class_methods.rb:22:in `image='
activerecord (3.0.7) lib/active_record/base.rb:1559:in `send'
activerecord (3.0.7) lib/active_record/base.rb:1559:in `attributes='
activerecord (3.0.7) lib/active_record/base.rb:1555:in `each'
activerecord (3.0.7) lib/active_record/base.rb:1555:in `attributes='
activerecord (3.0.7) lib/active_record/base.rb:1407:in `initialize'
activerecord (3.0.7) lib/active_record/base.rb:497:in `new'
activerecord (3.0.7) lib/active_record/base.rb:497:in `create'
refinerycms-images (0.9.9.21) app/controllers/admin/images_controller.rb:48:in `create'


Let's check out the top file



module Dragonfly
  module Analysis
    class FileCommandAnalyser
      include Configurable
      configurable_attr :file_command, "file"
      configurable_attr :use_filesystem, false
      configurable_attr :num_bytes_to_check, 255

      def mime_type(temp_object)
        content_type = if use_filesystem
          `#{file_command} -b --mime '#{temp_object.path}'`
        else
          IO.popen("#{file_command} -b --mime -", 'r+') do |io|
            if num_bytes_to_check
              io.write temp_object.data[0, num_bytes_to_check]
            else
              io.write temp_object.data
            end
            io.close_write
            io.read
          end
        end.split(';').first
        content_type.strip if content_type
      end
    end
  end
end

So it appears that the error has to do with the IO.popen call. Since we wouldn't need that call if use_filesystem were true, and since the configurable_attr :use_filesystem line suggests there is a configuration directive for this setting somewhere, we should try to find it.
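To see what that popen branch does in isolation, here is a standalone sketch of the same technique (no Dragonfly required; it assumes the standard Unix file command is on the PATH, and detect_mime is a hypothetical helper name, not Dragonfly API):

```ruby
# Pipe raw bytes to `file -b --mime -` and read back the detected type,
# mirroring FileCommandAnalyser's non-filesystem branch.
def detect_mime(data, num_bytes_to_check = 255)
  output = IO.popen("file -b --mime -", "r+") do |io|
    io.write data[0, num_bytes_to_check]  # only the first chunk is needed
    io.close_write                        # signal EOF so `file` can run
    io.read
  end
  output.split(";").first.strip           # drop the "; charset=..." suffix
end

puts detect_mime("hello world")
```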



So we go down the stack trace to the last known point the execution was in another gem. It turns out to be images_controller in the refinerycms gem.



Knowing the name of the controller, I tried some bash-fu and was pleasantly surprised when it worked!



root@tara:~/tmp/1008583# locate images_controller.rb
/usr/lib/ruby/gems/1.8/gems/refinerycms-images-0.9.9.21/app/controllers/admin/images_controller.rb
root@tara:~/tmp/1008583# vim `locate images_controller.rb`


Nonetheless, there did not appear to be any configuration in that file. I went to the refinery gem's root directory and ran a "grep -R Dragonfly ." to flush out any config files. I noticed lib/refinerycms-images.rb.



# grep -R Dragonfly .
./lib/refinerycms-images.rb: app_images = Dragonfly[:images]
./lib/refinerycms-images.rb: app_images.analyser.register(Dragonfly::Analysis::ImageMagickAnalyser)
./lib/refinerycms-images.rb: app_images.analyser.register(Dragonfly::Analysis::FileCommandAnalyser)
./lib/refinerycms-images.rb: app.config.middleware.insert_after 'Rack::Lock', 'Dragonfly::Middleware', :images, '/system/images'
./lib/refinerycms-images.rb: app.config.middleware.insert_before 'Dragonfly::Middleware', 'Rack::Cache', {
./config/routes.rb: match '/system/images/*dragonfly', :to => Dragonfly[:images]
./app/controllers/admin/images_controller.rb: rescue Dragonfly::FunctionManager::UnableToHandle


We check out the file and see the Dragonfly app initialization at line 22. We google around the Dragonfly docs looking for where exactly the "use_filesystem" configuration directive must be set. Our search lands us on the docs for Dragonfly::Analysis::FileCommandAnalyser.


An example for the config is referenced which includes the directive we are looking for.


app.analyser.register(Dragonfly::Analysis::FileCommandAnalyser) do |a|
  a.use_filesystem = false                # defaults to true
  a.file_command = '/opt/local/bin/file'  # defaults to 'file'
  a.num_bytes_to_check = 1024             # defaults to 255 - only applies if not using the filesystem
end

We then modify the source of lib/refinerycms-images.rb to include the modifications to the analyser config.



git diff refinerycms-images.rb
diff --git a/refinerycms-images.rb b/refinerycms-images.rb
index 3fef0f7..e3d3caa 100644
--- a/refinerycms-images.rb
+++ b/refinerycms-images.rb
@@ -31,7 +31,9 @@ module Refinery
app_images.define_macro(ActiveRecord::Base, :image_accessor)
app_images.analyser.register(Dragonfly::Analysis::ImageMagickAnalyser)
- app_images.analyser.register(Dragonfly::Analysis::FileCommandAnalyser)
+ app_images.analyser.register(Dragonfly::Analysis::FileCommandAnalyser) do |a|
+ a.use_filesystem = true
+ end
# This url_suffix makes it so that dragonfly urls work in traditional
# situations where the filename and extension are required,


We attempt the image upload again and the upload succeeds. Now, how do I get involved in the refinerycms repo to discuss the changes with the leads? Something like this? https://github.com/resolve/refinerycms/pull/738



Right?

Wednesday, May 25, 2011

MySQL Proxy

Robust application frameworks include the ability to log all database activity. There are cases, however, where access to this functionality is limited, obscure, or completely absent. This is especially true with Pentaho, where reports often fail with no explanation beyond a long stack trace. In these cases, it is helpful to implement logging on the database side.



MySQL General Log


There is of course the ability to turn on general logging in mysql through the --general-log and --general-log-file options. There are cases where this isn't very helpful, especially in a development environment with multiple developers and applications, where the volume of queries from applications other than the one you are working on makes this method cumbersome.



mysql-proxy


mysql-proxy is a lua-based framework for intercepting and manipulating communication between a mysql client and server. It is capable of rewriting both queries and result sets on the fly. In this case we can use it to audit the queries from pentaho over its mysql connection.


Mysql-proxy scripts are written in lua and are passed using --proxy-lua-script=. The server you are proxying to is specified by --proxy-backend-addresses=. The implementation of the log itself is rather simple. This implementation prints to STDOUT.


-- mysql_proxy_log.lua
-- mysql-proxy --proxy-lua-script=/home/tdevol/tmp/mysql_proxy_log.lua --proxy-backend-addresses=192.168.1.236:3306
--
function read_query(packet)
  if string.byte(packet) == proxy.COM_QUERY then
    print("Intercepted Query : " .. string.sub(packet, 2))
  end
end


The implementation simply involves creating a new JNDI data source through the Pentaho administration console and setting the host to the server running the proxy (the port for mysql-proxy defaults to 4040). The credentials passed to mysql-proxy are passed on to the backend server. Once the JNDI is set up, enabling the proxy is only a matter of changing the data source for the report being debugged to the new JNDI.

Wednesday, November 24, 2010

Best of both worlds. Modifying source/configure options for rpms

RPMs are great. They let you keep track of what's installed, track dependencies, and manage the removal of packages. In 99% of cases the RPM works great. In the other 1% you may run across a bug in the package where the widely accepted solution is to remove or add a compile flag to fix the issue.




  1. Stop any services using the rpm

  2. Uninstall the rpm

  3. Begin installing the rpm but do not confirm.
    The output before the confirmation prompt will tell you which repo the package comes from.

  4. Read the baseurl from the appropriate repo config file in /etc/yum.repos.d

  5. Create a tmp/working directory

  6. Go to that baseurl and use wget to get the somepkg.src.rpm package

  7. rpm -i /path/to/src.rpm

  8. cd /usr/src/redhat/

  9. To edit config flags, modify /usr/src/redhat/SPECS/somepkg.spec

  10. Rebuild the rpm ( rpmbuild -bb /usr/src/redhat/SPECS/somepkg.spec) (You may need to install some *devel packages)

  11. The RPM will be located in /usr/src/redhat/RPMS/{arch} .. where {arch} is the architecture for your machine (usually i386)