Victus Spiritus

home

Each Time I Get My Hands Dirty, I Learn

08 Feb 2010

As long as I'm unwilling to face an obstacle, my mind invents clever schemes to prevent me from having to deal with undesirable tasks. That results in me treating the obstacle as a spooky animal. It also means I will miss out on any advantages of understanding the taboo subject.

My biggest hurdle over the past dozen years has been learning a new programming language. After getting comfortable with C++ (and C) I couldn't imagine starting all over. Learning how to translate thoughts into data structures, objects, and methods took me a good two years to understand, and five to get very comfortable with. But last year that barrier finally got worn away by a potent force: my enthusiasm to build a highly capable web information tool.

At that point I began to familiarize myself with PHP and other common web programming languages. I had never consciously written code to access a web API before writing a script to access Twitter late last summer. The documentation was rich, but I kept looking for something like an include so I could dig into source code. That's how I used to figure out interfaces, often from my own source code written years earlier. But APIs only reveal the interface; they are the black boxes of the web world.
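That first Twitter script looked something like the sketch below. This is a minimal reconstruction, not the original code: it targets the 2010-era `search.twitter.com` endpoint (long since retired, so the actual remote hit is left commented out), and the helper names are my own. The point is the black-box shape of it: you build a URL per the documented interface, and all you ever see back is JSON.

```ruby
require 'net/http'
require 'uri'
require 'json'

# Build a query URL for the (2010-era, now retired) Twitter search API.
# Only the documented interface is visible -- the service is a black box.
def twitter_search_uri(query, results_per_page = 10)
  q = URI.encode_www_form_component(query)
  URI.parse("http://search.twitter.com/search.json?q=#{q}&rpp=#{results_per_page}")
end

# Pull the tweet texts out of a search response body.
def tweet_texts(json_body)
  JSON.parse(json_body)['results'].map { |tweet| tweet['text'] }
end

uri = twitter_search_uri('ruby on rails')
# response = Net::HTTP.get_response(uri)   # the actual remote hit
# A canned response body, standing in for what the API would return:
sample = '{"results":[{"text":"Learning Rails one API at a time"}]}'
puts tweet_texts(sample).first
```

There's no source to step into here; when the response surprises you, the web docs are the only "include" you get.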

In the fall of last year I toyed with Scala on Lift, and Python with Django. Setting up Scala in the NetBeans and Eclipse IDEs was nontrivial for a first timer. To clarify, getting the right compilers/JRE/SDK and mixing that setup with Google App Engine took some time to straighten out (poor Benjamin Golub was kind enough to put up with my ignorance around the time the FriendFeed API v2 was released).

My reliance on IDEs is another hurdle, but I far prefer rich IDEs for code navigation over editors and makefiles (I started coding on SGI Unix; vi and emacs are not the ideal coding tools for me). I was clueless about how to tweak the sample code frameworks beyond the obvious logical hints in the code structure. I judge a language by how much sense it makes to me when looking at sample code. If I have to pore over documentation to decipher dense, arcane syntax, it's a bad sign, although I'm becoming acclimated to hitting web docs for help.

Last November Tyler kicked me into learning the basics of Ruby through Rails, our current framework for the IMM (Intelligent Media Manager). As a long-time engineer who primarily did numerical/algorithm development in C++, I was a little shocked at how slow run times for interpreted languages can be. Why would anyone choose to develop in a language that is an order of magnitude or two slower? Some simple facts I was missing: for most web requests, network and database latency dwarf interpreter overhead, and developer time costs far more than CPU time.

In contrast to the lack of need for speed in some web development languages, there are fundamental interfaces that we all want to be as responsive as possible. For these (our database and public API) we'll want things to be hypertuned. In fact, if we had to, I'm confident Tyler and I could write the entire site/service over again in a natively faster language now that we've gone through it once *ducks the rock Tyler will throw at me for this comment*.
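The tradeoff can be seen with Ruby's own Benchmark module. This is a toy illustration, not a measurement from our app: it pits some pure-Ruby number crunching against a `sleep` standing in for a database query or remote API hit, and for a typical web request the wait on I/O is what the user actually feels.

```ruby
require 'benchmark'

# CPU-bound work: pure-Ruby arithmetic the interpreter has to grind through.
cpu = Benchmark.realtime do
  (1..200_000).inject(0) { |sum, n| sum + n * n }
end

# I/O-bound work: a sleep standing in for a database query or API hit.
io = Benchmark.realtime do
  sleep 0.05
end

printf("cpu-bound: %.4fs, io-bound: %.4fs\n", cpu, io)
```

When the I/O column dominates, a slower interpreter costs little; it's the hot paths like the database and public API where native speed earns its keep.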

Ruby on Rails is a great tool for learning, developing, and understanding the process, data, and user flow. There's still a great deal I can learn about the language (they didn't choose dynamic typing on a whim). While working with Rails I have even learned a little JavaScript to let users make remote hits to the Twitter search API (we use jQuery as a support lib). This was the last step for the rudimentary Twitter client we now have, Social Gravity. Our tools excel at identifying the important topics of status updates (metadata) and put us in a position to invent better search and interaction tools.
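To make "topics of status updates (metadata)" concrete, here's a toy stand-in of my own devising (the real pipeline leans on external semantic APIs, covered below): it just scans a status update for hashtags and @mentions, the crudest form of topic metadata.

```ruby
# Toy metadata extraction from a status update: hashtags and @mentions.
# The production pipeline uses semantic APIs; this only sketches the idea
# of turning raw status text into structured topic metadata.
def status_metadata(text)
  {
    :hashtags => text.scan(/#(\w+)/).flatten,
    :mentions => text.scan(/@(\w+)/).flatten
  }
end

meta = status_metadata('Reading about #rails and #jquery with @tyler')
puts meta[:hashtags].inspect
puts meta[:mentions].inspect
```

Real entity extraction goes far beyond pattern matching, which is exactly why the semantic APIs below are worth paying for.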

Unlimited API hits aren't free

The past few days I've been digging into OpenCalais' API. It's another great semantic tool we can leverage. The big plus for Calais is its limit of 100k hits per day. While debugging our list processing, I've already gone beyond the Zemanta API limit of 10k hits per day. Orchestr8's AlchemyAPI is our go-to semantic API, and it gives us 30k hits per day to work with. If necessary we can explore hitting DBpedia directly to identify entities from text ourselves.

Relying on external tools could be more expensive than our limited revenue can practically support. It all depends on how many affiliate sales we can get per API hit. Zemanta's pricing is $1200/month for 50k hits per day at the time of writing this post, so Calais may be our most viable option. Better still, in-house semantic lookups could be the best long-term solution: lower software and maintenance costs than per-hit API fees, and in theory we could run a massive local database to minimize response time.
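The back-of-the-envelope math on those figures is worth spelling out. Using the Zemanta numbers quoted above ($1200/month for 50k hits/day) and assuming a 30-day month:

```ruby
# Cost per API hit, from a monthly price and a daily hit allowance.
# Assumes a 30-day month; figures are the ones quoted in the post.
def cost_per_hit(monthly_price, hits_per_day, days = 30)
  monthly_price.to_f / (hits_per_day * days)
end

zemanta = cost_per_hit(1200, 50_000)
puts "Zemanta: $%.4f per hit" % zemanta
```

At full utilization that works out to $0.0008 per hit, which is the number every affiliate sale has to beat before an external semantic API pays for itself.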