How to Make Simple Node.js Modules Work in the Browser

Node.js is all about writing small, simple modules that do one thing, and do it well. This can be taken to extremes by crazy people. There’s even a module for multi-line strings!

Some people can’t resist writing frameworks either. This post is not for them.

Actually, it is. I wrote the first version of Seneca, my micro-services framework, suffering from Java withdrawal. It’s too big and needs to be refactored into small modules. That’s what I’m doing now. As it happens some of these modules seem like they could be useful on the front-end, so I decided to see if I could make them browser compatible.

Now, the right way to do this is to use browserify. But where there’s a right way, there’s always a wrong way too. This is it. Sometimes, you just need things to be standalone. It’s not every coder that’s lucky enough to use Node.js, you know.

Instead of reinventing the wheel, I went straight to my number one module crush, underscore, the utility belt of JavaScript. I’m so much in love with this module I automatically bang out npm install underscore at the start of a new project. It feels like part of the language. Underscore started out as just a browser library, and then had to support Node.js. What you have here is exactly what you need:

  • Main file has to work in the browser, and be a Node module;
  • Tests have to be run in the browser, and for Node;
  • You need to provide a minified version;
  • Good choices (one assumes!) for third party tools to do all this.

Go take a look at the github repo. You’re going to copy the approach. This post just documents it so you don’t have to figure it out for yourself.

Let’s start with the final product. Here are three small modules, each of which can run in the browser or Node. They can be installed with npm, or bower.

  • jsonic – A JSON parser for Node.js that isn’t strict
  • gex – Glob Expressions for JavaScript
  • patrun – A fast pattern matcher on JavaScript object properties.

You can use any of these as a template. They all follow the same approach. Here’s what to do.

Browserize/Nodize your Code

You may be starting from either a Node module, or a browser library. In both cases, you need to wrap your code up into a single file. Assuming you’re calling your module mymodule, here’s the start of that file, which you’ll call mymodule.js:

"use strict";

(function() {
  var root = this
  var previous_mymodule = root.mymodule

Let’s dissect the code. You should always “use strict”;. No excuses.

Oh, as an aside, I’m an inveterate avoider of semi-colons. Why can’t we all just, …, get along?

The (function() { isolates your code from anything else in the browser, and has no effect on the Node module. You’re going to call this anonymous function with the current context at the end of the module, which in the browser makes this equal to the window object:

}).call(this);

OK, back to the top. The next thing you’re doing is storing the context (this) in a variable named root, and keeping a reference to any previous variable with the same name as your module. This lets you provide the noConflict feature, like jQuery.

Your module is either going to be a JavaScript object or function (not much point otherwise!). Define it like so, inside the isolation function:

var mymodule = function() {
  ...
}

mymodule.noConflict = function() {
  root.mymodule = previous_mymodule
  return mymodule
}

Now users of your module, in the browser, can call noConflict to avoid conflicts over the mymodule name:

var othername = mymodule.noConflict()
// the variable mymodule is back to its old value from here

How do you export your module into the browser? Or for Node.js? By checking for the existence of module.exports, and acting accordingly. Again, this goes inside your isolation function.

  if( typeof exports !== 'undefined' ) {
    if( typeof module !== 'undefined' && module.exports ) {
      exports = module.exports = mymodule
    }
    exports.mymodule = mymodule
  } 
  else {
    root.mymodule = mymodule
  }

If you’re in a Node.js context, you’ll end up exporting in the normal way: module.exports = mymodule. If you’re in the browser, you’ll end up setting the mymodule property on the window object.

There’s one final niggle. What if you depend on other modules, such as, say, underscore? Place this near the top of your isolation function:

  var has_require = typeof require !== 'undefined'

  var _ = root._

  if( typeof _ === 'undefined' ) {
    if( has_require ) {
      _ = require('underscore')
    }
    else throw new Error('mymodule requires underscore, see http://underscorejs.org');
  }

This code checks for the existence of the require function, and uses require if it exists. Otherwise it assumes you’ve loaded the dependency via a script tag (or a build process), and complains with an exception if the dependency can’t be found.

That’s pretty much all you need on the code front. Check out the source code of the jsonic, gex, and patrun modules on github for real-world examples.

Also, don’t forget to do

npm init

if your project needs a package.json file for Node.

Test Your Code

The jasmine framework is the one to use here. It works nicely both for Node and the browser. Create a test folder and write a mymodule.spec.js file containing your tests:

if( typeof mymodule === 'undefined' ) {
  var mymodule = require('..')
}

describe('mymodule', function(){

  it('something must be done', function(){
    expect( mymodule() ).toBe( 'doing something' )
  })

})

The test file should check to see if the module itself needs to be required in. This is for the Node case, where the module package.json is in the parent folder. In the browser case, you’ll load the module using a script tag before you load this test file, so it’s already defined.

It’s a good idea to get your project built automatically by the Travis-CI service. Follow the Travis-CI instructions to get this working. You’ll need to ensure that Travis can run the jasmine tests, so you’ll need to do

$ npm install jasmine-node --save-dev

This makes sure that jasmine gets installed locally inside the project’s node_modules folder. Which means that in your package.json, you can say:

  "scripts": {
    "test": "./node_modules/.bin/jasmine-node ./test"
  }

and then the command line

npm test

will run your jasmine tests for you, and they will be run by Travis-CI for each commit, which is what you really want.

This setup only tests your code in a Node.js context. Now you need to test in a browser context too. Actually, you need two types of browser test. One you can automate, and one you can run manually in browsers on your machine. The automated test comes courtesy of phantomjs, which is a “headless” browser environment (there’s no visible GUI). This runs your tests in a WebKit browser engine, so it’s pretty much like testing in Chrome or Safari (to a first approximation).

Install phantomjs globally. You don’t need a local copy, as you don’t need it for Travis-CI, and you’ll probably use it for other projects too:

$ sudo npm install phantomjs@1.9.1-0 -g

(At the time of writing, Sep 2013, the latest version of phantomjs does not install due to a bug, so I’m explicitly specifying a known-good version.)

To actually run the tests, you’ll need some supporting infrastructure. This consists of the jasmine library code, a “runner” script, and a HTML file to load it all. Your best bet is to copy the files from one of my example repos (see above). The files are:

  • test/jasmine.html
  • test/run-jasmine.js
  • test/jasmine-1.3.1/*

In jasmine.html, you load up everything you need, and execute the tests using the jasmine API. It’s all boilerplate; just replace mymodule anywhere it occurs with the name of your module. If you have additional dependencies, add them under the script tag that loads underscore (this assumes you’ve done an npm install underscore --save already):

<!DOCTYPE html>
<html>
<head>
  <title>mymodule jasmine test</title>

  <link href="jasmine-1.3.1/jasmine.css" rel="stylesheet">

  <script src="../node_modules/underscore/underscore.js"></script>

  <script src="../mymodule.js"></script>

  <script src="jasmine-1.3.1/jasmine.js"></script>
  <script src="jasmine-1.3.1/jasmine-html.js"></script>

  <script src="mymodule.spec.js"></script>

  <script type="text/javascript">
    (function() {
      var jasmineEnv = jasmine.getEnv();

      ... jasmine boilerplate ... 

      function execJasmine() {
        jasmineEnv.execute();
      }

    })();
  </script>
</head>
<body></body>
</html>

Now you add a custom script to your package.json:

  "scripts": {
    "test": "./node_modules/.bin/jasmine-node ./test",
    "browser": "phantomjs test/run-jasmine.js test/jasmine.html"
  }

And run it with:

$ npm run-script browser

And your tests should execute as before.

To test in a real browser, just load the test/jasmine.html file directly. This will execute the tests immediately and display some nicely formatted results.

Publish your Module

Time to inflict your code on the world! Muhaha, etc.

First, you’ll want to minify it for use in browsers. Use uglifyjs2 for this. Note: use version 2! Again, no need to install this locally, as it’s a development tool you’ll want to use for other projects:

sudo npm install uglify-js -g

And then in your package.json:

  "scripts": {
    ...
    "build": "uglifyjs mymodule.js -c \"evaluate=false\" -m --source-map mymodule-min.map -o mymodule-min.js"
  },

This generates mymodule-min.js, and also mymodule-min.map, a source map, which makes debugging in the browser easier.

Now you’re ready. Make sure you’ve filled in all the details in package.json, and, in particular, have chosen a version number, say 0.1.0 to start. Then publish!

$ npm publish

If you’re not yet registered with npm, you need to do so first.

Next, you should also publish your module onto bower.io, which is like npm for browser libraries. Bower uses your github repository tags to generate versions.

So, first, generate a tag using the same version number (unless you are evil):

$ git tag 0.1.0
$ git push --tags

Make sure you’ve committed everything before you do this!

And then register with bower:

$ bower register mymodule git://github.com/your-account/mymodule.git

And now you can do bower install mymodule, as well as npm install mymodule.

If you have improvements to this approach, let me know in the comments!




Posted in Node.js | 10 Comments

Introducing NodeZoo.com, a search engine for Node.js modules

I made a bet on a new programming platform 3 years ago, and it paid off. Every line of code that has earned me money since then has been run by Node.js. In case you missed it, Node.js is the evil step-child of Netscape and Gmail that is going to take over software development for the next decade. Starting now.

Brendan Eich invented the JavaScript programming language in 10 days during a death-march in 1995 while working at Netscape. It’s both utterly broken and brilliantly productive. Google invented Gmail, and then realized it was too slow. So they wrote a fast web browser, Chrome. Along the way, they wrote an entirely new JavaScript engine, called V8, that is damn fast. Now, V8 can be taken out of Chrome, and used by itself. Throw in some scalable event-driven architecture, and you get the most badass way to build web stuff going. That’s Node.js. Oh, and it does flying robots too…

The best thing about Node.js is the module system, npm. This is separate, and emerged slightly later than Node itself. It comprehensively solves the “versioning problem”. When you break software into modules, you end up with many versions of these modules. As time goes by, different modules depend on different versions of other modules, and it gets messy.

Think of it like this, Rachel invites Joey to a party in her apartment, and Joey in turn invites Phoebe. Rachel also invites Chandler, who in turn invites Ursula, Phoebe’s evil twin sister, who is pretending to be Phoebe. The party ends up with incompatible guests, and fails. Here’s what NPM does. The party is split between two apartments. One apartment gets Joey and Phoebe, the other Chandler and Ursula. Rachel hangs out in the hallway and is none the wiser. Happy days.

When you publish a Node.js module to npm, you specify the other modules you depend on, and the versions that you can work with. npm ensures that all modules you install only see compatible dependencies. No more breakage.
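For example, a module’s package.json might declare its dependencies like this (the module names and version ranges here are purely illustrative):

```json
{
  "name": "toaster",
  "version": "0.1.0",
  "dependencies": {
    "underscore": "~1.5.0",
    "connect": "2.x"
  }
}
```

Each range says which versions the module can work with, and npm installs a compatible copy for each dependent, nested inside its own node_modules folder.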

There’s another great thing about the Node.js module system. It emerged organically from the nature of the JavaScript language and the founding personalities (npm was started by @izs). Node.js modules are small. Really small.  Which is great, because that makes Node.js anti-fragile. There are no dependencies on standards, no need for curation, no need for foundations that bless certain modules, no blockage when module authors dig their heels in. The system tolerates bad coders, module abandonment, personality implosions, and breaking changes. With 23,000 plus modules at the time of writing, you’ll always find what you need. You might have to fix a few things, tweak a few things, but that’s better than being completely stuck.

This wonderful anarchy does introduce a new problem. How do you find the right module? There’s a chicken and egg period when you’re new to Node – it seems like there are ten options for almost anything you want to do.

The question is how to solve this problem in a scalable way – not everybody can go to all the conferences or hang out on IRC – although you really should if you can. “Ask the community” doesn’t scale, and the latency is pretty bad too. Also, if your goal is pick one module, the über-helpfulness of the Node community sort of works against that, as you’ll get more than one recommendation! The npm site npmjs.org delegates the search question to Google. The results are less useful than you’d think. Google’s algorithms don’t give us what we need, and the search results, in terms of scannability, are pretty lame. The npm command line has a free text search function. It’s nice, but the results are pre-Google internet quality, and for the same reasons – free text search doesn’t do great finding relevant results. Then there’s the Node Toolbox, which is like a 90’s Yahoo for Node. There’s a human limit to curation and the amount of modules that can be categorized. Ontology building is, frankly, Sisyphusarbeit.

This situation is itchy. Just annoying enough to make you write some code to solve it. Towards the end of last year I randomly ended up reading that wonderful article “The Anatomy of a Large-Scale Hypertextual Web Search Engine” – written back when Larry and Sergey still had Stanford email addresses. The thing that hits you is how simple the idea of PageRank is: if popular web pages point to your web page, your web page must be really good! And the math is just lovely and so … simple! It should have been obvious to everyone.

In a gross misapplication of the underlying mathematical model (random web surfing), the same idea can be applied to Node.js modules to generate a NodeRank – a measure of how awesome your module is. How does it work? Modules depend on other modules. If lots of modules depend on a particular module, then that module must be pretty popular. A good example is express, a web framework. But that’s not enough! The algorithm asks you to look further. The modules that express itself depends on are more popular still. Case in point, connect, an HTTP server framework. The connect module needs to get some NodeRank juice from the express module. That’s what the algorithm does: your module is awesome if it’s used by other awesome modules!
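As a toy sketch of the idea (this is an illustration, not the actual nodezoo code), a damped PageRank-style iteration over a tiny dependency graph might look like this:

```javascript
// deps maps each module name to the modules it depends on.
// Rank flows from dependent to dependency: if your module is used
// by highly ranked modules, your module's rank goes up.
function noderank(deps, iterations, damping) {
  var names = Object.keys(deps)
  var rank = {}
  names.forEach(function(name) { rank[name] = 1 / names.length })

  for( var iter = 0; iter < iterations; iter++ ) {
    var next = {}
    names.forEach(function(name) { next[name] = (1 - damping) / names.length })

    names.forEach(function(name) {
      var out = deps[name]
      if( 0 < out.length ) {
        // share this module's rank among its dependencies
        out.forEach(function(dep) {
          next[dep] += damping * rank[name] / out.length
        })
      }
      else {
        // modules with no dependencies spread their rank evenly
        names.forEach(function(other) {
          next[other] += damping * rank[name] / names.length
        })
      }
    })
    rank = next
  }
  return rank
}

// myapp uses express, express uses connect,
// so connect ends up with the highest rank
var rank = noderank({
  express: ['connect'],
  connect: [],
  myapp:   ['express']
}, 20, 0.85)
```

The 0.85 damping factor is the usual PageRank choice; the real calculation runs over the whole npm registry rather than three modules.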

Implementing the algorithm is tricky. But Google to the rescue! (ironic, capital I). I found a great blog post, with python code, that explains how to calculate a fast approximation. Thanks Michael Nielsen! Of course, a little part of me was betting a Node.js port would run even faster (it did, much faster!). So I hacked up an implementation.

Now, you can pull down the entire npm registry; it’s just a CouchDB database. A bit of manipulation with Dominic Tarr’s excellent JSONStream, and out pops a NodeRank for every module.

A ranking by itself does not a search engine make. At the risk of being branded a heretic, I’m using ElasticSearch for the main search engine. Yes, it’s Java. No, it’s not Node. Hey Whadda You Gonna Do! ElasticSearch lets you add a custom score to each document – that’s where the NodeRank goes. I hacked all this together into a little website that lets you search for Node modules: nodezoo.com

You use nodezoo to search for modules in the same way as you use Google: just type in some relevant terms and something reasonable should come back. It’s not necessary for the module description or keywords to contain your search terms. The results still need refinement (big time!), but I need complaints to know where it’s going wrong – tweet me: @rjrodger.

The search results also attempt to provide some additional context for deciding which module to use. They include a link to the github project (if it exists), the stars and forks count, and a few other pieces of meta data.

The nodezoo system itself is still pretty hacky. One key piece that’s missing is updating the search index in real time as people publish modules. At the moment it’s a batch job. And it downloads the entire database each time. That’s probably not a good thing.

I’m going to do a series of blog posts on this little search engine, explaining how it works, and walking through the refactoring. The code is all on github https://github.com/rjrodger/nodezoo if you want to follow along. This is part 1. More soon!





Why I Have Given Up on Coding Standards

Every developer knows you should have one, exact, coding standard in your company. Every developer also knows you have to fight to get your rules into the company standard. Every developer secretly despairs when starting a new job, afraid of the crazy coding standard some power-mad architect has dictated.

It’s better to throw coding standards out and allow free expression. The small win you get from increased conformity does not move the needle. Coding standards are technical ass-covering. At nearForm I don’t want one, because I want everyone to think for themselves.

There’s a lot of noise out there. The resurrection of JavaScript is responsible. One “feature” in particular: optional semi-colons. Terabytes of assertion, conjecture and counter-argument are clogging up the intertubes. Please go write some code instead. You know who you are.

Well-meaning, and otherwise fabulous developers are publishing JavaScript coding standards and style guides. You are all wrong. Stop trying to save the world.

Here’s what’s happening: when you started coding you had no idea what you were doing. It was all fun and games until you lost an eye. Once you hurt yourself one too many times with sloppy code, you came to understand that you were a mere apprentice. Starting on the path to master craftsman, you soaked up Code Complete, The Pragmatic Programmer, and of course, Joel.

And then, it happened. On the road to Damascus you gained insight. Your new grab bag of tricks would make you a rock star programmer. Your productivity had already doubled (looking back, that’s hardly surprising). And now you needed to spread the word. What worked for you will save others. You cajoled, you preached, you pestered. You lectured your boss on the need for best practices and standards. And most unforgivable of all, you blogged.

Most developers don’t make noise. Those who make noise, get promoted. You got promoted. You imposed your brilliant ideas on others, certain of victory. You wrote a coding standards document, and you made it law.

And then, nothing. The same old slog, the same death marches, the same bugs, the same misery. No silver bullet.

After a few years, you stopped coding and became a manager. You still know that coding standards, rules and regulations are vital. All it requires is proper implementation. You’ve never quite got there, but you’ll keep trying. Hit ’em over the head a bit more. Code metrics! In any case, as a manager you get to delegate the pain away.

There is another road. Perhaps you went back to coding, or never left. Over time you came to realize that you know so little, and all your wonderful ideas are sand castles. You’re washed up. This is the next level of insight.

Other people are smarter than you. Not some of them. All of them. The coder writing the user interface? They are smarter than you … about the user interface. You’re not writing the code. Why don’t you trust them? No, that’s not the right question. They will still mess up. Why are you making a bigger mess by telling them what to do?

You get to the point where you understand that people are not machines. You need to push intelligence out to the edges. You need to give up control to get the best results.

So why do most intelligent coders do exactly the opposite? What makes us such ready dictators?

First, you transfer your own experiences onto others. But not everybody thinks like you. Brains are pretty weird.

Second, control feels good. It’s a comfortable hole in the sand. But you can’t tell coders what to do. Cats don’t herd.

Third, you get to duck responsibility. Everybody on the team does. We followed the rules! You failed. Yes, but we followed the rules! Well in that case, here’s another project…

Fourth, good intentions; best practices; professionalism; engineering – the seductions of process. You are chasing the same gold stars you got when you were eight years old. But how is the master craftsman judged? By results, only.

Fifth, idealism, the belief that you can understand the world and bend it to your will. Something we’re pretty good at as a species … after we fail a thousand times, and with repeatable processes. Software projects are always one of a kind.

There are worse sins than these. You only need one of them to end up with a coding standard.

The truly evil thing about coding standards is what they do to your heart, your team’s heart. They are a little message that you are not good enough. You cannot quite be trusted. Without adult supervision, you’ll mess up.

We started nearForm about a year ago, and one thing we really care about is writing great code for our clients. In earlier lives, I’ve tried all the processes and methods and rules of thumb. They all suck. None of them deliver.

Starting with the principle that our coders are really smart. That does work.

I expect everyone to write good clean code. You decide what that means. You decide if you can sleep at night with random code layouts and inconsistent variable names. But you know what, maybe they just don’t matter for a 100 line node.js mini-server that only does one thing. You decide.

It is your responsibility, because you can code.





Introducing the Parambulator module for validating “options” objects in Node.js

If you’ve used any Node.js modules at all, you’ll have noticed a common pattern with configuration. Many modules provide complex functionality that you can control by providing a JavaScript object with a bunch of named options, possibly with sub options. Here’s an example:

var toast = require('toaster').cook({ duration:5, bread:'white', sides:['A','B'] })

The duration, bread, and sides options need to be a number, a string, and an array respectively. It would be nice to validate this, and provide useful error messages if the calling code gets it wrong.

For real world examples of this pattern, check out:

In each of these modules, validation of the options is ad hoc and mixed in with other code. Sometimes you get an error message, if you’re lucky.

A real solution to this problem should be:

  • easy and quick to use, otherwise why bother,
  • declarative (i.e. no code required), that’s just good practice,
  • provide good error messages, which is pretty much the whole point,
  • customizable, in case you have weird and wonderful option structures.

So what are your options for validating the options, and providing decent error messages?

The basic approach is just to do ad hoc validation in code, and set a few defaults while you’re at it:

function handleoptions(opts) {
  if( !opts.foo ) throw new Error('No foo, bro!')
  opts.bar = opts.bar || 'default value'
  var port = parseInt(opts.port,10)
  ...
}

That’s nice. But you need to read the code to get a clear idea of what’s going on. Also, things start out neat, but eventually that validation logic ends up all over the place. And, unless you’re incredibly pedantic, you’ll miss many error conditions. Oh, and don’t forget about the bugs in the validation code itself – it happens! So easy, quick and customizable, but not declarative, and error messages are a bit take it or leave it.

A more sophisticated approach is to recognize that this problem can be generalized. So use JSONSchema. As with XML Schema, this lets you specify exactly what you want the object to conform to. But it’s hard work (you have to be pretty exact), JSON Schemas are hard to read, and it’s a lot of overhead. You do get to tick the declarative box, but it’s not easy or quick, the error messages are nasty, and customization is “here be dragons” territory.
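For the toaster options above, for instance, even a minimal JSON Schema sketch is fairly verbose (illustrative only, not a complete or draft-exact schema):

```json
{
  "type": "object",
  "properties": {
    "duration": { "type": "integer" },
    "bread":    { "type": "string", "enum": ["white", "brown"] },
    "sides":    { "type": "array" }
  },
  "required": ["duration"]
}
```

And that’s before you add descriptions, nested sub-options, or custom error messages.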

So, in the best Node.js tradition, I decided to solve this little problem with a new module: Parambulator.  Actually this module was also written with another purpose in mind: as example code for the nodejsdublin developer meetup to provide a small example module for those looking to write their own.

Parambulator lets you define a quick and easy schema for your options. You don’t have to be exhaustive. Unlike JSONSchema, you can use the same object structure as your expected input, so there are fewer mental gymnastics. And you get nice error messages by default, which you can customize. Here’s an example:

var toastcheck = parambulator({
  duration: {type$:'integer'},
  bread:    {enum$:['white','brown']},
  sides:    {type$:'array'}
})

toastcheck.validate( options, function(err) { … } )

This gives you “good enough” validation – about as much as you bother writing with ad hoc code. And you can be stricter if you like. It’s declarative, so you can see at a glance what the validation checks are. It’s still a schema, of sorts, but it’s a lot simpler than a full JSONSchema. And you can customize the validation rules and error messages (see the docs for details).

The next time you’re writing a module, and you’d like to make your options handling a little more robust, try it out and let me know. Also if you write any cool validation rules, send me a pull request!

Parambulator: https://github.com/rjrodger/parambulator

npm install parambulator

 





Node.js – How to Write a For Loop With Callbacks

Let’s say you have 10 files that you need to upload to your web server. 10 very large files. You need to write an upload script because it needs to be an automated process that happens every day.

You’ve decided you’re going to use Node.js for the script because, hey, it’s cool.

Let’s also say you have a magical upload function that can do the upload:

upload('myfile.ext', function(err){
  if( err ) {
    console.log('yeah, that upload didn\'t work: '+err)
  }
})

This upload function uses the callback pattern you’d expect from Node. You give it the name of the file you want to upload and then it goes off and does its thing. After a while, when it is finished, it calls you back, by calling your callback function. It passes in one argument, an err object. The Node convention is that the first parameter to a callback is an object that describes any errors that happened. If this object is null, then everything was OK. If not, then the object contains a description of the error. This could be a string, or a more complex object.
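If you want a stand-in to experiment with, a fake upload that just simulates the error-first callback convention will do (this is purely an invented stub, not a real upload; the 'bad.ext' failure case is made up for illustration):

```javascript
// stand-in for a real upload: waits a bit, then calls back,
// failing only for the made-up filename 'bad.ext'
function upload(filename, callback) {
  setTimeout(function() {
    if( 'bad.ext' === filename ) {
      callback( new Error('upload failed: ' + filename) )
    }
    else {
      callback( null )
    }
  }, 10)
}

upload('myfile.ext', function(err) {
  if( err ) {
    console.log('yeah, that upload did not work: ' + err)
  }
})
```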

I’ll write a post on the innards of that upload function – coming soon!

Right, now that you have your magical upload function, let’s get back to writing a for loop.

Are you a refugee from Javaland? Here’s the way you were thinking of doing it:


var filenames = [...]

try {
  for( var i = 0; i < filenames.length; i++ ) {
    upload( filenames[i], function(err) {
      if( err ) throw err
    })
  }
}
catch( err ) {
  console.log('error: '+err)
}

Here's what you think will happen:

1. upload each file in turn, one after the other
2. if there's an error, halt the entire process, and throw it to the calling code

Here's what you just did:

1. Started shoving all 10 files at your web server all at once
2. If there is an error, good luck catching it outside that for loop – it's gone to the great Event Loop in the sky

Node is asynchronous. The upload function will return before it even starts the upload. It will return back to your for loop. And your for loop will move on to the next file. And the next one. Is your website a little unresponsive? How about your net connection? Things might be a little slow when you push all those files up at the same time.

So you can't use for loops any more! What's a coder to do? Bite the bullet and recurse. It's the only way to get back to what you actually want to do. You have to wait for the callback. When it is called, only then do you move on to the next file. That means you need to call another function inside your callback. And this function needs to start uploading the next file. So you need to create a recursive function that does this.

It turns out there's a nice little recursive pattern that you can use for this particular case:

var filenames = [...]

function uploader(i) {
  if( i < filenames.length ) {
    upload( filenames[i], function(err) {
      if( err ) {
        console.log('error: '+err)
      }
      else {
        uploader(i+1)
      }
    })
  }
}
uploader(0)

Do you see the pattern?

function repeater(i) {
  if( i < length ) {
     asyncwork( function(){
       repeater( i + 1 )
     })
  }
}
repeater(0)

You can translate this back into a traditional for(var i = 0; i < length; i++) loop quite easily:

  • repeater(0) is var i = 0,
  • if( i < length ) is i < length, and
  • repeater( i + 1 ) is i++

When it comes to Node, the traditional way of doing things can mean you lose control of your code. Use recursion to get control back.
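Here’s the same pattern in a form you can run with nothing but setTimeout, showing that the items really are processed one at a time, in order (a toy sketch; the “work” just records each item):

```javascript
var items = ['a', 'b', 'c']
var processed = []

// simulated async work: record the item, then call back a little later
function asyncwork(item, callback) {
  setTimeout(function() {
    processed.push(item)
    callback()
  }, 10)
}

function repeater(i) {
  if( i < items.length ) {
    asyncwork( items[i], function() {
      repeater( i + 1 )
    })
  }
  else {
    console.log('done: ' + processed.join(','))
  }
}
repeater(0)
```

Each callback fires only when the previous item is finished, so the output is always a, b, c in order – exactly what the naive for loop fails to guarantee.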





The JavaScript Disruption

The mainstream programming language for the next ten years will be JavaScript. Once considered a toy language useful only for checking form fields on web pages, JavaScript will come to dominate software development. Why this language and why now?

What is JavaScript? It is the language that web designers use to build web pages. It is not the language the software engineers use to build the business logic for those same web sites. JavaScript is small, and runs on the client, the web browser. It’s easy to write unmaintainable spaghetti code in JavaScript. And yet, for all these flaws, JavaScript is the world’s most misunderstood language. Douglas Crockford, a senior engineer at Yahoo, is almost singlehandedly responsible for rehabilitating the language. In a few short, seminal online essays published shortly after the turn of the century, Crockford explains that JavaScript is really LISP, the language of artificial intelligence. JavaScript borrows heavily from LISP, and is not really object-oriented at all. This curious design was well suited to a simple implementation running in a web browser. As an unintended consequence, these same mutations make JavaScript the perfect language for building cloud computing services.

Here is the prediction then: within ten years, every major cloud service will be implemented in JavaScript. Even the Microsoft ones. JavaScript will be the essential item in every senior software engineer’s skill set. Not only will it be the premier language for corporate systems, JavaScript will also dominate mobile devices. Not just phones, but tablets, and whatever enters that device category. All the while, JavaScript will continue to be the one and only language for developing complex interactive websites, completely drowning out old stalwarts such as Flash, even for games. For the first time in history, a truly homogeneous programming language infrastructure will develop, with the same toolkits and libraries used from the top to the bottom of the technology stack. JavaScript everywhere.

How can such a prediction be made? How can one make it so confidently? Because it has all happened before, and it will happen again. Right now, we are at a technology inflexion point, yet another paradigm shift is upon us, and the JavaScript wave is starting to break. We have seen this before. Every ten years or so, the
programming world is shaken by a new language, and the vast majority of developers, and the corporations they work for, move en masse to the new playground. Let’s take a look at the two technology shifts that have preceded this one, so that we can better understand what is happening right now.

Prior to Java in the first decade of this century, the C++ language was dominant in the final decade of the last. What drove the adoption of C++? What drove the subsequent adoption of Java? And what is driving the current adoption of JavaScript? In each case, cultural, technological and conceptual movements coalesced into a tipping point that caused a sudden and very fast historical change. Such tipping points are difficult to predict. No such prediction is made here – the shift to JavaScript is not to come, it has already begun. These tipping points are driven by the chaotic feedback channels at the heart of any emerging technology. One need only look at the early years of the motor vehicle: steam, electric and oil-powered vehicles all competed for dominance in similar historical waves.

What drove C++? It was the emergence of the object-oriented programming paradigm, the emergence of the PC and Microsoft Windows, and support from academic institutions. With hindsight such large-scale trends are easy to identify. The same can be done for Java. In that case, the idea of the software virtual machine, the introduction of garbage collection (a language feature lacking in C++ that offers far higher programmer productivity), and the first wave of internet mania. Java, backed by Sun Microsystems, became the language of the internet, and many large corporate networked systems today run on Java. Microsoft can be included in the “Java” wave, in the sense that Microsoft’s proprietary competitive offering, C#, is really Java with the bad bits taken out.

Despite the easily recognizable nature of these two prior waves, one feature that both share is that neither wave led to a true monoculture. The C++ wave was splintered by operating systems, the Java wave by competing virtual machine languages such as C#. Nonetheless, the key drivers, the key elements of each paradigm shift, created a decade-long island of stability in the technology storm.

What is happening today? What are the key changes? Cloud computing is one. For the first time, corporations are moving their sensitive data and operations outside of the building. They are placing mission critical systems into the “cloud”. Cloud computing is now an abused term. It means everything and nothing. But one thing it does mean is that computing capacity is now metered by usage. Technology challenges are no longer solved by sinking capital into big iron servers. Instead, operating expenses dominate, driving the need for highly efficient solutions. The momentum for green energy only exacerbates this trend. Needless to say, Java/C# are not up to the job. We shall see shortly that JavaScript is uniquely placed to benefit from the move to cloud computing.

Mobile computing represents the other side of the coin. The increasing capabilities of mobile devices drive a virtuous circle of cloud-based support services leading to better devices that access more of the cloud, leading to ever more cloud services. The problem with mobile devices is severe platform fragmentation. Many different platforms, technologies and form factors vie for dominance, without a clear leader in all categories. The cost of supporting more than one or two platforms is prohibitive. And yet there is a quick and easy solution: the new HTML5 standard for websites. This standard offers a range of new features such as offline apps and video and audio capabilities that give mobile websites almost the same abilities as native device applications. As HTML5 adoption grows, more and more mobile applications will be developed using HTML5, and of course, made interactive using JavaScript, the language of websites.

While it is clear that the ubiquity of HTML5 will drive JavaScript on the client, it is less clear why JavaScript will also be driven by the emergence of cloud computing. To see this, we have to understand something of the way in which network services are built, and the challenges that the cloud brings to traditional approaches. This challenge is made concrete by what is known as the C10K problem, first posed by Dan Kegel in 1999.

The C10K problem is this: how can you service 10000 concurrent clients on one machine? The idea is that you have 10000 web browsers, or 10000 mobile phones, all asking the same single machine to provide a bank balance or process an e-commerce transaction. That’s quite a heavy load. Java solves this by using threads, which are a way to simulate parallel processing on a single physical machine. Threads have been the workhorse of high capacity web servers for the last ten years, and a technique known as “thread pooling” is considered to be industry best practice. But threads are not suitable for high capacity servers. Each thread consumes memory and processing power, and there’s only so much of that to go round. Further, threads introduce complex programming problems, including a particularly nasty one known as “deadlock”. Deadlock happens when two threads wait for each other. They are both jammed and cannot move forward, like Dr. Seuss’s South-going Zax and North-going Zax. When this happens, the client is caught in the middle and waits, forever. The website, or cloud service, is effectively down.

There is a solution to this problem – event-based programming. Unlike threads, events are light-weight constructs. Instead of assigning resources in advance, the system triggers code to execute only when there is data available. This is much more efficient. It is a different style of programming, one that has not been quite as fashionable as threads. The event-based approach is well suited to the cost structure of cloud computing – it is resource efficient, and enables one to build C10K-capable systems on cheap commodity hardware.

Threads also lead to a style of programming that is known as synchronous blocking code. For example, when a thread has to get data from a database, it hangs around (blocks) waiting for the data to be returned. If multiple database queries have to run to build a web page (to get the user’s cart, and then the product details, and finally the current special offers), then these have to happen one after the other, in other words in a synchronous fashion. You can see that this leads to a lot of threads alive at the same time in one machine, which eventually runs out of resources.

The event based model is different. In this case, the code does not wait for the database. Instead it asks to be notified when the database responds, hence it is known as non-blocking code. Multiple activities do not need to wait on each other, so the code can be asynchronous, and not one step after another (synchronous). This leads to highly efficient code that can meet the C10K challenge.
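The contrast can be sketched in a few lines of Node-style JavaScript. Here queryDb is a made-up stand-in for a real database driver, and setImmediate simulates the database delivering its answer later via the event loop:

```javascript
// queryDb is a hypothetical non-blocking database call: instead of
// waiting for the result, it asks to be notified via a callback.
var results = []

function queryDb(sql, callback) {
  setImmediate(function () {
    callback(null, 'rows for ' + sql)
  })
}

// Asynchronous style: the second query is issued only when the
// first one has responded, but neither call blocks the process.
queryDb('cart', function (err, cartRows) {
  results.push(cartRows)
  queryDb('product', function (err, productRows) {
    results.push(productRows)
  })
})

// Nothing blocked: this line runs before either query has responded.
results.push('main script done')
```

The last line runs first: the two query results arrive afterwards through the event loop, so the process is free to service other clients in the meantime.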

JavaScript is uniquely suited to event-based programming because it was designed to handle events. Originally these events were mouse clicks, but now they can be database results. There is no difference at an architectural level inside the “event loop”, the place where events are doled out. As a result of its early design choices to solve a seemingly unrelated problem, JavaScript as a language turns out to be perfectly designed for building efficient cloud services.

The one missing piece of the JavaScript puzzle is a high performance implementation. Java overcame its early sloth, and was progressively optimized by Sun. JavaScript needed a serious corporate sponsor to really get the final raw performance boost that it needed. Google has stepped up. Google needed fast JavaScript so that its services like Gmail and Google Calendar would work well and be fast for end-users. To do this, Google developed the V8 JavaScript engine, which compiles JavaScript into highly optimized machine code on the fly. Google open-sourced the V8 engine, and it was adapted by the open source community for cloud computing. The cloud computing version of V8 is known as Node.js, a high performance JavaScript environment for the server.

All the pieces are now in place. The industry momentum from cloud and mobile computing. The conceptual movement towards event-based systems, and the cultural movement towards accepting JavaScript as a serious language. All these drive towards a tipping point that has begun to accelerate: JavaScript is the language of the next wave.




Posted in Uncategorized | 72 Comments

Node.js – Dealing with submitted HTTP request data when you have to make a database call first

Node’s asynchronous events are fantastic, but they can have a sting in the tail. Here’s a solution to something that you’ll probably run into at some point.

If you have a HTTP endpoint that accepts JSON, XML, or even a streaming upload, you normally read the data in using the data and end events on the request object:

var bodyarr = []
request.on('data', function(chunk){
  bodyarr.push(chunk);
})
request.on('end', function(){
  console.log( bodyarr.join('') )
})

This works in most situations. But when you start building out your app, adding in production features like user authentication, then you run into trouble.

Let’s say you’re using connect, and you write a little middleware function to do user authentication. Don’t worry if you are not familiar with connect – it’s not essential to this example. Your authentication middleware function gets called before your data handler, to make sure that the user is allowed to make the request and send you data. If the user is logged in, all is well, and your data handler gets called. If the user is not logged in, you send back a 401 Unauthorized.

Here’s the catch: your authentication function needs to talk to the database to get the user’s details. Or load them from memcache. Or from some other external system. (Don’t tell me you’re still using sessions in this day and age!)

So here’s what happens. Node will happily start accepting inbound data on the HTTP request before you’ve had a chance to bind your handler functions to the data and end events. Your event setup code only gets called after the authentication middleware has finished its thing. This is just the way that Node’s asynchronous event loop works. In this scenario, by the time Node gets to your data handler, the data is long gone, and you’ll stall waiting for events that never come. If your response handler depends on that end event, it will never get called, and Node will never send a HTTP response. Bad.

Here’s the rule of thumb: you need to attach your handlers to the HTTP request events before you make any asynchronous calls. Then you cache the data until you’re ready to deal with it.

Luckily for you, I’ve written a little StreamBuffer object to do the dirty work. Here’s how you use it. In that authentication function, or maybe before it, attach the request events:

new StreamBuffer(request)

This adds a special streambuffer property to the request object. Once you reach your handler set up code, just attach your handlers like this:

request.streambuffer.ondata(function(chunk) {
  // your funky stuff with data
})
request.streambuffer.onend(function() {
  // all done!
})

In the meantime, you can make as many asynchronous calls as you like, and your data will be waiting for you when you get to it.

Here’s the code for the StreamBuffer itself. (Also as a Node.js StreamBuffer github gist).


function StreamBuffer(req) {
  var self = this

  var buffer = []
  var ended = false
  var ondata = null
  var onend = null

  self.ondata = function(f) {
    for(var i = 0; i < buffer.length; i++) {
      f(buffer[i])
    }
    ondata = f
  }

  self.onend = function(f) {
    onend = f
    if( ended ) {
      onend()
    }
  }

  req.on('data', function(chunk) {
    if( ondata ) {
      ondata(chunk)
    }
    else {
      buffer.push(chunk)
    }
  })

  req.on('end', function() {
    ended = true
    if( onend ) {
      onend()
    }
  })

  req.streambuffer = self
}

This originally came up when I was trying to solve the problem discussed in this question in the Node mailing list.




Posted in Node.js | Leave a comment

The Six Key Mobile App Metrics You Need to Be Tracking

Mobile applications are not web sites, and traditional web analytics are not appropriate for mobile applications. What you need is insight that will make your app more effective. You will not find this insight by tracking downloads and installs, phone platforms and versions, screen sizes, new users per day, frequency of use, or any of the traditional metrics. Many of these have been dragged over, kicking and screaming, from the world of web analytics. Yes, these numbers will give you surface measures of the effectiveness of your app. Yes, they are important to know. Yes, you can use them to make pretty charts. But they are all output measures. They measure the results of your app design, interaction model and service level. They do not tell you what to change to achieve your business goals.

To gain real insight into your app and its users, insight that you can use to make your app more effective, you need to measure inputs. This article covers six key input metrics. Funnel analysis tells you why users are failing to complete your desired user actions, such as in-app purchases, or ad clicks. Measuring social sharing tells you what aspects of your app are capturing the hearts and minds of your users. Correlating demographic data with user behaviour will tell you why your user base does what it does. Tracking time and location, together, gives you insights into the contexts in which your app is used. Mobile app design naturally tends toward deeply hierarchical interfaces – how optimised is yours? Finally, the real business opportunity may be something you never even thought of, so capturing the emergent behaviours of your user base is critical. Let’s take a closer look at each of these metrics, and then see how you can get this data with today’s services.

Funnel analysis allows you to determine the parts of your application that are preventing your users from reaching your business goals. Let’s take a simple unit converter app as an example. The canonical unit converter app lets you convert between kilograms and pounds, or inches and centimeters, and so on. Let’s say one of your business goals is to get your users to sign up to a mailing list from within the app. If you look at the user journey this requires, you might have a call-to-action button on the main screen, followed by a form to capture the email, followed by an acknowledgement page telling people to check their email accounts to verify their subscription. Funnel analysis breaks this user journey down into discrete steps: the tap on the button, typing in the email address, submitting the email address, reading the acknowledgement page. You need to know the percentage of users you are losing at each stage. Probably more than 50%. Understanding this activity funnel to your desired business goal is critical to building an effective app. Perhaps the next version should drop the call-to-action button, or use better copy text. Use funnel analysis to measure this.

Social media are a key element in the promotion of your app. When you leverage these media, you need to track the viral spread of your app. This is more than simply counting the number of tweets or facebook likes. You need to understand the structure of the social network you are attempting to permeate. You need to find the highly interconnected individuals, those whose recommendations are actively followed by their friends and acquaintances. In any social network there are always a small set of key individuals who know everybody. You need to identify these people and engage with them. This might be as simple as special promotions, or even making them employees! Your mobile analytics solution should be telling you who these people are.

Do you understand the demographic constitution of your users, and can you correlate these demographics with user behaviour? This is the classic diapers and beer effect. A major UK supermarket chain found, through mining their purchase data, that increased beer purchases were correlated with increased diaper purchases. Cross-referencing this with the demographic data they had collected via a loyalty card scheme, the supermarket chain was able to figure out that parents with new babies were staying at home having a homemade meal and a beer, rather than going out to restaurants. This allowed for far more effective targeted advertising. Demographic data are more difficult to capture in the mobile app space, but carriers such as Sprint are now beginning to offer this information.

Location is an important element of the mobile user experience, and many mobile analytics services will offer location analysis. However this is not enough. Again, simply counting the number of users in various geographies does not tell you very much. It validates a business goal, but does not give you insight. You actually need to track the temporal dimension as well. Time and space must be analyzed together. Take our unit converter app. Usage of the app on Sunday afternoons within a DIY store differs from at-home usage at mealtime during the week. In the first case you might like to show ads for power tools, in the second ads for food products. Mobile analytics offerings have yet to reach this level of capability, so you may need to consider custom solutions for this type of analysis.

Mobile application interfaces are very hierarchical in nature. This means that there are lots of screens with small amounts of information that the user has to navigate through. There simply isn’t enough screen space to show too much information at once. As a result, the careful design of the screen hierarchy is critical to effective use of the app. If a particular function, such as in-app purchases, is too deeply buried, you will not achieve your goals for the app. Therefore it is very important to measure the navigation pathways within the app. Berkeley University in California determined the layout of their campus walkways by not laying any paths at first. After the students had trampled the lawns for a year, they then built the pathways where the students had walked. This is what you need to do. (Actually, the Berkeley story is an urban legend, but it’s still a great one.)

The final metric is something that requires a certain open mindedness. It can be measured using some heavy mathematics, but it can also be noticed intuitively. When you put a product on the market, it may well be the case that your customers start using it in weird and wonderful ways that you never imagined. Hashtags (#thesethings) on twitter are a good example. Twitter did not invent them, but noticed that their users had come up with this interesting convention for marking content themes. They embraced this emergent behavior and were handed a core product feature on a plate. Of all the metrics in this article, this one, emergent behaviour, is the most precious. It could turn you into the next facebook (relationship status? What a feature!), or you could kill the golden goose without even knowing it by ignoring your users (Iridium satellite phones anyone?). Detecting emergent behavior is both an art and a science – keep your eyes open.

First published in GoMoNews Nov 2010.




Posted in Uncategorized | Leave a comment

Debug PhoneGap Mobile Apps Five Times Faster

PhoneGap is a fantastic open source project. It lets you build native mobile apps for iPhone, Android and others using only HTML, CSS and JavaScript. It’s a real pleasure to work with. It makes developing mobile apps a lot faster.

Still, you might find that your debug cycle is still too slow. After all, you still have to deploy your app to your phone for proper testing, and this can chew up precious time. The faster you can wash, rinse and repeat, the faster you can debug, and the faster you can deliver.

One way to speed things up is to use Safari on your desktop. There’s an even faster technique, but we’ll get to that in a minute. Using a WebKit-based desktop browser like Safari means that your development cycle is almost as fast as building a static website. Edit, Save, Reload. Just point Safari at the www/index.html file in your PhoneGap project and away you go.

Well almost.

Desktop browsers don’t offer exactly the same API, nor do they work in exactly the same way. Some mobile functions, like beeping or vibrating the phone, are not really testable. The biggest issue though is that desktop browsers are too fast. Don’t forget that your runtime target is a mobile version of WebKit, such as Mobile Safari. Another issue is that touch gestures are tricky to handle, and have to be simulated with click events. It is worth it though for the fast development turnaround for certain kinds of functionality.

The obvious next step is to compile up your app in Xcode and deploy to the simulator. Again, this works pretty well, but even the simulator has differences from the actual device, and again, it is just too fast. So what else can you do?

Why not install your native app as a web app? Sounds weird I know. The whole point of using PhoneGap is so that your apps can be native! But, if you install your app as a web app, guess what? No more installs! You just reload the app directly on your device every time you make a change.

Setting this up requires a little configuration. You need to run a web server to serve up the files in the www folder of the PhoneGap project. nginx is a good choice – here’s a simple configuration snippet:

[gist id=”604802″]
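In case the embedded gist doesn’t render, a server block along these lines would do the job. This is a rough sketch – the port, server name and path are assumptions, so point them at your own machine and project:

```nginx
server {
  listen 80;
  server_name localhost;

  # serve the PhoneGap project's www folder as plain static files
  location /myapp/ {
    alias /home/you/phonegap-project/www/;
  }
}
```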

You can then point your browser at http://your-ip/myapp/index.html and there’s your app! Do this using mobile Safari on your device, hit the + button and select “Add to Home Screen” to install as a web app, and away you go.

The big advantage to this approach is that you can test your app pretty much as it will appear and behave. You can even access the mobile safari debug log. Just remember to use the special meta tags to get rid of the browser chrome.

[gist id=”604807″]
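If that gist doesn’t render either, the meta tags in question are the standard iOS web-app ones, roughly:

```html
<!-- run full-screen when launched from the home screen, no Safari chrome -->
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black">
<!-- lock the viewport to the device width, no pinch zoom -->
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=no">
```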

One further advantage is that the API environment will now be slightly closer to the full PhoneGap mobile API. Of course, you won’t be able to do things that can only be done using PhoneGap, but this gets you quite far along the road.

One final trick. Do the same thing on the desktop iPhone simulator and speed up your testing there as well!




Posted in Uncategorized | 1 Comment

The Difference Between Alchemy and Chemistry

Paris. It is the 8th of May, 1794. Antoine Lavoisier, a partner in the despised Ferme générale, stands before the guillotine. As a senior partner of the Ferme générale, a tax collection agency for Louis XVI, Lavoisier is one of many wealthy aristocrats beheaded during the French revolution. Later, Joseph-Louis Lagrange, the esteemed Italian mathematician (and without whom today no satellite would make it into orbit), would write: “It took them only an instant to cut off his head, but France may not produce another such head in a century.”

Why would Lagrange care for a wealthy tax-collector? This tax-collector, the chemist Antoine Lavoisier, put to death the most embarrassing of the pseudo-sciences: alchemy. He did this not by great experiments (although he did some of those), nor by great denouncements (he left that to Robert Boyle’s The Sceptical Chymist), nor by great popularity (not many rushed to save him from the guillotine). Alchemy was ultimately defeated by the creation of a common language for naming chemical elements. We still use much of this language today when we talk of sulfates or oxides. Published in 1787, Lavoisier’s Méthode de nomenclature chimique describes an organised systematic method for naming chemical compounds, both those already known, and importantly, those yet to be discovered.

Why does this matter? And why does this matter more than Lavoisier’s other work (funded by all that tax collecting)? The establishment of a common language and a common standard for chemistry allowed this new science to separate itself from the medieval confidence trick that is alchemy. The core cultural aspects of the practice of alchemy are secrecy, obfuscation, indirection, and mysticism. To read a given alchemical text, and to then attempt to reproduce the activities (calling them experiments is too kind) described, was often impossible, even for experienced alchemists. The language of alchemy is one of multiple dialects, private jokes, and over-the-top jargon. The pinnacle of alchemical exposition is the wonderful illustrated manuscript Mutus Liber. Published in France in 1677, this book consists of nothing more than a series of mystical illustrations, presented without explanation. These illustrations describe, in considerable detail, the process whereby one can obtain Gold from Mercury. I have taken some time to divine the meaning of this great work. I alone can finally reveal the mystical secrets contained within its fifteen sublime pages. It now seems clear that it is a fairly straightforward introduction to the photoneutron process (briefly: Mercury-198 + 6.8 MeV gamma ray → 1 neutron + Mercury-197; then Mercury-197 → Gold-197 + 1 positron). Sadly, schematics for a nuclear research reactor were not included.

How powerful ultimately, was Lavoisier’s new chemical language? Powerful enough to convince Richard Kirwan, proponent of the phlogiston theory of fire (this was a magical substance released via burning), to renounce his views and accept those of Lavoisier. Of course, the science and the experiments did the grunt work. But the conversion of Kirwan has more to do with open communication and open data than lab work. Lavoisier would not have been able to read Kirwan’s 1787 Essay on Phlogiston and the Constitution of Acids, written in English, were it not for the remarkable scientific partnership that he formed with his wife, Marie-Anne Pierrette Paulze, one of the great unsung heroes of modern chemistry. Not only did Marie-Anne, proficient in Latin and English, translate Kirwan’s book, she also translated much of Lavoisier’s copious correspondence. This was possible only because she herself was a subject matter expert, working closely with Lavoisier as a co-researcher. Her documentation of their work, particularly in the form of engravings, is an important part of our shared scientific heritage.

Open communication between scientific collaborators led to open communication between scientific rivals. In 1791 Kirwan pronounced himself a convert to Lavoisier’s Oxygen theory of combustion (carefully established by weighing the reaction components before and after burning to establish that fire does not create or destroy matter, only converts it to another form). How was this possible? Lavoisier and Kirwan communicated using a common chemical language. By freeing themselves from the obtuseness of alchemy, they were able to communicate directly and openly. Kirwan could always be certain, using the Méthode de nomenclature chimique (an English translation was available as early as 1788), that he and Lavoisier were talking about the same chemicals.

The development of an open standard led to an exponential explosion of research progress in the nascent field of chemistry. This is simply another instance of Metcalfe’s law: the value of a network (of scientists) grows with the square of the number of interlinked nodes (scientists who can communicate with other scientists). There was no sort of mathematical alchemy in the 1700s – mathematics already had a common set of concepts. Joseph Priestley, discoverer of Oxygen, writing nine short years after Lavoisier’s book, shows us the power of this growth: “There have been few, if any, revolutions in science so great, so sudden, and so general, as the prevalence of what is now usually termed the new system of chemistry…”. Open standards enable open data, and both enable rapid scientific progress.




Posted in Uncategorized | 1 Comment