Archive for the ‘tech’ Category

If you’ve been following development trends over the past couple of years, you have surely heard of Node.js by now. Node.js has been getting all sorts of attention lately, some of it positive and some of it, shall we say… critical. I think the reason many people seem to love Node.js is that it offers a way for backend/server-side developers and front-end developers to potentially work in the same language: the easy-to-learn, ubiquitous and well-understood JavaScript.

One of the first things a developer wishes to do when trying Node.js for the first time is to see how easy it might be to create a web server. In fact, many tutorials, documentation pages and blog posts provide instructions on how to code a simple “Hello World” web server to demonstrate just how easy it is to create a fast web server with Node.js. Building on this, one quickly arrives at a very popular library for creating web applications: Express.js. I installed it with the fantastic npm tool, a package manager for Node.js (which was recently bundled with the Node.js installation itself). After reading the documentation for a bit, I was quickly able to set up some routes so my web server could respond to various different types of requests. Express also allows for all sorts of configurability. It integrates with HTML templating tools such as Jade and EJS, supports custom “middleware” that can modify or inspect the HTTP request object, provides logging, and offers all sorts of other powerful features. I really encourage you to have a look at Express.js. It’s wonderful. Really.
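
For the curious, the canonical Express “Hello World” is only a few lines. A minimal sketch in the Express 3 style (the port number is arbitrary):

     var express = require('express');
     var app = express();

     // Respond to GET / with a friendly greeting.
     app.get('/', function(req, res) {
         res.send('Hello World');
     });

     app.listen(3000);   // visit http://localhost:3000/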

So, fast forward a few weeks: I began working on a RESTful API and decided to go with Node.js and Express.js to power the implementation. With Express’s route handling, nothing could be simpler than mapping URLs to the functions that implement the API. Since the REST service was going to deal with JSON exclusively, I needed a way to easily grab any JSON documents that would be PUT or POSTed to the various URLs I desired. Express has a mechanism for extracting that sort of content from HTTP requests: a middleware called bodyParser. With bodyParser in use, the request object gets equipped with a new property called rawBody, which is perfect for obtaining POSTed JSON with minimal hassle.
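
Before the upgrade, the pattern looked roughly like this (a sketch in Express 2.x style; the /documents route is invented for illustration):

     var express = require('express');
     var app = express.createServer();   // Express 2.x style

     app.use(express.bodyParser());      // formerly also populated req.rawBody

     // '/documents' is a hypothetical endpoint accepting a JSON body.
     app.post('/documents', function(req, res) {
         var doc = JSON.parse(req.rawBody);   // raw JSON text of the request
         res.send(doc);                       // echo it back as JSON
     });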

Unfortunately, in a recent upgrade of Express.js, the rawBody property in the request object was silently removed. I think it’s telling that something as important as this was removed so casually and with very little documentation. Personally, I think it should have waited for a major release or, perhaps, been much better publicized as a “heads up” to the user community. This is the sort of thing I think folks getting serious with Node are going to have to learn to deal with for a while until the project stabilizes in the years ahead.

Needless to say, the loss of rawBody had a catastrophic effect on the implementation I was working on, and I quickly reverted to the earlier version of the libraries so that rawBody was restored. I resolved to look into it further at a later date.
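
In npm terms, reverting simply means pinning an exact, known-good version in package.json. A sketch, where the version number is a placeholder rather than a recommendation:

     {
       "dependencies": {
         "express": "2.5.8"
       }
     }

Running npm install will then fetch exactly that version instead of the latest release.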

Here’s how to add rawBody back if you need it like I did: write your own custom Express.js middleware.

     // For Express 3 (won't work with Express 2.x)
     var express = require('express');
     var app = express();

     // Buffer the incoming request body and expose it as req.rawBody.
     app.use(function(req, res, next) {
         var data = '';
         req.setEncoding('utf8');
         req.on('data', function(chunk) {
             data += chunk;       // accumulate the body as it streams in
         });
         req.on('end', function() {
             req.rawBody = data;  // entire body received
             next();              // pass control to the next middleware/route
         });
     });

     // Now your routes from here down
     app.get('/something/:id', something_handler.get_something);

What’s happening is that we have injected a custom middleware function that sets the encoding to UTF-8, then sets up callbacks that accumulate the request body (our JSON) into the “data” variable. When the request is finished, the contents of “data” are complete and req.rawBody is defined before next() is called, which passes control to any subsequent middleware functions.

The critical thing here is that this code must be added BEFORE your routes are set up. If you register the middleware after the routes, matching requests will be answered by the route handlers before the middleware ever runs, and req.rawBody will never be set.
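
With the middleware in place, a route handler can consume req.rawBody directly. A sketch continuing the example above (the URL and handler logic are invented for illustration):

     app.put('/api/items/:id', function(req, res) {
         var doc;
         try {
             doc = JSON.parse(req.rawBody);   // restored by our middleware
         } catch (e) {
             return res.send(400);            // malformed JSON
         }
         // ... persist doc somewhere ...
         res.send(200);
     });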


So I must confess, I took the easy way out and installed the Condor resource scheduler via RPM. It seems the folks who put Condor together at the University of Wisconsin-Madison have in recent years decided to create easy-to-install packages for Linux users, including .deb files for Debian and .rpm files for Red Hat systems. That’s a huge time saver of course, especially if you want to get up and running quickly, or if you are provisioning multiple Condor machines and don’t necessarily want to use a shared filesystem for the installation. Very nice.

So, I downloaded the 64-bit RHEL 5 RPMs from the downloads page and installed them on two virtual machines. One became the Condor “master”, running all the daemons: COLLECTOR, MASTER, NEGOTIATOR, SCHEDD and STARTD. The other VM became a submitter and executor only, running just the MASTER, SCHEDD and STARTD daemons. After having quickly built a Condor pool this way, I listed the files in the RPMs to see what else was included, and to my delight, the package builders were kind enough to include the DRMAA libraries as well. For those of you who aren't familiar with DRMAA, it's an API for talking to distributed resource schedulers such as Condor, Sun Grid Engine (now Oracle Grid Engine), and others such as LSF and Torque. This was cool because I could easily test whether my code would run unchanged and submit jobs to my nascent Condor pool.
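
That role split is controlled by each machine's DAEMON_LIST setting. A sketch of what the two configurations might contain (typically in condor_config.local):

# Central manager: runs every daemon
DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD

# Submit-and-execute node
DAEMON_LIST = MASTER, SCHEDD, STARTD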

And… It worked. However, I noticed a lot of debugging-related output on my terminal as I ran my code. Tracing backwards, I quickly concluded that the DEBUG messages had to be coming from libdrmaa.so, the compiled shared object bundled with the Condor RPM. Although the DEBUG messages are nice, I don't want to see them in a production environment. Here's how to get rid of them.

The Condor RPM includes the tarball containing the drmaa source code:

$ rpm -q --filesbypkg condor | grep drmaa
condor                    /usr/include/condor/drmaa.h
condor                    /usr/lib64/condor/libcondordrmaa.a
condor                    /usr/lib64/condor/libdrmaa.so
condor                    /usr/src/drmaa/drmaa-1.6.tar.gz

I copied the drmaa .tar.gz file to a working directory and unpacked it. After running ‘configure’ and ‘make’, one should see a newly created header file called config.h. This is the file that gets included by auxDrmaa.h when the compiler is called with the -DHAVE_CONFIG_H option. The relevant lines in auxDrmaa.h are:

#ifdef HAVE_CONFIG_H
        #include <config.h>
#endif

So, I simply edited out the “#define” in the config.h file that was setting DEBUG. Once I ran “make”, I had a new version of the libdrmaa.so file in my current directory. The next step was to overwrite the .so that was bundled with the RPM, so I copied my new custom libdrmaa.so to /usr/lib64/condor and created a symbolic link (/usr/lib64/libdrmaa.so.1.0) to point to it. <<Abra Capocus, Hocus Cadabra>>, DEBUG messages gone…
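
Condensed into shell form, the whole dance looks roughly like this (the build directory is arbitrary, and the exact name of the DEBUG define in config.h may differ):

$ cp /usr/src/drmaa/drmaa-1.6.tar.gz /tmp && cd /tmp
$ tar xzf drmaa-1.6.tar.gz && cd drmaa-1.6
$ ./configure && make
$ $EDITOR config.h    # comment out the #define that enables DEBUG
$ make                # rebuild libdrmaa.so without the DEBUG chatter
$ sudo cp libdrmaa.so /usr/lib64/condor/libdrmaa.so
$ sudo ln -sf /usr/lib64/condor/libdrmaa.so /usr/lib64/libdrmaa.so.1.0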

A much better solution would be to alter the .spec file used to create the RPM such that DEBUG messages are suppressed by default and so that the /usr/lib64 symlink is created. This would obviate having to make these changes manually on each machine that needs to use the DRMAA library. That’ll be for another post though. Perhaps by then the Condor developers will put out updated RPMs where these issues are addressed.

So let’s say you want to restrict a bunch of data or pages on your website to users who have agreed to what is commonly referred to as a “Terms of Service” agreement, or ToS. These agreements are typically filled with lots of legalese set in a very small font size. Very few people read such agreements in their entirety; however, they are an important requirement in many scenarios of information dissemination on the internet. What are some scenarios where this could be important?

  1. Software distribution: You might want to get a user’s agreement that he or she will not redistribute an application your company or organization has created without written approval.
  2. Media and content distribution: Perhaps you publish music, video, online games or articles and blogs online and wish to protect that content with a ToS page.
  3. Forums and chat: You may wish to obtain consent from your users that they will not engage in abusive behavior in your online community.

The possibilities are basically endless and the aforementioned examples are only the most obvious scenarios. However, to make this work (and in many jurisdictions to make it legally enforceable) one should not be able to simply bypass the ToS page and link directly to the URL of the data that is being protected. A naive web administrator might throw up a ToS page as an intermediate navigation stop before arriving at the download area containing links to the data. However, if anyone has a bookmark, or enters the direct URL, or publishes the direct link to the data online, then the data can be easily retrieved without ever agreeing to your organization’s terms. Good luck enforcing your ToS in court if necessary…

For this post, I’m going to assume that you are using the Apache web server. In order to prevent direct linking and access to your content, we somehow have to instruct Apache not to serve it until some condition is satisfied. There are many ways of doing this, of course. One can write a custom Apache module or handler for the task in a language such as C, or Perl if you are using mod_perl. However, we can also use a very useful module called mod_rewrite to help us out. A nice advantage here is that mod_rewrite is included and enabled by default in many Apache installations. We are going to configure mod_rewrite to check for the existence of a special cookie that our ToS page will set. If you have the cookie (with the correct value in it), Apache will let you have the data. If not, Apache will redirect you to the ToS page. Sounds easy, right? Using cookies for this task has some additional benefits:

  1. We can set the cookie to expire whenever we like. Let’s say, 90 days. That way the user doesn’t need to accept the ToS every single time they want access to the data, just every 90 days… One can also make the cookie permanent if that behavior is desired.
  2. All browsers of any consequence support cookies, even the text-based browsers and download utilities such as wget, lynx and w3m.

Let us assume that the files we wish to protect all have the same extension: .tgz. In the Unix world, .tgz, or .tar.gz files are very common and are simply compressed archives of data, conceptually similar to .zip for you MS Windows users out there. This is a trivially easy example because we can define a mod_rewrite configuration on the web server as follows:

RewriteEngine on
RewriteCond %{REQUEST_URI} \.tgz$
RewriteCond %{HTTP_COOKIE} !tos=accepted
RewriteRule ^(.*) /terms-of-service.html [R,L]

RewriteLog /var/log/httpd/rewrite.log
RewriteLogLevel 9

The first line simply activates the rewrite engine. The second and third lines apply conditions as to when the RewriteRule on line 4 goes into effect; both conditions need to be satisfied. The first condition, on line 2, is that the resource requested must end in .tgz: the backslash makes the “.” match a literal dot, and the trailing $ anchors the match to the end of the requested URI. Consult the documentation on regular expressions for the details.

The second condition states that we must NOT have a cookie called “tos” with the value “accepted” in it. More on how a user acquires the cookie later.

So, with this configuration, if the server receives a request for a .tgz file and the client does not present the required cookie, the RewriteRule is triggered. Basically, it redirects us to the terms-of-service.html page. The [R,L] flags specify the type of redirect used (an external redirect, 302 by default) and ensure that this is the last rule applied. The page redirected to doesn’t have to be a .html file of course. It could just as easily be a .php or .jsp file…
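
A quick way to verify the behavior is with curl (the hostname and file name here are placeholders):

$ curl -I http://www.example.com/data/sample.tgz
HTTP/1.1 302 Found
Location: http://www.example.com/terms-of-service.html

$ curl -I -H "Cookie: tos=accepted" http://www.example.com/data/sample.tgz
HTTP/1.1 200 OK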

You may ask, what if the user disables cookies or clears them every time the browser exits? Well, in the former case, the user will not be able to retrieve the data. It’s important that such limitations be discussed openly and up-front. This is an unfortunate consequence of relying on cookies to accomplish such a task. However, cookies have become such an integral part of modern web usage that their use hardly constitutes a radical departure from the norm. One can view a user with cookies disabled as a special case of a user who wishes to implicitly decline your ToS… One can add a note on the ToS page itself describing how cookies must be enabled for the authorization mechanism to work. In the case of a user that routinely or automatically flushes cookies away, that user will simply need to agree to the ToS each time they wish to access or download content.

Giving the user the Cookie
Once the server has been configured to deny (redirect) requests that arrive without the cookie, it’s time to consider just how we will grant the cookie to a user who accepts the Terms of Use/Service. Fortunately, this is a task that can be done with JavaScript. If you’re a fan of jQuery like I am, it can be done even more elegantly with the jQuery cookie plugin, found here:


http://plugins.jquery.com/project/cookie

Our page will have two buttons, one to accept and one to deny the ToS. The markup should look something like so:

<form action="#" method="get">
    <input id="accept" type="button" value="Accept" />
    <input id="decline" type="button" value="Decline" />
</form>

The exact formatting of the form and inputs may depend on the doctype you’ve selected for your site and pages… Once this is inserted into the appropriate place beneath the text of your conditions, it’s time to wire the buttons up with a bit of JavaScript. We’ll create a new JavaScript file called tos.js:

$(document).ready(function() {
  $("#decline").click(function () {
    // Decline: send the user back to a configurable page.
    document.location.href = "/index.html";
  });
  $("#accept").click(function () {
    // Accept: set the cookie the server checks for, expiring in 90 days.
    $.cookie("tos", "accepted", { path: '/', domain: ".example.com", expires: 90 });
    document.location.href = "/downloads.html";
  });
});

This code registers handler functions for when each button is clicked. In the terms-of-service.html file you would simply have to include the jQuery file, the jQuery cookie plugin, and the tos.js file, like so:

<head>
  <script type="text/javascript" src="/js/jquery-1.4.1.min.js"></script>
  <script type="text/javascript" src="/js/jquery.cookie.js"></script>
  <script type="text/javascript" src="/js/download_tos.js"></script>
</head>

Summary
To recap, what we have achieved is a page where, if the user hits the “Accept” button, he or she is issued a cookie that the web server honors by allowing the download of the protected content, the .tgz files. If the user presses the “Decline” button, he or she is forwarded to a configurable alternate page, in this case the index.html page. One really can’t get around this system by linking or navigating directly to the content. As for protecting your content from being republished by users even though they agreed to the ToS, well, that’s tough… You might want to ask the RIAA about that. One caveat: a skilled person could examine the JavaScript and craft the cookie manually, sending it along with the request without ever having agreed to the ToS.
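
For example, something as simple as this (hypothetical URL again) sails right past the rewrite rules:

$ wget --header "Cookie: tos=accepted" http://www.example.com/data/sample.tgz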