So I must confess, I took the easy way out and installed the Condor resource scheduler via rpm. It seems the folks who put Condor together at the University of Wisconsin-Madison have in recent years decided to create easy-to-install packages for Linux users, including .deb files for Debian and .rpm files for Red Hat systems. That’s a huge time saver of course, especially if you want to get up and running quickly, or if you are provisioning multiple Condor machines and don’t necessarily want to use a shared filesystem for the installation. Very nice.
So, I downloaded the 64-bit RHEL 5 rpms from the downloads page and installed them on two virtual machines. One became the Condor “master”, running all the daemons: COLLECTOR, MASTER, NEGOTIATOR, SCHEDD and STARTD. The other VM became a submitter and executor only, running just the MASTER, SCHEDD, and STARTD daemons. After having quickly built a Condor pool this way, I listed the files in the rpms to see what else was included, and to my delight, the package builders were kind enough to include the DRMAA libraries as well. For those of you who aren't familiar with DRMAA, it's an API for talking to distributed resource schedulers such as Condor, Sun Grid Engine (now Oracle Grid Engine), LSF and Torque. This was cool because I could easily test whether my code would run unchanged and submit jobs to my nascent Condor pool.
And… It worked. However, I noticed that I was getting a lot of debugging-related output on my terminal as I ran my code. Tracing backwards, I quickly concluded that the DEBUG messages had to be coming from libdrmaa.so, the compiled shared object that came with the Condor rpm. Although the DEBUG messages are nice, I don't want to see them in a production environment. Here's how to get rid of them.
The Condor rpm includes the tarball containing the drmaa source code:
$ rpm -q --filesbypkg condor | grep drmaa
condor    /usr/include/condor/drmaa.h
condor    /usr/lib64/condor/libcondordrmaa.a
condor    /usr/lib64/condor/libdrmaa.so
condor    /usr/src/drmaa/drmaa-1.6.tar.gz
I copied the drmaa .tar.gz file to a working directory and unpacked it. After running ‘configure’ and ‘make’, one should see a newly created header file called config.h. This is the file that gets included by auxDrmaa.h when the compiler is called with the -DHAVE_CONFIG_H option. The relevant lines in auxDrmaa.h are:
#ifdef HAVE_CONFIG_H
#include <config.h>
#endif
So, I simply edited out the “#define” in config.h that was setting DEBUG. Once I ran “make” again, I had a new version of the libdrmaa.so file in my current directory. The next step was to overwrite the .so that was bundled with the RPM, so I copied my new custom libdrmaa.so to /usr/lib64/condor and created a symbolic link (/usr/lib64/libdrmaa.so.1.0) to point to it. <<Abra Capocus, Hocus Cadabra>>, DEBUG messages gone…
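If you have more than a couple of machines to fix, the config.h edit and the copy-and-symlink step are easy enough to script. What follows is only a rough sketch, not something shipped with Condor: the function names disable_debug_define and install_drmaa_lib are made up for illustration, and the sed pattern assumes the define sits on its own line as "#define DEBUG ..." (check your generated config.h first). It replaces that line with an autoconf-style "/* #undef DEBUG */" and leaves similarly named defines alone.

```shell
#!/bin/sh
# Sketch only; both function names are hypothetical helpers.

# Replace a "#define DEBUG ..." line in config.h with "/* #undef DEBUG */".
# Assumption: the define is on its own line. Defines such as DEBUG_LEVEL
# are left untouched because DEBUG must be followed by whitespace or EOL.
disable_debug_define() {
    config_file="$1"
    # Keep a backup of the generated header before rewriting it.
    cp "$config_file" "$config_file.orig" &&
    sed 's|^[[:space:]]*#[[:space:]]*define[[:space:]][[:space:]]*DEBUG\([[:space:]].*\)\{0,1\}$|/* #undef DEBUG */|' \
        "$config_file.orig" > "$config_file"
}

# Copy the rebuilt library into place and (re)create the symlink to it.
# Arguments: built .so, destination directory, symlink path.
install_drmaa_lib() {
    built_lib="$1"
    dest_dir="$2"
    link_path="$3"
    cp "$built_lib" "$dest_dir/libdrmaa.so" &&
    ln -sf "$dest_dir/libdrmaa.so" "$link_path"
}

# Possible usage from the unpacked source directory (install step as root):
#   ./configure
#   disable_debug_define config.h
#   make
#   install_drmaa_lib ./libdrmaa.so /usr/lib64/condor /usr/lib64/libdrmaa.so.1.0
```

If “make” does not notice the changed config.h and rebuild the library, a “make clean” beforehand should force it; I have not checked how the drmaa Makefile tracks that dependency.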
A much better solution would be to alter the .spec file used to create the RPM such that DEBUG messages are suppressed by default and so that the /usr/lib64 symlink is created. This would obviate having to make these changes manually on each machine that needs to use the DRMAA library. That’ll be for another post though. Perhaps by then the Condor developers will put out updated RPMs where these issues are addressed.