cstackpole edited this page May 19, 2013 · 16 revisions

Torque Resource Manager

Torque is a resource manager: it watches over the nodes and controls which jobs run on which compute nodes, GPUs, and other resources in the batch queue. Adaptive Computing is the main developer of this open source product, and the download for Torque can be found here.

I usually prefer to stick to the releases; however, at this time there is a very annoying bug that is only fixed in the 4.2.2 GitHub source. These instructions cover building from GitHub first; once 4.2.2 is released, the guide also covers building from the release.

Pre-build environment

sudo yum groupinstall "development tools"
sudo yum install git imake libtool libxml2-devel openssl-devel
sudo yum install pinentry-gui   # only needed if you have a GUI
mkdir ~/Code

Signing RPMS

Sign your RPMs so that the servers can verify that packages from your repository are trustworthy. To sign RPMs, you need to create a GPG key as the dev user authorized to do so. Never compile as root; bad things are bound to happen.

gpg --gen-key

  • Select default type of key: 1
  • Set the key size: 4096
  • Key is valid for the life of the cluster: 4y
  • Is this correct?: y
  • Real Name: Users Fullname
  • Email Address: [email protected]
  • Comment: Not Needed
  • Is this OK?: O
  • It will ask for a new password to be created.
  • This next part may take time as it generates the keys.
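
The prompts above can also be answered from a parameter file, which is handy when the key has to be recreated on another build host. The following is a minimal sketch, assuming GnuPG's unattended key generation (gpg --batch --gen-key FILE). The function name gpg_batch_params and the sample name and address are hypothetical, and unlike menu option 1 this creates a single RSA key with no separate subkey.

```shell
# Hypothetical helper: print a GnuPG unattended key-generation parameter file.
# Arguments: real name, email address (both supplied by the caller).
gpg_batch_params() {
    real_name=$1
    email=$2
    cat <<EOF
Key-Type: RSA
Key-Length: 4096
Name-Real: $real_name
Name-Email: $email
Expire-Date: 4y
%commit
EOF
}
```

Redirecting the output to a file (for example gpg_batch_params "Dev User" dev@example.com > key.params) and passing that file to gpg --batch --gen-key should generate the key. Depending on the GnuPG version it may still ask for a passphrase.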

Now find your gpg fingerprint
gpg --fingerprint

The value needed is the key ID on the pub line: take the second field after "pub" (for example 4096R/ABCD1234, where ABCD1234 is a made-up ID) and keep the part after the slash. This is YOUR_FINGERPRINT_ID in the next step.
gpg --fingerprint | awk '/^pub/{print $2}' | awk -F'/' '{print $2}'
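
To avoid copying the ID by hand, the pipeline above can be captured in a shell variable. This is a minimal sketch, assuming the "pub   4096R/KEYID date" line layout printed by the GnuPG versions shipped with EL6 (newer GnuPG releases print a different layout, so check the output first). The function name extract_key_id and the sample ID ABCD1234 are made up for illustration.

```shell
# Hypothetical helper: read `gpg --fingerprint` output on stdin and print
# the key ID of the first pub line (the part after the slash in field 2).
extract_key_id() {
    awk '/^pub/ { split($2, parts, "/"); print parts[2]; exit }'
}

# Demonstration on a made-up pub line; real use would pipe gpg into it.
sample='pub   4096R/ABCD1234 2013-05-19 [expires: 2017-05-18]'
printf '%s\n' "$sample" | extract_key_id
```

With a real keyring, the variable used in the following steps could be set with YOUR_FINGERPRINT_ID=$(gpg --fingerprint | extract_key_id).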

Export the public key that belongs to this ID.
gpg -a -o ~/RPM-GPG-KEY-your_user_name_or_project_name --export $YOUR_FINGERPRINT_ID

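Add the key ID to your RPM macro file. Use double quotes so that the shell expands the variable, or type the ID in place of $YOUR_FINGERPRINT_ID.
echo "%_gpg_name $YOUR_FINGERPRINT_ID" >> ~/.rpmmacros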
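
Because the command above appends, running it twice leaves two %_gpg_name lines in the macro file. A minimal sketch of a rerunnable variant follows. The function name set_gpg_name is hypothetical, and it assumes the macro sits on a line of its own.

```shell
# Hypothetical helper: set %_gpg_name in an rpm macro file, replacing any
# earlier %_gpg_name line instead of appending a duplicate.
# Arguments: path to the macro file, key ID.
set_gpg_name() {
    macros_file=$1
    key_id=$2
    touch "$macros_file"
    # Keep every line except an existing %_gpg_name entry.
    # grep exits non-zero when nothing is left, which is fine here.
    grep -v '^%_gpg_name' "$macros_file" > "$macros_file.tmp" || true
    # %% prints a literal percent sign.
    printf '%%_gpg_name %s\n' "$key_id" >> "$macros_file.tmp"
    mv "$macros_file.tmp" "$macros_file"
}
```

For the setup in this guide the call would be set_gpg_name ~/.rpmmacros "$YOUR_FINGERPRINT_ID".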

Copy the key to the HTTP webserver that was created in the earlier CreateRepo step.
scp ~/RPM-GPG-KEY-your_user_name_or_project_name [email protected]:/var/www/html/ProjectRepo/.

Building Torque from Github source.

cd ~/Code
git clone https://github.com/adaptivecomputing/torque.git
cd torque
./autogen.sh
./configure
make rpm
or
make srpm

If you want to build a signed RPM, make an SRPM, then rebuild and sign it.
rpmbuild --rebuild --sign ~/rpmbuild/SRPMS/torque-4.2.3-1.adaptive.el6.src.rpm
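
The version in the SRPM file name changes with the checkout (4.2.3 in the example above), so it can help to look the file up instead of typing it. This is a minimal sketch, assuming the SRPM lands in ~/rpmbuild/SRPMS as in the command above. The function name latest_torque_srpm is hypothetical.

```shell
# Hypothetical helper: print the most recently modified torque SRPM in a
# directory (default: ~/rpmbuild/SRPMS). Prints nothing if none exist.
latest_torque_srpm() {
    srpm_dir=${1:-"$HOME/rpmbuild/SRPMS"}
    ls -t "$srpm_dir"/torque-*.src.rpm 2>/dev/null | head -n 1
}
```

The rebuild step could then be written as rpmbuild --rebuild --sign "$(latest_torque_srpm)".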

Building Torque from release source.

Find the latest release of Torque. The link is given at the top of this section. At this time the latest version is 4.2.2.

To build the rpm run:
rpmbuild -tb torque-4.2.2.tar.gz

To build the signed rpm run:
rpmbuild -tb --sign torque-4.2.2.tar.gz

To build the SRPM, run:
rpmbuild -ts torque-4.2.2.tar.gz

Copy RPMS to the webserver

On the webserver http.cluster.domain, copy the rpms and build the repo.
cd /var/www/html/Cluster_Repo
scp -r [email protected]:rpmbuild/RPMS/x86_64/*.rpm .
createrepo .

Install Torque on the frontend

Now that it is in the repository, install Torque server and devel packages. See the CreateRepo section for details on the yum repo file.

Download the cluster.repo file into /etc/yum.repos.d/, where yum looks for repository definitions.
sudo wget -P /etc/yum.repos.d/ http://oracle.stack.linux/repo_files/cluster.repo

Verify that the repo file works and torque is coming from your repository.
sudo yum clean all && sudo yum update && sudo yum info torque
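
Checking the repository by eye is easy to get wrong when several repos carry a torque package. A rough sketch that pulls the Repo field out of yum info output follows. It assumes the "Field : value" layout that yum prints, and the function name yum_info_repo and the repo id cluster in the sample are illustrative only.

```shell
# Hypothetical helper: read `yum info` output on stdin and print the value
# of the first "Repo" field.
yum_info_repo() {
    awk -F' *: *' '$1 == "Repo" { print $2; exit }'
}

# Demonstration on made-up output; real use: sudo yum info torque | yum_info_repo
printf 'Name        : torque\nVersion     : 4.2.2\nRepo        : cluster\n' | yum_info_repo
```

Once the package is installed, yum reports the Repo field as installed, so this check is most useful before the install step.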

Install the torque packages. Verify that the packages are installing from your repository!
sudo yum install torque-devel torque-server torque-scheduler

Configure Torque

Create the checkpoint directory.
$ sudo mkdir /var/spool/torque/checkpoint
Edit the /var/spool/torque/server_priv/nodes file to tell Torque which nodes have resources available. For now, configure only the number of processors on each node.
$ sudo vim /var/spool/torque/server_priv/nodes

node01 np=2
node02 np=2
node03 np=2
node04 np=2
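
For more than a handful of nodes, the file is tedious to type. This is a minimal sketch that prints entries in the same "nodeNN np=N" form, assuming sequential node names and the same processor count on every node. The function name gen_nodes is hypothetical.

```shell
# Hypothetical helper: print Torque nodes-file entries for sequentially
# numbered nodes. Arguments: number of nodes, processors per node.
gen_nodes() {
    count=$1
    np=$2
    i=1
    while [ "$i" -le "$count" ]; do
        # Zero-pad the node number to two digits: node01, node02, ...
        printf 'node%02d np=%s\n' "$i" "$np"
        i=$((i + 1))
    done
}
```

Running gen_nodes 4 2 | sudo tee /var/spool/torque/server_priv/nodes would reproduce the four-node file shown above.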

Set the server hostname in the file /var/spool/torque/server_name.
$ sudo sh -c 'echo frontend01.cluster.domain > /var/spool/torque/server_name'
Restart the Torque service.
$ sudo service pbs_server restart
Verify that the Torque service is running.
$ qmgr -c 'p s'
