ComputeMode: On-demand HPC 
Cluster Manager 
Version 2.0
 
http://computemode.imag.fr/

 


Glossary

In the following definitions list, the letters '(CM)' indicate a meaning specific to the ComputeMode on-demand HPC cluster manager system.

'Always CM' schedule:
(CM) by default this schedule always boots the nodes in computation mode
batch manager:
also known as Queue Manager, Job Manager, Task Manager
CM administrator:
(CM) ComputeMode administrator - related to the root Unix account on the CM server and to the admin account on the web interface
Note: the cmwebadmin administrator may differ from the cmserver administrator
CM boot mode:
(CM) a type of boot mode whose role is to help instantiate a CM image OS (for the client nodes)
CM:
(CM) ComputeMode: on-demand HPC cluster manager
CMGM:
(CM) see CM
client node:
(CM) the workstation which will be able to boot in Computation Mode (as opposed to the CM server)
computation mode:
(CM) the mode in which the client nodes should be to successfully handle the tasks submitted to the CM server
computing mode:
(CM) see computation mode
DHCP:
Dynamic Host Configuration Protocol
grid user:
(CM) a user for whom the cmserver administrator has created a Unix account and who is allowed to submit jobs through a batch manager
job manager:
see batch manager
NFS:
Sun's Network File System
NIS:
Network Information Service (a.k.a. yellow pages)
OS:
operating system
Owner:
(CM) see user
PXE:
Preboot eXecution Environment
queue manager:
see batch manager
RAW PXE boot mode:
(CM) a kind of boot mode whose role is to chain to another static pxelinux configuration file
Such a mode can be used for instance to make a machine boot a floppy disk image over the network (FreeDOS, Symantec's Ghost, ...)
remote wake-up:
see Wake-on-LAN
RWU:
see remote wake-up
standard mode:
(CM) the mode in which the client nodes would be if there were no CM server
Task Manager:
see Batch Manager
User Mode:
(CM) see Standard Mode
User:
(CM) the owner of a given node, that is, the person who is using a workstation on a regular basis
Wake-on-LAN:
a functionality which lets a user wake up a powered-off system through the network by sending a specifically crafted packet (used in the phrase: to send Wake-on-LAN packets)
WoL:
see Wake-on-LAN
boot image:
(CM) a Linux boot image composed of a Linux kernel and a specific initrd tuned for CM - extra parameters may be used too.
boot mode:
(CM) indicates how to boot a machine
cmserver, server:
(CM) ComputeMode Manager server
cmwebadmin-data:
ComputeMode Manager web resources (images, style sheets, logos)
cmwebadmin:
(CM) ComputeMode Manager web administration interface
image OS:
(CM) this is the file hierarchy located in /cm/<OS_NAME> (for instance /cm/debian/...) and which is meant to be mounted on the clients - it may be hosted on a read-only server and requires specific tuning
label:
(CM) it is a logical tag attached to a node - its main role is to simplify the handling of large numbers of hosts by binding them to a symbolic name - labels may for instance be bound to the room number or the hardware brand
local boot mode:
(CM) a special reserved boot mode which basically tells the machine to boot on its local hard disk
node, host:
(CM) it is a client node, that is a machine which boots a CM OS - a node uses a schedule, and may hold an unspecified number of labels (from none to any number)
off boot mode:
(CM) this is a RAW PXE boot mode whose aim is to shut down the machine which has used it
processing node:
(CM) see Client Node
schedule:
(CM) it is a calendar for a week telling CM which OS to use based on the weekday and the time
special labels, system labels:
(CM) some label names have special meanings to CM - basically they are labels and can be applied and removed as any other label but they cannot be edited or renamed and they may have some special behavior
ssh:
secure shell - this may refer to either the protocol or its implementation (a popular implementation being OpenSSH)

1. ComputeMode Manager Overview

This section describes in more detail what CM can and cannot do.

1.1 Quick overview

A ComputeMode server can help you:

1.2 General overview

ComputeMode relies on a master-slave architecture built by using a central server as a master. Though centralized, some services may be distributed to other servers.

The CM server keeps track of the availability of registered PCs on the local network. Each PC owner, in accordance with your company policy, may choose to let his/her PC be used at night, or during weekends or vacations: most workstations are used interactively 50 hours a week. A PC used by CM while it is idle is said to be a client (or processing, or slave) node. The other mode is known as 'user mode'.

The two modes are:

The ComputeMode administrator can easily manage the computing PCs through a web-based interface available on the ComputeMode server.

A grid user (a standard Unix user) may submit computational jobs to the system by logging in on the server (through ssh for instance) and using a classical batch manager, or a computation scheduler (for large parametric computation campaigns for instance).

The batch manager will then take care of:

The Open Source (GPL) batch manager 'OAR' ships with CM but other products such as Platform's LSF, OpenPBS, TORQUE or Sun Grid Engine are known to work with ComputeMode.

Load balancing with an already installed job manager can also be accomplished: for instance, your dedicated cluster usually handles the workload, but during peak periods some extra CPU power would be useful.

When the ComputeMode server detects that a client node becomes unavailable, the latter returns to the previous User Mode so that the owner will not even notice that his/her PC has been used by ComputeMode.

Submitted jobs can take advantage of the NFS distributed file system made available from the ComputeMode Server. Each Cluster User has his/her own private home directory and can use it to store the data required by the computational jobs, as well as to retrieve the generated results.

If a PC owner comes back and needs his/her PC at once, he/she is still the boss: simply by pressing Ctrl+Alt+Del, the PC owner can abort any ongoing computational activity on his/her PC, and ComputeMode will restore the User Mode in about one minute. The owner will not be bothered again, as his/her PC will be 'quarantined' and will no longer be used for further computations until it is 'un-quarantined'.

1.3 Is CM (on-demand HPC cluster manager) a grid-building software suite?

Yes, it is. The difference between 'cluster' and 'grid' is somewhat fuzzy, and the meanings of these words vary between CS schools. To help you understand what CM can do for you:

If you already own and use a dedicated cluster, ComputeMode will offer extra power without extra costs: think of the costs of new machines, a system administrator, AC, ...

If you want to introduce students to distributed infrastructures, as well as cluster tools and technologies, or if you simply want to experiment with a cluster prior to buying a dedicated system, CM will help you reach those goals easily.

If you are working in a scientific laboratory, you may want to use CM to offer your researchers or colleagues an extra infrastructure to serve as a test bed for their computations.

1.4 ComputeMode is Open Source

ComputeMode is indeed a whole set of software available under Open Source licenses (most tools being available through a GNU GPL) - an acknowledgement list will be given in appendix sec:acknowledgement-oss.

It is currently based on Linux and ships a Debian GNU/Linux OS for the server. Shipping a complete distribution was chosen because a lot of configuration has to be done to let the system run smoothly. If you really want to use your own distribution, you may seek help from the CM mailing lists.

As such it is available under Open Source or Open Source-friendly licenses:

1.5 Current requirements

As CM is a distributed architecture it has some requirements on the server and the client nodes, as well as your network topology.

1.5.1 Server requirement

The server should be a fast machine dedicated to running ComputeMode. As a bare minimum, it should provide at least:

You will have to adjust these figures according to your needs since, of course, more client nodes require more power. To give an idea: a Dual Xeon 2.4GHz with 2GB RAM and a gigabit network card handles around 120 client nodes. If your application performs a lot of I/O, you will have to boost the network link and the storage device, or use a NAS system.

1.5.2 Client requirement

Processing nodes will be booted remotely when required by the ComputeMode Server. This process involves running a disk-less Linux (through a network boot) on each of these PCs. Few requirements are made for these PCs, but of course the faster the machines, the higher the computing power will be:

1.5.3 Network architecture requirements

The current version of ComputeMode relies heavily on PXE (also known as network boot protocol) to set up disk-less distributions. Thus, if your network already uses PXE for other purposes, or if your workstations are too old and do not support PXE boot, this may be a show-stopper for the current version.

CM also uses DHCP, but it is configured so as to be transparent and not to interfere with your current setup.

Besides, the following constraints should be enforced:

1.5.4 Access nodes requirements

Some hosts of your network will be used to access the ComputeMode system. For instance, you will need to access the ComputeMode Web interface located on the ComputeMode server. You will also need to access the system to submit computational jobs and retrieve their results. Such operations may be done using any PC from your network (including processing nodes).

The ComputeMode server embeds a minimal X display with a web browser but, as it is a server, it will most likely be more comfortable to access it from your office than from the server room!

These requirements are recommendations rather than hard constraints:

1.6 ComputeMode appliances

1.6.1 ComputeMode VMware, VirtualBox or KVM appliances

For testing purposes (performance is somewhat limited), an appliance may be downloaded from:

http://computemode.imag.fr/files/appliances/
It is a fully functional ComputeMode server which lets you test the software prior to dedicating a real server to it.

I. Getting a ComputeMode server up and running

2. Installing a CM server...

2.1 ... on a real machine

First, make sure the network and system requirements are fulfilled.

In the following section, we will assume your CM server will have:

This is the IP you will have to use to access the CM server from within your site.

You can then proceed to the installation of ComputeMode using the online documentation:

http://computemode.imag.fr/mediawiki/index.php/ComputeMode_on_top_of_Debian_Squeeze

The installation screens should be rather straightforward. When requested by the installer, enter the ComputeMode Server network parameters. Once the installation is finished, go to your own machine (which will be used as an Access Node), open your Web browser and navigate to the ComputeMode Server address, prefixed with http://. For example, open the http://1.2.3.42/ Web page. This is the ComputeMode Server administration page.


2.2 ... in a virtual machine

In most virtualization software, I/O operations are privileged actions which require the intervention of the virtualization software, hence performance is often degraded compared to a real dedicated server. For more information about installing virtual appliances, see the following on-line documentation:
http://computemode.imag.fr/mediawiki/index.php/ComputeMode_server_appliances

2.2.1 VMware Workstation or VMware Server

If you already use VMware's virtualization products you may simply install the CM software inside a virtual machine. You have however to be sure that it will get access to the same network as your clients (whether these are real or virtual):

Note:
you may have to alter the /dev/vmnet0 permissions to allow promiscuous mode (you will know when you need it as VMware will complain - for instance if you want to run a network sniffer to troubleshoot possible networking issues)

2.2.2 VirtualBox

VirtualBox is a general-purpose full virtualizer for x86 hardware, targeted at server, desktop and embedded use. For a thorough introduction to virtualization and VirtualBox, please refer to the online version of the VirtualBox User Manual's first chapter:

http://www.virtualbox.org/manual/ch01.html.


3. Starting the ComputeMode Virtual Appliance

This chapter describes how to use the VMware appliance of the ComputeMode software.

3.1 VMware Player

The appliance version of CM is built to work out-of-the-box with VMware Player. You can get this tool at no charge from the VMware Player download page:

http://www.vmware.com/download/player/

3.2 Recommended usage

Since VMware adds some overhead on I/O operations (disk, network), and since CM is mainly doing I/O, it should mainly be used for evaluation purposes:

3.3 Appliance Walk-through

3.3.1 Requirements

You need to know: your netmask, an available IP address, your DNS servers, and whether PXE is already in use in your department.

3.3.2 Starting the engine

The appliance has been designed to be simple to start and provides a common web page as shown on figure [*].

Figure: Welcome screen
Image cm-appliance-welcome-2

The main part of CM is in the 'Admin' box, especially the 'Webadmin' link. You may now proceed to chapter 7 if you want detailed explanations about administration, or simply proceed to the next chapter to continue your ComputeMode walk-through.

II. Launching Computations


4. Starting Computing

Your CM server is now running. Everything from now on applies whether your CM server is real or virtual.

4.1 Logging into cmwebadmin

Once you have logged in to your Unix account (or the guest account), you will be shown a login screen (see figure [*]). You may also reach cmwebadmin through a web browser located on your desktop machine.

Figure: cmwebadmin login screen
Image cm-login-screen

The default account and password are given in section cha:default-accounts-passwords. It is a safe practice to change this password now.

Once logged in, you are taken to the nodes management page (see figure [*]).

You may choose which label to show by using the drop-down menu at the top. Upon selection, the page will be automatically reloaded (if JavaScript is enabled) to only show the nodes holding the selected label. Clicking on a label name in the right part of a node entry will also select this label to be shown.

The nodes are listed and some of their properties are displayed in columns:

Figure: Nodes management page
Image cm-nodes-management-1

Clicking on the 'Show/Hide advanced options' button will toggle the display of several buttons.

Note:
the buttons shown by your CM version may differ from the ones shown in figure [*] as the display is highly configurable and depends on which modules are available and enabled.
Figure: Nodes management page - with advanced options shown
Image cm-nodes-management-2

4.2 Getting some computation nodes registered

Client (or computation) nodes may be added automatically or by hand. The following sections explain both options.

4.2.1 Automatic discovery

First, check that network boot (PXE) is enabled in the BIOS settings of the machines you want to use. A sample BIOS screen is shown in figure [*].

Figure: Enabling network boot in the BIOS
Image cm-vmware-bios1

You can now check whether the node is listed in the nodes management screen. It should not be listed if it has never been booted.

Figure: Nodes management before client node first boot
Image cm-adding-newnode-1

Once you start the client machine, you should see a screen similar to the one in figure [*]. During this step, your machine tries to find a PXE boot server within your LAN. This request is caught by cmserver, which then registers this node as an unconfigured node and gives it an automatic name of the form Uxx-xx-xx-xx-xx-xx (where xx-xx-xx-xx-xx-xx is your client node MAC address). If you reload the nodes management page (as shown in figure [*]), you will see that a new node was added with the label 'Unconfigured'.

If you feel like it, you can verify the MAC address you may have seen in figure [*] is the same as the one you will see in figure [*].
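
The automatic naming rule described above can be illustrated with a few lines of Python. This is only a sketch: the function name unconfigured_node_name is hypothetical, and accepting ':' as well as '-' separators and lower-casing the hexadecimal digits are assumptions made for the example, not behavior documented for cmserver.

```python
import re


def unconfigured_node_name(mac: str) -> str:
    """Build a 'Uxx-xx-xx-xx-xx-xx' style name from a MAC address.

    Accepts ':' or '-' separated addresses. Lower-casing the hex digits
    is an assumption made for this sketch.
    """
    octets = re.split(r"[:-]", mac.strip())
    if len(octets) != 6 or not all(
        re.fullmatch(r"[0-9A-Fa-f]{2}", octet) for octet in octets
    ):
        raise ValueError(f"not a valid MAC address: {mac!r}")
    return "U" + "-".join(octet.lower() for octet in octets)


if __name__ == "__main__":
    # Example MAC address chosen arbitrarily for the demonstration.
    print(unconfigured_node_name("00:0C:29:AB:CD:EF"))
```

Such a helper can be handy to predict which entry to look for in the nodes management page when you already know the MAC address of the machine you are about to boot.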

Note:
if several machines happen to boot at the same time, it may be hard to tell which one is which unless you already know which MAC address belongs to which machine.
Figure: Client node first boot: PXE request
Image cm-adding-newnode-2

Figure: Nodes management after client node first boot: a new node is listed
Image cm-adding-newnode-3

4.2.2 Manual addition

If you know your machine's MAC address, you may simply add the corresponding entry into cmwebadmin: just click the 'Add node' button. You will have to fill in the form - see the next section, or chapter 7, for further information about every field.

4.3 Editing the newly-added node properties

If you click on the 'Uxx-xx-xx-xx-xx-xx' name in the nodes management page, you will be taken to the node properties edition screen. You should choose the 'Always CM' schedule so as to observe something when you reboot the client node. When you are satisfied with the form, click the 'Update' button (see figure [*]).

Figure: Node properties edition
Image cm-node-edit-properties

If everything went smoothly, you will be taken back to the nodes management screen with a message shown at the top of the page as shown on figure [*].

Figure: Node management after successful node properties edition
Image cm-node-edit-properties-success

Now, if you reboot the machine for which you have just modified the configuration, screens similar to the ones in table [*] will be shown.


Table: CM client boot screens
Screen shot              | Comments
Image cm-client-boot-1   | Boot underway
Image cm-client-boot-2   | Boot over


4.4 Getting some work done: cluster computing demonstration

4.4.1 Principle of the demonstration

A simple cluster demonstration is provided so that you may test your setup. This application is meant to be simple and visual: a 3D scene is split into stripes, and each stripe is a task submitted to the job manager. The job manager dispatches the tasks to every available computation node.

Meanwhile, the ComputeMode server will merge and update the display of the computed stripes till every task has been executed.

4.4.2 How to start it?

You can check how many nodes you have by selecting the OAR menu option and choosing Monika. A page similar to the one below will be shown.

Figure: OAR page listing the ComputeMode nodes
Image cm-pov-demo-1

Then select the 'POV demonstration' item in 'CM demonstration', or follow the link: http://172.28.255.253/public_html/

Figure: POV scenes listing
Image cm-pov-demo-2

A few sample scenes are given - choose one and click its name - the rendering process will now start. If you launch Monika again (in another window) you will see that several tasks have been queued (see the figure below).

Figure: OAR page listing the ComputeMode nodes executing jobs
Image cm-pov-demo-3

After a few seconds, the first results will begin to appear:

Figure: POV scene rendering
Image cm-pov-demo-4

The nodes scheduling may be seen through the Gantt drawings (as an option in the OAR sub-menu):

Figure: DrawOARGantt showing submitted jobs
Image cm-pov-demo-5

Eventually, the full picture is displayed.

Figure: Complete 3D scene rendered
Image cm-pov-demo-6

III. Cluster Manager for users

5. As a Cluster User, How Do I...

This section tries to list frequently asked questions. If you cannot find an answer, just ask and it will be added in later revisions of this manual.

5.1 ... log in on the CM server?

Your CM administrator has to create a Unix account on the server. Once this is done, you will be allowed to log in using an ssh client.

5.2 ... submit jobs to the batch manager?

The batch scheduler installed by default is OAR; it provides the de facto standard command-line tools: *sub, *del and *stat.

Please consider reading the complete OAR manual at http://oar.imag.fr/ if you have advanced needs. For the sake of completeness and ''quickstart-ness'', the most common usage patterns are listed below.

5.2.1 oarnodes (monitoring)

oarnodes provides information about the nodes registered in ComputeMode: basically it will let you know which nodes may join the computing cluster. It will also list their properties if they have already booted and registered in OAR previously (CPU, RAM, etc.)

Example:
 
oarnodes
or
oarnodes -l

5.2.2 oarstat (monitoring)

oarstat displays the current jobs status and its behavior may be altered with the following options:

More concise high-level interfaces are also available through Monika and DrawOARGantt. Both tools are available as web pages from the cmwebadmin portal (see the welcome screen in section 3.3.2).

To summarize, Monika displays a snapshot of the current OAR status with regard to node occupation, and lists jobs and their states.

As for DrawOARGantt, it displays a Gantt diagram of the past nodes reservations. When jobs have a walltime set, it will plot them the way they would execute if they lasted up to their walltimes.

5.2.3 oarsub (submit)

oarsub lets a task be submitted to the job manager.

If you want to have an interactive shell, type:

oarsub -I
You may specify the number of nodes you want by adding the -l option followed by 'nodes=number':
oarsub -I -l nodes=4
The list of nodes allocated to your job will be in the $OARNODES environment variable.

You may also submit script files:

oarsub -l nodes=1 ~/myscript.sh
or in-line scripts:
oarsub -l nodes=1 '~/mybin param1 param2'
Note:
when submitting script files, you have to make sure the script is executable (chmod +x ~/myscript.sh)
For further options, please read the OAR manual.
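
When many jobs have to be submitted, the command lines shown above can be generated from a small script. The following Python sketch only builds the argument list and prints it; the helper name build_oarsub_command is hypothetical, and it uses nothing but the -I and -l nodes=N options described in this section. Whether the resulting command behaves as expected on your server still has to be checked against the OAR manual.

```python
import shlex
from typing import List, Optional


def build_oarsub_command(command: Optional[str] = None, nodes: int = 1,
                         interactive: bool = False) -> List[str]:
    """Return an oarsub argument list for a script path, an in-line
    command, or an interactive shell."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    argv = ["oarsub"]
    if interactive:
        argv.append("-I")
    argv += ["-l", f"nodes={nodes}"]
    if not interactive:
        if not command:
            raise ValueError("a script path or in-line command is required")
        # The whole in-line command is passed as a single argument.
        argv.append(command)
    return argv


if __name__ == "__main__":
    # Print shell-quoted command lines instead of running them.
    print(shlex.join(build_oarsub_command(interactive=True, nodes=4)))
    print(shlex.join(build_oarsub_command("~/mybin param1 param2")))
```

Printing the commands first makes it easy to review them before pasting them into a shell on the CM server.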

5.2.4 oardel (delete a job)

To remove a job from the OAR queue, you have to get its job ID (use oarstat for this purpose) and then use the following command:

oardel <job_ID>
If the job is currently running on a client node, it will be killed.

6. As an owner, how do I...

This chapter gives quick recipes to solve common issues for a PC owner.

6.1 ... let my system be a part of the CM grid?

Everything depends on your administrator and your site policy :

By default, if you boot your machine during a computation period and it has been registered in the CM server by the administrator, you will see the CM boot process happen.

6.2 ... get out of the CM grid?

To achieve this, simply press Ctrl+Alt+Del when the system is in computation mode and wait a few seconds so that the server acknowledges it. Once done, you will never see CM again unless you tell your system administrator that it is OK to use your machine as a computation node again.

IV. Cluster Manager for administrators


7. ComputeMode web administration interface

ComputeMode Web Administration interface (cmwebadmin for short) is the central place where most of CM behavior can be tuned.

cmwebadmin is built as follows:

7.1 The 'Nodes' menu item

The management page lists all the nodes registered in cmwebadmin.

Figure: Nodes management page - with advanced options shown
Image cm-nodes-management-2

7.1.1 Editing or adding a node

To reach the node edition page, simply click its name or its MAC address in the node management page.

To add a new node, click the 'Add' button in the management page.

Figure: Adding a node
Image cm-node-add

The fields in the form have the following uses:

A few read-only fields are shown when you edit a node (not when you add/create it), namely OAR properties and the list of labels having been applied.

When you are satisfied with the node characteristics, click the 'Add' (or 'Update') button.

7.1.1.1 Special field: Comment & IP field

The IP is mainly used by the users page, which lets users declare when they are not using their computer. Some options can be given through the comment field to alter the behavior of CM (mainly in the users pages):

  1. if the IP field holds a valid IP, use it => END, else proceed to step 2
  2. (IP field is empty) ComputeMode parses the comment field looking for a string matching exactly (case and spaces matter!) ip=x.y.z.t, with x.y.z.t a valid IP address. If this string is found, it uses it => END, else proceed to step 3
  3. (nothing in IP and nothing in comment) ComputeMode tries to find the MAC address associated with the connecting IP by probing the network. If a MAC is found, use it => END, else proceed to step 4
  4. Has a MAC address been found?
    Yes: user accepted
    No: user rejected
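
The steps listed above can be read as a simple lookup procedure. The Python sketch below shows one possible interpretation, not the actual cmwebadmin code: the Node record, the function names and the probe_mac callback (standing in for the network probe of step 3) are all hypothetical, and matching the connecting IP against the IP field or the comment is our reading of what "use it" means.

```python
import ipaddress
import re
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

# Step 2 requires an exact, case-sensitive "ip=x.y.z.t" token in the comment.
COMMENT_IP = re.compile(r"\bip=(\d{1,3}(?:\.\d{1,3}){3})\b")


@dataclass
class Node:  # hypothetical record; the real cmwebadmin schema may differ
    mac: str
    ip: str = ""
    comment: str = ""


def valid_ipv4(text: str) -> Optional[str]:
    """Return the address if text is a valid IPv4 address, else None."""
    try:
        return str(ipaddress.IPv4Address(text.strip()))
    except ValueError:
        return None


def node_ip(node: Node) -> Optional[str]:
    """Steps 1 and 2: prefer the IP field, then an 'ip=' token in the comment."""
    ip = valid_ipv4(node.ip)
    if ip:
        return ip
    match = COMMENT_IP.search(node.comment)
    return valid_ipv4(match.group(1)) if match else None


def find_owner_mac(connecting_ip: str, nodes: Iterable[Node],
                   probe_mac: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return a MAC address for the connecting IP, or None (user rejected)."""
    for node in nodes:
        if node_ip(node) == connecting_ip:
            return node.mac
    # Step 3: fall back to probing the network (stubbed by the caller).
    return probe_mac(connecting_ip)
```

Step 4 then reduces to testing whether find_owner_mac returned a MAC address or None.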


7.1.2 Change schedule

Select the nodes for which you want to alter the schedule in the management page, and click the 'Change schedule' button.

A summary page listing the chosen nodes and asking for the new schedule is shown, as in figure [*].

Figure: Changing a node schedule
Image cm-node-change-schedule

Click the 'Change schedule' button to confirm.

7.1.3 Change WoL

The principle is similar to that of the 'Change schedule' button (see section 7.1.2): select your nodes and use the 'Change WoL' button instead.

7.1.4 Apply Label

The principle is similar to that of the 'Change schedule' button (see section 7.1.2): select your nodes and use the 'Apply Label' button instead.

Note:
the label to apply must be created beforehand using the 'Nodes / Labels' menu (see section 7.2.4)

7.1.5 Remove Label

The principle is similar to that of the 'Change schedule' button (see section 7.1.2): select your nodes and use the 'Remove Label' button instead.

Note:
the nodes will not be deleted - the same thing applies for labels not tied to any node

7.1.6 Delete Nodes (advanced options)

Select the nodes to delete, click the 'Delete' button. A summary confirmation screen will be shown. Click 'Delete' to confirm.

7.1.7 Export Nodes (advanced options)

Select the nodes you want to export, then click the 'Export' button. A screen like the one below will then be shown.

Figure: Exporting nodes
Image cm-node-export

Click the download button to download the CSV file corresponding to the nodes you selected. The CSV file (a plain ASCII text file) may then be imported into most spreadsheets.

7.1.8 Import Nodes (advanced options)

Click the 'Import' button - no node has to be selected. You will be taken to a screen similar to the one shown in figure [*].

To fill in the form, choose a local CSV file (with a specific format) to upload to the server, and optionally tick checkboxes:

Figure: Importing nodes
Image cm-node-import

As with the exported files described in the previous section, the first line of the CSV file has to follow a specific format so that the rest of the data can be parsed. Export a few nodes to obtain a sample of a correct file.
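
Before uploading a hand-edited file, it can be worth checking that every data line agrees with the header line. The Python sketch below does this with the standard csv module. The column names used in the sample (name, mac, schedule) and the comma delimiter are purely hypothetical: take the real header from a file exported by your own cmwebadmin.

```python
import csv
import io
from typing import Dict, List


def read_nodes_csv(text: str) -> List[Dict[str, str]]:
    """Parse CSV text whose first line names the columns, and check that
    every following line has exactly as many fields as the header."""
    reader = csv.DictReader(io.StringIO(text))
    if not reader.fieldnames:
        raise ValueError("the CSV file has no header line")
    rows = []
    for row in reader:
        # DictReader stores extra fields under the key None and fills
        # missing fields with the value None.
        if None in row or None in row.values():
            raise ValueError(
                f"line {reader.line_num}: wrong number of columns")
        rows.append(dict(row))
    return rows


if __name__ == "__main__":
    # Hypothetical sample; the real column names may differ.
    sample = (
        "name,mac,schedule\n"
        "U00-0c-29-ab-cd-ef,00:0c:29:ab:cd:ef,Always CM\n"
    )
    for node in read_nodes_csv(sample):
        print(node)
```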

7.1.9 Statistics (advanced options)

Select the nodes for which you want statistics, then click the 'Statistics' button. You will then be taken to a page similar to the one shown in figure [*].

Figure: Node statistics
Image cm-node-statistics

You may change the date and time period for which you want statistics to be drawn.

Note:
if you select lots of nodes and a long time period over a slow network link, the page rendering may be slow or may even time out. You may have to alter the Apache server configuration to work around this issue.

7.1.10 Add Exception (advanced options)

Select the nodes to which you want to add exceptions, then click the 'Add exception' button. These are positive exceptions: whenever a node is in exception, it will be put on an exception schedule. This is configurable system-wide in config.php.

Figure: Adding an exception to a set of nodes
Image cm-node-add-exception-1 Standard screen
Image cm-node-add-exception-2 Click on an edit box to have a calendar shown.

7.2 The 'Nodes / Labels' sub-menu item

Figure: Nodes / Labels management page
Image cm-labels-mgt-15

The column named 'Nodes with this label' indicates the number of nodes which currently have this label applied.

There are two special labels which will be described below.

7.2.1 Special label 'Unconfigured'

This label is automatically added to newly discovered nodes.

When you edit the node properties and click 'Update' this label will be automatically removed.


7.2.2 Special label 'Quarantine'

This label is automatically added to the nodes which were in computing mode and whose owner pressed Ctrl+Alt+Del to get their machine back.

To disable automatic 'Quarantine' (called node inhibition within CM), edit the CM configuration file config.php (which should be located in /var/www/cm/).

In this file, find the line reading:

$GLOBALS['ALLOW_INHIBIT'] = true;
Either replace it with:
$GLOBALS['ALLOW_INHIBIT'] = false;
or add this line to locale_config.php.

7.2.3 Label edition

You may rename a label by clicking its name in the labels management page (see figure [*]). This is just a label rename and not a label creation: if nodes A, B and C have the label 'old_foo' and 'old_foo' is renamed to 'new_foo', then A, B and C will have the 'new_foo' label.


7.2.4 Add label

Simply click the 'Add' button in the management page (see figure [*]). A form will be shown asking you to enter a label name (basically only alpha-numerical characters) - click 'Add' to create the label.

7.2.5 Delete labels

Select the labels to erase in the management page (see figure [*]), then click the 'Delete' button. A confirmation screen will be shown.

Note:
the nodes will not be erased; only the deleted label will be removed from them

7.2.6 Change schedule

This button lets a schedule be changed for the set of nodes having any of the selected labels.

Once you click the button, you will be taken to a page like the one in the screenshot below: select the new schedule, then click the 'Change schedule' button.

Figure: Nodes / Labels management page / Change schedule
Image cm-labels-change-schedule-15

7.3 The 'Nodes / Exceptions' sub-menu item

Exceptions can only be listed and deleted in this page (see figure [*]). They are ''positive'' exceptions in the sense that they indicate periods of availability of machines (mainly vacations).

The rationale is that employees who want to let their computers join the grid while they are off site can declare it from their own browser.

Figure: Nodes / Exceptions management page
Image cm-exceptions-mgt

7.3.1 Delete exceptions

Select the exceptions you want to remove then click this button.

7.4 The 'Accounts' menu item

This is where you go if you want to change the administrator information. Please note that the accounts shown here are only related to cmwebadmin and are not tied in any way to the Unix accounts.

Figure: Accounts management page
Image cm-accounts-mgt


7.4.1 Account edition

Figure: Accounts edition
Image cm-accounts-edition

The main fields in this form are the login field and the password field.

Note:
if the password field is left empty, the previous password is kept.

7.4.2 Add/Delete account

Currently, there is no use adding extra CM accounts - the only limitation is there must be at least one admin account.

7.5 The 'Schedules' menu item

This page lists the existing schedules (see figure [*]) as well as the number of nodes using each of them. Please note that some schedules are reserved (though you can edit them to fit your needs): 'default' is used as a template for new schedules, while 'Always' and 'Never' should correspond respectively to always-computing and always-local.

Their serial numbers may be referred to in the config.php file, so be extra careful when you delete schedules you have not created - there is currently no failsafe.

Figure: Schedules management page
Image cm-schedules-mgt

7.5.1 Schedule edition

Editing a schedule consists in setting which boot mode to apply depending on the day of the week and the time of the day.

Tip:
You can select a rectangular portion of the week by clicking then dragging the mouse. The selection rectangle is not drawn while you drag. This feature requires JavaScript.
Figure: Schedule edition
Image cm-schedules-edition-e+we

By default, the schedule granularity is 120 minutes. It can be set to any other value that divides one day evenly, that is 24 * 60 = 1440 minutes. This granularity is stored in config.php at the line reading:

$GLOBALS['DEFAULT_TIMESLOTGRANULARITY'] = 120;
WARNING
If some existing timeslots do not fit the new granularity, the display may be somewhat messed up (for instance if you have a timeslot from 9:30 to 10:00 with a 30-minute granularity and you switch to a 60-minute granularity). Fix your boot mode selection, then update, and the display will be fine again.
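
The divisibility rule and the warning can be checked with a few lines of Python. This is only an illustrative sketch: the helper names valid_granularities and slot_fits are made up for this example and are not part of cmwebadmin.

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def valid_granularities():
    """Return every granularity (in minutes) that divides one day evenly."""
    return [g for g in range(1, MINUTES_PER_DAY + 1) if MINUTES_PER_DAY % g == 0]


def slot_fits(start_minute, end_minute, granularity):
    """Tell whether a timeslot starts and ends on a boundary of the granularity."""
    return start_minute % granularity == 0 and end_minute % granularity == 0


# The timeslot used in the warning: 9:30 to 10:00, counted in minutes since midnight.
start, end = 9 * 60 + 30, 10 * 60
print(slot_fits(start, end, 30))     # True
print(slot_fits(start, end, 60))     # False
print(120 in valid_granularities())  # True
```

A timeslot for which slot_fits returns False with the new granularity is one that should be fixed by hand in the schedule editor.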

7.5.2 Add schedule

Clicking the 'Add' button in the schedules management page (as shown in figure [*]) will create a new schedule as a copy of the one named 'default'.

7.5.3 Copy schedule

Select the schedule you want to copy by ticking its check-box, then click the 'Copy' button. A new schedule named 'Copy of ...' will be created.

7.5.4 Delete schedule

Select the schedules to delete in the management page (as shown in figure [*]), then click the 'Delete' button. You will be shown a page asking you to confirm the deletion.

7.6 The 'Boot modes' menu item

This menu (see figure [*]) lists the boot modes available as well as their types (see below).

Figure: boot modes management page
Image cm-bootmodes-mgt

7.6.1 Local boot mode edition

This is a special boot mode (not configurable) which tells a machine to boot locally. It can neither be edited nor added, since there is only one way to do a local boot (think singleton design pattern).

Figure: Local boot mode
Image cm-bootmode-edition-local

7.6.2 Raw PXE boot mode edition

This is a special boot mode you can use to boot, for instance, floppy disk images through PXE.

Figure: RAW PXE boot mode edition
Image cm-bootmode-edition-rawpxe

On the server side, the binary files must be stored in the TFTPD root (by default, this is: /srv/tftp/PXEClient/)

Put the content below into a file named 'default':

label linux
kernel memdisk
append initrd=foo.bin

The image foo.bin goes into /srv/tftp/PXEClient/ and the configuration file 'default' goes into /srv/tftp/PXEClient/pxelinux.cfg.
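
If several disk images have to be served, the three configuration lines can be generated rather than typed. The sketch below only builds the text; the function name memdisk_entry is an assumption made for this example, and copying the result into the pxelinux.cfg directory is left to the administrator.

```python
def memdisk_entry(label, image_name):
    """Build a pxelinux entry that boots a disk image through memdisk."""
    lines = [
        f"label {label}",
        "kernel memdisk",
        f"append initrd={image_name}",
    ]
    return "\n".join(lines) + "\n"


# Reproduces the entry given in this section.
print(memdisk_entry("linux", "foo.bin"), end="")
```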

7.6.3 ComputeMode boot modes edition

This mode is editable but some parameters may be slightly tricky - feel free to ask for help.

If you are altering an existing ComputeMode boot mode or creating a new one from scratch, be aware that two options are mandatory, namely MASTER and NFSROOT.

Figure: ComputeMode boot mode edition
Image cm-bootmode-edition-computemode

The table 'ComputeMode boot modes options' below lists the available options. Some are kept for compatibility reasons and should not be used.


Table: ComputeMode boot modes options
(spans several pages)


Option name Mandatory
Sent at boot time (PXE)
Notes & examples
   
 
MASTER YES
YES
Hostname of the ComputeMode server in the CM subnet.

Syntax: numerical IPv4 or hostname

Default: 172.28.255.253

Note: you can use a literal name, but as there may be DNS resolution issues, it is safer to stick to an IP address if possible

NFSROOT YES
YES
ComputeMode root to use for the diskless boot

Syntax: nfsserver:/cm/distribution

Default: 172.28.255.253:/cm/debian

Note: you can use a literal name, but as there may be DNS resolution issues, it is safer to stick to an IP address if possible

AUTOSTART no
no
Specify which commands to start automatically after the distribution has finished booting.

Syntax: user1:cmdfilename1+user2:cmdfilename2

meaning: as user1, launch cmdfilename1, then as user2, launch cmdfilename2. The paths of cmdfilename1 and cmdfilename2 must be absolute.

Default: root:/var/lib/oar/cm/oar_start.sh

(starts OAR)

 
 
AUTOSTOP no
no
Specify which commands to run automatically before the node starts shutting down

Syntax: same syntax as AUTOSTART just above

Default: root:/var/lib/oar/cm/oar_stop.sh (stops OAR)

EXIT no
no
Specify which exit method to use (another way is to set it in /var/diskless/exit)

Syntax:

halt or reboot or no

Default: reboot

Note: Only these 3 keywords are supported. 'no' means do not exit.

NFSHOME no
no
Specify how to mount the /home directory (may be done using the MOUNTS parameter too)

Syntax: some.host:/some/dir

Default: cmserver:/home

DDNS no
YES
Whether Dynamic DNS should be activated.

Syntax: yes or no

Default: yes

DEBUG no
YES
Activate debug mode. Value corresponds to the linuxrc breakpoint to stop at.

Syntax: positive integer, 0 means stop at all breakpoints.

Default: not set

DHCPNOKILL no
YES
Do not kill the initrd DHCP client. This is a workaround to ensure the initrd and the distribution do not get two different leases from two different DHCP servers, which would break everything. The drawback is that the initrd will not be unmounted and its memory will not be freed.

Syntax: yes or no

Default: not set

Note: not much tested, use with care

DHCPPORT no
YES
DHCP port that dhclient must use.

Syntax: positive integer

Note: not much tested, use with care

DHCPREJECT no
YES
Whether the DHCP configuration should reject the offers of some DHCP servers.

Syntax: x.y.z.t+a.b.c.d

where x.y.z.t and a.b.c.d are IPv4 addresses
 
DHCPSERVER no
YES
Force the use of a DHCP server. Please prefer the DHCPREJECT option.

Syntax: x.y.z.t where x.y.z.t is an IPv4 address.

IP no
YES
Used to set up the network manually. The value may be 'append', which means PXElinux gives the information to the initrd. The other possible form is: ipaddr:tftpserver:gateway:netmask. You may use the NS parameter to specify the DNS configuration.

Note: not much tested, use with care

MODULES no
no
Specify which modules should be loaded upon distribution startup (another way is to edit the /etc/modules file directly).

Syntax: module1+module2

MOUNTS no
no
Specify what the distribution should mount during startup (another way is to modify the fstab of the distribution)

Syntax: host1:/exp/dir%/mnt/dir1%nfs%ro,hard,intr+host2:/exp/dir%/mnt/dir2%nfs%ro,hard,intr

NS no
YES
Specify a DNS server and a domain name if the network configuration is not done with DHCP.

Syntax: dns1,dns2:domain1,domain2

Note: not much tested, use with care

NTP no
no
Specify the NTP (time) server to use (another way is to set it in /var/diskless/ntp)

Syntax: IPv4 address, hostname

Default: not set

PULL no
no
Specify the pull method that the node will use to notify and get information from the server (another way is to put it in the /var/diskless/pull file directly)

Note: not much tested, use with care

USERS no
no
Specify users which should be created on the system (another way is to directly modify /etc/passwd)

Syntax: user1:uid1+user2:uid2

Note: not much tested, use with care
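
Several options in the table share the same conventions: '+' separates entries, while ':' or '%' separates the fields of one entry. The Python sketch below shows how AUTOSTART and MOUNTS values decompose. It is only an illustration of the syntax described in the table; the function names are made up for this example and this is not the parser used by the ComputeMode scripts.

```python
def parse_autostart(value):
    """Split 'user1:cmd1+user2:cmd2' into ordered (user, command) pairs."""
    pairs = []
    for item in value.split("+"):
        user, command = item.split(":", 1)
        if not command.startswith("/"):
            raise ValueError(f"command path must be absolute: {command!r}")
        pairs.append((user, command))
    return pairs


def parse_mounts(value):
    """Split 'source%mountpoint%fstype%options+...' into dictionaries."""
    mounts = []
    for item in value.split("+"):
        source, mountpoint, fstype, options = item.split("%")
        mounts.append({
            "source": source,
            "mountpoint": mountpoint,
            "fstype": fstype,
            "options": options.split(","),
        })
    return mounts


print(parse_autostart("root:/var/lib/oar/cm/oar_start.sh"))
print(parse_mounts("host1:/exp/dir%/mnt/dir1%nfs%ro,hard,intr"))
```

The same splitting on '+' applies to AUTOSTOP (same syntax as AUTOSTART), MODULES and USERS.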

   
 
   
 
   
 

7.7 The 'Boot images' menu item

This menu describes which Linux boot images may be booted and with which parameters they should run (see figure [*]).

Figure: boot images management page
Image cm-bootimages-mgt

7.7.1 Adding or editing a boot image

Figure: Boot image edition
Image cm-bootimage-edition

You will be shown a screen similar to the one in figure [*].

Some information may be given regarding specific fields:

7.7.2 Delete boot images

Select the boot image to delete in the management page (see figure [*]) then click the 'Delete' button. You will be shown a summary and a confirmation button.

7.8 The 'OAR' menu item

Some statistics regarding the OAR batch system may be shown on this page.

Figure: OAR statistics page
Image cm-oar

7.9 The 'About' menu item

This page holds nothing special except a list of the components used. For the sake of completeness, here is the mandatory screenshot.

Figure: cmwebadmin about page
Image cm-about

8. Client nodes OS execution

Several solutions are available to add nodes to your CM grid.

8.1 Native node by rebooting

The simplest way to add client nodes is probably to enable PXE and boot the node: the CM server will then detect the PXE request and add the new machine under the name UXX-XX-XX-XX-XX-XX, where XX-XX-XX-XX-XX-XX is replaced by the MAC address of the node. The newly detected machine will have the default label 'Unconfigured' applied.
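
To find a freshly detected machine in the nodes list, it helps to know the name to look for. The sketch below derives it from a MAC address. The function name autodetected_name is made up for this example, and the use of upper-case hexadecimal digits is an assumption: this manual only gives the UXX-XX-XX-XX-XX-XX pattern.

```python
import re


def autodetected_name(mac):
    """Return the U-prefixed node name for a MAC address written with ':' or '-'."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac)
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    # Assumption: hexadecimal digits are shown in upper case.
    pairs = [digits[i:i + 2].upper() for i in range(0, 12, 2)]
    return "U" + "-".join(pairs)


print(autodetected_name("00:1a:2b:3c:4d:5e"))  # U00-1A-2B-3C-4D-5E
```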

8.2 Virtualized nodes

Another way to test CM is to start a virtual machine (for instance with VMware Player) and enable PXE so that it may boot on the CM server.

Figure: CM virtual machine for VMware Player download page
Image cm-vclient-download

CM provides a page to easily download and register such a configuration package for VMware Player. Some files may not be redistributed for licensing reasons, although you can obtain them yourself, so you will have to do a bit of work if you want to enable most of the options available here. If you are interested in doing this, please read appendix B (Enabling extra-options in Virtual Client download page).

9. As a CM administrator, how do I...

This chapter aims at giving quick recipes to solve common issues for an administrator.

Note:
Some sections may be empty and just redirect you to others - this is a feature aimed at simplifying a search through keywords.


9.1 ... change CM administrator's password?

  1. Click on the 'Users' menu item.
  2. Click on the account name you want to edit.
  3. Enter a new password (it will not be shown).
  4. Click the 'Update' button.
Note:
The new password is enabled at once.

9.2 ... reset cmwebadmin administrator's password?

This should never happen since, as everybody knows it, nobody has ever forgotten a password :-)

This procedure is slightly annoying, but it is a last-resort solution.

  1. Log in as 'root' on your cmserver (ssh, console).
  2. Become the postgres user:
    su - postgres
  3. Launch the SQL command line client:
    psql -U cmu CMDB
  4. Enter the CMDB password (see appendix A, Default Authentication Credentials)
  5. Type:
    UPDATE users SET password = '42A0ASfCJSzOg' where login='admin';
  6. Type:
    \q
  7. Type twice (once to exit the postgres user shell, once to exit the root shell):
    exit
Note:
The new password is 'icatis' (without the quotes).

9.3 ... register a node?

See next section.

9.4 ... add a node?

  1. Click on the 'Nodes' menu item.
  2. Click on the 'Add node' tab.
  3. Fill the form.
  4. Click the 'Add' button.

9.5 ... unregister a node?

See next section.

9.6 ... make a node disappear?

  1. Click on the 'Nodes' menu item.
  2. Click on the node(s) you want to delete.
  3. Click the tab 'Delete node(s)'
Note 1:
if you deleted a node, it may appear again in the future if it boots through PXE and auto-detection is enabled.
Note 2:
if you do not want to see detected nodes, then choose to show only the nodes configured in the default screen.

9.7 ... change the default label used in the nodes management page?

Edit config.php and replace the line reading:

$GLOBALS['NODE_MANAGEMENT_DEFAULT_LABEL_ID'] = 0;
by
$GLOBALS['NODE_MANAGEMENT_DEFAULT_LABEL_ID'] = <XXX>;
where <XXX> is among:
Note 1:
using a label ID outside of the range -1 .. 2 is currently not supported
Note 2:
for the record, you may try to find it by looking at the hyperlinks generated by the label management pages (for instance http://.../cm/main.php?...&labelid=42). Note that you do this at your own risk and that it is not supported.

9.8 ... cope with an owner who no longer wants to be part of ComputeMode?

See next section.

9.9 ... black list a machine?

See next section.

9.10 ... put a machine in 'Quarantine'?

  1. Click on the 'Nodes' menu item.
  2. Select the node you want to blacklist.
  3. Click on the tab 'Apply label'
  4. Choose the 'Quarantine' label.
  5. Click the 'Apply' button.
CM will now act as if it were disabled for this host, that is:

9.11 ... remove machines from 'Quarantine' mode?

  1. Click on the 'Nodes' menu item.
  2. Select the node you want to get out of 'Quarantine' mode.
  3. Click 'Remove label'
  4. Select the label named 'Quarantine'
  5. Click 'Remove label'
Note:
Though this label has a special behavior, it is still a label which can be removed like any other standard label.

9.12 ... remove machines from 'Unconfigured' mode?

  1. Click on the 'Nodes' menu item.
  2. Select the node you want to get out of 'Unconfigured' mode.
  3. Click 'Remove label'
  4. Select the label named 'Unconfigured'
  5. Click 'Remove label'
Note:
Though this label has a special behavior, it is still a label which can be removed like any other standard label.

9.13 ... disable auto 'Quarantine'?

See next section.

9.14 ... disable node automatic inhibition?

To disable automatic 'Quarantine' also called 'node inhibition', edit the CM configuration file config.php (which should be located in /var/www/cm/).

In this file, find the line reading:

$GLOBALS['ALLOW_INHIBIT'] = true;
Replace it with:
$GLOBALS['ALLOW_INHIBIT'] = false;

9.15 ... enable/disable node automatic detection?

Disabling this feature requires editing the cmwebadmin configuration file. In config.php, find the line reading:

$GLOBALS['MODULE_NODE_AUTODETECTION'] = true;
and replace it with:
$GLOBALS['MODULE_NODE_AUTODETECTION'] = false;


9.16 ... enable/disable owner/user's pages?

Disabling this feature requires editing the cmwebadmin configuration file. In config.php, find the line reading:

$GLOBALS['USER_ACCESS'] = true;
and replace it with:
$GLOBALS['USER_ACCESS'] = false;
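
The last three recipes all flip a true/false line in config.php. If you prefer to script the change rather than use a text editor, the idea can be sketched in Python as below. The helper set_bool_option is hypothetical (it is not shipped with CM) and it only transforms text: reading and writing /var/www/cm/config.php, and keeping a backup, are left to you.

```python
import re


def set_bool_option(config_text, name, enabled):
    """Return config_text with the $GLOBALS['name'] line set to true or false."""
    pattern = re.compile(
        r"^(\s*\$GLOBALS\['" + re.escape(name) + r"'\]\s*=\s*)(true|false)(\s*;)",
        re.MULTILINE,
    )
    new_value = "true" if enabled else "false"
    new_text, count = pattern.subn(
        lambda m: m.group(1) + new_value + m.group(3), config_text
    )
    # Refuse to guess when the option is missing or defined several times.
    if count != 1:
        raise ValueError(f"expected exactly one {name} line, found {count}")
    return new_text


sample = "$GLOBALS['ALLOW_INHIBIT'] = true;\n$GLOBALS['USER_ACCESS'] = true;\n"
print(set_bool_option(sample, "USER_ACCESS", False), end="")
```

Only the USER_ACCESS line changes in the sample; the ALLOW_INHIBIT line is left untouched.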

9.17 ... force reboots at specific times?

See next section.

9.18 ... use an agent?

Such a functionality requires a bit more work, namely installing and using an agent. The agent's job is basically to ask the server what it should do next, that is, reboot the machine into computation mode or carry on as if nothing had happened.

There are currently 2 agent versions: a Windows one and a Linux/Unix one.

9.18.1 Windows agent

Figure: Windows Agent download page
Image cm-windows-agent-download

A customized agent may be downloaded directly from a page hosted on cmserver. The figure [*] illustrates the customizations which could be done, namely:

You can find and download the Windows agent on your ComputeMode web server: http://172.28.255.253/wrapper.php?key=winagent

9.18.2 Linux agent

The Linux agent page is currently not shipped with this CM version, but a Debian Squeeze package is provided in the ComputeMode APT repository.

To install the Linux agent on a Debian Squeeze node, for example, just type these commands on your node:

cat <<EOF > /etc/apt/sources.list.d/computemode.list 
deb http://computemode.imag.fr/files/debian/squeeze ./ 
EOF 
apt-get update 
apt-get install cm-unixagent

9.19 ... handle specific bank holidays?

CM does not support calendar exceptions such as bank holidays, but a script can be used by setting 'PARTICIPANT_HOLIDAYS_SCRIPT' in the /var/www/cm/config.php file. This variable should contain the absolute path of a script which can look up bank holidays (in a file, a database...).

9.20 ... login as root on the client nodes?

For security reasons, logging in as root is disabled on the client nodes on the console. Yet, this can be reactivated by editing the client image OS.

Logging in as root is allowed through ssh from the root account on the CM server.

9.21 ... add a grid user account?

See next section.

9.22 ... add a Unix account?

  1. Log in as root
  2. Execute:
    adduser <the_login>
  3. Fill the requested information
  4. If you are using NIS (it is enabled in the default setup), you have to rebuild the NIS maps by typing:
    make -C /var/yp
  5. If you want to let this user submit jobs through OAR, you have to add him/her to the oar group, type:
    adduser <the_login> oar

9.23 ... use an extra NFS server ?

This can be done by configuring the ComputeMode boot mode parameters. Please read the table 'ComputeMode boot modes options', and especially the part about the MOUNTS option.

If your servers are not reachable through the same subnet (172.28.0.0/16) as your CM nodes, you will have to alter routing on the clients or use some NAT system.
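
Whether an NFS server falls inside the default CM subnet can be checked with Python's standard ipaddress module. This is a minimal sketch: the function name reachable_without_routing is made up for this example, and the subnet constant must be adapted if your CM network is not the default one.

```python
import ipaddress

CM_SUBNET = ipaddress.ip_network("172.28.0.0/16")  # default CM sub-network


def reachable_without_routing(server_ip):
    """Tell whether server_ip lies inside the CM sub-network."""
    return ipaddress.ip_address(server_ip) in CM_SUBNET


print(reachable_without_routing("172.28.255.253"))  # True
print(reachable_without_routing("192.168.1.10"))    # False
```

A False result means the extra routing or NAT work mentioned in this section is needed.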

9.24 ... change the server public IP?

This is currently done the Debian way:

For further information, you may want to read the interfaces(5) man page.

9.25 ... use an LDAP server to authenticate my users?

This can be done by configuring some variables in the config.php file.

10. ComputeMode architecture

This chapter tries to give a general overview of what is happening behind the scenes.

10.1 What happens during the PXE boot?

When the client boots, it broadcasts a DHCP request with a flag indicating that it wants to boot through PXE.

The CM server sees this request, replies with a temporary IP address and tells the client to fetch an IPXE file. IPXE is a binary that allows ComputeMode to use the HTTP protocol (and therefore a TCP connection) to execute its commands.

10.2 What role does PXELinux play?

IPXE then contacts the HTTP server to request a configuration file whose name is based on the MAC address of the network card used. This file is dynamically created during the IPXE request, based on what the CM server currently knows (time, day, load, labels, etc.).

IPXE then receives this configuration file and acts according to its contents, which basically come in one of the two following flavors:

  1. local boot or,
  2. chain / boot something else (Linux kernel, image, floppy image, other boot loader, pxelinux boot, etc.)
In the case it is a CM OS image, several options are passed by means of kernel command line options - this will be explained in the next section.

The IP address which was obtained during the PXE negotiation described above is now released.
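
To make the two flavors more concrete, the Python sketch below builds what such a dynamically created file could look like in iPXE script syntax. This is an assumption-laden illustration and not the code of the CM server: the function name ipxe_script, the URL layout and the NAME=value form of the kernel options are hypothetical, and only the iPXE commands themselves (exit, kernel, initrd, boot) are standard.

```python
def ipxe_script(boot_mode, server="172.28.255.253", kernel=None, initrd=None, options=None):
    """Return an iPXE script for a local boot or for booting a kernel over HTTP."""
    lines = ["#!ipxe"]
    if boot_mode == "local":
        # Leaving iPXE hands control back to the firmware, which then
        # continues with the next boot device (usually the local disk).
        lines.append("exit")
    else:
        # Hypothetical layout: options are appended as NAME=value pairs.
        args = " ".join(f"{name}={value}" for name, value in (options or {}).items())
        lines.append(f"kernel http://{server}/{kernel} {args}".rstrip())
        if initrd:
            lines.append(f"initrd http://{server}/{initrd}")
        lines.append("boot")
    return "\n".join(lines) + "\n"


print(ipxe_script("local"), end="")
print(ipxe_script(
    "computemode",
    kernel="images/vmlinuz",        # hypothetical path
    initrd="images/initrd.img",     # hypothetical path
    options={"MASTER": "172.28.255.253", "NFSROOT": "172.28.255.253:/cm/debian"},
), end="")
```

The MASTER and NFSROOT values used here are the defaults listed in the boot modes options table; in that table, they are the options marked as sent at boot time.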

10.3 What role does a CM image have to play?

Now that the Linux kernel is booting with its attached initrd, the following events take place:

10.4 How does the /cm/<distrib_name> file hierarchy work?

Basically this folder contains an almost complete OS image which has been adapted for CM. 'Almost' because, to be complete, an image also needs a kernel and an initrd (which both go into /var/www/bootdirectory/images/) as well as the matching boot image and boot mode in cmwebadmin.

ComputeMode provides a centralized distribution system for client nodes. As a result, deploying (adding or upgrading) an application on the nodes is easily achieved by modifying the ComputeMode Server's network boot system repository.

Note:
Altering these folders should only be performed by experienced Linux administrators.
The file system for client nodes is currently shared by means of the NFS protocol and mounted with the AUFS filesystem to allow user writes. For simplicity, most images are located in /cm/ - this can be changed if you update the NFS exports list as well as the boot image configuration.

ComputeMode currently uses a Debian-based distribution by default, so the directory is logically /cm/debian. In this folder, several subdirectories with specific names will be found. Let's review each one of these:

As a result, deploying an application to a given Boot Mode is achieved by adding the necessary files to the 'orig' directory (possibly using the distribution's packaging facility), using a chroot command to have a consistent system view. Then, you have to check that the installed files are not overridden by the 'patch/' files. Such conflicts can be solved manually using the 'rules' file, but for most well-written applications (i.e. those not needing read/write access to the system installation when running), such manual intervention should not be necessary.
Note:
upgrading some system parts (especially libraries) while they are used by client nodes may cause problems (crashes)
According to the rules files, and how your directories are exported, client nodes will be able to use the newly installed software at their next boot.

10.5 Which processes run on the CM server?

Please see section 11.2 (Services running on the CM server).

11. Security walk-through

This chapter aims at summarizing the processes running, and the security implications and mitigations of running a CM within your site.

11.1 Security concerns

ComputeMode is not designed to be run in a hostile environment, in the sense that, if someone really wants to mess with the nodes within your network, almost any workstation can steal the identity of another (in more or less difficult ways).

The basic idea which is the one often seen within enterprises is that:

On the contrary, if you have hostile people whose sole purpose in your company is to annoy others by sabotaging their work, please rethink your hiring process!


11.2 Services running on the CM server

To know which ports are open on your CM server, you can execute, as root:

netstat -lnp
Several services depend on portmap and hence do not have fixed port numbers: to find out which ports are used, you may type on the server:
rpcinfo -p localhost

11.2.1 Services required by CM server

Several services and servers are running on a standard CM server - some may be added or removed but the following ones are currently needed for proper functioning.


Table: Services running (and required by) the CM server
Service       Daemon name       Ports & protocol                       Comment
DNS           bind              TCP/53 + UDP/53                        DNS
                                1 dynamically allocated UDP port
                                TCP/953 (localhost only)               control channel
ssh           sshd              TCP/22
web/http      httpd             TCP/80
web/https     httpd             TCP/443
web           httpd             TCP/943 (localhost only)               control channel
portmapper    portmap           TCP/111=sunrpc and UDP/111=sunrpc
NIS           rpc.yppasswd      (through portmap)
              rpc.ypxfrd        (through portmap)
              ypbind            (through portmap)
mail          exim              TCP/25 on localhost
syslog        syslogd           UDP/514
NFS server    (none - kernel)   TCP/2049 and UDP/2049
              rpc.mountd        through portmap
              rpc.statd         through portmap
NFS mounts    (none - kernel)   UDP system ports (< 1024)
PXE proxy     pxe               UDP/4011
TFTP server   in.tftpd          UDP/69                                 through xinetd
DHCP server   isc-dhcp-server   UDP/67 + ICMP
PostgreSQL    postmaster        TCP/5432 (may be bound to localhost)   can be disabled


11.2.2 Services required by ComputeMode friendly software tools

ComputeMode ships with a few well-known tools.

Currently, there is only the OAR scheduler, which offers a free (GNU GPL licensed) task scheduling system. This software is written mostly in Perl and uses MySQL or PostgreSQL, ssh and sudo. The client nodes are contacted when needed through ssh, so no extra service is running on them. Please check the table [*] for further information.


Table: Services required by tools shipped with CM: OAR
Service name   Daemon seen   Protocol/Port
OAR            perl          TCP/6666
               postgresql    TCP/5432


11.3 Services running on client nodes

Client nodes may only be accessed from the CM server and they should all belong to the dedicated CM sub-network (by default: 172.28.0.0/16). All the services specified in table [*] are only accessible from this private class B network.

Most have their accesses disabled thanks to inetd and tcp_wrappers.


Table: Services running on a client node
Service name   Daemon name       Protocol/Port                                       Comment
sunrpc         portmap           UDP/111 + TCP/111                                   tcp_wrappers
NIS            ypbind            through portmap                                     tcp_wrappers
NFS/status     rpc.statd         through portmap                                     tcp_wrappers
ssh            sshd              TCP/22                                              tcp_wrappers
syslog         syslogd           UDP/514                                             no filtering
NFS mounts     (none - kernel)   UDP system ports (< 1024), usually around UDP/800   kernel filtering
DHCP client    dhclient          UDP/68                                              kernel filtering


11.4 Owner accesses

Owners may only access the CM server through a user page on the server. The scope of actions which can be done there is quite limited. They can:

Authentication is currently voluntarily weak (IP-based) to simplify the task of users who want to let their workstation join the grid.
Note:
if the CM administrator wants to disable this mode, the config.php file has to be edited (see section 9.16).

V. Appendices


A. Default Authentication Credentials

Using default passwords is really close to being ''pure evil'': do seriously consider changing them once your setup is running smoothly.

Do note however that the spelling may be slightly altered according to the keyboard flavor you are using. Basically if 'icatis' does not work, it may indeed be 'icqtis' that works.


Table: CM server - default passwords
Component           Login   Password      Comment
UNIX account        root    icatis        change this a.s.a.p. with passwd
UNIX account        guest   guest         disable it when you are done: passwd -l guest
cmwebadmin          admin   icatis        change it (see section 7.4.1, Account edition)
PostgreSQL / CMDB   cmu     cmupassword   change it in PG and in config.php
OAR                 oar     oar           change it in MySQL, in config.php and in oar.conf



B. Enabling extra-options in Virtual Client download page

This appendix explains how to enable most options in the virtual client page. You will have to perform several operations on your server, some of which may not be obvious.

This appendix will be written in a future version of this documentation.

C. Acknowledgements


C.1 Open Source Software

ComputeMode relies on several Open Source components, namely:

If some software you developed is being used and we forgot to mention it in this document, please tell us and we will fix this page in later revisions of this document.

C.2 Third-party software

This manual mentions several third-party software products and trademarks. Here is the alphabetized list:

An Open Source (GPL) batch manager named 'OAR' ships with CM. If you are more familiar with Platform's LSF, OpenPBS, TORQUE or Sun Grid Engine, you can use them instead, provided you adapt the configuration.

D. ComputeMode Database Schema and Dictionary

This appendix describes the database schema used in ComputeMode 2.0.

D.1 Compatibility with previous schemas

The schema described in this appendix is compatible with the previous schema (used in ComputeMode 1.6) meaning the web administration interface 1.6 may be used with a database implementing this schema.

Only field additions were made, or fields were extended without adding constraints.

This schema is meant to be implemented in PostgreSQL 7.4 or later.

D.2 Conventions

Conventions are numbered for convenience.

  1. Naming convention: whenever it is not used, this will be duly noted.
  2. Tables and fields names: they are case-insensitive.
  3. Table names: they have the same name as the object they store, but in the plural form: to store 'foo' objects the table should be named 'foos'.
  4. Foreign keys: if the table ''foos'' has a field named ''id'', then in the table ''bars'' the field name will begin with ''foos_id''. Most of the time, it will only be ''foos_id'' but it may become ''foos_id_thingie''.
  5. Primary keys: every ''simple'' table has a primary key implemented by means of a sequential auto-incrementing value (PostgreSQL type 'bigserial'). Most of the time, if the table has a plural name, the primary key (named id for short) will be named: 'id' + singular name of the table. For instance, for 'nodes' the record id is named 'idnode'.
  6. Join tables: many-to-many tables have a name built from the names of both tables to join: tables 'foos' and 'bars' will be joined through a 'foos_have_bars' table (or 'bars_have_foos' - no particular order is imposed). For such tables, the key is composed of the two fields used to join. Both fields should follow the naming convention for foreign keys.
The following abbreviations are used throughout this appendix:
PK
primary key constraint (the record is unique - DBMS enforcement)
FK
foreign key constraint (the record exists as a primary key in the referenced table - DBMS enforcement)
FK*
may be null - but if not, it should refer to a primary key in the referenced table (no DBMS enforcement)
NE
naming exception (when a field or table does not respect these conventions)
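
Conventions 3 to 6 are mechanical enough to be written down as code. The Python sketch below derives the names a table should get; the function names are made up for this example, and the singular form is obtained by simply dropping a trailing 's', which is an assumption that happens to be enough for the table names used in this appendix. Names are built in lower case since tables and fields names are case-insensitive (convention 2).

```python
def singular(table):
    """Naive singular form: drop a trailing 's' (assumption, see lead-in)."""
    return table[:-1] if table.endswith("s") else table


def primary_key(table):
    """Convention 5: 'id' + singular name of the table."""
    return "id" + singular(table)


def foreign_key(table, suffix=""):
    """Convention 4: referenced table name + '_' + its key, plus an optional suffix."""
    name = f"{table}_{primary_key(table)}"
    return f"{name}_{suffix}" if suffix else name


def join_table(left, right):
    """Convention 6: many-to-many table joining two tables."""
    return f"{left}_have_{right}"


print(primary_key("nodes"))           # idnode
print(foreign_key("nodes"))           # nodes_idnode
print(join_table("nodes", "labels"))  # nodes_have_labels
```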

D.3 Tables list

Do note that in PostgreSQL ancillary tables are automatically added to the schema to handle sequential values (auto-incrementing values). The tables explicitly created are given below, sorted in alphabetical order:

bootimages
Contains information related to Linux boot images.
bootmodes
Contains the information related to the boot modes (Local, ComputeMode or RawPXE).
bootparams
Contains boot parameters which are understood by the ComputeMode scripts. This table is not editable from the web interface and only contains fixed values. Some help messages may be added or translated.
bootmodes_have_bootparams
Stores the many-to-many association between the tables bootmodes and bootparams for the ComputeMode bootmodes. (A bootmode has boot parameters, a boot parameter may apply to several bootmodes).
labels
Contains the labels names.
Note: records whose primary keys are 1 and 2 are reserved for ComputeMode internal use (they implement the 'Unconfigured' and 'Quarantine' labels).
logevents
Stores the modifications of the automatons states during the life of a node.
nodes
Contains all the information related to a node.
exceptions
Contains all the exceptions related to machines. An exception is recorded when a user declares that his/her machine is going to be available between 2 dates.
nodes_have_labels
Stores the many-to-many association between nodes and labels (A node may have several labels applied, a label may apply to several nodes).
nodetimeslots
Stores timeslots (= tuples composed of a week day, a start time and an end time) and their association with a bootmode.
nodeweeklytimeslots
This is where the schedules (that is, a set of nodetimeslots for a week) are stored.
users
This table contains the list of users of the web administration interface, that is mostly CM administrators. Personal information and encrypted passwords are also stored here.
version_schema [NE]
This table must contain only one record, which is used to check the DB schema version when the web administration interface is upgraded.

D.4 Database Schema (plot)

Do note the following elements about figure [*] :

Figure: CM database schema and relations
Image cm-dbms-partial-schema

D.5 Database Dictionary & Constraints

The database dictionary is described below and aims at giving hints at how fields are used and what they are supposed to contain.

In addition to the PK and FK constraints, some constraints are enforced by PostgreSQL. To avoid depending on too many DBMS specificities, others are only application-enforced.

D.5.1 Table bootImages

idBootImage (PK)
id
biName
for boot image name - string identifying the BootImage (e.g. '2.4.24-20mdkenterprise')
CONSTRAINT:DB: not null
kernel
name of the kernel file used - path relative to the TFTPD directory
CONSTRAINT:DB: not null
initrd
name of the initrd image used - path relative to the TFTPD directory
cmdLine
kernel options (e.g. 'ro devfs=mount ramdisk=5000 acpi=ht') - the part related to ComputeMode boot parameters must not conflict with what is stored there
environment
link to an external description of the bootimage

D.5.2 Table bootModes

This table implements some kind of union record. Depending on the value of the 'BMType' the application will use different fields.

idBootMode (PK)
id
bmName
a unique string identifying the bootmode (e.g. 'redhatforsmp')
Note: to hold some notes or comments
CONSTRAINT:DB: not null, length > 0, length < 30, UNIQUE
bmType
currently there are only 3 modes supported, namely:
'Local': indicates to boot on the local disk : there should be only one
'ComputeMode': indicates a ComputeMode boot
'RawPXEConfig': indicates a plain service
CONSTRAINT:DB: not null, among the 3 strings above
rawConfigFilePath
Path to the raw PXE configuration file to use. This field is only used when BMType is worth 'RawPXEConfig'.
CONSTRAINT:DB: length < 200
bootImages_idBootImage
The id of the BootImage (in the BootImages table) to use if BMType is worth 'ComputeMode'. For the record, BootParams entries are also associated with every ComputeMode type BootMode, using the BootModes_have_BootParams association table.
CONSTRAINT: the DBMS only checks the value is greater than or equal to 0.
isSmart (unused)
left in the schema for compatibility reasons - should always be 0.
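
As an illustration of the union-record behavior of this table, the following Python sketch maps each bmType to the type-specific columns that are read. The helper name and the dictionary layout are hypothetical; only the column names and the three bmType values come from the dictionary above. No type-specific column is documented for 'Local', so the sketch assumes none.

```python
# Hypothetical helper: which bootModes columns matter for each bmType.
# Column names and bmType values come from the dictionary above;
# the mapping itself is an assumption made for illustration.
FIELDS_BY_TYPE = {
    "Local": [],  # assumption: no type-specific column is documented
    "ComputeMode": ["bootImages_idBootImage"],
    "RawPXEConfig": ["rawConfigFilePath"],
}


def fields_used(bm_type):
    """Return the type-specific columns read for the given bmType."""
    if bm_type not in FIELDS_BY_TYPE:
        raise ValueError("unsupported bmType: %r" % bm_type)
    return FIELDS_BY_TYPE[bm_type]
```

For instance, fields_used('RawPXEConfig') only returns rawConfigFilePath, which reflects that bootImages_idBootImage is ignored for that type.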

D.5.3 Table bootParams

idBootParam
id
bpName
string identifying the BootParam (e.g. 'NFSROOT')
CONSTRAINT:DB: not null
isPXE
boolean indicating whether this parameter should be passed on the kernel cmdline (via PXE)
if false, the parameter will be given later as an extra parameter (via HTTP). Only vital parameters, mandatory for booting, should have this value set to true.
CONSTRAINT:DB: not null
defaultValue
an optional sample default value which may be used to prefill the dialog in the configuration interface
note
an optional help message about how the bootparam should be used or the format of the values to use
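
The role of isPXE can be sketched as follows in Python: parameters flagged as PXE end up on the kernel command line, the others are kept aside to be served later over HTTP. The function name, the tuple layout and the NAME=value rendering are assumptions made for illustration, not the actual ComputeMode implementation.

```python
def split_boot_params(params):
    """Split (name, value, is_pxe) tuples into the two delivery channels.

    Returns a (cmdline_fragment, http_params) pair. The NAME=value
    rendering on the command line is an assumption of this sketch.
    """
    pxe_parts = []
    http_params = {}
    for name, value, is_pxe in params:
        if is_pxe:
            # vital parameter: needed before the node can reach the server
            pxe_parts.append("%s=%s" % (name, value))
        else:
            # non-vital parameter: fetched later by the node
            http_params[name] = value
    return " ".join(pxe_parts), http_params
```

Keeping the PXE part small matches the recommendation above: only what is mandatory for booting goes on the kernel command line.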

D.5.4 Table bootModes_have_bootParams

This table is an association table for the many-to-many relation between BootModes and BootParams. The fields here are:

bootModes_idBootModes (FK)
bootmode to associate
CONSTRAINT:DB: see below
bootParams_idBootParam (FK)
bootparam to associate
CONSTRAINT:DB: (bootModes_idBootModes, bootParams_idBootParam) = (PK)
value
value for the association

D.5.5 Table labels

idLabel
id
name
label name
CONSTRAINT: DB: at most 30 characters
CONSTRAINT: APP: only letters, no spaces, unique in a case-insensitive way...

D.5.5.1 IMPORTANT NOTE - VALUES TO USE

CM uses 2 values internally, which must be present in the DB:

  1. (idLabel, name) = (1,'Unconfigured')
  2. (idLabel, name) = (2,'Quarantine')
Any other (strictly positive) value is fine to use.
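
The application-side checks on label names listed above could look like the following Python sketch. It covers only the rules spelled out in this section (letters only, no spaces, at most 30 characters, case-insensitive uniqueness); restricting "letters" to ASCII and the function name are assumptions.

```python
import re

# Assumption: "only letters" is read as ASCII letters, 1 to 30 of them.
LABEL_RE = re.compile(r"[A-Za-z]{1,30}")


def is_valid_label(name, existing_names):
    """Check a new label name against the rules listed for labels.name."""
    if LABEL_RE.fullmatch(name) is None:
        return False
    # uniqueness is checked with case ignored
    taken = {existing.lower() for existing in existing_names}
    return name.lower() not in taken
```

With the two reserved labels already present, is_valid_label('quarantine', ['Unconfigured', 'Quarantine']) is rejected because it only differs by case.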

D.5.6 Table logEvents

This table does not have a primary key. However, considering the time granularity of most systems, the switchtime field may often be regarded as a key (though, theoretically it is not).

nodes_idNode
ID of the node to which the logged event applies
switchTime
timestamp indicating when the event happened
newBootMode [NE]
if there is a boot mode change, then it's stored here
Note: there is no foreign key constraint as a node may be deleted and we don't want to update this table
newState
this is used to store a string relative to the state of the application for this node
CONSTRAINT:DB: length < 50
pxeBoot
indicates whether there was a PXE boot.

D.5.7 Table nodes

idNode (PK)
id
users_idUsers (FK)
the user who owns the node
mac
clean MAC address (6 pairs of lower-case hexadecimal digits separated by dashes, e.g. '11-33-55-77-99-aa') - integrity is only partially checked by the DBMS, so the application layer has to take care of storing only meaningful MAC addresses
CONSTRAINT:DB: not null, length < 17
hostname
machine name to use during DHCP requests, for dynamic DNS registration.
Note: a comment field aimed at storing some text - its presence is optional - a hostname should be unique with case ignored.
CONSTRAINT:DB: length < 90
note
optional comment field
nodeWeeklyTimeSlots_idNodeWeeklyTimeSlot (FK)
a NWTS ( = schedule) ID
state
used by the application to store some transient information
lastStateChange
timestamp indicating when the last event occurred on the node
bootModes_idBootMode_pxe (FK) [NE]
used by the application to store some transient information
Naming exception reason: previously there were several nodes.bootmodes_idbootmode_* fields, which required an extra suffix; the other ones disappeared and this one was left as is.
lastTimeSeen
timestamp indicating when the node was last seen by the server
useWol
whether to use Wake-on-LAN for this node: 0 means no, 1 means yes
handled_by_job_manager
whether CM should attempt to synchronize with the job manager: 0 means no, 1 means yes
host_ip
optional numeric IP address, indicating the public IP address of the node
CONSTRAINT: APP: valid IP
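
Since the DBMS only partially checks the mac field, the application has to normalize and validate addresses itself. A minimal Python sketch of such a check is given below; the function name and the tolerance for colon-separated or upper-case input are assumptions, while the target format (6 pairs of lower-case hexadecimal digits separated by dashes) is the one described above.

```python
import re

# Stored form: 6 pairs of lower-case hex digits separated by dashes.
MAC_RE = re.compile(r"[0-9a-f]{2}(-[0-9a-f]{2}){5}")


def clean_mac(raw):
    """Return the MAC address in its stored form, or raise ValueError."""
    # Assumption: colon-separated and upper-case input is tolerated.
    candidate = raw.strip().lower().replace(":", "-")
    if MAC_RE.fullmatch(candidate) is None:
        raise ValueError("not a valid MAC address: %r" % raw)
    return candidate
```

For instance, clean_mac('11:33:55:77:99:AA') yields '11-33-55-77-99-aa', the form used in the example above.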

D.5.8 Table exceptions

idException (PK)
id
nodes_idNode (FK)
node to which the exception applies
beginTime
timestamp indicating when the exception is to start - the time of the day should be 00:00:00
CONSTRAINT:DB: not null, beginTime <= endTime
endTime
timestamp indicating when the exception period is to end - the time of the day should be 23:59:59
CONSTRAINT:DB: not null, beginTime <= endTime
nodeWeeklyTimeslots_idNodeWeeklyTimeslot_in (FK) [CE]
the schedule the node had when it entered the exception
nodeWeeklyTimeslots_idNodeWeeklyTimeslot_out (FK) [CE]
the schedule the node will have when it exits the exception
note
optional comment field
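
The conventions on beginTime and endTime (start of the first day, last second of the last day) can be illustrated with the Python sketch below. The function name is hypothetical; the 00:00:00 and 23:59:59 values and the beginTime <= endTime constraint come from the dictionary above.

```python
from datetime import date, datetime, time


def exception_period(first_day, last_day):
    """Build (beginTime, endTime) for an exception covering whole days."""
    if first_day > last_day:
        raise ValueError("first_day must not be after last_day")
    begin_time = datetime.combine(first_day, time(0, 0, 0))
    end_time = datetime.combine(last_day, time(23, 59, 59))
    return begin_time, end_time
```

A one-day exception is obtained by passing the same date twice, which still satisfies beginTime <= endTime.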

D.5.9 Table nodes_have_labels

This table is an association table for the many-to-many relation between Nodes and Labels. The fields here are:

nodes_idNode (FK)
node to associate
CONSTRAINT:DB: see below
labels_idLabel (FK)
label to associate
CONSTRAINT:DB: (nodes_idNode, labels_idLabel) = PK

D.5.10 Table nodeTimeSlots

Two different NodeTimeSlots records related to the same NWTS must not overlap (application enforced).

idNodeTimeSlot (PK)
id
NodeWeeklyTimeSlots_idNodeWeeklyTimeSlot (FK)
the ID of the NWTS/Schedule to which the record belongs
dayNo
the number of the day (from 0 = Sunday to 6 = Saturday, both included)
CONSTRAINT:DB: not null, 0 <= dayNo <= 6
beginTime
time at which the period starts - should be the exact time (for instance, for 6:30 it would be 6:30:00)
CONSTRAINT:DB: not null, beginTime <= endTime
endTime
time at which the period ends - should be the exact time - 1 second (for instance, for 7:00 it would be 6:59:59)
CONSTRAINT:DB: not null, beginTime <= endTime
bootmode_idbootmode (FK)
ID of the boot mode associated to the period
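
Because endTime is stored as the exact end time minus 1 second, both bounds of a time slot are inclusive, and the application-enforced non-overlap rule reduces to a simple interval test. The Python sketch below assumes time slots are given as (dayNo, beginTime, endTime) tuples belonging to the same NWTS; the function name is hypothetical.

```python
from datetime import time


def slots_overlap(slot_a, slot_b):
    """Tell whether two (dayNo, beginTime, endTime) slots overlap.

    Both bounds are inclusive, following the 'exact time - 1 second'
    convention used for endTime.
    """
    day_a, begin_a, end_a = slot_a
    day_b, begin_b, end_b = slot_b
    if day_a != day_b:
        return False
    return begin_a <= end_b and begin_b <= end_a
```

With this convention, a slot ending at 6:59:59 and the next one starting at 7:00:00 on the same day do not overlap, whereas storing 7:00:00 as the end of the first slot would be reported as an overlap.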

D.5.11 Table nodeWeeklyTimeSlots

This table name may be shortened to 'schedule' or 'nwts'.

idNodeWeeklyTimeSlot (PK)
id
nwtsName
name of the schedule
CONSTRAINT:DB: not null, length > 0, length < 42, unique
isTemplate (unused)
a boolean indicating whether this is editable by users or if this is an admin table (a template is instantiated when a user edits his own NWTS)
CONSTRAINT:DB: not null
note
optional comments
isuserselectable
indicates whether a user can select this schedule or whether it is reserved for administrators
bootmodes_iddefault [NE]
default bootmode - always use 1 ('Local') - other values are undefined - left for compatibility reasons

D.5.12 Table users

idUser (PK)
id
login
a nickname, optionally based on the user's real name
CONSTRAINT:DB: not null, length > 0, length < 30, UNIQUE
password
the password hashed with the crypt() function (used to authenticate the user)
CONSTRAINT:DB: not null, length > 0, length < 100
firstname
optional fields of information regarding the user
CONSTRAINT:DB: length < 50
lastname
optional fields of information regarding the user
CONSTRAINT:DB: length < 50
email
optional fields of information regarding the user
CONSTRAINT:DB: length < 90
phoneNumber
optional fields of information regarding the user
CONSTRAINT:DB: length < 40
adminrole
indicates whether a user is an administrator or not: 1 for an administrator, 0 for a regular user
active_cm
indicates whether a user is active or not
active_holidays
indicates whether a user wants to use a holiday planning file to compute the ComputeMode schedule

D.5.13 Table version_schema

This table does not have a primary key as it only has one record.

ts
timestamp of the schema

D.5.13.1 IMPORTANT NOTE

Currently, the single row of this table is only used by the update system.

D.6 Joins

D.6.1 Joins list

The tables have the 'simple' join relations below - the list items are only numbered for convenience:

  1. bootimages.idbootimage = bootmodes.bootimages_idbootimage
  2. bootmodes.idbootmode = nodetimeslots.bootmodes_idbootmode
  3. nodes.users_iduser = users.iduser
  4. nodes.nodeweeklytimeslots_idnodeweeklytimeslot =
    nodeweeklytimeslots.idnodeweeklytimeslot
  5. nodes.bootmodes_idbootmode_pxe = bootmodes.idbootmode
  6. exceptions.nodeweeklytimeslots_idnodeweeklytimeslot_in =
    nodeweeklytimeslots.idnodeweeklytimeslot
    Naming exception reason: several exceptions.nodeweeklytimeslots_idnodeweeklytimeslot used in a record
  7. exceptions.nodeweeklytimeslots_idnodeweeklytimeslot_out =
    nodeweeklytimeslots.idnodeweeklytimeslot
    Naming exception reason: several exceptions.nodeweeklytimeslots_idnodeweeklytimeslot used in a record
  8. exceptions.nodes_idnode = nodes.idnode
  9. logevents.nodes_idnode = nodes.idnode

D.6.2 Many-to-many joins tables

The tables storing many-to-many associations use the relations below - the list items are only numbered for convenience:

  1. bootmodes_have_bootparams.bootmodes_idbootmode = bootmodes.idbootmode AND bootmodes_have_bootparams.bootparams_idbootparam = bootparams.idbootparam
  2. nodes_have_labels.nodes_idnode = nodes.idnode AND nodes_have_labels.labels_idlabel = labels.idlabel
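
To show how these relations are used in practice, the Python sketch below combines simple join 3 (nodes to users) with many-to-many join 2 (nodes to labels). It relies on an in-memory SQLite database as a stand-in for PostgreSQL and on a heavily reduced schema: only the columns needed by the joins are created, and the sample rows ('alice', 'ws-demo') are invented for the example.

```python
import sqlite3

# Reduced, hypothetical schema: only the columns needed by the joins.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (iduser INTEGER PRIMARY KEY, login TEXT);
    CREATE TABLE nodes (idnode INTEGER PRIMARY KEY,
                        users_iduser INTEGER, hostname TEXT);
    CREATE TABLE labels (idlabel INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE nodes_have_labels (
        nodes_idnode INTEGER, labels_idlabel INTEGER,
        PRIMARY KEY (nodes_idnode, labels_idlabel));
    INSERT INTO users VALUES (1, 'alice');
    INSERT INTO nodes VALUES (10, 1, 'ws-demo');
    INSERT INTO labels VALUES (1, 'Unconfigured');
    INSERT INTO labels VALUES (2, 'Quarantine');
    INSERT INTO nodes_have_labels VALUES (10, 2);
""")

# Simple join 3 plus many-to-many join 2 from the lists above.
rows = conn.execute("""
    SELECT nodes.hostname, users.login, labels.name
    FROM nodes
    JOIN users ON nodes.users_iduser = users.iduser
    JOIN nodes_have_labels
        ON nodes_have_labels.nodes_idnode = nodes.idnode
    JOIN labels ON nodes_have_labels.labels_idlabel = labels.idlabel
""").fetchall()
```

Each returned row gives a node hostname, the login of its owner and one of the labels attached to the node.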

E. ComputeMode boot process

The figure below (Network communications during the boot process) describes all the network communications occurring when a node boots diskless in ComputeMode.

Figure: Network communications during the boot process
Image cm_network_boot_en

F. Updating Debian ComputeMode Images

F.1 Initrd Images

If you wish to upgrade a network driver or alter the initrd image by yourself:

mkdir /var/www/bootdirectory/images/temp 
cd /var/www/bootdirectory/images/temp 
gzip -dc ../debiaufs | cpio -id

F.2 Tips & Tricks

F.2.1 Configuring the boot message shown at the end of the diskless boot

Just edit the file located at: /cm/debian/patch/var/diskless/boot.msg

You can use ANSI colors and plain text.

F.2.2 Changing the alt-ctl-del behavior

By default, in the shipped computing distribution (in /cm/debian/), hitting 'alt-ctl-del' while in ComputeMode triggers the end behavior specified in the bootmode (halt or reboot, for instance).

For instance, if it is set to 'halt', the node will be quarantined and will stop when the schedule ends or when the user hits 'alt-ctl-del'.

To change this behavior, edit the patched etc/inittab of the distribution and look for the line reading:

ca::ctrlaltdel:/diskless/utils/diskless_exit.sh inhibit
and replace it with something such as:
ca::ctrlaltdel:/diskless/utils/diskless_exit.sh inhibit reboot
On the server, the files are located at:
/cm/debian/patch/etc/inittab
and
/cm/debian/utils/diskless_exit.sh

G. Localization (L10N, I18N)

cmwebadmin has supported both French and English since version 1.2 through a rudimentary translation system.

In this appendix, every path specified is relative to the path where cmwebadmin is installed (by default, /var/www/cm/).

Since version 1.6, the existing system has been overhauled and now supports more standardized .po files (also known as locales) with the following constraints:

Those files are parsed the first time they are used and the parsed copy is stored in templates_c. Each time they have to be used, the file modification time of the parsed copy and the original copy are compared. If the .po file is newer than the parsed copy, the latter is updated.
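
The refresh rule described above boils down to a comparison of file modification times. cmwebadmin itself is written in PHP; the Python sketch below only illustrates the logic, and its function name and arguments are hypothetical.

```python
import os


def parsed_copy_is_stale(po_path, parsed_path):
    """Tell whether the parsed copy must be (re)generated.

    True if the parsed copy does not exist yet (first use) or if the
    .po file is newer than the parsed copy.
    """
    if not os.path.exists(parsed_path):
        return True
    return os.path.getmtime(po_path) > os.path.getmtime(parsed_path)
```

A practical consequence is that editing a .po file is enough to have it taken into account: its modification time changes, so the parsed copy is rebuilt on the next use.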

For information, the parsed files are stored as serialized PHP objects whose names look like:

./templates_c/l10n.<language code on 2 characters>_<country code on 2 characters>.ser
To add a new translation, just add a file in the ./locale/ directory. If some translations are missing, the en_US flavor (defined in config.php) will be used, then the keyword (msgid) itself if nothing else is found.
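
The fallback order for missing translations (requested locale, then en_US, then the msgid itself) can be sketched as follows. This is a Python illustration of the logic only, not the PHP code of cmwebadmin; the function name, the dictionary-based catalogs and the sample strings are assumptions.

```python
def translate(msgid, catalogs, language, default_language="en_US"):
    """Look up msgid in the requested locale, then in the default one.

    catalogs maps a locale code (e.g. 'fr_FR') to a {msgid: msgstr} dict.
    An empty msgstr is treated as a missing translation.
    """
    for lang in (language, default_language):
        text = catalogs.get(lang, {}).get(msgid)
        if text:
            return text
    # nothing found anywhere: fall back to the keyword itself
    return msgid
```

With this order, a partially translated locale still produces a usable interface, since untranslated entries show up in the default language instead of being left empty.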

About this document ...


This document was generated using the LaTeX2HTML translator Version 2008 (1.71)

Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999, Ross Moore, Mathematics Department, Macquarie University, Sydney.

The command line arguments were:
latex2html -dir ./build -split 0 -show_section_numbers -html_version 3.2 -no_navigation manual.tex

The translation was initiated by genevois on 2012-02-16

