
ComputeMode™ 1.2 Manual

Contents

1. Welcome to ComputeMode
2. ComputeMode overview
3. Installing ComputeMode
3.1. Requirements
3.2. Downloading ComputeMode
3.3. Server installation steps
3.4. Registering Processing Nodes
3.5. Your first Grid Computation
4. ComputeMode Management
4.1. Server configuration
4.2. Managing the grid
4.3. Managing applications
4.4. Adding new Boot Modes
5. Managing computational Jobs with OAR
5.1. OAR introduction
5.2. Login and authentication
5.3. Submitting a Job
5.4. Monitoring and troubleshooting Jobs
5.5. Removing a Job
5.6. OAR administration
6. Frequently Asked Questions (FAQ)
6.1. General
6.2. Installation
6.3. Security


1. Welcome to ComputeMode

ComputeMode is an Icatis product that builds or extends an enterprise Computing Grid by aggregating unused computing resources. For instance, an existing cluster can be boosted with employees' PCs while they are idle, smoothly providing extra performance during peak computing loads. Similarly, a university's classroom PCs can be turned into a cluster that provides a large computational facility when the machines are not in use.

In its standard installation mode, ComputeMode comes as a software suite installed on a server on the user's premises. This server will handle a set of PCs from the local network, manage their transition from their standard (Windows) usage mode to a (Linux) computing mode, and offer their available power through a Batch Management System. The ComputeMode Server provides a Web-based management interface.

Offering easy deployment and administration, security, and reactivity, ComputeMode helps computationally intensive fields such as Engineering, Life and Chemical Sciences, Research and Education, Oil and Gas, and Weather and Climate meet their processing demands in a reactive and efficient way.
In advanced configurations, ComputeMode can be connected to an existing cluster through Platform LSF, OpenPBS or Sun Grid Engine, monitoring cluster activity and automatically providing extra resources upon peak usage loads.

ComputeMode builds on extensive experience in High Performance Computing, cluster programming, and intensive data movement. The technology is based on the Icatis team's work with the Informatique et Distribution Research Lab, a French public laboratory affiliated with CNRS, INPG, INRIA, and UJF. We have worked on distributed (super)computing and high performance computing projects such as Ka-Tools, Mandrake CLIC, TopTools, I-Cluster, NFSp, and a mainstream TOP500 cluster, and ComputeMode benefits from this whole body of expertise.

ComputeMode is a trademark of Icatis SAS.

[Back to the table of contents]

2. ComputeMode overview

ComputeMode relies on a ComputeMode Server that keeps track of the availability of dedicated PCs on the local network. Each PC Owner has a weekly availability Schedule, identifying the periods when he or she is not using the PC. For instance, a PC Owner can declare working periods from 8:00 to 18:00, Monday to Friday.

In the remainder of this document, we will refer to PCs handled by ComputeMode as Processing Nodes.

The ComputeMode Administrator can easily manage the computing PCs through a Web-based interface, accessed through the ComputeMode Server.

A Grid User can submit computational Jobs to the system through a classical Batch Manager (also called a "Job Management System", "Distributed Resource Manager", or "Queuing System"). The Grid User can log on to the Batch Manager from any machine on the local network (usually through the "ssh" secure shell). The Batch Manager will then reserve appropriate resources for the computations, Schedule the execution of the Jobs depending on the overall load on the Computing Grid, and allocate Jobs to available computing resources.

The OAR Batch Manager is installed by default with ComputeMode, even though other products such as Platform LSF, OpenPBS or Sun Grid Engine are supported by ComputeMode.

ComputeMode can monitor the load on the Batch Manager, and detect overloads. In such cases it can allocate available Processing Nodes to computational tasks.

Each Processing Node has two different operating modes:

- The User Mode, in which the machine works in its standard way, under Microsoft Windows for instance. The Owner of the machine will not even notice that his or her PC is managed as a ComputeMode Processing Node. In particular, the computational resources of the PC are not used while in User Mode.
- The Computation Mode (this is where the name of our product comes from), which is activated during the time periods when the PC is declared "idle". If the ComputeMode Server detects a computational peak, the PC can be remotely switched to Computation Mode. The switch from User Mode is done through an automatic reboot of the machine, which then proceeds to a remote boot. The remote boot is handled by the ComputeMode Server using the "Preboot eXecution Environment" (PXE) protocol, which has been natively available in PC BIOSes since 1999. While in Computation Mode, the machine runs the Linux Operating System and has no access to any local hard disk.

When the ComputeMode Server detects that some given Processing Node is no longer available for computation, it restores the machine back to its User Mode, so that the Owner will not even notice that his or her PC has been used by ComputeMode.

The submitted Jobs can take advantage of the NFS distributed file system, which is made available by the ComputeMode Server. Each Grid User has his or her own private directory and can use it for the data required by the computational Jobs, as well as to retrieve the produced output files.

A specific case arises if a PC Owner comes back and needs his or her PC while a computational Job is being processed. This can happen, for instance, if the Owner has an urgent need and comes back at night to work on the PC. In such a situation, the Owner always has priority over the machine. Just by using the keyboard, the Owner can abort any ongoing computational activity on the PC, and ComputeMode will restore the machine to its User Mode in about one minute.

[Back to the table of contents]

3. Installing ComputeMode

3.1. Requirements

ComputeMode is a Grid Computing product. In order to install or evaluate it, you will need to dedicate a PC as the ComputeMode Server. Also, some processing PCs must be available on the local network, in order for you to register them to ComputeMode and to use them as Processing Nodes.

Note that you do not need any Linux experience to install ComputeMode. If you are an experienced Linux administrator, you can experiment with our Linux configuration.

We place the following requirements on the infrastructure that you will use:

*** Server requirements

The server should be a reasonably fast machine, dedicated to ComputeMode. Its minimum specifications are:
- Pentium-class processor, 1 GHz minimum frequency.
- 256 MB of RAM.
- 100 Mbps Ethernet (any brand will fit).
- 4 GB Hard Disk Drive. SCSI and IDE drives are supported equally. Note that the HDD will be repartitioned and reformatted during the ComputeMode installation; hence the full contents of the hard disk will be lost.

*** Processing Nodes requirements

Processing Nodes will be remote booted when required by the ComputeMode Server. This process involves running a diskless Linux (through a network boot) on each of these PCs. Few requirements are placed on these PCs, but of course the faster the machines, the better the computing performance.
- Pentium-class processor.
- 256 MB of RAM.
- 100 Mbps Ethernet (Remote Wake-Up capability is recommended).
- PXE compliance. This feature makes the PC capable of network booting. PXE is now a commonplace technology, available on all recent PCs with integrated LAN. It has been integrated into major vendors' corporate PCs since 1999 (required by the PC98 and PC99 recommendations from Microsoft and by Intel's Wired for Management initiative).
- PXE should be activated on each Processing Node through its BIOS setup. Network boot is usually enabled by default by PC manufacturers, but you should check that each PC is properly PXE-activated. The PXE/network boot must be configured as the first boot device in order to take full advantage of the ComputeMode boot mechanism.

*** Infrastructure requirements

ComputeMode only requires a TCP/IP local network. The following constraints should however be met:
- All ComputeMode nodes should be within the same broadcast domain. For instance, having all the machines connected to the same switch, sharing the same subnet or VLAN, is sufficient. This is because ComputeMode relies on the DHCP/PXE protocol, which requires the server to be able to receive Ethernet frames broadcast on the LAN by any of the Processing Nodes.
- PXE should *not* already be in use on your local network. Corporations sometimes use PXE to deploy new Operating System images to networked PCs. If PXE is already in use on your network, the PXE service of the ComputeMode Server will conflict with the corporate one. Check with your network administrator whether PXE is used on your network. If it is, Icatis may provide advanced configurations of the PXE environment adapted to your local needs, using for instance an advanced PXE bootstrap chaining mechanism.

Tip: To check whether PXE is in use on your premises, boot a PC and let it proceed, at the end of its BIOS bootstrap, into the PXE network boot (with the ComputeMode Server disconnected, of course). If the PC exits without finding any PXE server, then chances are that no PXE server is present on your local network.

*** Access Node requirements

Some machines on the network will be used to access the ComputeMode system. For instance, you will need to access the ComputeMode Web interface available through the ComputeMode Server. You will also need to access the system to submit computational Jobs and retrieve their results. These operations can be done from any PC on the network (including Processing Nodes). However, the ComputeMode Server itself cannot be used for this purpose, as no graphical interface is installed (a text-based management console is the only local access to the ComputeMode Server).
Requirements on the access nodes are:
- Web browser availability (to access the ComputeMode Server through its Web-based interface).
- An "ssh" facility, which will be used to logon to the Batch Manager and submit Jobs to ComputeMode. Also ssh can be used for remotely administer the ComputeMode Server (for experienced Linux administrators only).

[Back to the table of contents]

3.2. Downloading ComputeMode

To install the ComputeMode Server, you will need the ComputeMode Server installation CD-ROM. You can obtain it from the Internet at http://www.computemode.org/.
The ComputeMode CD-ROM is distributed as an .iso file of several hundred MB; make sure that your Internet connection can handle such a large download. ComputeMode™ is distributed under the GNU General Public License (GPL). Once the download is finished, burn the .iso image to a blank CD-ROM.

If you do not want to download the ComputeMode CD-ROM from the Internet, please contact us at http://www.icatis.com/ and ask for a physical CD-ROM to be mailed to your address.

[Back to the table of contents]

3.3. Server installation steps

Once you have ensured that the requirements above are met, you just have to obtain the following information from your network administrator:
- IP address for the ComputeMode Server (static addressing is required). For instance, "143.23.42.21"
- Subnet mask. For instance, "255.255.255.0"
- Gateway. For instance, "143.23.42.253"
- DNS server. For instance, "143.23.42.128"

You can then proceed to the installation of ComputeMode following these steps:

1. Switch your ComputeMode Server on, with the ComputeMode CD-ROM inserted.
2. Follow the straightforward installation steps.
3. When requested by the installer, enter the ComputeMode Server network parameters.
4. Once the installation is finished, go to your own machine (which will be used as an Access Node), open your Web browser and navigate to the ComputeMode Server address, prefixed with "http://". For example, open the "http://143.23.42.21/" Web page. This is the ComputeMode Server administration page.
5. The default username and password for the ComputeMode Server administration page are "admin" and "icatis" (without the quotes). Note that you can change the default password by editing the "admin" account: click on "Accounts" in the left menu bar, then choose the "admin" login. Type the new password in the dedicated field and press "Update".

Please refer to section 4 below, "ComputeMode Management", for details about the ComputeMode management interface and its capabilities.

[Back to the table of contents]

3.4. Registering Processing Nodes

To register machines to ComputeMode, there are two possibilities:

- From the administration interface, go to "Nodes", then click the "Add Node" action button. Enter the hostname and the MAC address (which you must know) of the host, and select the "Always CM" node Schedule (the node will then always boot in Computation Mode, unless the user cancels the PXE boot when starting his/her machine). Click "Add" and you're done.
- Alternatively, ComputeMode automatically adds each PC from the local network that enters the PXE bootstrap. Each such machine then has to be validated in the ComputeMode Administration interface to be taken into account by the system. This means that if you reboot any machine on the network, it will appear in the administration interface on the "Nodes" page.

*** Activating the machines

Machines added to the ComputeMode system are not handled by the system by default: they keep using the "Local" Boot Mode. You have to select an appropriate Boot Mode for a machine to actually start in Computation Mode.
For instance, we suggest that you select the "debian-oar" Boot Mode (provided with the standard ComputeMode installation) for all your Processing Nodes.

Once this is done, upon each reboot the managed PCs will be handled by the ComputeMode Server, which will trigger either a local boot into User Mode or a network boot into Computation Mode, depending on each PC's Schedule and on the current time.

[Back to the table of contents]

3.5. Your first Grid Computation

In order to provide an example of a computational Job, a full application is included with ComputeMode to demonstrate its operation. This demonstration is a parallel execution of the POV-Ray ray tracing engine. The demonstration is available through a standard Web browser: just point it to http://cmserver/pov/ where "cmserver" stands for your actual ComputeMode Server IP address or hostname. This will bring you to the Job submission interface.
The demonstration computes images from ray tracing scripts. Each image is split into several sub-images, and each sub-image is submitted for computation on a Computing Node. The computational capabilities of the available Computing Nodes combine to provide an extensible computing capacity. Obviously, the more machines are available in Computation Mode, the faster the Jobs will execute.

To submit a POV-Ray Job, just click on the appropriate hyperlink and wait until it starts.

Note: Currently, Icatis' parallel POV-Ray demonstration does not provide any way to stop running Jobs. This means that if several POV-Ray Jobs are submitted in parallel, they will be queued and processed in the order chosen by the Batch Manager, which is likely to be First Come First Served (FCFS, aka FIFO). So if two POV-Ray Jobs are submitted at the same time, the second one will start only after the first one finishes.

The bottom of the page displays a graph showing the number of tasks done, as well as the number and names of the Computing Nodes used. You can experiment by starting/stopping Computation Mode on some machines through the ComputeMode Management Interface and watch how the execution speeds up with the number of nodes.

[Back to the table of contents]

4. ComputeMode Management

4.1. Server configuration

The ComputeMode Server uses the parameters provided during the installation process, such as the server's IP address, to generate its network configuration. PXE, DHCP and other services are automatically configured and should work on the network without generating any conflict.

For advanced administration requirements, you can change the configuration parameters through the server console. In order to do this, you must log in as "root" on the server, edit the configuration files you want to change, and then restart the affected services. Note that this process should be reserved for experienced Linux administrators only. Since the default password for the root account is "icatis", we strongly advise you to change it.
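
For instance, such a session might look like this (a sketch only; the configuration file and service names below are hypothetical examples):

ssh root@143.23.42.21              # log on to the ComputeMode Server (example address from section 3.3)
vi /etc/dhcp3/dhcpd.conf           # edit the relevant configuration file (here: a DHCP server example)
/etc/init.d/dhcp3-server restart   # restart the affected service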

[Back to the table of contents]

4.2. Managing the grid

All grid management features are available through the Web interface.

*** Adding, removing and editing machines

From the administration interface, go to "Nodes", then click the "Add Node" action button. Enter the hostname and the MAC address (which you must know) of the host, and select the "Always CM" node Schedule (the node will then always boot in Computation Mode, unless the user cancels the PXE boot when starting his/her machine). Click "Add" and you're done.

Note that ComputeMode will automatically discover and add each PC from the local network that enters the PXE bootstrap. All such machines have to be validated through the ComputeMode Administration interface to be taken into account by the system. This means that any machine booting from the network will appear in the administration interface, on the "Nodes" page.

You can edit an existing machine's parameters by clicking on its hostname in the "Nodes" management page.

You can remove one or several machines from the nodes list by selecting them in the "Nodes" management page (check the checkbox right before the hostname) and then clicking on the "Delete nodes" button.

*** Schedule management

Each Processing Node is associated with a weekly Schedule in ComputeMode, which indicates at which times the Node is reserved for its user, and when it is available for computation.

Schedules are currently handled as templates in ComputeMode.
You can display the Schedules list by clicking on "Computing schedules" in the left bar menu.

To create a new Schedule, select an existing Schedule first, then copy it by clicking on "Copy computing schedule". You should see a new Schedule in the list, named "Copy of xxx" where "xxx" is the name of the original Schedule.
To edit this Schedule, click on its name. A new page is displayed where you can edit the Schedule name and its assigned time slots.
The time slots let you set the periods when the Processing Nodes are available for Computation Mode:
- A checked time slot means that the Processing Node is available for Computation Mode.
- An unchecked time slot means that the Processing Node should be in User Mode.

Note that the Processing Nodes will be switched to Computation Mode only if they fulfill the following requirements:
- The Node is available for Computation Mode according to its Schedule.
- The Load Monitor detected that more Computation Nodes are required.
- The Node has been shut down or idle for more than 30 minutes.

*** Manual startup/shutdown of a Processing Node

You can force one or several Processing Nodes to switch to Computation Mode by checking the checkboxes related to the Nodes, then clicking on the "Wake-up test" button. If the client is shut down, this will wake it up and switch it to Computation Mode for one hour.
You can stop the Wake-up Test at any time by selecting the appropriate Nodes, then clicking on the "Stop test" button. Note that it takes about one minute to shut down a client through this method.

[Back to the table of contents]

4.3. Managing Applications

ComputeMode provides a centralized, distribution-independent system for the Processing Nodes. As a result, deploying (adding or upgrading) an application on the nodes is easily achieved by modifying the network boot system repository on the ComputeMode Server.

Note: Deploying an application on ComputeMode should be performed by experienced Linux administrators only.


The file system of the Processing Nodes is shared through the NFS distributed file system facility. It can be seen on the ComputeMode Server in the /cm directory (log in as root on the ComputeMode Server through ssh to access it). You may find several directories here if several boot images are available. The default ComputeMode installation offers a Debian-based distribution, so you should go to the /cm/debian directory.

In this directory you will find subdirectories 'orig', 'patch', 'utils', and a 'rules' file. Let's review each of these items:

* /cm/debian/orig contains a golden system image, built by copying all files from a fresh Linux installation (any Linux distribution can be used). It is not modified for network boot. You should manage this directory using the distribution specific packaging system.

* /cm/debian/patch contains the data (configuration files or replacement binaries) required to transform the 'orig' base Linux system into a ComputeMode network boot system. As the 'patch' name suggests, we use a mechanism similar to the one used on source files to distribute small modifications.

* /cm/debian/rules is a script describing how the 'patch' elements are applied over 'orig' upon each Processing Node startup. More precisely, it describes the copy and link operations to be applied, using the copypatch, copyorig, linkpatch, linkorig and skip commands.

* /cm/debian/utils contains some ComputeMode-specific scripts required to perform the network boot specific operations on each Processing Node that enters Computation Mode. The buildroot.sh script handles the 'rules' processing, whereas other commands will perform specific operations while starting, registering or stopping Nodes.


As a result, deploying an application to a given Boot Mode is achieved by adding the necessary files to the 'orig' directory (possibly using the distribution's packaging facility), working inside a "chroot" environment so as to have a consistent system view. You then have to check that the installed files are not overridden by the 'patch' files; such conflicts must be resolved manually in the 'rules' script.
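
For instance, installing an extra Debian package into the golden image might look like this (a sketch only; the package name is purely illustrative):

ssh root@cmserver                  # log on to the ComputeMode Server
chroot /cm/debian/orig /bin/bash   # enter the golden image with a consistent system view
apt-get update                     # refresh the Debian package lists
apt-get install povray             # illustrative package name
exit                               # leave the chroot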

Each Processing Node will benefit from the new software installation automatically upon next boot.

[Back to the table of contents]

4.4. Adding new Boot Modes

The Nodes' network boot images are handled by the ComputeMode Server through "Boot Modes". A Boot Mode includes the files necessary for the network boot, as well as the parameters required by the client machines.
Changing or creating Boot Modes is reserved to experienced Linux administrators only.

Creating a Boot Mode involves the creation of the appropriate Linux kernel and initrd files (which will be placed in the /var/lib/tftpboot/PXEClient directory on the Server), as well as the NFS Server repository for 'orig' and 'patch' files. Once this is done, you have to register the newly created Boot Mode in the ComputeMode system.
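
For instance, copying a newly built kernel and initrd onto the Server might look like this (the file names are hypothetical):

scp vmlinuz-cm initrd-cm.img root@cmserver:/var/lib/tftpboot/PXEClient/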

To register a new Boot Mode, click on the "Add ComputeMode bootmode" button in "Boot modes":
- Give the boot mode a name and an associated boot image.
- Click on the "Add" button.

You can then set the boot mode parameters:
- Choose a parameter in the boot parameters list. If JavaScript is enabled in your Web browser, the default value for this parameter is displayed in the "Value" field (it is left blank if no default value is associated with this parameter). The comment field displays contextual help for the parameter.
- Set the correct "Value" field for the parameter.
- Click on "Update parameters" to add the parameter to the Boot Mode.
- To remove a parameter, check the "Remove" checkbox associated with it in the parameters list and click on the "Update parameters" button.
- Click on the "Save changes" button in the first part of the form.

The new Boot Mode is then created and available for association to Processing Nodes. You can assign it to the appropriate Nodes through the "Nodes" menu.

[Back to the table of contents]

5. Managing computational Jobs with OAR

5.1. OAR introduction

In order to handle large computations (we will call these "Jobs" in the remainder of this document), processing is often split into individual work units (which we will call "Tasks") to be distributed to several different machines. Each Task can then be processed by a processor, so the overall Job can benefit from a large number of Processing Nodes and be done in a shorter time. Distributed processing brings other benefits, such as the ability to process large amounts of data that would not fit in a single machine's memory, or the capability to "queue" some Jobs so that they are executed later, when the machines necessary for processing become available.

Parallelizing Jobs into several Tasks is beyond the scope of this document. If you need help parallelizing your applications, contact the Icatis team, which will provide advice for your specific case.

The OAR Batch Manager handles Job queuing and processor allocation, and implements advanced scheduling policies. You can control Job priorities so that execution provides the most benefit to your organization (usually you will seek faster execution, cheaper execution, or deadline-matching policies, although more advanced uses are possible).

OAR uses a processor-oriented allocation paradigm: while some Batch Managers manage machines by the available RAM on a given node and allocate part of the memory to each Job, OAR does not share processors between Jobs. Instead, a machine is allocated as a whole to the submitted Job.

[Back to the table of contents]

5.2. Login and authentication

To submit Jobs using the OAR Batch Manager, a secure shell (ssh) session must be opened on the ComputeMode Server. Note that the ComputeMode Server is the *only* submission host, i.e. all Job management commands must be issued through it.
A user account on the ComputeMode Server is required to log in. Usage of the OAR commands is restricted to members of the "oar" UNIX group.

This means that to create Job management accounts, you must first log on to the ComputeMode Server as root, then create the new user account using the "adduser" command (e.g. `adduser alice'), and finally add the user to the "oar" group using adduser again (e.g. `adduser alice oar').
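
For instance, the whole sequence looks like this (using a hypothetical user name "alice"):

ssh root@cmserver      # log on to the ComputeMode Server as root
adduser alice          # create the user account
adduser alice oar      # allow the account to use the OAR commands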

Your new account is now available on the ComputeMode Server.

You can log on to the ComputeMode Server using your account and use the oar commands as described below.

*** Data access for Jobs input/output

There are two approaches to data transfers on clusters: with or without a data server. A typical data server setup relies on NFS, which is the recommended setting for ComputeMode operation.

Another possibility is to use stage-in and stage-out scripts that move data to the computing nodes at the beginning of each Task, and move the data back to a central repository once the Task execution finishes. Note that with ComputeMode, no local storage is available except a RAM disk, which limits this mechanism to Jobs with small data sets.

[Back to the table of contents]

5.3. Submitting a Job

The "oarsub" command allows submitting new Jobs to the system:

oarsub [Options ...] /path/to/job

Options:
-l walltime=hh:mm:ss gives the maximum execution time during which the
resources are allocated. After the time period
expires the Job is automatically killed.
-p hostname=host places a requirement on the execution node.
-l nodes=n places a requirement on the number of nodes to be
allocated. Note that a bi-processor is considered as
a single node, hence both processors will be
allocated to the Job (also see weight below).
-l weight=n requires n processors per allocated node. Usually
n will be 1 or 2.
-q queuename specifies the queue to be used for this Job.
-I Interactive mode. This provides you with a shell
instead of executing a script.
-v verbose mode.

The parameters can be entered on the command line, or in the script itself.

Environment variables:
$PWD Current directory (where the command was run).
$OAR_FILE_NODES Name of the file containing the allocated node names.
$OAR_JOBID OAR Job id.
$OAR_USER User name.
$OAR_NB_NODES Number of allocated nodes.

Note that oarsub does not *execute* but rather *queues* the request for later execution, when processors are available and meet the required conditions.
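
As a quick illustration, a script (hypothetical path) can be submitted directly from the command line, here on two nodes for at most two hours:

oarsub -l nodes=2 -l walltime=02:00:00 /home/nis/arnold/run.sh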

Examples:

Example 1, script with a sequential Job.

#OAR -l weight=1 #Reservation of a single processor on a single node
#OAR -l walltime=72:00:00 #Reservation of the node for 72 hours max.
cp /home/nis/arnold/code/* /scratch/arnold/code/ #Copy the input files locally
/scratch/arnold/code/compute #Job execution (hypothetical 'compute' binary)
cp /scratch/arnold/code/*.res /home/nis/arnold/results/ #Copy the results back home

Example 2, script with a parallel Job.

#OAR -l nodes=8 #Reservation of 8 nodes
#OAR -l walltime=144:00:00 #Reservation for 144 hours max
cd $PWD #Go to the current directory (where 'oarsub' was launched)
lamboot $OAR_FILE_NODES #Start the LAM/MPI parallel engine on the allocated nodes
mpirun -np $OAR_NB_NODES /home/nis/arnold/tests/code_parallel >res #Start the tasks
lamclean #Remove leftover application processes
wipe #Shut down the LAM engine

[Back to the table of contents]

5.4. Monitoring and troubleshooting Jobs

Two command line tools are available to monitor OAR: "oarnodes" and "oarstat".

oarnodes provides information about the Processing Nodes (state, which Jobs are executing, Node properties, ...).

oarstat displays the status of the current Jobs, with the following options:

-f prints each Job's details
-a prints more details and keeps the table format
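
For instance, a quick monitoring session on the ComputeMode Server could be:

oarnodes      # list the Processing Nodes, their state and properties
oarstat       # list the current Jobs, one line each
oarstat -f    # print full details for each Job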


Two higher-level interfaces are also available: "Monika" and "DrawOARGantt". These analysis tools are available as Web pages at http://cmserver/oar (replace "cmserver" with your actual ComputeMode Server IP address or hostname).

Monika displays a snapshot of the current OAR status, showing Node occupation and running/pending Jobs.

DrawOARGantt displays a Gantt diagram of the Node reservations.

[Back to the table of contents]

5.5. Removing a Job

To remove a Job from the OAR queue, just use the following command:

oardel jobid


where jobid is the id of the Job, as allocated by the oarsub command.
If the Job is currently running on a Processing Node, it will be killed.

[Back to the table of contents]

5.6. OAR administration

ComputeMode installs OAR in a configuration that is maintained automatically. For example, nodes are automatically registered and unregistered in the OAR database (actually set to Alive or Absent, respectively; see the oarnodes output) as they enter or leave ComputeMode.
In standard environments, this default OAR configuration should satisfy most user needs.

Note that OAR has advanced features such as multiple queue support, which are not documented in this simplified manual. Please refer to http://oar.imag.fr/ for advanced information about OAR configuration.

[Back to the table of contents]

6. Frequently Asked Questions (FAQ)

6.1. General

Q: Which architectures/operating systems are supported by ComputeMode?
A: ComputeMode supports the PC architecture: IA32, including SMP. In addition, some tests have been carried out successfully on x86-64 (AMD Opteron). As ComputeMode is based on a reboot mechanism, switching from any operating system is supported.

Q: Which network cards are supported by ComputeMode?
A: ComputeMode supports any network card compliant with the PXE standard and supported by Linux, which means most network cards nowadays.

Q: Which Linux distributions are compatible with or available for ComputeMode?
A: Thanks to ComputeMode's advanced diskless system mechanism, any Linux distribution is compatible with ComputeMode. As of now, the Debian, Mandrake and Fedora Linux distributions have been "converted" for use in ComputeMode. A Debian Testing/Sarge distribution is provided with the ComputeMode Server installation CD-ROM. More distributions will be made available in the download section. Feel free to ask on the user@computemode.org mailing list if you are interested in converting a particular distribution. Commercial support is available by contacting us.

Q: Which HPC/MPI/... library versions are available in ComputeMode?
A: Only a bare Debian Sarge Linux distribution is provided on the ComputeMode CD-ROM. No special libraries are preinstalled, as they can easily be installed on demand using the Debian packaging mechanism (see the "Managing Applications" section). Please ask on the user@computemode.org mailing list if you have trouble doing so. Feel free to contact Icatis if you need commercial support.

[Back to the table of contents]

6.2. Installation

Q: ComputeMode uses the 192.168.128.0/24 private IP network, while living on the same physical network as the rest of our LAN/intranet. Why?
A: ComputeMode uses the private 192.168.128.0/24 IP network in order to separate ComputeMode network communications from the others; this way, broadcasts do not interfere, for instance. Note that the ComputeMode Server may have two IP addresses (using the eth0:0 alias), so that it can be reached from both the private network and your intranet.

Q: The ComputeMode Server installation requires a dedicated machine. May I nevertheless install the ComputeMode Server on a dual-boot machine, or on an existing Linux system?
A: No, because the ComputeMode Server depends on several software utilities which require specific configuration: DHCP and DNS servers with DDNS mechanisms, a TFTP server linked with a database management system, etc. However, you may install a ComputeMode Server in VMware if you wish (we do this here for development purposes). If you want to do so, be sure to use VMware's bridged network interface so that other machines on your network can reach your virtual ComputeMode Server. You may ask for help on the user@computemode.org mailing list if needed. Furthermore, future versions of the installer may offer to install the ComputeMode Server in a multiboot fashion if this turns out to be really useful and requested.

Q: Is it possible to use an external NFS server to host the NFS-based diskless systems?
A: Yes, any NFS server that can export file systems with the "read-only" and "no root squash" options can be used. Once the files (from the /cm directory, for instance) are copied, just change the NFSROOT value of the Boot Mode in the Web administration interface.
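For instance, the corresponding /etc/exports entry on the external NFS server might look like this (a sketch; adapt the path and client network to your setup):

/cm   192.168.128.0/24(ro,no_root_squash)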

Q: Is it possible to use external NIS and NFS servers to host ComputeMode user accounts?
A: Yes, but this requires some customization of the node system; see the "Managing Applications" section. Please ask on the user@computemode.org mailing list if you want to do so, or contact Icatis for commercial support.

Q: Is it possible to install ComputeMode if a PXE server is already running?
A: Basically, the answer is no, but Icatis may provide advanced configurations of the PXE environment adapted to your local needs, using for instance an advanced PXE bootstrap chaining mechanism.

Q: Can I configure an X server on the ComputeMode Server, in order to access the Web administration interface locally, for instance?
A: Yes. Actually, an X server is already installed, with a minimalistic window manager and Mozilla Firefox. However, to use it, you first need to reconfigure the package to support your hardware, using the `dpkg-reconfigure xserver-xfree86' command.

[Back to the table of contents]

6.3. Security

Q: Is it possible to log in as root on the nodes?
A: For security reasons, root login on the nodes is disabled from the console and by any other means, except via SSH from the root account of the ComputeMode Server, using private-key-based strong authentication.

Q: What are the passwords to log on to the console of the ComputeMode Server or to the Web administration interface?
A: The password for the root account of the ComputeMode Server is "icatis". You may use it to log in on a virtual console or via SSH. BEWARE: AS THIS IS A DEFAULT VALUE, YOU ARE STRONGLY ADVISED TO CHANGE IT, using the `passwd' command once logged in. The login/password for the Web administration interface is "admin"/"icatis". YOU ARE ALSO ADVISED TO CHANGE THIS PASSWORD, using the accounts page.

Q: Will ComputeMode users have access to the data stored on my hard drive when my PC becomes a computation node?
A: Since root access to the nodes is very restricted (only remote login from the ComputeMode Server using the SSH strong authentication protocol is allowed), harmful operations are prevented by the Linux security mechanisms. For instance, mounting or accessing hard-drive devices is prevented. Furthermore, a specific Linux kernel with advanced security restrictions may be set up, although this is not included in the standard ComputeMode installation. Please contact the user@computemode.org mailing list for further information. Support is also available through Icatis.

Q: What will happen if a ComputeMode Server crashes?
A: ComputeMode is based on solid tools and design, and crashes are unlikely. If one ever happens, standard Linux recovery tools can be used to repair the system. While the server is down or unstable, all Processing Nodes will automatically exit Computation Mode and go back to plain User Mode.

[Back to the table of contents]
© 2004-2007 Icatis