Todo
Documentation/Wiki
- find a better screenshot for the main page
- go on merging info from the old site
- explain DHCP/iPXE/DNS and the network configuration available from the appliances
CM on Debian Squeeze
- Server: update the doc for installing CM from etch to lenny (or even squeeze, since it is nearly released as the new stable) => Doc for Debian Squeeze: OK
- OAR 2 installation doc => Installation of OAR 2.5: OK
- Update the node system install doc wrt OAR2 => The same here
- The message to launch the cmaskconfig command should appear after the login => The script is automatically launched after the login
ComputeMode
PXE smart boot
- test replacement of pxelinux.0 by gpxelinux.0 => We are now using the ipxelinux.0 file
- iPXE => OK, running on ComputeMode 2.0
- provide an HTTP CGI for the ipxelinux.0 config file alongside the TFTP CGI-like file fetching mechanism => Using iPXE, we provide an HTTP script to launch the "deployment"
- Check the gPXE system again => The boot process systematically freezes with VirtualBox...
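To illustrate the HTTP script item in the list above, here is a minimal sketch of how a per-node iPXE script could be generated. The function name, the `base_url` value and the file names are hypothetical and are not taken from ComputeMode; only the iPXE commands themselves (`kernel`, `initrd`, `boot`, `exit`) are standard.

```python
def ipxe_script(mode, base_url="http://cmserver.example/boot"):
    """Return the text of an iPXE script for one node.

    mode: "computemode" boots the network image; any other value
    falls back to the local disk.
    base_url: hypothetical location of the kernel and initrd files.
    """
    lines = ["#!ipxe"]
    if mode == "computemode":
        lines.append(f"kernel {base_url}/vmlinuz")
        lines.append(f"initrd {base_url}/initrd.img")
        lines.append("boot")
    else:
        # 'exit' leaves iPXE so that the firmware can try the next boot device.
        lines.append("exit")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(ipxe_script("computemode"), end="")
```

A CGI or any small HTTP handler could return this text as the response body, choosing `mode` from whatever the server knows about the requesting node.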
Initrd
- update to 2.6 => OK, the new image is running under Debian Squeeze
- use udev
- replace initrd by initramfs => OK
- keep an eye on union fs wrt NFS
- unionfs/aufs/mini_fo => The initrd now mounts an AUFS system
- http://lwn.net/Articles/355351/
- http://fedoraproject.org/wiki/StatelessLinux/HOWTO -> iscsi
- dhclient must use the "computemode" option in the dhclient.conf file when querying the DHCP server => OK, it's working
- Re-implement the available options in the initrd: DDNS, DEBUG, DHCPNOKILL, DHCPREJECT... check against the old initrd from the sarge system
- The initrd must check several network interfaces, not only eth0
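For the last item above (checking several network interfaces instead of only eth0), the selection logic could look like the following sketch. It is written in Python for readability, although an initrd would more likely use shell; the function names are hypothetical, and the approach (reading the `carrier` file under `/sys/class/net` on Linux and trying linked interfaces first) is an assumption, not the method used by ComputeMode.

```python
import os


def read_link_state(sysfs_net="/sys/class/net"):
    """Map each interface name to True if the kernel reports a link."""
    state = {}
    for name in os.listdir(sysfs_net):
        try:
            with open(os.path.join(sysfs_net, name, "carrier")) as f:
                state[name] = f.read().strip() == "1"
        except OSError:
            # Reading 'carrier' fails when the interface is down.
            state[name] = False
    return state


def order_interfaces(link_state):
    """Return interfaces to try for DHCP: linked ones first, loopback skipped."""
    names = [n for n in link_state if n != "lo"]
    return sorted(names, key=lambda n: (not link_state[n], n))


if __name__ == "__main__":
    print(order_interfaces(read_link_state()))
```

The DHCP client would then be tried on each interface in that order until one obtains a lease.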
ComputeMode Server
- Fix the MRTG configuration to plot graphs (or change the system to use Ganglia)
- The watchdog does not work when there is a failure on NFS: with the AUFS system mounted over NFS, the watchdog can no longer reboot the node. Some "utils" commands should be mounted separately so that they remain available to reboot the node!
- Fix the /etc/resolv.conf file to search the computemode domain name
- The DHCP option number must be increased to conform to the new RFC!
- The start time for POV demos should be the launch of the first command, not the arrival of the first results
- RINSE: should be tested to build RPM distributions for kameleon (debootstrap)
- The size of the ComputeMode server should be increased too
- Check that wake-on-LAN is working => OK
- OAR must wake nodes via wake-on-LAN when jobs need them, for power saving => OK, it's working
- OAR must stop nodes 10 min before the end of the ComputeMode service => OK, the admin has to configure the exit command to halt
- Check that we can use a remote NIS server to authenticate users
- OAR has to communicate with other OAR instances from other clusters
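As background for the wake-on-LAN items above, the sketch below shows what a wake-up command does at the network level: it broadcasts a "magic packet" made of six 0xFF bytes followed by the target MAC address repeated 16 times. The function names, the default broadcast address and UDP port 9 are illustrative assumptions; this is not the command that OAR or ComputeMode actually runs.

```python
import socket


def magic_packet(mac):
    """Build a wake-on-LAN magic packet for a MAC such as '00:11:22:33:44:55'."""
    hw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(hw) != 6:
        raise ValueError("expected a 6-byte MAC address")
    # 6 bytes of 0xFF, then the MAC address repeated 16 times (102 bytes total).
    return b"\xff" * 6 + hw * 16


def wake(mac, broadcast="255.255.255.255", port=9):
    """Send the magic packet as a UDP broadcast (address and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, port))
```

Calling `wake("00:11:22:33:44:55")` from a machine on the same broadcast domain should power on a node whose BIOS and network card have wake-on-LAN enabled.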
Node system
- provide Debian Lenny / Squeeze instead of Sarge
- update to OAR2
- Define a resource with the OAR cpu/core fields
- Fix the warnings during the boot and halt process => OK, but we should still investigate the last remaining warning about the resolvconf package and the resolv.conf file
- At boot time, if the node's physical configuration has changed (different cpu/core), we have to put the node in the Absent state and contact the administrators
- We have to check that the CTRL+ALT+DEL command works even if there is a failure with NFS => Not working...
- Include the Modules software environment manager
Web admin interface
- remove Icatis logo
- add an about page with refs to Icatis, LIG and INRIA, and so on
- replace pear/DB, which is deprecated (and not packaged in Debian Lenny)
- A new feature on the scheduling page: it would be easier to select all the check boxes of a specific day with a single click on that day => Already implemented
- We have to implement a user management system based on NIS, LDAP...
- The default value for the bootimage on the bootmode page is not the one selected in the database => Fixed in the latest version
- When the user changes the node name with the cmwebadmin interface, the node name has to be changed in the OAR database => Now working
- Create a new page to visualize the available Modules
- Manage host aliases, or prevent users from changing their hostname
- A participant must only see his own nodes (physical or virtual)
- Implement a scheduler/calendar for participants to manage their nodes: a participant can change this calendar at any time. The calendar only says when a node may reboot into the ComputeMode service => OK
- Only the participant can view statistics about his nodes
- The main page has to be reorganized
Packaging
- improve cmwebadmin packages
- provide more resources as packages ?
Appliance
- In VirtualBox, when a node has to boot from the local disk, the exit command doesn't work; it works on physical computers
- Problems with some network drivers: we should contact either the VirtualBox or the iPXE project. A solution could be to load drivers dynamically from the DHCP server
HPC integration
- update OAR2 integration
Deployment and new technology
- Study for the integration of general purpose deployment software
- Opsi (MS Windows deployment)
- SystemImager (based on a golden node; can dynamically generate a net boot kernel/initrd)
- FAI
- G4U
- KaDeploy
- CSM/xCAT (IBM tool which performs Red Hat kickstart installations; license?)
- Cobbler (meh???)
- CIT (Cluster Integration Toolkit)
- DRBL (Red Hat kickstart or direct images; tools to perform post-installation on Windows)
- unattended (for Windows; like RIS but open source)
- Norton Ghost
- Perceus: http://www.perceus.org/
- Some other software/technologies to follow
Automaton
- Add a Boot-once option