Todo

From ComputeMode
Revision as of 13:20, 14 December 2011 by Genevois (Talk | contribs)

Documentation/Wiki

  • find a better screenshot for the main page
  • go on merging info from the old site
  • explain DHCP/iPXE/DNS and network configuration available from appliances

CM on Debian Squeeze

  • Server: update the doc for installing CM, from Etch to Lenny (or even Squeeze, since it is nearly released as the new stable) => Doc for Debian Squeeze: OK
  • OAR 2 installation doc => Installation of OAR 2.5: OK
  • Update node system install doc wrt OAR2 => Same as above
  • The message to launch the cmaskconfig command should appear after the login => The script is now launched automatically after login

ComputeMode

PXE smart boot

  • test replacement of pxelinux.0 by gpxelinux.0 => We are now using the ipxelinux.0 file
  • iPXE => OK, running on ComputeMode 2.0
  • provide an HTTP CGI for the ipxelinux.0 config file alongside the TFTP CGI-like file fetching mechanism => Using iPXE, we provide an HTTP script to launch the "deployment"
  • Check the gPXE system again => The boot process systematically freezes with VirtualBox...

Initrd

  • update to 2.6 => OK, the new image is running under Debian Squeeze
  • keep an eye on unionfs wrt NFS
  • http://fedoraproject.org/wiki/StatelessLinux/HOWTO -> iscsi
  • dhclient must use the "computemode" option in the dhclient.conf file when querying the DHCP server => OK, it's working
  • Re-implement the available options in the initrd: DDNS, DEBUG, DHCPNOKILL, DHCPREJECT... check against the old initrd from the Sarge system
  • The initrd must check several network interfaces, not only eth0

ComputeMode Server

  • Fix the MRTG configuration to plot graphs (or change the system to use Ganglia) =>
  • Watchdog is not working when NFS fails: the AUFS system mounted over NFS no longer allows the watchdog to reboot the node. Some "utils" commands should be mounted so that they are sure to be available to reboot the node!
  • Fix the /etc/resolv.conf file to search the computemode domain name =>
  • The DHCP option number must be increased to conform to the new RFC!
  • The start time for POV demos should be the launch of the first command, not the arrival of the first results
  • RINSE: building RPM distributions for kameleon debootstrap should be tested
  • The size of the ComputeMode server should be increased too
  • Check that wake-on-LAN is working => OK
  • OAR must wake nodes via wake-on-LAN when jobs need them, for power saving => OK, it's working
  • OAR must stop nodes 10 min before the end of the ComputeMode service => OK, the admin has to configure the exit command to halt
  • Check that we can use a remote NIS server to authenticate users
  • OAR has to communicate with another OAR from another cluster
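
The wake-on-LAN items above can be re-checked by hand with a small script. This is a minimal sketch, assuming the standard magic packet layout (6 bytes of 0xFF followed by the target MAC address repeated 16 times) sent as a UDP broadcast. The function names, the port (9) and the broadcast address are illustrative assumptions; this is not how ComputeMode or OAR wake nodes.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Return the 102-byte wake-on-LAN magic packet for `mac`."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"invalid MAC address: {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # Synchronization stream (6 x 0xFF), then the MAC repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str,
                      broadcast: str = "255.255.255.255",  # assumed address
                      port: int = 9) -> None:              # assumed port
    """Broadcast the magic packet on the local network."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

Calling send_magic_packet("00:11:22:33:44:55") from a machine on the same broadcast domain should power on a node with that (hypothetical) MAC address, provided wake-on-LAN is enabled in its firmware and on its network card.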


Node system

  • provide Debian Lenny / Squeeze instead of Sarge
  • update to OAR2
  • Define a resource with the OAR cpu/core fields
  • Fix the warnings during the boot and halt processes => OK, we should still investigate the last remaining warning about the resolvconf package and the resolv.conf file
  • At boot time, if the node's physical configuration has changed (different CPU/core count), we have to put the node in the Absent state and contact the administrators
  • We have to check that the CTRL+ALT+DEL command works even if there is a failure with NFS => Not working...
  • Include the Modules software environment manager
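
The boot-time hardware check mentioned above could start from something like the following. This is a rough sketch only, assuming the node reads Linux /proc/cpuinfo-style text and compares its processor count with the count recorded for the node. The function names are hypothetical, and actually changing the resource state in OAR and notifying the administrators are left out.

```python
def count_processors(cpuinfo_text: str) -> int:
    """Count logical processors in /proc/cpuinfo-style text."""
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        # Entries look like "processor<TAB>: 0"; compare the key only.
        if line.split(":")[0].strip() == "processor"
    )


def node_state(expected: int, detected: int) -> str:
    """Return "Alive" if the counts match, "Absent" otherwise."""
    return "Alive" if expected == detected else "Absent"


def check_node(expected: int, cpuinfo_path: str = "/proc/cpuinfo") -> str:
    """Read the cpuinfo file and decide which state the node should get."""
    with open(cpuinfo_path) as f:
        return node_state(expected, count_processors(f.read()))
```

A node registered with 4 processors that now reports 2 would get "Absent" from check_node(4), which is the point where the administrators should be contacted.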

Web admin interface

  • remove Icatis logo
  • add an about page with refs to Icatis, LIG, INRIA, etc.
  • replace pear/DB which is deprecated (and not packaged in Debian Lenny)
  • New feature on the scheduling page: it would be easier to select all the checkboxes of a specific day with a single click on that day => Already implemented
  • We have to implement a user management system based on NIS, LDAP...
  • The default value for the bootimage on the bootmode page is not the one selected in the database => Fixed in the latest version
  • When the user changes the node name with the cmwebadmin interface, the node name has to be changed in the OAR database => Now working
  • Create a new page to view the available Modules
  • Manage host aliases, or prevent users from changing their hostname
  • A participant must only see his own nodes (physical or virtual)
  • Implement a scheduler/calendar for participants to manage their nodes; a participant can change this calendar at any time. The calendar only says when a node may reboot into the ComputeMode service => OK
  • Only the participant can view statistics about his nodes
  • The main page has to be reorganized

Packaging

  • improve cmwebadmin packages
  • provide more resources as packages?

Appliance

  • In VirtualBox, when a node has to boot from the local disk, the exit command doesn't work; it works on physical computers
  • Problems with some network drivers: we should contact either VirtualBox or iPXE. One solution would be to dynamically load drivers from the DHCP server


HPC integration

  • update OAR2 integration

Deployment and new technology

Automaton
