Chapter 10. Computer Life Cycle Management

Table of Contents

Upgrading a node in an existing setup
Replacing an Admin node in an existing setup
Pool life cycle management
Adding a pool to an existing setup
Removing a pool from an existing setup

D-Cache is a flexible storage virtualisation system that can aggregate the storage resources of large numbers of computers into a coherent whole. Unfortunately, the more computers a D-Cache cluster contains, the more likely it is that one of them will fail or need upgrading at any given time.

When administering a D-Cache storage cluster, replacing computers and upgrading the system are an essential part of keeping the service running. This should be done without losing any of the data that users have chosen to store in the D-Cache service. This chapter describes how to maintain the service through upgrades and hardware failures.

Upgrading a node in an existing setup

D-Cache is a mature project, and its configuration very rarely changes between releases. This simplifies upgrades: typically all that is required is to shut down D-Cache, upgrade the RPMs, and then restart D-Cache. It is not recommended that you rerun YAIM or any other configuration assistant, as the configuration typically does not need upgrading. Before upgrading to a new release, please check with others who may already have tested the upgrade.

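To stop D-Cache, run the following commands. They stop the optional services (the SRM, GridFTP and GSIdCap doors in dcache-opt) first, then the pools, then the core services, and finally PNFS.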

[root@dev01 root]# /etc/init.d/dcache-opt stop 
Shutting down dcache services: Stopping srmDomain (pid=29835) 0 1 2 3 4 5 6 7 Done
Stopping gridftpdoorDomain (pid=29681) 0 1 2 3 4 5 6 7 Done
Stopping gsidcapdoorDomain (pid=29760) 0 1 2 3 4 5 6 7 Done

[root@dev01 root]# /etc/init.d/dcache-pool stop 

Shutting down dcache pool: Stopping dev01Domain (pid=29943) 0 1 2 3 4 5 6 7 Done

[root@dev01 root]# /etc/init.d/dcache-core stop 
Shutting down dcache services: Stopping utilityDomain (pid=30662) 0 1 2 3 4 5 6 7 Done
Stopping httpdDomain (pid=30569) 0 1 2 3 4 5 6 7 Done
Stopping pnfsDomain (pid=30482) 0 1 2 3 4 5 6 7 Done
Stopping adminDoorDomain (pid=30394) 0 1 2 3 4 5 6 7 Done
Stopping doorDomain (pid=30309) 0 1 2 3 4 5 6 7 Done
Stopping dirDomain (pid=30220) 0 1 2 3 4 5 6 7 Done
Stopping dCacheDomain (pid=30122) 0 1 2 3 4 5 6 7 Done
Stopping lmDomain (pid=30044) 0 1 2 3 4 5 6 7 Done

[root@dev01 root]# /etc/init.d/pnfs stop 
Shutting down dcache services:  Stopping Heartbeat ....  Ready
 Killing pnfsd . Done
 Killing pmountd  Done
 Killing dbserver . Done
 Removing 8 Clients  0+ 1+ 2+ 3+ 4+ 5+ 6+ 7+
 Removing 8 Servers  0+ 1+ 2+ 3+ 4+ 5+ 6+ 7+
 Removing main switchboard ... O.K.
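The shutdown sequence above can be scripted so that the services are always stopped in the same order. The following is a minimal sketch, not part of the D-Cache distribution: the function name, the `INIT_DIR` override and the `DRY_RUN` switch are hypothetical conveniences, and it assumes the init scripts are installed in /etc/init.d as they are on dev01.

```shell
#!/bin/bash
# Sketch: stop D-Cache services in the order used in the session above.
# Assumption: init scripts live in /etc/init.d unless INIT_DIR says otherwise.
# Set DRY_RUN=1 to print the commands instead of running them.

INIT_DIR="${INIT_DIR:-/etc/init.d}"
# Optional doors first, then pools, core services, and finally PNFS.
STOP_ORDER=(dcache-opt dcache-pool dcache-core pnfs)

dcache_stop_all() {
    local svc
    for svc in "${STOP_ORDER[@]}"; do
        if [[ "${DRY_RUN:-0}" == 1 ]]; then
            echo "$INIT_DIR/$svc stop"
            continue
        fi
        # Skip services that are not installed on this node
        # (assumption: not every node runs every service).
        if [[ ! -x "$INIT_DIR/$svc" ]]; then
            echo "skipping $svc: $INIT_DIR/$svc not found" >&2
            continue
        fi
        # Stop here on failure so the problem can be investigated before
        # going further (a cautious choice, not a D-Cache requirement).
        if ! "$INIT_DIR/$svc" stop; then
            echo "failed to stop $svc; leaving the remaining services running" >&2
            return 1
        fi
    done
}
```

After sourcing the file, `DRY_RUN=1 dcache_stop_all` prints the four stop commands without executing them; `dcache_stop_all` on its own runs them as root.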

A typical D-Cache installation includes the following RPMs:

pnfs-3.1.10-15
d-cache-lcg-5.0.0-1
d-cache-opt-1.5.3-84
d-cache-gpp-v1.2.2-1
d-cache-core-1.5.2-83
d-cache-client-1.0-100
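Before upgrading, it is worth recording exactly which of these packages and versions are installed, so the result can be compared after the upgrade. A small sketch, assuming (as in the list above) that the relevant package names begin with `pnfs-` or `d-cache-`; the function name `dcache_packages` is hypothetical.

```shell
#!/bin/bash
# Sketch: filter a package list down to the D-Cache related packages.
# Assumption: their names start with "pnfs-" or "d-cache-".

dcache_packages() {
    # Reads package names on stdin (one per line, e.g. the output of
    # `rpm -qa`) and prints the D-Cache ones in a stable, sorted order.
    grep -E '^(pnfs|d-cache)-' | LC_ALL=C sort
}
```

For example, `rpm -qa | dcache_packages > dcache-rpms-before.txt` before the upgrade and `rpm -qa | dcache_packages | diff dcache-rpms-before.txt -` afterwards show which versions changed (the file name is only an example).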

Once the new RPMs have been downloaded, they can be upgraded with rpm, either one at a time or together in a single command as shown below. At this stage, administrators should check the documentation for the new release to see whether any new mandatory fields have been added to the configuration. D-Cache has a good record of informing users of the configuration upgrade path.

rpm -Uvh pnfs-3.1.10-15.i386.rpm \
	d-cache-lcg-5.0.0-1.i386.rpm \
	d-cache-opt-1.5.3-84.i386.rpm \
	d-cache-gpp-v1.2.2-1.i386.rpm \
	d-cache-core-1.5.2-83.i386.rpm \
	d-cache-client-1.0-100.i386.rpm
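To help with the check for new mandatory configuration fields, the keys defined in a configuration template shipped with the new release can be compared with those in the configuration currently in use. This is a sketch under stated assumptions: both files use simple `key=value` lines with `#` comments, and which template and configuration file to compare depends on the release, so the paths are left to the caller. The function names are hypothetical.

```shell
#!/bin/bash
# Sketch: list keys set in a new configuration template but absent from
# the configuration in use. Assumption: "key=value" lines, "#" comments.

config_keys() {
    # Print the sorted, unique keys of a key=value file. Comment lines and
    # blank lines do not match the pattern and are therefore ignored.
    sed -n -e 's/^[[:space:]]*\([A-Za-z0-9_.-][A-Za-z0-9_.-]*\)[[:space:]]*=.*/\1/p' "$1" \
        | LC_ALL=C sort -u
}

missing_config_keys() {
    # $1 = template from the new release, $2 = configuration currently in use.
    # comm -23 keeps lines that appear only in the first (template) list.
    LC_ALL=C comm -23 <(config_keys "$1") <(config_keys "$2")
}
```

`missing_config_keys NEW_TEMPLATE CURRENT_CONFIG` prints one key per line; an empty result means every uncommented key in the template is already set. Settings that are commented out in the template are ignored, on the assumption that they are optional.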

Because of bad experiences with the rpm command in the past, I no longer use its standard upgrade option (-U). Instead, I remove each RPM and then install the fresh version, because on occasion an rpm upgrade has updated the RPM database without upgrading the files recorded in that database. This experience may be out of date, but the practice I follow to upgrade a single RPM is shown below.

[root@dev01 root]# rpm -e --nodeps pnfs
[root@dev01 root]# rpm -i ./oms/dcache_deploy/pnfs-3.1.10-15.i386.rpm
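The remove-then-reinstall practice can be applied to every downloaded package with a short loop. This is a hedged sketch rather than a tested procedure: it assumes D-Cache has already been stopped and that each file is named `NAME-VERSION-RELEASE.ARCH.rpm` with no hyphens in VERSION or RELEASE (true of the packages listed earlier); the function names and the `DRY_RUN` switch are hypothetical.

```shell
#!/bin/bash
# Sketch: for each RPM file, erase the installed package of the same name
# and install the file. Run only while D-Cache is stopped.
# Set DRY_RUN=1 to print the commands instead of running them.

rpm_name_from_file() {
    # Assumes NAME-VERSION-RELEASE.ARCH.rpm with no "-" in VERSION/RELEASE.
    local base
    base="$(basename "$1" .rpm)"   # e.g. pnfs-3.1.10-15.i386
    base="${base%.*}"              # drop ARCH:            pnfs-3.1.10-15
    echo "${base%-*-*}"            # drop VERSION-RELEASE: pnfs
}

reinstall_rpms() {
    local f name
    for f in "$@"; do
        name="$(rpm_name_from_file "$f")"
        if [[ "${DRY_RUN:-0}" == 1 ]]; then
            echo "rpm -e --nodeps $name"
            echo "rpm -i $f"
            continue
        fi
        # --nodeps (as in the manual commands) skips the dependency check
        # that would refuse to remove a package other packages depend on.
        rpm -e --nodeps "$name" || return 1
        rpm -i "$f" || return 1
    done
}
```

For example, `DRY_RUN=1 reinstall_rpms ./oms/dcache_deploy/*.rpm` prints the commands for review before they are run for real. Where rpm is available, `rpm -qp --queryformat '%{NAME}\n' FILE` reads the package name from the package header itself and avoids the file-name assumption.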

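Once all the D-Cache RPMs have been upgraded, start D-Cache again, reversing the shutdown order: PNFS first, then the core services, the pools, and finally the optional services.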

[root@dev01 root]# /etc/init.d/pnfs start
Starting dcache services:  Shmcom : Installed 8 Clients and 8 Servers
 Starting database server for admin (/opt/pnfsdb/pnfs/databases/admin) ... O.K.
 Starting database server for data1 (/opt/pnfsdb/pnfs/databases/data1) ... O.K.
 Waiting for dbservers to register ... Ready
 Starting Mountd : pmountd 
 Starting nfsd : pnfsd 

[root@dev01 root]#  /etc/init.d/dcache-core start
Starting dcache services: Starting lmDomain  6 5 4 3 2 1 0 Done (pid=12383)
Starting dCacheDomain  6 5 4 3 2 1 0 Done (pid=12455)
Starting dirDomain  6 5 4 3 2 1 0 Done (pid=12539)
Starting doorDomain  6 5 4 3 2 1 0 Done (pid=12620)
Starting adminDoorDomain  6 5 4 3 2 1 0 Done (pid=12705)
Starting pnfsDomain  6 5 4 3 2 1 0 Done (pid=12793)
Starting httpdDomain  6 5 4 3 2 1 0 Done (pid=12880)
Starting utilityDomain  6 5 4 3 2 1 0 Done (pid=12973)

[root@dev01 root]# /etc/init.d/dcache-pool start

Starting dcache pool: Starting dev01Domain  6 5 4 3 2 1 0 Done (pid=13095)

[root@dev01 root]# /etc/init.d/dcache-opt start
Starting dcache services: Starting gridftpdoorDomain  6 5 4 3 2 1 0 Done (pid=13182)
Starting gsidcapdoorDomain  6 5 4 3 2 1 0 Done (pid=13267)
Starting srmDomain  6 5 4 3 2 1 0 Done (pid=13350)
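The start-up sequence can be scripted in the same way, running the init scripts in the reverse of the shutdown order and stopping at the first failure. A minimal sketch: the function name and the `INIT_DIR` and `DRY_RUN` variables are hypothetical, and it assumes each service depends on those started before it.

```shell
#!/bin/bash
# Sketch: start D-Cache services in the reverse of the shutdown order.
# Assumption: init scripts live in /etc/init.d unless INIT_DIR says otherwise.
# Set DRY_RUN=1 to print the commands instead of running them.

INIT_DIR="${INIT_DIR:-/etc/init.d}"
# PNFS first, then core services, pools, and finally the optional doors.
START_ORDER=(pnfs dcache-core dcache-pool dcache-opt)

dcache_start_all() {
    local svc
    for svc in "${START_ORDER[@]}"; do
        if [[ "${DRY_RUN:-0}" == 1 ]]; then
            echo "$INIT_DIR/$svc start"
            continue
        fi
        # Skip services that are not installed on this node
        # (assumption: not every node runs every service).
        if [[ ! -x "$INIT_DIR/$svc" ]]; then
            echo "skipping $svc: $INIT_DIR/$svc not found" >&2
            continue
        fi
        # Assumed dependency chain: do not start later services if an
        # earlier one fails.
        if ! "$INIT_DIR/$svc" start; then
            echo "failed to start $svc; not starting the remaining services" >&2
            return 1
        fi
    done
}
```

`DRY_RUN=1 dcache_start_all` prints the four start commands in order without executing them.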