Thursday, December 1, 2016

HowTo: Restore a GridScaler GPFS Client Node after Reinstalling the Node

I ran into this issue after reinstalling several compute nodes on our cluster, shortly after bringing our new DDN GridScaler GPFS storage cluster online. Starting GPFS on a reinstalled node failed because its local cluster configuration file was gone:
$ sudo mmstartup -N c0040
Fri Dec  2 03:36:03 UTC 2016: mmstartup: Starting GPFS ...
c0040:  mmremote: determineMode: Missing file /var/mmfs/gen/mmsdrfs.
c0040:  mmremote: This node does not belong to a GPFS cluster.
mmstartup: Command failed. Examine previous error messages to determine cause.

One method I found online is to take the affected node off the network (or reboot it), remove it from the GPFS cluster while it is unreachable, and then, once it is back on the network (or fully rebooted), add it back, license it, and start it.
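The remove-and-re-add procedure described above can be sketched roughly as follows. This is a minimal sketch, not a tested recipe: the helper name readd_node_cmds is hypothetical, the "client" license type is an assumption based on c0040 being a compute node, and the function only prints the commands so they can be reviewed before being run (with sudo) from a working cluster node.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the remove/re-add sequence for one node.
# Assumption: the node is a GPFS client, so a "client" license is assigned.
readd_node_cmds() {
    node="$1"
    echo "mmdelnode -N $node"                    # remove the stale node definition
    echo "mmaddnode -N $node"                    # add the reinstalled node back
    echo "mmchlicense client --accept -N $node"  # assign a client license
    echo "mmstartup -N $node"                    # start GPFS on the node
}

readd_node_cmds c0040
```

Per the description above, the first command is run while the node is off the network (or rebooting), and the rest once it is reachable again.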

Later I was introduced to the mmsdrrestore command (a portion of the man page is below):
mmsdrrestore command

Restores the latest GPFS system files on the specified nodes.

Synopsis

mmsdrrestore [-p NodeName] [-F mmsdrfsFile] [-R remoteFileCopyCommand]
             [-a | -N {Node[,Node...] | NodeFile | NodeClass}]

Availability

Available on all IBM Spectrum Scale editions.

Description

The mmsdrrestore command is intended for use by experienced
system administrators.

Use the mmsdrrestore command to restore the latest GPFS
system files on the specified nodes. If no nodes are specified,
the command restores the configuration information only on the
node on which it is run. If the local GPFS configuration file is
missing, the file that is specified with the -F option from
the node that is specified with the -p option is used
instead. This command works best when used with the
mmsdrbackup user exit. See the following IBM Spectrum
Scale: Administration and Programming Reference topic:
mmsdrbackup user exit.

...

Here's an example of using the command to restore the configuration to node c0040, using primary server gs0 (one of the NSD servers):
$ sudo mmsdrrestore -p gs0 -N c0040
Fri Dec  2 03:47:06 UTC 2016: mmsdrrestore: Processing node gs0
Fri Dec  2 03:47:08 UTC 2016: mmsdrrestore: Processing node c0040
mmsdrrestore: Command successfully completed

Finally, start GPFS on the client (which also mounts the file system(s), if configured to do so):
$ sudo mmstartup -N c0040
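Since several nodes were reinstalled, the restore and startup steps can be applied to a comma-separated node list in one pass. The sketch below is only an illustration: the helper name restore_nodes_cmds and the second node c0041 are hypothetical, and the final mmgetstate check is my addition for confirming the nodes' GPFS state afterwards. As written, it prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: print the restore/start/check sequence for several nodes.
# Usage: restore_nodes_cmds PRIMARY NODE [NODE...]
restore_nodes_cmds() {
    primary="$1"; shift
    # Join the remaining arguments into the Node[,Node...] form that -N accepts.
    nodes="$(echo "$*" | tr ' ' ',')"
    echo "mmsdrrestore -p $primary -N $nodes"  # restore config from the primary
    echo "mmstartup -N $nodes"                 # start GPFS on the restored nodes
    echo "mmgetstate -N $nodes"                # report the GPFS state of each node
}

# c0041 is a made-up second node, used here only for illustration.
restore_nodes_cmds gs0 c0040 c0041
```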

1 comment:

Unknown said...

Alternatively, if the client still has its GPFS cluster configuration but the node has already been removed on the server side, run 'mmdelnode -f' on the client:

mmdelnode -f
mmdelnode: All GPFS configuration files on node c0016 have been removed.