
Saturday, 11 October 2014

Run VIOS commands from the HMC using "viosvrcmd" without the VIOS passwords

Recently we ran into a situation where we didn't know the padmin or root passwords of a VIOS but still needed to run commands on it.

We found an interesting HMC command called "viosvrcmd", which enables us to run commands on the VIOS through the HMC.
viosvrcmd -m managed-system {-p partition-name | --id partition-ID} -c "command" [--help]
Description: viosvrcmd issues an I/O server command line interface (ioscli) command to a virtual I/O server partition.

The ioscli commands are passed from the Hardware Management Console (HMC) to the virtual I/O server partition over an RMC session.

RMC does not allow interactive execution of ioscli commands.
-m    The managed system name of the VIOS

-p    The partition name of the VIOS

--id  The partition ID of the VIOS

Note:You must either use this option to specify the ID of the partition, or use the  -p option to specify the partition's name. The --id and the -p options are mutually exclusive.

-c    The I/O server command line interface (ioscli) command to issue to the virtual I/O server partition.

Note: Command must be enclosed in double quotes. Also, command cannot contain the semicolon (;), greater than (>), or vertical bar (|) characters.

--help  Display the help text for this command and exit.
Here is an example:
hscroot@umhmc:~> viosvrcmd -m umfrm570 -p umvio1 -c "ioslevel"
2.2.0.0
Since ; > and | can't be used inside the command, any filters you need to process the output with should be placed after the closing double quotes, so they run on the HMC side.
hscroot@umhmc:~> viosvrcmd -m umfrm570 -p umvio1 -c "lsdev -virtual" | grep vfchost0
vfchost0         Available   Virtual FC Server Adapter

What if you want to run a command as root (via oem_setup_env)? Here is a method found on the internet:
hscroot@umhmc:~> viosvrcmd -m umfrm570 -p umvio1 -c "oem_setup_env
> whoami"
root

You can also run it in one shot, like below:

hscroot@umhmc:~> viosvrcmd -m umfrm570 -p umvio1 -c "oem_setup_env\n whoami"
root
If you need to run multiple commands, you can do so by assigning the commands to a variable and passing the variable in place of the command parameter.
hscroot@umhmc:~>command=`printf  "oem_setup_env\nchsec -f /etc/security/lastlog -a unsuccessful_login_count=0 -s padmin"`

hscroot@umhmc:~>viosvrcmd -m umfrm570 -p umvio1 -c "$command"

Friday, 1 November 2013

Dealing With Managed System Passwords

Dealing With Managed System Passwords:

There are two sets of passwords to deal with on Power machines. These are very important when you are adding managed systems to the HMC (Hardware Management Console).

And those passwords are 
  •  Server (HMC Access) Password
  •  Advanced System Management (ASM) Password


1) Advanced System Management (ASM) Password:

Advanced System Management Interface (ASMI) is a graphical interface that is part of the service processor firmware. The ASMI manages and communicates with the service processor. The ASMI is required to set up the service processor and to perform service tasks, such as reading service processor error logs, reading vital product data, and controlling the system power. There is a separate password to access the ASM.

The ASMI might also be referred to as the service processor menus

2) HMC Access password:

The HMC Access password is a managed system password used to authenticate the HMC to the server. It is one of three managed system passwords set when the system is first installed. The three user IDs whose passwords must be set are admin, general, and HMC (also referred to as HMC Access). 

There is no default password for the HMC ID. The HMC access password is initially clear and the user is prompted to set a password the first time an HMC connects. After setting or entering the HMC access password once, the HMC caches the password locally. It is important to record the password for future use because the password will need to be re-entered if a new or scratch-installed HMC is used.


If the HMC password has not been set, the HMC connection will go to the state Pending Authentication - Password Update Required. The user is prompted to set an initial password for each of the three accounts in the Update Password Pending Authentication window.

If the window does not appear, ensure the user is logged into the HMC as hscroot: Pre-version 7 HMC, right click on the server entry on the console. The window for setting passwords opens. Set the passwords as directed. On Version 7 or later, select the server. Under tasks, select the Update password task then enter the passwords.

Setting Passwords for ASM & HMC Access

When you are trying to add a managed system, if you receive the message Authentication Pending, the HMC prompts you to set the passwords for the managed system.

If you did not receive the message Authentication Pending, complete the following steps to set the passwords for the managed system.

Updating your server (HMC Access) password

To update your server password from the GUI, do the following (a CLI alternative is sketched after these steps):
  1. In the navigation area, select the managed system.
  2. In the Tasks area, click Operations.
  3. Click Change Password. The Update Password window opens.
  4. Type the required information and click OK.
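
If you prefer the HMC command line, the same change can be made with the chsyspwd command. Below is only a minimal sketch; the managed system name umfrm570 is borrowed from the examples elsewhere on this blog, and the password values are placeholders you must supply:

chsyspwd -m umfrm570 -t access --passwd <current-HMC-access-password> --newpasswd <new-password>

The same command with -t admin or -t general updates the other two managed system passwords.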

Updating your Advanced System Management (ASM) general password

Note: The default password for the general user ID is general, and the default password for the administrator ID is admin.
To update your ASM general password, do the following:
  1. In the navigation area of the HMC, select the managed system.
  2. In the Tasks area, click Operations.
  3. Click Advanced System Management (ASM). The Launch ASM Interface window opens.
  4. Select a Service Processor IP Address and click OK. The ASM interface opens.
  5. On the ASMI Welcome pane, specify your user ID and password, and click Log In.
  6. In the navigation area, expand Login Profile.
  7. Select Change Password.
  8. Specify the required information, and click Continue.

Resetting Passwords:

1) ASM Administrator Access Password Reset:

To reset the administrator password, contact an authorized service provider.

2) HMC Access Password Reset:

If the HMC Access password has already been set and an incorrect password is entered, the state is Failed Authentication - Incorrect password. The HMC will prompt the user to enter the HMC access password. After several invalid password attempts, the connection will go into the Failed Authentication - Firmware Password Locked state (0803-0001-00000203). The user must wait five minutes or reset the password in ASMI before any further attempts can be made.

To reset a lost HMC Access password, you should use one of the following methods:

  • If the admin password is known, use ASMI to reset the HMC password (refer to the procedure below).
Note: The default password for user admin is admin.
  • If the admin password is lost, contact an authorized service provider for information on how to reset both passwords. There are two methods available for the service provider:
    - Provide a temporary login for user celogin
    - Reset passwords and network settings using the FSP dip switches.

For further information, refer to the 'Set Passwords for the Managed System' and 'Reset the Advanced System Management (ASM) administrator password' sections of the IBM technote 'Changing an ASM Password' (DCF Technotes, IBM i).

Resetting HMC Access Password  ( if ASMI  admin password known):

From the Advanced System Management (ASM) login screen, type the user ID and password, and click the Log in button or press the Enter key. 

Note: Ensure the desired system is selected.


Expand Login Profile, select Change Password, and then select the User ID you want to change.


Note: The Current password field indicates the user signed into ASM (not necessarily the User ID being changed).




Note: If the admin password is lost, contact your service provider for a temporary celogin profile that can be used to reset admin.


Wednesday, 2 October 2013

Hardware Management Console (HMC ) Explained


HMC (Hardware Management Console) is a technology created by IBM to provide a standard utility (interface) for configuring and operating logical partitions (also known as LPARs or virtualized systems) and for managing SMP (symmetric multiprocessing) systems such as IBM System i/z/p and IBM Power Systems.

Basically, the HMC is a customized Linux system blended with Java and many other graphical components. As per Wikipedia: "The HMC is a Linux kernel using Busybox to provide the base utilities and X Window using the Fluxbox window manager to provide graphical logins. The HMC also utilizes Java applications to provide additional functionality."

AIX admins like me are very fond of the HMC in day-to-day operations. The HMC supports the system with features that enable a system administrator to manage the configuration and operation of partitions in a system, as well as to monitor the system for hardware problems. It consists of a 32-bit Intel-based desktop PC with a DVD-RAM drive.

The connection of the HMC with different managed systems is shown in the diagram below.

What does the HMC do?

  •     Creates and maintains a multiple-partitioned environment
  •     Displays a virtual operating system session terminal for each partition
  •     Displays virtual operator panel values for each partition
  •     Detects, reports, and stores changes in hardware conditions
  •     Powers managed systems on and off
  •     Powers Logical partitions on and off 
  •     Booting systems in Maintenance mode and  doing dump reboots
  •     Acts as a service focal point for service representatives to determine an appropriate service strategy and enable the Service Agent Call-Home capability
  •     Activates additional resources on demand ( we call it as CoD, capacity on demand)
  •     Perform DLPAR operations
  •     Perform firmware upgrades on managed systems
  •     Remote  management of managed systems
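
A few of these tasks map directly to HMC CLI commands. The sketch below uses the managed system and partition names from the examples elsewhere on this blog (umfrm570, umlpar1); adjust them to your environment:

chsysstate -m umfrm570 -r sys -o on                           - power the managed system on
chsysstate -m umfrm570 -r sys -o off                          - power the managed system off
chsysstate -m umfrm570 -r lpar -o on -n umlpar1 -f normal     - activate a partition with a profile
chsysstate -m umfrm570 -r lpar -o shutdown -n umlpar1 --immed - shut a partition down immediately
mkvterm -m umfrm570 -p umlpar1                                - open a virtual console for a partition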

HMC  Facts:

  • A single HMC can manage multiple physical frames (managed systems)
  • You can't open more than one virtual console for a given lpar at a given time.
  • If your HMC is down, nothing will happen to your managed systems and their lpars; they will operate as usual. The only catch is that we can't manage them if something happens.
  • There is no direct root login. By default we get hscroot. (You need to engage IBM support to get the root password.)

HMC Operating Modes:

You can operate  HMC in two modes.

  1. Command Line Interface ( CLI )
  2. Graphical Interface
Each method has its own merits. The graphical interface gives you a clear view and lets you operate the managed systems even with minimal knowledge of the HMC.

Whereas using the CLI, you can retrieve information very quickly using commands and scripts.

The figure below shows the graphical interface.

HMC Version Evolution:

  • HMC V7, for POWER5, POWER6 and POWER7 models
    • HMC V7R7.2.0 (Initial support for Power 710, Power 720, Power 730, Power 740 and Power 795 models)
    • HMC V7R7.1.0 (Initial support for POWER7)
    • HMC V7R3.5.0 (released Oct. 30, 2009)
    • HMC V7R3.4.0
    • HMC V7R3.3.0
    • HMC V7R3.2.0
    • HMC V7R3.1.0 (Initial support for POWER6 models)
  • HMC V6
    • HMC V6R1.3
    • HMC V6R1.2
  • 5.2.1
  • 5.1.0
  • 4.5.0
  • 4.4.0
  • 4.3.1
  • 4.2.1
  • 4.2.0, for POWER5 models
  • 4.1.x
  • 3.x, for POWER4 models

 RMC (Resource Monitoring and Control) & Association with HMC:

RMC is a distributed framework and architecture that allows the HMC to communicate with a managed logical partition. For example, the IBM.DRM daemon needs to be running on the lpar in order to perform DLPAR operations on that lpar through the HMC.

Both the daemons on the LPARs and those on the HMC use the external network to communicate with each other, not the service processor, which means both must have access to the same external network in order for RMC-related commands to work.

In order for RMC to work, port 657 udp/tcp must be open in both directions between the HMC public interface and the lpar.

The RMC daemons are part of the Reliable, Scalable Cluster Technology (RSCT) and are controlled by the System Resource Controller (SRC). These daemons run in all LPARs and communicate with equivalent RMC daemons running on the HMC. The daemons start automatically when the operating system starts and synchronize with the HMC RMC daemons.

Note: Apart from rebooting, there is no way to stop and start the RMC daemons on the HMC!

Things to check at the HMC:

- checking the status of the managed nodes: /usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc  (you must be root on the HMC)

- checking connection between HMC and LPAR:
hscroot@umhmc1:~> lspartition -dlpar
<#0> Partition:<2*10.10.50.18*aix10.domain.com>
       Active:<1>, OS:, DCaps:<0x4f9f>, CmdCaps:<0x1b, 0x1b>, PinnedMem:<1452>
<#1> Partition:<4*10.10.50.71*aix20.domain.com>
       Active:<0>, OS:, DCaps:<0x0>, CmdCaps:<0x1b, 0x1b>, PinnedMem:<656>

For correct DLPAR function:
- the partition must return with the correct IP of the lpar.
- the active value (Active:...) must be higher than zero,
- the DCaps value (DCaps:...) must be higher than 0x0

(The first line shows a DLPAR-capable LPAR, the second line a non-working LPAR.)

----------------------------------------

Things to check at the LPAR:

- checking the status of the managed nodes: /usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc

- Checking RMC status:

# lssrc -a | grep rsct
 ctrmc            rsct             8847376      active   <== it is a RMC subsystem
 IBM.DRM          rsct_rm          6684802      active   <== it is for executing the DLPAR command on the partition
 IBM.DMSRM        rsct_rm          7929940      active    <== it is for tracking statuses of partitions
 IBM.ServiceRM    rsct_rm          10223780     active
 IBM.CSMAgentRM   rsct_rm          4915254      active   <== it is for  handshaking between the partition and HMC
 ctcas            rsct                          inoperative    <== it is for security verification
 IBM.ERRM         rsct_rm                       inoperative
 IBM.AuditRM      rsct_rm                       inoperative
 IBM.LPRM         rsct_rm                       inoperative
 IBM.HostRM       rsct_rm                       inoperative    <==it is for obtaining OS information

You will see some active and some missing (The key for DLPAR is the IBM.DRM)
- Stopping and starting RMC without erasing configuration:

# /usr/sbin/rsct/bin/rmcctrl -z   <== it stops the daemons
# /usr/sbin/rsct/bin/rmcctrl -A   <== adds entry to /etc/inittab and it starts the daemons
# /usr/sbin/rsct/bin/rmcctrl -p   <== enables the daemons for remote client connections

(This is the correct method to stop and start RMC without erasing the configuration.)
Do not use stopsrc and startsrc for these daemons; use the rmcctrl commands instead!

- recfgct: deletes the RMC database, does a discovery, and recreates the RMC configuration

# /usr/sbin/rsct/install/bin/recfgct (Wait several minutes)
# lssrc -a | grep rsct

(If you see IBM.DRM active, then you have probably resolved the issue)

Getting  Information of LPARS & HMC either way:

Make a note: in order to work with these commands, the RSCT daemons must be running on the servers; in other words, make sure RMC communication between the HMC and the LPAR is working.

1) Getting HMC IP  information from LPAR:

You can find out which HMC (or HMCs) the managed system (frame) of your lpar is connected to by using the "lsrsrc" command.

Command: finding the HMC IP address
 (lsrsrc IBM.ManagementServer (or lsrsrc IBM.MCP on AIX 7))
$ lsrsrc IBM.ManagementServer

Resource Persistent Attributes for IBM.ManagementServer
resource 1:
Name             = "192.168.1.2"
Hostname         = "192.168.1.2"
ManagerType      = "HMC"
LocalHostname    = "ldap1-en1"
ClusterTM        = "9078-160"
ClusterSNum      = ""
ActivePeerDomain = ""
NodeNameList     = {"lpar1"}
So in this case, the HMC IP address is 192.168.1.2.

2) Get Managed System & LPAR information :

The script below gives full details of frame and LPAR information along with their allocated CPU and memory.

Script to get Frame & LPAR information
for system in `lssyscfg -r sys -F "name,state" | sort | grep ",Operating" | sed 's/,Operating//'`; do 
  echo $system
  echo "    LPAR            CPU    VCPU   MEM    OS"
  for lpar in `lssyscfg -m $system -r lpar -F "name" | sort`; do
     default_prof=`lssyscfg -r lpar -m $system --filter "lpar_names=$lpar" -F default_profile`
     procs=`lssyscfg -r prof -m $system --filter "profile_names=$default_prof,lpar_names=$lpar" -F desired_proc_units`
     vcpu=`lssyscfg -r prof -m $system --filter "profile_names=$default_prof,lpar_names=$lpar" -F desired_procs`
     mem=`lssyscfg -r prof -m $system --filter "profile_names=$default_prof,lpar_names=$lpar" -F desired_mem`
     os=`lssyscfg -r lpar -m $system --filter "lpar_names=$lpar" -F os_version`
     printf "    %-15s %-6s %-6s %-6s %-30s\n" $lpar $procs $vcpu $mem "$os"
  done
done


Generally people think there is no way to run scripts on the HMC, but there is a way: use the "rnvi" command to create a script file, i.e. "rnvi -f hmcscriptfile".

To run the script, use the "source" command.   For example "source hmcscriptfile".   This will run the script in your current shell. 


Here "hmcscriptfile" is the script name. Run the script as shown below and you will see output like the following.

How to run script & o/p
hscroot@umhmc1:~> source hmcscriptfile
p570_frame5_ms
    LPAR            CPU    VCPU   MEM    OS
    umlpar1         0.1    3      512    AIX 6.1 6100-07-05-1228       
    umlpar2         0.1    3      512    AIX 6.1 6100-07-05-1228       
    umlpar3         0.1    3      512    Unknown                       
    linux1          0.1    3      512    Unknown                       
    vio1            0.2    2      512    VIOS 2.2.1.4                  
    vio2            0.1    2      352    Unknown       

Friday, 5 July 2013

HMC v7r3.1 - v7r3.2 commands return error "Connection to the Command Server failed"

Problem(Abstract)

A defect in HMC v7r3.1 - v7r3.2 dealing with HMC log rotation scripts will eventually render the HMC unusable. While the problem is fixed at a later release level, you still might have to manually repair some key files or reinstall the HMC.

Symptom

Access to the V7 HMC GUI is not available (stuck initializing) and CLI functions accessed via ssh are extremely limited. Most CLI commands return the error "Connection to the Command Server failed."

Cause

This is a known problem that can occur if you were at HMC v7r3.1 or v7r3.2 when the file /var/hsc/log/hmclogger.log grows beyond 10MB.

Environment

HMC v7r3.1 or v7r3.2

Diagnosing the problem

Determine if the hmclogger.log file size exceeds 10MB by running command
ls -la /var/hsc/log/hmclogger.log

Determine if the cimserver.log file exists by running command
ls -la /var/hsc/log/cimserver.log

Most likely there will not be a cimserver.log file and hmclogger.log file will exceed 10MB in size.

Resolving the problem

Either reinstall the HMC and update to v7r3.3 or higher or contact IBM support to obtain a pesh password so you can manually repair key files.
Obtain a pesh password from IBM support. You must know the serial number of the HMC. Unfortunately, the command that gives you the serial#, lshmc -v, will not work as it is one of the symptoms. Hopefully you have an accurate list of serial numbers, otherwise on-site viewing of the HMC for the correct serial number will be needed so that support can give a pesh password that works. The other commands needed to become root usually work without problems.
Create an hscpe account if it does not exist (use lshmcusr to list all HMC user accounts)
$ mkhmcusr -u hscpe -a hmcpe -d "HMC PE user"
enter a password of seven characters or more.
If the hscpe user ID exists already, and you do not know the password, you may change the password as hscroot
$ chhmcusr -u hscpe -t passwd
Also, the password for the root user needs to be known as well. By default, that password is "passw0rd", but you can also change root's password as hscroot.
$ chhmcusr -u root -t passwd
Once you know the passwords for hscpe and root, and you have obtained a pesh password from IBM support, the following commands should resolve the problem.
- SSH to the HMC and login as hscpe and run following command
$ pesh <HMC serial number>
You will be prompted to enter a password after you enter the command above. Note that the serial number is seven characters long with alphabetical characters in upper case. The pesh password provided by IBM support will be eight characters long with alphabetical characters in lower case.
Once you enter the pesh password and return to the prompt, you will have access to run the command "su -" so you can become root. After entering root's password, perform the following.
# cat /dev/null > /var/hsc/log/hmclogger.log
Now the HMC needs to be rebooted, and the normal hmcshutdown command will not work. Use the reboot command as root to restart the HMC.
# reboot
This should reboot the HMC, and once it's back up you should have access to both the GUI web interface and the CLI commands when you ssh to the HMC.
This problem is resolved at 7.3.3, but if you had upgraded and the hmclogger.log file was already past the 10MB limit you may still experience this problem and have to perform this manual repair operation.

Thursday, 4 July 2013

List WWPN of all LPARs or a specific LPAR using HMC command

Question: How to list the WWPNs of all LPARs or of a specific LPAR using an HMC command?

Solution:

To List WWPN of all LPARs on a managed system:

 MS=um-BOX-9117-MMB-SN129SBCP

#lshwres -r virtualio --rsubtype fc -m $MS --level lpar -F lpar_name,slot_num,wwpns --header |grep -v null
lpar_name,slot_num,wwpns
umaix101,5,"c05076036cfc002e,c05076036cfc002f"
umaix101,4,"c05078044abc002c,c05078044abc002d"
umaix101,3,"c05078044abc002a,c05078044abc002b"
umaix101,2,"c05078044abc0028,c05078044abc0029"
umaix102,5,"c05078044abc0538,c05078044abc0539"
umaix102,4,"c05078044abc0536,c05078044abc0537"
umaix102,3,"c05078044abc0534,c05078044abc0535"
umaix102,2,"c05078044abc0532,c05078044abc0533"
umaix333,5,"c05078044abc00c6,c05078044abc00c7"
umaix333,4,"c05078044abc00c4,c05078044abc00c5"
umaix333,3,"c05078044abc00c2,c05078044abc00c3"
umaix333,2,"c05078044abc00c0,c05078044abc00c1"

Please note that the slot_num value is useful for identifying the corresponding FC adapter on AIX.

For example:

On server umaix333

# lsdev -Cc adapter | grep fcs
fcs0 Available C2-T1 Virtual Fibre Channel Client Adapter
fcs1 Available C3-T1 Virtual Fibre Channel Client Adapter
fcs2 Available C4-T1 Virtual Fibre Channel Client Adapter
fcs3 Available C5-T1 Virtual Fibre Channel Client Adapter
The location of fcs0 is C2, which means its slot_number is 2.
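
To cross-check these WWPNs from the AIX side, here is a small sketch using lscfg (run as root on the LPAR); lscfg reports the WWPN as the "Network Address" of each virtual FC client adapter:

# lsdev -Cc adapter -F name | grep fcs | while read fcs; do echo $fcs; lscfg -vl $fcs | grep "Network Address"; done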

To List WWPN of a specific LPAR:

MS=um-BOX-9117-MMB-SN129SBCP
LPAR=umaix333

#lshwres -r virtualio --rsubtype fc -m $MS --filter lpar_names=$LPAR --level lpar -F lpar_name,slot_num,wwpns --header

lpar_name,slot_num,wwpns
umaix333,5,"c05078044abc00c6,c05078044abc00c7"
umaix333,4,"c05078044abc00c4,c05078044abc00c5"
umaix333,3,"c05078044abc00c2,c05078044abc00c3"
umaix333,2,"c05078044abc00c0,c05078044abc00c1"


How to correct a connection problem between HMC and Managed Systems

If the managed system is in the No Connection, Incomplete, Recovery, Error, or Failed Authentication state, follow the procedures below.

Debug and Fix RMC Connection Errors

Question

How does one debug and correct an issue when they get a "No RMC Connection" error message when using the HMC?

Cause

The no RMC connection error message can occur on the HMC when attempting to dynamically configure AIX or VIOS, when attempting LPAR mobility, or when configuring virtual resources. RMC is an encrypted communication channel between the HMC and an LPAR that uses port 657 and both the TCP and UDP protocols. Changing IP addresses, cloning AIX LPARs, or a host of other administrative tasks can cause RMC to break down.

Answer

There are some basic commands that can be run to check the status of the RMC configuration, and there are some dependencies on RSCT versions as to which commands you use. RSCT 3.1.x.x levels are the newest and are included in AIX 6.1 TL6 or higher, while RSCT 2.x.x.x levels are included in AIX 6.1 TL5 or lower. The following queries provide a quick method to assess RMC health.

- As root on AIX or VIOS LPAR
-- IF AIX 6.1 TL5 or lower
lslpp -l csm.client ---> This fileset needs to be installed
-- IF AIX 6.1 TL6 or higher
lslpp -l rsct.core.rmc ---> This fileset needs to be 3.1.0.x level or higher
-- For all AIX versions
/usr/sbin/rsct/bin/ctsvhbac ---> Are all IP and host IDs trusted?
-- For AIX 6.1 TL5 or lower
lsrsrc IBM.ManagementServer ---> Is HMC listed as a resource?
-- For AIX 6.1 TL6 or higher
lsrsrc IBM.MCP ---> Is the HMC listed as a resource?
- On HMC (as hscroot)
lspartition -dlpar ---> Is LPAR's DCaps value non-zero ?

If you answer no to any of the above then corrective action is required.

- Fix It Commands (run as root on LPAR, HMC, or both)

Caution: Running the commands listed below on AIX LPARs is only safe if the node is only a member of the HMC's RMC domain. These commands should not be used in an active CAA clustered environment. If you need to determine if your system is a member of a CAA cluster then please refer to the Reliable Scalable Cluster Technology document titled, "Diagnosing problems with the Resource Monitoring and Control (RMC) subsystem."

http://pic.dhe.ibm.com/infocenter/aix/v7r1/index.jsp?topic=%2Fcom.ibm.aix.rsct312.trouble%2Fbl507_diagrmc.htm

Pay particular attention to the section titled Diagnostic procedures to help learn whether your node is a member of any domain other than the HMC management domain.

odmdelete -o CuAt -q "name='cluster0'" (Only run this on AIX or VIOS)
/usr/sbin/rsct/install/bin/recfgct
/usr/sbin/rsct/bin/rmcctrl -p

You would need a pesh password for your HMC if you need to run the above fix commands on the HMC.
You can try the following command first as hscroot:

lspartition -dlparreset

If that does not help you will need to request pesh passwords from IBM Support for your HMC so you can run the recfgct and rmcctrl commands listed above.

After running the above commands it will take several minutes before RMC connection is restored. The best way to monitor is by running the lspartition -dlpar command on the HMC every few minutes and watch for the target LPAR to show up with a non-zero DCaps value.
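
A simple way to do that monitoring from the HMC restricted shell is a small watch loop run as hscroot; this is only a sketch, and the grep pattern (the LPAR hostname) is a placeholder:

while true ; do
date
lspartition -dlpar | grep -A 1 -i <lpar-hostname>
sleep 120
done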

- Things to consider before using the above fix commands or if the reconfigure commands don't help.

If you are still confused about whether or not your LPAR is a member of a CAA cluster then some application names might help (PowerHA 7, HPC applications such as GPFS, ViSDs, CSM, etc). Most administrators should have a good idea how their servers are configured and what is running on them, so the decision to proceed can be easy. The diagnostic checks covered in the RSCT document should help with the decision if you are unsure.

Network issues are often overlooked or disregarded. There are some network configuration issues and perhaps even some APAR issues that might need to be addressed if the commands that reconfigure RSCT don't restore DLPAR functions, and those issues will require additional debug steps not covered in this tech note. However, there are some common network issues that can prevent RMC communications from passing between the HMC and the LPARs, and they include the following.

- Firewalls blocking bidirectional RMC related traffic for UDP and TCP on port 657.
- Mix of jumbo frames and standard Ethernet frames between the HMC and LPARs.
- Multiple interfaces with IP addresses on the LPARs that can route traffic to the HMC. 
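
A quick check of the first point (port 657) can be made from the AIX LPAR side. This is only a sketch; the HMC IP address is a placeholder and can be found with lsrsrc IBM.MCP or lsrsrc IBM.ManagementServer as shown earlier:

netstat -an | grep 657       - port 657 should appear for both tcp and udp (the RMC daemon)
ping -c 3 <HMC-IP-address>   - basic reachability test toward the HMC public interface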

Configure Jumbo Frames on the Hardware Management Console (HMC)

Tech Question

Can jumbo frames (MTU 9000) be configured on an HMC?

Cause

RMC communications between the HMC and LPARs using jumbo frames is failing.

Answer

The 7042 series of HMCs does come with Gigabit Ethernet adapters that support jumbo frames. You will need an HMC code level of v7r3.5 or later to get support for jumbo frames.

If AIX LPARs use jumbo frames on the network interfaces that communicate via RMC with the HMC, then jumbo frames need to be configured on the HMC as well for applications such as dynamic logical partitioning (DLPAR) and partition mobility (LPM) to work properly. The command to change the HMC's public network interface is chhmc.


Following is an example.

chhmc -c network -s modify -i eth1 --jumboframe on
Note: example above is for configuring eth1.  Use the appropriate adapter interface when configuring your HMC.

"netstat -in" is used to verify if MTU size changed from 1500 to 9000.


The network (routers and switches) between the HMC and the LPAR will also need to be configured to support Jumbo frames if RMC is to work properly.


Additional information for debugging RMC communications between an HMC and an AIX LPAR can be found at this link.

Manually Mounting a USB Drive on the HMC

Technote (FAQ)

What is the device name used when manually mounting a USB device on the HMC?

Cause

HMC commands that support various backup or recovery operations to a USB drive will take care of mounting the device for you. There are times when debug operations are necessary and a USB device might be used to copy or retrieve files from. Knowing what the USB device name is in order to manually mount it will be key.

Answer

In order to manually mount a USB drive on an HMC you will need to become root by means of the pesh application. Instructions for using pesh can be found in another document referenced at the end of this technote. Once you have become root then you insert the USB drive into the HMC and manually run the mount command.

Check what the mount point is for a USB device
# lsmediadev -> most times the usb flash mount point is /media/sdd1

Insert the USB flash drive.
# mount /media/sdd1

Once the USB flash drive is mounted then you can copy files from the device to the HMC or to the device as required for debug purposes. 
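
For example, a couple of commands to copy a file off the drive and cleanly unmount it when finished; the file name used here is purely a placeholder:

# cp /media/sdd1/debugdata.tgz /tmp
# umount /media/sdd1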

Moving Power Systems from an old HMC to a new HMC

Technote (FAQ)

Are there any steps that need to be performed on an old HMC before moving a server to a new HMC?

Answer

This technote is written for customers that use DHCP on their HMC's private network to connect to the Power servers. Before you physically remove your older Power5, Power6 or Power7 server connections from their existing HMC and move them to the new HMC, you will need to first perform a logical remove so they will be ready to connect to a new HMC and get a DHCP lease.

If you just move the physical connections from one HMC to another without removing the logical connection first then the servers might not connect to the new HMC or take a long time doing so.

Performing a Logical Remove

To remove the connection, use the Connections task menu; the sub-menu item is

"Reset or remove connection". 

Take the remove option to remove the server connection from their existing HMC.
Navigation through the GUI would be through following task option menus
Systems Management
-> Servers
--> Select server
---> Expand Connections task menu
----> Click on Reset or remove connection
-----> Use the remove option and press OK to remove the system

The command line option to do a remove has following syntax
rmsysconn -o remove --ip <FSP IP address to remove>
You can see the IP addresses of the flexible service processor (FSP) for a managed system by running
lssysconn -r all
If you use the rmsysconn command instead of the HMC GUI then make sure you remove all IP addresses associated with the server.

Perform the physical connections to new HMC

After you logically remove a system from an existing HMC, physically move the cable over to the new HMC's private switch.

Perform command to prepare HMC for new connections
On the new HMC you might need to run following command.
mksysconn -o auto
It will take several minutes for the server to show up on the new HMC, and when it does it will be as an IP address in state Failed Authentication. You can use the update password task to authenticate the HMC with the server using the appropriate HMC access password for the server.

Remote Upgrade to HMC v7r7.5

Question

What are the steps to perform a remote upgrade to HMC 775?

Answer

Upgrade Considerations

Before you upgrade your HMC you will need to consider whether the HMC hardware will support the new version as well as the current managed systems' firmware levels. You can review the hardware requirements by reading the readme for the HMC V7R7.5.0 Recovery Media (Ref A). Generally the HMC needs to be one of the 7310 or 7042 dual processor models with 1G to 4G of memory. It is also necessary to be aware that each HMC release is designed to work with specific system firmware levels, which are documented in the "HMC / Firmware Supported Combinations" chart (Ref B). If your managed systems are not at the levels shown as supported for the HMC version you wish to upgrade to, then you will need to consider updating the managed system firmware before upgrading the HMC. Once you have determined that your HMC and managed system firmware will work with the new version then you are ready to proceed with the upgrade.

HMC CLI Commands Used

During the remote upgrade process you will make use of following commands.
saveupgdata - to save configuration data of the HMC
getupgfiles - to retrieve network install images
chhmc - to setup alternate disk boot method
hmcshutdown - to reboot the HMC
updhmc - to apply corrective service patch

IBM's FTP Repository for HMC Images

Both the getupgfiles and updhmc command examples that follow use an IBM FTP server. If your HMC can get to the IBM FTP server used in our example then you can enter the commands exactly as shown. If you need to use your own FTP server because your HMC is isolated from the Internet, then you will want to modify the FTP server name and fix directory used in our example to something that works in your environment. The IBM FTP repository for HMC as well as other product updates is ftp.software.ibm.com, and HMC has separate directories for the various types of fixes as follows.

Network Install Images:
/software/server/hmc/network
HMC Updates (typically this applies to service packs)
/software/server/hmc/updates
Corrective service efixes
/software/server/hmc/fixes

Example Command Syntax Used in Remote Upgrade

The following example assumes that your HMC is at v7r3.4 or higher and that you have performed good administrative actions prior to the upgrade (closed open service events, cleared old archived diagnostic log files, and made any necessary backups required for your business). If you have an HMC at an earlier release (i.e. v4.2.1 through v7r3.3) then you will need to heed the directions in Upgrading to HMC Version 7R7.5.0 from earlier HMC releases (Ref D) first.

Prior to starting an upgrade, it's good practice to run the following commands first.
chsvcevent -o closeall
chhmcfs -o f -d 0
hmcshutdown -t now -r

Example commands used to remotely upgrade to v7r7.5 with SP1.

- Save Upgrade data to HMC hard disk
saveupgdata -r disk

Note: This operation will mount a filesystem called /mnt/upgrade, save the configuration data, and then unmount /mnt/upgrade. The operation should only take a few moments.

- Download the network install images to the HMC
#getupgfiles -h ftp.software.ibm.com -u anonymous --passwd ftp -d /software/server/hmc/network/v7750

Note: The getupgfiles operation will mount a filesystem called /hmcdump and copy the install files into the directory then unmount the filesystem.
- Set the HMC to boot from an alternate disk partition
chhmc -c altdiskboot -s enable --mode upgrade

- Reboot the HMC to begin the upgrade
hmcshutdown -r -t now

Note: the HMC will boot from the alternate disk partition then start processing the upgrade files and this
process is going to take some time. Most installs complete between one to two hours.

After the HMC is upgraded to v7r7.5 then you will need to install the service pack MH01317.
#updhmc -t s -h ftp.software.ibm.com -u anonymous -p ftp -f /software/server/hmc/updates/HMC_Update_V7R750_SP1.iso -r

After you complete the upgrade to 775 SP1 you will want to also check to see if any other efixes are available. The process to update with additional efixes is similar to the commands listed above, but will not be covered in this technote.

Things to consider when doing a remote network upgrade.

While the HMC CLI environment is restricted, there are some common scripting commands you can use, as hscroot, to monitor the status of the network image downloads.
while true ; do
date
ls -la /hmcdump
sleep 60
done
Typically the filesystem /hmcdump remains mounted until the getupgfiles command completely exits, so with such a monitor you might be able to determine whether all files, at their correct sizes, were completely downloaded before you proceed with the upgrade. Ordering and staging recovery media as well as fix media on site with the HMC is a prudent action to take, just in case something goes wrong during the unattended upgrade.

Post Upgrade Verification

You can use command "lshmc -V" post upgrade to verify the build level of  your HMC. 

Thursday, 23 May 2013

How to exit system console from HMC?

Quite often we end up entering the wrong user name when the HMC console for an lpar prompts us.
Instead of closing the console, use this tip to exit from the prompt.

It's the key combination ~. (tilde + dot).


Monday, 20 May 2013

Get the HMC IP address from LPAR

Sometimes you may need to access the HMC of an AIX system, but if you don't remember the IP address and don't have up-to-date documentation, this information can be lost. This post gives a small tip to retrieve the HMC IP address directly from the AIX system itself.
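
The tip itself is the same lsrsrc query shown in the RMC section of the HMC post above. As a quick reference, run one of the following as root on the AIX LPAR (which one applies depends on the RSCT level):

# lsrsrc IBM.ManagementServer     - older RSCT levels (AIX 6.1 TL5 and below)
# lsrsrc IBM.MCP                  - newer RSCT levels (AIX 6.1 TL6 / AIX 7 and above)

The Hostname field of the returned resource is the HMC IP address.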

Friday, 17 May 2013

How to reset "hscroot" password?

Instructions for hscroot password reset:

1 Power off the HMC.

2 Power on the HMC, and as soon as the Loading grub message is displayed

quickly press the F1 key to get into grub.

The Grub menu will show one line with the text hmc.

3 On the Grub menu, select e for edit. The next GRUB screen is displayed with two lines:

root (hd0,0)
kernel (hd0,1)/boot/bzImage ro root=/dev/hda2 vga=0x317 apm=power-off

Note: The root device can vary by model: hda2 C03, C04, CR2, and hdc2 for CR3.

4 Move the cursor down to the line starting with kernel. Select e for edit.

Move the cursor to the right and append the following to the end of the string:

V5.1.0 to V6.1.1: init=/bin/bash
V6.1.2 and later: init=/bin/rcpwsh

The final string will vary slightly by version and model:

kernel (hd0,1)/boot/bzImage ro root=/dev/hda2 vga=0x317 apm=power-off init=/bin/rcpwsh

Press the Enter key to save the changes.

5 Press b to boot the changed selection.

This will boot to a bash shell: (none):/#.

6 Verify root is mounted read/write. Type the following command:

$ mount -o remount,rw /dev/hda2 /

Note: The root device can vary by model: hda2 C03, C04; hdc2 for CR2,CR3; sda2 for CR4.

7 Reset root and hscroot passwords.

Run the following commands to reset the passwords. The command will prompt the user to enter the new password and a confirmation password. Any warning concerning the password being too simplistic can be ignored.

Reset root:
/usr/bin/passwd
Reset hscroot:
$ /usr/bin/passwd hscroot

8 Reboot the HMC (left ctl+left alt+del).

9 Log on as hscroot.

10 Immediately after logon, use the Web-based System Manager (HMC GUI) or the chhmcusr command to change the passwords.
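
For example, to change the hscroot password again from the CLI, the same chhmcusr syntax used earlier on this blog applies; the command prompts for the new password:

$ chhmcusr -u hscroot -t passwd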

HMC useful key combinations
CTRL-ALT-F1: Switch to Linux command line; no login possible. If you then click on CTRL-ALT-DEL the system will reboot.

CTRL-ALT-F2: Takes you back to the Xserver window.

CTRL-ALT-BACKSPACE: Kills off the Xserver and starts a new, fresh one, so you can log in again.

Tuesday, 23 April 2013

Tips for implementing NPIV on IBM Power Systems

Tips for implementing NPIV on IBM Power Systems

Virtual Fibre Channel with Virtual I/O and AIX 6.1
A good article by Chris about NPIV; here it is as-is.

Overview

In this article, I will share with you my experience in implementing NPIV on IBM Power Systems with AIX and the Virtual I/O Server (VIOS). There are several publications that already discuss the steps on how to configure NPIV using a VIOS, and I have provided links to some of these in the Resources section. Therefore, I will not step through the process of creating virtual Fibre Channel (FC) adapters or preparing your environment so that it is NPIV and virtual FC ready. I assume you already know about this and will ensure you have everything you need. Rather, I will impart information that I found interesting and perhaps undocumented during my own real-life experience of deploying this technology. Ultimately this system was to provide an infrastructure platform to host SAP applications running against a DB2 database.

NPIV (N_Port ID Virtualization) is an industry standard that allows a single physical Fibre Channel port to be shared among multiple systems. Using this technology you can connect multiple systems (in my case AIX LPARs) to one physical port of a physical fibre channel adapter. Each system (LPAR) has its own unique worldwide port name (WWPN) associated with its own virtual FC adapter. This means you can connect each LPAR to physical storage on a SAN natively.

This is advantageous for several reasons. First, you can save money. Having the ability to share a single fibre channel adapter among multiple LPARs could save you the cost of purchasing more adapters than you really need.

Another reason to use NPIV is the reduction in VIOS administration overhead. Unlike virtual SCSI (VSCSI), there is no need to assign the SAN disks to the VIOS first and then map them to the Virtual I/O client (VIOC) LPARs. Instead, the storage is zoned directly to the WWPNs of the virtual FC adapters on the clients. It also eliminates the need to keep your documentation up to date every time you map a new disk to an LPAR/VIOS or un-map a disk on the VIO server.

I/O performance is another reason you may choose NPIV over VSCSI. With NPIV all paths to a disk can be active with MPIO, thus increasing the overall bandwidth and availability to your SAN storage. The I/O load can be load-balanced across more than one VIO server at a time. There is no longer any need to modify a clients VSCSI hdisk path priority to send I/O to an alternate VIO server, as all I/O can be served by all the VIO servers if you wish.

One more reason is the use of disk "copy service" functions. Most modern storage devices provide customers with the capability to "flash copy" or "snap shot" their SAN LUNs for all sorts of purposes, like cloning of systems, taking backups, and so on. It can be a challenge to implement these types of functions when using VSCSI. It is possible, but automation of the processes can be tricky. Some products provide tools that can be run from the host level rather than on the storage subsystem. For this to work effectively, the client LPARs often need to "see" the disk as a native device. For example, it may be necessary for an AIX system to detect that its disk is a native NetApp disk for the NetApp "snapshot" tools to work. If it cannot find a native NetApp device, and instead finds only a VSCSI disk, and it is unable to communicate with the NetApp system directly, then the tool may fail to function or be supported.

The biggest disadvantage (that I can see) to using NPIV is the fact that you must install any necessary MPIO device drivers and/or host attachment kits on any and all of the client LPARs. This means that if you have 100 AIX LPARs that all use NPIV and connect to IBM DS8300 disk, you must install and maintain SDDPCM on all 100 LPARs. In contrast, when you implement VSCSI, the VIOS is the only place that you must install and maintain SDDPCM. And there's bound to be fewer VIOS than there are clients! There are commonly only two to four VIO servers on a given Power system.

Generally speaking, I'd recommend NPIV at most large enterprise sites since it is far more flexible, manageable, and scalable. However, there's still a place for VSCSI, even in the larger sites. In some cases, it may be better to use VSCSI for the rootvg disk(s) and use NPIV for all non-rootvg (data) volume groups. For example, if you boot from SAN using NPIV (rootvg resides on SAN disk) and you had to install MPIO device drivers to support the storage. It can often be difficult to update MPIO software when it is still in use, which in the case of SAN boot is all the time. There are procedures and methods to work around this, but if you can avoid it, then you should consider it!

For example, if you were a customer that had a large number of AIX LPARs that were all going to boot from HDS SAN storage, then I'd suggest that you use VSCSI for the rootvg disks. This means that HDLM (Hitachi Dynamic Link Manager, HDS MPIO) software would need to be installed on the VIOS, the HDS LUNs for rootvg would be assigned to and mapped from the VIOS. All other LUNS for data (for databases or application files/code) would reside on storage presented via NPIV and virtual FC adapters. HDLM would also be installed on the LPARs but only for non-rootvg disks. Implementing it this way means that when it comes time to update the HDLM software on the AIX LPARs, you would not need to worry about moving rootvg to non-HDS storage so that you can update the software. Food for thought!

Environment

The environment I will describe for my NPIV implementation consists of a POWER7 750 and IBM XIV storage. The client LPARs are all running AIX 6.1 TL6 SP3. The VIO servers are running version 2.2.0.10 Fix Pack 24 Service Pack 1 (2.2.0.10-FP-24-SP-01). The 750 is configured with six 8GB fibre channel adapters (feature code 5735). Each 8GB FC adapter has 2 ports. The VIO servers were assigned 3 FC adapters each. The first two adapters in each VIOS would be used for disk and the last FC adapter in each VIOS would be for tape connectivity.

NPIV and virtual FC for disk

I made the conscious decision during the planning stage to provide each production LPAR with four virtual FC adapters. The first two virtual FC adapters would be mapped to the first two physical FC ports on the first VIOS and the last two virtual FC adapters would be mapped to the first two physical FC ports on the second VIOS, as shown in the diagram below.

Figure 1: Virtual FC connectivity to SAN and Storage

I also decided to isolate other disk traffic (for example, non-critical production traffic) over different physical FC adapters/ports. In the previous diagram, the blue lines/LUNs indicate production traffic. This traffic is mapped from the virtual adapters, fcs0 and fcs1 in an LPAR, to the physical ports on the first FC adapters in vio1: fcs0 and fcs1. The virtual FC adapters, fcs2 and fcs3 in an LPAR, map to the physical ports on the first FC adapter in vio2: fcs0 and fcs1.

The red lines indicate all non-critical disk traffic. For example, the NIM and Tivoli Storage Manager LPARs use different FC adapters in each VIOS than the production LPARs. The virtual FC adapters, fcs0 and fcs1, map to the physical ports on the second FC adapter, fcs2 and fcs3 in vio1. The virtual FC adapters, fcs2 and fcs3, map to the physical ports on the second FC adapter, fcs2 and fcs3 in vio2.

An example of the vfcmap commands that we used to create this mapping on the VIO servers are shown here:
For production systems (e.g. LPAR4):
  1. Map LPAR4 vfchost0 adapter to physical FC adapter fcs0 on vio1.
     $ vfcmap -vadapter vfchost0 -fcp fcs0
  2. Map LPAR4 vfchost1 adapter to physical FC adapter fcs1 on vio1.
     $ vfcmap -vadapter vfchost1 -fcp fcs1
  3. Map LPAR4 vfchost0 adapter to physical FC adapter fcs0 on vio2.
     $ vfcmap -vadapter vfchost0 -fcp fcs0
  4. Map LPAR4 vfchost1 adapter to physical FC adapter fcs1 on vio2.
     $ vfcmap -vadapter vfchost1 -fcp fcs1
For non-critical systems (e.g. NIM1):
  1. Map NIM1 vfchost3 adapter to physical FC adapter fcs2 on vio1.
     $ vfcmap -vadapter vfchost3 -fcp fcs2
  2. Map NIM1 vfchost4 adapter to physical FC adapter fcs3 on vio1.
     $ vfcmap -vadapter vfchost4 -fcp fcs3
  3. Map NIM1 vfchost3 adapter to physical FC adapter fcs2 on vio2.
     $ vfcmap -vadapter vfchost3 -fcp fcs2
  4. Map NIM1 vfchost4 adapter to physical FC adapter fcs3 on vio2.
     $ vfcmap -vadapter vfchost4 -fcp fcs3
I used the lsmap -all -npiv command on each of the VIO servers to confirm that the mapping of the vfchost adapters to the physical FC ports was correct (as shown below).
vio1 (production LPAR):
 Name          Physloc                            ClntID ClntName       ClntOS
 ------------- ---------------------------------- ------ -------------- -------
 vfchost0      U8233.E8B.XXXXXXX-V1-C66                4 LPAR4          AIX
 Status:LOGGED_IN
 FC name:fcs0                    FC loc code:U78A0.001.XXXXXXX-P1-C3-T1
 Ports logged in:5
 Flags:a<LOGGED_IN,STRIP_MERGE>
 VFC client name:fcs0            VFC client DRC:U8233.E8B.XXXXXXX-V6-C30-T1
 Name          Physloc                            ClntID ClntName       ClntOS
 ------------- ---------------------------------- ------ -------------- -------
 vfchost1      U8233.E8B.XXXXXXX-V1-C67                4 LPAR4          AIX
 Status:LOGGED_IN
 FC name:fcs1                    FC loc code:U78A0.001.XXXXXXX-P1-C3-T2
 Ports logged in:5
 Flags:a<LOGGED_IN,STRIP_MERGE>
 VFC client name:fcs1            VFC client DRC:U8233.E8B.XXXXXXX-V6-C31-T1
vio1 (non-production LPAR):
 Name          Physloc                            ClntID ClntName       ClntOS
 ------------- ---------------------------------- ------ -------------- -------
 vfchost3      U8233.E8B.XXXXXXX-V1-C30                3 nim1           AIX
 Status:LOGGED_IN
 FC name:fcs2                    FC loc code:U5877.001.XXXXXXX-P1-C1-T1
 Ports logged in:5
 Flags:a<LOGGED_IN,STRIP_MERGE>
 VFC client name:fcs0            VFC client DRC:U8233.E8B.XXXXXXX-V3-C30-T1
 
 Name          Physloc                            ClntID ClntName       ClntOS
 ------------- ---------------------------------- ------ -------------- -------
 vfchost4      U8233.E8B.XXXXXXX-V1-C31                3 nim1           AIX
 Status:LOGGED_IN
 FC name:fcs3                    FC loc code:U5877.001.XXXXXXX-P1-C1-T2
 Ports logged in:5
 Flags:a<LOGGED_IN,STRIP_MERGE>
 VFC client name:fcs1            VFC client DRC:U8233.E8B.XXXXXXX-V3-C31-T1
vio2 (production LPAR):
 Name          Physloc                            ClntID ClntName       ClntOS
 ------------- ---------------------------------- ------ -------------- -------
 vfchost0      U8233.E8B.XXXXXXX-V2-C66                4 LPAR4          AIX
 Status:LOGGED_IN
 FC name:fcs0                    FC loc code:U5877.001.XXXXXXX-P1-C3-T1
 Ports logged in:5
 Flags:a<LOGGED_IN,STRIP_MERGE>
 VFC client name:fcs2            VFC client DRC:U8233.E8B.XXXXXXX-V6-C32-T1
 
 Name          Physloc                            ClntID ClntName       ClntOS
 ------------- ---------------------------------- ------ -------------- -------
 vfchost1      U8233.E8B.XXXXXXX-V2-C67                4 LPAR4          AIX
 Status:LOGGED_IN
 FC name:fcs1                    FC loc code:U5877.001.XXXXXXX-P1-C3-T2
 Ports logged in:5
 Flags:a<LOGGED_IN,STRIP_MERGE>
 VFC client name:fcs3            VFC client DRC:U8233.E8B.XXXXXXX-V6-C33-T1
vio2 (non-production LPAR):
 Name          Physloc                            ClntID ClntName       ClntOS
 ------------- ---------------------------------- ------ -------------- -------
 vfchost3      U8233.E8B.XXXXXXX-V2-C30                3 nim1           AIX
 Status:LOGGED_IN
 FC name:fcs2                    FC loc code:U5877.001.XXXXXXX-P1-C4-T1
 Ports logged in:5
 Flags:a<LOGGED_IN,STRIP_MERGE>
 VFC client name:fcs2            VFC client DRC:U8233.E8B.XXXXXXX-V3-C32-T1
 
 Name          Physloc                            ClntID ClntName       ClntOS
 ------------- ---------------------------------- ------ -------------- -------
 vfchost4      U8233.E8B.XXXXXXX-V2-C31                3 nim1           AIX
 Status:LOGGED_IN
 FC name:fcs3                    FC loc code:U5877.001.XXXXXXX-P1-C4-T2
 Ports logged in:5
 Flags:a<LOGGED_IN,STRIP_MERGE>
 VFC client name:fcs3            VFC client DRC:U8233.E8B.XXXXXXX-V3-C33-T1
Fortunately, as we were using IBM XIV storage, we did not need to install additional MPIO devices drivers to support the disk. AIX supports XIV storage natively. We did, however, install some additional management utilities from the XIV host attachment package. This gave us handy tools such as xiv_devlist (output shown below).
# lsdev -Cc disk
hdisk0          Available 30-T1-01    MPIO 2810 XIV Disk
hdisk1          Available 30-T1-01    MPIO 2810 XIV Disk
hdisk2          Available 30-T1-01    MPIO 2810 XIV Disk
hdisk3          Available 30-T1-01    MPIO 2810 XIV Disk
# lslpp -l | grep xiv
xiv.hostattachment.tools   1.5.2.0  COMMITTED  Support tools for XIV connectivity
# xiv_devlist
Loading disk info...                                                                              
XIV Devices
----------------------------------------------------------------------
Device       Size     Paths  Vol Name      Vol Id   XIV Id   XIV Host 
----------------------------------------------------------------------
/dev/hdisk1  51.5GB   16/16  nim2_ rootvg  7        7803242  nim2     
----------------------------------------------------------------------
/dev/hdisk2  51.5GB   16/16  nim2_ nimvg   8        7803242  nim2     
----------------------------------------------------------------------
/dev/hdisk3  103.1GB  16/16  nim2_ imgvg   9        7803242  nim2     
----------------------------------------------------------------------
Non-XIV Devices
---------------------
Device   Size   Paths
---------------------
If you are planning on implementing XIV storage with AIX, I highly recommend that you take a close look at Anthony Vandewert's blog on this topic.
You may have noticed in the diagram that the VIO servers themselves boot from internal SAS drives in the 750. Each VIO server was configured with two SAS drives and a mirrored rootvg. They did not boot from SAN.

LPAR profiles

During the build of the LPARs we noticed that if we booted a new LPAR with all four of its virtual FC adapters in place, the fcsX adapter name and slot id were not in order (fcs0=slot32, fcs1=slot33, fcs3=slot30, fcs4=slot31). To prevent this from happening, we created two profiles for each LPAR.
The first profile (known as normal) contained the information for all four of the virtual FC adapters. The second profile (known as wwpns) contained only the first two virtual FC adapters that mapped to the first two physical FC ports on vio1. Using this profile to perform the LPAR's first boot and to install AIX allowed the adapters to be discovered in the correct order (fcs0=slot30, fcs1=slot31). After AIX was installed and the LPAR booted, we would then re-activate the LPAR using the normal profile and all four virtual FC adapters.
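
Switching between the two profiles can also be scripted from the HMC command line with chsysstate. This is only a sketch, reusing the managed system and LPAR names from the lssyscfg example later in this article (750-1 and LPAR4):

chsysstate -m 750-1 -r lpar -o on -n LPAR4 -f wwpns       - first boot / AIX install with the cut-down profile
chsysstate -m 750-1 -r lpar -o shutdown -n LPAR4 --immed  - shut the LPAR down after the install
chsysstate -m 750-1 -r lpar -o on -n LPAR4 -f normal      - re-activate with all four virtual FC adapters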

Two LPAR profiles exist for each AIX LPAR. An example is shown below.

Figure 2: LPAR profiles for virtual FC


The profile named normal contained all of the necessary Virtual I/O devices for an LPAR (shown below). This profile was used to activate an LPAR during standard operation.

Figure 3: Profile with all virtual FC adapters used after install


The profile named wwpns contained only the first two virtual FC devices for an LPAR (shown below). This profile was only used to activate an LPAR in the event that the AIX operating system needed to be reinstalled. Once the AIX installation completed successfully, the LPAR was activated again using the normal profile. This configured the remaining virtual FC adapters.

Figure 4: An LPAR with first two virtual FC adapters only


Also during the build process, we needed to collect a list of WWPNs for the new AIX LPARs we were installing from scratch. There were two ways we could find the WWPN for a virtual Fibre Channel adapter on a new LPAR (for example, one that did not yet have an operating system installed). First, we started by checking the LPAR properties from the HMC (as shown below).

Figure 5: Virtual FC adapter WWPNS


To speed things up we moved to the HMC command line tool, lssyscfg, to display the WWPNs (as shown below).

hscroot@hmc1:~> lssyscfg -r prof -m 750-1 -F virtual_fc_adapters --filter lpar_names=LPAR4
"""4/client/2/vio1/32/c0507603a2920084,c0507603a2920084/0"",
""5/client/3/vio2/32/c050760160ca0008,c050760160ca0009/0"""

We now had a list of WWPNs for each LPAR.

 # cat LPAR4_wwpns.txt
 c0507603a292007c
 c0507603a292007e
 c0507603a2920078
 c0507603a292007a

We gave these WWPNS to the SAN administrator so that he could manually "zone in" the LPARs on the SAN switches and allocate storage to each. To speed things up even more, we used sed to insert colons into the WWPNs. This allowed the SAN administrator to simply cut and paste the WWPNs without needing to insert colons manually.

 # cat LPAR4_wwpns.txt | sed 's/../&:/g;s/:$//'
 c0:50:76:03:a2:92:00:7c
 c0:50:76:03:a2:92:00:7e
 c0:50:76:03:a2:92:00:78
 c0:50:76:03:a2:92:00:7a
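
As a hedged alternative, assuming the HMC's restricted shell provides grep and sed (it did on ours), both steps can be combined into a single pipeline run directly on the HMC. The pattern below simply pulls out anything in the profile output that looks like a WWPN, so it will list both of the generated WWPNs for each virtual FC adapter:

hscroot@hmc1:~> lssyscfg -r prof -m 750-1 -F virtual_fc_adapters --filter lpar_names=LPAR4 | grep -oE 'c0[0-9a-f]{14}' | sed 's/../&:/g;s/:$//'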

An important note here, if you plan on implementing Live Partition Mobility (LPM) with NPIV enabled systems, make sure you zone both of the WWPNs for each virtual FC adapter on the client LPAR. Remember that for each client virtual FC adapter that is created, a pair of WWPNs is generated (a primary and a secondary). Please refer to Live Partition Mobility with Virtual Fibre Channel in the Resources section for more information.

Virtual FC adapters for tape

Tivoli Storage Manager was the backup software used to back up and recover the systems in this new environment. Tivoli Storage Manager would use a TS3310 tape library, as well as disk storage pools, to back up client data. In this environment, we chose to use virtual FC adapters to connect the tape library to Tivoli Storage Manager. This also gave us the capability to assign the tape devices to any LPAR, without moving the physical adapters from one LPAR to another, should the need arise in the future. As I mentioned earlier, there were three 2-port 8Gb FC adapters assigned to each VIOS. Two adapters were used for disk and the third would be used exclusively for tape.

The following diagram shows that physical FC ports, fcs4 and fcs5, in each VIOS would be used for tape connectivity. It also shows that each of the 4 tape drives would be zoned to a specific virtual FC adapter in the Tivoli Storage Manager LPAR.

Figure 6. Tape drive connectivity via virtual FC


The Tivoli Storage Manager LPAR was initially configured with virtual FC adapters for connectivity to XIV disk only. As shown in the lspath output below, fcs0 through fcs3 are used exclusively for access to disk.

# lsdev -Cc adapter | grep fcs
fcs0 Available 30-T1 Virtual Fibre Channel Client Adapter
fcs1 Available 31-T1 Virtual Fibre Channel Client Adapter
fcs2 Available 32-T1 Virtual Fibre Channel Client Adapter
fcs3 Available 33-T1 Virtual Fibre Channel Client Adapter
# lspath
Enabled hdisk0  fscsi0
Enabled hdisk0  fscsi0
Enabled hdisk0  fscsi0
Enabled hdisk0  fscsi0
Enabled hdisk0  fscsi1
Enabled hdisk0  fscsi1
Enabled hdisk0  fscsi1
Enabled hdisk0  fscsi1
Enabled hdisk0  fscsi2
Enabled hdisk0  fscsi2
Enabled hdisk0  fscsi2
Enabled hdisk0  fscsi2
Enabled hdisk0  fscsi3
Enabled hdisk0  fscsi3
Enabled hdisk0  fscsi3
Enabled hdisk0  fscsi3
..etc.. for the other disks on the system

To connect to the tape drives, we configured four additional virtual FC adapters for the LPAR. First, we ensured that the physical adapters were available and had fabric connectivity. On both VIOS, we used the lsnports command to determine the state of the adapters and their NPIV capability. As shown in the following output, the physical adapters fcs4 and fcs5 were both available and NPIV ready, with a 1 in the fabric column. If that value is 0, the adapter may not be connected to an NPIV-capable SAN.

$ lsnports
name  physloc                    fabric tports aports swwpns awwpns
fcs0  U78A0.001.DNWK4W9-P1-C3-T1      1     64     52   2048   1988
fcs1  U78A0.001.DNWK4W9-P1-C3-T2      1     64     52   2048   1988
fcs2  U5877.001.0084548-P1-C1-T1      1     64     61   2048   2033
fcs3  U5877.001.0084548-P1-C1-T2      1     64     61   2048   2033
fcs4  U5877.001.0084548-P1-C2-T1      1     64     64   2048   2048
fcs5  U5877.001.0084548-P1-C2-T2      1     64     64   2048   2048

When I initially checked the state of the adapters on both VIOS, I encountered the following output from lsnports:

$ lsnports
name  physloc                    fabric tports aports swwpns awwpns
fcs0  U78A0.001.DNWK4W9-P1-C3-T1      1     64     52   2048   1988
fcs1  U78A0.001.DNWK4W9-P1-C3-T2      1     64     52   2048   1988
fcs2  U5877.001.0084548-P1-C1-T1      1     64     61   2048   2033
fcs3  U5877.001.0084548-P1-C1-T2      1     64     61   2048   2033
fcs4 U5877.001.0084548-P1-C2-T1      0     64     64   2048   2048

As you can see, only the fcs4 adapter was discovered; the fabric value for fcs4 was 0 and fcs5 was missing. Both of these issues were the result of physical connectivity issues to the SAN. The cables were unplugged and/or they had a loopback adapter plugged into the interface. The error report indicated link errors on fcs4 but not for fcs5.

$ errlog
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
7BFEEA1F 0502104011 T H fcs4  LINK ERROR

Once the ports were physically connected to the SAN switches, I removed the entry for fcs4 from the ODM (as shown below) and then ran cfgmgr on the VIOS.

$ oem_setup_env
# rmdev -dRl fcs4
fcnet4 deleted
sfwcomm4 deleted
fscsi4 deleted
fcs4 deleted
# cfgmgr
# exit
$

Then both fcs4 and fcs5 were discovered and configured correctly.

$ lsnports
name  physloc                    fabric tports aports swwpns awwpns
fcs0  U78A0.001.DNWK4W9-P1-C3-T1      1     64     52   2048   1988
fcs1  U78A0.001.DNWK4W9-P1-C3-T2      1     64     52   2048   1988
fcs2  U5877.001.0084548-P1-C1-T1      1     64     61   2048   2033
fcs3  U5877.001.0084548-P1-C1-T2      1     64     61   2048   2033
fcs4  U5877.001.0084548-P1-C2-T1      1     64     64   2048   2048
fcs5  U5877.001.0084548-P1-C2-T2      1     64     64   2048   2048
The Tivoli Storage Manager LPAR's dedicated virtual FC adapters for tape appeared as fcs4, fcs5, fcs6, and fcs7. The plan was for fcs4 on tsm1 to map to fcs4 on vio1, fcs5 to map to fcs5 on vio1, fcs6 to map to fcs4 on vio2, and fcs7 to map to fcs5 on vio2.
The virtual adapter slot configuration was as follows:
LPAR: tsm1                          VIOS: vio1
U8233.E8B.06XXXXX-V4-C34-T1    >    U8233.E8B.06XXXXX-V1-C60
U8233.E8B.06XXXXX-V4-C35-T1    >    U8233.E8B.06XXXXX-V1-C61

LPAR: tsm1                          VIOS: vio2
U8233.E8B.06XXXXX-V4-C36-T1    >    U8233.E8B.06XXXXX-V2-C60
U8233.E8B.06XXXXX-V4-C37-T1    >    U8233.E8B.06XXXXX-V2-C61
We created two new virtual FC host (vfchost) adapters on vio1 and two new vfchost adapters on vio2. This was done by updating the profile for both VIOS (on the HMC) with the new adapters and then adding them with a DLPAR operation on each VIOS. Once we had run the cfgdev command on each VIOS to bring in the new vfchost adapters, we needed to map them to the physical FC ports.
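For the record, the DLPAR add of the new vfchost adapters can also be driven from the HMC command line with chhwres. The sketch below is illustrative only; the slot and partner slot numbers match the mapping table above, but treat the exact attribute list as an assumption and check it against your HMC level:

hscroot@hmc1:~> chhwres -r virtualio -m 750-1 -o a -p vio1 --rsubtype fc -s 60 -a "adapter_type=server,remote_lpar_name=tsm1,remote_slot_num=34"
hscroot@hmc1:~> chhwres -r virtualio -m 750-1 -o a -p vio1 --rsubtype fc -s 61 -a "adapter_type=server,remote_lpar_name=tsm1,remote_slot_num=35"

The same commands, with -p vio2 and slots 60 and 61 partnered to client slots 36 and 37, would add the adapters on the second VIOS.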
Using the vfcmap command on each of the VIOS, we mapped the physical ports to the virtual host adapters as follows:
  1. Map tsm1 vfchost60 adapter to physical FC adapter fcs4 on vio1.
     $ vfcmap -vadapter vfchost60 -fcp fcs4
  2. Map tsm1 vfchost61 adapter to physical FC adapter fcs5 on vio1.
     $ vfcmap -vadapter vfchost61 -fcp fcs5
  3. Map tsm1 vfchost60 adapter to physical FC adapter fcs4 on vio2.
     $ vfcmap -vadapter vfchost60 -fcp fcs4
  4. Map tsm1 vfchost61 adapter to physical FC adapter fcs5 on vio2.
     $ vfcmap -vadapter vfchost61 -fcp fcs5
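Each mapping can be verified straight away with the lsmap command on the VIOS; a quick check on vio1, for example:
$ lsmap -vadapter vfchost60 -npiv
This reports the backing physical FC port and the client adapter details. The full output for all four mappings, captured after the client LPAR had logged into the fabric, is shown a little further below.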
Next we used DLPAR (using the following procedure) to update the client LPAR with four new virtual FC adapters. Please make sure you read the procedure on adding a virtual FC adapter to a client LPAR. If care is not taken, the WWPNs for a client LPAR can be lost, which can result in loss of connectivity to your SAN storage. You may also want to review the HMC's chsyscfg command, as it is possible to use this command to modify the WWPNs for an LPAR.
After running the cfgmgr command on the LPAR, we confirmed we had four new virtual FC adapters. We ensured that we saved the LPAR's current configuration, as outlined in the procedure.
# lsdev -Cc adapter | grep fcs
fcs0 Available 30-T1  Virtual Fibre Channel Client Adapter
fcs1 Available 31-T1  Virtual Fibre Channel Client Adapter
fcs2 Available 32-T1  Virtual Fibre Channel Client Adapter
fcs3 Available 33-T1  Virtual Fibre Channel Client Adapter
fcs4 Available 34-T1  Virtual Fibre Channel Client Adapter
fcs5 Available 35-T1  Virtual Fibre Channel Client Adapter
fcs6 Available 36-T1  Virtual Fibre Channel Client Adapter
fcs7 Available 37-T1  Virtual Fibre Channel Client Adapter
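Regarding saving the LPAR's current configuration mentioned above, here is a hedged sketch of doing this from the HMC command line (the LPAR and profile names are the ones used in this example; --force overwrites the existing profile):
hscroot@hmc1:~> mksyscfg -r prof -m 750-1 -o save -p tsm1 -n normal --force
This ensures that the DLPAR-added virtual FC adapters, and the WWPNs generated for them, are not lost the next time the LPAR is activated from its profile.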
On both VIOS, we confirmed that the physical-to-virtual mapping of the FC adapters was correct using the lsmap -all -npiv command, also checking that the client LPAR had successfully logged into the SAN by noting the Status: LOGGED_IN entry in the lsmap output for each adapter.
vio1:
Name Physloc ClntID ClntName  ClntOS
------------- ---------------------------------- ------ -------------- -------
vfchost60  U8233.E8B.06XXXXX-V1-C60  6 tsm1  AIX
Status:LOGGED_IN
FC name:fcs4  FC loc code:U5877.001.0084548-P1-C2-T1
Ports logged in:1
Flags:a<LOGGED_IN,STRIP_MERGE>
VFC client name:fcs4  VFC client DRC:U8233.E8B.06XXXXX-V4-C34-T1
Name Physloc ClntID ClntName  ClntOS
------------- ---------------------------------- ------ -------------- -------
vfchost61  U8233.E8B.06XXXXX-V1-C61  6 tsm1  AIX
Status:LOGGED_IN
FC name:fcs5  FC loc code:U5877.001.0084548-P1-C2-T2
Ports logged in:1
Flags:a<LOGGED_IN,STRIP_MERGE>
VFC client name:fcs5    VFC client DRC:U8233.E8B.06XXXXX-V4-C35-T1
vio2:
Name Physloc ClntID ClntName  ClntOS
------------- ---------------------------------- ------ -------------- -------
vfchost60  U8233.E8B.06XXXXX-V2-C60         6 tsm1  AIX
Status:LOGGED_IN
FC name:fcs4  FC loc code:U5877.001.0084548-P1-C5-T1
Ports logged in:1
Flags:a<LOGGED_IN,STRIP_MERGE>
VFC client name:fcs6  VFC client DRC:U8233.E8B.06XXXXX-V4-C36-T1
Name Physloc ClntID ClntName  ClntOS
------------- ---------------------------------- ------ -------------- -------
vfchost61  U8233.E8B.06XXXXX-V2-C61  6 tsm1  AIX
Status:LOGGED_IN
FC name:fcs5              FC loc code:U5877.001.0084548-P1-C5-T2
Ports logged in:1
Flags:a<LOGGED_IN,STRIP_MERGE>
VFC client name:fcs7  VFC client DRC:U8233.E8B.06XXXXX-V4-C37-T1
We were able to capture the WWPNs for the new adapters at this point. This information was required to zone the tape drives to the system.
# for i in 4 5 6 7
> do
> echo fcs$i
> lscfg -vpl fcs$i | grep Net
> echo
> done
fcs4   Network Address.............C0507603A2720087
fcs5   Network Address.............C0507603A272008B
fcs6   Network Address.............C0507603A272008C
fcs7   Network Address.............C0507603A272008D
The IBM Atape device drivers were installed prior to zoning in the TS3310 tape drives.
# lslpp -l | grep -i atape
 Atape.driver  12.2.4.0  COMMITTED  IBM AIX Enhanced Tape and
Then, once the drives had been zoned to the new WWPNs, we ran cfgmgr on the Tivoli Storage Manager LPAR to configure the tape drives.
# lsdev -Cc tape
# cfgmgr
# lsdev -Cc tape
rmt0 Available 34-T1-01-PRI IBM 3580 Ultrium Tape Drive (FCP)
rmt1 Available 34-T1-01-PRI IBM 3580 Ultrium Tape Drive (FCP)
rmt2 Available 35-T1-01-ALT IBM 3580 Ultrium Tape Drive (FCP)
rmt3 Available 35-T1-01-ALT IBM 3580 Ultrium Tape Drive (FCP)
rmt4 Available 36-T1-01-PRI IBM 3580 Ultrium Tape Drive (FCP)
rmt5 Available 36-T1-01-PRI IBM 3580 Ultrium Tape Drive (FCP)
rmt6 Available 37-T1-01-ALT IBM 3580 Ultrium Tape Drive (FCP)
rmt7 Available 37-T1-01-ALT IBM 3580 Ultrium Tape Drive (FCP)
smc0 Available 34-T1-01-PRI IBM 3576 Library Medium Changer (FCP)
smc1 Available 35-T1-01-ALT IBM 3576 Library Medium Changer (FCP)
smc2 Available 37-T1-01-ALT IBM 3576 Library Medium Changer (FCP)
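If you need to confirm which rmt device corresponds to which physical drive in the library, the drive serial numbers can be pulled from the device VPD; a quick sketch, along the same lines as the WWPN loop used earlier:
# for i in 0 1 2 3 4 5 6 7
> do
> echo rmt$i
> lscfg -vpl rmt$i | grep -i serial
> done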
Our new tape drives were now available to Tivoli Storage Manager.

Monitoring virtual FC adapters

Apparently the viostat command on the VIO server allows you to monitor I/O traffic on the vfchost adapters (as shown in the following example).
$  viostat -adapter vfchost3
System configuration: lcpu=8 drives=1 ent=0.50 paths=4 vdisks=20 tapes=0 
tty:      tin      tout    avg-cpu:  % user  % sys  % idle  % iowait  physc  % entc
          0.0       0.2               0.0     0.2    99.8    0.0       0.0     0.4
Adapter: Kbps tps Kb_read  Kb_wrtn
fcs1 2.5 0.4 199214  249268 
Adapter: Kbps tps Kb_read  Kb_wrtn
fcs2 0.0 0.0 0  0 
Vadapter: Kbps tps bkread  bkwrtn
vfchost4 0.0 0.0 0.0  0.0 
Vadapter: Kbps tps bkread  bkwrtn
vfchost6 0.0 0.0 0.0  0.0 
Vadapter: Kbps tps bkread  bkwrtn
vfchost5 0.0 0.0 0.0  0.0 
Vadapter: Kbps tps bkread  bkwrtn
vfchost0 0.0 0.0 0.0  0.0 
Vadapter: Kbps tps bkread  bkwrtn
vfchost3 0.0 0.0 0.0  0.0 
Vadapter: Kbps tps bkread  bkwrtn
vfchost2 0.0 0.0 0.0  0.0 
Vadapter: Kbps tps bkread  bkwrtn
vfchost1 0.0 0.0 0.0  0.0
I must admit I had limited success using this tool to monitor I/O on these devices. I have yet to discover why it did not report any statistics for any of my vfchost adapters. Perhaps it was an issue with the level of VIOS code we were running?
Fortunately, nmon captures and reports on virtual FC adapter performance statistics on the client LPAR. This is nothing new, as nmon has always captured FC adapter information, but it is good to know that nmon can record the data for both virtual and physical FC adapters.
Figure 7. nmon data for virtual FC adapter usage
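If you want nmon to record that data for later analysis rather than just display it interactively, a recording along the following lines should capture it; treat the exact flags as an assumption and check them against your nmon version (-f writes a recording file, -^ includes the Fibre Channel sections, and -s and -c set the snapshot interval and count):
# nmon -f -t -^ -s 60 -c 1440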
The fcstat command can be used on the client LPARs to monitor performance statistics relating to buffer usage and overflows on the adapters. For example, the following output indicated that we needed to tune some of the settings on our virtual FC adapters; in particular, the num_cmd_elems and max_xfer_size attributes were modified.
# fcstat fcs0 | grep -p DMA | grep -p 'FC SCSI'
FC SCSI Adapter Driver Information
  No DMA Resource Count: 580
  No Adapter Elements Count: 0
  No Command Resource Count: 6093967
# fcstat fcs1 | grep -p DMA | grep -p 'FC SCSI'
FC SCSI Adapter Driver Information
  No DMA Resource Count: 386
  No Adapter Elements Count: 0
  No Command Resource Count: 6132098
# fcstat fcs2 | grep -p DMA | grep -p 'FC SCSI'
FC SCSI Adapter Driver Information
  No DMA Resource Count: 222
  No Adapter Elements Count: 0
  No Command Resource Count: 6336080
# fcstat fcs3 | grep -p DMA | grep -p 'FC SCSI'
FC SCSI Adapter Driver Information
  No DMA Resource Count: 875
  No Adapter Elements Count: 0
  No Command Resource Count: 6425427
We also found buffer issues (via the fcstat command) on the physical adapters on the VIO servers. We tuned the FC adapters on the VIO servers to match the settings on the client LPARs, such as max_xfer_size=0x200000 and num_cmd_elems=2048.
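For completeness, here is a sketch of how those attributes might be set on a client LPAR, using the values mentioned above; the -P flag defers the change until the next reboot because the adapter is normally in use, and on the VIO servers the padmin equivalent is chdev -dev fcsX -attr ... -perm:
# chdev -l fcs0 -a num_cmd_elems=2048 -a max_xfer_size=0x200000 -P
A change like this needs to be repeated for each FC adapter and only takes effect after the LPAR (or VIOS) is rebooted.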
The fcstat command will report a value of UNKNOWN for some attributes of a virtual FC adapter. Because it is a virtual adapter, it does not contain any information relating to the physical adapter attributes, such as firmware level information or supported port speeds.
# fcstat fcs0
FIBRE CHANNEL STATISTICS REPORT: fcs0
Device Type: FC Adapter (adapter/vdevice/IBM,vfc-client)
Serial Number: UNKNOWN
Option ROM Version: UNKNOWN
Firmware Version: UNKNOWN
World Wide Node Name: 0xC0507603A202007c
World Wide Port Name: 0xC0507603A202007e
FC-4 TYPES:
Supported: 0x0000010000000000000000000000000000000000000000000000000000000000
Active: 0x0000010000000000000000000000000000000000000000000000000000000000
Class of Service: 3
Port Speed (supported): UNKNOWN
Port Speed (running): 8 GBIT
Port FC ID: 0x5D061D

Conclusion

That sums up my experience with NPIV, Power Systems, Virtual I/O, and AIX. I hope you have enjoyed reading this article. Of course, as they say, "there's always more than one way to skin a cat"! So please feel free to contact me and share your experiences with this technology; I'd like to hear your thoughts.