Showing posts with label AIX - Tips. Show all posts

Friday, 21 February 2014

IBM AIX- Admin Best Practices


When I started my career as an AIX admin, everything was Greek and Latin to me. I was scared and did not know where or how to start.

In this article, I will walk you through best practices for an AIX admin that will make your life easier.

They also apply to other flavors of Unix administration (such as Solaris, Linux and HP-UX); the only difference is that some commands vary from flavor to flavor.

Rule 1: Learn the Process

If you get this area right, life becomes much easier going forward.

I would like to emphasize the points below in particular.

  • Follow the ITIL processes, which are adopted by most companies.
  • Get to know the SLAs (Service Level Agreements).
  • Always acknowledge tickets within the SLA and update them at regular intervals.
  • If it is a P1, log in to the bridge call and voice your findings promptly and wisely.
  • Always perform approved changes within the prescribed change window.

The table below describes the different ITIL processes in short.

Process Name | Definition | Priorities | Tools
Incident/Ticket Management | An unplanned interruption to an IT Service or a reduction in the Quality of an IT Service. Failure of a Configuration Item that has not yet impacted Service is also an Incident. | P1, P2, P3 & P4 | Maximo, BMC Remedy, HP-Service Manager, Peregrine
Change Management | A process to control and coordinate all changes to an IT production environment. | Normal, Urgent, Emergency & Expedite | Maximo, BMC Remedy, HP-Service Manager, Peregrine
Service Request/Task Management | Monitoring and reporting of the agreed Key Performance Indicators (KPIs) corresponding to the compliance with customer and management. | - | Maximo, BMC Remedy, HP-Service Manager, Peregrine
Problem Management | Problem Management includes the activities required to diagnose the root cause of incidents identified through the Incident Management process, and to determine the resolution to those problems. | - | Maximo, BMC Remedy, HP-Service Manager, Peregrine
Release Management | Process of managing software releases from development stage to software release. | - | -


Rule 2: Get to Know Your Coverage Area & Contacts

  • Make an inventory of the servers you support. If possible, build the sheet so that it includes environment, jump server, application, datacentre location and console information.
  • Get access to the servers.
  • Collect vendor contact information, including phone numbers.
  • Also keep application team contact information handy.

Rule 3: Day to day operations

Backups:

  • Take a system backup (mksysb) regularly, at least once a week, and keep it on another server, preferably the NIM server.
  • Verify the mksysb with "lsmksysb -l -f /mksysbimg" (check the size).
  • Check /etc/exclude.rootvg to see whether any important filesystem/directory is excluded from the mksysb backup.
  • Ensure non-rootvg file systems are backed up by your company's backup software (e.g. TSM or NetBackup).
  • Take a system snap every week (make a cron entry) and keep the log file on a different server, or make a copy on your desktop.

System Consistency Checks:

  • Ensure the current OS level is consistent: "oslevel -s; oslevel -r; instfix -i | grep ML; instfix -i | grep SP; lppchk -v". [If the OS is inconsistent, first bring it to a consistent state and then proceed with the change.]
  • Proactively remediate compliance issues.
  • Check your firm's policy on server uptime and arrange reboots accordingly; some organizations require a reboot within 90 or 180 days.

Troubleshooting issues:

  1. Don't fix issues without a proper incident record.
  2. Engage the relevant parties while working on the issue.
  3. Always try to get information about the issue from the user (requestor) with questions like "what, when, where".
  4. Look at errpt first.
  5. Check 'alog -t console -o' to see if it is a boot issue.
  6. The log files listed in "/etc/syslog.conf" may also give more information for the investigation.
  7. Check backups if you are looking into configuration change issues.
  8. If you are running out of time, involve your next-level team and managers.
  9. Take help from vendors like IBM, EMC or Symantec if necessary.

P1 issues:

If it is a priority 1 (P1) issue, you need to consider a few additional points apart from the above.
  1. On sev1 issues, update the SDM (Service Delivery Manager) in the ST/Communicator multi-chat at regular intervals.
  2. If someone on the bridge call verbally requests you to perform a change, get the confirmation in writing in the ST multi-chat.
  3. Update the incident record (IR) at regular intervals.
  4. Update your team on the issue status (via mail).
  5. Document any new learnings (from issues/changes) and share them with the team.

Working on a Change:

Rule of thumb: a change should move through the environments in sequence: DEV ==> UAT/QA ==> PROD.
  1. Make sure the change record is fully approved; otherwise don't start any of your tasks.
  2. Ensure a properly validated CR procedure is in place: Precheck -> Installation -> Backout -> Post-Verification.
  3. Suppress alerts if needed.
  4. Remember that the Application/Database teams are responsible for their own backup/restore and stop/start, so alert them in advance.
  5. Check the history of the servers (CRs or IRs) to see if there were any issues or change failures on them.
  6. EXPECT THE UNEXPECTED: ensure you have a proper back-out plan in place.
  7. Ensure you are on the right server ('uname -n' / 'hostname') before you perform the change.
  8. Make sure your ID and the root ID are working and have not expired.
  9. Ensure nobody else from your team is working on the same task, so that one change is not performed by multiple SAs. It is better to verify with the 'who -u' command whether any SAs are already working on the server.
  10. Remember: one change at a time. Multiple changes can cause problems and complicate troubleshooting.
  11. Ensure there are no conflicting changes from other departments such as SAN, network, firewall or application that could interfere with your change.
  12. Record the commands you run and the console output in a notepad file (named after the change).

If it is a configuration change:

  • Take a backup of the pre and post values and document them.
  • Take screenshots if you are comfortable doing so.
  • If you are updating a configuration file, take a timestamped copy first:
    # cp -p <filename> <filename>_`date +"%m_%d_%Y_%H:%M"`
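
The timestamped copy above is easy to wrap in a small helper so that the backup name is built the same way every time. This is only a sketch: backup_file is a made-up function name, not an AIX command, and it keeps the same date format as the cp example above.

```shell
#!/bin/sh
# backup_file FILE
# Hypothetical helper: copy FILE to FILE_<MM_DD_YYYY_HH:MM>, keeping
# permissions, ownership and timestamps (cp -p), and print the new path.
backup_file() {
    src=$1
    if [ ! -f "$src" ]; then
        echo "backup_file: $src not found" >&2
        return 1
    fi
    stamp=$(date +"%m_%d_%Y_%H:%M")
    dest="${src}_${stamp}"
    cp -p "$src" "$dest" || return 1
    echo "$dest"   # paste this path into your change notes
}
```

For example, "backup_file /etc/hosts" run before editing the file prints the name of the copy, which you can record along with the pre and post values.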

If it is a change to reboot or update software:

  1. Check whether the server is part of a cluster (HACMP/PowerHA); if so, you have to follow a different procedure.
  2. Always make sure three essentials are in place before you perform any change: "backup (mksysb); system information; console".
  3. Take the system configuration information (sysinfo script).
  4. Check lv/filesystem consistency: "df -k" (df should not hang); all LVs should be in the synced state: "lsvg -o | lsvg -il".
  5. Check errpt and 'alog -t console -o' to see if there are any errors.
  6. Ensure the latest mksysb (OS image backup) is kept on the relevant NIM server.
  7. Ensure a backup of the non-rootvg file systems has been taken.
  8. Verify the boot list and boot device: "bootlist -m normal -o" and "ipl_varyon -i".
  9. Log in to the HMC console.
Additional points for reboot:
  1. Put the servers in maintenance mode (stop alerts) to avoid unnecessary incident alerts.
  2. Check the filesystem count with "df -g | wc -l"; verify the count again after the migration or reboot.
  3. Ensure there are no scheduled reboots in crontab. If there are, comment them out before you proceed with the change.
  4. If the system has not been rebooted for a long time (> 100 days), run 'bosboot' and reboot the machine (verify the fs/appfs after the reboot) before starting the migration/upgrade. [Don't reboot the machine if the bosboot fails!]
  5. Read the log messages carefully; don't ignore warnings.
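
A plain count tells you that a filesystem is missing after the reboot, but not which one. A minimal sketch that keeps the list of mount points instead of only the count is shown below. The function names and the /tmp file names are made up for illustration, and it assumes the mount point is the last column of the df output and contains no spaces.

```shell
#!/bin/sh
# fs_list: read df-style output on stdin and print the sorted mount points
# (last column of every line except the header).
fs_list() {
    awk 'NR > 1 { print $NF }' | sort
}

# fs_missing BEFORE AFTER: print mount points listed in BEFORE but not in AFTER.
# Both files must be sorted, which fs_list takes care of.
fs_missing() {
    comm -23 "$1" "$2"
}

# Usage on the server (file names are only examples):
#   df -g | fs_list > /tmp/fs_before.txt     # before the reboot
#   df -g | fs_list > /tmp/fs_after.txt      # after the reboot
#   fs_missing /tmp/fs_before.txt /tmp/fs_after.txt
```

If fs_missing prints nothing, every filesystem that was mounted before the reboot is mounted again.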
Additional points for OS & S/W upgrades:
  1. Ensure hd5 (bootlv) is 32MB (contiguous PPs) [very important for migration].
  2. For OS updates, initiate the change from the console. If there is a network disconnection during the change, you can reconnect to the console and get the menus back.
  3. If the situation demands, ensure there is enough free filesystem space (/usr, /var, /) for the change.
  4. Have the patches/filesets pre-downloaded and verified.
  5. Check/verify the repositories on the NIM/central server; check whether these repositories were tested/used earlier.
  6. If there are two disks in rootvg, perform an alt disk clone to one disk. This is the fastest and safest back-out method in case of failure. Even if you perform an alt disk clone, take a mksysb as well.
  7. For a migration change, check whether SAN storage (IBM/EMC..) is used; if so, you have to follow the procedure of exporting the VGs and uninstalling the sdd* filesets, and after migration reinstalling the sdd* filesets, reimporting the VGs etc.
  8. Perform a preview (TL/SP upgrade) before the actual change; see if any errors are reported in the preview (look for the keywords 'error' / 'fail') and read the tail/summary of the messages.
  9. Even if the preview reports 'ok' at the header, you still have to look through the messages and read the tail/summary of the preview.
  10. If the preview reports any missing dependency/requisite filesets, have those downloaded as well.
  11. Ensure you have enough free space in rootvg: a minimum of 1-2 GB free (TL upgrade/OS migration).
  12. Ensure the application team has tested their application on the new TL/SP/OS to which you are upgrading.
  13. If you have multiple PuTTY sessions open, name the sessions accordingly [PuTTY -> Behaviour -> Window title]; this helps you get to the right session quickly. Alternatively, use PuttyCM (PuTTY Connection Manager).
  14. For TL upgrades, go TL by TL; jumping directly to the target TL can sometimes cause problems.
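
The preview check in points 8 and 9 is easy to get lazy about, so here is a minimal sketch of it. It assumes you have saved the preview output to a file, however you ran the preview; scan_preview is a made-up function name and the file name in the usage line is only an example.

```shell
#!/bin/sh
# scan_preview LOGFILE
# Print every line that mentions "error" or "fail" (any case) with its line
# number, then the last 20 lines of the log, where the summary usually sits.
# Returns 1 if a suspicious line was found, 0 otherwise.
scan_preview() {
    log=$1
    status=0
    if grep -inE 'error|fail' "$log"; then
        status=1
    fi
    echo "---- tail of $log ----"
    tail -n 20 "$log"
    return $status
}

# Usage (file name is only an example):
#   scan_preview /tmp/tl_preview.out || echo "read the preview log before you continue"
```

A clean return code is not a substitute for reading the summary; it only makes sure the obvious keywords do not slip past you.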
What if you are crossing the change window?
  • Inform the relevant application teams and SDMs and get an extension with proper approvals.
  • Raise an incident record in support of the issue.
What if the change fails?
  • Inform the relevant application teams and SDMs.
  • Close the record with the facts.
  • Attend the change review calls for the failed changes.
Successful change:
  • If possible, send the success status to the relevant parties with artifacts.
  • Update the change request with the relevant artifacts and close it.

Last but not least:

  • Don't hesitate to ask your teammates or vendor support for help when your issue is taking too long.
  • Inform your managers if the issue is heading for escalation (if it is a P1, inform them beforehand).
  • Always perform changes with proper approvals in place.
  • Take backups and make your life easy.
Happy Unixing!

Saturday, 28 December 2013

Practical Guide to AIX "Volume Group Management"

Folks, in this post I am going to discuss practical examples and useful real-time commands for AIX Volume Group Management.




1)Volume Group Creation:

mkvg -y <vg> -s <PP size> <pv>  (normal volume group)
mkvg -y datavg -s 4 hdisk1

Use the options below to create Big and Scalable volume groups.

-B Creates a Big-type volume group.
-S Creates a Scalable-type volume group.

Note: the PP size is the physical partition size you want, in MB: 1, 2, (4), 8, 16, 32, 64, 128, 256, 512 or 1024.
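
Picking the PP size is simple arithmetic. A normal VG allows 1016 PPs per PV by default (see the table in section 16), so the PP size has to be large enough for the whole disk to fit into 1016 partitions. The sketch below assumes that default (no chvg -t factor applied); min_pp_size_mb is a made-up helper name.

```shell
#!/bin/sh
# min_pp_size_mb DISK_MB
# Smallest power-of-two PP size (in MB) for which DISK_MB fits into
# 1016 physical partitions. Assumes the default limit of 1016 PPs per PV.
# Stops at 1024 MB, the largest PP size in the list above.
min_pp_size_mb() {
    disk_mb=$1
    pp=1
    while [ $((pp * 1016)) -lt "$disk_mb" ] && [ "$pp" -lt 1024 ]; do
        pp=$((pp * 2))
    done
    echo "$pp"
}

# Usage: a 70 GB disk is 71680 MB
#   min_pp_size_mb 71680
#   mkvg -y datavg -s <value printed above> hdisk1
```

With a 4 MB PP size, for instance, a disk can be at most 4 x 1016 = 4064 MB, which is why larger disks need a larger -s value.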

2) List/Display Volume Group:

lsvg
lsvg <vg> (detailed)
lsvg -l <vg> (list all logical volumes in group)
lsvg -p <vg> (list all physical volumes in group)
lsvg -o (lists all varied-on volume groups)
lsvg -M <vg> (lists all PV, LV, PP details of a vg (PVname:PPnum LVname:LPnum:Copynum))
lsvg -o | lsvg -ip        lists pvs of online vgs
lsvg -o | lsvg -il        lists lvs of online vgs
lsvg -n <hdisk>           shows vg info, but read from the VGDA on the specified disk (useful for comparing different disks)

## Detailed volume group info from the disk
lqueryvg -Atp <pv>
lqueryvg -p <disk> -v (Determine the VG ID# on disk)
lqueryvg -p <disk> -L (Show all the LV ID#/names in the VG on disk)
lqueryvg -p <disk> -P (Show all the PV ID# that reside in the VG on disk)

3)Extending Volume Group:

#extendvg <vg> <pv>
#extendvg myvg hdisk5

4)Reducing Volume Group:

#reducevg -d <vg> <pv>
## removes the PVID from the VGDA when a disk has vanished without using the reducevg command
#reducevg <vg> <PVID>

5) Mirror Volume Group:

We can mirror a volume group in AIX using the mirrorvg command; a maximum of three copies is possible.

Suppose rootvg has two PVs, with the OS and data installed on hdisk0, and we want to mirror hdisk0 to hdisk1. The command is:
# mirrorvg -S -m rootvg hdisk1

-S - background sync (the command returns immediately and the synchronization runs in the background)
-m - exact map (partitions are mirrored in the same physical order as the original copy)
NOTE: quorum should be turned off for a mirrored VG, because quorum checking is not recommended with mirroring.

6)Un-Mirror Volume Group: 

Using the unmirrorvg command we can unmirror the VG:
#unmirrorvg rootvg hdisk1
The mirror copy on hdisk1 is removed from rootvg.

7)Synchronize Volume Group:

Using the syncvg command we can synchronize stale copies in a mirrored VG or LV.

To sync the copies of a logical volume:
#syncvg -l lvname

#syncvg -l testlv
After the above command, the copies of testlv are in sync.

To sync all the mirrored copies in a volume group:
#syncvg -v rootvg
The above command syncs the mirrored copies in rootvg.

8) Un-Lock Volume Group:

# chvg -u <vgname>          unlocks the volume group (if a command core dumped or the system crashed and the vg was left in a locked state)
(Many LVM commands place a lock in the ODM to prevent other commands from working on the vg at the same time.)

9)Re-Organise Volume Group:

# reorgvg   <vgname>
rearranges physical partitions within the vg to conform with the placement policy (outer edge...) for the lv.
(For this 1 free pp is needed, and the relocatable flag for lvs must be set to 'y': chlv -r...)

10) Vary On Volume Group:

This is just VG activation. Sometimes clients want a VG deactivated because of a project restriction; afterwards we have to activate the VG again for further data access.

Suppose we want to activate testvg:
#lsvg
rootvg
datavg
testvg
The above command shows all the VGs defined on the system.
#lsvg -o
rootvg
datavg
The above command shows only the online (active) VGs. testvg is offline, so we have to activate it using "varyonvg". This enables us to mount the filesystems that were created on top of testvg.

#varyonvg testvg

#lsvg -o
rootvg
datavg
testvg
Now the above command displays testvg as well.

11) Vary Off Volume Group:

This is just VG deactivation; some clients want a VG deactivated because of a project restriction. Suppose the customer wants testvg deactivated:
#lsvg -o
rootvg
datavg
testvg

#varyoffvg testvg

#lsvg -o
rootvg
datavg
The above command now displays only two online VGs; testvg is not shown because it is offline.

12) Rename Volume Group:

#varyoffvg <old vg name>
#lsvg -p <old vg name> (obtain disk names)
#exportvg <old vg name>
#importvg -y <new vg name> <pv>
#varyonvg <new vg name>
#mount -a

13) Exporting Volume Group:

Using the exportvg command we can remove a VG definition from one server so that the VG (including all its PVs) can be moved to another server.

Suppose ServerA has datavg with two PVs, and we want to move datavg to ServerB.

Before exporting datavg, we have to vary it off, i.e. take datavg offline.
#varyoffvg datavg (vary off datavg)
#exportvg datavg (VG information is removed from the ODM)
Now datavg is exported from ServerA. Run the following command to verify the export.
#lsvg
It won't show datavg, because datavg has been exported.

Then remove the PVs from the configuration:
#rmdev -dl hdisk3
#rmdev -dl hdisk4
After that we can detach the PVs from ServerA, ready for importing datavg on ServerB.

14)Importing Volume Group:

Using the importvg command we can import datavg on ServerB.

First connect hdisk3 and hdisk4 to ServerB, then run:
#cfgmgr (for hard disk detection)
Then check whether the PVs are configured, using the lspv command:
#lspv (displays the configured PVs; if hdisk3 and hdisk4 are listed, the PVs are configured properly)
Then run importvg to import datavg. Naming any one PV of the VG is enough; importvg reads the VGDA on that disk and finds the other PVs itself:
#importvg -y datavg hdisk3 (VG information is added to the ODM)
NOTE: If ServerB already has a VG named datavg, we can import the VG under another name:
#importvg -y newdatavg hdisk3

After importing datavg there is no need to vary it on; importvg varies it on automatically.

15)Removing Volume Group:

#varyoffvg <vg>
#exportvg <vg>
Note: the exportvg command nukes everything regarding the volume group in the ODM and /etc/filesystems

16) Check Volume Group Type:

Run the lsvg command on the volume group and look at the value for MAX PVs. The value is 32 for normal, 128 for big, and 1024 for scalable volume group.
VG type     Maximum PVs    Maximum LVs    Maximum PPs per VG    Maximum PP size
Normal VG     32              256            32,512 (1016 * 32)      1 GB
Big VG        128             512            130,048 (1016 * 128)    1 GB
Scalable VG   1024            4096           2,097,152               128 GB
If a physical volume is part of a volume group, it contains 2 additional reserved areas. One area contains both the VGSA and the VGDA, and this area is started from the first 128 reserved sectors (blocks) on the disk. The other area is at the end of the disk, and is reserved as a relocation pool for bad blocks.
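
The MAX PVs check can be scripted in a few lines. The sketch below reads lsvg output on standard input and maps the value to a VG type using the numbers from the table. It assumes the field is printed as "MAX PVs:" followed by a number on the same line; vg_type_from_lsvg is a made-up function name. A normal VG whose factor was changed with chvg -t reports a different MAX PVs value and is printed as unknown.

```shell
#!/bin/sh
# vg_type_from_lsvg: read `lsvg <vg>` output on stdin and print the VG type
# derived from the MAX PVs value (32 = normal, 128 = big, 1024 = scalable).
vg_type_from_lsvg() {
    max_pvs=$(sed -n 's/.*MAX PVs:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1)
    case "$max_pvs" in
        32)   echo "normal" ;;
        128)  echo "big" ;;
        1024) echo "scalable" ;;
        *)    echo "unknown (MAX PVs: ${max_pvs:-not found})" ;;
    esac
}

# Usage:
#   lsvg datavg | vg_type_from_lsvg
```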

17)Changing Normal VG to Big VG:

If you reached the MAX PV limit of a Normal VG and playing with the factor (chvg -t) is not possible anymore you can convert it to Big VG.

It is an online activity, but there must be free PPs on each physical volume, because VGDA will be expanded on all disks:
root@um-lpar: / # lsvg -p bbvg
bbvg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk2            active            511         2           02..00..00..00..00
hdisk3            active            511         23          00..00..00..00..23
hdisk4            active            1023        0           00..00..00..00..00

root@um-lpar: / # chvg -B bbvg
0516-1214 chvg: Not enough free physical partitions exist on hdisk4 for the
        expansion of the volume group descriptor area.  Migrate/reorganize to free up
        2 partitions and run chvg again.

In this case we have to migrate 2 PPs from hdisk4 to hdisk3 (so 2 PPs will be freed up on hdisk4):

root@um-lpar: / # lspv -M hdisk4
hdisk4:1        bblv:920
hdisk4:2        bblv:921
hdisk4:3        bblv:922
hdisk4:4        bblv:923
hdisk4:5        bblv:924
...

root@um-lpar: / # lspv -M hdisk3
hdisk3:484      bblv:3040
hdisk3:485      bblv:3041
hdisk3:486      bblv:3042
hdisk3:487      bblv:1
hdisk3:488      bblv:2
hdisk3:489-511

root@um-lpar: / # migratelp bblv/920 hdisk3/489
root@um-lpar: / # migratelp bblv/921 hdisk3/490

root@um-lpar: / # lsvg -p bbvg
bbvg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk2            active            511         2           02..00..00..00..00
hdisk3            active            511         21          00..00..00..00..21
hdisk4            active            1023        2           02..00..00..00..00

If we try again changing to Big VG, now it is successful:
root@um-lpar: / # chvg -B bbvg
0516-1216 chvg: Physical partitions are being migrated for volume group
        descriptor area expansion.  Please wait.
0516-1164 chvg: Volume group bbvg2 changed.  With given characteristics bbvg2
        can include up to 128 physical volumes with 1016 physical partitions each.

If you check again, the freed-up PPs have been used:
root@um-lpar: / # lsvg -p bbvg
bbvg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk2            active            509         0           00..00..00..00..00
hdisk3            active            509         17          00..00..00..00..17
hdisk4            active            1021        0           00..00..00..00..00

18)Changing Normal (or Big) VG to Scalable VG:

If you have reached the MAX PV limit of a Normal or a Big VG and playing with the factor (chvg -t) is not possible anymore, you can convert that VG to a Scalable VG. A Scalable VG allows a maximum of 1024 PVs and 4096 LVs, and has the big advantage that the maximum number of PPs applies to the entire VG and is no longer defined on a per-disk basis.

!!! Converting to Scalable VG is an offline activity (varyoffvg), and there must be free PPs on each physical volume, because the VGDA will be expanded on all disks.
root@um-lpar: / # chvg -G bbvg
0516-1707 chvg: The volume group must be varied off during conversion to
        scalable volume group format.

root@um-lpar: / # varyoffvg bbvg
root@um-lpar: / # chvg -G bbvg
0516-1214 chvg: Not enough free physical partitions exist on hdisk2 for the
        expansion of the volume group descriptor area.  Migrate/reorganize to free up
        18 partitions and run chvg again.


After migrating some lps to free up required PPs (in this case it was 18), then changing to Scalable VG is successful:
root@um-lpar: / # chvg -G bbvg
0516-1224 chvg: WARNING, once this operation is completed, volume group bbvg
        cannot be imported into AIX 5.2 or lower versions. Continue (y/n) ?
...
0516-1712 chvg: Volume group bbvg changed.  bbvg can include up to 1024 physical volumes with 2097152 total physical partitions in the volume group.

19) Check VGDA (Volume Group Descriptor Area):

It is an area on the hard disk (PV) that contains information about the entire volume group. There is at least one VGDA per physical volume, one or two copies per disk. It contains physical volume list (PVIDs), logical volume list (LVIDs), physical partition map (maps lps to pps)
# lqueryvg -tAp hdisk0                                <--look into the VGDA (-A:all info, -t: tagged, without it only numbers)
Max LVs:        256
PP Size:        27                                    <--exponent of 2: 2^27 bytes = 128MB
Free PPs:       698
LV count:       11
PV count:       2
Total VGDAs:    3
Conc Allowed:   0
MAX PPs per PV  2032
MAX PVs:        16
Quorum (disk):  0
Quorum (dd):    0
Auto Varyon ?:  1
Conc Autovaryo  0
Varied on Conc  0
Logical:        00cebffe00004c000000010363f50ac5.1   hd5 1       <--1: count of mirror copies (00cebff...c5 is the VGID)
                00cebffe00004c000000010363f50ac5.2   hd6 1
                00cebffe00004c000000010363f50ac5.3   hd8 1
                ...
Physical:       00cebffe63f500ee                2   0            <--2:VGDA count 0:code for its state (active, missing, removed)
                00cebffe63f50314                1   0            (The sum of VGDA count should be the same as the Total VGDAs)
Total PPs:      1092
LTG size:       128
...
Max PPs:        32512
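
The "PP Size" value in this output is an exponent, not a size, so it is worth converting it before doing any capacity arithmetic. A small sketch of the conversion (pp_size_mb is a made-up helper name):

```shell
#!/bin/sh
# pp_size_mb EXPONENT
# lqueryvg prints the PP size as a power of two in bytes.
# 2^EXPONENT bytes divided by 2^20 bytes per MB is 2^(EXPONENT-20) MB.
pp_size_mb() {
    exp=$1
    if [ "$exp" -lt 20 ]; then
        echo "pp_size_mb: exponent $exp is below 1 MB" >&2
        return 1
    fi
    echo $((1 << (exp - 20)))
}

# Usage with the values from the output above:
#   pp_size_mb 27                        # PP size in MB
#   echo $((1092 * $(pp_size_mb 27)))    # Total PPs x PP size = VG size in MB
```

With the values shown above (PP Size 27, Total PPs 1092) this gives a 128 MB partition size and a volume group of 1092 x 128 = 139776 MB.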

20)Mirroring rootvg (after disk replacement):

1. disk replaced -> cfgmgr           <--it will find the new disk (i.e. hdisk1)
2. extendvg rootvg hdisk1            <--sometimes extendvg -f rootvg...
(3. chvg -Qn rootvg)                 <--only if quorum setting has not yet been disabled, because this needs a restart
4. mirrorvg -s rootvg                <--add mirror for rootvg (-s: synchronization will not be done)
5. syncvg -v rootvg                  <--synchronize the new copy (lsvg rootvg | grep STALE)
6. bosboot -a                        <--we changed the system so create boot image (-a: create complete boot image and device)
                                     (hd5 is mirrored, no need to do it for each disk. ie. bosboot -ad hdisk0)
7. bootlist -m normal hdisk0 hdisk1  <--set normal bootlist
8. bootlist -m service hdisk0 hdisk1 <--set bootlist when we want to boot into service mode
(9. shutdown -Fr)                    <--this is needed if quorum has been disabled
10.bootinfo -b                       <--shows the disk  which was used for boot

21)Miscellaneous VG Commands:

getlvodm -j <hdisk>       get the vgid for the hdisk from the odm
getlvodm -t <vgid>        get the vg name for the vgid from the odm
getlvodm -v <vgname>      get the vgid for the vg name from the odm
getlvodm -p <hdisk>       get the pvid for the hdisk from the odm
getlvodm -g <pvid>        get the hdisk for the pvid from the odm
lqueryvg -tcAp <hdisk>    get all the vgid and pvid information for the vg from the vgda (directly from the disk)
                          (you can compare the disk with odm: getlvodm <-> lqueryvg)
synclvodm <vgname>        synchronizes or rebuilds the lvcb, the device configuration database, and the vgdas on the physical volumes
redefinevg                it helps regain the basic ODM information if it is corrupted (redefinevg -d hdisk0 rootvg)
readvgda hdisk40          shows details from the disk

Monday, 26 August 2013

uuencode Command


Purpose

Encodes a binary file for transmission using electronic mail.

Syntax

uuencode [ -m ] [ SourceFile ] OutputFile

Description

The uuencode command converts a binary file to ASCII data. This is useful before using BNU (or uucp) mail to send the file to a remote system. The uudecode command converts ASCII data created by the uuencode command back into its original binary form.
The uuencode command takes the named SourceFile (default standard input) and produces an encoded version on the standard output. The encoding uses only printable ASCII characters, and includes the mode of the file and the OutputFile filename used for recreation of the binary image on the remote system.
Use the uudecode command to decode the file.

Flags

Item    Description
-m      Encode the output using the MIME Base64 algorithm. If -m is not specified, the old uuencode algorithm will be used.

Parameters

Item          Description
OutputFile    Specifies the name of the decoded file. You can direct the output of the uuencode command to standard output by specifying /dev/stdout as the OutputFile.
SourceFile    Specifies the name of the binary file to convert. Default is standard input.

Examples

  1. To encode the file unix1 on the local system and mail it to the user jsmith on another system called mysys, enter:
    uuencode unix1 unix1 | mail jsmith@mysys
  2. To encode the file /usr/lib/boot/unix2 on your local system with the name pigmy.goat in the file /tmp/con , enter:
    uuencode /usr/lib/boot/unix2 pigmy.goat > /tmp/con

Files

Item                 Description
/usr/bin/uuencode    Contains the uuencode command.

Thursday, 8 August 2013

Rename Disks-IBM AIX OS

Renaming Disks AIX OS

Scenario 1:

Sometimes disks turn up on the server in an unsatisfactory manner. That is to say, the naming of the disks is not ideal. Let's look at neatening that up.

You're working on a shiny new 9117-MMC in a dual VIOS configuration (we'll call them VIOS1 and VIOS2). You've installed VIOS1 from DVD and are now installing VIOS2. You have already assigned the DVD drive to VIOS2 but, in doing so, have revealed the disks from VIOS1 to VIOS2. When the install is complete and you remove the adapter(s) which enabled VIOS2 to see the disks and DVD drive in VIOS1, you are left with leftover disk definitions.

A reliable way to avoid this is to shut down VIOS1 and physically remove the disks that you want assigned to it. That way, there's no way VIOS2 can see them when you bring it up. If it is already too late for that, the device list will look similar to this:

# lsdev -Cc disk
hdisk0 Defined 00-08-00 SAS Disk Drive
hdisk1 Defined 00-08-00 SAS Disk Drive
hdisk2 Available 00-08-00 SAS Disk Drive
hdisk3 Available 00-08-00 SAS Disk Drive
hdisk4 Available 00-08-00 SAS Disk Drive
hdisk5 Available 00-08-00 SAS Disk Drive

hdisk0 and hdisk1 are clearly the remnants of the disks from VIOS1 detected during the install of VIOS2.  We'll want to remove them so we're left with:
# lsdev -Cc disk
hdisk2 Available 00-08-00 SAS Disk Drive
hdisk3 Available 00-08-00 SAS Disk Drive
hdisk4 Available 00-08-00 SAS Disk Drive
hdisk5 Available 00-08-00 SAS Disk Drive
And then we can work on renumbering the disks.

Scenario 2:

This matters in particular in large clustered environments, where it is sometimes very important to keep disk and network device names in sync across all nodes in a cluster. Besides, it's a lot easier to verify a cluster configuration if the hdisk names are all the same. Matching PVIDs works, but it requires a lot more effort! For example, knowing that hdisk123 is the same device on all nodes makes life easier than scanning lspv output for a PVID like 00f6048868b4gead. Of course you can script things to make this easier, but it would be great if you didn't need to, and there was a way to rename devices as needed without resorting to unsupported methods.

There are two different ways to do this, depending on the operating system version.

1. Procedure prior to AIX 7.1

Let's remove hdisk0 and hdisk1:
# rmdev -l hdisk0 -dR
# rmdev -l hdisk1 -dR

We would now be left with:
# lspv
hdisk2          00c5538409a99b66                   rootvg          active
hdisk3
hdisk4
hdisk5

In order to put these names straight, we need to remove these disks as well. It's worth noting here that inevitably one of these four disks will be your root volume. You can't remove or rename that one just yet.
# rmdev -l hdisk3 -dR
# rmdev -l hdisk4 -dR
# rmdev -l hdisk5 -dR

So now:
# lspv
hdisk2          00c5538409a99b66                   rootvg          active

Run cfgmgr:
# cfgmgr
# lspv
hdisk0          00c55384341c6e62                    None      
hdisk1          00cd55a4ae6b676f                    None      
hdisk2          00c5538409a99b66                    rootvg         active
hdisk3          00c553844356f733                    None      

Now you can mirror hdisk2 to hdisk0:
# extendvg rootvg hdisk0
# exit
$ mirrorios -defer hdisk0

When that completes:
# lspv
hdisk0          00c55384341c6e62                    rootvg         active
hdisk1          00cd55a4ae6b676f                    None    
hdisk2          00c5538409a99b66                    rootvg         active
hdisk3          00c553844356f733                    None

Further Juggling and Dump Movement

Sometimes you may want hdisk0 and hdisk1 to be in rootvg.  Here's how to accomplish that.
$ unmirrorios hdisk2

# lspv
hdisk0          00c55384341c6e62                    rootvg         active
hdisk1          00cd55a4ae6b676f                    None    
hdisk2          00c5538409a99b66                    rootvg    
hdisk3          00c553844356f733                    None

hdisk2 is now not mirrored so we can remove it from rootvg:
# reducevg rootvg hdisk2
<error>

This will fail because the sysdumpdev is still set to a volume on hdisk2.  We need to remove this and set it back up later on.

Check the size of the current dump space:
# lsvg -l rootvg | grep sysdump
lg_dumplv           sysdump    4       4       1    open/syncd    N/A

# sysdumpdev -e
0453-041 Estimated dump size in bytes: 233413017
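
The estimate is in bytes, while a dump logical volume is sized in logical partitions, so a quick conversion helps when you recreate the dump devices later. This is a sketch only: dump_lps is a made-up helper name, and the PP size of rootvg is not shown in this article, so the 64 MB in the usage line is an assumption (check the PP SIZE field of "lsvg rootvg" on your system).

```shell
#!/bin/sh
# dump_lps BYTES PP_SIZE_MB
# Number of logical partitions needed to hold BYTES, rounded up:
#   ceil(BYTES / (PP_SIZE_MB * 1048576))
dump_lps() {
    bytes=$1
    pp_bytes=$(( $2 * 1048576 ))
    echo $(( (bytes + pp_bytes - 1) / pp_bytes ))
}

# Usage with the estimate above and an assumed 64 MB PP size:
#   dump_lps 233413017 64
```

The result is a lower bound for the current workload; the estimate grows with memory usage, so leave some headroom when you size the volume.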

Check the location of the dump device:
# sysdumpdev -l
primary              /dev/lg_dumplv
secondary            /dev/sysdumpnull
copy directory       /var/adm/ras
forced copy flag     TRUE
always allow dump    FALSE
dump compression     ON
type of dump         traditional

This shows us that there is a primary dump device called /dev/lg_dumplv and no secondary dump device. Since that is the case, we will also add a second dump device in case the primary is not available.

Let's clear the dump configuration:
# sysdumpdev -Pp /dev/sysdumpnull
# sysdumpdev -Ps /dev/sysdumpnull

Check that:
# sysdumpdev -l
primary              /dev/sysdumpnull
secondary            /dev/sysdumpnull
copy directory       /var/adm/ras
forced copy flag     TRUE
always allow dump    FALSE
dump compression     ON
type of dump         traditional

Now we can remove hdisk2 from rootvg.  This may warn you that there is a volume on this device.  The volume will most likely be the dump volume.  If that's the case, you can carry on and remove it.  If it is any other volume then stop and find out what it is.

# reducevg rootvg hdisk2
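
Before running the reducevg, one way to confirm that only the dump volume lives on the disk is to filter the output of lspv -l. This is a sketch, not part of the original procedure: the helper name unexpected_lvs is hypothetical, and it assumes the usual lspv -l layout of a disk-name line and a column-header line followed by one line per logical volume.

```sh
# unexpected_lvs EXPECTED_LV
# Reads `lspv -l <disk>` output on stdin and prints every logical volume
# name other than EXPECTED_LV. The first two lines (disk name and column
# headers) are skipped.
unexpected_lvs() {
    awk -v keep="$1" 'NR > 2 && NF > 0 && $1 != keep { print $1 }'
}

# Intended use on the AIX host (not executed here):
#   lspv -l hdisk2 | unexpected_lvs lg_dumplv
# No output means the dump LV is the only volume on the disk.
```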

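Create two new volumes for sysdumps (hdisk1 must already be a member of rootvg for the second command to succeed; if it is not, run the extendvg step shown further down first):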
# mklv -t sysdump -y sysdump1 rootvg 4 hdisk0
# mklv -t sysdump -y sysdump2 rootvg 4 hdisk1

Configure the new dump devices:
# sysdumpdev -Pp /dev/sysdump1
# sysdumpdev -Ps /dev/sysdump2

Check that:
# sysdumpdev -l
primary              /dev/sysdump1
secondary            /dev/sysdump2
copy directory       /var/adm/ras
forced copy flag     TRUE
always allow dump    FALSE
dump compression     ON
type of dump         traditional
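
The check above can also be scripted. The sketch below reads sysdumpdev -l output and flags any dump device still pointing at /dev/sysdumpnull; the function name check_dump_devices is hypothetical, and the demo input is simply the first two lines of the listing above.

```sh
# check_dump_devices
# Reads `sysdumpdev -l` output on stdin. Prints a line for each of the
# primary/secondary devices that is still /dev/sysdumpnull and returns
# non-zero if any was found.
check_dump_devices() {
    awk '
        BEGIN { bad = 0 }
        ($1 == "primary" || $1 == "secondary") && $2 == "/dev/sysdumpnull" {
            print $1 " dump device is not configured"
            bad = 1
        }
        END { exit bad }
    '
}

# Demo with the first two lines of the listing above.
check_dump_devices <<'EOF' && echo "both dump devices are configured"
primary              /dev/sysdump1
secondary            /dev/sysdump2
EOF
```

On a live system the input would come from a pipe: sysdumpdev -l | check_dump_devices.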

Now we want to add hdisk1 into rootvg (skip the extendvg if hdisk1 is already a member) and mirror:
# extendvg rootvg hdisk1
$ mirrorios -defer hdisk1

Set boot list:
$ bootlist -mode normal hdisk0 hdisk1

Since we have been deferring the restart, do it now:
$ shutdown -restart

Hopefully everything should come up fine and you should have this:
# lspv
hdisk0          00c55384341c6e62                    rootvg         active
hdisk1          00cd55a4ae6b676f                    rootvg         active    
hdisk2          00c5538409a99b66                    None    
hdisk3          00c553844356f733                    None
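
To verify the end state without reading the listing by eye, the lspv output can be filtered on its third column. A small sketch, with disks_in_vg as a hypothetical helper name and the listing above used as sample input:

```sh
# disks_in_vg VGNAME
# Reads `lspv` output on stdin and prints the disks that belong to VGNAME
# (third column of the listing).
disks_in_vg() {
    awk -v vg="$1" '$3 == vg { print $1 }'
}

# Demo with the listing shown above.
disks_in_vg rootvg <<'EOF'
hdisk0          00c55384341c6e62                    rootvg         active
hdisk1          00cd55a4ae6b676f                    rootvg         active
hdisk2          00c5538409a99b66                    None
hdisk3          00c553844356f733                    None
EOF
```

For the goal of this section, the result should be exactly hdisk0 and hdisk1.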

2. Renaming devices in AIX 7.1

Renaming devices is no longer an issue on AIX.

Starting with AIX 7.1, you can now easily rename devices. A new command called rendev was introduced to allow AIX administrators to rename devices as required.

From the man page:

The rendev command enables devices to be renamed. The device to be renamed, is specified with the -l flag, and the new desired name is specified with the -n flag.

The new desired name must not exceed 15 characters in length. If the name has already been used or is present in the /dev directory, the operation fails. If the name formed by appending  the new name after the character r is already used as a device name, or appears in the /dev directory, the operation fails.
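
The naming rules in that paragraph can be pre-checked before calling rendev. The sketch below covers only the length limit and the two /dev checks (NAME and rNAME); it does not query the ODM for names already in use, so rendev itself remains the final authority. The function name valid_new_devname and its optional directory argument are hypothetical, added so the check can be tried against a scratch directory.

```sh
# valid_new_devname NAME [DEVDIR]
# Checks a proposed device name against the rules quoted above:
#   - at most 15 characters
#   - neither DEVDIR/NAME nor DEVDIR/rNAME may already exist
# DEVDIR defaults to /dev.
valid_new_devname() {
    name=$1
    devdir=${2:-/dev}
    [ -n "$name" ] || { echo "empty name"; return 1; }
    [ "${#name}" -le 15 ] || { echo "name longer than 15 characters"; return 1; }
    [ ! -e "$devdir/$name" ] || { echo "$devdir/$name already exists"; return 1; }
    [ ! -e "$devdir/r$name" ] || { echo "$devdir/r$name already exists"; return 1; }
    echo "ok"
}

# Intended use before a rename (not executed here):
#   valid_new_devname hdisk300 && rendev -l hdisk3 -n hdisk300
```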

 If the device is in the Available state, the rendev command must unconfigure the device before renaming it. This is similar to the operation performed by the rmdev -l Name command. If the unconfigure operation fails, the renaming will also fail. If the unconfigure succeeds, the rendev command will configure the device, after renaming it, to restore it to the Available state. The -u flag may be used to prevent the device from being configured again after it is renamed.

 Some devices may have special requirements on their names in order for other devices or applications to use them. Using the rendev command to rename such a device may result in the device being unusable. Note: To protect the configuration database, the rendev command cannot be interrupted once it has started. Trying to stop this command before completion, could result in a corrupted database.

Here are some examples of using the rendev command on an AIX 7.1 (GA) system. In the first example I will rename hdisk3 to hdisk300. Note: hdisk3 is not in use (not busy).

If the disk had been allocated to a volume group, I would have needed to varyoff the volume group first.

# lspv
hdisk0          00f61ab2f73e46e2                    rootvg          active
hdisk1          00f61ab20bf28ac6                    None
hdisk2          00f61ab2202f7c0b                    None
hdisk4          00f61ab20b97190d                    None
hdisk3          00f61ab2202f93ab                    None

# rendev -l hdisk3 -n hdisk300

# lspv
hdisk0          00f61ab2f73e46e2                    rootvg          active
hdisk1          00f61ab20bf28ac6                    None
hdisk2          00f61ab2202f7c0b                    None
hdisk4          00f61ab20b97190d                    None
hdisk300        00f61ab2202f93ab                    None

Easy!
Next, I’ll rename a virtual SCSI adapter, vscsi0, to vscsi2.
Note: I placed the adapter, vscsi0, in a defined state before renaming the device.
# rmdev -Rl vscsi0

# lsdev -Cc adapter
ent0   Available  Virtual I/O Ethernet Adapter (l-lan)
ent1   Available  Virtual I/O Ethernet Adapter (l-lan)
vsa0   Available  LPAR Virtual Serial Adapter
vscsi0 Defined    Virtual SCSI Client Adapter
vscsi1 Available  Virtual SCSI Client Adapter

# rendev -l vscsi0 -n vscsi2

# lsdev -Cc adapter
ent0   Available  Virtual I/O Ethernet Adapter (l-lan)
ent1   Available  Virtual I/O Ethernet Adapter (l-lan)
vsa0   Available  LPAR Virtual Serial Adapter
vscsi1 Available  Virtual SCSI Client Adapter
vscsi2 Defined    Virtual SCSI Client Adapter

Now I’ll rename a network adapter from ent0 to ent10. I bring down the interface before changing the device name.
# lsdev -Cc adapter
ent0   Available  Virtual I/O Ethernet Adapter (l-lan)
ent1   Available  Virtual I/O Ethernet Adapter (l-lan)
vsa0   Available  LPAR Virtual Serial Adapter
vscsi1 Available  Virtual SCSI Client Adapter
vscsi2 Defined    Virtual SCSI Client Adapter

# ifconfig en0
en0: flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
        inet 10.1.20.19 netmask 0xffff0000 broadcast 10.153.255.255
         tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1

# ifconfig en0 down detach

# rendev -l ent0 -n ent10

# lsdev -Cc adapter
ent1   Available  Virtual I/O Ethernet Adapter (l-lan)
ent10  Available  Virtual I/O Ethernet Adapter (l-lan)
vsa0   Available  LPAR Virtual Serial Adapter
vscsi1 Available  Virtual SCSI Client Adapter
vscsi2 Defined    Virtual SCSI Client Adapter

# rendev -l en0 -n en10

# chdev -l en10 -a state=up
en10 changed

# mkdev -l inet0
inet0 Available

# ifconfig en10
en10: flags=1e080863,480<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),CHAIN>
        inet 10.1.20.19 netmask 0xffff0000 broadcast 10.153.255.255
         tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1

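If you want to be creative, you can rename devices to anything you like (as long as the name is no more than 15 characters). For example, I’ll rename vscsi2 to myvscsiadapter. (In the listing below, ent10 also appears under a new name, myadapter; the command for that rename is not shown.)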
# rendev -l vscsi2 -n myvscsiadapter

# lsdev -Cc adapter
ent1           Available  Virtual I/O Ethernet Adapter (l-lan)
myadapter      Available  Virtual I/O Ethernet Adapter (l-lan)
myvscsiadapter Defined    Virtual SCSI Client Adapter
vsa0           Available  LPAR Virtual Serial Adapter
vscsi1         Available  Virtual SCSI Client Adapter

And in the last example I’ll demonstrate changing virtual SCSI adapter device names on a live system.
This is a single-disk system (hdisk0) with two vscsi adapters.
# lspv
hdisk0          00f6048868b4deee                    rootvg          active

# lspath
Enabled hdisk0 vscsi0
Enabled hdisk0 vscsi1

# lsdev -Cc adapter
ent0   Available  Virtual I/O Ethernet Adapter (l-lan)
ent1   Available  Virtual I/O Ethernet Adapter (l-lan)
vsa0   Available  LPAR Virtual Serial Adapter
vscsi0 Available  Virtual SCSI Client Adapter
vscsi1 Available  Virtual SCSI Client Adapter

We ensure the adapter is in a Defined state before renaming it; the rename will fail otherwise.
# rmdev -Rl vscsi1
vscsi1 Defined

# lsdev -Cc adapter | grep vscsi
vscsi0 Available  Virtual SCSI Client Adapter
vscsi1 Defined    Virtual SCSI Client Adapter

Now we rename the adapter vscsi1 to vscsi3.
# rendev -l vscsi1 -n vscsi3

# lsdev -Cc adapter | grep vscsi
vscsi0 Available  Virtual SCSI Client Adapter
vscsi3 Defined    Virtual SCSI Client Adapter

That was easy enough. Now I need to bring the adapter and path online with cfgmgr. Afterwards, the lspath output displays an additional path through vscsi3.
# lspath
Enabled hdisk0 vscsi0
Defined hdisk0 vscsi1

# cfgmgr
Method error (/etc/methods/cfgscsidisk -l hdisk0 ):
        0514-082 The requested function could only be performed for some
                 of the specified paths.

# lspath
Enabled hdisk0 vscsi0
Defined hdisk0 vscsi1
Enabled hdisk0 vscsi3

Now I need to remove the old path to vscsi1. The path to vscsi3 is now Enabled. The adapter, vscsi3, is in an Available state. All is good.
# rmpath -l hdisk0 -p vscsi1 -d
path Deleted

# lspath
Enabled hdisk0 vscsi0
Enabled hdisk0 vscsi3

# lsdev -Cc adapter | grep vscsi
vscsi0 Available  Virtual SCSI Client Adapter
vscsi3 Available  Virtual SCSI Client Adapter

The same steps need to be repeated for the vscsi0 adapter. This is renamed to vscsi2.
# rmdev -Rl vscsi0
vscsi0 Defined

# lsdev -Cc adapter | grep vscsi
vscsi0 Defined    Virtual SCSI Client Adapter
vscsi3 Available  Virtual SCSI Client Adapter

# rendev -l vscsi0 -n vscsi2

# lsdev -Cc adapter | grep vscsi
vscsi2 Defined    Virtual SCSI Client Adapter
vscsi3 Available  Virtual SCSI Client Adapter

# lspath
Defined hdisk0 vscsi0
Enabled hdisk0 vscsi3

# cfgmgr
Method error (/etc/methods/cfgscsidisk -l hdisk0 ):
        0514-082 The requested function could only be performed for some
                 of the specified paths.

# lspath
Defined hdisk0 vscsi0
Enabled hdisk0 vscsi2
Enabled hdisk0 vscsi3

# rmpath -l hdisk0 -p vscsi0 -d
path Deleted

# cfgmgr

# lspath
Enabled hdisk0 vscsi2
Enabled hdisk0 vscsi3

That’s it. Both adapters have been renamed while the system was in use. No downtime required.
# lsdev -Cc adapter | grep vscsi
vscsi2 Available  Virtual SCSI Client Adapter
vscsi3 Available  Virtual SCSI Client Adapter

# lspath
Enabled hdisk0 vscsi2
Enabled hdisk0 vscsi3
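
The per-adapter sequence used in this last example is the same each time, so it can be written down as a plan. The sketch below only prints the commands in the order they were run above; it executes nothing. The function name rename_vscsi_plan is hypothetical.

```sh
# rename_vscsi_plan OLD NEW DISK
# Prints (does not run) the command sequence used above to rename one
# vscsi adapter on a live system while the disk keeps its other path.
rename_vscsi_plan() {
    old=$1
    new=$2
    disk=$3
    echo "rmdev -Rl $old"               # put the adapter in Defined state
    echo "rendev -l $old -n $new"       # rename it
    echo "cfgmgr"                       # configure the adapter and new path
    echo "rmpath -l $disk -p $old -d"   # delete the stale path
    echo "lspath"                       # confirm both paths are Enabled
}

rename_vscsi_plan vscsi1 vscsi3 hdisk0
```

Printing the plan first makes it easy to review the order, in particular that only one adapter is taken down at a time so the disk always has an Enabled path.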

Reference:

Please refer to the AIX 7.1 command reference for more information on this new command:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.cmds/doc/aixcmds4/rendev.htm
IBM AIX Version 7.1 Differences Guide:
http://www.redbooks.ibm.com/Redbooks.nsf/RedpieceAbstracts/sg247910.html?Open

www.unixmantra.com

Introduction to Alternate Disk Cloning on AIX 6.1 and 7.1

  1. What is an alt_clone and why is it useful?
  2. What is an alt_disk_mksysb and why is it useful?
  3. What filesets need to be installed to run an alt_clone?
  4. What commands do I need to run to create a basic alt_clone?
  5. How do I create an alt_clone while updating it at the same time?
  6. How-to wake up the altinst_rootvg and put it to sleep
  7. Useful flags for the alt_clone
  8. FAQs

1) What is an alt_clone and why is it useful?

An alt_clone is an alternate disk copy of your rootvg. The alt_clone will back up all mounted jfs and jfs2 filesystems on your current rootvg and restore them to the disk that you choose.

One of its main uses is to upgrade your version of AIX to a higher Technology Level, (TL for short) or Service Pack, (SP for short) without impacting your current running rootvg. This would be beneficial, for example, in a situation where you find an application that is not compatible at a higher TL or SP, you can easily boot back to the working rootvg with minimal downtime.

Another functionality of the alt_clone command is that it can serve as a backup to your current rootvg. You can create a backup of your rootvg to a disk that is not being used.

2) What is an alt_disk_mksysb and why is it useful?

alt_disk_mksysb clones your mksysb to the disk that you choose. The advantage to using this procedure is you can clone another server's mksysb to the disk and server that you choose. 

You can upgrade your alt_disk_mksysb to a higher TL or SP and it can serve as a backup in case of an emergency. 

Some of the requirements for alt_disk_mksysb are as follows:
  • The mksysb must have all necessary device and kernel support required for the system it will be cloned to. You cannot clone your mksysb and then add device support later.
  • The bos.alt_disk_install.boot_images fileset has to be installed on the source server (where the mksysb came from) and/or on the target server where the alt_clone will take place.
  • The version, release and TL of the bos.alt_disk_install.boot_images fileset must match the oslevel in the mksysb. So if the mksysb was created at 6.1TL5, the boot_images must be at 6.1.5.X.
  • You cannot clone a mksysb that is at a lower level than the server you are cloning to; meaning a 6.1TL5 mksysb cannot be cloned to a 6.1TL7 server.
(NOTE: Even though the alt_clone & alt_disk_mksysb are very reliable backups, it SHOULD NOT replace your mksysb or sysback backups.)
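
The version-matching requirement above (a 6.1TL5 mksysb needs boot_images at 6.1.5.X) can be sketched as a string comparison. This assumes the mksysb level is written in the form reported by oslevel -r, such as 6100-05, and that the fileset level is a four-part string such as 6.1.5.2. The function name boot_images_match is hypothetical.

```sh
# boot_images_match OSLEVEL FILESET_LEVEL
# OSLEVEL       level of the mksysb, assumed to look like 6100-05
# FILESET_LEVEL level of bos.alt_disk_install.boot_images, e.g. 6.1.5.2
# Succeeds when version, release and TL agree.
boot_images_match() {
    level=$1
    vrmf=$2
    base=${level%%-*}                       # 6100
    tl=${level#*-}                          # 05 (or 05-03 for longer forms)
    tl=${tl%%-*}                            # 05
    tl=${tl#0}                              # 5
    v=$(printf '%s' "$base" | cut -c1)      # 6
    r=$(printf '%s' "$base" | cut -c2)      # 1
    want="$v.$r.$tl"
    have=$(printf '%s' "$vrmf" | cut -d. -f1-3)
    if [ "$have" = "$want" ]; then
        echo "match"
    else
        echo "mismatch: mksysb needs $want.X, fileset is $vrmf"
        return 1
    fi
}

boot_images_match 6100-05 6.1.5.2
```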

3) What filesets need to be installed to run an alt_clone?

You need the bos.alt_disk_install package located on your AIX media. At AIX 5.3 and above, this package is included in the requisites list, which means it was automatically installed when you first installed the server. There are 2 filesets contained in the bos.alt_disk_install package. You may want to verify both are installed by running the following command:

# lslpp -l |grep alt_disk
You will likely find you have these filesets installed:
bos.alt_disk_install.boot_images
bos.alt_disk_install.rte
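
If you would rather have a yes/no answer than read the listing, the lslpp output can be checked for both fileset names. A minimal sketch; the function name have_alt_disk_filesets is hypothetical, and it only looks for the names, not the install state.

```sh
# have_alt_disk_filesets
# Reads `lslpp -l` output on stdin and reports whether both
# bos.alt_disk_install filesets appear in it.
have_alt_disk_filesets() {
    listing=$(cat)
    for fs in bos.alt_disk_install.boot_images bos.alt_disk_install.rte; do
        if ! printf '%s\n' "$listing" | grep -F -q "$fs"; then
            echo "missing: $fs"
            return 1
        fi
    done
    echo "both filesets present"
}

# Intended use on the AIX host (not executed here):
#   lslpp -l | have_alt_disk_filesets
```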

4) What commands do I need to run to create a basic alt_clone?

# smitty alt_clone

(I am only including the fields you need to change in the smit examples.)
Clone the rootvg to an Alternate Disk
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
* Target Disk(s) to install [ hit F4 and select hdisk1]
This will clone your existing rootvg over to hdisk1.
From command line, you would run:

# alt_disk_copy -d hdisk1

5) How do I create an alt_clone while updating it at the same time?

# smitty alt_clone

Target Disk(s) to install [ hit F4 and select hdisk1]
Bundle to install [ hit F4 and choose update_all ]
Directory or Device with images [ /<path to updates> -or- device to use (/dev/cd0)]
(required if filesets, bundles or fixes used)
installp Flags
ACCEPT new license agreements? Yes
From command line, you would run:

# alt_disk_copy -b update_all -l /<path to updates> -d hdisk1
-or-
# alt_disk_copy -b update_all -l /dev/cd0 -d hdisk1


If you have an existing altinst_rootvg on hdisk1 and you want to update it as a separate step, you would first wake up the altinst_rootvg and run the following:

# alt_rootvg_op -W -d hdisk1
# alt_rootvg_op -C -b update_all -l /<path to updates>
-or-
# alt_rootvg_op -C -b update_all -l /dev/cd0


Note: It is imperative that you put the altinst_rootvg back to sleep once this operation is complete. See the next section for more information about the wake and sleep operations.

6) How-to wake up the altinst_rootvg and put it to sleep?

You may find the need to wake up your altinst_rootvg, as in the example above where it was necessary to run an update_all on an existing alt_clone. You may also want to wake up your alt_clone to access some data files in your altinst_rootvg. To wake up your altinst_rootvg disk, you would run the following:

# alt_rootvg_op -W -d hdisk1 

When you have finished working with your altinst_rootvg you MUST put it back to sleep before rebooting.

# alt_rootvg_op -S

If you wake up the altinst_rootvg or the old_rootvg and you reboot without putting the alt_clone disk back to sleep, both rootvgs run the risk of being corrupted.

The corruption can manifest itself in several ways:
  • Server hangs on reboot, (usually around LED 517, 518, 552 or 554 when it tries to varyon the rootvg or mount the rootvg filesystems).
  • The ODM is corrupted (commands stop working or generate error messages when you run them).
  • If your system does boot up you may see extended filesystem names such as: /alt_inst/alt_inst/var and logical volume names such as: alt_alt_hd9var

Also, the wake up command was designed to wake up the altinst_rootvg at the same level as the current rootvg or to wake the old_rootvg once we are booted to the altinst_rootvg. Waking up the altinst_rootvg which is running at a higher technology level compared to the current rootvg is not supported.
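
Because forgetting the sleep step is so costly, it can help to wrap the wake, work and sleep steps in one function so the sleep always follows the work. This is a sketch under assumptions: with_altinst_awake and the ALT_OP variable are hypothetical names (ALT_OP exists only so the wrapper can be tried with a stand-in command), and the wrapper does not protect against the shell being killed midway.

```sh
# Command used for the wake and sleep operations; override only for testing.
ALT_OP=${ALT_OP:-alt_rootvg_op}

# with_altinst_awake DISK COMMAND [ARGS...]
# Wakes the alternate rootvg on DISK, runs COMMAND, then always attempts
# to put the volume group back to sleep. Returns COMMAND's exit status,
# or 1 if the wake or the sleep failed.
with_altinst_awake() {
    disk=$1
    shift
    "$ALT_OP" -W -d "$disk" || return 1
    rc=0
    "$@" || rc=$?
    if ! "$ALT_OP" -S; then
        echo "WARNING: sleep failed; do not reboot until it succeeds" >&2
        return 1
    fi
    return "$rc"
}

# Intended use on the AIX host (not executed here):
#   with_altinst_awake hdisk1 ls /alt_inst/etc
```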

7) Useful flags for the alt_clone?

alt_disk_copy flags

-d target_disk(s) – This is a required flag. No alt_disk_copy command works without it.
-b bundle_name - This flag is used to run an update_all. The -l flag must be used with this option.
-l images_location – The location of your update filesets (EX. -l /usr/sys/inst.images).
-i image.data – Used if you have a customized image.data file.

 A common use of a custom image.data file would be to break the mirror on the rootvg so you can alt_clone a mirrored rootvg environment to one disk. Given rootvg is on hdisk0 and mirrored to hdisk1, your image.data file is located under /home/image.data and you want to clone to hdisk2, you would run the following:

# alt_disk_copy -i /home/image.data -d hdisk2

-B Prevents the alt_disk_copy command from setting the bootlist to the target disk after the successful completion of the operation.

alt_rootvg_op flags
-d target_disk – Needed for most alt_rootvg_op operations except for the -X and -S flags.
-C Customization operation - This flag is needed if you want to run an update_all or any other customization to your altinst_rootvg.
-b bundle_name – Use this flag to specify a software bundle, (like the update_all bundle for example.) The -l flag must be used with this option.
-l images_location – Location of your update filesets.
-W Performs a wake-up on the root volume group located on the target_disk.
-S Puts to sleep the alternate root volume group that experienced a previous "wake" operation.
-t Once you “wake up” your altinst_rootvg, you can rebuild the boot image when you put the volume group back to sleep. This is useful in case you saw a bosboot error during your alt_clone process but you want to make sure that the boot image is valid on the altinst_rootvg. You have to use the -S with the -t flag.
-X Removes the altinst_rootvg or old_rootvg volume group definition from the ODM database.


EX. If you are booted off the altinst_rootvg and you want to remove the odm information for old_rootvg, you would run the following:

# alt_rootvg_op -X old_rootvg

Note that the -X flag only removes the volume group place holder for the altinst_rootvg or old_rootvg. It does not make any modifications to the actual disk, so it's still a bootable copy of the rootvg, even though you won't see the volume group in lsvg output anymore.

alt_disk_mksysb flags
-d target_disk(s) – The disk or disks you want to clone to.
-m Specifies the location of the mksysb that you want to clone. The value for device can be: Tape / CD device OR path name of mksysb image in a file system.

EX. If you want to clone a mksysb located under /usr/sys/inst.images to hdisk1, you would run the following:

# alt_disk_mksysb -m /usr/sys/inst.images/mksysb_filename -d hdisk1

-i image_data – Used if you have a customized image.data file. A common use of a custom image.data file would be to break the mirrors on the rootvg so you can install a mksysb backup with a mirrored rootvg environment to one disk

EX. If your rootvg was mirrored to hdisk0 & hdisk1 on the source system when you created the mksysb, and you want to alt_clone to a single disk, (hdisk2 on the target system, for example) using a customized image.data file located under /home/image.data, you would run the following:

# alt_disk_mksysb -m /usr/sys/inst.images/mksysb_filename -i /home/image.data -d hdisk2

8) Frequently Asked Questions, (FAQ)

Q: Can I run commands like bosboot or varyonvg to my altinst_rootvg or my old_rootvg?
A: No! For bosboot: The only supported way to create a boot image on the altinst_rootvg is with the alt_rootvg_op -t flag. You must “wake up” the altinst_rootvg and run the following command to create the boot image:
# alt_rootvg_op -St
(The alt_disk_install command does not have a flag for creating boot images.)
For varyonvg: If you vary on the altinst_rootvg, it tries to vary on hd5, hd6, hd4, etc. to the mount points that are currently mounted on the running rootvg. AIX tries to merge both rootvgs and it’s a mess. From our standpoint, we cannot recover from this and a mksysb restore is needed.
The best rule of thumb is to run commands that “look” at the LV data on the alt_clone and not write LV data to the alt_clone. 


Q: I ran 'alt_disk_install -X rootvg' / 'alt_rootvg_op -X rootvg' by mistake and when I run lspv, my rootvg entry is gone. What do I do?
A: You can run these 2 commands to fix the issue:
# redefinevg -d hdisk0 rootvg
# synclvodm -Pv rootvg

Q: How come there are 3 alt_clone commands at 5.3 and only 1 command at 5.2?
A: Development realized that new functionality / flags could be introduced in later versions of AIX and it would look messy to have it all in one command. It was decided to break up the 'alt_disk_install' command into 3 commands mentioned earlier in this technote.
Q: What is the difference between alt_disk_install & alt_disk_copy?
A: alt_disk_install is a wrapper for the alt_disk_copy command. So when you run alt_disk_install, it’s actually running the alt_disk_copy command under the covers.
Q: I want to physically move my rootvg disk from one server to another. Is there a supported way for me to do that?
A: Yes. The -O flag was introduced for this feature. If you have an existing rootvg disk you can run the following command to clone hdisk0 to hdisk1. You can then physically remove hdisk1 and take it to another system.
EX.
# alt_disk_copy -B -O -d hdisk1
This performs a device reset on hdisk1. Using the safest method possible, you would then remove the altinst_rootvg placeholder from the ODM (# alt_rootvg_op -X altinst_rootvg), rmdev hdisk1, shut down the server, physically remove that disk, go to another server that is powered down and plug in the disk. Power up that server and set the disk as the boot device in SMS. It is not supported to simply pull a disk from one server and place it in another server without running through this process.
Q: Does alt_disk_copy convert JFS filesystems to JFS2?
A: Yes! Starting at 6100-04, a new flag was introduced that converts JFS filesystems to JFS2. We now have the -T flag, which does the conversion.

Saturday, 3 August 2013

AIX LVM QUORUM mysteries revealed

Technote (FAQ)

Question

Why can't I varyon a Volume Group when one or more physical volumes are not available?

Cause

Varyonvg requires 100% of its physical volumes be available and accessible in order to successfully vary on the Volume Group without using the force option.

Answer

A common misconception is that the QUORUM setting of an LVM Volume Group can affect one's ability to varyon a volume group, when, in fact, the Volume Group QUORUM setting (enabled or disabled) has no bearing on the varyon process. This misconception is further enhanced by the following varyonvg error message...

0516-052 varyonvg: Volume group cannot be varied on without a quorum. More physical volumes in the group must be active.

This message indicates a "quorum" of physical volumes must exist in order to varyon the Volume Group and is unrelated to the Volume Group's QUORUM setting.

The Volume Group QUORUM setting is a concept that applies only to currently varied on Volume Groups: it forces a varyoff of the Volume Group should it lose quorum. With QUORUM disabled on the Volume Group, loss of one or more disks will not cause the Volume Group to varyoff. If QUORUM is enabled on the Volume Group, LVM will force varyoff the Volume Group if less than 51% of its disks are available and accessible. For a two disk Volume Group with QUORUM enabled, LVM will check the number of VGDAs on each disk and varyoff the Volume Group should it lose QUORUM (if it loses the disk with two active VGDAs).
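
The two disk case is easier to see as arithmetic. In a two disk Volume Group, LVM keeps two VGDAs on one disk and one on the other, three in total, and quorum means more than half of them are still accessible. The sketch below is only that calculation; the function name quorum_kept is hypothetical and nothing here queries LVM.

```sh
# quorum_kept TOTAL_VGDAS REMAINING_VGDAS
# Reports whether more than half of the VGDAs are still accessible.
quorum_kept() {
    total=$1
    remaining=$2
    if [ $((remaining * 2)) -gt "$total" ]; then
        echo "quorum kept"
    else
        echo "quorum lost"
    fi
}

# Two disk VG: one disk carries 2 VGDAs, the other carries 1 (3 in total).
quorum_kept 3 1   # the disk with two VGDAs failed
quorum_kept 3 2   # the disk with one VGDA failed
```

This is why, with QUORUM enabled, losing one particular disk of a mirrored pair forces a varyoff while losing the other does not.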

The Volume Group's QUORUM setting has no meaning for a Volume Group which is currently varied off. Varyonvg does not look at how many VGDAs a disk has, it ONLY looks at the number of physical volumes which are available and accessible. Without the -f (force) flag, ALL physical volumes in a Volume Group must be available and accessible. If one or more physical volumes is unavailable, the Volume Group may be forced online with varyonvg -f.


Excerpts from the varyonvg man page...
"The varyonvg will fail to varyon the volume group if a majority of the physical volumes are not accessible (no Quorum). This condition is true even if the quorum checking is disabled. Disabling the quorum checking will only ensure that the volume group stays varied on even in the case of loss of quorum."

"If the volume group cannot be varied on due to a loss of the majority of physical volumes, a list of all physical volumes with their status is displayed. To varyon the volume group in this situation, you will need to use the force option."

"-f Allows a volume group to be made active that does not currently have a quorum of available disks. All disk that cannot be brought to an active state will be put in a removed state. At least one disk must be available for use in the volume group."