Tuesday, 14 May 2013

vmstat output explained

vmstat - CPU/RAM

vmstat -t 5 3 shows 3 reports at 5-second intervals (-t also prints a timestamp)
vmstat -l 5 also shows large-page statistics (alp: active large pages, flp: free large pages)
vmstat -s displays the counts of various events (such as page-in and page-out events)
vmstat hdisk0 2 5 displays 5 summaries for hdisk0 at 2-second intervals
# vmstat -Iwt 2 (this is what IBMers typically use)

kthr            memory                         page                       faults                 cpu             time
----------- --------------------- ------------------------------------ ------------------ ----------------------- --------
r   b   p        avm        fre    fi    fo    pi    po    fr     sr    in     sy    cs us sy id wa    pc    ec hr mi se
0   0   0    1667011      35713     0     0     0     0     0      0    16    488   250  0  0 99  0  0.01   0.3 11:38:56
0   0   0    1667012      35712     0     0     0     0     0      0    16    102   236  0  0 99  0  0.01   0.1 11:38:58
1   0   0    1664233      38490     0     1     0     0     0      0    12    218   245  0  0 99  0  0.01   0.3 11:39:00
0   0   0    1664207      38515     0    15     0     0     0      0   164   5150   450  1  3 96  0  0.20   4.9 11:39:02
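
The columns of the sample output above can be split into named fields for further checks. The following is only a minimal sketch: it assumes the exact column layout of this "vmstat -Iwt" sample (21 whitespace-separated values per row), and the name cpu_sy is a made-up label to tell the CPU sy column apart from the faults sy column.

```python
# Field names follow the header of the sample; "cpu_sy" is a hypothetical
# name used here because the header contains "sy" twice.
FIELDS = ["r", "b", "p", "avm", "fre", "fi", "fo", "pi", "po", "fr", "sr",
          "in", "sy", "cs", "us", "cpu_sy", "id", "wa", "pc", "ec", "time"]

def parse_vmstat_row(line):
    """Turn one data row of the sample layout into a dict of named values."""
    tokens = line.split()
    if len(tokens) != len(FIELDS):
        raise ValueError("unexpected number of columns: %d" % len(tokens))
    row = {}
    for name, token in zip(FIELDS, tokens):
        if name == "time":
            row[name] = token            # keep hh:mm:ss as text
        elif name in ("pc", "ec"):
            row[name] = float(token)     # fractional values
        else:
            row[name] = int(token)
    return row

sample = ("0   0   0    1664207      38515     0    15     0     0     0"
          "      0   164   5150   450  1  3 96  0  0.20   4.9 11:39:02")
row = parse_vmstat_row(sample)
print(row["avm"], row["fre"], row["pc"], row["time"])
```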

kthr: kernel threads

r:    threads placed in the run queue (runnable threads) or already executing (running)
b:    threads placed in the virtual memory wait queue (b = blocked queue, waiting for a resource, e.g. blocked filesystem I/O or an inode lock)

(Kernel threads blocked on I/O are an indication of the I/O workload and of possible inode lock contention.)

If the number of runnable threads (r) divided by the number of CPUs is greater than one -> possible CPU bottleneck
(Compare the (r) column with the number of CPUs (logical CPUs, as with uptime) to see whether there are enough CPUs or more threads than CPUs.)

High numbers in the blocked processes column (b) indicate slow disks.
(r) should always be higher than (b); if it is not, it usually means you have a CPU bottleneck
An example:
lcpu=2, r=18 (18/2=9): there are 9 runnable threads per logical CPU, so 1 is running and 8 are waiting on each. But you have to weigh this number against the nature of the work being done. (Either these threads hold on to a CPU for a long time, or they use the CPU for a very short time and are then taken off it.) If the queue can be emptied fast, then 8 may not be a problem.
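
The arithmetic of this example can be written down as a small helper. This is just a sketch; the function name is made up, and it assumes one running thread per logical CPU as in the example.

```python
def run_queue_per_cpu(r, lcpu):
    """Return (runnable threads per logical CPU, waiting threads per logical CPU)."""
    if lcpu <= 0:
        raise ValueError("lcpu must be positive")
    per_cpu = r / lcpu
    # Assumption: one thread per logical CPU is actually running, the rest wait.
    waiting = max(per_cpu - 1, 0)
    return per_cpu, waiting

per_cpu, waiting = run_queue_per_cpu(r=18, lcpu=2)
print(per_cpu, waiting)  # 9.0 8.0
```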
---------------------------

memory:

avm:   The amount of active virtual memory (in 4k pages) you are using, not including file pages.
Active virtual memory is defined as the number of virtual-memory working segment pages that have actually been touched.
From Earl Jew:
"Active Virtual Memory is computational memory which is active. AVM does not include any file buffer cache at all. AVM is your computational memory percent that you see listed under topas. AVM includes the active pages out on the paging space. It is possible you have computational memory or virtual memory which was not recently active and it would not be in this calculation."

"Over memory commitment would be a situation where AVM would be greater than the installed RAM. It is good to keep AVM at or less than 80%."

(Non-computational memory is your file buffer cache.)

fre:    The size of your memory free list.

We don't worry when fre is small, as AIX loves using every last drop of memory and does not return it as fast as you might like. The minimum size of the free list is determined by the minfree parameter of the vmo command.
---------------------------

page:

pi:    Pages paged in from the paging space. (if there are any, it is not a problem)
po:    Pages paged out to the paging space. (if there are any, this could be a problem!)
fi:    file system reads
fo:    file system writes
fr:    Pages freed (replaced)
sr:    Pages scanned (pages have to be scanned to see whether they can be freed)
The ratio of sr to fr shows how many pages had to be scanned to free that amount.
(If we scanned 1000 and freed 999, those memory pages were not in use recently; it is an indicator.)

Look at the largest value of avm (output of vmstat: active virtual pages). Multiply it by 4KB. Compare that number with the installed RAM.

Ideally avm should be smaller than total RAM. (avm * 4096 < bootinfo -r * 1024; bootinfo -r reports the RAM size in KB.) If not, some amount of virtual memory paging will occur.

If there is far more virtual memory than real memory, this could cause excessive paging, which then results in delays.

But if avm is lower than RAM and paging activity occurs, then tuning minperm/maxperm could reduce paging.

(If the system is paging too much, tuning with vmo/vmtune may help.)
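
The avm versus RAM comparison above can be sketched as follows. The avm value comes from the sample output; the RAM size (8 GB, as "bootinfo -r" would report it in KB) is an assumed value for illustration only, and the 80% threshold is the guideline quoted from Earl Jew.

```python
PAGE_SIZE = 4096  # vmstat reports avm in 4 KB pages

def avm_ratio(avm_pages, ram_kb):
    """Fraction of installed RAM that active virtual memory would occupy."""
    return (avm_pages * PAGE_SIZE) / (ram_kb * 1024)

# avm from the sample output; ram_kb is an assumption (8 GB expressed in KB)
ratio = avm_ratio(avm_pages=1667011, ram_kb=8 * 1024 * 1024)
print("avm is {:.1%} of RAM".format(ratio))

if ratio >= 1.0:
    print("avm exceeds RAM: some paging to paging space will occur")
elif ratio > 0.8:
    print("avm is above the 80% guideline")
else:
    print("avm is within the 80% guideline")
```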

If sr is much higher than fr (5 times higher), it should get your attention. If you had to scan a lot to free only a little, it means that the memory was recently used, so it is harder to steal.

If fi+fo is greater than the free memory (fre), the system has to scan (sr) and free (fr) pages to push through that amount of I/O, and this increases the 'free frame waits' value. lrud does the scanning and freeing of the needed memory pages.
---------------------------

faults:

in:  interrupt rate (hardware interrupts from the network or SAN... it is good if it is not high, like here)
sy:  system calls (this shows how much work is done by the system; if it is a 6-digit number, it is doing a lot of work)
cs:  context switches (process or thread switches) (the rate is given in switches per second)
(A context switch occurs when the currently running thread is different from the previously running thread, so the previous one is taken off the CPU.)

It is not uncommon to see the context switch rate be approximately the same as device interrupt rate (in column)

If cs is high, it may indicate too much process switching is occurring, thus using memory inefficiently.

If a program is written inefficiently, it may generate an unusually large number of system calls. (sy)

If cs is higher than sy, the system is doing more context switching than actual work.

High r with high cs -> possible lock contention

Lock contention occurs whenever one process or thread attempts to acquire a lock held by another process or thread. The more granular the available locks, the less likely one process/thread will request a lock held by the other. (For example, locking a row rather than the entire table, or locking a cell rather than the entire row.)

When you are seeing blocked processes or high values on waiting on I/O (wa), it usually signifies either real I/O issues where you are waiting for file accesses or an I/O condition associated with paging due to a lack of memory on your system.
---------------------------

cpu:   

us:   % of CPU time spent in user mode (not using kernel code, no access to kernel resources)
sy:   % of CPU time spent in system mode (with access to kernel resources; all the nfs daemons and lrud are kernel processes)
id:   % of CPU time when the CPUs are idle
wa:   % of CPU time when the CPUs were idle and there was at least one I/O in progress (waiting for that I/O to finish)

pc:   physical capacity consumed (how much physical CPU is used)
ec:   entitled capacity consumed (in percentage) (it correlates with the system calls (sy))

When a wait process is running it can show up either in id (idle) or wa (wait):

-wait%: if there is at least 1 outstanding thread which is waiting for something (such as I/O to complete, or read it from disk)
-idle%: if there is nothing to wait for it will show up as idle%

(If the CPU is waiting for data from real memory, the CPU is still considered to be in a busy state.)

To measure true idle time, add id and wa together:
- if id=0%, it does not mean all the CPU is consumed, because "wait" (wa) can be 100% while waiting for an I/O to complete

- if wa=0%, it does not mean there are no I/O wait issues: as long as there are threads keeping the CPU busy, additional threads could be waiting for I/O, but this is masked by the running threads

If process A is running and process B is waiting on I/O, wa% would still show 0.
A 0 doesn't mean I/O is not occurring; it means that the system is not waiting on I/O.

If process A and process B are both waiting on I/O, and there is nothing that can use the CPU, then you would see that column increase.

- if wa% is high, it does not necessarily mean there is an I/O performance problem; it can be an indication that some I/O is being done while the CPU is not kept busy at all

- if id% is high, then likely there is no CPU or I/O problem

To measure CPU utilization, add us and sy together (and compare it to physc):

- if us+sy is always greater than 80%, then the CPU is approaching its limits (but check physc as well, and "sar -P ALL" for each lcpu)

- if us+sy = 100% -> possible CPU bottleneck, but in an uncapped shared lpar check physc as well.

- if sy is high, your application is issuing many system calls to the kernel and asking the kernel to work. It measures how heavily the application is using kernel services.

- if sy is higher than us, your system is spending less time on real work (not good)
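
The two sums described above (id+wa for true idle time, us+sy for utilization) can be put into a small helper. This is only a sketch with made-up names; the 80% threshold is the one from the list above, and the input values are the CPU columns of the last row of the sample output.

```python
def cpu_summary(us, sy, idle, wa):
    """Combine the vmstat CPU columns as described in the text."""
    busy = us + sy            # CPU utilization
    true_idle = idle + wa     # wa is idle time with outstanding I/O
    notes = []
    if busy > 80:
        notes.append("us+sy above 80%: check physc and per-lcpu figures")
    if sy > us:
        notes.append("sy higher than us")
    return {"busy": busy, "true_idle": true_idle, "notes": notes}

# CPU columns of the last sample row: us=1, sy=3, id=96, wa=0
summary = cpu_summary(us=1, sy=3, idle=96, wa=0)
print(summary)
```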


Don't forget to compare these values with outputs where each logical CPU can be seen (like "sar -P ALL 1 5")

Some examples where the physical CPU consumption should also be looked at when SMT is on:

- us+sy=16%, but physc=0.56: it looks as if 16% of a CPU is utilized, but actually about half of a physical CPU (0.56) is used.

- if us+sy=100% and physc=0.45, we have to look at both. If someone says 100% is used, then 100% of what? 100% of about half a CPU (physc=0.45) is used.

- %usr+%sys=83% for lcpu 0 (output from the sar command). It looks like a high number at first sight, but if you check physc, you can see that only 0.01 physical core has been used, and the entitled capacity is 0.20, so this 83% is actually very little CPU consumption.
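
For the last example, the consumed share of the entitlement can be worked out directly. A minimal sketch, assuming that the percentage of entitlement in use is simply physc divided by the entitled capacity; the function name is made up.

```python
def entitlement_used_pct(physc, entitlement):
    """Percentage of the entitled capacity that is physically consumed."""
    if entitlement <= 0:
        raise ValueError("entitlement must be positive")
    return physc / entitlement * 100.0

# Values from the last example: physc=0.01, entitled capacity=0.20
pct = entitlement_used_pct(physc=0.01, entitlement=0.20)
print("{:.0f}% of the entitlement is in use".format(pct))
```

So the 83% seen on lcpu 0 corresponds to only a small fraction of what the partition is entitled to.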

------------------------------------
------------------------------------
------------------------------------
# vmstat -v
4980736 memory pages
739175 lruable pages
--------------------
432957 free pages
--------------------
1 memory pools
84650 pinned pages
80.0 maxpin percentage
20.0 minperm percentage                               <== system’s minperm% setting
80.0 maxperm percentage                                <== system’s maxperm% setting
2.2 numperm percentage                    <== % of memory containing non-comp. pages
16529 file pages                                        <== # of non-comp. pages
0.0 compressed percentage
0 compressed pages
2.2 numclient percentage        <== % of memory containing non-comp. client pages
80.0 maxclient percentage                    <== system’s maxclient% setting
16503 client pages                                    <== # of client pages
0 remote pageouts scheduled
-----------------------
940098 pending disk I/Os blocked with no pbuf       <== every disk allocated to a vg has a certain amount of pbuf
1141440 paging space I/Os blocked with no psbuf            <== paging space buffer
2228 filesystem I/Os blocked with no fsbuf              <== jfs filesystem buffer
0 client filesystem I/Os blocked with no fsbuf   <== nfs/veritas filesystem buffer
382716 external pager filesystem I/Os blocked with no fsbuf      <==jfs2 filesystem buffer
-------------------------
0 Virtualized Partition Memory Page Faults
0.00 Time resolving virtualized partition memory page faults
free pages:
How many free pages we have. Earl's rule: 5 digits of free pages is ideal, 6 digits is generous, 4 digits is trouble, 3 digits and you are in big trouble.
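
Earl's rule can be expressed as a digit count. A small sketch; treating more than 6 digits as "generous" and fewer than 3 digits as "big trouble" is an assumption, since the rule only names 3 to 6 digits.

```python
def classify_free_pages(free_pages):
    """Classify the 'free pages' value of vmstat -v by its number of digits."""
    if free_pages < 0:
        raise ValueError("free_pages cannot be negative")
    digits = len(str(int(free_pages)))
    if digits >= 6:      # assumption: 7+ digits are treated like 6
        return "generous"
    if digits == 5:
        return "ideal"
    if digits == 4:
        return "trouble"
    return "big trouble"  # assumption: fewer than 3 digits are treated like 3

print(classify_free_pages(432957))  # value from the vmstat -v sample
```
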
pbuf:
These are physical device buffers. pbufs are allocated in memory per LUN in the volume group (the more LUNs you have, the more pbufs there will be). The pbufs of every LUN in the vg are pooled together, and all I/Os to these LUNs go through these pbufs (these are pinned memory structures). If you exhaust the pbufs, you will get 'pending disk I/Os blocked with no pbuf' and you need to allocate more pbufs.
psbuf:
You have to compare this with 'paging space page outs' in vmstat -s. If the 'psbuf' value is high relative to paging space page outs, then there are big bursts of paging out, which the psbufs can't handle. If the 'psbuf' value is low, then paging out is moderate and there aren't such big peaks.
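
This comparison is a simple ratio. The sketch below uses the counters from the two sample outputs on this page and assumes that both come from the same system and the same uptime; the text gives no fixed threshold for "relatively high", so none is applied here.

```python
def psbuf_block_ratio(psbuf_blocked, ps_page_outs):
    """Blocked paging space I/Os per paging space page out (None if no page outs)."""
    if ps_page_outs == 0:
        return None
    return psbuf_blocked / ps_page_outs

# 'paging space I/Os blocked with no psbuf' (vmstat -v) and
# 'paging space page outs' (vmstat -s) from the samples on this page
ratio = psbuf_block_ratio(psbuf_blocked=1141440, ps_page_outs=2477803)
print("{:.2f} blocked I/Os per paging space page out".format(ratio))
```
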
fsbuf:
When AIX mounts a filesystem, it allocates a static number of fsbufs per filesystem (and these are included in pinned memory).
These numbers (the five 'blocked with no ...' lines) show how many times the buffer in question has been exhausted (no I/O can go through until the buffer unblocks).
------------------------------------
------------------------------------
------------------------------------
# vmstat -s
15503846449 total address trans. faults
3320663543 page ins                        <== filesystem reads from disk (vmstat:page:fi)
3257961345 page outs                       <== filesystem writes to disk (vmstat:page:fo)
-----------------------------------
1775154 paging space page ins           <== vmstat:page:pi
2477803 paging space page outs          <== vmstat:page:po
-----------------------------------
0 total reclaims
9424678118 zero filled pages faults
158255178 executable filled pages faults
-----------------------------------
36410003498 pages examined by clock         <== vmstat:page:sr
169803 revolutions of the clock hand
2438851639 pages freed by the clock        <== vmstat:page:fr
------------------------------------
179510410 backtracks
699 free frame waits
0 extend XPT waits
------------------------------------
192163699 pending I/O waits
6572422694 start I/Os
447693244 iodones
------------------------------------
43768541570 cpu context switches            <== vmstat:faults:cs
12683830408 device interrupts
528405827 software interrupts
4196361885 decrementer interrupts
40062263 mpc-sent interrupts
40062181 mpc-received interrupts
772686338 phantom interrupts
0 traps
102934901653 syscalls
total address trans. faults:
Every page in or page out causes 1 address translation fault.

- If the sum of page ins+outs is higher than total addr. trans. faults, it means data is being paged in and out whose address translation faults have already been counted, so the same data is being read in and written out again
- If the sum of page ins+outs is smaller than total addr. trans. faults, it means we are not reading/writing the same data, but there is additional I/O, probably from process executions...

The value of total addr. trans. faults can be compared to the sum of the 4 lines below it (page ins/outs, paging space page ins/outs). If the 1st line is larger than the sum of those 4, then the TLB (Translation Lookaside Buffer) has to be recalculated for contents it already had.
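
The comparison with the 4 lines below can be done with the numbers from the sample output. This sketch only does the arithmetic; the dictionary keys are made-up names for the vmstat -s lines.

```python
# Counters copied from the vmstat -s sample above (key names are hypothetical)
counters = {
    "total_addr_trans_faults": 15503846449,
    "page_ins": 3320663543,
    "page_outs": 3257961345,
    "ps_page_ins": 1775154,
    "ps_page_outs": 2477803,
}

paging_sum = (counters["page_ins"] + counters["page_outs"]
              + counters["ps_page_ins"] + counters["ps_page_outs"])
faults_larger = counters["total_addr_trans_faults"] > paging_sum

print(paging_sum)     # 6582877845
print(faults_larger)  # True
```
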
paging space page outs:
Earl's rule: independently of the system uptime, if paging space page outs has 5 digits, it should grab your attention, and every additional digit should cause 10 times more concern (6 digits: 10 times more concern, 7 digits: 100 times more concern, 8 digits: ...)
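
As a sketch, the rule can be turned into a "concern factor" that is 1 at 5 digits and grows tenfold with every further digit. The function name is made up, and returning 0 below 5 digits is an assumption.

```python
def page_out_concern(ps_page_outs):
    """Concern factor for 'paging space page outs' following Earl's rule."""
    digits = len(str(int(ps_page_outs)))
    if digits < 5:
        return 0                    # assumption: below 5 digits, no concern
    return 10 ** (digits - 5)       # 5 digits -> 1, 6 -> 10, 7 -> 100, ...

print(page_out_concern(2477803))    # 7 digits in the sample -> 100
```
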
pages examined -revolutions of the clock hand - pages freed:
clock hand: it examines the pages in memory (in background at a very low priority).
lrud is a kernel process that does the scanning and freeing (sr and fr in vmstat -I). The clock hand is the pointer that lrud is using for scanning and freeing memory. It examines the pages and/or frees the pages.

If the system has nothing to do, the clock hand starts to examine pages. And if there are pages which have not been used, it frees them.

revolutions of the clock hand shows how many times lrud has scanned through the memory since boot.

(If it is low, the system is busy most of the time. If the system were totally idle for 60 days, it would be a 6-digit number.)

'pages examined' shows how many pages have been scanned by lrud, 'pages freed' shows how many pages were freed. The ratio of pages examined to pages freed is useful to know (how much work the system has to do to free some pages).
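
With the counters from the sample output, the ratio works out as follows. A minimal sketch with a made-up function name; the factor of 5 used for the warning is the sr/fr guideline from the page section.

```python
def scan_to_free_ratio(examined, freed):
    """Pages examined per page freed (None if nothing was freed)."""
    if freed == 0:
        return None
    return examined / freed

# 'pages examined by clock' and 'pages freed by the clock' from the sample
ratio = scan_to_free_ratio(examined=36410003498, freed=2438851639)
print("{:.1f} pages examined per page freed".format(ratio))
if ratio > 5:
    print("more than 5 pages scanned per page freed: worth a closer look")
```
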
free frame waits:
It is counted whenever the amount of free memory hits zero (how many times since boot there was no free memory) and the system has to scan and free memory in order to continue.
start I/Os - iodones:
How many I/Os were started and how many were completed (if an I/O is blocked or timed out, it is not done and has to be restarted). If iodones is higher than start I/Os, then probably NFS is running there. (page ins + page outs is the start I/Os)
