Saturday, January 29, 2011

How can I find out what command is respawning too rapidly and filling up my wtmp file?

A common situation is that the init process is repeatedly attempting to start a failing process. The init man page describes what happens when init finds an entry is being respawned:


If the init command finds that it is continuously running an entry in
the /etc/inittab file (more than five times in 225 seconds), it assumes
that an error in the entry command string exists. It then prints an
error message to the console and logs an error in the system error log.
After the message is sent, the entry does not run for 60 seconds. If
the error continues to occur, the command will respawn the entry only
five times every 240 seconds. The init command continues to assume an
error occurred until the command does not respond five times in the
interval, or until it receives a signal from a user. The init command
logs an error for only the first occurrence of the error.
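
To make the first threshold in that excerpt concrete, here is a minimal sketch that checks a list of start times against "more than five times in 225 seconds". It only illustrates the quoted rule and is not how init itself is implemented. The function name too_rapid and the input format (one epoch timestamp per line, in ascending order) are assumptions made for this example.

```shell
# Illustration only: flag more than five starts inside any 225-second window.
# Input: one epoch timestamp per line on standard input, oldest first.
# "too_rapid" is a hypothetical helper name, not an AIX command.
too_rapid() {
    awk -v limit=5 -v window=225 '
        { t[NR] = $1 }                       # remember every start time
        END {
            lo = 1
            for (hi = 1; hi <= NR; hi++) {
                # slide the window start forward until it spans < 225 seconds
                while (t[hi] - t[lo] >= window) lo++
                if (hi - lo + 1 > limit) { print "too rapid"; exit }
            }
            print "ok"
        }
    '
}
```

Six starts within a few seconds of each other would be reported as "too rapid", while starts spaced 100 seconds apart would not.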


To find out what is being respawned, use the steps below.


1. Check the console or console logs
Check on the console to see if init is writing an error message similar to the one below:

0 Thu Jan 22 10:16:27 EST 2009
INIT: Command is respawning too rapidly. Check for possible errors.
id:  xvfb "/usr/bin/X11/X -force -vfb -x abx -x dbe -x GLX :1 &"


Or search through the console log using the alog command:

# alog -t console -o | more


2. Check errpt

Next, check whether the errpt output contains an entry with the label "INIT_RAPID", like the one below:

LABEL: INIT_RAPID
IDENTIFIER: 3A30359F

Date/Time:       Wed Jan 28 10:14:17 2009
Sequence Number: 1789
Machine Id:      00CC2F914C00
Node Id:         libgng
Class:           S
Type:            TEMP
Resource Name:   init

Description
SOFTWARE PROGRAM ERROR

Probable Causes
SOFTWARE PROGRAM

User Causes
PERFORMANCE DEGRADED

Recommended Actions
REVIEW DETAILED DATA

Detail Data
SOFTWARE ERROR CODE
Command is respawning too rapidly. Check for possible errors.
COMMAND
id:  xvfb "/usr/bin/X11/X -force -vfb -x abx -x dbe -x GLX :1 &"

Both messages clearly identify the failing command that is being run out of the /etc/inittab file.
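
On a busy system the error log can be long. errpt accepts -J to select entries by label and -a for the detailed report, so the INIT_RAPID entries can be pulled out directly. The sketch below additionally extracts just the line after COMMAND, assuming the detailed report looks like the sample above. The function name init_rapid_commands is made up for this example.

```shell
# Print the line that follows each "COMMAND" heading in a detailed
# error report read from standard input, without duplicates.
# "init_rapid_commands" is a hypothetical helper name, not an AIX command.
init_rapid_commands() {
    awk '
        /^COMMAND[ \t]*$/ { grab = 1; next }   # heading line on its own
        grab              { print; grab = 0 }  # the failing inittab entry
    ' | sort -u
}

# Assumed usage on AIX:
#   errpt -a -J INIT_RAPID | init_rapid_commands
```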


3. Check the wtmp file

If the warning messages are not noticed on the system console or in errpt, the next indication of the problem may be that the /var file system is filling up. This happens because init writes an entry to the /var/adm/wtmp file on each attempt to start the problem process. The procedure below converts the wtmp file into readable text for diagnosis.

This technique makes use of the fwtmp utility, which is part of the bos.acct fileset.

# lslpp -w /usr/sbin/acct/fwtmp
File                                    Fileset               Type
------------------------------------------------------------------
/usr/sbin/acct/fwtmp                    bos.acct              File


What's In The wtmp File

The content of the wtmp file cannot be read directly, because the entries are written as binary structures (see /usr/include/utmp.h for the format). The fwtmp utility can be used to convert the contents into a human-readable format.

For example, we redirect the contents of the /var/adm/wtmp file through fwtmp into a text file:

# /usr/sbin/acct/fwtmp < /var/adm/wtmp > /tmp/wtmp_readable

A quick cat of the /tmp/wtmp_readable file shows that it
consists mainly of the following entries:

xvfb   xvfb   5 319596 0000 0000 1078170250    Mon Mar  1 11:44:10 2004
      xvfb   8 319596 0000 0001 1078170250    Mon Mar  1 11:44:10 2004
xvfb   xvfb   5 319598 0000 0000 1078170250    Mon Mar  1 11:44:10 2004
      xvfb   8 319598 0000 0001 1078170250    Mon Mar  1 11:44:10 2004

The first numeric column shows the ut_type of the entry, as defined in the utmp.h header file. The interesting types in our case are:

#define INIT_PROCESS    5    /* Process spawned by "init" */
#define LOGIN_PROCESS   6    /* A "getty" process waiting for login */
#define USER_PROCESS    7    /* A user process */
#define DEAD_PROCESS    8

In this example, the "xvfb" entry is being started by init (signified by the "5" in column 3), and in the next line it is dying (ut_type = 8).
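
When the readable file is large, counting the INIT_PROCESS records per entry makes the culprit stand out. The sketch below assumes the column layout of the sample above and counts fields from the right, because the leading user column is blank on some lines. The function name count_init_spawns is made up for this example.

```shell
# Count ut_type 5 (INIT_PROCESS) records per inittab id in fwtmp output
# read from standard input, busiest id first.
# Assumed layout, counted from the right-hand end of each line:
#   id  type  pid  term  exit  epoch  weekday  month  day  time  year
# "count_init_spawns" is a hypothetical helper name, not an AIX command.
count_init_spawns() {
    awk '
        NF >= 11 && $(NF-9) == 5 { spawns[$(NF-10)]++ }
        END { for (id in spawns) print spawns[id], id }
    ' | sort -rn
}

# Assumed usage on AIX:
#   /usr/sbin/acct/fwtmp < /var/adm/wtmp | count_init_spawns | head
```

An id with a count far above the others is the entry init keeps restarting.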

A quick check of the inittab file reveals our problem:

# grep xvfb /etc/inittab
xvfb:2:respawn:/usr/bin/X11/X -force -vfb -x abx -x dbe -x GLX :1 &

In this case, the xvfb entry was starting an X terminal server daemon.


SOLUTION
The solution is then either to fix the problem with the command or to change the action of the inittab entry from respawn to off using the chitab utility:

# chitab xvfb:2:off:'/usr/bin/X11/X -force -vfb -x abx -x dbe -x GLX :1 &'

In this specific case, the trailing "&" sign was removed from the X server command in the inittab entry, and the server then started up normally.
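
Since a trailing "&" on a respawn entry was the cause here, it can be worth checking for other entries with the same mistake. The sketch below scans an inittab-format file for respawn entries whose command ends in "&". It assumes the usual id:runlevel:action:command layout with comment lines starting with a colon. The function name backgrounded_respawns is made up for this example.

```shell
# List the ids of respawn entries whose command ends with "&".
# Takes the file to scan as an optional argument (default: /etc/inittab).
# "backgrounded_respawns" is a hypothetical helper name, not an AIX command.
backgrounded_respawns() {
    awk -F: '
        /^[ \t]*:/ { next }                          # skip comment lines
        $3 == "respawn" && /&[ \t]*$/ { print $1 }   # action is the 3rd field
    ' "${1:-/etc/inittab}"
}

# Assumed usage:
#   backgrounded_respawns            # scans /etc/inittab
#   backgrounded_respawns /tmp/copy  # scans a copy instead
```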

1 comment:

  1. Very useful blog post! Thanks for publishing it!

    As noted in the post, an inittab entry should never use & to put the command in the background. When the command goes into the background, init sees the process it started disappear, assumes that it has failed, and tries to respawn it. That is clearly not the desired behavior.

    Please note that a command is put into the background primarily so that it will not die if/when its parent dies. But since the init process never dies, its children do not need to go into the background.

    If there is some other motivation to start a process in the background at AIX boot time, add a start/stop script to the directory tree below /etc/rc.d, as described in the "Starting and Stopping Software via System V RC Directories" Technote (at https://www.ibm.com/support/pages/starting-and-stopping-software-system-v-rc-directories).
