A common situation is that the init process repeatedly attempts to start a process that keeps failing. The init man page describes what happens when init finds that an entry is being respawned too quickly:
If the init command finds that it is continuously running an entry in
the /etc/inittab file (more than five times in 225 seconds), it assumes
that an error in the entry command string exists. It then prints an
error message to the console and logs an error in the system error log.
After the message is sent, the entry does not run for 60 seconds. If
the error continues to occur, the command will respawn the entry only
five times every 240 seconds. The init command continues to assume an
error occurred until the command does not respond five times in the
interval, or until it receives a signal from a user. The init command
logs an error for only the first occurrence of the error.
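The threshold in the man page text can be sketched as a small check over start times. This is only one reading of "more than five times in 225 seconds", not init's actual implementation; the function name respawn_check and the one-timestamp-per-line input format are assumptions made for illustration.

```sh
# respawn_check (hypothetical helper): reads one epoch timestamp per line,
# in ascending order, on stdin. Prints "rapid" if any 225-second span
# contains more than five starts, otherwise prints "ok".
respawn_check() {
    awk -v limit=5 -v window=225 '
        { t[NR] = $1 }
        END {
            verdict = "ok"
            # Six starts fall inside one window when start i and the
            # start five places before it are at most 225 seconds apart.
            for (i = limit + 1; i <= NR; i++) {
                if (t[i] - t[i - limit] <= window) { verdict = "rapid"; break }
            }
            print verdict
        }'
}

# Six starts ten seconds apart
printf '%s\n' 0 10 20 30 40 50 | respawn_check
# Six starts one hundred seconds apart
printf '%s\n' 0 100 200 300 400 500 | respawn_check
```

The first call reports a rapid respawn and the second does not, since its six starts span 500 seconds.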
To find out what is being respawned, use the steps below.
1. Check the console or console logs
Check on the console to see if init is writing an error message similar to the one below:
0 Thu Jan 22 10:16:27 EST 2009
INIT: Command is respawning too rapidly. Check for possible errors.
id: xvfb "/usr/bin/X11/X -force -vfb -x abx -x dbe -x GLX :1 &"
Or search through the console log using the alog command:
# alog -t console -o | more
2. Check errpt
Next, look for an entry in the errpt output with the label "INIT_RAPID", like the one below:
LABEL: INIT_RAPID
IDENTIFIER: 3A30359F
Date/Time: Wed Jan 28 10:14:17 2009
Sequence Number: 1789
Machine Id: 00CC2F914C00
Node Id: libgng
Class: S
Type: TEMP
Resource Name: init
Description
SOFTWARE PROGRAM ERROR
Probable Causes
SOFTWARE PROGRAM
User Causes
PERFORMANCE DEGRADED
Recommended Actions
REVIEW DETAILED DATA
Detail Data
SOFTWARE ERROR CODE
Command is respawning too rapidly. Check for possible errors.
COMMAND
id: xvfb "/usr/bin/X11/X -force -vfb -x abx -x dbe -x GLX :1 &"
Both messages clearly identify the failing command that is being run out of the /etc/inittab file.
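Since both messages carry the inittab id on an "id:" line that follows the "respawning too rapidly" text, pulling the id out can be scripted. The filter below is a minimal sketch that assumes the message layout shown above; the function name respawn_ids is made up for this example.

```sh
# respawn_ids (hypothetical helper): reads console-log or errpt -a text on
# stdin and prints each inittab id reported as respawning too rapidly, once.
respawn_ids() {
    awk '/respawning too rapidly/ { want = 1; next }
         want && $1 == "id:"      { if (!seen[$2]++) print $2; want = 0 }'
}

# Possible use on a live AIX system (assumes the message format shown above):
#   alog -t console -o | respawn_ids
#   errpt -a -J INIT_RAPID | respawn_ids

# Self-contained demonstration with a fragment of the errpt report:
printf '%s\n' \
    'SOFTWARE ERROR CODE' \
    'Command is respawning too rapidly. Check for possible errors.' \
    'COMMAND' \
    'id: xvfb "/usr/bin/X11/X -force -vfb -x abx -x dbe -x GLX :1 &"' |
    respawn_ids
```

The demonstration prints the id of the failing entry, which can then be passed to grep against /etc/inittab.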
3. Check the wtmp file
If the warning messages are not noticed on the system console or in errpt, the next indication of the problem may be that the /var file system is filling up. This is a result of init creating an entry in the /var/adm/wtmp file during each attempt to start the problem process. See below for a procedure to format the wtmp file in readable characters for diagnosis.
This technique makes use of the fwtmp utility, which is part of the bos.acct fileset.
# lslpp -w /usr/sbin/acct/fwtmp
File Fileset Type
------------------------------------------------------------------
/usr/sbin/acct/fwtmp bos.acct File
What's In The wtmp File
The content of the wtmp file is not directly readable, because the wtmp entries are written as binary structures (see /usr/include/utmp.h for the format). The fwtmp utility can be used to extract the contents into a human-readable format.
For example, we pass the contents of the /var/adm/wtmp file through fwtmp and redirect the output to a text file:
# /usr/sbin/acct/fwtmp < /var/adm/wtmp > /tmp/wtmp_readable
A quick cat of the /tmp/wtmp_readable file shows that the
file mainly consists of the following entries:
xvfb xvfb 5 319596 0000 0000 1078170250 Mon Mar 1 11:44:10 2004
xvfb 8 319596 0000 0001 1078170250 Mon Mar 1 11:44:10 2004
xvfb xvfb 5 319598 0000 0000 1078170250 Mon Mar 1 11:44:10 2004
xvfb 8 319598 0000 0001 1078170250 Mon Mar 1 11:44:10 2004
The first numeric column shows the ut_type of the entry, as defined in the utmp.h header file. The interesting types in our case are:
#define INIT_PROCESS 5 /* Process spawned by "init" */
#define LOGIN_PROCESS 6 /* A "getty" process waiting for login */
#define USER_PROCESS 7 /* A user process */
#define DEAD_PROCESS 8
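In this example, the "xvfb" entry is started by init (signified by the "5" in column 3), and on the next line it dies (ut_type = 8). Each pair of lines shares the same process ID and the same timestamp, so the process exits almost as soon as it starts.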
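When the readable file is large, counting the type 5 and type 8 records per id is quicker than reading it by eye. The sketch below assumes the column layout of the sample entries above (type, pid, two status fields, epoch time and a five-field date at the end of each line), so it counts fields from the right; other fwtmp layouts, such as lines with a trailing host name, would need adjusting. The function name spawn_summary is hypothetical.

```sh
# spawn_summary (hypothetical helper): reads fwtmp text on stdin and prints
# "<id> started=<n> died=<n>" for every id with INIT_PROCESS (5) or
# DEAD_PROCESS (8) records. Fields are located from the right-hand side
# because the user column is empty on the DEAD_PROCESS lines.
spawn_summary() {
    awk 'NF >= 11 {
             type = $(NF - 9)      # ut_type
             id   = $(NF - 10)     # inittab id
             if (type == 5)      started[id]++
             else if (type == 8) died[id]++
             else next
             ids[id] = 1
         }
         END {
             for (id in ids)
                 printf "%s started=%d died=%d\n", id, started[id], died[id]
         }'
}

# Possible use on a live system:
#   /usr/sbin/acct/fwtmp < /var/adm/wtmp | spawn_summary

# Self-contained demonstration with the four sample entries:
printf '%s\n' \
    'xvfb xvfb 5 319596 0000 0000 1078170250 Mon Mar 1 11:44:10 2004' \
    'xvfb 8 319596 0000 0001 1078170250 Mon Mar 1 11:44:10 2004' \
    'xvfb xvfb 5 319598 0000 0000 1078170250 Mon Mar 1 11:44:10 2004' \
    'xvfb 8 319598 0000 0001 1078170250 Mon Mar 1 11:44:10 2004' |
    spawn_summary
```

An id whose started and died counts are both high and roughly equal is a likely respawn loop.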
A quick check of the inittab file reveals the problem:
# grep xvfb /etc/inittab
xvfb:2:respawn:/usr/bin/X11/X -force -vfb -x abx -x dbe -x GLX :1 &
In this case, the xvfb entry was starting an X terminal server daemon.
SOLUTION
The solution is then either to resolve the problem with the command, or to change the entry in inittab from respawn to off using the chitab utility:
# chitab xvfb:2:off:'/usr/bin/X11/X -force -vfb -x abx -x dbe -x GLX :1 &'
In this specific case, the trailing "&" was removed from the X server command, and it then started up normally.
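Why the trailing "&" causes trouble can be seen without touching inittab. The sketch below assumes that init hands each command string to a shell and watches that shell process: with "&" the shell returns at once while the real command carries on in the background, so the watched process appears to have died. The helper name elapsed_seconds is made up for this example, and it relies on the date command supporting +%s.

```sh
# elapsed_seconds (hypothetical helper): prints how many whole seconds the
# shell started for a command string takes to exit. This is the process a
# supervisor would be watching.
elapsed_seconds() {
    start=$(date +%s)
    sh -c "$1" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Foreground: the shell lives as long as the command (at least 2 seconds).
elapsed_seconds 'sleep 2'

# Background: the shell exits right away (normally 0 or 1), although the
# sleep itself is still running.
elapsed_seconds 'sleep 2 &'
```

With action respawn, a shell that exits immediately is restarted immediately, which produces exactly the paired type 5 and type 8 records seen in the wtmp file.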
Very useful blog post! Thanks for publishing it!
As noted in the post, an inittab entry should never use & to put the command in the background. That's because when the command goes into the background, init sees the process it started disappear, assumes that it has failed, and tries to respawn it. Clearly not the desired behavior.
Please note that a command is put into the background primarily so that it will not die if/when its parent dies. But since the init process never dies, its children do not need to go into the background.
If there is some other motivation to start a process in the background at AIX boot time, add a start/stop script to the directory tree below /etc/rc.d, as described in the "Starting and Stopping Software via System V RC Directories" Technote (at https://www.ibm.com/support/pages/starting-and-stopping-software-system-v-rc-directories).