Saturday, January 29, 2011

bosboot fails with malloc error 0301-106

Problem (Abstract)

During or after an OS upgrade, bosboot fails with the following error:

0301-106 /usr/lib/boot/bin/mkboot_chrp the malloc call failed for size

0301-158 bosboot: mkboot failed to create bootimage.

0301-165 bosboot: WARNING! bosboot failed - do not attempt to boot device.



Diagnosing the problem

Check the size of the PdDv.vc ODM class file. For example:

# ls -al /usr/lib/objrepos/PdDv*
-rw-r--r-- 1 root system 110592 Apr 14 11:42 PdDv
-rw-r--r-- 1 root system 200937472 Apr 14 11:42 PdDv.vc
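
The size check above can be scripted. This is only a minimal sketch: the function name odm_file_too_large is hypothetical, and the 100 MB default limit is an assumption chosen for illustration, not a documented bosboot threshold.

```shell
# Hypothetical helper: succeed (return 0) when a file is larger than a limit.
# The 100 MB default limit is an assumption for illustration only.
odm_file_too_large() {
    file=$1
    limit_bytes=${2:-104857600}
    [ -f "$file" ] || return 2          # file missing: report a distinct status
    # wc -c gives the byte count; the arithmetic expansion strips any padding
    size=$(( $(wc -c < "$file") ))
    [ "$size" -gt "$limit_bytes" ]
}

# Possible usage on the affected system:
#   odm_file_too_large /usr/lib/objrepos/PdDv.vc && echo "PdDv.vc looks oversized"
```

With the listing shown above (PdDv.vc at 200937472 bytes), the check would succeed against the assumed 100 MB limit.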


Resolving the problem

bosboot uses the PdDv ODM class files to build device information into the boot image and pre-allocate memory for these devices. If the file is too large, malloc cannot satisfy the request, causing bosboot to fail.

The following instructions can be used to reduce the size of the PdDv.vc file:

# mkdir /tmp/objrepos
# cd /tmp/objrepos
# export ODMDIR=/usr/lib/objrepos
# odmget PdDv > PdDv.out
# cp /usr/lib/objrepos/PdDv /usr/lib/objrepos/PdDv.bak
# cp /usr/lib/objrepos/PdDv.vc /usr/lib/objrepos/PdDv.vc.bak
# export ODMDIR=/tmp/objrepos
# echo $ODMDIR
# odmcreate -c /usr/lib/cfgodm.ipl
# ls -l PdDv*
# odmadd /tmp/objrepos/PdDv.out
# ls -l PdDv*
# cp /tmp/objrepos/PdDv /usr/lib/objrepos/PdDv
# cp /tmp/objrepos/PdDv.vc /usr/lib/objrepos/PdDv.vc
# export ODMDIR=/etc/objrepos
# rm -rf /tmp/objrepos





Bosboot too Small

Action taken: remove hd5 (the boot logical volume) from all the drives and
recreate it on a disk in rootvg with the correct size.

Here are the steps to take once the system has been restored from the
mksysb tape.

  1. Log in as root.

  2. Remove the logical volume hd5.
      rmlv hd5

  3. Clear the boot record from each drive.
      chpv -c hdisk#
     Run this command for each drive that had hd5 on it.

  4. Run the mklv command to create the logical volume on hdisk0.
      mklv -t boot -y hd5 -ae rootvg 1 hdisk0

     NOTE: The '1' in this command is the number of physical partitions.
     A single partition is enough only if the partition size is 16 MB or
     larger. To find out, run:

      lsvg rootvg

     and look for the PP SIZE field. If it is 16 MB or larger, run the
     mklv command as shown. If it is smaller than 16 MB, run the mklv
     command with a 2 instead of a 1.

  5. Create the boot image.
      bosboot -ad /dev/hdisk0

  6. Make hdisk0 the first device in the boot list.
      bootlist -m normal hdisk0

Now reboot and check that the system comes up cleanly.
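
The partition-count rule from the note in step 4 can be sketched as a small shell helper. The function names pp_size_mb and hd5_partitions are hypothetical, and the parser assumes the usual "PP SIZE:  <n> megabyte(s)" field in lsvg output; check it against your own lsvg output before relying on it.

```shell
# Extract the PP size in MB from `lsvg <vg>` output read on stdin.
# Assumes the field is printed as "PP SIZE:   <n> megabyte(s)".
pp_size_mb() {
    sed -n 's/.*PP SIZE:[[:space:]]*\([0-9][0-9]*\) megabyte.*/\1/p' | head -n 1
}

# Number of partitions to give mklv for hd5, following the rule in step 4:
# one partition when PP SIZE is 16 MB or larger, otherwise two.
hd5_partitions() {
    if [ "$1" -ge 16 ]; then
        echo 1
    else
        echo 2
    fi
}

# Possible usage on the affected system:
#   pps=$(hd5_partitions "$(lsvg rootvg | pp_size_mb)")
#   mklv -t boot -y hd5 -ae rootvg "$pps" hdisk0
```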

'bosboot' hangs

######################################################################
PROBLEM: 'bosboot' hangs

CAUSE: The major/minor number of hdiskX (rootvg) in the ODM differs from the one in the /dev directory.

SOLUTION: Correct the ODM entry using the 'odmdelete' and 'odmadd' commands, as explained below.
Please note that ODM commands have to be handled very carefully; use them at the SA's own risk.
########

Here's a strange problem.

bosboot hangs. If you can do a "ps -ef | grep bootinfo", you'll see "/usr/sbin/bootinfo -g /dev/hd5". Or, if you run "ksh -x /usr/sbin/bosboot -ad /dev/hdisk0", you'll see it hang at "valid_dev /dev/hdisk0".

Here's the solution (assuming hdisk0):
cd /dev
ls -l hdisk0 -> 26, 1

odmget -q value3=hdisk0 CuDvDr  -> 24, 1

It looks like the ODM thinks hdisk0 has a different major/minor number than /dev does [root cause of the bosboot hang].
See if any other devices are in the ODM at 26,1:

odmget -q 'value1=26 value2=1' CuDvDr

If there is, and it is something that you can live without, remove it from the ODM:

odmdelete -q 'value1=26 value2=1' -o CuDvDr

Now, let's fix hdisk0.

odmget -q value3=hdisk0 CuDvDr >hdisk0.out

Edit the file with vi and change the major/minor numbers (value1/value2) to match /dev.

odmdelete -q value3=hdisk0 -o CuDvDr -> deletes the old entry

odmadd hdisk0.out

synclvodm -Pv rootvg -> OK

odmget -q value3=hdisk0 CuDvDr  ->looks good

bosboot -ad /dev/hdisk0 -> OK this time

bootlist -m normal hdisk0 ->OK

Example odmget output (from one machine):
# odmget -q value3=hdisk0 CuDvDr|more

CuDvDr:
        resource = "devno"
        value1 = "22"
        value2 = "0"
        value3 = "hdisk0"
#
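
To spot the mismatch without reading the two outputs by eye, the comparison can be sketched in shell. The helper names dev_numbers and odm_numbers are hypothetical. The sketch assumes the `ls -l` line shows the device numbers as "26,  1" in fields 5 and 6, and that value1/value2 in the CuDvDr stanza are the major/minor numbers, as in the outputs above.

```shell
# Print "major minor" from one `ls -l` line for a device node (stdin).
# Assumes the device numbers appear as "26,  1" in fields 5 and 6.
dev_numbers() {
    awk 'NR == 1 { sub(/,/, "", $5); print $5, $6 }'
}

# Print "major minor" from an `odmget ... CuDvDr` stanza (stdin),
# taking value1 as the major number and value2 as the minor number.
odm_numbers() {
    awk -F'"' '
        /value1 =/ { major = $2 }
        /value2 =/ { minor = $2 }
        END { print major, minor }
    '
}

# Possible usage on the affected system:
#   dev=$(ls -l /dev/hdisk0 | dev_numbers)
#   odm=$(odmget -q value3=hdisk0 CuDvDr | odm_numbers)
#   [ "$dev" = "$odm" ] || echo "mismatch: /dev has $dev, ODM has $odm"
```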

Fixing broken fileset issue in HACMP

We updated HACMP from 5.3 to 5.5 and are now seeing lppchk report three
filesets as broken:

# lppchk -v ==> The 5.3 versions of 3 HACMP filesets show up as "broken"
cluster.es.cspoc.cmds
cluster.es.cspoc.dsh
cluster.es.cspoc.rte

# lslpp -l | grep cluster.es.cspoc ==> Only 5.5 versions show up

We tarred up the ODM as a backup:
# cd /
# tar -cvf /tmp/odm.tar ./etc/objrepos ./usr/lib/objrepos


The cluster filesets were upgraded to HA 5.5, but the install reported
that the following filesets are broken:

cluster.es.cspoc.* 5.3


# export ODMDIR=/usr/lib/objrepos
# odmget -q "name=cluster.es.cspoc.cmds and rel=3" lpp

# lppchk -v
lppchk:  The following filesets need to be installed or corrected to bring
         the system to a consistent state:

  bos.txt.bib.data 4.1.0.0                (not installed; requisite fileset)
  cluster.es.cspoc.cmds 5.3.0.3           (BROKEN)
  cluster.es.cspoc.dsh 5.3.0.0            (BROKEN)
  cluster.es.cspoc.rte 5.3.0.3            (BROKEN)
# export ODMDIR=/usr/lib/objrepos
# odmget -q "lpp_name=cluster.es.cspoc.cmds and rel=3" product

product:
        lpp_name = "cluster.es.cspoc.cmds"
        comp_id = "5765-F6200"
        update = 0
        cp_flag = 273
        fesn = ""
        name = "cluster.es.cspoc"
        state = 10
        ver = 5
        rel = 3
        mod = 0
        fix = 0
        ptf = ""
        media = 3
        sceded_by = ""
        fixinfo = ""
        prereq = "*coreq cluster.es.cspoc.rte 5.3.0.0\n\
"
        description = "ES CSPOC Commands"
        supersedes = ""

product:
        lpp_name = "cluster.es.cspoc.cmds"
        comp_id = "5765-F6200"
        update = 1
        cp_flag = 289
        fesn = ""
        name = "cluster.es.cspoc"
        state = 7
        ver = 5
        rel = 3
        mod = 0
        fix = 3
        ptf = ""
        media = 3
        sceded_by = ""
        fixinfo = ""
        prereq = "*ifreq cluster.es.cspoc.rte (5.3.0.0) 5.3.0.1\n\
*ifreq cluster.es.server.diag (5.3.0.0) 5.3.0.1\n\
*ifreq cluster.es.server.rte (5.3.0.0) 5.3.0.1\n\
"
        description = "ES CSPOC Commands"
        supersedes = ""
# odmget -q "name=cluster.es.cspoc.cmds and rel=3" lpp

lpp:
        name = "cluster.es.cspoc.cmds"
        size = 0
        state = 7
        cp_flag = 273
        group = ""
        magic_letter = "I"
        ver = 5
        rel = 3
        mod = 0
        fix = 0
        description = "ES CSPOC Commands"
        lpp_id = 611

# odmdelete -q lpp_id=611 -o lpp
# odmdelete -q "lpp_name=cluster.es.cspoc.cmds and rel=3" -o product
2 objects deleted
# odmdelete -q lpp_id=611 -o lpp
1 objects deleted
# odmdelete -q lpp_id=611 -o inventory
199 objects deleted
# odmdelete -q lpp_id=611 -o history
4 objects deleted

We can clean up the lppchk -v "BROKEN" entries by doing the following:

Getting the lpp_id's:
# export ODMDIR=/usr/lib/objrepos
# odmget -q "name=cluster.es.cspoc.cmds and rel=3" lpp | grep lpp_id
        lpp_id = 611
# odmget -q "name=cluster.es.cspoc.dsh and rel=3" lpp | grep lpp_id
        lpp_id = 604
# odmget -q "name=cluster.es.cspoc.rte and rel=3" lpp | grep lpp_id
        lpp_id = 610

Deleting the 5.3 entries:
# export ODMDIR=/usr/lib/objrepos
# odmdelete -q "lpp_name=cluster.es.cspoc.cmds and rel=3" -o product
# odmdelete -q lpp_id=611 -o lpp
# odmdelete -q lpp_id=611 -o inventory
# odmdelete -q lpp_id=611 -o history
# odmdelete -q "lpp_name=cluster.es.cspoc.dsh and rel=3" -o product
# odmdelete -q lpp_id=604 -o lpp
# odmdelete -q lpp_id=604 -o inventory
# odmdelete -q lpp_id=604 -o history
# odmdelete -q "lpp_name=cluster.es.cspoc.rte and rel=3" -o product
# odmdelete -q lpp_id=610 -o lpp
# odmdelete -q lpp_id=610 -o inventory
# odmdelete -q lpp_id=610 -o history
# export ODMDIR=/etc/objrepos

That will leave you with this:
# lppchk -v
lppchk:  The following filesets need to be installed or corrected to bring
         the system to a consistent state:

  bos.txt.bib.data 4.1.0.0                (not installed; requisite fileset)

For that to go away, you'll need to install that from Volume 1 of your
AIX installation media.
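
Since the same four odmdelete commands repeat for each fileset, they can be generated rather than typed. The sketch below only prints the commands so they can be reviewed before anything touches the ODM; the function name print_odm_cleanup is hypothetical, and the lpp_id values must come from your own odmget output.

```shell
# Print (do not run) the four odmdelete commands for one fileset.
# Arguments: fileset name, release number, lpp_id as reported by odmget.
print_odm_cleanup() {
    name=$1
    rel=$2
    id=$3
    printf 'odmdelete -q "lpp_name=%s and rel=%s" -o product\n' "$name" "$rel"
    for class in lpp inventory history; do
        printf 'odmdelete -q lpp_id=%s -o %s\n' "$id" "$class"
    done
}

# Example with the values from this system:
#   print_odm_cleanup cluster.es.cspoc.cmds 3 611
#   print_odm_cleanup cluster.es.cspoc.dsh  3 604
#   print_odm_cleanup cluster.es.cspoc.rte  3 610
```

Review the printed commands, then run them with ODMDIR=/usr/lib/objrepos and set ODMDIR back to /etc/objrepos afterwards, as in the procedure above.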

--------------------------

I followed your procedure and got the following results.  It appears I
don't end up with "bos.txt.bib.data 4.1.0.0" needing to be installed.  I
did notice two of the inventory commands deleting large numbers of
objects and would like to know if that is a potential issue.  Everything
else looks great.

oxxxxxxx:/te/root> export ODMDIR=/usr/lib/objrepos
oxxxxxxx:/te/root> lppchk -v
lppchk:  The following filesets need to be installed or corrected to bring
         the system to a consistent state:

  cluster.es.cspoc.cmds 5.3.0.3           (BROKEN)
  cluster.es.cspoc.dsh 5.3.0.0            (BROKEN)
  cluster.es.cspoc.rte 5.3.0.3            (BROKEN)

oxxxxxxx:/te/root> odmget -q "name=cluster.es.cspoc.cmds and rel=3" lpp | grep lpp_id
        lpp_id = 611
oxxxxxxx:/te/root> odmget -q "name=cluster.es.cspoc.dsh and rel=3" lpp | grep lpp_id
        lpp_id = 604
oxxxxxxx:/te/root> odmget -q "name=cluster.es.cspoc.rte and rel=3" lpp | grep lpp_id
        lpp_id = 610

oxxxxxxx:/te/root> odmdelete -q "lpp_name=cluster.es.cspoc.cmds and rel=3" -o product
0518-307 odmdelete: 0 objects deleted.
oxxxxxxx:/te/root> odmdelete -q lpp_id=611 -o lpp
0518-307 odmdelete: 1 objects deleted.
oxxxxxxx:/te/root> odmdelete -q lpp_id=611 -o inventory
0518-307 odmdelete: 199 objects deleted.
oxxxxxxx:/te/root> odmdelete -q lpp_id=611 -o history
0518-307 odmdelete: 4 objects deleted.
oxxxxxxx:/te/root> odmdelete -q "lpp_name=cluster.es.cspoc.dsh and rel=3" -o product
0518-307 odmdelete: 0 objects deleted.
oxxxxxxx:/te/root> odmdelete -q lpp_id=604 -o lpp
0518-307 odmdelete: 1 objects deleted.
oxxxxxxx:/te/root> odmdelete -q lpp_id=604 -o inventory
0518-307 odmdelete: 3 objects deleted.
oxxxxxxx:/te/root> odmdelete -q lpp_id=604 -o history
0518-307 odmdelete: 2 objects deleted.
oxxxxxxx:/te/root> odmdelete -q "lpp_name=cluster.es.cspoc.rte and rel=3" -o product
0518-307 odmdelete: 0 objects deleted.
oxxxxxxx:/te/root> odmdelete -q lpp_id=610 -o lpp
0518-307 odmdelete: 1 objects deleted.
oxxxxxxx:/te/root> odmdelete -q lpp_id=610 -o inventory
0518-307 odmdelete: 53 objects deleted.
oxxxxxxx:/te/root> odmdelete -q lpp_id=610 -o history
0518-307 odmdelete: 4 objects deleted.
oxxxxxxx:/te/root> export ODMDIR=/etc/objrepos
oxxxxxxx:/te/root> lppchk -v
oxxxxxxx:/te/root>

Fixing broken fileset in AIX

axxxx@xxxxxxx)lppchk -v
lppchk:  The following filesets need to be installed or corrected to bring the system to a consistent state:
openssh.license 4.1.0.5300              (BROKEN)
To fix this, you'll need to get the base level of the
'openssh.license' fileset and run a force overwrite:
# installp -acFNXYd <device or directory location> openssh.license
This reinstalls the fileset in the committed state and removes the
broken status.
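
When several filesets are affected, the BROKEN ones can be pulled out of the lppchk -v output with a small filter. This is a rough sketch: the function name broken_filesets is hypothetical, and it assumes the "fileset level (BROKEN)" line layout shown above.

```shell
# Print "fileset level" for every line of lppchk -v output (stdin)
# that is flagged as (BROKEN).
broken_filesets() {
    awk '/\(BROKEN\)/ { print $1, $2 }'
}

# Possible usage; 2>&1 is included in case lppchk writes its report to stderr:
#   lppchk -v 2>&1 | broken_filesets
```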


X11.base.lib fileset left in the BROKEN state
after a 6100-02 to 6100-04 update.
Action taken: we had the base 6100-04 version, so we reinstalled from it.
# installp -acFNXYd . X11.base.rte  -> success
# lppchk -v  -> clean
# oslevel -s  -> 6100-04-01


To correct your phantom fileset problem, please run the following:
$ export ODMDIR=/usr/lib/objrepos
$ odmdelete -q lpp_name="http_server.base.source" -o product
 ==> It should answer "1 object deleted".
Then set ODMDIR back to the default:
$ export ODMDIR=/etc/objrepos

Burn Image to DVD in AIX

There are two ways to restore a mksysb file: one is to use NIM, the other
is to burn the mksysb image onto a DVD.

This covers copying a mksysb image to a DVD, or creating an ISO image
that contains the entire DVD image. So I'll just give you some sample
commands:

To create the mksysb image:
# mksysb -i /some/file
Note: Make sure that the filesystem you are using is either large-file
enabled JFS or JFS2.

To burn that mksysb image onto a DVD (using UDF format):
# mkdvd -U -m /some/file -d /dev/cd0
This skips the step of creating the mksysb image and uses the one you
specify.  It must be done on a system at the same ML or higher
than the original system.

To create an ISO image of the DVD:
# mkdvd -S -m /some/file -d /dev/cd0
If you want the mkdvd command to create the mksysb image for you, just
leave out the -m flag:
# mkdvd -S -d /dev/cd0

To burn an ISO image using an AIX system:
# burn_cd -d /dev/cd0 /some/ISO_file
Note: The -d flag indicates that this is a DVD.  For CDs, leave the -d
out.




Cannot create a file on a NFS mount point


NFS server 'tsm2' is exporting /mksysb. Mounting /mksysb on the client works, but writing to it fails:

mount tsm2:/mksysb /mnt // mounted and available.

/usr/bin/mksysb -i /mnt/filename >> cannot open /mnt/filename: permission denied

cd /mnt
touch jack.out >> cannot create

logged in as root.

#showmount -e tsm2 >> /mksysb everyone

# ls -ld /mnt >> 777 root,system

# hostname -> pxxxxxx2

On the NFS server:

host pxxxxx2 >> pxxxxxt.fxxxxxxr.com is 10.20.100.5 aliases: pxxxxxxt.fxxxxxd.com

host 10.20.100.5 >> pxxxxxxxxt.fxxxxxxxr.com is 10.20.100.5

more /etc/netsvc.conf >> nothing uncommented.

more /etc/hosts >> 10.20.100.5 is not found.

Unmount /mksysb on the client (umount /mnt).

On the NFS server:

exportfs -u /mksysb
more /etc/exports >>  /mksysb (no permissions)
cp /etc/exports /etc/exports.old
vi /etc/exports >> comment out /mksysb
smitty mknfsexp >>  Add a Directory to Exports List
  Hosts allowed root access  pxxxxxxxt

# showmount -e >> /mksysb everyone
# more /etc/exports >> /mksysb -root=tsm2  << tsm2 should be pxxxxxxt
# smitty rmnfsexp >> /mksysb
# showmount -e >> mksysb is not listed.
# smitty mknfsexp >> Hosts allowed root access  pxxxxxxxt
# showmount -e >> /mksysb everyone

On the NFS client:

# mount tsm2:/mksysb /mnt
# touch file >> created the file.

/usr/sbin/mksysb -i /mnt/filename // the mksysb is running.
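
The root cause here was that the export granted root access to the wrong host, so a quick check of the exports entry can save some time. The sketch below is only an illustration: the function name export_grants_root and the host name clienthost are hypothetical, and it assumes the AIX /etc/exports layout of "directory -opt=val,opt=val" with root hosts separated by colons.

```shell
# Succeed (return 0) when the exports text on stdin gives the named host
# root access to the named directory.
# Assumes lines of the form: /dir -root=host1:host2,access=host3
export_grants_root() {
    dir=$1
    host=$2
    awk -v dir="$dir" -v host="$host" '
        $1 == dir {
            n = split($2, opts, ",")
            for (i = 1; i <= n; i++) {
                opt = opts[i]
                sub(/^-/, "", opt)              # drop the leading dash
                if (opt ~ /^root=/) {
                    sub(/^root=/, "", opt)
                    m = split(opt, hosts, ":")
                    for (j = 1; j <= m; j++)
                        if (hosts[j] == host) found = 1
                }
            }
        }
        END { exit(found ? 0 : 1) }
    '
}

# Possible usage on the NFS server (clienthost is a placeholder):
#   export_grants_root /mksysb clienthost < /etc/exports || echo "no root access for clienthost"
```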