Thursday, August 18, 2011

Common Booting Issues on a Solaris Server


 
Booting problems pose a serious challenge to system administrators because the system is down and no one can use it. This article covers some common booting problems and their possible solutions, to help you understand the cause and bring the system back up quickly.

The following are some common booting issues and error messages, their meanings, and possible solutions:


1) Booting in single-user mode and mounting the root disk.

2) Making a boot device alias.

3) "Timeout waiting for ARP/RARP packet" error message.

4) "The file just loaded does not appear to be executable" error message.

5) "bootblk: can't find the boot program" error message.

6) "boot: cannot open kernel/unix" error message.

7) "Error reading ELF header" error message.

8) "Cannot open '/etc/path_to_inst'" error message.

9) "Can't stat /dev/rdsk/c0t3d0s0" error message.

10) Next Steps


1. Booting in single-user mode and mounting the root hard disk

The most important step in diagnosing a booting problem is to boot the system in single-user mode, examine the hard disk for errors, and work out the corrective measure. Single-user mode can be reached by any of the following methods:

ok> boot -s ;from root disk

ok> boot net -s ;from network



ok> boot cdrom -s ;from cdrom

Rebooting with command: cdrom -s

Configuring the /devices directory

Configuring the /dev directory


INIT: SINGLE USER MODE

#

# fsck /dev/rdsk/c0t3d0s0

# mount /dev/dsk/c0t3d0s0 /mnt



Perform the required operations on the mounted disk, which is now accessible through /mnt, and unmount it when you are done:

# umount /mnt

# reboot
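
Note that fsck is run against the raw device under /dev/rdsk, while mount uses the block device under /dev/dsk. A minimal sketch of that pairing follows, assuming the usual cXtXdXsX slice naming; the function name rescue_cmds is made up for illustration, and it only prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch only: print the fsck and mount commands from this section for a
# given slice. rescue_cmds is a hypothetical helper, not a Solaris tool.
rescue_cmds() {
    slice=$1            # e.g. c0t3d0s0 (the article's example slice)
    mnt=${2:-/mnt}      # mount point, /mnt unless given
    case $slice in
        c*t*d*s*) ;;    # looks like a controller/target/disk/slice name
        *) echo "unexpected slice name: $slice" >&2; return 1 ;;
    esac
    # fsck checks the raw (character) device, mount takes the block device.
    echo "fsck /dev/rdsk/$slice"
    echo "mount /dev/dsk/$slice $mnt"
}

# Usage, from the single-user shell:
#   rescue_cmds c0t3d0s0
```

Review the printed commands before typing them, since the slice name on your system may differ from c0t3d0s0.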



2. Making a boot device alias

If the system cannot boot from the primary disk and you need to boot from another disk to access the data, use the nvalias command.

The nvalias command creates a device alias, that is, it assigns an alternate name to a physical disk. You need the physical device path of the target disk, which you can obtain with the show-disks command at the ok> prompt.



ok> nvalias disk7 /iommu@f,e0000000/sbus@f,e0001000/dma@3,81000/esp@3,80000/sd2,0

The new alias can be set as the default boot device, or it can be used for a one-time boot by referring to it by name.

ok> setenv boot-device disk7

ok> reset

or

ok> boot disk7



3. "Timeout waiting for ARP/RARP packet"

At the ok> prompt, type printenv and look for these parameters:

boot-device disk

mfg-switch? false

diag-switch? false

If you see "boot-device net" or a value of true for either of the other two parameters, change them to the values shown above.

If you do want to boot from the network, make sure the client is properly configured on the boot server and that the network connections and configuration are correct.
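
On a system that is still running, the same parameters can be checked from the shell before the next reboot. The sketch below assumes the name=value lines that the eeprom command prints; the function name check_boot_env is made up for illustration, and on Solaris you may need nawk instead of awk.

```shell
#!/bin/sh
# Sketch only: read "name=value" lines on stdin and report the settings this
# section says to change. check_boot_env is a hypothetical helper.
check_boot_env() {
    awk -F= '
        # boot-device should not start with net for a disk boot
        $1 == "boot-device" && $2 ~ /^net/ {
            print "boot-device starts with net: " $2; bad = 1
        }
        # both switches should be false
        ($1 == "diag-switch?" || $1 == "mfg-switch?") && $2 == "true" {
            print $1 " is true"; bad = 1
        }
        END { exit bad }    # non-zero exit if anything was flagged
    '
}

# Usage on a running system:
#   eeprom | check_boot_env
```

The function prints nothing and returns 0 when all three parameters already have the values listed above.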



4. "The file just loaded does not appear to be executable."

The boot block on the hard disk is corrupted. Boot the system in single-user mode from the cdrom and reinstall the boot block.

# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t3d0s0



5. "bootblk: can't find the boot program"

The boot block cannot find the boot program, which is ufsboot in Solaris. Either ufsboot is missing or it is corrupted. In such cases it can be restored from the cdrom after booting from the cdrom and mounting the hard disk on /mnt:

# cp /platform/`uname -i`/ufsboot /mnt/platform/`uname -i`



6. "boot: cannot open kernel/unix"

The kernel directory, or the unix kernel file in that directory, cannot be found. It was probably removed during fsck or deleted by mistake. Copy it from the cdrom or restore it from the backup tape.

# cp /platform/`uname -i`/kernel/unix /mnt/platform/`uname -i`/kernel



7. "Error reading ELF header."

The unix kernel file in the kernel directory is corrupted. Copy it from the cdrom or restore it from the backup tape.

# cp /platform/`uname -i`/kernel/unix /mnt/platform/`uname -i`/kernel
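
Before rebooting after any of the fixes in sections 5 to 7, it can help to confirm that the boot files are actually present on the mounted hard disk. The following is a minimal sketch that checks only the two paths named in this article; the function name missing_boot_files is made up for illustration, and it detects missing or empty files only, not a corrupted kernel.

```shell
#!/bin/sh
# Sketch only: list boot files from sections 5-7 that are missing or empty
# under a mounted root. missing_boot_files is a hypothetical helper.
missing_boot_files() {
    root=$1     # where the hard disk's root slice is mounted, e.g. /mnt
    plat=$2     # platform name, e.g. the output of `uname -i`
    rc=0
    for f in "platform/$plat/ufsboot" "platform/$plat/kernel/unix"; do
        # -s is true only for a file that exists and is not empty
        if [ ! -s "$root/$f" ]; then
            echo "$root/$f"
            rc=1
        fi
    done
    return $rc
}

# Usage, after mounting the root slice on /mnt:
#   missing_boot_files /mnt "`uname -i`"
```

Any path it prints is a candidate for the cp commands shown above.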



8. "Cannot open '/etc/path_to_inst'"

The system cannot find the /etc/path_to_inst file. It may be missing or corrupted and needs to be rebuilt.

To rebuild this file, boot the system with the -ar option:

ok> boot -ar

Press Enter to accept the default values for the questions asked during booting, and answer y when asked whether to rebuild /etc/path_to_inst:

The /etc/path_to_inst on your system does not exist or is empty. Do you want to rebuild this file [n]? y

The system will continue booting after rebuilding the file.



9. "Can't stat /dev/rdsk/c0t3d0s0"

If you boot from the cdrom and run fsck, the root partition checks out fine, but this error occurs when booting from the root disk. The device entry for / is missing from the /dev/dsk directory. To resolve the issue, the /dev and /devices directories have to be restored from the root backup tapes.

Tuesday, August 2, 2011

How to Reset the root Password on a ZFS Root File System in Solaris 10

This document shows the steps to reset the root password when the root file system is on ZFS in the Solaris 10 Operating System.




Steps to recover the root password:



Example 1: Resetting the root Password after Booting from the Network

In this example, I boot from the network into single-user mode and I assume that the JumpStart server has been set up properly.

Note: You can also use this method if you boot from CD.

1. Boot the server from the network into single-user mode.

ok> boot net -s

2. Check what pools are available to import. The system will report that rpool is available to import.

# zpool import

3. Import rpool.

# zpool import rpool

The system will report messages similar to this:

cannot mount '/export': failed to create mountpoint

cannot mount '/export/home': failed to create mountpoint

cannot mount '/rpool': failed to create mountpoint

Although the ZFS file systems in the pool cannot be mounted, they exist.

# zfs list

NAME USED AVAIL REFER MOUNTPOINT

rpool 12.5G 54.4G 97K /rpool

rpool/ROOT 6.97G 54.4G 21K legacy

rpool/ROOT/s10s_u8wos_08a 6.97G 54.4G 6.97G /

rpool/dump 1.00G 54.4G 1.00G -

rpool/export 2.53G 54.4G 23.5K /export

rpool/export/home 2.53G 54.4G 2.53G /export/home

rpool/swap 2G 56.4G 16K -

The file /etc/shadow that we need to access is in rpool/ROOT/s10s_u8wos_08a, whose mountpoint, /, is already in use.

# zfs get mountpoint rpool/ROOT/s10s_u8wos_08a

NAME PROPERTY VALUE SOURCE

rpool/ROOT/s10s_u8wos_08a mountpoint / local

# zfs get mounted rpool/ROOT/s10s_u8wos_08a

NAME PROPERTY VALUE SOURCE

rpool/ROOT/s10s_u8wos_08a mounted no -

4. Change the mountpoint of rpool/ROOT/s10s_u8wos_08a:

# zfs set mountpoint=/mnt rpool/ROOT/s10s_u8wos_08a

5. Mount rpool/ROOT/s10s_u8wos_08a:

# zfs mount rpool/ROOT/s10s_u8wos_08a

6. Change the password for root.

# cd /mnt/etc

# cp shadow shadow.bk

I have found that in single-user mode the vi editor often does not work well, so I use sed 's/current_root_passwd/new_root_password/' shadow to change the password. Replace only the hash itself so that the remaining fields of the line stay in place, for example:

# sed 's/5Qa1EuzftNkIQ/v.UaDklqLain6/' shadow > shadow2

# mv shadow2 shadow

7. Unmount the file system.

# cd /

# zfs umount rpool/ROOT/s10s_u8wos_08a

8. Reset the mountpoint back to /.

# zfs set mountpoint=/ rpool/ROOT/s10s_u8wos_08a

9. Reboot the system. You can now log in as root again.

# init 6
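
Finding the dataset that holds /etc/shadow (the one whose mountpoint is /, as seen in step 3) can also be scripted. This is a minimal sketch, assuming the list comes from zfs list -H -o name,mountpoint, which prints tab-separated fields without a header line; the function name root_datasets is made up for illustration, and on Solaris you may need nawk instead of awk.

```shell
#!/bin/sh
# Sketch only: read "name<TAB>mountpoint" lines on stdin and print every
# dataset whose mountpoint is "/". root_datasets is a hypothetical helper.
root_datasets() {
    awk -F'\t' '$2 == "/" { print $1 }'
}

# Usage after "zpool import rpool":
#   zfs list -H -o name,mountpoint | root_datasets
```

If more than one boot environment exists in the pool, this prints more than one name, and you still have to pick the one you want to repair.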





Example 2: Resetting the Password From a Second Disk in the System

If you have another OS, such as the Solaris 10 05/09 OS, on a second disk with a ZFS root file system, use the following procedure. This method is especially useful and practical when you are testing operating systems and applications on one development box and you need to move files between operating systems and applications.

1. With the OS running on the second disk, check what pools are available to import. The system will report that rpool is available to import.

# zpool import

2. Since the current system has rpool, import rpool on the first disk using a different name, for example, r2pool.

# zpool import rpool r2pool

You will see messages complaining that the mountpoints / and /export are not empty.

3. Check that the ZFS file systems in pool r2pool are imported.

# zfs list -r r2pool

NAME USED AVAIL REFER MOUNTPOINT

r2pool 25.0G 42.0G 97K /rpool

r2pool/ROOT 6.97G 42.0G 21K legacy

r2pool/ROOT/s10s_u8wos_08a 6.97G 42.0G 6.97G /r2poolroot

r2pool/dump 8.00G 42.0G 8.00G -

r2pool/export 23.5K 42.0G 23.5K /export

r2pool/swap 10G 52.0G 16K -

4. Change the mountpoint of r2pool/ROOT/s10s_u8wos_08a and mount it.

# zfs set mountpoint=/r2poolroot r2pool/ROOT/s10s_u8wos_08a

# zfs mount r2pool/ROOT/s10s_u8wos_08a

5. Access the root file system on the first disk to change the password.

# cd /r2poolroot/etc

# vi shadow

root:5Qa1EuzftNk00:6445::::::

6. Unmount the file system.

# zfs umount r2pool/ROOT/s10s_u8wos_08a

7. Reset the mountpoint back to /.

# zfs set mountpoint=/ r2pool/ROOT/s10s_u8wos_08a

8. Set the system to boot from the first disk and reboot.

# eeprom boot-device="disk0 disk1"

# init 6

9. After booting from the first disk, you will see that the root pool name is r2pool, which does not affect OS operation.

# zpool list

NAME SIZE USED AVAIL CAP HEALTH ALTROOT

r2pool 68G 15.0G 53.0G 22% ONLINE -
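
For the shadow edit itself (step 6 of Example 1, step 5 of Example 2), matching on the old hash string is not strictly necessary. Below is a minimal sketch that rewrites only the second field of the root entry, assuming the standard colon-separated shadow format; the function name set_root_hash is made up for illustration. On Solaris 10 use nawk or /usr/xpg4/bin/awk, since the legacy /usr/bin/awk does not accept -v.

```shell
#!/bin/sh
# Sketch only: read a shadow-format file on stdin and replace the password
# field (field 2) of the root entry with the hash given as $1.
# set_root_hash is a hypothetical helper; every other line passes through.
set_root_hash() {
    awk -F: -v OFS=: -v h="$1" '$1 == "root" { $2 = h } { print }'
}

# Usage, with the root file system mounted on /mnt:
#   cd /mnt/etc
#   cp shadow shadow.bk
#   set_root_hash 'v.UaDklqLain6' < shadow.bk > shadow
```

Because the edit is keyed on the user name rather than the old hash, the other fields of the root line and all other accounts are left as they were.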



Procedure to replace a VxVM boot disk (example: rootdisk02)

Example:
 Here the failed disk is c2t0d0 (rootdisk02). In all of the commands below, replace c2t0d0 with the correct device file for your rootdisk02.




a) If the disk has failed, vxdisk list shows the following:

# vxdisk list

DEVICE TYPE DISK GROUP STATUS

c0t0d0 simple rootdisk01 rootdg online

c2t0d0 simple - - failed

- - rootdisk02 rootdg failed was:c2t0d0



b) Replace the faulty disk c2t0d0, then rescan the disks:

# ioscan -fnCdisk



Ensure the disk is CLAIMED, then make VxVM rescan its devices:



# vxdctl enable

# vxdisk list

DEVICE TYPE DISK GROUP STATUS

c0t0d0 simple rootdisk01 rootdg online

c2t0d0 simple - - online Invalid

- - rootdisk02 rootdg failed was:c2t0d0



c) Remove the disk rootdisk02 using vxdiskadm ==> Option 3

After that, vxdisk list should look like this:

# vxdisk list

DEVICE TYPE DISK GROUP STATUS

c0t0d0 simple rootdisk01 rootdg online

c2t0d0 simple - - online Invalid

- - rootdisk02 rootdg removed was:c2t0d0



d) OPTIONAL STEP

Note: If the latest VxVM command patches are installed, there is no need to run vxdisk rm c2t0d0, i.e. step d).



Use step d) only if vxdisksetup -iB fails with the following error; otherwise it is not required:



# /etc/vx/bin/vxdisksetup -iB c2t0d0

vxvm:vxdisk: ERROR: Device c2t0d0: define failed:

Attribute cannot be changed with a reinit



# vxdisk rm c2t0d0



After that, vxdisk list will look like this:

# vxdisk list

DEVICE TYPE DISK GROUP STATUS

c0t0d0 simple rootdisk01 rootdg online

- - rootdisk02 rootdg removed was:c2t0d0



e) Initialize the replacement disk

# /etc/vx/bin/vxdisksetup -iB c2t0d0

Ensure the private region offset is 2144, using the command vxdisk list c2t0d0:

# vxdisk list c2t0d0



private: slice=0 offset=2144 len=1024





f) Use vxdiskadm option 4 to replace the disk

This automatically starts resynchronizing the volumes.

Check the status of the mirror with the command vxtask list:



# vxtask list

TASKID PTID TYPE/STATE PCT PROGRESS

165 PARENT/R 75.00% 8/6(1) VXRECOVER

165 165 ATCOPY/R 20.32% 0/1093632/222208 PLXATT usrvol usrvol-02





g) Configure the LIF area and the boot, swap, and root information

# /etc/vx/bin/vxbootsetup rootdisk02



Ensure the LIF area is set up correctly and check that ISL and HPUX are present:

# lifls /dev/rdsk/c2t0d0

ODE MAPFILE SYSLIB CONFIGDATA SLMOD2

SLDEV2 SLDRV2 SLSCSI2 MAPPER2 IOTEST2

PERFVER2 PVCU SSINFO ISL HPUX

AUTO LABEL



Check that boot, swap, and root are configured properly:

# vxvmboot -v /dev/rdsk/c2t0d0

LIF Label File @ (1k) block # 1434 on VxVM Disk /dev/rdsk/c2t0d0:

Label Entry: 0, Boot Volume start: 3168; length: 350 MB

Label Entry: 1, Root Volume start: 8750176; length: 512 MB

Label Entry: 2, Swap Volume start: 361568; length: 8192 MB

Label Entry: 3, Dump Volume start: 361568; length: 8192 MB



Use the vxprint command to ensure that all the volumes and plexes belonging to rootdisk02 are in the ENABLED and ACTIVE states.

If required, run vxrecover -b once again and verify that everything is in order.



Once all the disks are fine, and if downtime permits, try booting the server from the rootdisk02 hardware path.
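
The state checks in steps a) to d) all come down to reading the last line of the vxdisk list output, where the disk media name has lost its device. A minimal sketch that pulls those lines out follows, assuming the six-column layout shown in the listings above; the function name detached_disks is made up for illustration.

```shell
#!/bin/sh
# Sketch only: read `vxdisk list` output on stdin and print
# "diskname state device" for every disk media record that has no device
# attached, i.e. lines such as
#   -  -  rootdisk02  rootdg  failed was:c2t0d0
# detached_disks is a hypothetical helper, not a VxVM command.
detached_disks() {
    awk '$1 == "-" && $2 == "-" && $6 ~ /^was:/ {
        sub(/^was:/, "", $6)    # keep only the old device name
        print $3, $5, $6
    }'
}

# Usage:
#   vxdisk list | detached_disks
```

An empty result after step f) suggests that rootdisk02 is attached to a device again; the mirror resynchronization still has to be checked with vxtask list as described above.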