solaris

Monday, March 14, 2016

Sun Cluster Introduction

Sun Cluster 3.2 has the following features and limitations:

* Support for 2-16 nodes.

* Global device capability--devices can be shared across the cluster.

* Global file system--allows a file system to be accessed simultaneously by all cluster nodes.

* Tight integration with Solaris--the cluster framework services have been implemented in the kernel.

* Application agent support.

* Tight integration with zones.

* Each node must run the same revision and update of the Solaris OS.

* Two node clusters must have at least one quorum device.

* Each cluster needs at least two separate private networks. (Supported hardware, such as ce and bge may use tagged VLANs to run private and public networks on the same physical connection.)

* Each node's boot disk should include a 500M partition mounted at /globaldevices

* Attached storage must be multiply connected to the nodes.

* ZFS is a supported file system and volume manager. Veritas Volume Manager (VxVM) and Solaris Volume Manager (SVM) are also supported volume managers.

* Veritas multipathing (vxdmp) is not supported. Since vxdmp must be enabled for current VxVM versions, it must be used in conjunction with mpxio or another similar solution like EMC's Powerpath.

* SMF services can be integrated into the cluster, and all framework daemons are defined as SMF services

* PCI and SBus based systems cannot be mixed in the same cluster.

* Boot devices cannot be on a disk that is shared with other cluster nodes. Doing this may lead to a locked-up cluster due to data fencing.

The overall health of the cluster may be monitored using the cluster status or scstat -v commands. Other useful options include:

* scstat -g: Resource group status

* scstat -D: Device group status

* scstat -W: Heartbeat status

* scstat -i: IPMP status

* scstat -n: Node status

Failover applications (also known as "cluster-unaware" applications in the Sun Cluster documentation) are controlled by rgmd (the resource group manager daemon). Each application has a data service agent, which is the way that the cluster controls application startups, shutdowns, and monitoring. Each application is typically paired with an IP address, which will follow the application to the new node when a failover occurs.

"Scalable" applications are able to run on several nodes concurrently. The clustering software provides load balancing and makes a single service IP address available for outside entities to query the application.

"Cluster aware" applications take this one step further, and have cluster awareness programmed into the application. Oracle RAC is a good example of such an application.

All the nodes in the cluster may be shut down with cluster shutdown -y -g0. To boot a node outside of the cluster (for troubleshooting or recovery operations), run boot -x.

clsetup is a menu-based utility that can be used to perform a broad variety of configuration tasks, including configuration of resources and resource groups.

Cluster Configuration

The cluster's configuration information is stored in global files known as the "cluster configuration repository" (CCR). The cluster framework files in /etc/cluster/ccr should not be edited manually; they should be managed via the administrative commands.

The cluster show command displays the cluster configuration in a nicely-formatted report.

The CCR contains:

* Names of the cluster and the nodes.

* The configuration of the cluster transport.

* Device group configuration.

* Nodes that can master each device group.

* NAS device information (if relevant).

* Data service parameter values and callback method paths.

* Disk ID (DID) configuration.

* Cluster status.

Some commands to directly maintain the CCR are:

* ccradm: Allows (among other things) the checksums of files in /etc/cluster/ccr to be regenerated after manual edits. (Do NOT edit these files manually unless there is no other option. Even then, back up the original files.) For example: ccradm -i /etc/cluster/ccr/filename -o

* scgdevs: Brings new devices under cluster control after they have been discovered by devfsadm.

The scinstall and clsetup commands may

We have observed that the installation process may disrupt a previously installed NTP configuration (even though the installation notes promise that this will not happen). It may be worth using ntpq to verify that NTP is still working properly after a cluster installation.

Resource Groups

Resource groups are collections of resources, including data services. Examples of resources include disk sets, virtual IP addresses, or server processes like httpd.

Resource groups may either be failover or scalable resource groups. Failover resource groups allow groups of services to be started on a node together if the active node fails. Scalable resource groups run on several nodes at once.

The rgmd is the Resource Group Management Daemon. It is responsible for monitoring, stopping, and starting the resources within the different resource groups.

Some common resource types are:

* SUNW.LogicalHostname: Logical IP address associated with a failover service.

* SUNW.SharedAddress: Logical IP address shared between nodes running a scalable resource group.

* SUNW.HAStoragePlus: Manages global raw devices, global file systems, non-ZFS failover file systems, and failover ZFS zpools.

Resource groups also handle resource and resource group dependencies. Sun Cluster allows services to start or stop in a particular order. Dependencies are a particular type of resource property. The r_properties man page contains a list of resource properties and their meanings. The rg_properties man page has similar information for resource groups. In particular, the Resource_dependencies property specifies something on which the resource is dependent.

Some resource group cluster commands are:

* clrt register resource-type: Register a resource type.

* clrt register -n node1name,node2name resource-type: Register a resource type to specific nodes.

* clrt unregister resource-type: Unregister a resource type.

* clrt list -v: List all resource types and their associated node lists.

* clrt show resource-type: Display all information for a resource type.

* clrg create -n node1name,node2name rgname: Create a resource group.

* clrg delete rgname: Delete a resource group.

* clrg set -p property-name=value rgname: Set a property.

* clrg show -v rgname: Show resource group information.

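* clrs create -t HAStoragePlus -g rgname -p AffinityOn=true -p FilesystemMountPoints=/mountpoint resource-name: Create an HAStoragePlus resource for a file system in a resource group.

* clrg online -M rgname: Bring a resource group online and place it in a managed state.

* clrg switch -M -n nodename rgname: Switch a resource group to a specific node.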

* clrg offline rgname: Offline the resource group, but leave it in a managed state.

* clrg restart rgname: Restart a resource group.

* clrs disable resource-name: Disable a resource and its fault monitor.

* clrs enable resource-name: Re-enable a resource and its fault monitor.

* clrs clear -n nodename -f STOP_FAILED resource-name: Clear the STOP_FAILED error flag for a resource on a node.

* clrs unmonitor resource-name: Disable the fault monitor, but leave resource running.

* clrs monitor resource-name: Re-enable the fault monitor for a resource that is currently enabled.

* clrg suspend rgname: Preserves online status of group, but does not continue monitoring.

* clrg resume rgname: Resumes monitoring of a suspended group

* clrg status: List status of resource groups.

* clrs status -g rgname: List status of the resources in a resource group.
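The commands above are usually run in sequence when a failover service is built. The following is a minimal dry-run sketch of that sequence; it only prints the commands unless DRY_RUN=0 is set. The function names, the resource name suffix "-hasp", and all values passed in (web-rg, node1,node2, /export/web, web-lh) are hypothetical. The clrslh command is the logical hostname command described under "Public Network".

```shell
#!/bin/sh
# Dry-run sketch: prints each command instead of running it.
# Set DRY_RUN=0 to execute on a real cluster node (at your own risk).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# build_failover_rg RGNAME NODELIST MOUNTPOINT LOGICAL_HOSTNAME
# All four arguments are placeholders chosen for illustration.
build_failover_rg() {
    rg=$1; nodes=$2; mnt=$3; lh=$4
    run clrt register SUNW.HAStoragePlus          # register the resource type
    run clrg create -n "$nodes" "$rg"             # create the resource group
    run clrs create -t HAStoragePlus -g "$rg" -p AffinityOn=true \
        -p FilesystemMountPoints="$mnt" "${rg}-hasp"   # storage resource
    run clrslh create -g "$rg" "$lh"              # logical hostname resource
    run clrg online -M "$rg"                      # manage and bring online
}

build_failover_rg web-rg node1,node2 /export/web web-lh
```

Printing the commands first makes it easy to review the order (resource type, group, resources, then online) before anything touches the cluster configuration.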

Data Services

A data service agent is a set of components that allow a data service to be monitored and fail over within the cluster. The agent includes methods for starting, stopping, monitoring, or failing the data service. It also includes a registration information file allowing the CCR to store the information about these methods in the CCR. This information is encapsulated as a resource type.

The fault monitors for a data service place the daemons under the control of the process monitoring facility (rpc.pmfd), and probe the service itself using client commands.

Public Network

The public network uses pnmd (Public Network Management Daemon) and the IPMP in.mpathd daemon to monitor and control the public network addresses.

IPMP should be used to provide failovers for the public network paths. The health of the IPMP elements can be monitored with scstat -i

The clrslh and clrssa commands are used to configure logical hostnames and shared addresses, respectively.

* clrslh create -g rgname logical-hostname

Private Network

The "private," or "cluster transport" network is used to provide a heartbeat between the nodes so that they can determine which nodes are available. The cluster transport network is also used for traffic related to global devices.

While a 2-node cluster may use crossover cables to construct a private network, switches should be used for anything more than two nodes. (Ideally, separate switching equipment should be used for each path so that there is no single point of failure.)

The default base IP address is 172.16.0.0, and private networks are assigned subnets based on the results of the cluster setup.

Available network interfaces can be identified by using a combination of dladm show-dev and ifconfig.

Private networks should be installed and configured using the scinstall command during cluster configuration. Make sure that the interfaces in question are connected, but down and unplumbed before configuration. The clsetup command also has menu options to guide you through the private network setup process.

Alternatively, something like the following command string can be used to establish a private network:

* clintr add nodename1:ifname1

* clintr add nodename2:ifname2

* clintr add switchname

* clintr add nodename1:ifname1,switchname

* clintr add nodename2:ifname2,switchname

* clintr status

The health of the heartbeat networks can be checked with the scstat -W command. The physical paths may be checked with clintr status or cluster status -t intr.

Quorum

Sun Cluster uses a quorum voting system to prevent split-brain and cluster amnesia. The Sun Cluster documentation refers to "failure fencing" as the mechanism to prevent split-brain (where two nodes run the same service at the same time, leading to potential data corruption).

"Amnesia" occurs when a change is made to the cluster while a node is down, then that node attempts to bring up the cluster. This can result in the changes being forgotten, hence the use of the word "amnesia."

One result of this is that the last node to leave a cluster when it is shut down must be the first node to re-enter the cluster. Later in this section, we will discuss ways of circumventing this protection.

Quorum voting is defined by allowing each device one vote. A quorum device may be a cluster node, a specified external server running quorum software, or a disk or NAS device. A majority of all defined quorum votes is required in order to form a cluster. At least half of the quorum votes must be present in order for cluster services to remain in operation. (If a node cannot contact at least half of the quorum votes, it will panic. During the reboot, if a majority cannot be contacted, the boot process will be frozen. Nodes that are removed from the cluster due to a quorum problem also lose access to any shared file systems. This is called "data fencing" in the Sun Cluster documentation.)

* Quorum devices must be available to at least two nodes in the cluster.

* Disk quorum devices may also contain user data. (Note that if a ZFS disk is used as a quorum device, it should be brought into the zpool before being specified as a quorum device.)

* Sun recommends configuring n-1 quorum devices (the number of nodes minus 1). Two node clusters must contain at least one quorum device.

* Disk quorum devices must be specified using the DID names.

* Quorum disk devices should be at least as available as the storage underlying the cluster resource groups.
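The two voting rules described above (a majority of all configured votes to form a cluster, at least half to keep running) can be checked with simple integer arithmetic. This is a sketch only; the function names are made up for illustration, and the vote totals must be taken from scstat -q or clq status on a real cluster.

```shell
#!/bin/sh
# quorum_majority TOTAL: votes needed to form a cluster (strict majority).
quorum_majority() {
    echo $(( $1 / 2 + 1 ))
}

# can_form PRESENT TOTAL: succeeds if PRESENT votes are a majority of TOTAL.
can_form() {
    [ "$1" -ge "$(quorum_majority "$2")" ]
}

# can_stay_up PRESENT TOTAL: succeeds if at least half of TOTAL is present.
can_stay_up() {
    [ $(( $1 * 2 )) -ge "$2" ]
}

# Example: a two-node cluster with one quorum disk has 3 configured votes.
total=3
for present in 1 2 3; do
    if can_form "$present" "$total"; then form=yes; else form=no; fi
    if can_stay_up "$present" "$total"; then stay=yes; else stay=no; fi
    echo "present=$present total=$total form=$form stay=$stay"
done
```

The two rules only differ when the total is even: with 4 configured votes, 2 present votes are enough to stay in operation but not enough to form a cluster.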

Quorum status and configuration may be investigated using:

* scstat -q

* clq status

These commands report on the configured quorum votes, whether they are present, and how many are required for a majority.

Quorum devices can be manipulated through the following commands:

* clq add did-device-name

* clq remove did-device-name: (Only removes the device from the quorum configuration. No data on the device is affected.)

* clq enable did-device-name

* clq disable did-device-name: (Removes the quorum device from the total list of available quorum votes. This might be valuable if the device is down for maintenance.)

* clq reset: (Resets the configuration to the default.)

By default, doubly-connected disk quorum devices use SCSI-2 locking. Devices connected to more than two nodes use SCSI-3 locking. SCSI-3 offers persistent reservations, but SCSI-2 requires the use of emulation software. The emulation software uses a 64-bit reservation key written to a private area on the disk.

In either case, the cluster node that wins a race to the quorum device attempts to remove the keys of any node that it is unable to contact, which cuts that node off from the quorum device. As noted before, any group of nodes that cannot communicate with at least half of the quorum devices will panic, which prevents a cluster partition (split-brain).

In order to add nodes to a 2-node cluster, it may be necessary to change the default fencing with scdidadm -G prefer3 or cluster set -p global_fencing=prefer3, create a SCSI-3 quorum device with clq add, then remove the SCSI-2 quorum device with clq remove.

NetApp filers and systems running the scqsd daemon may also be selected as quorum devices. NetApp filers use SCSI-3 locking over the iSCSI protocol to perform their quorum functions.

The claccess deny-all command may be used to deny all other nodes access to the cluster. claccess allow nodename re-enables access for a node.

Purging Quorum Keys

CAUTION: Purging the keys from a quorum device may result in amnesia. It should only be done after careful diagnostics have been done to verify why the cluster is not coming up. This should never be done as long as the cluster is able to come up. It may need to be done if the last node to leave the cluster is unable to boot, leaving everyone else fenced out. In that case, boot one of the other nodes to single-user mode, identify the quorum device, and:

For SCSI 2 disk reservations, the relevant command is pgre, which is located in /usr/cluster/lib/sc:

* pgre -c pgre_inkeys -d /dev/did/rdsk/d#s2 (List the keys in the quorum device.)

* pgre -c pgre_scrub -d /dev/did/rdsk/d#s2 (Remove the keys from the quorum device.)

Similarly, for SCSI 3 disk reservations, the relevant command is scsi:

* scsi -c inkeys -d /dev/did/rdsk/d#s2 (List the keys in the quorum device.)

* scsi -c scrub -d /dev/did/rdsk/d#s2 (Remove the keys from the quorum device.)
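Because the scrub commands are destructive, it can help to build and review the read-only key listing command first. The sketch below only prints the command line for the given reservation type and DID instance; it never runs it. The function name and the DID number d4 are hypothetical, and it assumes that the scsi utility lives in /usr/cluster/lib/sc alongside pgre.

```shell
#!/bin/sh
# list_keys_cmd TYPE DID
# TYPE is scsi2 or scsi3; DID is a DID instance such as d4 (placeholder).
# Prints the command that lists the reservation keys on slice 2 of the device.
list_keys_cmd() {
    dev="/dev/did/rdsk/${2}s2"
    case $1 in
        scsi2) echo "/usr/cluster/lib/sc/pgre -c pgre_inkeys -d $dev" ;;
        scsi3) echo "/usr/cluster/lib/sc/scsi -c inkeys -d $dev" ;;
        *)     echo "unknown reservation type: $1" >&2; return 1 ;;
    esac
}

list_keys_cmd scsi2 d4
```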

Global Storage

Sun Cluster provides a unique global device name for every disk, CD, and tape drive in the cluster. The format of these global device names is /dev/did/device-type (for example, /dev/did/dsk/d2s3). Note that the DIDs are a global naming system, which is separate from the global device or global file system functionality.

DIDs may be used as components of SVM volumes, though VxVM does not recognize DID device names as components of VxVM volumes.

DID disk devices, CD-ROM drives, tape drives, SVM volumes, and VxVM volumes may be used as global devices. A global device is physically accessed by just one node at a time, but all other nodes may access the device by communicating across the global transport network.

The file systems in /global/.devices store the device files for global devices on each node. These are mounted on mount points of the form /global/.devices/node@nodeid, where nodeid is the identification number assigned to the node. These are visible on all nodes. Symbolic links may be set up to the contents of these file systems, if they are desired. Sun Cluster sets up some such links in the /dev/global directory.

Global file systems may be ufs, VxFS, or hsfs. To mount a file system as a global file system, add a "global" mount option to the file system's vfstab entry and remount. Alternatively, run a mount -o global... command.

(Note that all nodes in the cluster should have the same vfstab entry for all cluster file systems. This is true for both global and failover file systems, though ZFS file systems do not use the vfstab at all.)
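As a rough illustration of the vfstab convention described above, the sketch below reads vfstab-format lines and prints the mount points whose mount options (the seventh field) include "global". The device paths, diskset name, and mount points in the sample input are invented for the example, and the function name is hypothetical.

```shell
#!/bin/sh
# global_mounts: read vfstab-format lines on stdin and print the mount point
# (field 3) of every entry whose options field (field 7) contains "global".
global_mounts() {
    awk '$1 !~ /^#/ && NF >= 7 {
        n = split($7, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "global") { print $3; break }
    }'
}

# Sample input only; on a cluster node this would be: global_mounts < /etc/vfstab
cat <<'EOF' | global_mounts
#device to mount  device to fsck  mount point  FS type  fsck pass  mount at boot  options
/dev/md/webds/dsk/d100 /dev/md/webds/rdsk/d100 /global/web ufs 2 yes global,logging
/dev/md/webds/dsk/d200 /dev/md/webds/rdsk/d200 /export/app ufs 2 no logging
EOF
```

Running the same function against the vfstab of every node and comparing the results is one simple way to spot entries that differ between nodes.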

In the Sun Cluster documentation, global file systems are also known as "cluster file systems" or "proxy file systems."

Note that global file systems are different from failover file systems. The former are accessible from all nodes; the latter are only accessible from the active node.

Maintaining Devices

New devices need to be read into the cluster configuration as well as the OS. As usual, we should run something like devfsadm or drvconfig; disks to create the /devices and /dev links across the cluster. Then we use the scgdevs or scdidadm command to add more disk devices to the cluster configuration.

Some useful options for scdidadm are:

* scdidadm -l: Show local DIDs

* scdidadm -L: Show all cluster DIDs

* scdidadm -r: Rebuild DIDs

We should also clean up unused links from time to time with devfsadm -C and scdidadm -C

The status of device groups can be checked with scstat -D. Devices may be listed with cldev list -v. They can be switched to a different node via a cldg switch -n target-node dgname command.

Monitoring for devices can be enabled and disabled by using commands like:

* cldev monitor all

* cldev unmonitor d#

* cldev unmonitor -n nodename d#

* cldev status -s Unmonitored

Parameters may be set on device groups using the cldg set command, for example:

* cldg set -p failback=false dgname

A device group can be taken offline or placed online with:

* cldg offline dgname

* cldg online dgname

VxVM-Specific Issues

Since vxdmp cannot be disabled, we need to make sure that VxVM can only see one path to each disk. This is usually done by implementing mpxio or a third party product like Powerpath. The order of installation for such an environment would be:

1. Install Solaris and patches.

2. Install and configure multipathing software.

3. Install and configure Sun Cluster.

4. Install and configure VxVM

If VxVM disk groups are used by the cluster, all nodes attached to the shared storage must have VxVM installed. Each vxio number in /etc/name_to_major must also be the same on each node. This can be checked (and fixed, if necessary) with the clvxvm initialize command. (A reboot may be necessary if the /etc/name_to_major file is changed.)

The clvxvm encapsulate command should be used if the boot drive is encapsulated (and mirrored) by VxVM. That way the /global/.devices information is set up properly.

The clsetup "Device Groups" menu contains items to register a VxVM disk group, unregister a device group, or synchronize volume information for a disk group. We can also re-synchronize with the cldg sync dgname command.

Solaris Volume Manager-Specific Issues

Sun Cluster allows us to add metadb or partition information in the /dev/did format or in the usual format. In general:

* Use local format for boot drive mirroring in case we need to boot outside the cluster framework.

* Use cluster format for shared disksets because otherwise we will need to assume the same controller numbers on each node.

Configuration information is kept in the metadatabase replicas. At least three local replicas are required to boot a node; these should be put on their own partitions on the local disks. They should be spread across controllers and disks to the degree possible. Multiple replicas may be placed on each partition; they should be spread out so that if any one disk fails, there will still be at least three replicas left over, constituting at least half of the total local replicas.

When disks are added to a shared diskset, database replicas are automatically added. These will always be added to slice 7, where they need to remain. If a disk containing replicas is removed, the replicas must be removed using metadb.

If fewer than 50% of the replicas in a diskset are available, the diskset ceases to operate. If exactly 50% of the replicas are available, the diskset will continue to operate, but will not be able to be enabled or switched on another node.
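The 50% rule above can be expressed as a small classification function. This is only a sketch of the arithmetic; the function name and the three state labels are invented for illustration, and the replica counts would have to come from metadb output on a real system.

```shell
#!/bin/sh
# diskset_state AVAILABLE TOTAL
# Classifies a diskset by the share of its replicas that are available:
#   stopped      fewer than half are available, the diskset ceases to operate
#   no-takeover  exactly half are available, the diskset keeps running but
#                cannot be enabled or switched on another node
#   ok           more than half are available
diskset_state() {
    avail=$1; total=$2
    if [ $(( avail * 2 )) -lt "$total" ]; then
        echo "stopped"
    elif [ $(( avail * 2 )) -eq "$total" ]; then
        echo "no-takeover"
    else
        echo "ok"
    fi
}

# Example: four replicas in total, two of them lost with a failed array.
diskset_state 2 4
```

The exact-half case is the one that mediators, described next, are meant to resolve.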

A mediator can be assigned to a shared diskset. The mediator data is contained within a Solaris process on each node and counts for two votes in the diskset quorum voting.

Standard c#t#d#s# naming should be used when creating local metadb replicas, since it will make recovery easier if we need to boot the node outside of a cluster context. On the other hand, /dev/did/rdsk/d#s# naming should be used for shared disksets, since otherwise the paths will need to be identical on all nodes.

Creating a new shared diskset involves the following steps:

(Create an empty diskset.)

metaset -s set-name -a -h node1-name node2-name

(Create a mediator.)

metaset -s set-name -a -m node1-name node2-name

(Add disks to the diskset.)

metaset -s set-name -a /dev/did/rdsk/d# /dev/did/rdsk/d#

(Check that the diskset is present in the cluster configuration.)

cldev list -v

cldg status

cldg show set-name

ZFS-Specific Issues

ZFS is only available as a Sun Cluster failover file system, not as a global file system. No vfstab entries are required, since that information is contained in the zpools. Unlike VxVM, no synchronization commands are required; Sun Cluster takes care of the synchronization automatically.

Zones

Non-global zones may be treated as virtual nodes. Keep in mind that some services, such as NFS, will not run in non-global zones.

Services can be failed over between zones, even zones on the same server. Where possible, it is best to use whole-root rather than sparse-root zones. Certain types of failures within the non-global zone can cause a crash in the global zone.

Configuration of cluster resources and resource groups must be performed in the global zone. The rgmd runs in the global zone.

To specify a non-global zone as a node, use the form

nodename:zonename

or specify -n nodename -z zonename

Tuesday, October 27, 2015

Solaris Health check Script ...

#!/bin/sh
# This script performs a health check of the system.

#########################
# Name:         Health Check Script #
# Call:         Run manually or from a cron job #
# Purpose:      Perform a hardware audit and health check of the system #
# Created: Bikash Kumar #
#########################
#/usr/bin/clear
/usr/bin/echo "starting ..."

# The time and date
TIME=`date '+%Y%m%d'`
HOST=`hostname`
OUTPUT_DIR=/tmp/
OUTPUT=$OUTPUT_DIR$HOST.$TIME.out
# Create output file
/usr/bin/echo "Creating output file ..."
/usr/bin/touch $OUTPUT
/usr/bin/echo "output file created ..."

/usr/bin/echo "############################Health Check############################" >> $OUTPUT

/usr/bin/hostname >> $OUTPUT
/usr/bin/echo "Uptime:" >> $OUTPUT
/usr/bin/uptime >> $OUTPUT
## Checking network:
/usr/bin/echo "Checking Network ..."

/usr/bin/echo "##############################################################" >> $OUTPUT
/usr/bin/echo "Checking Network:" >> $OUTPUT
/usr/sbin/ifconfig -a >> $OUTPUT

## Checking disk space:
/usr/bin/echo "Checking Disk space ..."

/usr/bin/echo "##############################################################" >> $OUTPUT
/usr/bin/echo "Checking Disks Capacity:" >> $OUTPUT
/usr/bin/df -kh >> $OUTPUT

# Feed an empty line to format so that it lists the disks and exits.
/usr/bin/echo | format >> $OUTPUT

/usr/bin/echo "Checking Processes status ..."
/usr/bin/echo "##############################################################" >> $OUTPUT

/usr/bin/echo "Checking Processes:" >> $OUTPUT
/usr/bin/prstat -aR 1 2 >> $OUTPUT

/usr/bin/echo "##############################################################" >> $OUTPUT
/usr/bin/echo "Checking System messages ..."
/usr/bin/echo "Checking System messages:" >> $OUTPUT
/usr/bin/dmesg | tail -20 >> $OUTPUT

/usr/bin/echo "##############################################################" >> $OUTPUT
/usr/bin/echo "Reporting virtual memory statistics ..."
/usr/bin/echo "Reporting virtual memory statistics:" >> $OUTPUT
/usr/bin/vmstat 3 4 >> $OUTPUT
/usr/bin/echo "##############################################################" >> $OUTPUT

/usr/bin/echo "Reporting I/O statistics ..."
/usr/bin/echo "Reporting I/O statistics:" >> $OUTPUT
#/usr/bin/echo "Reporting I/O ..."
/usr/bin/echo "Reporting I/O:" >> $OUTPUT
/usr/bin/iostat 4 3 >> $OUTPUT
/usr/bin/iostat -xPnce >> $OUTPUT
/usr/bin/echo "iostat -En:" >> $OUTPUT
/usr/bin/iostat -En | grep -i soft >> $OUTPUT
/usr/bin/echo "##############################################################" >> $OUTPUT
/usr/bin/echo "Reporting System Activities:" >> $OUTPUT
/usr/bin/sar 5 3 >> $OUTPUT
#/usr/bin/echo " sar -d 3 5 Report activity for each block device" >> $OUTPUT
#/usr/bin/echo " device %busy avque r+w/s blks/s avwait avserv" >> $OUTPUT
#/usr/bin/sar -d 3 5 >> $OUTPUT
/usr/bin/echo "kstat -m cpu_stat | egrep 'user |idle |kernel |wait ' (for Solaris 9 and earlier versions, %wio won't be reported as 0)" >> $OUTPUT
# Take two samples 10 seconds apart so that the counters can be compared.
/usr/bin/date >> $OUTPUT
/usr/bin/kstat -m cpu_stat | egrep 'user |idle |kernel |wait ' >> $OUTPUT
/usr/bin/sleep 10
/usr/bin/date >> $OUTPUT
/usr/bin/kstat -m cpu_stat | egrep 'user |idle |kernel |wait ' >> $OUTPUT

/usr/bin/echo "##############################################################" >> $OUTPUT
/usr/bin/echo "Reporting inter-process communication facilities status ..."
/usr/bin/echo "Reporting inter-process communication facilities status (active semaphores):" >> $OUTPUT
/usr/bin/ipcs -s >> $OUTPUT
#/usr/bin/ipcs -mb >> $OUTPUT

/usr/bin/echo "##############################################################" >> $OUTPUT
/usr/bin/echo "Checking Swap Space ..."
/usr/bin/echo "Checking Swap Space:" >> $OUTPUT
/usr/bin/echo "swap -l:" >> $OUTPUT
/usr/sbin/swap -l >> $OUTPUT
/usr/bin/echo "_______________________________________________" >> $OUTPUT
/usr/bin/echo "swap -s:" >> $OUTPUT
# Field 9 of "swap -s" is the used amount and field 11 the available amount.
/usr/sbin/swap -s | nawk -F" " '{print "used "$9 " available " $11 " Free Memory: "substr(substr($11,1,8)/(substr($11,1,8)+substr($9,1,8))*100,1,5)"%" }' >> $OUTPUT

/usr/bin/echo "##############################################################" >> $OUTPUT
/usr/bin/echo "Reporting status of metadevices and hot spare pools:" >> $OUTPUT
/usr/sbin/metadb >> $OUTPUT
/usr/bin/echo "Reporting status of metadevices and hot spare pools ..."
/usr/sbin/metastat >> $OUTPUT

/usr/bin/echo "#################################################################################" >> $OUTPUT
/usr/bin/echo "" >> $OUTPUT
/usr/bin/echo "##############################################################" >> $OUTPUT
/usr/bin/echo "CPU / MEM Log and Overall Log Records:" >> $OUTPUT
# /var/adm/me* matches the messages files (messages, messages.0, ...).
/usr/bin/grep -i afsr /var/adm/me* >> $OUTPUT
/usr/bin/grep -i afar /var/adm/me* >> $OUTPUT
/usr/bin/grep -i error /var/adm/me* >> $OUTPUT
/usr/bin/grep -i fail /var/adm/me* >> $OUTPUT
/usr/bin/grep -i panic /var/adm/me* >> $OUTPUT
/usr/bin/grep -i scsi /var/adm/me* >> $OUTPUT

/usr/bin/echo "##############################################################" >> $OUTPUT
/usr/bin/echo "FCAL Disk and Loop Connection Checkup:" >> $OUTPUT
/usr/sbin/luxadm probe >> $OUTPUT
/usr/sbin/luxadm -e port >> $OUTPUT
/usr/bin/echo "##############################################################" >> $OUTPUT
/usr/bin/echo "Reporting Crash Files:" >> $OUTPUT
/usr/bin/ls -lh /var/crash/`hostname`/ >> $OUTPUT
/usr/bin/echo "##############################################################" >> $OUTPUT

/usr/bin/echo "Reporting Hardware issues:" >> $OUTPUT
/usr/platform/`uname -i`/sbin/prtdiag -v >> $OUTPUT
/usr/bin/echo "#################################################################################" >> $OUTPUT

/usr/bin/echo "script has finished ..."

SSH takes long time in Solaris System

by samerodeh

Logging in to a Solaris 11 machine over ssh can take noticeably longer than usual to produce a prompt. The delay is usually caused by DNS lookups over the network. The details below show how I fixed the issue.

root@cms-cluster1:~# uname -a
SunOS cms-cluster1 5.11 11.1 sun4v sparc sun4v
root@cms-cluster1:~#
root@cms-cluster1:~# ssh -V
Sun_SSH_2.2, SSH protocols 1.5/2.0, OpenSSL 0x100000bf
root@cms-cluster1:~#
root@cms-cluster1:~# echo "GSSAPIAuthentication no" >> /etc/ssh/sshd_config
root@cms-cluster1:~# echo "LookupClientHostnames no" >> /etc/ssh/sshd_config
root@cms-cluster1:~#
root@cms-cluster1:~# svcadm restart ssh
root@cms-cluster1:~#
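Appending with echo adds a duplicate line every time the fix is repeated and does not remove an older, conflicting setting. A possible alternative is sketched below: a small helper that removes any existing line for the keyword and then appends the new value. The function name set_option is made up, the example works on a temporary file rather than the real /etc/ssh/sshd_config, and it only recognizes lines that start with the keyword followed by a single space.

```shell
#!/bin/sh
# set_option FILE KEY VALUE
# Remove any existing "KEY ..." line from FILE, then append "KEY VALUE".
set_option() {
    file=$1; key=$2; value=$3
    tmp="$file.new"
    # grep -v exits non-zero when it keeps no lines; that is not an error here.
    grep -v "^$key " "$file" > "$tmp" || :
    echo "$key $value" >> "$tmp"
    # Overwrite in place so the original file's owner and mode are kept.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Demonstration on a scratch copy; the starting content is invented.
conf=$(mktemp)
printf 'Port 22\nLookupClientHostnames yes\n' > "$conf"
set_option "$conf" LookupClientHostnames no
set_option "$conf" GSSAPIAuthentication no
set_option "$conf" GSSAPIAuthentication no   # repeating it adds no duplicate
cat "$conf"
rm -f "$conf"
```

When used against the real configuration file, the ssh service still has to be restarted with svcadm restart ssh, as in the transcript above.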

Monday, December 22, 2014

How to reset password on Solaris’ ALOM

Connect to the serial management port.
Unplug power to the server, then connect power.
=====================================================

ALOM POST 1.0

Dual Port Memory Test, PASSED.

TTY External - Internal Loopback Test

TTY External - Internal Loopback Test, PASSED.

TTYC - Internal Loopback Test

TTYC - Internal Loopback Test, PASSED.

TTYD - Internal Loopback Test

TTYD - Internal Loopback Test, PASSED.

Memory Data Lines Test
Memory Data Lines Test, PASSED.

Memory Address Lines Test
Slide address bits to test open address lines
Test for shorted address lines
Memory Address Lines Test, PASSED.

============================================================
Press the Escape key during ALOM boot right after connecting power.
============================================================

The ALOM boot escape menu is shown
Boot Sector FLASH CRC Test
Boot Sector FLASH CRC Test, PASSED.

Return to Boot Monitor for Handshake ESC keypress detected.

ALOM <ESC> Menu

e - Erase ALOM NVRAM.
m - Run POST Menu.
R - Reset ALOM.
r - Return to bootmon.
Your selection: e {Choose e to erase NVRAM}
ALOM NVRAM erased.

ALOM <ESC> Menu

e - Erase ALOM NVRAM.
m - Run POST Menu.
R - Reset ALOM.
r - Return to bootmon.
Your selection: r {Choose r to return to bootmon}

ALOM POST 1.0
Status = 00007fff

Returned from Boot Monitor and Handshake

Instruction CACHE Test
DISABLE the I-CACHE
ENABLE the I-CACHE
Verify I-CACHE Performance Increase
Instruction CACHE Test, PASSED.

Memory Cells Test
Counting UP: Write data: 00000000
Counting DOWN: Read - Verify - Write data: ffffffff
Counting UP: Read - Verify - Write data: 55aa33cc
Memory Cells Test, PASSED.

Data CACHE Test
Verify D-CACHE Performance Increase
D-CACHE Performance Increase I-CACHE Disabled
D-CACHE Performance Increase I-CACHE Enabled
Verify D-CACHE Memory
Data CACHE Test, PASSED.

Main Sectors FLASH CRC Test
Main Sectors FLASH CRC Test, PASSED.

Loading the runtime image... VxWorks running. Starting Advanced Lights Out Manager CMT v1.3.8 Copyright 2007 Sun Microsystems, Inc. All rights reserved. Use is subject to license terms. Current mode: NORMAL
Attaching network interface lo0... done.
Attaching network interface motfec0.... done. Booting from Segment 1

Sun(tm) Advanced Lights Out Manager CMT v1.3.8

SC Alert: SC System booted.

Full VxDiag Tests

BASIC TOD TEST
Read the TOD Clock:
SC Alert: Preceding SC reset due to watchdog
SAT JAN 01 00:02:15 2000
Wait, 1 - 3 seconds
Read the TOD Clock: SAT JAN 01 00:02:17 2000
BASIC TOD TEST, PASSED

ETHERNET CPU LOOPBACK TEST
50 BYTE PACKET - a 0 in field of 1's.
50 BYTE PACKET - a 1 in field of 0's.
900 BYTE PACKET - pseudo-random data.
ETHERNET CPU LOOPBACK TEST, PASSED

Full VxDiag Tests - PASSED

Status summary - Status = 7FFF

VxDiag - - PASSED
POST - - PASSED
LOOPBACK - - PASSED

I2C - - PASSED
EPROM - - PASSED
FRU PROM - - PASSED

ETHERNET - - PASSED
MAIN CRC - - PASSED
BOOT CRC - - PASSED

TTYD - - PASSED
TTYC - - PASSED
MEMORY - - PASSED
MPC885 - - PASSED

sc> shownetwork
SC network configuration is:
IP Address: 10.0.14.189
Gateway address: 0.0.0.0
Netmask: 255.0.0.0
Ethernet address: 00:21:28:4f:a3:1d
=====================================================
sc> setupsc {Set ALOM parameters by running the command setupsc}
Warning: the setupsc command is being ignored because the password for admin is not set.
Setting password for admin.
New password: *********

Re-enter new password: *********

Setup New IP:
=======================================================

sc> setsc netsc_ipnetmask 255.255.255.0
sc> setsc netsc_ipaddr 169.144.193.198
sc> setsc netsc_ipgateway 169.144.193.1
sc> setsc if_network true
sc> showsc
Advanced Lights Out Manager CMT v1.3.8

parameter value
--------- -----
if_network true
if_connection ssh
if_emailalerts false
netsc_dhcp true
netsc_ipaddr 169.144.193.198
netsc_ipnetmask 255.255.255.0
netsc_ipgateway 169.144.193.1
mgt_mailhost
mgt_mailalert
sc_customerinfo
sc_escapechars #.
sc_powerondelay false
sc_powerstatememory false
sc_clipasswdecho true
sc_cliprompt sc
sc_clitimeout 0
sc_clieventlevel 2
--pause-- Press 'q' to quit, any other key to continue

sc_backupuserdata true
diag_trigger power-on-reset error-reset
diag_verbosity normal
diag_level min
diag_mode normal
sys_autorunonerror false
sys_autorestart reset
sys_eventlevel 2
ser_baudrate 9600
ser_parity none
ser_stopbits 1
ser_data 8
netsc_enetaddr 00:21:28:4f:a3:1d
sys_enetaddr 00:21:28:4f:a3:14

sc> console
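The setsc sequence in this session can be generated from three values. What follows is a minimal sketch, not part of ALOM itself: `alom_net_cmds` is a hypothetical helper name, and it only prints text to paste at the sc> prompt. The showsc output above still reports netsc_dhcp as true, so the sketch also prints `setsc netsc_dhcp false` on the assumption that a static address is what is wanted; drop that line if DHCP is intended.

```shell
# Hypothetical helper: print the ALOM setsc commands for a static SC address.
# It never contacts the SC; paste the output at the sc> prompt.
alom_net_cmds() (
    ip=$1 mask=$2 gw=$3
    if [ -z "$ip" ] || [ -z "$mask" ] || [ -z "$gw" ]; then
        echo "usage: alom_net_cmds IP NETMASK GATEWAY" >&2
        exit 1
    fi
    echo "setsc if_network true"
    echo "setsc netsc_dhcp false"   # assumption: static addressing is wanted
    echo "setsc netsc_ipaddr $ip"
    echo "setsc netsc_ipnetmask $mask"
    echo "setsc netsc_ipgateway $gw"
)
```

Example: alom_net_cmds 169.144.193.198 255.255.255.0 169.144.193.1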

Tuesday, August 12, 2014

Display Default LDOM services

Purpose	Command
Check ldom manager (ldmd)	# svcs ldmd
Check vntsd is running	# svcs vntsd
Check Default Services are running	# ldm list-services primary
Check ldm software	# ldm -V
check ldoms manager package in Solaris 11	# pkg info ldomsmanager

Creating Default LDOM services

Purpose	Command
add virtual console concentrator (vcc)	# ldm add-vcc port-range=5000-5100 primary-vcc0 primary
add virtual network switch (vsw)	# ldm add-vsw net-dev=net0 primary-vsw0 primary
add virtual disk server (vds)	# ldm add-vds primary-vds0 primary
add virtual storage device to virtual disk service (ZFS volume used as the backend for a Guest domain disk)	# zfs create -V 5G rpool/ldom01_disk01 # ldm add-vdsdev /dev/zvol/dsk/rpool/ldom01_disk01 ldom01_disk01@primary-vds0
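The rows above can be strung together in a small wrapper. This is a sketch under assumptions: the function names `run`, `create_default_services` and `export_zvol` are hypothetical, the rpool pool name and service names are the ones used in the table, and commands are only printed unless DRY_RUN is set to 0.

```shell
# Set DRY_RUN=0 to execute for real; by default commands are only printed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create vcc, vsw and vds on the primary domain.
# $1 is the physical network device for the vsw (defaults to net0).
create_default_services() {
    run ldm add-vcc port-range=5000-5100 primary-vcc0 primary &&
    run ldm add-vsw net-dev="${1:-net0}" primary-vsw0 primary &&
    run ldm add-vds primary-vds0 primary
}

# Create a ZFS volume and export it through primary-vds0.
# $1 is the volume name, $2 is its size (for example 5G).
export_zvol() {
    run zfs create -V "$2" "rpool/$1" &&
    run ldm add-vdsdev "/dev/zvol/dsk/rpool/$1" "$1@primary-vds0"
}
```

Example: create_default_services net0, followed by export_zvol ldom01_disk01 5G. Review the printed commands before rerunning with DRY_RUN=0.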

Removing Default LDOM services

Purpose	Command
remove virtual console concentrator (vcc)	# ldm remove-vcc primary-vcc0
remove virtual network switch (vsw)	# ldm remove-vsw primary-vsw0
remove virtual disk server (vds)	# ldm remove-vds primary-vds0
remove virtual storage device from virtual disk service	# ldm remove-vdsdev dvd-iso@primary-vds0

Start Default Services

Purpose	Command
start ldom manager	# svcadm [enable|restart] ldmd
start vntsd	# svcadm [enable|restart] vntsd

Basic Guest LDOM Administration

Purpose	Command
list resources bound to a Guest Domain	# ldm list-bindings ldom01
identify the current domain role (Control, Guest, Service or Root)	# virtinfo -a
how to check status of I/O device	# ldm list-io
how to check logical domain (ldom) status	# ldm list-domain -o domain ldom01
list the status of all the guest domains on the system	# ldm list
how to manually list the LDOM config on a system	# ldm list-bindings [ldom_name]
list current LDOM configuration in Solaris	# ldm list-spconfig
Check CPU activation	# ldm list-permits
Check Autoreplacement policy for CPU	# svccfg -s ldmd listprop ldmd/autoreplacement_policy_cpu

stop/start/break/unbind/bind

Purpose	Command
issue send break	# telnet localhost 5000 telnet> send brk
stop Guest Domain	# ldm stop ldom01
start Guest Domain	# ldm start ldom01
unbind Guest Domain	# ldm unbind ldom01
bind Guest Domain	# ldm bind ldom01

Add/Create/Assign

Purpose	Command
Add Guest Domain	# ldm add-domain ldom01
assign cpu threads to Guest Domain	# ldm add-vcpu 6 ldom01
assign vcpu units of cores	# ldm add-core, ldm set-core [number] [ldom]
assign memory to Guest Domain	# ldm add-memory 4G ldom01
add vnet device to Guest Domain	# ldm add-vnet vnet1 primary-vsw0 ldom01
assign disk resource to Guest Domain	# ldm add-vdisk ldom01-disk01 ldom01_disk01@primary-vds0 ldom01
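These rows are usually run in order and then followed by the bind and start commands from the earlier table. The sketch below prints that sequence for review; `guest_plan` is a hypothetical helper name, and the vnet1, primary-vsw0 and primary-vds0 names are simply the ones used in the tables. The volume argument is assumed to be the name that was exported with add-vdsdev.

```shell
# Hypothetical helper: print the ldm commands that build and start a guest.
# Arguments: domain name, vcpu count, memory size, vds volume name.
guest_plan() (
    dom=$1 vcpus=$2 mem=$3 vol=$4
    if [ $# -ne 4 ]; then
        echo "usage: guest_plan DOMAIN VCPUS MEMORY VOLUME" >&2
        exit 1
    fi
    cat <<EOF
ldm add-domain $dom
ldm add-vcpu $vcpus $dom
ldm add-memory $mem $dom
ldm add-vnet vnet1 primary-vsw0 $dom
ldm add-vdisk ${dom}-disk01 ${vol}@primary-vds0 $dom
ldm bind $dom
ldm start $dom
EOF
)
```

Example: guest_plan ldom01 6 4G ldom01_disk01 prints seven commands that can be checked by eye and then piped to a shell on the control domain.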

Remove/Delete

Purpose	Command
Remove a Guest Domain	# ldm remove-domain ldom01
Remove disk resource from Guest Domain	# ldm remove-vdisk vdisk01 ldom01
Remove virtual network device from a Guest Domain	# ldm remove-vnet vnet1 ldom01
Remove CPU threads from a Guest Domain	# ldm remove-vcpu 8 ldom01
Remove virtual cpu units in cores from a Guest Domain	# ldm remove-core 2 ldom01
Remove memory from a Guest Domain	# ldm remove-memory 8G ldom01

Save LDOM Config

Purpose	Command
save ldom configuration to the SP	# ldm add-spconfig newconfig
backup of existing configuration from the control domain	# ldm list-constraints -x > /var/tmp/guest-domain-name.xml # ldm list-bindings > /var/tmp/full-bindings # ldm ls -l > /var/tmp/guest-domain-list.xml
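The three backup commands can be collected into one function that writes to a single directory. This is a sketch only: `backup_ldom_config`, the default directory and the output file names are assumptions chosen for illustration, and the LDM variable exists so that the function can be exercised with a stub on a machine without ldm.

```shell
# LDM can be overridden (for example with a stub) when testing off-box.
LDM=${LDM:-ldm}

# Hypothetical helper: save the ldm listings from the table into one directory.
# $1 is the target directory (defaults to /var/tmp/ldom-backup).
backup_ldom_config() (
    dir=${1:-/var/tmp/ldom-backup}
    mkdir -p "$dir" || exit 1
    "$LDM" list-constraints -x > "$dir/constraints.xml" || exit 1
    "$LDM" list-bindings > "$dir/full-bindings" || exit 1
    "$LDM" ls -l > "$dir/domain-list.txt" || exit 1
    # Print the directory so callers can archive or copy it.
    echo "$dir"
)
```

Example: backup_ldom_config /var/tmp/ldom-backup-before-patching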

Miscellaneous Commands

Purpose	Command
identify physical resources bindings	# ldm list-constraints
login to the console of a Guest Domain	# telnet localhost 5001
Enable/Disable console logging function for a Guest Domain	# ldm set-vcons log=[off|on] [dom-name]
Display current console settings of a Guest Domain	# ldm list -o console ldom01
list all LDOM config from SP with timestamp	-> show /HOST/domain/configs date_created -t
list current LDOM config from SP	-> show /HOST/bootmode config -t
Generate crashdump from SP	-> set /HOST/send_break_action=dumpcore
Crash a guest domain from the control domain	# ldm panic-domain ldom01
to check failed cpu or memory components from Control Domain	# ldm list-domain -l -S

Monday, June 16, 2014

Brocade: Configuring IP addresses

The Brocade DCX 8510-8 requires three IP addresses, which are configured using the ipAddrSet command: one for each CP blade (CP0 and CP1) and one for the chassis management IP (shown as SWITCH in the ipAddrShow output).

NOTE: The default IP addresses and host names for the Brocade DCX 8510-8 are:
– 10.77.77.75 / CP0 (the CP blade in slot 6 at the time of configuration)
– 10.77.77.74 / CP1 (the CP blade in slot 7 at the time of configuration)

ATTENTION! Resetting an IP address while the Brocade DCX 8510-8 has active IP traffic or has management and monitoring tools running, such as DCFM, Fabric Watch, and SNMP, can cause traffic to be interrupted or stopped.

Complete the following steps to set the IP addresses for the Brocade DCX 8510-8.

Set up the Brocade DCX 8510-8 IP address by entering the ipaddrset -chassis command:

swDir:admin> ipAddrSet -chassis

Enter the information at the prompts. Specify the -chassis IP address. The -sw 0 IP address is not valid on this chassis.

NOTE: The addresses 10.0.0.0 through 10.0.0.255 are reserved and used internally by the Brocade DCX 8510-8. External IPs must not use these addresses.

Set up the CP0 IP address by entering the ipaddrset -cp 0 command:

swDir:admin> ipAddrSet -cp 0

Enter the information at the prompts.

Set up the CP1 IP address by entering the ipaddrset -cp 1 command:

swDir:admin> ipAddrSet -cp 1

Enter the information at the prompts.

This is a sample IP configuration:

swDir:admin> ipaddrset -chassis

Ethernet IP Address [0.0.0.0]: 192.168.1.1

Ethernet Subnetmask [0.0.0.0]: 255.255.255.0

Fibre Channel IP Address [0.0.0.0]:

Fibre Channel Subnetmask [0.0.0.0]:

Issuing gratuitous ARP...Done.

Committing configuration...Done.

swDir:admin> ipaddrset -cp 0

Host Name [cp0]:

Ethernet IP Address [10.77.77.75]: 192.168.1.2

Ethernet Subnetmask [0.0.0.0]: 255.255.255.0

Gateway IP Address [0.0.0.0]: 192.168.1.254

IP address is being changed...Done.

Committing configuration...Done.

swDir:admin> ipaddrset -cp 1

Host Name [cp1]:

Ethernet IP Address [10.77.77.74]: 192.168.1.3

Ethernet Subnetmask [0.0.0.0]: 255.255.255.0

Gateway IP Address [0.0.0.0]: 192.168.1.254

IP address of remote CP is being changed...Done.

Committing configuration...Done.
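Before entering addresses at the prompts, it can help to check them against the reserved range mentioned in the NOTE earlier (10.0.0.0 through 10.0.0.255). The following is a minimal sketch to run on an admin workstation, not on the switch; `is_reserved_dcx_ip` is a hypothetical helper name, and it does a plain prefix match without validating that the argument is a well-formed address.

```shell
# Hypothetical pre-check: succeed (exit status 0) when the address falls in
# 10.0.0.0-10.0.0.255, which the chassis reserves for internal use.
is_reserved_dcx_ip() {
    case $1 in
        10.0.0.*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

Example: is_reserved_dcx_ip 192.168.1.2 || echo "address is usable"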

Wednesday, June 11, 2014

Linux Kickstart POST Script to Bond Two NICs

=========================================

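# Collect the eth* interfaces that currently report an active link.
COUNTER=0
for i in $(ifconfig -a | sed 's/[ \t].*//;/^\(lo\|\)$/d' | grep eth)
do
        # ethtool prints "Link detected: yes"; split on ": " to drop the leading space.
        STATUS=$(ethtool "$i" | awk -F': ' '/Link detected/ {print $2}')

        if [ "$STATUS" = "yes" ]; then
                COUNTER=$((COUNTER+1))
                NIC[$COUNTER]="$i"
        fi
done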

# IPADDRESS and NETMASK must already be set earlier in the %post section.
cat << EOF1 >/etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
IPADDR=$IPADDRESS
NETMASK=$NETMASK
USERCTL=no
BONDING_OPTS="mode=1 miimon=100 primary=${NIC[1]}"
EOF1

cat << EOF2 >>/etc/sysconfig/network-scripts/ifcfg-${NIC[1]}
MASTER=bond0
SLAVE=yes
EOF2

cat << EOF3 >>/etc/sysconfig/network-scripts/ifcfg-${NIC[2]}
MASTER=bond0
SLAVE=yes
EOF3

echo 'alias bond0 bonding' >> /etc/modprobe.conf
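
After the first boot, the bond can be checked by reading /proc/net/bonding/bond0. The sketch below summarises that file; `bond_summary` is a hypothetical helper name, and it assumes the usual "Bonding Mode", "Currently Active Slave" and "Slave Interface" lines written by the bonding driver. With mode=1 the mode line should mention active-backup.

```shell
# Hypothetical post-boot check: summarise a bonding status file.
# Reads text in the /proc/net/bonding/bond0 format on stdin.
bond_summary() {
    awk -F': ' '
        /^Bonding Mode/           { mode = $2 }
        /^Currently Active Slave/ { active = $2 }
        /^Slave Interface/        { n++ }
        END { printf "mode=%s active=%s slaves=%d\n", mode, active, n }
    '
}
```

Example: bond_summary < /proc/net/bonding/bond0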