Support Singapore

Support Singapore YOG 2010

Tuesday, March 12, 2013

SOLARIS and DISM

Recently i ran into lot of Database Performance issues so i though of writing it here so that anyone who faces slowness in Solaris can be benefitted.

Solaris : 10
Oracle:11.2.03.
16 Core, 2 socket and 128 vCPU
Oracle T4-2 Rac One Database:
120GB RAM

Size 800GB and growing at 5GB per day.

We were running the DB with 20GB Memory without any issues for the almost 4 months and suddenly the ORA-04031 started appearing on shared Pool...after some testing we decided to increase the memory as we had 120GB Memory. THe database was configured with AMM(Memory_Max and Memory_Max_Target) parameter. We increase the Value to 30GB and set a default value of 10GB for shared pool. Here is where everything happened. The database startup took 15 minutes to start the instance and was opened. By the time the DB was started the server begin to Crawl. The DB server was absolutely slow and we had to revert to 20GB again and bounced the DB(basically downtime and pressure). The DB started fine and it was running...
Here is where all troubleshooting started..After checking lot of documents in metalink and contacting Oracle Support , The database was running on DISM(Dynamic Intimate shared Memory) which gives the felixibility to the application to lock the memory rather than OS and in this DB will lock and release memory. DISM also lock the equivalent amount of RAM on the disk swap as well. THis is where all went wrong becuase we had 30 GB of diskswap. For eg. if you want to allocate 40GB of memory to DB you need to have 40GB of disk swap allocated anything less than the Memory assingned will lead to server hanging or slow to respond as it has exhausted the disk swap.

DISM:
Oracle 40GB Memory=40GM Memory +40GB Disk swap=Vitrual swap

so the recommendation was to change the DB behavior from DISM to ISM

Unset the Memory_Max and Memory_Max_Target.

set the SGA_MAX_SIZE and SGA_TARGET to a same value. When we set this value Shared memory is allocated and locks the Memory during the startup and it doesnt care about the disk swap and the DB starts in few seconds....

If you have DISM enable make sure you have disk swap equivalent to memory allocation

ps -ef | grep dism

will display if the DISM process running..

ps -ef | grep oracle


Get the process id and check if the process is using dism or ism

pmap -xs  
will display lot of things and look for
ism shmid.......

In solaris the virtual disk usage is really hard to interpret

If using DISM

swap -l will give physically allocated size

vmstat 5 10 will display the disk space usage especially the swap column.....

in ISM

VMSTAT 5 10

now the swap columns means disk swap free + memory free...

use echo ::memstat | mdb -k
to find the accurate usage of the Memory by oracle... which will be displayed as ANON.













Saturday, March 2, 2013

ZFS FILE DATA and Performance issues in Solaris 10

I wanted to start blogging just becuase of the sheer number of issues i face on production systems and thought of sharing some real tough issues.

We have quite a lot Sun T4-2 boxes for our Oracle Databases running Oracle 11.2.0.3. This server is quite a powerful server with 2 Socket(16Core) with 120GB of Memory. Off late we started to encounter ora-04031 in our production systems.We saw rapid growth in the shared pool and subsequently hitting memory issues.

Initially when the system went live we allocated  20GB of Memory to the DB with AMM Enabled. After discussion we decided to increase the Oracle DB memory to 40GB . This is where all started to happen, When we tried to bounce to database it took More than usual timing to open the DB.The Server became really slow and the database was crawling.

prstat -t shows Oracle running at 23GB and OS as 2%  but the overall system has 120GB of memory .After couple of hours after discussing with the management we decide to revert to 20GB. The system was back to normal but when it grew beyond including all virtual memories it begin to hang again....Since this was RAC system we decided to shutdown this Instance to investigate. We got a clue that this Server begins to crawl when the memory begins to go beyond 20GB....
Severity was raised with Oracle Solaris Team but they asked us to check with Oracle DB Team . But i was very clear it was something to do with some setup.

After a day of scratching my head it was found out that ZFS FILE DATA was consuming 80GB of Memory which is the maximum value(3/4 of Physcial RAM) that can be allocated to ZFS ARC SIZEwhich is the default value. Then we have decided to cap a limit for the ZFS FILE DATA with the parameter ZFS_ARC_MAX in the etc/system .The reason for why ZFS FILE DATA went to 75% is unknown :(


It was really a nightmare to see a FIle system cache occupying 80GB.

Prstat -t --will never accurately give the usage

vmstat 5 10 should give more accurate free  memory usage

mdb -k then followed by ::memstat will give clear picture on the memory usage

NIce day.



Monday, October 1, 2012

Oracle Rac One

I felt something cool about Rac one node database and i am sharing my experience here.

We have a 2 node Sun Sparc cluster with 11.2.0.3 with one active and other one being passive, Suddenly one fine day we saw the Instance switched to node 2 and my colleague was fighting hard to find the reason for the  node switch and raised a SR with oracle and then we later found the /oracle installed direcotory was completetly exhausted and node swithced...


But this is really a cool feature of Oracle 11G.



Monday, December 28, 2009

Listener Dynamic registration and Service name

There was a discussion in oracle forums about the SIDvs Service name...

After going through the thread i just realised that

in 10g on a Dynamic registered instance we will be able to connect only using SERVICE_NAME and not using the SID_NAME..



C:\WINDOWS\system32>lsnrctl status

LSNRCTL for 32-bit Windows: Version 10.2.0.4.0 - Production on 29-DEC-2009 10:43
:35

Copyright (c) 1991, 2007, Oracle. All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=pc-44.)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for 32-bit Windows: Version 10.2.0.4.0 - Produ
ction
Start Date 29-DEC-2009 10:14:38
Uptime 0 days 0 hr. 28 min. 57 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File D:\ORACLE\PRODUCT\10.2.0\DB_1\network\admin\listener.o
ra
Listener Log File D:\ORACLE\PRODUCT\10.2.0\DB_1\network\log\listener.log

Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=pc-44)(PORT=1521)))
Services Summary...
Service "PLSExtProc" has 1 instance(s).
Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully

C:\WINDOWS\system32>tnsping test

TNS Ping Utility for 32-bit Windows: Version 10.2.0.4.0 - Production on 29-DEC-2
009 10:43:53

Copyright (c) 1997, 2007, Oracle. All rights reserved.

Used parameter files:
D:\ORACLE\PRODUCT\10.2.0\DB_1\network\admin\sqlnet.ora


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (SOURCE_ROUTE=yes) (ADDRESS
=(PROTOCOL=tcp)(HOST=pc-44)(PORT=1630)) (ADDRESS=(PROTOCOL=TCP)(HOST = PC-44)(PO
RT = 1522))) (CONNECT_DATA = (SID = TEST)))
TNS-12541: TNS:no listener

C:\WINDOWS\system32>tnsping test

TNS Ping Utility for 32-bit Windows: Version 10.2.0.4.0 - Production on 29-DEC-2
009 10:44:16

Copyright (c) 1997, 2007, Oracle. All rights reserved.

Used parameter files:
D:\ORACLE\PRODUCT\10.2.0\DB_1\network\admin\sqlnet.ora


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (SOURCE_ROUTE=yes) (ADDRESS
=(PROTOCOL=tcp)(HOST=pc-44)(PORT=1630)) (ADDRESS=(PROTOCOL=TCP)(HOST = PC-44)(PO
RT = 1521))) (CONNECT_DATA = (SID = TEST)))
TNS-12541: TNS:no listener

C:\WINDOWS\system32>tnsping test

TNS Ping Utility for 32-bit Windows: Version 10.2.0.4.0 - Production on 29-DEC-2
009 10:45:58

Copyright (c) 1997, 2007, Oracle. All rights reserved.

Used parameter files:
D:\ORACLE\PRODUCT\10.2.0\DB_1\network\admin\sqlnet.ora


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS=(PROTOCOL=TCP)(HOS
T = PC-44)(PORT = 1521))) (CONNECT_DATA = (SSERVICE_NAME = TEST)))
OK (60 msec)

C:\WINDOWS\system32>

Tuesday, December 8, 2009

DR excercise with image Backup

I just want the share the experience we had last week when we were doing our Annual DR excercise for our systems

1.There are daily backups(hotbackups) to tape and also entire system image backups..These backups were taken when the DB was running

SUNDAY--Hotbackup
Rest of the Days--Archivelog backups



We did restore the image backups along with tape backups on Saturday
We had 2 production database , 1 was heavily utilised and the other one was just sitting idle..

When we try to start the DB , the database that was idle came up without any issue but the Heavily accesed database didnt comeup...with error messages FUZZY backup..File header verification failed.....

Next Step

1. Resored the Sunday Full backup
2.Restored all the archivelogs upto saturday

I tried restoring a particular file from the backup and applied all the logs but when the point where the backup was taken reached , the redolog has corrupted datablock and the recovery halted....couldnt open the DB but need to restore the entire db from the hotbackup...

Erros
Fuzzy backup
header verification failes....


Do rely on image backups in case of Disaster ..

Thursday, May 14, 2009

JDBC and Reverse DNS lookup

We had a strange application slowness in one of our HRIS system and let me describe what happened.

This is our environment

1.Oracle 8.1.7.4.1--Dont ask me why it was never migrated :)
2.Toplink
3.Tomcat 5.5(Uses JDK 1.5)
4.JDK1.5(Appserver builtin JDK1.5)
5.IIS 6.


We have a HRIS system(OLTP ) providing Employee self service for clients.

OUr IIS and APpserver sits on a same Physical server and DB in a diferent box. To comply with auditing standards we are asked to move the IIS to a DMZ(Which is quite normal but we havent so far) separate the APPServer from IIS.


New set up should be similiar to this

IIS-->AJP(1.3)--Tomcat--Appserver--JDBC--DB
IIS will be redirecting to the TOmcat container and then from Tomcat to APPserver. TOmcat and APpserver sits on a Same Server. Different colors reprsents different boxes.

We purchased Brand new servers to replace all the existing servers except the DB server but the IPs/Names were retained to minimize the complexity during migration.All the applications were migrated without any issue and everything looked fine for 2 weeks but all that was waiting so long to explode finally exploded.

We the concurrent access increases we were simply not able to go beyond the login page(Which is the first page). People who have logged in dont have any problems but the problem lied in login page .

Our appserver makes a set of connection when we startup and then makes new set of connections based on the usage

We tried increasing at the Tomcat MAx number of threads, Timeout,Enable lookss=False. After a week long we couldnt arrive at a conclusion. Appserver CPU and DB server CPU utlization hardly went beyond 10% during the slowness so we came to a conslusion that there is no resource shortage.

When contacted the application team they were simply saying "its all because of migration that you guys did, it was working before why now, so please look at the application setup"

Since the iSsue became very critical Development team as requested to look into the issues,We got the thread dump during the problem which stated the message below

at java.net.Inet4AddressImpl.getHostByAddr(Native Method)
at java.net.InetAddress$1.getHostByAddr(InetAddress.java:842)
at java.net.InetAddress.getHostFromNameService(InetAddress.java:532)
at java.net.InetAddress.getHostName(InetAddress.java:475)
at java.net.InetAddress.getHostName(InetAddress.java:447)
at java.net.InetSocketAddress.getHostName(InetSocketAddress.java:210)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:341)
at java.net.Socket.connect(Socket.java:507)
at java.net.Socket.connect(Socket.java:457)
at java.net.Socket.(Socket.java:365)
at java.net.Socket.(Socket.java:178)
at oracle.net.nt.TcpNTAdapter.connect(Unknown Source)
at oracle.net.nt.ConnOption.connect(Unknown Source)
at oracle.net.nt.ConnStrategy.execute(Unknown Source)
at oracle.net.resolver.AddrResolution.resolveAndExecute(Unknown Source)
at oracle.net.ns.NSProtocol.establishConnection(Unknown Source)
at oracle.net.ns.NSProtocol.connect(Unknown Source)
at oracle.jdbc.driver.T4CConnection.connect(T4CConnection.java:706)
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.


The first few lines gave some hint the we are having some problems in the DNS server. When we tried NSLOOKUP IP it didnt work but NSLOOKUP with HOSTNAME works so we cam e to a conclusion that reverse dns lookpup fails which matched our error message along with a Metalink doc.

As a fix we added the Other server names in etc/host file of appserver and DB server.

Oh no...There was problem after that and then we came to know that JDBC uses Reverse DNSlookup to make connections.

Happy reading

Wednesday, February 18, 2009

Oracle Dataguard in 10GR2

Its been a strange experience for me.This is the first time i set up dataguard configuration on Oracle 10G. Its so straighforward but its too annoying.

I was stuck with this error almost 100 times and i realy dont know why

ORA-16072: a minimum of one standby database destination is required


This is what i did.
1.Forced the Primary to forced logging mode
2.Create pfile2 from spfile
3.Create stdby controlfile

Copied the data,Redolog files, Renamed the controlfile as per init,ora file
Primary:
Added Log_archive_dest_2='Service=STDBY LGWR Sync affirm'