Monday, 21 January 2008

Oracle OID performance tuning

At a client's site we were recently asked to investigate performance issues for an existing single sign-on (SSO) server on the top of an Oracle Internet Directory (OID) box. This post self-documents the issue and the solution we arrived at, and maybe useful to other readers hitting similar issues. This post isn't a particular an easily-digestible blog entry so may not be of interest to the casual blog reader.

Performance Problems

Within an existing production environment, logging into Oracle Portal using SSO for users in a defined LDAP group (lets refer to this group as Alpha) took an unacceptable amount of time (~25 sec) and needed to be addressed. The Alpha users are the predominate users of the system. Logging in as a second LDAP group (referred to hereafter as Bravo) experienced acceptable sub-second responses. Once logged in the rest of the Oracle Portal system behaved acceptably indicating a performance issue with SSO and OID.

Other performance issues were also seen, including within the OID web directory interface, selecting the Alpha group simply did not return, while selecting the Bravo group returned after a considerable amount of time.

In the case of the development environment, much faster responses were experienced compared to the production environment and the OID web directory did return in both cases.

The poor to non existent performance relating to users within the Alpha LDAP group on production was at issue and had to be addressed.

(As a sub-note to this section, there were also other speed related issues to this problem. On attempting to manually load users into OID via the PL/SQL package dbms_ldap, we discovered in development loads times for sets of 1000 members in the Alpha group went from 1 minute to 15 minutes.)

OID LDAP Characteristics

As there was a considerable difference in performance between the development and production environments, we first decided to investigate what was different between the two environments. We discovered the following:

Development LDAP Characteristics

Contains 29 groups, most of which had 10 entries or less
Group Alpha had ~25,000 entries
Group Bravo had ~2,800 entries
Login took approximately 1 second as an Alpha member.

Production LDAP Characteristics

Contains 25 groups, most having 10 entries or less
Group Alpha had ~140,000 entries
Group Bravo had ~4,900 entries
Login took approximately 25 seconds as an Alpha member.

As can be seen, the key difference between the 2 environments was the 25,000 vs 140,000 members in the Alpha group.

OID Performance Tuning Guidelines

At this time we started a search for OID tuning recommendations, and pretty quickly stumbled across the document Oracle Internet Directory Tuning and Configuration - A Quick Reference Guide September 2007.

One of the key points to take from the OID document is the 2nd paragraph:

"The out of the box OID configuration is not optimal for most production or test deployments. It is hence imperative that some basic tuning and configuration changes are performed after OID installation to achieve optimal performance and availability."

Note our emphasis on "is not optimal for most production or test deployments." I'm unsure if this comment is only mentioned within the tuning documentation; I hope Oracle has taken care to document this elsewhere.

For the OID setup we'd inherited this had not been undertaken, and as such we decided (perhaps wisely! ;) to follow the recommendations of the Oracle tuning documentation. The rest of this post described the current environment as relevant to the tuning recommendations, as well as the recommendations we decided to follow, and the steps to implement them.

Pre-Tuning Upgrade Server Characteristics

The following settings describe our development and production database server and OID settings before the tuning upgrade, as relevant to the tuning recommendations. On reading the tuning documentation the reason why these are documented here will become clear:

Development Database Characteristics

Hardware: 4 CPU 4 GB RAM
Database version: 10.1.0.3.0 EE
OID version: 9.0.4.1.0

init.ora settings:

sga_target: 1392508928 (big integer)
sga_max_size: 1392508928 (big integer)
db_cache_size: 150994944 (big integer)
shared_pool_size: 150994944 (big integer)
session_cached_cursors: 0
cursor_space_for_time: FALSE
processess: 600 (integer)
pga_aggregate_target: 463470592 (big integer)

Bitmap index:

CREATE INDEX ei_attrstore ON ds_attrstore (entryid);


Development OID Settings

orclmaxcc (Max Number of DB Connections): 2
orclserverprocs (Number of Child Processes): 1
orclecacheenabled (Enable Entry Cache): 1
orclskiprefinsql (Skip Referral Process): 0
orclldapconntimeout (LDAP Connection Timeout): 0
orclmatchdenabled (Enable MatchDN Processing): 1

Production Database Characteristics

Hardware: 4 CPU 8 GB RAM
Database version: 10.1.0.3.0 EE
OID version: 9.0.4.1.0

init.ora settings:

sga_target: 1392508928 (big integer)
sga_max_size: 1392508928 (big integer)
db_cache_size: 905969664 (big integer)
shared_pool_size: 285212672 (big integer)
session_cached_cursors: 0
cursor_space_for_time: FALSE
processess: 150 (integer)
pga_aggregate_target: 408944640 (big integer)

Bitmap index:

CREATE INDEX ei_attrstore ON ds_attrstore (entryid);


Production OID Settings

orclmaxcc (Max Number of DB Connections): 2
orclserverprocs (Number of Child Processes): 1
orclecacheenabled (Enable Entry Cache): 1
orclskiprefinsql (Skip Referral Process): 0
orclldapconntimeout (LDAP Connection Timeout): 0
orclmatchdenabled (Enable MatchDN Processing): 1

OID Tuning Recommendations Accepted

As documented in the OID tuning document, following are the recommendations we decided to follow:

Database init.ora Entries

Our database does not use automatic tuning. The following settings were therefore recommended:

db_cache_size: 1200M
shared_pool_size: 200M

In addition we accepted the following recommendations:

session_cached_cursors: 100
cursor_space_for_time: TRUE
pga_aggregate_target: 300M
processes: 500

OID configuration changes

The following tuning recommendations were accepted within the Oracle tuning documentation and are applicable to our OID environment:
  • As per point 1 on page 5 of the Oracle documentation, update orclmaxcc=10 (Max Number of DB Connections) and orclserverprocs=4 (number of CPUs). The LDAP instance needs to be restarted at the end of this change.
  • As per point 3 on page 5 & 6 of the Oracle documentation, LDAP referrals are not used, allowing orclskiprefinsql (Skip Referral Process) to be set to 1. Our understanding is LDAP referrals is when an LDAP entry refers to another LDAP entry in entirely different LDAP tree. As we only have one OID LDAP install in each environment, referrals are not used.
  • As per point 4 on page 6 of the Oracle documentation, set the orclldapconntimeout=60 (LDAP Connection Timeout).
  • As per the OID Database Statistics section, run oidstats.sql once, as we had already preloaded considerable amount of members into the Alpha group. I suspect this step is not that relevant as our 10g database will have collated statistics on the underlying table once a night anyway.
  • As per the Using OID Server Entry Cache section on page 10, as there will be frequent updates to the directory in our environment, it's suggested Entry Cache is turned off (orclecachenabled=0).
  • As per the Large Group Entries section on page 7, the following SQL DDL will be run:
drop index ei_attrstore;

create bitmap index ei_attrstore

on ds_attrstore (entryid, attrname)

tablespace olts_attrstore nologging compute statistics;


alter index ei_attrstore noparallel;


OID Tuning Recommendations Rejected

The following points were considered within the Oracle tuning documentation and rejected with the following reasons:
  • As per point 2 on page 5 of the Oracle documentation, DIP features are currently used for the Active Directory and Portal integration.
  • As per point 5 on page 6 of the Oracle documentation, in contacting the author Ajay Keni of the Oracle tuning document, Ajay noted this tuning feature is important if the depth of the BaseDN (or its individual components) is substantial. It shouldn't apply in our client's case as the baseDN is appears to be relatively short "dc=aaa,dc=bbb,dc=ccc,dc=au"
Ajay Keni gave the following description:

(start quote)
Lets say, a client sends an LDAP search with BaseDN "cn=jdoe,ou=finance,l=emea,dc=acme,dc=com".

Now, lets say as part of the search request processing OID figures out that this DN does not exist in the Directory. According to the protocol specification, OID is required to figure out whether "ou=finance,l=emea,dc=acme,dc=com" exists, if it doesn't exist, then figure out whether "l=emea,dc=acme,dc=com" and so on until "dc=com". The nearest DN in the heirarchy will be sent as the matchDN value as 'additional info' in the search result. This is the default behaviour.
If you set "orclmatchdnenabled=0", then OID will only check for the existence of the immediate parent i.e. "ou=finance,l=emea,dc=acme,dc=com" and not all the way up to "dc=com".

As you can see, this only comes into play when the baseDN of the search doesn't exist in the Directory and the change in behaviour with this setting affects the 'additional info' part of the result and not the result itself. But, in some deployments we have seen such queries coming in and depending on the depth of the DIT the processing for matchDN can be expensive.
(end quote)

Production OID Change Plan

With the above recommended changes accepted, the following documents our installation change plan:

(Note some of the following OID setting changes are dependent on the version of Oracle Directory Manager client tool you have installed. Unfortunately subsequently I've uninstalled my Oracle client and cannot give you the specific version at this time)
  • Ensure nightly backups of the OID is complete.
  • Notify all users that the OID server will be unavailable (and therefore all systems relying on OID and SSO for login).
  • Gracefully stop the OID OAS server.
  • As the database administrator, change the following init.ora parameters:
* db_cache_size: 1200M
* shared_pool_size: 200M
* session_cached_cursors: 100
* cursor_space_for_time: TRUE
  • Restart the OID database.
  • Restart the OID OAS server.
  • Start the Oracle Directory Manager client tool, and log in as cn=orcladmin on the production server.
  • Select the root level cn=orcladmin@.... node.
  • Select the System Operational Attributes tab on the right hand side if not already selected.
  • Set "Enable Entry Cache" (orclecachenabled) = 0 (record the current setting for rollback purposes)
  • Select the Query Optimisation tab.
  • Set "Skip Referral Process" (orclskiprefinsql) = 1 (record the current setting for rollback purposes)
  • Set "LDAP Connection Timeout" (orclldapconntimeout) = 60. (record the current setting for rollback purposes)
  • In the object tree, expand the Server Management option, then the Directory Server option, and finally select the Default Configuration Set option.
  • Select the General tab on the right hand side if not already selected.
  • Set "Number of Child Processes" (ocrlserverprocs) = 4 (record the current setting for rollback purposes)
  • Set "Max Number of DB Connections" (orclmaxcc) = 10 (record the current setting for rollback purposes)
  • Exit Oracle Directory Manager.
  • Restart the OAS.
  • Check that all services are back up.
  • Open a telnet/ssh session to the OAS host.
  • Set the environment to infrastructure database.
  • cd $ORACLE_HOME/ldap/admin
  • Start SQL*Plus as the ODS user.
  • Run oidstats.sql (this may take a while)
  • Exit
  • Within OEM stop all OID instances.
  • Log in as the ODS user via SQL*Plus in production, and run the following SQL:
drop index ei_attrstore;

create bitmap index ei_attrstore

on ds_attrstore(entryid, attrname)

tablespace olts_attrstore nologging compute statistics;


alter index ei_attrstore no parallel;

  • Restart the OID instance within OEM.
  • Log into the Production portal as Alpha user and judge performance.
  • Log into the Production portal as Bravo user and judge performance.
  • Notify users that the system is now available.
Production Success

On completion of the above changes, logging in as either a member of the Alpha or Bravo groups saw sub-second responses via the SSO to Portal web pages. In the case of the Alpha members this was an improvement by a factor of 25 times, and provided acceptable login times.

I'd like to thanks Ajay Keni and Vikki Lira for their assistance in writing this post.

If you read this posts, other links you may be interested in:
100M User Directory Benchmark OID v10.1.4

3 comments:

Dan Norris said...

Hi Chris,

Great info and I liked the writeup. I'll hang on to this URL as I suspect it will come in handy someday. Thanks for summarizing things so neatly into Dan-proof steps :) !

Phil Lembo said...

Thanks for this extremely valuable contribution.

"I hope Oracle has taken care to document this elsewhere."

Unfortunately, they have not. Although the tuning doc you cite has been updated recently, Oracle continues to ignore these issues entirely in their training and at best bury them in the product documentation. Vive le Internet!

pjmolina said...

Invaluable post. Thank you very much for sharing it!