Fedora Miscellaneous Problems – Slapd at 100% Load

 

A number of factors can cause this, assuming the server has enough RAM and processing power.

 

Most likely, assuming everything else is working fine, there is a time skew between your Fedora Directory Server and its clients. They should all have the same time set, i.e. either run a Network Time Protocol daemon (ntpd) on every machine or sync the clocks from a script, for example with rdate -s time.uwa.edu.au
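
To check whether a client has drifted away from the directory server, something along these lines can be used. It is only a sketch: the function names and the idea of comparing two epoch timestamps are my own illustration, not part of Fedora Directory Server.

```shell
# clock_skew LOCAL_EPOCH REMOTE_EPOCH
# Print the absolute difference between two epoch timestamps, in seconds.
clock_skew() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( -diff ))
    fi
    echo "$diff"
}

# skew_ok LOCAL_EPOCH REMOTE_EPOCH MAX_SECONDS
# Succeed when the two clocks are within MAX_SECONDS of each other.
skew_ok() {
    [ "$(clock_skew "$1" "$2")" -le "$3" ]
}
```

A possible use, assuming you can ssh to a host called ldapserver (a made-up name) and that a 5 second tolerance suits you: skew_ok "$(date +%s)" "$(ssh ldapserver date +%s)" 5 || echo "clock skew too large"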

 

The other problem I’ve noticed is a memory leak in Fedora Directory Server 1.0.2. I only noticed it because I left my test box idle and off the network (Ethernet cable unplugged) and the memory usage of slapd still grew. That explains the memory usage of the directory server on my production box, where the slapd service had to be restarted whenever its memory usage reached about 1.5GB; otherwise the box would crash sooner or later, because it only had 2GB of physical RAM. When the LDAP server uses that much RAM, the load it puts on the box is tremendous, anywhere from 30% to 100%.

The solution is to upgrade to Fedora Directory Server 1.0.4, which seems to fix the problem: slapd now rarely reaches that amount of memory usage or the load associated with it. (I don’t really trust in-place upgrades; I would rather dump the entire LDAP directory to an LDIF file, reinstall a fresh copy of FDS 1.0.4 and then re-import the LDIF file.)
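
Until the upgrade is done, a small check run from cron can warn you before the box runs out of RAM. This is a sketch under assumptions: the 1.5GB limit comes from my production box above, the function names are made up, and "ps -C" is the procps option found on Linux.

```shell
# Threshold in kB: 1.5GB, the point at which I had to restart slapd.
LIMIT_KB=$((1536 * 1024))

# total_rss_kb NAME
# Sum the resident set size (kB) of every process called NAME.
# Uses the procps "ps -C" option, so this part is Linux specific.
total_rss_kb() {
    ps -C "$1" -o rss= | awk '{ sum += $1 } END { print sum + 0 }'
}

# over_limit RSS_KB LIMIT_KB
# Succeed when RSS_KB has reached LIMIT_KB.
over_limit() {
    [ "$1" -ge "$2" ]
}

# check_slapd NAME
# Print a warning when the named process is over the limit.
check_slapd() {
    rss=$(total_rss_kb "$1")
    if over_limit "$rss" "$LIMIT_KB"; then
        echo "WARNING: $1 is using ${rss} kB, restart it soon"
    fi
}
```

Call it as check_slapd followed by the process name as it appears in ps on your box (on my understanding that is ns-slapd for Fedora Directory Server, but check first). The restart itself is left to you, since how the service is restarted depends on the installation.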

 

The second most likely problem is that your server is inundated with too many requests, either because there are too many clients or because each client sends too many requests. This is usually solved by configuring the name service cache daemon (nscd) on the clients. What nscd does is cache the information locally instead of repeatedly querying your LDAP server at regular intervals. Imagine a couple of hundred machines each issuing queries every 4 seconds or so. To configure nscd, edit /etc/nscd.conf
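
To get a feel for the numbers, here is a rough back-of-the-envelope calculation. It assumes 200 machines (my reading of "a couple of hundred") that each look up one entry over and over, and it treats the cache lifetime as the only thing that limits repeat lookups, which is a simplification.

```shell
# lookups_per_hour MACHINES INTERVAL_SECONDS
# Number of lookups per hour hitting the LDAP server when each of
# MACHINES clients asks once every INTERVAL_SECONDS.
lookups_per_hour() {
    echo $(( $1 * 3600 / $2 ))
}

# Without a cache: every machine asks every 4 seconds.
lookups_per_hour 200 4
# With a cache that keeps an answer for 600 seconds.
lookups_per_hour 200 600
```

That prints 180000 and then 1200, so even a ten minute cache lifetime cuts the load on the LDAP server by a factor of 150 in this simplified picture.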

 

cat /etc/nscd.conf
#
# /etc/nscd.conf
#
# An example Name Service Cache config file.  This file is needed by nscd.
#
# Legal entries are:
#
#       logfile                 <file>
#       debug-level             <level>
#       threads                 <initial #threads to use>
#       max-threads             <maximum #threads to use>
#       server-user             <user to run server as instead of root>
#               server-user is ignored if nscd is started with -S parameters
#       stat-user               <user who is allowed to request statistics>
#       reload-count            unlimited|<number>
#       paranoia                <yes|no>
#       restart-interval        <time in seconds>
#
#       enable-cache            <service> <yes|no>
#       positive-time-to-live   <service> <time in seconds>
#       negative-time-to-live   <service> <time in seconds>
#       suggested-size          <service> <prime number>
#       check-files             <service> <yes|no>
#       persistent              <service> <yes|no>
#       shared                  <service> <yes|no>
#       max-db-size             <service> <number bytes>
#
# Currently supported cache names (services): passwd, group, hosts
#

#       logfile                 /var/log/nscd.log
#       threads                 6
#       max-threads             128
        server-user             nscd
#       stat-user               nocpulse
        debug-level             0
#       reload-count            5
        paranoia                no
#       restart-interval        3600

        enable-cache            passwd          yes
        positive-time-to-live   passwd          500000
        negative-time-to-live   passwd          20
        suggested-size          passwd          211
        check-files             passwd          yes
        persistent              passwd          yes
        shared                  passwd          yes
        max-db-size             passwd          33554432

        enable-cache            group           yes
        positive-time-to-live   group           500000
        negative-time-to-live   group           60
        suggested-size          group           211
        check-files             group           yes
        persistent              group           yes
        shared                  group           yes
        max-db-size             group           33554432

        enable-cache            hosts           yes
        positive-time-to-live   hosts           1000000
        negative-time-to-live   hosts           20
        suggested-size          hosts           211
        check-files             hosts           yes
        persistent              hosts           yes
        shared                  hosts           yes
        max-db-size             hosts           33554432

 

The values you should be editing are the positive-time-to-live settings (in seconds) for passwd and group, since user and group information is what gets queried repeatedly from LDAP.
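
If you have to change this on many machines, the edit can be scripted. The following is a sketch only: set_positive_ttl is a made-up helper name, and it writes the changed file to standard output so that you can inspect the result before replacing /etc/nscd.conf.

```shell
# set_positive_ttl CONF SERVICE SECONDS
# Print CONF with the positive-time-to-live of SERVICE set to SECONDS.
# Commented-out lines (starting with #) are left untouched.
set_positive_ttl() {
    conf=$1
    service=$2
    seconds=$3
    sed "s/^\([[:space:]]*positive-time-to-live[[:space:]]\{1,\}${service}[[:space:]]\{1,\}\)[0-9]\{1,\}/\1${seconds}/" "$conf"
}
```

For example, set_positive_ttl /etc/nscd.conf passwd 600 > /tmp/nscd.conf.new gives you a copy with a ten minute lifetime for passwd entries. The value 600 is just an illustration. After installing the new file, restart nscd so that it picks up the change.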

 

If caching the information from LDAP via nscd doesn’t reduce the load, you really only have one option: there are too many queries hitting your server and you need to distribute the load. This is where you run multiple LDAP servers, with one master LDAP server and several slave LDAP servers. All modifications and changes to the LDAP directory are made on the master LDAP server and are replicated down to the slave LDAP servers, and all queries are directed to the slaves. I might get around to writing up how to do this, but I simply don’t have the time or the need to do it as of yet.

 

The last possibility is that the slapd database information, i.e. the index information, is corrupted, in which case you may have to rebuild the database. A smart person would periodically dump the information into an LDIF file, e.g. dump the people information from ou=People,dc=csse,dc=uwa,dc=edu,dc=au and the group information from ou=Groups,dc=csse,dc=uwa,dc=edu,dc=au.
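
A periodic dump could look roughly like the sketch below. Treat it as a set of assumptions to verify on your own installation: I am assuming there is an ldapsearch next to the ldapmodify used later in this article, that it accepts the same -h, -p, -D and -w options plus -b for the search base, and that it writes LDIF by default. The function name dump_subtree is made up.

```shell
# Assumed location, next to the ldapmodify used elsewhere in this article.
LDAPSEARCH=${LDAPSEARCH:-/opt/fedora-ds/shared/bin/ldapsearch}

# dump_subtree HOST BASE_DN OUTFILE
# Dump every entry under BASE_DN on HOST into OUTFILE.
# "-w -" asks for the Directory Manager password interactively.
dump_subtree() {
    host=$1
    base=$2
    out=$3
    "$LDAPSEARCH" -h "$host" -p 389 -D "cn=Directory Manager" -w - \
        -b "$base" "(objectClass=*)" > "$out"
}
```

With that in place, dump_subtree hostname "ou=People,dc=csse,dc=uwa,dc=edu,dc=au" people-backup.ldif would produce the people backup, and the same call with ou=Groups and group-backup.ldif the group backup.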

 

What I usually do, once Fedora Directory Server is configured correctly (with SSL etc.), is tarball its whole directory; as I mentioned before, it’s self-contained. To restore, I untar the installation and re-import the LDIF file, and basically it’s back up. When you dump the LDIF file you usually have to go to each entry and add “changetype: add” before the objectClass line.

 

If you have a couple of thousand entries it’s best to use an automated tool like sed to do this, e.g.

 

sed 's/objectClass: top/changetype: add\nobjectClass: top/g' people-backup.ldif > people-backup-import.ldif

sed 's/objectClass: top/changetype: add\nobjectClass: top/g' group-backup.ldif > group-backup-import.ldif
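
The sed approach relies on "objectClass: top" being the first attribute after the dn of every entry. If that is not the case in your dump, an alternative is to key on the dn line itself. The awk version below is a minimal sketch of that idea (add_changetype is a made-up name); it also copes with a dn that is folded over several lines, where continuation lines start with a space.

```shell
# add_changetype LDIF_FILE
# Print LDIF_FILE with "changetype: add" inserted right after the dn
# of every entry, including dn values folded onto continuation lines.
add_changetype() {
    awk '
        # First line after a dn (and its continuation lines): emit the marker.
        in_dn && !/^ / { print "changetype: add"; in_dn = 0 }
        # Start of a new entry.
        /^dn:/ { in_dn = 1 }
        { print }
        # A file that ends directly after a dn still gets the marker.
        END { if (in_dn) print "changetype: add" }
    ' "$1"
}
```

Usage would be add_changetype people-backup.ldif > people-backup-import.ldif, and the same again for the group file.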

 

Now that you have added the necessary changetype lines to your backup LDIF, just re-import it with the LDAP command-line tools.

 

With SSL

/opt/fedora-ds/shared/bin/ldapmodify -D "cn=Directory Manager" -c -p 636 -Z -P /opt/fedora-ds/alias -h hostname -a -w - -f people-backup-import.ldif

 

Without SSL

/opt/fedora-ds/shared/bin/ldapmodify -D "cn=Directory Manager" -c -p 389 -h hostname -a -w - -f people-backup-import.ldif

 

That is how I restore Fedora Directory Server to an operational state, usually in about 2 to 3 minutes.