Fedora Miscellaneous Problems – Slapd at 100% Load
A number of factors can cause this, assuming the server has enough RAM and processing power.
Most likely, assuming everything else is working fine, there is a time skew between your Fedora Directory Server and your clients. They should all have the same time set, i.e. run a Network Time Protocol daemon (ntpd) or sync the time on the machines via a script, such as rdate -s time.uwa.edu.au
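A rough way to check for skew is to compare the epoch seconds reported by a client and by the server. The sketch below only does the arithmetic; the function name abs_skew and the host name ldapserver are placeholders, and ssh access to the server is assumed.

```shell
#!/bin/sh
# abs_skew A B: print the absolute difference, in seconds, between two
# epoch timestamps (as printed by `date +%s`).
abs_skew() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - $diff ))
    fi
    echo "$diff"
}

# Example use (ldapserver is a placeholder host name):
#   abs_skew "$(date +%s)" "$(ssh ldapserver date +%s)"
```

Anything more than a few seconds is worth fixing with ntpd or rdate before looking further.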
Another problem I’ve noticed is a memory leak in Fedora Directory Server 1.0.2. I only noticed it because I left my test box idle and off the network (Ethernet cable unplugged) and the memory usage of slapd still grew. That explains the memory usage of the Directory Server on my production box, where the slapd service needed to be restarted roughly whenever its memory usage reached about 1.5GB; otherwise the box would crash sooner or later, because it only had 2GB of physical RAM. When the LDAP server uses that much RAM, the load it puts on the machine is tremendous, sitting anywhere from 30% to 100%. The solution is to upgrade to Fedora Directory Server 1.0.4, which seems to fix the problem: it now rarely reaches that amount of memory usage or the load associated with it. (I don’t really trust upgrades; I would rather dump the entire LDAP directory to an LDIF file, install a fresh copy of FDS 1.0.4, then re-import the LDIF file.)
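Until the upgrade is done, the restart-at-1.5GB routine described above could be automated along these lines. This is only a sketch: the process name ns-slapd, the instance directory slapd-myhost and its restart-slapd script are assumptions about a typical Fedora Directory Server 1.0.x install, and the function name over_limit is made up, so check all of them against your own box.

```shell
#!/bin/sh
# over_limit RSS_KB LIMIT_KB: succeed (exit 0) when the resident memory
# figure is at or above the limit, fail (exit 1) otherwise.
over_limit() {
    [ "$1" -ge "$2" ]
}

# Example use from cron (every name below is an assumption, see above):
#   rss=$(ps -o rss= -C ns-slapd | head -n 1 | tr -d ' ')   # size in kB
#   if over_limit "$rss" 1500000; then                      # roughly 1.5GB
#       /opt/fedora-ds/slapd-myhost/restart-slapd
#   fi
```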
The second most likely problem is that your server is inundated with too many requests, either from too many clients or from the number of requests coming from each client. This is usually solved by configuring the name service cache daemon (nscd) on the clients. nscd caches the information locally instead of repeatedly querying your LDAP server at regular intervals. Imagine a couple of hundred machines each issuing queries every 4 seconds or so. To configure nscd, edit /etc/nscd.conf
cat /etc/nscd.conf
#
# /etc/nscd.conf
#
# An example Name Service Cache config file. This file is needed by nscd.
#
# Legal entries are:
#
# logfile <file>
# debug-level <level>
# threads <initial #threads to use>
# max-threads <maximum #threads to use>
# server-user <user to run server as instead of root>
# server-user is ignored if nscd is started with -S parameters
# stat-user <user who is allowed to request statistics>
# reload-count unlimited|<number>
# paranoia <yes|no>
# restart-interval <time in seconds>
#
# enable-cache <service> <yes|no>
# positive-time-to-live <service> <time in seconds>
# negative-time-to-live <service> <time in seconds>
# suggested-size <service> <prime number>
# check-files <service> <yes|no>
# persistent <service> <yes|no>
# shared <service> <yes|no>
# max-db-size <service> <number bytes>
#
# Currently supported cache names (services): passwd, group, hosts
#
# logfile /var/log/nscd.log
# threads 6
# max-threads 128
server-user nscd
# stat-user nocpulse
debug-level 0
# reload-count 5
paranoia no
# restart-interval 3600
enable-cache passwd yes
positive-time-to-live passwd 500000
negative-time-to-live passwd 20
suggested-size passwd 211
check-files passwd yes
persistent passwd yes
shared passwd yes
max-db-size passwd 33554432
enable-cache group yes
positive-time-to-live group 500000
negative-time-to-live group 60
suggested-size group 211
check-files group yes
persistent group yes
shared group yes
max-db-size group 33554432
enable-cache hosts yes
positive-time-to-live hosts 1000000
negative-time-to-live hosts 20
suggested-size hosts 211
check-files hosts yes
persistent hosts yes
shared hosts yes
max-db-size hosts 33554432
The values you should be editing are the positive-time-to-live values for the various services, i.e. passwd and group, which hold the user and group information that is repeatedly queried from LDAP.
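To see at a glance which positive TTLs are currently active, something like the following can be used. It is a sketch only; the function name show_positive_ttls is made up, and the commented commands assume a Red Hat style init script for nscd.

```shell
#!/bin/sh
# show_positive_ttls FILE: print "<service> <seconds>" for every active
# (uncommented) positive-time-to-live line in an nscd.conf style file.
show_positive_ttls() {
    awk '$1 == "positive-time-to-live" { print $2, $3 }' "$1"
}

# Example use:
#   show_positive_ttls /etc/nscd.conf
# After editing the file, restart nscd so it picks up the change; nscd -g
# then prints the running configuration and cache statistics:
#   service nscd restart
#   nscd -g
```

The cache hit figures reported by nscd -g are a quick way to confirm that lookups are being answered locally rather than going to the LDAP server.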
If caching the LDAP information via nscd doesn’t reduce the load, you really only have one option: there are too many queries hitting your server and you need to distribute the load. This is where you use multiple LDAP servers, with one master LDAP server and several slave LDAP servers. All modifications to the LDAP directory are made on the master, which replicates them down to the slaves, and all queries are directed to the slaves. I might get around to writing up how to do this, but I simply don’t have the time or the need to do it as yet.
The last possibility is that the slapd database information, i.e. the index information, is corrupted, in which case you may have to rebuild the database. A smart person would periodically dump the information into an LDIF file, i.e. dump the people information from ou=People,dc=csse,dc=uwa,dc=edu,dc=au and the group information from ou=Groups,dc=csse,dc=uwa,dc=edu,dc=au.
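A periodic dump could look roughly like the sketch below. It assumes an ldapsearch binary sits next to ldapmodify in /opt/fedora-ds/shared/bin and accepts the same -D, -w, -h and -p options; the function name dump_subtree, the LDAPSEARCH and LDAP_HOST variables and the host name are placeholders.

```shell
#!/bin/sh
# Path to ldapsearch; assumed to sit next to ldapmodify. Override if not.
LDAPSEARCH=${LDAPSEARCH:-/opt/fedora-ds/shared/bin/ldapsearch}
LDAP_HOST=${LDAP_HOST:-hostname}

# dump_subtree BASE_DN OUTFILE: write every entry under BASE_DN to OUTFILE
# in LDIF form. "-w -" prompts for the Directory Manager password.
dump_subtree() {
    "$LDAPSEARCH" -D "cn=Directory Manager" -w - -h "$LDAP_HOST" -p 389 \
        -b "$1" "(objectClass=*)" > "$2"
}

# Example use:
#   dump_subtree "ou=People,dc=csse,dc=uwa,dc=edu,dc=au" people-backup.ldif
#   dump_subtree "ou=Groups,dc=csse,dc=uwa,dc=edu,dc=au" group-backup.ldif
```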
What I usually do, once Fedora Directory Server is set up and configured correctly (with SSL etc.), is tarball the whole Fedora Directory Server directory; as I mentioned before, it is self-contained. To restore, I untar the installation and re-import the LDIF file, and basically it is back up. When you dump the LDIF file you usually have to go to each entry and add “changetype: add” before the objectClass.
If you have a couple of thousand entries it’s best to use an automated tool like sed to do this, i.e.
sed 's/^objectClass: top/changetype: add\nobjectClass: top/' people-backup.ldif > people-backup-import.ldif
sed 's/^objectClass: top/changetype: add\nobjectClass: top/' group-backup.ldif > group-backup-import.ldif
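Before importing, it is worth a quick sanity check that every entry actually picked up a changetype line, since an entry that has no “objectClass: top” line would be left untouched by the sed above. A minimal sketch (check_changetype is a made-up name):

```shell
#!/bin/sh
# check_changetype FILE: compare the number of entries (dn: lines) with the
# number of "changetype: add" lines. Prints a summary and returns 1 when
# the two counts differ.
check_changetype() {
    entries=$(grep -c '^dn:' "$1" || true)
    changes=$(grep -c '^changetype: add$' "$1" || true)
    if [ "$entries" -eq "$changes" ]; then
        echo "ok: $entries entries"
    else
        echo "mismatch: $entries entries, $changes changetype lines"
        return 1
    fi
}

# Example use:
#   check_changetype people-backup-import.ldif
#   check_changetype group-backup-import.ldif
```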
Now that you have added the necessary lines to your backup LDIF, just re-import it with the LDAP command-line tools.
With SSL
/opt/fedora-ds/shared/bin/ldapmodify -D "cn=Directory Manager" -c -p 636 -Z -P /opt/fedora-ds/alias -h hostname -a -w - -f people-backup-import.ldif
Without SSL
/opt/fedora-ds/shared/bin/ldapmodify -D "cn=Directory Manager" -c -p 389 -h hostname -a -w - -f people-backup-import.ldif
That’s me restoring Fedora Directory Server to operational mode in probably 2 to 3 minutes.