
Replica Trees & Tuning Open Directory

You have a fairly large Open Directory environment, and when you go to add the 33rd replica you get a funny error that dserr doesn't have listed. The likely reason is that a single Open Directory master can only have 32 replicas. However, each of those replicas can in turn have 32 replicas of its own (thus forming a replica tree), allowing for a total of 1,056 replicas (the 32 first-tier relays plus their 1,024 second-tier replicas) and a master. So rather than binding that 33rd replica to the master, move to a replica tree model, offloading replicas in as geographically friendly a fashion as possible (thus reducing slapd traffic on your WAN links) by positioning replicas per site. Similar to how Active Directory infrastructures often have a global catalog at each site, if you've got a large number of Open Directory replicas then you should likely limit the number per site that connects back to the master to one.
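
As a rough sketch of standing up one tier of that tree with slapconfig (the hostnames and the directory administrator name here are assumptions, and I'm assuming the usual -createreplica workflow applies when you point it at a relay rather than the master):

    # On the first-tier relay, replicate from the master:
    sudo slapconfig -createreplica od-master.example.com diradmin

    # On a second-tier replica at a remote site, replicate from the
    # relay instead of the master, keeping that site's LDAP traffic
    # off the WAN:
    sudo slapconfig -createreplica od-relay1.example.com diradmin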

Assuming that each replica can sustain a good 350 clients on a bad day (and we always plan for bad days), even the largest pure Mac OS X deployments will have plenty of LDAP servers to authenticate against. However, you're likely to have issues with clients being able to tell which Open Directory server is the most appropriate one to authenticate through. Therefore, cn=config will need to be customized for each group that leverages a given replica, or divert rules used with ipfw to act as a traffic cop. Overall, replica trees seem to be working fairly well in Snow Leopard, netting a fairly scalable infrastructure for providing LDAP services.
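
To illustrate the traffic cop idea, a rule along these lines would pin a site's LDAP lookups to its local replica (the addresses are hypothetical, and I'm using a simpler fwd rule here rather than a full divert socket to keep the sketch short):

    # Redirect LDAP traffic from the 10.1.2.0/24 site to its local
    # replica (10.1.2.5) so clients never authenticate across the WAN:
    sudo ipfw add 100 fwd 10.1.2.5,389 tcp from 10.1.2.0/24 to any dst-port 389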

You can also get pretty granular with the logs from slurpd (the daemon that manages Open Directory replication) by invoking slurpd with the -d option followed by a debug level from 4 to 65535; logging gets more verbose as the number gets higher. You can also use the -r option to point at a specific replication log file. If you have more than 32 replicas then it stands to reason that you also have a large number of objects in Open Directory, a fair amount of change occurring to those objects, and therefore a fair amount of replication I/O. To offload this you can move your replication temp directory onto SSD drives by specifying the -t option when invoking slurpd.
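
Putting those flags together, an invocation might look like the following (the log path and the SSD temp path are hypothetical; substitute your own volume):

    # -d sets the debug level, -r the replication log file, and -t the
    # replication temp directory (here relocated onto an SSD volume):
    slurpd -d 65535 -r /var/log/slurpd.log -t /Volumes/SSD/slurpd-tmp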

The slurpd replication occurs over port 389 by default, so in a larger environment you should give that traffic priority on the network. If you choose to custom make/install slurpd then you'll also need to go ahead and build your Kerberos principals manually. In that case you would generate a srvtab file for the slurpd server and then configure slapd to accept Kerberos authentication from slaves. Having said this, I haven't seen an environment where I had to configure slurpd in this fashion.
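
On the prioritization point, one way to give port 389 priority is with ipfw's dummynet queues. The following is a minimal sketch, not a definitive configuration; the link bandwidth and queue weights are assumptions you'd tune to your own WAN:

    # Shape the WAN link, then weight LDAP traffic above everything else:
    sudo ipfw pipe 1 config bw 10Mbit/s
    sudo ipfw queue 1 config pipe 1 weight 80
    sudo ipfw queue 2 config pipe 1 weight 20
    sudo ipfw add 200 queue 1 tcp from any to any dst-port 389
    sudo ipfw add 300 queue 2 ip from any to any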