This is djp's outdated local copy. See also the maintained version of rjs3's own paper.

This document describes a load balancing name server written in Perl 4.
I am presenting a paper on this work, and on a version written in Perl 5,
at the LISA '95 conference. After the conference (September 1995) I will
update this page with a pointer to the Perl 5 version.

lbnamed

lbnamed is a load balancing name server written in Perl. Of course, it was meant to be a proof of concept that would get added back into BIND, but it has worked well enough that I've left it in Perl for now. lbnamed allows you to create dynamic groups of hosts that share one name in the DNS name space. A host may be in multiple groups at the same time.

For example, when someone types:

           telnet best-elaine.Stanford.EDU
they are connected to one of the 57 SPARCstations named elaine1 through elaine57. The current definition of what the "best" host is can be found in the appendix.

server

The server side consists of two Perl programs, lbnamed and poller. These programs run in parallel and communicate using signals and configuration files.

poller

The poller program contacts the client daemon running on each of the hosts being polled. It reads a configuration file which tells it which hosts to poll. The poller periodically sends out requests and receives the responses asynchronously. After it has received all the responses it dumps the information into a configuration file and sends a signal to lbnamed, which then reloads its configuration file. If the poller does not receive a response from one of the hosts being polled, it simply leaves that host out of the configuration file it feeds to lbnamed.
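
The heart of a polling pass looks roughly like the following sketch
(modern Perl 5 for readability; the file names, the HUP signal, and
poll_host are assumptions, and the real poller sends its UDP requests
asynchronously rather than one at a time):

    #!/usr/bin/perl
    # Sketch of one poller pass: poll every host, dump the results,
    # then signal lbnamed to reload.
    use strict;
    use warnings;

    my @hosts = qw(elaine1 elaine2 rs1);    # really read from poller.conf
    my %weight;                             # host => weight from this pass

    for my $host (@hosts) {
        my $w = poll_host($host);           # undef on timeout
        $weight{$host} = $w if defined $w;  # dead hosts simply drop out
    }

    # Dump the results into the file lbnamed reads (the real file also
    # carries each host's IP address and groups)...
    open(my $out, '>', 'lbnamed.conf.new') or die "open: $!";
    print $out "$weight{$_} $_\n" for sort keys %weight;
    close($out);
    rename('lbnamed.conf.new', 'lbnamed.conf') or die "rename: $!";

    # ...and signal lbnamed to reload it.
    open(my $pidf, '<', '/var/run/lbnamed.pid') or die "pid file: $!";
    chomp(my $pid = <$pidf>);
    kill('HUP', $pid);

    # Stand-in for the UDP exchange described under "protocol" below.
    sub poll_host { return 100 + int(rand(1000)) }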

A problem still exists when a host responds to poller requests and has a low load, but no one can log in because of some other problem (lack of swap, etc.). A future version may be smarter and watch for trends, such as a host that is constantly handed out but whose weight never changes.

lbnamed

lbnamed reads the configuration file generated by the poller and loads it into a number of different data structures. Each group of machines is stored in an array, while the weights of all the hosts are stored in one hash table. When a request for a particular group comes in, the array for that group is sorted based on the weight of each host in the group. The host with the lowest weight is then returned as the best host, and its weight is slightly increased.
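
In Perl the selection boils down to something like this (a sketch; the
weights are sample values and the size of the bump is made up):

    #!/usr/bin/perl
    # Sketch of best-host selection: one array per group, one weight
    # hash shared by all groups.
    use strict;
    use warnings;

    my %weight = (elaine1 => 651, elaine2 => 2200, elaine3 => 866);
    my %group  = (elaine  => [qw(elaine1 elaine2 elaine3)]);

    sub best_host {
        my ($name) = @_;
        my @sorted = sort { $weight{$a} <=> $weight{$b} }
                          @{ $group{$name} };
        my $best = $sorted[0];
        $weight{$best} += 50;   # bump the winner so it isn't handed
                                # out repeatedly before the next poll
        return $best;
    }

    print best_host('elaine'), "\n";   # prints "elaine1"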

To other name servers, lbnamed looks like a standard DNS name server, with the exception that it doesn't answer recursive queries. It only handles requests for the dynamic groups it maintains. lbnamed gets a normal DNS request and, based on the name in the request, calculates the host to return. lbnamed then constructs a standard DNS response and sends it back to the client that requested it. The time to live (TTL) value in the response is set to 0, in order to prevent this value from being cached by other name servers (which would defeat the whole purpose).
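
Constructing the response is mostly a matter of pack. The following is
a minimal sketch, not lbnamed's actual code; it assumes a
single-question query with nothing after the question section and skips
all error handling:

    # Build a raw DNS answer for $query (the raw request packet),
    # answering with $ip and a TTL of 0 so nobody caches it.
    sub make_reply {
        my ($query, $ip) = @_;
        my ($id) = unpack('n', $query);    # reuse the query's id
        my $header = pack('n6',
            $id,
            0x8400,    # QR=1 (response), AA=1 (authoritative)
            1, 1,      # one question, one answer
            0, 0);     # no authority or additional records
        my $question = substr($query, 12); # echo the question verbatim
        my $answer = pack('n3 N n a4',
            0xc00c,    # compressed pointer to the name in the question
            1, 1,      # type A, class IN
            0,         # TTL 0: do not cache this answer!
            4, pack('C4', split(/\./, $ip)));
        return $header . $question . $answer;
    }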

client

Hosts that are going to be polled by the poller need to run a special daemon. That daemon responds to poller requests (over UDP) using a simple protocol. The protocol format is described in the appendix.
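
Such a daemon fits in a few lines of Perl, sketched below (the field
layout follows the C structures in the appendix; the load and user
numbers here are dummy values, where the real daemon reads them from
the system):

    #!/usr/bin/perl
    # Sketch of the client daemon's loop: answer each UDP poll with a
    # load-info reply.
    use strict;
    use warnings;
    use IO::Socket::INET;

    my $sock = IO::Socket::INET->new(
        LocalPort => 4330,        # PROTO_PORTNUM
        Proto     => 'udp',
    ) or die "socket: $!";

    while (1) {
        my $req = '';
        $sock->recv($req, 2048) or next;        # PROTO_MAXMESG
        my ($version, $id, $op, $status) = unpack('n4', $req);
        next unless $version == 2 && $op == 1;  # version, op_lb_info_req
        my $reply = pack('n4 N3 n5 C2',
            $version, $id, $op, 1,              # status_ok
            time() - 3600, time(), time(),      # boot/current/user-mtime
            125, 110, 100,                      # l1, l5, l15 (load*100)
            5, 4,                               # tot_users, uniq_users
            0, 0);                              # on_console, reserved
        $sock->send($reply);                    # goes back to the poller
    }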

Configuring the load balancing name servers

In order for the load balancing name server to answer requests it must be delegated a virtual domain to serve. This is normally done in the parent domain by adding NS records. In the Stanford.EDU domain the load balancing name server uses the best.Stanford.EDU domain, so in the DNS configuration files for the Stanford.EDU domain there are two NS records:
best          IN      NS      dsodb.Stanford.EDU.
best          IN      NS      sunlight.Stanford.EDU.
These two NS records delegate the best.Stanford.EDU domain to the lbnameds running on dsodb and sunlight. Now when the primary servers for the Stanford.EDU domain get a request like elaine.best.Stanford.EDU, they know to hand it off to the lbnameds. Note that the two lbnameds don't communicate with each other; they operate independently of each other, for simplicity and redundancy.

Availability

The code is available using the following URL:

http://www-leland.stanford.edu/~schemers/dist/lb.tar

Use the code at your own risk. We have been using it for over a year and a half with good results.

Appendix

Weight formula

The formula used to determine the weight of a host is:
    fudge = (tot_user - uniq_user)*20;
    weight = (uniq_user)*100 + (3*l1) + fudge;
Where the variables are:

    tot_user     total number of users logged in
    uniq_user    number of unique users logged in
    l1           the load average over the last minute, multiplied by 100
The formula tries to favor hosts with the fewest unique logins and the lowest load averages. It has worked well (i.e., better than nothing) but could probably be improved.
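
Plugging in some sample numbers: a host with 5 total logins by 4 unique
users and a one-minute load average of 1.25 (so l1 = 125) gets
fudge = (5-4)*20 = 20 and weight = 4*100 + 3*125 + 20 = 795. As a small
Perl sub:

    sub weight {
        my ($tot_user, $uniq_user, $l1) = @_;
        my $fudge = ($tot_user - $uniq_user) * 20;
        return $uniq_user * 100 + 3 * $l1 + $fudge;
    }
    print weight(5, 4, 125), "\n";   # prints 795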

Poller configuration file

The poller configuration file tells the poller which hosts to poll, and which dynamic groups those hosts are in. The format is simple:
   host    weight-multiplier   group1 [group2 ...]
The weight-multiplier field is currently not used but will be used in the future to allow for better selection among different hardware in the same group.

The following is a sample poller configuration file with some lines removed to save space.

#
# groups
# -------------------------
# sweet     all machines
# elaine    elaine1-elaine57
# sparc     elaine1-elaine57
# sunos     elaine1-elaine57
# sparc2    sparc2 (elaine1-elaine19)
# sparc1    sparc1 (elaine20-elaine57)
# adelbert  adelbert1-adelbert26
# ultrix    adelbert1-adelbert26
# dec       adelbert1-adelbert26
# dec5000   adelbert1-adelbert13
# dec3100   adelbert14-adelbert26
# rs        rs1-rs10
# rs6000    rs1-rs10
# aix	    rs1-rs10
#
rs1    1    rs rs6000 aix 
rs2    1    rs rs6000 aix 
...
rs10   1    rs rs6000 aix 
#
elaine1    1    elaine sparc2 sparc sunos sweet
elaine2    1    elaine sparc2 sparc sunos sweet
...
elaine19   1    elaine sparc2 sparc sunos sweet
#
elaine20   1    elaine sparc1 sparc sunos sweet
elaine21   1    elaine sparc1 sparc sunos sweet
...
elaine57   1    elaine sparc1 sparc sunos sweet
#
adelbert1  1    adelbert dec5000 dec ultrix sweet
adelbert2  1    adelbert dec5000 dec ultrix sweet
...
adelbert13  1    adelbert dec5000 dec ultrix sweet
#
adelbert14  1    adelbert dec3100 dec ultrix sweet
...
adelbert26  1    adelbert dec3100 dec ultrix sweet
#

lbnamed configuration file

The lbnamed configuration file tells lbnamed what the weight of each host is, what its IP address is, and which dynamic groups a host is in. The format is simple:
   weight host ipaddress group1 [group2 ...]

The following is a sample lbnamed configuration file with some lines removed to save space.

2200 elaine11 36.214.0.127 elaine sparc2 sparc sunos sweet
639 adelbert10 36.211.0.81 adelbert dec5000 dec ultrix sweet
651 elaine20 36.215.0.208 elaine sparc1 sparc sunos sweet
2336 elaine3 36.212.0.119 elaine sparc2 sparc sunos sweet
...
866 adelbert6 36.211.0.76 adelbert dec5000 dec ultrix sweet
243 adelbert26 36.212.0.201 adelbert dec3100 dec ultrix sweet

protocol

The protocol between the poller and the client daemon is fairly simple. Everything is in network byte order. I used UDP so I could easily send out multiple polls at the same time and receive the responses asynchronously. The packet format (described by C structures) is:

#define PROTO_PORTNUM 4330 
#define PROTO_MAXMESG 2048    /* max udp message to receive */
#define PROTO_VERSION 2


typedef enum P_OPS {
    op_lb_info_req             =1,  /* load balance info, request and reply */
} p_ops_t;

typedef enum P_STATUS {
    status_request             =0,  /* a request packet */
    status_ok                  =1,  /* ok */
    status_error               =2,  /* generic error */
    status_proto_version       =3,  /* protocol version error */
    status_proto_error         =4,  /* any other protocol error */
    status_unknown_op          =5,  /* unknown operation requested */
} p_status_t;

typedef struct {
  u_short   version;  /* protocol version */
  u_short   id;       /* requester's unique request id */
  u_short   op;       /* operation requested */
  u_short   status;   /* set on reply */
} P_HEADER,*P_HEADER_PTR;

typedef struct {
  P_HEADER h;
  u_int boot_time;
  u_int current_time;
  u_int user_mtime;  /* time user information last changed */
  u_short l1; /* (int) (load*100) */
  u_short l5;
  u_short l15;
  u_short tot_users;  /* total number of users logged in */
  u_short uniq_users; /* number of unique users */
  u_char  on_console; /* true if someone is on the console */
  u_char  reserved;   /* future use, padding... */
} P_LB_RESPONSE, *P_LB_RESPONSE_PTR;
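
On the Perl side these structures are built and taken apart with pack
and unpack. A sketch (the id value is arbitrary, and the dummy response
below stands in for a packet actually received over UDP):

    #!/usr/bin/perl
    # Pack a request header and unpack a reply using templates that
    # mirror the structs above (all network byte order).
    use strict;
    use warnings;

    my $request = pack('n4',
        2,     # version = PROTO_VERSION
        42,    # id      = an arbitrary request id
        1,     # op      = op_lb_info_req
        0);    # status  = status_request

    # A P_LB_RESPONSE is the header + 3 u_ints + 5 u_shorts + 2 u_chars.
    my $response = $request . pack('N3 n5 C2', (0) x 10);  # dummy bytes
    my ($version, $id, $op, $status,
        $boot_time, $current_time, $user_mtime,
        $l1, $l5, $l15, $tot_users, $uniq_users,
        $on_console, $reserved) = unpack('n4 N3 n5 C2', $response);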

The protocol was meant to be extensible but I have yet to use the daemon for anything but load balancing requests.


Last Modified: 8/11/94

Roland J. Schemers III, schemers@slapshot.stanford.edu