README.txt

This Poller class demonstrates access to poll(2) functionality in Java.
It requires the Solaris production (native threads) JDK 1.2 or later;
currently the C code compiles only on Solaris (SPARC and Intel).

Poller.java is the class, Poller.c is the supporting JNI code.
PollingServer.java is a sample application which uses the Poller class
to multiplex sockets. SimpleServer.java is the functional equivalent
that does not multiplex but uses a single thread to handle each client
connection. Client.java is a sample application to drive against either
server.

To build the Poller class and client/server demo :

    javac PollingServer.java Client.java
    javah Poller
    cc -G -o libpoller.so -I ${JAVA_HOME}/include \
       -I ${JAVA_HOME}/include/solaris Poller.c

You will need to set the environment variable LD_LIBRARY_PATH to search
the directory containing libpoller.so.

To use the client/server demo, raise your file descriptor limit to
handle the number of connections you want (root access is needed to go
beyond 1024). For information on changing your file descriptor limit,
type "man limit".

If you are using Solaris 2.6 or later, a regression in loopback read()
performance may hit you at low numbers of connections, so run the
client on another machine.

Basics of Poller class usage (run "javadoc Poller" or see Poller.java
for more details) :

    {
        Poller Mux = new Poller(65535); // allow it to contain 64K IO objects

        int fd1 = Mux.add(socket1, Poller.POLLIN);
        ...
        int fdN = Mux.add(socketN, Poller.POLLIN);

        int[] fds = new int[100];
        short[] revents = new short[100];

        int numEvents = Mux.waitMultiple(100, fds, revents, timeout);

        for (int i = 0; i < numEvents; i++) {
            /*
             * Probably need more sophisticated mapping scheme than this!
             */
            if (fds[i] == fd1) {
                System.out.println("Got data on socket1");
                socket1.getInputStream().read(byteArray);
                // Do something based upon state of fd1 connection
            }
            ...
        }
    }

Poller class implementation notes :

Currently the add(), remove(), isMember(), and waitMultiple() methods
are all synchronized for each Poller object. If one thread is blocked
in pObj.waitMultiple(), another thread calling pObj.add(fd) will block
until waitMultiple() returns.

There is no provided mechanism to interrupt waitMultiple(), as one
might expect a ServerSocket to be in the list waited on (see
PollingServer.java). One might also need to interrupt waitMultiple()
in order to remove() fds/sockets, in which case one could create a
Pipe or loopback localhost connection (at the level of PollingServer)
and write() to that connection to interrupt (a sketch of this idea
appears after these notes). Or, better, one could queue up deletions
until the next return of waitMultiple(). Or one could implement an
interrupt mechanism in the JNI C code using a pipe(), and expose it at
the Java level.

If frequent deletions/re-additions of socks/fds are to be done with
very large sets of monitored fds, the Solaris 7 kernel poll cache will
likely perform poorly without some tuning. One could differentiate
between deleted (no longer cared for) fds/socks and those that are
merely being disabled while data is processed on their behalf. In that
case, re-enabling a disabled fd/sock could put it back in its original
position in the poll array, thereby improving kernel cache performance.
This would best be done in Poller.c. Of course, none of this is
necessary for optimal /dev/poll performance.
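As an illustration only (not part of the demo code), here is one way the
loopback-write wakeup mentioned above might look at the level of
PollingServer. The add()/Poller.POLLIN usage follows the example earlier
in this file; the class and method names (PollerWakeup, wakeup(),
drainIfWakeup()) are hypothetical.

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.InetAddress;
    import java.net.ServerSocket;
    import java.net.Socket;

    /*
     * Sketch: one end of a localhost connection is registered with the
     * Poller; any thread can then write() a byte to the other end to
     * force waitMultiple() to return early.
     */
    public class PollerWakeup {
        private final Socket pollSide;        // registered with the Poller
        private final OutputStream wakeSide;  // written to by other threads
        private final int wakeFd;

        public PollerWakeup(Poller mux) throws Exception {
            ServerSocket ss =
                new ServerSocket(0, 1, InetAddress.getByName("127.0.0.1"));
            Socket writer = new Socket(ss.getInetAddress(), ss.getLocalPort());
            pollSide = ss.accept();
            ss.close();
            wakeSide = writer.getOutputStream();
            wakeFd = mux.add(pollSide, Poller.POLLIN);
        }

        /* Called from any thread that needs waitMultiple() to return. */
        public void wakeup() throws IOException {
            wakeSide.write(1);
            wakeSide.flush();
        }

        /* In the poll loop: returns true (and drains the byte) if this
         * fd is the wakeup connection rather than a real client. */
        public boolean drainIfWakeup(int fd) throws IOException {
            if (fd != wakeFd)
                return false;
            InputStream in = pollSide.getInputStream();
            in.read(new byte[16]);   // clear the readable state
            return true;
        }
    }

In the loop over numEvents in the usage example above, the polling
thread would call drainIfWakeup(fds[i]) and, when it returns true,
perform whatever queued add()/remove() work triggered the wakeup.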
Caution : the next paragraph gets a little technical for the benefit of
those who already understand poll()ing fairly well. Others may choose
to skip ahead to the notes on the demo server.

An optimal solution for frequent enabling/disabling of socks/fds could
involve a separately synchronized structure of "async" operations.
Using a simple array (0..64k) containing the action (ADD, ENABLE,
DISABLE, NONE), the events, and the index into the poll array, and
having nativeWait() wake up in the poll() call periodically to process
these async operations, I was able to speed up the PollingServer by a
factor of 2x at 8000 connections. Much of that gain came from the fact
that I could (with the advent of an asyncAdd() method) move the
accept() loop into a separate thread from the main poll() loop, and
avoid the overhead of calling poll() with up to 7999 fds just for an
accept. In implementing the async Disable/Enable, a further large
optimization was to auto-disable fds with events available (before
returning from nativeWait()), so I could just call asyncEnable(fd)
after processing (read()ing) the available data. This removed the need
for the inefficient gang scheduling the attached PollingServer uses.
In order to synchronize the async structure separately, yet still be
able to operate on it from within nativeWait(), the synchronization had
to be done at the C level. Because of the complexity this introduced,
and because it was tuned specifically for the Solaris 7 poll()
improvements (not /dev/poll), that extra logic was left out of this
demo.
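Since that C-level logic is not in the demo, the closest Java-level
approximation of the idea is simply to queue requests and have the
polling thread apply them after each waitMultiple() return (a wakeup
write, as sketched earlier, bounds the latency). The sketch below only
illustrates that bookkeeping: the ADD/REMOVE names echo the paragraph
above, the Poller.remove(int) signature is an assumption (run "javadoc
Poller" for the real one), and true ENABLE/DISABLE support would have
to be added in Poller.c as described.

    import java.util.ArrayList;
    import java.util.List;

    /*
     * Sketch: other threads record add/remove requests here; the polling
     * thread applies them between waitMultiple() calls, so no thread
     * blocks against the synchronized Poller methods during a poll.
     */
    public class AsyncOps {
        private static final int ADD = 0, REMOVE = 1;

        private static class Op {
            int action;
            Object sock;    // for ADD
            int fd;         // for REMOVE
            short events;
            Op(int action, Object sock, int fd, short events) {
                this.action = action;
                this.sock = sock;
                this.fd = fd;
                this.events = events;
            }
        }

        private final List pending = new ArrayList();  // guarded by 'this'

        public synchronized void asyncAdd(Object sock, short events) {
            pending.add(new Op(ADD, sock, -1, events));
        }

        public synchronized void asyncRemove(int fd) {
            pending.add(new Op(REMOVE, null, fd, (short) 0));
        }

        /* Called only by the polling thread, after waitMultiple() returns. */
        public void apply(Poller mux) throws Exception {
            Op[] ops;
            synchronized (this) {
                ops = (Op[]) pending.toArray(new Op[pending.size()]);
                pending.clear();
            }
            for (int i = 0; i < ops.length; i++) {
                if (ops[i].action == ADD)
                    mux.add(ops[i].sock, ops[i].events);
                else
                    mux.remove(ops[i].fd);     // signature assumed
            }
        }
    }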
Client/Server Demo Notes :

Do not run the sample client/server with high numbers of connections
unless you have a lot of free memory on your machine, as the demo can
saturate the CPU and lock you out of CDE simply because of its
resource-intensive nature (much more so the SimpleServer than the
PollingServer).

Different OS versions behave very differently with respect to poll()
performance (or the existence of /dev/poll), but, generally, real-world
applications "hit the wall" much earlier when a separate thread is used
to handle each client connection: issues of thread synchronization and
locking granularity become performance killers. There is some overhead
associated with multiplexing, such as keeping track of the state of
each connection; as the number of connections gets very large, however,
this overhead is more than made up for by the reduced synchronization
overhead.

As an example, running the servers on a Solaris 7 PC (2 x Pentium
II-350 CPUs) with 1 GB RAM, and the client on an Ultra-2, I got the
following times (shorter is better) :

    1000 connections : PollingServer took 11 seconds
                       SimpleServer  took 12 seconds

    4000 connections : PollingServer took 20 seconds
                       SimpleServer  took 37 seconds

    8000 connections : PollingServer took 39 seconds
                       SimpleServer  took 1 minute 48 seconds

This demo is not, however, meant to be taken as proof that multiplexing
with the Poller class will gain you performance; this code is actually
heavily biased towards the non-polling server, as very little
synchronization is done and most of the overhead is in the kernel IO
for both servers. Use of multiplexing may be helpful in many, but
certainly not all, circumstances.

Benchmarking a major Java server application which can run either in a
single-thread-per-client mode or with the new Poller class showed that
Poller provided a 253% improvement in throughput at a moderate load, as
well as a 300% improvement in peak capacity. It also yielded a 21%
smaller memory footprint at the lower load level.

Finally, there is code in Poller.c to take advantage of /dev/poll on OS
versions that have that device; however, DEVPOLL must be defined when
compiling Poller.c (and it must be compiled on a machine with
/usr/include/sys/devpoll.h) to use it. Code compiled with DEVPOLL
turned on will still work on machines that lack kernel support for the
device, as it falls back to using poll() in those cases. Currently
/dev/poll does not correctly return an error if you attempt to remove()
an object that was never added, but this should be fixed in an upcoming
/dev/poll patch. The binary as shipped is not built with /dev/poll
support because our build machine does not have devpoll.h.
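If your code relies on remove() reporting an error to catch bookkeeping
mistakes, a membership check keeps behavior consistent whether or not
the native layer is using /dev/poll. This is only a sketch; the
isMember(int)/remove(int) signatures are assumptions, so run
"javadoc Poller" for the real ones.

    /*
     * Sketch: guard remove() with isMember() so the /dev/poll caveat
     * above does not hide a remove() of an fd that was never added.
     */
    public class SafeRemove {
        public static void remove(Poller mux, int fd) throws Exception {
            if (mux.isMember(fd)) {
                mux.remove(fd);
            } else {
                System.err.println("fd " + fd +
                                   " was never added to this Poller");
            }
        }
    }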