The select() and pselect() system calls are used to efficiently
monitor multiple file descriptors, to see if any of them is, or
becomes, "ready"; that is, to see whether I/O becomes possible,
or an "exceptional condition" has occurred on any of the file
descriptors.
This page provides background and tutorial information on the use
of these system calls. For details of the arguments and
semantics of select() and pselect(), see select(2).
Combining signal and data events
pselect() is useful if you are waiting for a signal as well as
for file descriptor(s) to become ready for I/O. Programs that
receive signals normally use the signal handler only to raise a
global flag. The global flag indicates that the event must
be processed in the main loop of the program. A signal causes
the select() (or pselect()) call to return -1 with errno set to
EINTR. This behavior is essential so that signals can be
processed in the main loop of the program; otherwise select()
would block indefinitely.
Now, somewhere in the main loop will be a conditional to check
the global flag. So we must ask: what if a signal arrives after
the conditional, but before the select() call? The answer is
that select() would block indefinitely, even though an event is
actually pending. This race condition is solved by the pselect()
call, which atomically replaces the signal mask for the duration
of the call, so that a chosen set of signals can be delivered
only while the process is waiting inside pselect().
For instance, let us say that the event in question was the exit
of a child process. Before the start of the main loop, we would
block SIGCHLD using sigprocmask(2). Our pselect() call would
enable SIGCHLD by using an empty signal mask. Our program would
look like:
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/select.h>

static volatile sig_atomic_t got_SIGCHLD = 0;

static void
child_sig_handler(int sig)
{
    (void) sig;                 /* unused */
    got_SIGCHLD = 1;
}

int
main(int argc, char *argv[])
{
    sigset_t sigmask, empty_mask;
    struct sigaction sa;
    fd_set readfds, writefds, exceptfds;
    int nfds = 0;   /* highest descriptor in any set, plus 1 */
    int r;

    /* Block SIGCHLD everywhere except inside pselect() */
    sigemptyset(&sigmask);
    sigaddset(&sigmask, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &sigmask, NULL) == -1) {
        perror("sigprocmask");
        exit(EXIT_FAILURE);
    }

    sa.sa_flags = 0;
    sa.sa_handler = child_sig_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) == -1) {
        perror("sigaction");
        exit(EXIT_FAILURE);
    }

    sigemptyset(&empty_mask);

    for (;;) {          /* main loop */
        /* Initialize readfds, writefds, and exceptfds (and nfds)
           before the pselect() call. (Code omitted.) */

        r = pselect(nfds, &readfds, &writefds, &exceptfds,
                    NULL, &empty_mask);
        if (r == -1 && errno != EINTR) {
            /* Handle error */
        }

        if (got_SIGCHLD) {
            got_SIGCHLD = 0;

            /* Handle signalled event here; e.g., wait() for all
               terminated children. (Code omitted.) */
        }

        /* main body of program */
    }
}
Practical
So what is the point of select()? Can't I just read and write to
my file descriptors whenever I want? The point of select() is
that it watches multiple descriptors at the same time and
properly puts the process to sleep if there is no activity. UNIX
programmers often find themselves in a position where they have
to handle I/O from more than one file descriptor where the data
flow may be intermittent. If you were to merely create a
sequence of read(2) and write(2) calls, you would find that one
of your calls may block waiting for data from/to a file
descriptor, while another file descriptor is unused though ready
for I/O. select() efficiently copes with this situation.
Select law
Many people who try to use select() come across behavior that is
difficult to understand and produces nonportable or borderline
results. For instance, the above program is carefully written
not to block at any point, even though it does not set its file
descriptors to nonblocking mode. It is easy to introduce subtle
errors that will remove the advantage of using select(), so here
is a list of essentials to watch for when using select().
1. You should always try to use select() without a timeout.
Your program should have nothing to do if there is no data
available. Code that depends on timeouts is not usually
portable and is difficult to debug.
2. The value nfds must be properly calculated for efficiency as
explained above.
3. No file descriptor should be added to any set unless you
intend to check its result after the select() call and
respond appropriately. See the next rule.
4. After select() returns, all file descriptors in all sets
should be checked to see if they are ready.
5. The functions read(2), recv(2), write(2), and send(2) do not
necessarily read/write the full amount of data that you have
requested. If they do read/write the full amount, it's
because you have a low traffic load and a fast stream. This
is not always going to be the case. You should cope with the
case of your functions managing to send or receive only a
single byte.
6. Never read/write only in single bytes at a time unless you
are really sure that you have a small amount of data to
process. It is extremely inefficient not to read/write as
much data as you can buffer each time. The buffers in the
example below are 1024 bytes although they could easily be
made larger.
7. Calls to read(2), recv(2), write(2), send(2), and select()
can fail with the error EINTR, and calls to read(2), recv(2),
write(2), and send(2) can fail with errno set to EAGAIN
(EWOULDBLOCK). These results must be properly managed (not
done properly above). If your program is not going to
receive any signals, then it is unlikely you will get EINTR.
If your program does not set nonblocking I/O, you will not
get EAGAIN.
8. Never call read(2), recv(2), write(2), or send(2) with a
buffer length of zero.
9. If the functions read(2), recv(2), write(2), and send(2) fail
with errors other than those listed in rule 7, or one of the
input functions returns 0, indicating end of file, then you
should not pass that file descriptor to select() again. In
the example below, I close the file descriptor immediately,
and then set it to -1 to prevent it being included in a set.
10. The timeout value must be initialized with each new call to
select(), since some operating systems modify the structure.
pselect(), however, does not modify its timeout structure.
11. Since select() modifies its file descriptor sets, if the call
is being used in a loop, then the sets must be reinitialized
before each call.
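Several of these rules can be seen together in a short sketch. The helper below, with the hypothetical name write_all(), copes with short writes (rule 5), retries on EINTR and waits in select() on EAGAIN (rule 7), never issues a zero-length write (rule 8), and rebuilds its descriptor set before every select() call (rule 11). It is an illustration under those assumptions, not the example program this page refers to.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes of buf to fd, which may
   be in blocking or nonblocking mode.  Returns 0 on success, or -1
   on failure. */
static int
write_all(int fd, const char *buf, size_t len)
{
    /* An fd_set can only hold descriptors below FD_SETSIZE */
    assert(fd >= 0 && fd < FD_SETSIZE);

    while (len > 0) {                   /* rule 8: never write 0 bytes */
        ssize_t n = write(fd, buf, len);

        if (n > 0) {                    /* rule 5: may be a short write */
            buf += n;
            len -= (size_t) n;
            continue;
        }
        if (n == -1 && errno == EINTR)  /* rule 7: interrupted, retry */
            continue;
        if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            fd_set writefds;

            /* rule 11: build the set afresh before each select() */
            FD_ZERO(&writefds);
            FD_SET(fd, &writefds);

            /* nfds is the highest descriptor in the set, plus 1 */
            if (select(fd + 1, NULL, &writefds, NULL, NULL) == -1
                    && errno != EINTR)
                return -1;
            continue;                   /* writable (or EINTR): retry */
        }
        return -1;                      /* rule 9: give up on this fd */
    }
    return 0;
}
```

When write_all() returns -1, the caller would follow rule 9: close the descriptor and keep it out of any later set.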