I am using non-blocking sockets to connect to a server.
In a specific test scenario, the server is down, which means a TCP SYN goes out, but there is no response and there can never be an established connection.

In this setup, select usually times out after 2 seconds and returns 0. This happens most of the time and seems correct.

However, in roughly 5% of the cases, select immediately returns 1 (indicating the socket is readable in the mask).
But when I then call read(2) on the socket, it returns -1 with the error 'Network is unreachable' (ENETUNREACH).

sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
// sockfd checked and > 0
// set non-blocking

struct timeval tv{};
tv.tv_sec = 2;

int ret = connect(sockfd, addr, addrlen ); // addr set elsewhere
if (ret < 0 && errno == EINPROGRESS)
{
    fd_set cset;
    FD_ZERO(&cset);
    FD_SET(sockfd, &cset);
    
    ret = select(sockfd + 1, &cset, nullptr, nullptr, &tv);
    // returns 1 sometimes
}

In the first post, I incorrectly stated that in the error case, there is only one TCP SYN on the network (without retries).
This is not true; in both the error and non-error case, there is a TCP SYN on the network that is re-sent after 1 second.

What might cause this, and is there a way to get consistent behavior from select?

curiousguy12

1 Answer

The correct way to determine whether a non-blocking connect() has finished is to ask select() for writability, not readability. This is clearly stated in the connect() documentation:

EINPROGRESS
The socket is nonblocking and the connection cannot be completed immediately. (UNIX domain sockets failed with EAGAIN instead.) It is possible to select(2) or poll(2) for completion by selecting the socket for writing. After select(2) indicates writability, use getsockopt(2) to read the SO_ERROR option at level SOL_SOCKET to determine whether connect() completed successfully (SO_ERROR is zero) or unsuccessfully (SO_ERROR is one of the usual error codes listed here, explaining the reason for the failure).

Testing a socket for readability with select()/poll() before you know the connection has been established does not reliably tell you whether connect() has finished.

Try this instead:

sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
// sockfd checked and >= 0 (0 is a valid descriptor)
// set non-blocking

int ret = connect(sockfd, addr, addrlen); // addr set elsewhere
if (ret < 0)
{
    if (errno != EINPROGRESS)
    {
        close(sockfd);
        sockfd = -1;
    }
    else
    {
        fd_set cset;
        FD_ZERO(&cset);
        FD_SET(sockfd, &cset);
    
        struct timeval tv{};
        tv.tv_sec = 2;

        ret = select(sockfd + 1, nullptr, &cset, nullptr, &tv);
        if (ret <= 0)
        {
            close(sockfd);
            sockfd = -1;
        }
        else
        {
            int errCode = 0;
            socklen_t len = sizeof(errCode);
            // if getsockopt() itself fails, treat its errno as the connect error
            if (getsockopt(sockfd, SOL_SOCKET, SO_ERROR, &errCode, &len) < 0)
                errCode = errno;

            if (errCode != 0)
            {
                close(sockfd);
                sockfd = -1;
            }
        }
    }
}

if (sockfd != -1)
{
    // use sockfd as needed (read(), etc) ...
    close(sockfd);
}
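
The same sequence (start the connect, wait for writability, then read SO_ERROR) can also be wrapped in a helper. The following is only a minimal sketch under a few assumptions: it uses poll(2) rather than select(), which the man page excerpt above names as an alternative; the name `connect_with_timeout` and its return convention (0 on success, `ETIMEDOUT` if the wait expires, otherwise an errno-style code) are hypothetical and not part of any library; and the retry on `EINTR` restarts the full timeout for simplicity.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <netinet/in.h>   // sockaddr_in, for callers building an IPv4 address
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper (illustration only): starts a non-blocking connect on fd
// and waits up to timeout_ms for it to finish.
// Returns 0 on success, ETIMEDOUT if the wait expired, or an errno-style code
// reported by fcntl()/connect()/poll()/SO_ERROR otherwise.
int connect_with_timeout(int fd, const sockaddr* addr, socklen_t addrlen, int timeout_ms)
{
    assert(fd >= 0);

    // Put the socket into non-blocking mode.
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return errno;

    if (connect(fd, addr, addrlen) == 0)
        return 0;                  // connected immediately
    if (errno != EINPROGRESS)
        return errno;              // failed immediately

    // Wait for writability, which signals that the attempt has completed
    // (successfully or not).
    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLOUT;

    int ret;
    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);   // simplification: restarts the full timeout

    if (ret < 0)
        return errno;
    if (ret == 0)
        return ETIMEDOUT;          // nothing happened within timeout_ms

    // The attempt completed; SO_ERROR says whether it succeeded.
    int err = 0;
    socklen_t len = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return errno;
    return err;
}
```

With this convention, a caller can tell the usual case from the question (the wait expires and the helper returns `ETIMEDOUT`) apart from a failure reported by the kernel, such as `ENETUNREACH`, which comes back as the return value. In either failure case the caller still owns `fd` and should close it.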
Remy Lebeau
  • Well of course you can test for readability, it will just indicate readability (i.e. data received from the peer) rather than a successfully established connection. – n. 1.8e9-where's-my-share m. May 31 '21 at 19:28
  • Thanks, I changed the code per your suggestions. `getsockopt` still returns `network unreachable` in about 5% of the connection attempts. The strange thing is that even with that error, I can see the TCP SYN go out on the network. I can try `ppoll`, but if this is a message from the kernel, it may trigger the same behavior – curiousguy12 May 31 '21 at 19:43
  • @curiousguy12 I never said a "network unreachable" error wouldn't happen. But you can't be sure the error is available until the socket reports writability, not readability. – Remy Lebeau May 31 '21 at 19:57
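
The distinction drawn in these comments, that writability and readability mean different things on a connected socket, can be illustrated with a small sketch. It rests on one assumption: a `socketpair()` of `AF_UNIX` stream sockets stands in for an established TCP connection, so no server is needed. The helper name `ready_events` is hypothetical.

```cpp
#include <cassert>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper (illustration only): returns the poll() revents bits
// for fd without blocking, or -1 if poll() itself fails.
int ready_events(int fd)
{
    assert(fd >= 0);

    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLIN | POLLOUT;   // ask about both directions at once
    if (poll(&pfd, 1, 0) < 0)        // timeout 0: just report the current state
        return -1;
    return pfd.revents;
}
```

On a freshly connected pair, `ready_events(sv[0])` is expected to report `POLLOUT` but not `POLLIN`; `POLLIN` only appears once the peer has written something to `sv[1]`. Readability therefore signals pending data (or a pending error or EOF), which is why it is not a suitable test for "the connect has finished".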