Discussion:
Ping Pong needed in TCP/IP protocol?
AdamMaynard24
2006-11-15 16:22:32 UTC
I'm using the WSAAsyncSelect() model for a server. Clients that
connect should be able to stay connected for many, many hours if they
like to leave their clients running. This is fine - it is a chat
server, and typically people will idle for days in some cases, so I
don't want to disconnect sockets based on idleness.

However, I would like to determine if a connection has stopped
functioning. If these are all TCP/IP connections using this
asynchronous server model, does TCP/IP detect connections that drop
due to network failures etc., and therefore generate an FD_CLOSE?

Or perhaps I need to ping all inactive clients after a set period of
inactivity, and if there's no reply, time out the connection and close
the socket?

Kind regards. Adam.
AdamMaynard24
2006-11-15 17:03:20 UTC
Also, another thing:

When a connection has been established and a client has been
accept()'ed by the server, the client will need to send its user data
(username etc.) across to the server.

Now, what happens if this send() never succeeds? I.e., on the client
side it seems to have been successful, but due to networking issues,
it never arrives.

I'm under the assumption that because TCP/IP "guarantees" delivery,
the socket would be closed, and I'm just checking whether that is true
here.

Adam.
Peter Duniho
2006-11-15 18:25:27 UTC
Post by AdamMaynard24
[...]
Now, what happens if this send() never succeeds? I.e., on the client
side it seems to have been successful, but due to networking issues,
it never arrives.
I'm under the assumption that because TCP/IP "guarantees" delivery,
the socket would be closed, and I'm just checking whether that is true
here.
It's not. It depends on the nature of the failure. If the connection is
actually disconnected, you'll get an error. But if the client fails to
respond because of some bug or something, all you'll know is that you never
got a reply. Since the network components didn't actually fail, no error
would be generated. To detect that sort of situation, you'll need to do
some timeout checking yourself.

Pete
AdamMaynard24
2006-11-16 16:31:29 UTC
OK, I see.

There is one other issue:

What about the reading on the server side: it needs to read everything
from a socket into a main_buffer, and then parse it and deal with each
separate command it has received, etc. What happens if it hasn't
received the full data of a command? Would it then shift this first
part into another buffer and continue, ready to add to it later when
the rest of the info comes through?

By this logic, it seems I would have to maintain a separate buffer for
each connected socket, correct?

Kind regards.
AdamMaynard24
2006-11-16 16:44:02 UTC
Thanks for the info about the PING/PONG, by the way - that's interesting.

I think I will implement this, but one thing I was wondering about on
this subject (which isn't mentioned in the RFC) is how the massive
number of PINGs would be coordinated on IRC.

If a server has 30K users connected, and on a 90-second timer they all
must be pinged, I was wondering about the logistics of trying to ping
30 thousand clients. An iterative ping?

Kind regards. Adam.
Peter Duniho
2006-11-16 21:40:08 UTC
Post by AdamMaynard24
[...]
If a server has 30K users connected, and on a 90-second timer they all
must be pinged, I was wondering about the logistics of trying to ping
30 thousand clients. An iterative ping?
"Iterative"? I don't know what you mean...by necessity, if you are to send
a "ping" command to every client, you'll have to iterate all of your
clients. If your server is capable of supporting 30K simultaneous chat
clients, then doing the iteration and sending the "ping" command to each one
should be no problem.
AdamMaynard24
2006-11-16 22:49:13 UTC
Post by Peter Duniho
"Iterative"? I don't know what you mean...by necessity, if you are to send
a "ping" command to every client, you'll have to iterate all of your
clients. If your server is capable of supporting 30K simultaneous chat
clients, then doing the iteration and sending the "ping" command to each one
should be no problem.
Yeah, I thought it might be a lot of data to send, but when you think
about it, it would be less than 30 KB :). I was thinking, though, that
iterating over 30K clients every 90 seconds might be wasteful. Maybe
not so with today's processing power.

Adam.
Peter Duniho
2006-11-16 23:07:17 UTC
Post by AdamMaynard24
Yeah, I thought it might be a lot of data to send, but when you think
about it, it would be less than 30 KB :). I was thinking, though, that
iterating over 30K clients every 90 seconds might be wasteful. Maybe
not so with today's processing power.
Well, I personally think that sending out a "ping" every 90 seconds is
wasteful. But that has more to do with network bandwidth. Your network
connection is your bottleneck, not the CPU.

You should send out a "ping" as frequently as you need to in order to detect
a disabled client within the time limit you feel is required. That is, if
you feel that you can allow a client to sit for 10 minutes before you need
to know they are idle, then send the "ping" every 10 minutes. If only 90
seconds can be allowed to pass before you need to know, then send the "ping"
every 90 seconds.

Also, keep in mind that you may or may not want to send out all the "pings"
at once. It's easiest to implement if you just iterate all of your clients
at once, but you may find that you get more reliable results and better
network performance by spreading the "pings" over the entire interval at
which they need to happen. So, 30K clients over 90 seconds means waiting
3 ms between each "ping" (of course, 3 ms isn't a lot of time, which
strongly suggests that if you insist on "pinging" every 90 seconds, you
won't have a lot of time left over to do other stuff).
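
For what it's worth, a minimal sketch of that kind of staggering in
Winsock-style C might look like this (the client list layout and the
send_ping() helper are just illustrative assumptions, not something
from this thread):

#include <winsock2.h>

#define PING_INTERVAL_MS 90000          /* the 90-second interval discussed above */

typedef struct client {
    SOCKET sock;
    struct client *next;
} client_t;

void send_ping(SOCKET s);               /* hypothetical: queues a "PING" line */

/* Called from a periodic timer tick (e.g. every 100 ms).  Walks a little
 * further along the client list each tick, so the whole list is covered
 * once per PING_INTERVAL_MS instead of in one burst. */
void ping_next_slice(client_t **cursor, client_t *head,
                     int total_clients, int tick_ms)
{
    /* how many clients must be pinged this tick to keep pace */
    int per_tick = (total_clients * tick_ms + PING_INTERVAL_MS - 1)
                   / PING_INTERVAL_MS;

    while (per_tick-- > 0) {
        if (*cursor == NULL)
            *cursor = head;             /* wrap around to the start of the list */
        if (*cursor == NULL)
            return;                     /* no clients connected at all */
        send_ping((*cursor)->sock);
        *cursor = (*cursor)->next;
    }
}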

Of course, don't forget that the more often you "ping", the more likely it
is you'll detect a communications problem that was only temporary and
non-fatal in nature. Personally, I think it's likely that "pings" are
entirely overused and that there are better ways to address the particular
issues related to them. But if you do decide to use them, at least make
sure you understand all the implications related to using them.

Pete
David Schwartz
2006-11-17 00:48:13 UTC
Post by AdamMaynard24
I will implement this I think, but one thing that I was wondering on
this subject (which isn't mentioned in the RFC), is how the massive
PINGs would be coordinated on IRC?
If a server has 30K users connected, and on a 90-second timer they all
must be pinged, I was wondering about the logistics of trying to ping
30 thousand clients. An iterative ping?
There are a lot of ways to do it. You don't have to ping all 30,000
clients in one pass every 90 seconds. You could, for example, divide
the clients randomly into 90 groups and ping all the clients in one
group every second.
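
As a rough illustration (the names and the hash-by-socket-handle
grouping are illustrative assumptions, not how any real server does
it), the group approach could be sketched like this:

#include <winsock2.h>

#define PING_GROUPS 90

typedef struct client {
    SOCKET sock;
    struct client *next;
} client_t;

static client_t *group[PING_GROUPS];    /* one list of clients per group */

void send_ping(SOCKET s);               /* hypothetical: queues a "PING" line */

/* Put a newly accepted client into one of the 90 groups. */
void add_to_ping_group(client_t *c)
{
    int g = (int)(c->sock % PING_GROUPS);   /* crude but reasonably even spread */
    c->next = group[g];
    group[g] = c;
}

/* Called once per second: ping everyone in the current group, then move
 * on, so each client gets pinged once every PING_GROUPS seconds. */
void ping_current_group(void)
{
    static int current = 0;
    client_t *c;

    for (c = group[current]; c != NULL; c = c->next)
        send_ping(c->sock);
    current = (current + 1) % PING_GROUPS;
}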

ConferenceRoom tracks each client with an individual timer. When the
timer fires, a thread is dispatched to handle that one client. The ping
logic is independent for each client, so unless a client actually pings
out or the attempt to send the ping gets an error or something like
that, no server locks are touched (except to lock that individual
client so it can't be deleted while it's being pinged!). So if you have
more than one CPU/core, the cost is very low.

In a low-load, low-security environment, turning on keepalives may be
sufficient, so long as you place some reasonable limit on the send
queue. It may take as long as two hours to detect a lost network
connection, and a hung client on a live machine may not be detected
reliably.

DS

Peter Duniho
2006-11-16 21:38:16 UTC
Post by AdamMaynard24
[...]
By this logic, it seems I would have to maintain a separate buffer for
each connected socket, correct?
Yes...for each connection, you'll have to have some place to temporarily
store received data until you have enough to process. If you are able to
write your network code so that it receives the data directly into that
temporary storage, so much the better (since doing so would avoid at least
one copy operation with the data).
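
A minimal sketch of that per-connection buffering in C, assuming
newline-terminated commands and a hypothetical handle_command() (both
are assumptions for illustration, not something from the original
post):

#include <winsock2.h>
#include <string.h>

#define RECV_BUF_SIZE 4096

typedef struct connection {
    SOCKET sock;
    char   buf[RECV_BUF_SIZE];
    int    used;                 /* bytes currently held in buf */
} connection_t;

void handle_command(connection_t *c, const char *cmd, int len);  /* hypothetical */

/* Called on FD_READ: recv() directly into the per-connection buffer,
 * then peel off every complete (newline-terminated) command.  A real
 * server would also reject commands longer than the buffer. */
int on_readable(connection_t *c)
{
    int start = 0;
    int i;
    int n = recv(c->sock, c->buf + c->used, RECV_BUF_SIZE - c->used, 0);

    if (n <= 0)
        return -1;               /* error or orderly close: caller closes socket */
    c->used += n;

    for (i = 0; i < c->used; i++) {
        if (c->buf[i] == '\n') {
            handle_command(c, c->buf + start, i - start);
            start = i + 1;
        }
    }

    /* shift any partial command to the front; it completes on a later read */
    memmove(c->buf, c->buf + start, c->used - start);
    c->used -= start;
    return 0;
}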

Pete
David Schwartz
2006-11-17 00:43:57 UTC
Post by AdamMaynard24
What about the reading on the server side: it needs to read everything
from a socket into a main_buffer, and then parse it and deal with each
separate command it has received, etc. What happens if it hasn't
received the full data of a command? Would it then shift this first
part into another buffer and continue, ready to add to it later when
the rest of the info comes through?
By this logic, it seems I would have to maintain a separate buffer for
each connected socket, correct?
You generally maintain both a send buffer and a receive buffer for
every client. ConferenceRoom has an optimization that the send buffer
is actually a queue of logical messages, so if a message is sent to
thirty users, a reference-counted single instance is put on thirty send
queues.
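
A rough sketch of that kind of reference-counted send-queue entry (the
names and layout are illustrative guesses, not ConferenceRoom's, and
locking is ignored):

#include <stdlib.h>
#include <string.h>

/* One shared, reference-counted copy of the message text. */
typedef struct message {
    int  refcount;
    int  len;
    char data[1];                /* allocated large enough to hold the text */
} message_t;

/* One node in a single client's send queue, pointing at the shared text. */
typedef struct queued_msg {
    message_t         *msg;
    struct queued_msg *next;
} queued_msg_t;

message_t *message_create(const char *text)
{
    int len = (int)strlen(text);
    message_t *m = malloc(sizeof(message_t) + len);
    m->refcount = 0;
    m->len = len;
    memcpy(m->data, text, len);
    return m;
}

/* Make a queue node referencing the shared message; the caller links it
 * into that one client's send list. */
queued_msg_t *queue_message(message_t *m)
{
    queued_msg_t *q = malloc(sizeof(*q));
    q->msg = m;
    q->next = NULL;
    m->refcount++;               /* one reference per send queue holding it */
    return q;
}

/* Drop one reference once the bytes have gone out to that client. */
void message_release(message_t *m)
{
    if (--m->refcount == 0)
        free(m);
}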

DS
Peter Duniho
2006-11-15 18:23:49 UTC
[...] I would like to determine if a connection has stopped
functioning. If these are all TCP/IP connections using this
asynchronous server model, does TCP/IP detect connections that drop
due to network failures etc., and therefore generate an FD_CLOSE?
Or perhaps I need to ping all inactive clients after a set period of
inactivity, and if there's no reply, time out the connection and close
the socket?
Depends on what sort of behavior you want. IMHO, you should not use "keep
alive" techniques, because you specifically say it's okay for a connection
to live for a very long time and you don't mind an idle connection. It is
theoretically possible for a connection to be disrupted momentarily and
then, as long as no attempt to use the connection is made in the
meantime, to be reestablished with neither end ever noticing a
problem. Why go out of your
way to break this behavior if you have no good reason to?

So, in other words, if something unrecoverable goes wrong with the
connection, you'll get an error. That would be the time to detect the error
and close the socket. Otherwise, I don't see a need to explicitly shutdown
the connection.

If you are concerned about potentially broken connections not being detected
at all and causing your server to get filled with defunct connections, then
sure, it might make sense to send data every so often, to check to see if
the connection is still valid. Of course, if you do this, you could wind up
causing a connection that was only temporarily disabled to be flagged as broken, even
though the user didn't intend that to happen and the connection would have
been reestablished soon. So you're back to the above...why break it if you
don't have to?

It may make more sense to simply expire your clients on an LRU basis, as you
find yourself running out of resources to deal with new clients.

Pete
David Schwartz
2006-11-15 23:04:22 UTC
Post by Peter Duniho
It may make more sense to simply expire your clients on an LRU basis, as you
find yourself running out of resources to deal with new clients.
No, never do that. That makes an attack maximally effective as all the
legitimate connections are displaced by the bogus ones. During an
attack, the longer a connection has been maintained, the more likely
it's legitimate. Why wipe out the most legitimate connections first?!

DS
Peter Duniho
2006-11-16 03:57:09 UTC
Post by David Schwartz
Post by Peter Duniho
It may make more sense to simply expire your clients on an LRU basis, as you
find yourself running out of resources to deal with new clients.
No, never do that. That makes an attack maximally effective as all the
legitimate connections are displaced by the bogus ones. During an
attack, the longer a connection has been maintained, the more likely
it's legitimate. Why wipe out the most legitimate connections first?!
You are right about defending against an attack. That said, we are talking
about connections that have been left idle for over a day. It did not occur
to me that I needed to be explicit about every small detail of the
algorithm. Obviously, one would not be expiring connections that are, for
all intents and purposes, still active.

The fact is, by having *no* mechanism to expire idle connections, a similar
attack is just as viable. Normal users are likely to eventually disconnect.
One can reduce the available connections to nearly nothing right away, and
then as the remaining legitimate users disconnect, grab those available ones
as well.

I agree that a naively implemented "expire LRU" algorithm is even easier to
exploit, but in reality no expiration algorithm should be implemented in a
naive way. Any mechanism that a server uses to automatically disconnect
users can be coopted unless some measures are taken to limit the effect of
an attack.

Pete
David Schwartz
2006-11-16 07:18:13 UTC
Post by Peter Duniho
You are right about defending against an attack. That said, we are talking
about connections that have been left idle for over a day. It did not occur
to me that I needed to be explicit about every small detail of the
algorithm. Obviously, one would not be expiring connections that are, for
all intents and purposes, still active.
Ahh, I get you now. Somehow I misunderstood 'LRU' to mean oldest, as
opposed to longest inactive. Still, though, an old connection is the
least likely to be part of an attack.
Post by Peter Duniho
The fact is, by having *no* mechanism to expire idle connections, a similar
attack is just as viable. Normal users are likely to eventually disconnect.
One can reduce the available connections to nearly nothing right away, and
then as the remaining legitimate users disconnect, grab those available ones
as well.
Right, but by having no mechanism to expire *old* connections, no
attacker can displace legitimate users who remain connected. In
general, being unable to take new connections is no big deal for a chat
server, but dumping existing ones is very disruptive. Of course, that's
very application specific.
Post by Peter Duniho
I agree that a naively implemented "expire LRU" algorithm is even easier to
exploit, but in reality no expiration algorithm should be implemented in a
naive way. Any mechanism that a server uses to automatically disconnect
users can be coopted unless some measures are taken to limit the effect of
an attack.
Definitely.

DS
David Schwartz
2006-11-15 23:09:54 UTC
Post by AdamMaynard24
I'm using the WSAAsyncSelect() model for a server. Clients that
connect should be able to stay connected for many, many hours if they
like to leave their clients running. This is fine - it is a chat
server, and typically people will idle for days in some cases, so I
don't want to disconnect sockets based on idleness.
However, I would like to determine if a connection has stopped
functioning. If these are all TCP/IP connections using this
asynchronous server model, does TCP/IP detect connections that drop
due to network failures etc., and therefore generate an FD_CLOSE?
Or perhaps I need to ping all inactive clients after a set period of
inactivity, and if there's no reply, time out the connection and close
the socket?
The short answer is that this should be part of your server's security
model and is not a problem handled well in isolation. Dealing with idle
clients is just a small part of the problems you have to deal with if
you want to operate a chat server that might be subject to illegitimate
connections.

The textbook way to do it is to send every client an application-level
query every so often. The query should include some kind of random
unique token. The client must return the token to the server within a
certain amount of time or the client is disconnected.

This has many side-benefits. It not only catches broken clients and
dead network connections, but it also catches unidirectional clients and
clients that are maliciously trying to fill your send queue. If they
can't or won't receive the token, they can't send it back to you, and
hence will get disconnected.

IRC servers usually send something like:

PING :2109745097

every 90 seconds or so, and the client must respond

PONG :2109745097

before the next 'PING' is sent.
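
To make the timing concrete, here is a small sketch of that token check
(the field names, send_line(), drop_client() and the use of rand() are
all illustrative assumptions; a real server would want a stronger token
source):

#include <winsock2.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

typedef struct client {
    SOCKET        sock;
    unsigned long ping_token;     /* token we expect back; 0 = none pending */
    time_t        ping_deadline;  /* when an unanswered PING times out */
} client_t;

void send_line(client_t *c, const char *line);   /* hypothetical: queues "line\r\n" */
void drop_client(client_t *c);                   /* hypothetical: closes the socket */

void send_ping(client_t *c)
{
    char line[64];

    c->ping_token = ((unsigned long)rand() << 16) ^ (unsigned long)rand();
    c->ping_deadline = time(NULL) + 90;           /* 90-second window, as above */

    sprintf(line, "PING :%lu", c->ping_token);
    send_line(c, line);
}

/* Call this when a "PONG :<token>" line arrives from the client. */
void on_pong(client_t *c, const char *arg)
{
    if (c->ping_token != 0 && strtoul(arg, NULL, 10) == c->ping_token)
        c->ping_token = 0;        /* answered in time, nothing pending */
}

/* Call this periodically for each client. */
void check_ping_timeout(client_t *c)
{
    if (c->ping_token != 0 && time(NULL) >= c->ping_deadline)
        drop_client(c);           /* no matching PONG within the window */
}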

For situations where malicious connections are unlikely and there are
no real security concerns, enabling TCP keepalives and putting a limit
on the send queue is probably sufficient.
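
If you go the keepalive route, a minimal Winsock sketch looks something
like this (the 10-minute/10-second timings are arbitrary examples, and
the send-queue limit mentioned above is separate application logic not
shown here):

#include <winsock2.h>
#include <mstcpip.h>

int enable_keepalive(SOCKET s)
{
    BOOL on = TRUE;
    struct tcp_keepalive ka;
    DWORD bytes = 0;

    if (setsockopt(s, SOL_SOCKET, SO_KEEPALIVE,
                   (const char *)&on, sizeof(on)) == SOCKET_ERROR)
        return -1;

    /* optional: probe after 10 minutes idle, then retry every 10 seconds,
     * instead of waiting the default couple of hours */
    ka.onoff = 1;
    ka.keepalivetime = 10 * 60 * 1000;   /* ms of idle time before first probe */
    ka.keepaliveinterval = 10 * 1000;    /* ms between probes */
    if (WSAIoctl(s, SIO_KEEPALIVE_VALS, &ka, sizeof(ka),
                 NULL, 0, &bytes, NULL, NULL) == SOCKET_ERROR)
        return -1;

    return 0;
}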

DS