Agile Java Man: java NIO

Showing posts with label java NIO. Show all posts

Sunday, July 19, 2020

Cancellation idioms

Java IO interrupt refresher

"The InterruptibleChannel interface is a marker that, when implemented by a channel, indicates that the channel is interruptible... Most, but not all, channels are interruptible.

"Channels introduce some new behaviors related to closing and interrupts. If a channel implements the InterruptibleChannel interface, then it's subject to the following semantics. If a thread is blocked on a channel, and that thread is interrupted (by another thread calling the blocked thread's interrupt() method), the channel will be closed, and the blocked thread will be sent a ClosedByInterruptException. Additionally, if a thread's interrupt status is set, and that thread attempts to access a channel, the channel will immediately be closed, and the same exception will be thrown." [Java NIO, Hitchens]

Summarising, if a thread is interrupted before or during a blocking call on a channel, the channel is closed and an exception is thrown.

"It may seem rather draconian to shut down a channel just because a thread sleeping on that channel was interrupted. But this is an explicit design decision made by the NIO architects. Experience has shown that it's impossible to reliably handle interrupted I/O operations consistently across all operating systems."

"Interruptible channels are also asynchronously closable. A channel that implements InterruptibleChannel can be closed at any time, even if another thread is blocked waiting for an I/O to complete on that channel. When a channel is closed, any threads sleeping on that channel will be awakened and receive an AsynchronousCloseException. The channel will then be closed and will be no longer usable." [ibid]

The problems with Java

Daniel Spiewak @djspiewak Apr 24 19:47, 2020
There are a couple things with thread interruption that are horrible:

There's no way to build "uninterruptible" code. Meaning that you cannot have a critical section which acquires a resource atomically in several steps. Or in other words, there is no analogue to the acquire action in bracket.

The only way to detect self-cancelation for valid purposes (e.g. resource cleanup) is catching the InterruptedException, but doing this immediately flips the interrupted bit on the Thread back to false!

The only solution to this is to do Thread.currentThread().interrupt() at the end of your exception handler, which almost no one knows to do. To make matters more annoying, even if you do this correctly, you mess up the stack trace on the interruption, because it's technically a new interrupt.
Oh, and exception handlers are not critical regions either, so if someone is just hammering the interrupt() button over and over externally, you could catch the exception, try to clean things up, and then get immediately interrupted again. This actually happens a lot because of the next point.

Catching Exception or Error will silently catch InterruptedException, even when that's almost guaranteed to not be what you want to do. This leads to silently ignoring interruption in most code paths, which is why people repeatedly hammer interrupt() in the first place.

The problems also go deeper than just Java but down to the OS level. "The underlying stream may not know it its closed until you attempt to write to it (e.g. if the other end of a socket closes it)" [SO]. "There's no API for determining whether a stream has been closed." [SO]

The semantics of cancelling

This is a complicated area. (see the interruption model proposed for Cats Effects 3 at https://github.com/typelevel/cats-effect/issues/681)

Basically interruptible/uninterruptible are not composable. The best way to think about it is that "interruptable means always accept the interrupt, no matter what", while "uninterruptible means always suppress the interrupt no matter what". But the uninterruptible(fa >> interruptible(fb) >> fc) breaks either one of the guarantees.
So you have to choose: do you want resource leaks (by biasing in favor of the innermost in that context), or do you want possible deadlocks (by biasing in favor of the outermost)?
And you can't even phrase it as inner/outer, because you can do the same thing in reverse: uninterruptible(interruptible(fa >> uninterruptible(fb) >> fc))
..
as prior art here, Haskell tried all of these and ultimately decided that mask/poll was the sanest solution [Daniel Spiewak, Gitter]

An alternative architecture

"it's a mistake to think of interruptibility as being an attribute of threads or fibers. Instead it should be an attribute of the activities which run on the threads/fibers, and of necessity, that means that any interruptible activity must have it's own first class interrupt channel. If we go down that route then an activity is interruptible if it 1) has an interrupt channel and 2) it's interrupt channel is accessible. If it doesn't have an interrupt channel, or the channel is hidden somehow, then it's uninterruptible.

"Exposing an explicit interrupt channel on every blocking operation that we want to be interruptible is obviously a lot more laborious than just firing random interrupts at globally visible threads/fibers and hoping for the best, but I think it's the only way to go."
[Miles Sabin]

How does this affect Effectful Systems?

Integrating Scala code that uses effectful libraries with Java IO code can cause problems.

Gavin Bisesi @Daenyth Feb 05 21:16, 2020
InputStream is always a blocking api
(note you don't need much to make a blocker; Blocker.apply gives a Resource of one)

Daniel Spiewak @djspiewak Feb 05 21:46, 2020
FYI, all things involving files are blocking except on Windows, and even then they're blocking most of the time.
So the "NIO stuff" that is inside of getResourceAsStream is actually not NIO but rather regular IO wrapped up with a thread pool :-(
I generally use Blocker just to be safe on resource access. It doesn't really cost that much in terms of syntax
... non-blocking things must have a callback-driven API
either directly (via callbacks passed to functions) or indirectly (via Future or CompletableFuture)
If something doesn't have a callback API, then you know it's blocking

When can you cancel?

The effect of a cancellation is felt at every asynch boundary or every N flatMaps (where N=1 for ZIO, it seems).

Note that there is no code in ZIO nor Cats that calls Thread.interrupt() that I could find. Note however that ZIO still gives you the ability to wrap your code in a Future and cancelling this will lead to an InterruptedException (see this gist).

Note that there are still undefined areas in Cats regarding cancellation:

Raas Ahsan @RaasAhsan Jul 07 20:26
calling cancel right after start results in non-deterministic behavior

Fabio Labella @SystemFw Jul 08 19:38
yeah I wanted to say
fa.guarantee(foo) doesn't guarantee that foo will always happen
it guarantees that if fa happens, then foo always happen
In particular if you have fa.guarantee(foo).start.flatMap(_.cancel), foo might happen or not, because the program can be cancelled before fa gets scheduled to run

See tip #2 at this Cats video (An Introduction to Interruption by Jakub Kozlowskiat 11'03" ) where starting and joining in a for comprehension is an anti-pattern (what if one fails to complete?). Instead, one should use (ioa, iob).parTupled.

Saturday, November 9, 2013

Journeys in Networks

After stress testing my pet project, JStringServer, I initially thought I'd made a big boo-boo as the performance was appalling. However, it turned out that my home router was not up to the job. So, on a friend's recommendation, I bought a TP-Link 5-Port Gigbait Desktop Switch. Performance was better but not that great. A quick Google showed I needed Cat 6 cables to make full use of it. D'oh.

OK, so after a few trips to my local electronics store, I set up a newish 1.8GHz i7 Mac Book Air (with a USB network adaptor) and an old 2009 Mac Book Pro trying to hammer my 16 core Linux desktop running my Java NIO server code.

The strategy JStringServer was using was one-thread-does-everything (code here). That is, it listens to the selector associated with the server socket, associates any clients who have connected with a second selector dedicated to clients, checks this second selector for any activity and services them. Although htop shows this thread to be very busy, the rest of the system was doing next to nothing.

The clients were averaging about 6 000 calls per second between them. Now, with a back-of-a-beer-mat calculation, a payload of about 10 000 bytes (ignoring the 2 bytes return from the server) and 6 000 calls per second, this means the network was taking something like 480 gigabits/second (10 000 * 6 000 * 8 / 1 000 000). Not bad, but why not better?

TcpDump

Since JStringServer is currently using just TCP, it turns out that there is a lot of overhead on the network acknowledging the packets the client is sending the server.

If we run tcpdump and capture its output thus:

$ sudo tcpdump -nn host 192.168.1.94 and 192.168.1.65 -i p7p1 > tcpdump_jstringserver_2machines_normalUse.txt

we see as many packets are going to the server (192.168.1.94) as the other way:

$ grep -c "192.168.1.94.8888 >" tcpdump_jstringserver_2machines_normalUse.txt
1996027
$ grep -c "> 192.168.1.94.8888" tcpdump_jstringserver_2machines_normalUse.txt
2005298

So, the figure of 480 gigabits/second seems to be as good as we're going to get on this particular hardware using TCP (2 * 480 ~ 1 gigabit limit).

The return packets that carry the acknowledgement can also carry data [1]. There show up in tcpdump as [P.] where P stands for a PUSH of data and '.' represents an acknowledgement [2]. But since in this particular example, our server replies with very terse responses compared to very verbose requests, this doesn't save us much. A lot of packets are wasted just acknowledging:

$ grep -c -P "192.168.1.94.8888 \>.* \[\.\], ack \\d\\d" tcpdump_jstringserver_2machines_normalUse.txt

1427585

That's about 70% of all traffic from the server to the client.

Another problem with TCP is the handshake uses a lot of packets (as a percentage of the total package used in a connection).

For SYN:

$ grep -c " \[S\]" tcpdump_jstringserver_2machines_normalUse.txt

120675

For SYN-ACK

$ grep -c " \[S\.\]" tcpdump_jstringserver_2machines_normalUse.txt

118371

and for ACK (handshake only):

$ grep -c -P "\[\.\], ack 1," tcpdump_jstringserver_2machines_normalUse.txt

113403

That totals 17% of the total traffic. In this particular example, this connection pooling would solve this.

[1] Does tcp send a syn-ack on every packet or only on the first connection StackOverflow.
[2] tcpdump man pages

Saturday, July 6, 2013

Common NIO mistakes

Just for grins, I've been writing my own NIO server. Ostensibly, it's to provide a server that does not need to create lots of String objects (whose constant creation and destruction can hammer performance). Secondarily, I'm learning more about the minutiea of NIO.

These are the silliest mistakes I've made so far:

Buffer Reading and Writing

Most Java developers never have to worry about this as it's all hidden by your container of choice. But if you roll your own, remember the order for reading is read-flip-get. That is:

ReadableByteChannel.read(ByteBuffer)
Buffer.flip()
ByteBuffer.get(...)

For writing, it's put-flip-write, that is:

ByteBuffer.put(...)
Buffer.flip()
WritableByteChannel.write(ByteBuffer)

ServerSocketChannel Backlogs

Don't forget to set this in the bind method or you may be left wondering why your clients start throwing exceptions under a modest load. If too many connections swamp the serber, client's will see exceptions that say something like "connection reset by peer".

Call register() and select() in the Same Thread

It's not obvious, but you must call AbstractSelectableChannel.register and Selector.select(...) in the same thread [1].

[1] Stack Overflow

Agile Java Man