Threads, Line-Synchronization, and OutputStream: ThreadLocals and Java I/O

The other day, our team ran across an interesting design problem related to Java synchronization. Without getting into the details of why it came up, the gist of the problem was that we needed to write a synchronization wrapper around OutputStream. (Well, technically, it was a BufferedWriter… but the issue is the same.) This wrapper needed to allow multiple unsynchronized writer threads to write to an underlying writer, with each thread atomically writing complete lines of text. We wound up avoiding the problem by changing our interface, but the solution to the OutputStream problem still provides an interesting look at a lesser-known aspect of the Java runtime: ThreadLocal variables.

To illustrate the problem, consider this simpler version of the OutputStream interface. (Unlike the full version, this interface eliminates the two bulk-write forms of the write method, which don’t matter for this conversation.)

interface SimpleOutputStream {
    void write(int b);
}

The simplest way to write a synchronized wrapper for this interface is to wrap each call to the underlying implementation in a synchronized block. This wrapper then guarantees that each thread entering write has exclusive, atomic access to write its byte to the underlying implementation.

    public void write(int b)
    {
        synchronized(underlying)
        {
            underlying.write(b);
        }
    }

The difficulty with this approach is that the locking is too fine-grained to produce a coherent stream of output bytes on the underlying writer. A context switch in the middle of a line of output text will cause that line to contain bytes from two source threads, with no way to distinguish one thread’s data from another’s. The only thing the locking has provided is a guarantee that we won’t have multiple threads reentrantly calling into the underlying stream. What we need is a way for our wrapper to buffer lines of text on a per-thread basis and then write full lines within a synchronized block. This is where ThreadLocal comes into play.

An instance of ThreadLocal is exactly what it sounds like: a container for a value that is local to a thread. Each thread that gets a value from an instance of a ThreadLocal gets its own, unique copy of that value. Ignoring for the moment how it works, this is the abstraction that will enable our implementation of OutputStream to buffer lines of text from each writer thread prior to writing them out. The key is in the specific use of the thread local.

    ThreadLocal<StringBuffer> threadBuf = new ThreadLocal<StringBuffer>() {
        protected StringBuffer initialValue() {
            return new StringBuffer();
        }
    };

This code allocates an instance of a local derivation of ThreadLocal, specialized to hold a StringBuffer. The overridden initialValue method determines the initial value of the ThreadLocal for each thread that retrieves a value – it is called once per thread. In this case, it allocates a new StringBuffer for each thread that requests a value from threadBuf, giving us one line buffer per thread. From here on out, the implementation is largely connecting the dots.
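To see the per-thread behavior concretely, here is a small standalone demo (the class name ThreadLocalDemo is mine, not from the article): two threads ask the same ThreadLocal for a value, and each receives a distinct StringBuffer from its own call to initialValue.

```java
public class ThreadLocalDemo {
    // Same per-thread buffer idiom as in the article.
    static final ThreadLocal<StringBuffer> threadBuf = new ThreadLocal<StringBuffer>() {
        protected StringBuffer initialValue() {
            return new StringBuffer();
        }
    };

    public static void main(String[] args) throws InterruptedException {
        final StringBuffer mainBuf = threadBuf.get();        // main thread's buffer
        final StringBuffer[] otherBuf = new StringBuffer[1];

        Thread t = new Thread(new Runnable() {
            public void run() {
                otherBuf[0] = threadBuf.get();               // second thread's buffer
            }
        });
        t.start();
        t.join();

        // initialValue ran once per thread, so the instances differ.
        System.out.println(mainBuf != otherBuf[0]); // prints "true"
    }
}
```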

First, we need an implementation of write that buffers into threadBuf:

public void write(int b)
      throws IOException
{
    threadBuf.get().append((char)b);

    if (b == (int)'\n')
        flush();
}

The get method pulls the thread-local buffer out of threadBuf. Because this buffer is thread-local, it is not shared, so we don’t need to synchronize on it.

Second, we need an implementation of flush that atomically writes the current thread line buffer to the underlying output:

    protected void flush()
        throws IOException
    {
        synchronized(underlying) {
            underlying.write(threadBuf.get().toString().getBytes());

            threadBuf.get().setLength(0);
        }
    }

Here we do need synchronization, because underlying is shared between threads, even if the threadBuf is not.
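Putting the pieces together, here is a minimal sketch of the whole wrapper as a standalone class. The class name and constructor are my own assumptions; the article shows flush as protected, but overriding OutputStream.flush requires it to be public, so it is public here.

```java
import java.io.IOException;
import java.io.OutputStream;

// Sketch of the full wrapper: per-thread line buffering, atomic line writes.
public class LineSyncOutputStream extends OutputStream {
    private final OutputStream underlying;

    private final ThreadLocal<StringBuffer> threadBuf = new ThreadLocal<StringBuffer>() {
        protected StringBuffer initialValue() {
            return new StringBuffer();
        }
    };

    public LineSyncOutputStream(OutputStream underlying) {
        this.underlying = underlying;
    }

    public void write(int b) throws IOException {
        threadBuf.get().append((char) b);  // buffer in this thread's StringBuffer

        if (b == (int) '\n')
            flush();                       // full line: push it out atomically
    }

    public void flush() throws IOException {
        synchronized (underlying) {        // underlying stream is shared
            underlying.write(threadBuf.get().toString().getBytes());
            threadBuf.get().setLength(0);
        }
    }

    public static void main(String[] args) throws IOException {
        java.io.ByteArrayOutputStream sink = new java.io.ByteArrayOutputStream();
        LineSyncOutputStream out = new LineSyncOutputStream(sink);
        for (char c : "one line\n".toCharArray())
            out.write(c);
        System.out.print(sink.toString()); // the complete line arrived in one atomic write
    }
}
```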

While there are still a few implementation details to worry about, this provides the bulk of the functionality we were originally looking for. Multiple threads can write to an output stream wrapped in this way, and the output will be synchronized on a per-line basis. If you’d like to experiment with the idea a bit, or read through a full implementation, I’ve put code on github here: https://github.com/mschaef/tls-writer. The sample project contains a synchronized output stream, as well as a test program that launches multiple threads, all writing to System.out via our synchronized wrapper. A command line flag lets you turn the wrapper off to see the ill effects of finer-grained synchronization.

Categories: Java
Trackback URL for this post: https://www.ksmpartners.com/2013/06/threads-line-synchronization-and-outputstream-threadlocals-and-java-io/trackback/

3 Responses to Threads, Line-Synchronization, and OutputStream: ThreadLocals and Java I/O

  1. Pingback: Thread Local State and its Interaction with Thread Pools |

  2. alex Comments:

The approach described in the article is very interesting. Thanks.

    Note that one could enhance the flush method:


    protected void flush()
        throws IOException
    {
        synchronized(underlying) {
            underlying.write(threadBuf.get().toString().getBytes());
        }

        threadBuf.get().setLength(0); // no need to synchronize here
    }

    Thread-local operations do not need synchronization because they do not interact between threads. Note also that if the underlying.write method is guaranteed to be atomic, there is no need to synchronize at all.

    After looking at this implementation https://github.com/mschaef/tls-writer/blob/master/src/main/java/com/ksmpartners/tlswriter/ConcurrentLineStream.java I have found some bugs. The most critical is the absence of synchronization in the close method:


    public void close()
        throws IOException
    {
        for(StringBuffer lbuf : threadLineBuffers) // no synchronization! at least a visibility problem!
            flushBuffer(lbuf);

        underlying.close();
    }

    If one doesn’t use synchronization while traversing the list, one gets no guarantee of ever seeing the changes made to it.

    Other problems include:
    – Situations where the stream is closed more than once from one thread (or, even worse, from many threads)
    – The implementation of the flush method doesn’t invoke the flush method of the underlying output stream. This means there is little chance that the data will reach its destination. It is an important Java idiom to invoke flush on the underlying stream when flushing your stream.
    – Such an implementation has no limit on how many bytes are written before a flush. This can lead to out-of-memory errors.
    – One could accidentally modify the underlying and threadLineBuffers fields (they should be final)
    – Implementing only the write(int) method is inefficient. There is an important Java idiom to implement the write(byte[], int, int) method each time you implement write(int).
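    The bulk-write idiom mentioned in the last point could look something like this sketch (the class is a made-up stand-in, not the article's code; here write(byte[], int, int) simply delegates to write(int), so any per-byte buffering logic only has to live in one place):

```java
import java.io.IOException;
import java.io.OutputStream;

// Stand-in class demonstrating the write(byte[], int, int) override idiom.
public class BulkWriteSketch extends OutputStream {
    private final StringBuilder captured = new StringBuilder(); // stand-in for real buffering

    public void write(int b) throws IOException {
        captured.append((char) b);
    }

    // The bulk form validates its arguments, then delegates byte-by-byte.
    public void write(byte[] buf, int off, int len) throws IOException {
        if (off < 0 || len < 0 || off + len > buf.length)
            throw new IndexOutOfBoundsException();
        for (int i = off; i < off + len; i++)
            write(buf[i]);
    }

    public String contents() {
        return captured.toString();
    }
}
```

    (A production version would copy whole runs into the per-thread buffer at once rather than looping, but the delegation keeps the semantics identical.)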

  3. Michael Schaeffer Comments:

    Alex,

    Thanks for the comments… the close method does need some work. I agree with you on the need for synchronization, and also the fact that it should tolerate multiple calls to close.

    A couple other comments:

    – The trouble with modifying flush in the way you propose is that the point of the class is to batch up writes to the underlying stream on line boundaries. Each thread gets the guarantee that each line of text it writes will make it to the underlying stream in one atomic chunk of text. Unless you could guarantee that the client calls to flush happen on line boundaries, this guarantee would no longer hold. If a thread called flush in the middle of writing a line of text, the line would be written to the underlying stream in two atomic writes rather than the expected one.

    – Agreed on the out of memory errors. To make this commercially ready, there needs to be a hard limit on the line length, and an exception that gets thrown when a thread overruns that limit.

    – All the write(int) method is doing is appending to a memory buffer that expands itself in exponentially increasing chunks. The writes to the underlying (presumably physical I/O) stream are done as a batch. There probably are scenarios where the append to the memory buffer presents a performance issue, but I can’t imagine there’d be all that many. This is one of those cases where I’d want performance numbers justifying the work before optimizing it further.

    Thanks,
    Mike
