Monday, October 7, 2013

Java 8 - Using Parallel Streams

Java 8 is not slated to come out until spring 2014, but the features are so enticing that I just had to download the snapshot and give it a test run. After messing with Python the past few weeks, the Java 8 feature that I am most looking forward to is lambda support. The second feature I'm digging is the new Stream API.

Both are very useful on their own, but I really like how Python pairs lambdas with the functions filter, map, and reduce. You can read about that here if you'd like. Java 8 provides the same functionality, but it is a bit more verbose. On the plus side, it gives you a few extra useful features that Python does not. (At least not that I know of; I'm a Python n00b.)
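
As a rough illustration of the filter, map, and reduce trio in Java 8, here is a minimal sketch that sums the squares of the even numbers in a list. The class name FilterMapReduceSketch and the method sumOfEvenSquares are made up for this illustration and are not part of the example code linked in this post.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical class, used only to illustrate filter/map/reduce on a Stream.
class FilterMapReduceSketch
{
    // Sums the squares of the even numbers in the given list.
    static int sumOfEvenSquares(List<Integer> numbers)
    {
        return numbers
            .stream()
            .filter(n -> n % 2 == 0)    // keep the even numbers
            .map(n -> n * n)            // square each one
            .reduce(0, Integer::sum);   // add them up, starting from 0
    }

    public static void main(String[] args)
    {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(sumOfEvenSquares(numbers)); // 4 + 16 + 36 = 56
    }
}
```

The three calls chain in the order they are applied, which is where most of the extra verbosity compared to Python comes from.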

Note: All example code can be found here.

My first test was just a few simple filters and maps.

// Needs java.util.Arrays, java.util.List, and java.util.stream.Collectors.
public void test1()
{
    List<String> strings = Arrays.asList(
        "One","Two","Three","Four","Five");

    // Keep only the strings with more than three characters.
    List<String> longerThanThree = strings
        .stream()
        .filter(s -> s.length() > 3)
        .collect(Collectors.toList());

    // Upper-case every string.
    List<String> uppers = strings
        .stream()
        .map(s -> s.toUpperCase())
        .collect(Collectors.toList());

    // Keep the strings that start with "T", then join their lower- and upper-case forms.
    List<String> beginsWithTBothCase = strings
        .stream()
        .filter(s -> s.startsWith("T"))
        .map(s -> s.toLowerCase() + s.toUpperCase())
        .collect(Collectors.toList());

    System.out.println(strings);
    System.out.println(longerThanThree);
    System.out.println(uppers);
    System.out.println(beginsWithTBothCase);
}

The output:

[One, Two, Three, Four, Five]
[Three, Four, Five]
[ONE, TWO, THREE, FOUR, FIVE]
[twoTWO, threeTHREE]

The previous example is pretty basic. One of the more advanced features built into the new Stream API is the ability to split up a stream and process its contents in parallel. This turns out to be incredibly easy to use, and the following example outlines a trivial program that processes a large list of String objects.

To get a large data file, I grabbed /usr/share/dict/linux.words, which has about 500k unique words in it. The following method reads a copy of it named linux.words from the working directory and returns the words as a list.

// Needs java.io.BufferedReader, java.io.FileReader, java.util.ArrayList, and java.util.List.
public static List<String> getLinuxWords() throws Exception
{
    List<String> linuxWords = new ArrayList<>();
    // try-with-resources closes the reader even if reading fails.
    try (BufferedReader br = new BufferedReader(
        new FileReader("linux.words")))
    {
        String line = br.readLine();
        while(line != null)
        {
            linuxWords.add(line);
            line = br.readLine();
        }
    }
    return linuxWords;
}

Next, I wrote a stream processing method. The map lambda doesn't do anything important; it just adds some work to keep the CPU busy.

// Needs java.util.List, java.util.stream.Collectors, and java.util.stream.Stream.
public void processStream(Stream<String> wordStream)
    throws Exception
{
    long start = System.currentTimeMillis();

    // The result is never used; collecting it just forces the stream to run.
    List<String> processed = wordStream
        .map(s -> {
            // Busy work: flip the case back and forth 100 times.
            for(int i = 0; i < 100; i++)
            {
                s = s.toLowerCase().toUpperCase();
            }
            return s;
        })
        .collect(Collectors.toList());

    long totalTime = System.currentTimeMillis() - start;
    System.out.println(
        String.format(
            "Task took %.3f seconds to execute.",
            totalTime/1000f)
    );
}

Now I test the method: twice with a normal (sequential) stream, and twice with a parallel stream. One thing to note is that a Stream is like an Iterator: it can only be traversed once, so every call gets a fresh stream from the list. A Stream is also not a List or an array, which is why its results must be collected into one.
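
To make the single-traversal point concrete, here is a minimal sketch. The class name SingleUseStreamSketch and the method secondTraversalFails are invented for this illustration. It collects the same Stream twice and reports whether the second attempt is rejected with an IllegalStateException.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class, used only to show that a Stream cannot be reused.
class SingleUseStreamSketch
{
    // Returns true if traversing the same Stream a second time is rejected.
    static boolean secondTraversalFails(List<String> words)
    {
        Stream<String> stream = words.stream();
        stream.collect(Collectors.toList());     // first traversal consumes the stream
        try
        {
            stream.collect(Collectors.toList()); // second traversal of the same stream
            return false;
        }
        catch (IllegalStateException e)
        {
            return true;
        }
    }

    public static void main(String[] args)
    {
        List<String> words = Arrays.asList("One", "Two", "Three");
        System.out.println(secondTraversalFails(words));

        // Asking the list for a new stream works as many times as needed.
        System.out.println(words.stream().collect(Collectors.toList()));
        System.out.println(words.stream().collect(Collectors.toList()));
    }
}
```

This is why the test below calls words.stream() or words.parallelStream() again for every run instead of passing one stream around.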

public static void main(String[] args) throws Exception
{
    System.out.println("Welcome to Java 8");
    JavaEightTests jet = new JavaEightTests();
    List<String> words = JavaEightTests.getLinuxWords();
    // Two sequential runs, then two parallel runs, each on a fresh stream.
    jet.processStream(words.stream());
    jet.processStream(words.stream());
    jet.processStream(words.parallelStream());
    jet.processStream(words.parallelStream());
}

The output:

Task took 10.876 seconds to execute.
Task took 10.682 seconds to execute.
Task took 3.054 seconds to execute.
Task took 3.142 seconds to execute.

The parallel runs finished in about 3.1 seconds versus about 10.8 seconds for the sequential runs, roughly a 3.5x speedup, so the Stream API was clearly able to split the stream up and process it in parallel. While this is a trivial example, I can see parallel streams being very useful in helping Java programmers make their applications more performant on multi-core systems.
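
How much you gain will depend on how many cores your machine has, which I have not listed for the test above. Here is a small sketch for checking that on your own box, and for confirming that the parallel version produces the same list as the sequential one. The class name ParallelStreamSketch and the method upperCaseAll are made up for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical class, used only to compare sequential and parallel results.
class ParallelStreamSketch
{
    // Upper-cases every word, on a parallel stream if requested.
    static List<String> upperCaseAll(List<String> words, boolean parallel)
    {
        return (parallel ? words.parallelStream() : words.stream())
            .map(String::toUpperCase)
            .collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        List<String> words = Arrays.asList(
            "One", "Two", "Three", "Four", "Five");

        // The number of available cores limits how much a parallel stream can help.
        System.out.println("Available processors: "
            + Runtime.getRuntime().availableProcessors());

        List<String> sequential = upperCaseAll(words, false);
        List<String> parallel = upperCaseAll(words, true);

        // Collecting to a list keeps the list's order, even on a parallel stream.
        System.out.println(sequential.equals(parallel)); // true
    }
}
```

With only five words the parallel version will not be faster; the point is just that switching stream() for parallelStream() does not change the result.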

If you'd like to run this example on your machine, install the Java 8 JDK snapshot and check out my code, which I've posted to GitHub.