Faster flatMaps with Stream::mapMulti in Java 16 // nipafx

Faster flatMaps with Stream::mapMulti in Java 16

source code

Want to play around with the code yourself? Check out the repository Java X Demo, a project showcasing all important Java 9-16 features - it contains many of the snippets shown in this blog post. It has a permissive license, so you can reuse the code for your projects.

table of contents

share & follow

Share this post with your community:

I'm active on various platforms. Watch this space or follow me there to get notified when I publish new content:

▼

src

toc

s+f

Share this post with your community:

I'm active on various platforms. Watch this space or follow me there to get notified when I publish new content:

Stream::flatMap is declarative: "Give me a function that maps each element to a stream and I give you a stream of elements back." The mild annoyance of having to turn collections into streams aside, this works very well. But it has two drawbacks:

flatMap says: "Give me a function that maps to a stream"

some collections aren't a Collection and turning them into a Stream can be cumbersome
creating a lot of small or even empty Stream instances can drag performance down

▚Enter `Stream::mapMulti`

Java 16 addedd a new stream method:

// plus wildcards
<R> Stream<R> mapMulti(BiConsumer<T, Consumer<R>> mapper)

You call it on a Stream<T> to map a single element of type T to multiple elements of type R. So far, so flatMap, but in contrast to that method, you don't pass a function that turns T into Stream<R>. Instead you pass a "function" that receives a T and can emit arbitrary many Rs by passing them to a Consumer<R> (that it also receives). I say "function" because it's actually a bi-consumer that doesn't return anything.

▚A Pointless Example

Here's an example where we don't actually do anything:

Stream.of(1, 2, 3, 4)
	.mapMulti((number, downstream) -> downstream.accept(number))
	.forEach(System.out::print);
// prints "1234"

Our BiConsumer<Integer, Consumer<T>> is called for each element in the stream [1, 2, 3, 4] and each time it simply passes the given number to the downstream consumer. Hence, each number is mapped to itself and so the resulting stream is also [1, 2, 3, 4].

Pointless.

▚An `Optional` Example

It gets a bit more interesting with Optional, where the performance-part of the argument against flatMap can apply. Before we measure that, let's see how to use it here:

Stream
	.of(Optional.of("0"), Optional.of("1"), Optional.empty())
	.mapMulti(Optional::ifPresent) // !!!
	.forEach(System.out::print);

Did you spot the sleek Optional::ifPresent? In case you're wondering why that method reference works here, that's perfectly understandable - it does some heavy lifting. This is what it would look like with a long-form lambda:

.mapMulti(
	(Optional<String> element, Consumer<String> downstream)
		-> element.ifPresent(downstream))

This works because ifPresent accepts a Consumer, which is what downstream happens to be. So the lambda takes the first argument that it receives (of type Optional) and calls a method on it (ifPresent) that accepts all the remaining lambda arguments (one of type Consumer). That's exactly what the Type::method-style method reference (where method is not static) was made for - hence Optional::ifPresent.

▚A Practical Example

Now let's see a more practical example (thanks to Jake for giving me a good idea for one 🙏). Say there's a data structure and the only option it offers to traverse it is with a visitor:

interface Structure<T> {

	// implementation will traverse the structure
	// and pass each element to the visitor
	accept(StructureVisitor<T> visitor)

}

@FunctionalInterface
interface StructureVisitor<T> {

	visit(T element);

}

If you now want to turn a Stream<Structure<T>> into a Stream<T>, flatMap is really unhandy. You'd need to write a method that turns a Structure<T> into a Stream<T>, which would probably mean creating a visitor that adds all visited elements to a collection and then exposes that.

// `CollectingVisitor` implementation goes here

Stream<Structure<T>> structures = // ...
Stream<T> elements = structures.flatMap(structure -> {
	StructureVisitor<T> visitor = new CollectingVisitor<>();
	structure.accept(visitor);
	return visitor.collectedElements.stream();
});

Works, but not exactly elegant. Now let's see it with mapMulti:

// no `CollectingVisitor` needed

Stream<Structure<T>> structures = // ...
Stream<T> elements = structures.mapMulti(
	(structure, downstream) -> structure.accept(downstream::accept));

Much better. And it gets better yet if StructureVisitor extends Consumer or is outright replaced with it:

Stream<Structure<T>> structures = // ...
Stream<T> elements = structures.mapMulti(Structure::accept);

(All of that said, I find easy creation of streams important enough that I'd probably still create a method that accepts a Structure<T> and returns a Stream<T> - it would essentially be the body of the lambda above that gets passed to flatMap. And once I have that, I find flatMap the better choice because it has stronger semantics and is more well-known than mapMulti.)

▚The Unfortunate Type Witness

One thing that quickly becomes apparent when you work with mapMulti is that it confuses the compiler to the point where generic type inference breaks down. Let's revisit the first example, but with a slight twist - now we want to collect to a list:

List<Integer> numbers = Stream.of(1, 2, 3, 4)
	.mapMulti((number, downstream) -> downstream.accept(number))
	.collect(toList());

Simple, right? Unfortunately not.

Just like its sister flatMap, mapMulti changes the stream elements' type and called on a Stream<T> returns a Stream<R>. It does that by a passing, as a second argument to the lambda, a Consumer<R>. And what consumer can consume an Integer? A Consumer<Integer> of course. Or a Consumer<Serializable>. Or a Consumer<Comparable<Integer>>. Or the all-powerful Consumer<Object>.

And so the compiler doesn't know what to infer, gives up (translation: "picks Consumer<Object>") and the stream returned by mapMulti is Stream<Object>. That can't be collected to a List<Integer> and so the snippet above gives a compile error:

error: incompatible types: inference variable T has incompatible bounds

That sucks. Solution: add a type witness for the Consumer's generic type parameter R.

List<Integer> numbers = Stream.of(1, 2, 3, 4)
	.<Integer> mapMulti((number, down) -> down.accept(number))
	.collect(toList());

List<String> strings = Stream
	.of(Optional.of("0"), Optional.of("1"), Optional.of(""))
	.<String> mapMulti(Optional::ifPresent)
	.collect(toList());

Not horrible, but makes mapMulti a little less enticing.

▚`Stream<Optional>` Performance

Ok, let's look at the performance. I'm no expert at this (so take everything with a pack of salt), but I did some benchmarks for a Stream<Optional<Integer>>.

▚Benchmarks

I measured the following methods...

private List<Optional<Integer>> numbers;

@Benchmark
public long flatMap_count() {
	return numbers.stream()
		.flatMap(Optional::stream)
		.count();
}

@Benchmark
public long mapMulti_count() {
	return numbers.stream()
		.mapMulti(Optional::ifPresent)
		.count();
}

@Benchmark
public int flatMap_sum() {
	return numbers.stream()
		.flatMap(Optional::stream)
		.mapToInt(i -> i)
		.sum();
}

@Benchmark
public int mapMulti_sum() {
	return numbers.stream()
		.<Integer> mapMulti(Optional::ifPresent)
		.mapToInt(i -> i)
		.sum();
}

... where numbers has 10k, 100k, or 1M optionals with 1%, 10%, 50%, or 80% of them empty (distributed randomly).

▚Results

I ran three forks, each with three warmup and as many measurement runs per benchmark method. I gave each method 5 seconds. System:

JDK 16-ea+19-985
JMH 1.23
Gentoo Linux with 5.8.16 kernel
Ryzen 9 3900X
2 x 16GB G.Skill Trident Z b/w, DDR4-3600

Here are my raw results.

▚Interpretation

Trying to make sense of them, I compared otherwise identical configurations of flatMap vs mapMulti:

count and sum
10k, 100k, and 1M optionals
1%, 10%, 50%, and 80% empty

That's 24 comparisons in total. Here are the speedups of mapMulti over flatMap:

`count`	10k	100k	1M
1%	8.2	6.3	4.9
10%	10.1	6.4	5.5
50%	6.9	3.1	0.4
80%	14.0	4.7	4.8
---	---	---	---
`sum`	10k	100k	1M
1%	6.2	6.3	0.7
10%	6.4	6.2	0.8
50%	11.5	3.3	0.7
80%	11.3	4.3	4.6

A speedup > 1 means mapMulti is faster so at first glance, this looks pretty good. But I struggle to make sense of many of these numbers. For example, what's up with the wide margin and inconsistent impact of the share of empty optionals?

It looks pretty good for mapMulti

It's also pretty surprising (to me) that the speedup of mapMulti over flatMap not only decreases as numbers of elements increase, it even drops below 1, meaning mapMulti becomes slower than flatMap. Looking at the raw measurements again, we can see that this is the result of flatMap getting some ridiculous speedups at 1 million elements:

Benchmark	%0s	Size	Score ± Error (us/op)
flatMap_count	0.5	10'000	101.504 ± 3.363
flatMap_count	0.5	100'000	1150.309 ± 17.065
flatMap_count	0.5	1'000'000	1561.187 ± 324.065
flatMap_sum	0.01	10'000	113.009 ± 6.977
flatMap_sum	0.01	100'000	1158.694 ± 74.973
flatMap_sum	0.01	1'000'000	1622.151 ± 533.694
flatMap_sum	0.1	10'000	108.073 ± 1.227
flatMap_sum	0.1	100'000	1155.964 ± 54.148
flatMap_sum	0.1	1'000'000	1777.393 ± 453.216
flatMap_sum	0.5	10'000	113.230 ± 5.485
flatMap_sum	0.5	100'000	1284.879 ± 63.869
flatMap_sum	0.5	1'000'000	2906.395 ± 259.311

You can see that from 10k to 100k elements, it takes roughly 10x the time (as expected), but from 100k to 1M it's well below that, somewhere around 1.5x. These are all the instances where that happens and you can see that these are exactly the cases where mapMulti's speedup collapses. Why does flatMap get so fast, but mapMulti doesn't? Your guess is as good as mine. Actually, chances are decent that your guess is better. 😉

My conclusion is that mapMulti has the potential to be much faster than flatMap, but this may not always materialize. Fortunately, the way to figure that out for your project is the same way you want to measure any performance work: benchmarks of your actual system with real-life data.

▚Reflection

If you're in a situation where flatMap doesn't quite work because you can't easily turn the element into a Stream or when it's just too slow because of the many Stream instance it creates, give mapMulti a try. It accepts a lambda that gets each stream element in turn together with a Consumer that you can pass arbitrary many elements into to show up in the next stream operation.

This makes it a bit more imperative, which gives you more leeway in turning a single element into many elements. It also prevents the creation of Stream instances, which may improve performance, but while superficial benchmarks are generally favorable, the speedup varies and may even be below 1.

One thing to note is that you will most likely need to add a type witness when using mapMulti, which makes it less convenient than flatMAp. Not only for that reason, the latter should remain your default when mapping a single to multiple elements.

That said, there's at least one very cool thing that you can abuse mapMulti for (hint), but more on that in another post.

▚Enter Stream::mapMulti