Parallel operations on Kotlin collections?

Question

In Scala, one can easily do a parallel map, forEach, etc, with:

collection.par.map(..)

Is there an equivalent in Kotlin?

Some of the fastest parallel collections around are from GS-Collections: https://github.com/goldmansachs/gs-collections ... which you can use from Kotlin (as any Java collection framework can be used) — Jayson Minard, Jan 11 '16 at 12:20

score 60 · Accepted Answer · edited Feb 18 '19 at 09:34

60

The Kotlin standard library has no support for parallel operations. However, since Kotlin uses the standard Java collection classes, you can use the Java 8 stream API to perform parallel operations on Kotlin collections as well.

e.g.

myCollection.parallelStream()
        .map { ... }
        .filter { ... }

edited Feb 18 '19 at 09:34

Graham Russell

872
13
23

answered Jan 09 '16 at 22:56

yole

87,251
17
245
186

2

how can one use Java 8 stream API in Kotlin? – LordScone Feb 20 '16 at 14:36
2

@LordScone The same way as you'd do it in Java. E.g.: `myCollection.parallelStream().map { ... }. filter { ... }` – Grzegorz D. Sep 11 '17 at 12:26
just for information, parallelStream is only advised in case of long lists, for smaller lists, it will be an overhead to run the 2 jobs in different threads as in java world. Stick to coroutines as far as possible – MozenRath Jan 13 '21 at 17:49
parallelStream is Java, not kotlin, that's why it requires Java 8 – Javier Delgado Sep 09 '21 at 08:44

score 40 · Answer 2 · answered Aug 21 '17 at 09:48

40

As of Kotlin 1.1, parallel operations can also be expressed quite elegantly in terms of coroutines. Here is pmap on lists:

fun <A, B>List<A>.pmap(f: suspend (A) -> B): List<B> = runBlocking {
    map { async(CommonPool) { f(it) } }.map { it.await() }
}

Note that coroutines are still an experimental feature.

answered Aug 21 '17 at 09:48

Alex Krauss

8,448
4
24
30

3

With Kotlin 1.3 out, is this still the best answer? I noticed @OlivierTerrien's Stream answer below, but I'd prefer to stick with Kotlin Sequences and Iterables. – Benjamin H Nov 01 '18 at 19:57
@BenjaminH Thanks; I have marked yole's answer as accepted, as it also refers to the stream API and posted before OlivierTerrien's answer. – HRJ Dec 31 '18 at 03:47
45

Quite elegantly? On the contrary, the code is pretty hard to read I would say. – Dzmitry Lazerka Feb 20 '19 at 22:29
3

@DzmitryLazerka I think I see where you're coming from, but this exact code isn't the elegant bit. The use of this code is what's elegant. If the above method is placed somewhere, it can be used with just `foo.pmap { v -> ... }`. I think that's fairly elegant. – Joseph Catrambone Jun 19 '19 at 21:16
1

Currently 'CommonPool' cannot be accessed - it is internal in 'kotlinx.coroutines'! – Strinder Jun 09 '20 at 19:41
This is in fact the best performing solution i think. It does not have any overheads like a parallelStream. Although would have loved to see a helper method from kotlin that could cater to it. – MozenRath Jan 13 '21 at 17:52
@Strinder You can use Dispatchers.Default instead of CommonPool, see https://discuss.kotlinlang.org/t/commonpool-default-for-coroutines/11965/3 and https://github.com/Kotlin/kotlinx.coroutines/pull/633 – BdN3504 Feb 10 '22 at 11:03

Holger Brandl · Answer 3 · 2018-10-02T12:48:34.480

16

There is no official support in Kotlin's stdlib yet, but you could define an extension function to mimic par.map:

fun <T, R> Iterable<T>.pmap(
          numThreads: Int = Runtime.getRuntime().availableProcessors() - 2, 
          exec: ExecutorService = Executors.newFixedThreadPool(numThreads),
          transform: (T) -> R): List<R> {

    // default size is just an inlined version of kotlin.collections.collectionSizeOrDefault
    val defaultSize = if (this is Collection<*>) this.size else 10
    val destination = Collections.synchronizedList(ArrayList<R>(defaultSize))

    for (item in this) {
        exec.submit { destination.add(transform(item)) }
    }

    exec.shutdown()
    exec.awaitTermination(1, TimeUnit.DAYS)

    return ArrayList<R>(destination)
}

(github source)

Here's a simple usage example

val result = listOf("foo", "bar").pmap { it+"!" }.filter { it.contains("bar") }

If needed it allows to tweak threading by providing the number of threads or even a specific java.util.concurrent.Executor. E.g.

listOf("foo", "bar").pmap(4, transform = { it + "!" })

Please note, that this approach just allows to parallelize the map operation and does not affect any downstream bits. E.g. the filter in the first example would run single-threaded. However, in many cases just the data transformation (ie. map) requires parallelization. Furthermore, it would be straightforward to extend the approach from above to other elements of Kotlin collection API.

edited Oct 02 '18 at 12:48

answered Feb 25 '16 at 21:22

Holger Brandl

9,360
1
58
58

I don't see how "destination.add(transform(item))" is thread safe. What's to keep two threads from calling "destination.add" at the same time, thus breaking stuff since ArrayList.add() is not a thread safe operation? – David Leppik Mar 22 '16 at 20:55
Thanks for the hint. Quite [some](http://stackoverflow.com/questions/2715983/concurrent-threads-adding-to-arraylist-at-same-time-what-happens) people think that when just adding elements it should be fine without synchronization. However, I've changed it to use a synchronized list to improve thread-safety. – Holger Brandl Mar 23 '16 at 10:25
1

The order in destination may not be the same as in the original list – Cuper Hector Mar 24 '16 at 12:08
I think many parallel collection implementations (like in [scala](http://beust.com/weblog/2011/08/15/scalas-parallel-collections/)) do not care about preserving order. Though, by changing the for-each loop above to an indexed loop along with downstream resorting, order could be preserved easily. – Holger Brandl Mar 29 '16 at 18:36
I'm interested in a version that returns a `Sequence` (or `Flow`). Unfortunately I can't simply have the whole code execute in an `= execute{` block and call `yield` instead of `destination.add` because `yield` can only execute in the original block, so within `exec.submit { }` is not an option. (Order need not be preserved.) – StephanS Nov 02 '19 at 02:44

score 16 · Answer 4 · answered Nov 06 '19 at 17:02

16

You can use this extension method:

suspend fun <A, B> Iterable<A>.pmap(f: suspend (A) -> B): List<B> = coroutineScope {
    map { async { f(it) } }.awaitAll()
}

See Parallel Map in Kotlin for more info

answered Nov 06 '19 at 17:02

Sharon

418
4
11

score 12 · Answer 5 · answered Sep 12 '18 at 21:57

12

From 1.2 version, kotlin added a stream feature which is compliant with JRE8

So, iterating over a list asynchronously could be done like bellow:

fun main(args: Array<String>) {
  val c = listOf("toto", "tata", "tutu")
  c.parallelStream().forEach { println(it) }
}

answered Sep 12 '18 at 21:57

OlivierTerrien

2,139
1
18
31

I am not following kotlin very closely; isn't this the same as Yole's answer?I appreciate that your answer has sample code. Maybe we can edit Yole's answer to add the sample code. – HRJ Dec 31 '18 at 03:49
@HRJ, not exactly. Yole said Kotlin had no support for stream which is true until version 1.2. Since this version, Kotlin provides a way to stream collections as Java8 does. – OlivierTerrien Jan 01 '19 at 12:13
Yole said "Kotlin has no support for parallel operations". Please check again. – HRJ Jan 01 '19 at 13:48
Yes you are right. Too quickly written. Parallel operations not stream. – OlivierTerrien Jan 01 '19 at 13:53
Probably worth pointing out that as this requires JRE8 it is only available on Android 24 and above. – Rupert Rawnsley May 28 '20 at 14:42

score 6 · Answer 6 · answered Nov 20 '18 at 23:44

Kotlin wants to be idiomatic but not too much synthetic to be hard to understand at a first glance.

Parallel computation trough Coroutines is no exception. They want it to be easy but not implicit with some pre-built method, allowing to branch the computation when needed.

In your case:

collection.map { 
        async{ produceWith(it) } 
    }
    .forEach { 
        consume(it.await()) 
    }

Notice that to call async and await you need to be inside a so called Context, you cannot make suspending calls or launching a coroutine from a non-coroutine context. To enter one you can either:

runBlocking { /* your code here */ }: it will suspend the current thread until the lambda returns.
GlobalScope.launch { }: it will execute the lambda in parallel; if your main finishes executing while your coroutines have not bad things will happen, in that case better use runBlocking.

Hope it may helps :)

While I appreciate Kotlin not wanting to be opaque, surely this is a common enough requirement to warrant an extension method? forEachParallel or something similar — Rupert Rawnsley, May 28 '20 at 15:26

score 4 · Answer 7 · edited Apr 12 '19 at 17:54

4

At the present moment no. The official Kotlin comparison to Scala mentions:

Things that may be added to Kotlin later:

Parallel collections

edited Apr 12 '19 at 17:54

Bhargav Rao

45,811
27
120
136

answered Jan 09 '16 at 19:25

Martin Konecny

54,197
19
131
151

StephanS · Answer 8 · 2019-11-04T22:57:52.770

This solution assumes that your project is using coroutines:

implementation( "org.jetbrains.kotlinx:kotlinx-coroutines-core:1.3.2")

The functions called parallelTransform don't retain the order of elements and return a Flow<R>, while the function parallelMap retains the order and returns a List<R>.

Create a threadpool for multiple invocations:

val numberOfCores = Runtime.getRuntime().availableProcessors()
val executorDispatcher: ExecutorCoroutineDispatcher =
    Executors.newFixedThreadPool(numberOfCores ).asCoroutineDispatcher()

use that dispatcher (and call close() when it's no longer needed):

inline fun <T, R> Iterable<T>.parallelTransform(
    dispatcher: ExecutorDispatcher,
    crossinline transform: (T) -> R
): Flow<R> = channelFlow {

    val items: Iterable<T> = this@parallelTransform
    val channelFlowScope: ProducerScope<R> = this@channelFlow

    launch(dispatcher) {
        items.forEach {item ->
            launch {
                channelFlowScope.send(transform(item))
            }
        }
    }
}

If threadpool reuse is of no concern (threadpools aren't cheap), you can use this version:

inline fun <T, R> Iterable<T>.parallelTransform(
    numberOfThreads: Int,
    crossinline transform: (T) -> R
): Flow<R> = channelFlow {

    val items: Iterable<T> = this@parallelTransform
    val channelFlowScope: ProducerScope<R> = this@channelFlow

    Executors.newFixedThreadPool(numberOfThreads).asCoroutineDispatcher().use { dispatcher ->
        launch( dispatcher ) {
            items.forEach { item ->
                launch {
                    channelFlowScope.send(transform(item))
                }
            }
        }
    }
}

if you need a version that retains the order of elements:

inline fun <T, R> Iterable<T>.parallelMap(
    dispatcher: ExecutorDispatcher,
    crossinline transform: (T) -> R
): List<R> = runBlocking {

    val items: Iterable<T> = this@parallelMap
    val result = ConcurrentSkipListMap<Int, R>()

    launch(dispatcher) {
        items.withIndex().forEach {(index, item) ->
            launch {
                result[index] = transform(item)
            }
        }
    }

    // ConcurrentSkipListMap is a SortedMap
    // so the values will be in the right order
    result.values.toList()
}

score 3 · Answer 9 · answered May 04 '21 at 15:11

3

Another approach I found to be quite elegant is something like this, using the kotlinx.coroutines library:

import kotlinx.coroutines.flow.asFlow

suspend fun process(myCollection: Iterable<Foo>) {
    myCollection.asFlow()
        .map { /* ... */ }
        .filter { /* ... */ }
        .collect { /* ... perform some side effect ... */ }
}

It does however require an extra dependency; kotlinx.coroutines is not in the stdlib.

answered May 04 '21 at 15:11

chris

2,315
1
22
33

5

This doesn't execute operations in parallel! Flows are sequential https://kotlinlang.org/docs/flow.html#flows-are-sequential, all operation will be executed sequentially for each emitted item, one by one. – BigSt Jan 16 '22 at 20:21

score 2 · Answer 10 · answered Oct 07 '21 at 13:43

2

I found this:

implementation 'com.github.cvb941:kotlin-parallel-operations:1.3'

details:

https://github.com/cvb941/kotlin-parallel-operations

answered Oct 07 '21 at 13:43

Milan Jurkulak

464
3
5

BigSt · Answer 11 · 2022-01-25T09:43:41.417

I've come up with a couple of extension functions:

The suspend extension function on Iterable<T> type, which does a parallel processing of items and returns some result of processing each item. By default it uses Dispatchers.IO dispatcher to offload blocking tasks to a shared pool of threads. Must be called from a coroutine (including a coroutine with Dispatchers.Main dispatcher) or another suspend function.

suspend fun <T, R> Iterable<T>.processInParallel(
    dispatcher: CoroutineDispatcher = Dispatchers.IO,
    processBlock: suspend (v: T) -> R,
): List<R> = coroutineScope { // or supervisorScope
    map {
        async(dispatcher) { processBlock(it) }
    }.awaitAll()
}

Example of calling from a coroutine:

val collection = listOf("A", "B", "C", "D", "E")

someCoroutineScope.launch {
    val results = collection.processInParallel {
        process(it)
    }
    // use processing results
}

where someCoroutineScope is an instance of CoroutineScope.

Launch and forget extension function on CoroutineScope, which doesn't return any result. It also uses Dispatchers.IO dispatcher by default. Can be called using CoroutineScope or from another coroutine.

fun <T> CoroutineScope.processInParallelAndForget(
    iterable: Iterable<T>,
    dispatcher: CoroutineDispatcher = Dispatchers.IO,
    processBlock: suspend (v: T) -> Unit
) = iterable.forEach {
    launch(dispatcher) { processBlock(it) }
}

Example of calling:

someoroutineScope.processInParallelAndForget(collection) {
    process(it)
}

// OR from another coroutine:

someCoroutineScope.launch {
    processInParallelAndForget(collection) {
        process(it)
    }
}

2a. Launch and forget extension function on Iterable<T>. It's almost the same as previous, but the extension type is different. CoroutineScope must be passed as argument to the function.

fun <T> Iterable<T>.processInParallelAndForget(
    scope: CoroutineScope,
    dispatcher: CoroutineDispatcher = Dispatchers.IO,
    processBlock: suspend (v: T) -> Unit
) = forEach {
    scope.launch(dispatcher) { processBlock(it) }
}

Calling:

collection.processInParallelAndForget(someCoroutineScope) {
    process(it)
}

// OR from another coroutine:

someScope.launch {
    collection.processInParallelAndForget(this) {
        process(it)
    }
}

Parallel operations on Kotlin collections?

11 Answers11

Linked