160

How to remove duplicates from an Array<String?> in kotlin?

jturolla
  • 6,257
  • 7
  • 25
  • 40
  • If someone is looking for consecutive characters to remove then visit https://handyopinion.com/method-to-remove-consecutive-characters-from-a-string-in-kotlin-android/ – Asad Ali Choudhry Apr 20 '20 at 22:08

1 Answers1

309

Use the distinct extension function:

val a = arrayOf("a", "a", "b", "c", "c")
val b = a.distinct() // ["a", "b", "c"]

There's also distinctBy function that allows one to specify how to distinguish the items:

val a = listOf("a", "b", "ab", "ba", "abc")
val b = a.distinctBy { it.length } // ["a", "ab", "abc"]

As @mfulton26 suggested, you can also use toSet, toMutableSet and, if you don't need the original ordering to be preserved, toHashSet. These functions produce a Set instead of a List and should be a little bit more efficient than distinct.


You may find useful:

Community
  • 1
  • 1
hotkey
  • 127,445
  • 33
  • 333
  • 310
  • 9
    You can also use `toSet` or `toMutableSet` which have less overhead than `distinct` and if ordering does not matter you can use `toHashSet`. – mfulton26 Nov 07 '16 at 14:40
  • @mfulton26, certainly it doesn't always have overhead. E.g. a JPA entity object can have lazy loaded fields, so it's more efficient to distinct its collection by id than perform full comparison – Buckstabue Jan 06 '19 at 20:03
  • 1
    @Buckstabue if you only need a Collection back (and it doesn't matter if it is a List or a Set) then using a Collection optimized for unique elements will be more efficient. The current implementation of distinct uses toMutableSet() in its implementation and then converts it to a List so by using toSet et. al. directly you avoid the extra intermediary Collection instance ([kotlin/_Arrays.kt:9145-9155 at master · JetBrains/kotlin](https://github.com/JetBrains/kotlin/blob/90c787e1023554bd0bc435071a78b9986f001068/libraries/stdlib/common/src/generated/_Arrays.kt#L9148-L9155)). – mfulton26 Jan 07 '19 at 17:37
  • @mfulton26, please re-read my comment. I said "it is NOT ALWAYS efficient" and gave you an example where .distinctBy{ } can be useful. Sometimes calling equals() and hashCode() is a very expensive thing – Buckstabue Jan 08 '19 at 22:27
  • 3
    @Buckstabue I see, I believe we're talking about two different issues: 1) `to*Set` is more efficient (space & time) than `distinct[By]` because it returns the `Set` directly instead of using a `Set` internally and converting it to a `List` as its return value and 2) `distinctBy` is can be more efficient than `distinct` simply because you can avoid full object equality comparison. Both are valid points. I ran with your statement that "certainly it doesn't always have overhead" and I was replying to that and overlooked that you were comparing `distinct` with `distinctBy` (and not with `to*Set`). – mfulton26 Jan 09 '19 at 13:49
  • 1
    @mfulton26, you are correct. I mostly meant that sometimes it's better to use List + distinctBy than Set, because Set intensively use equals/hashCode which potentially might be expensive to call – Buckstabue Jan 10 '19 at 08:39
  • 4
    At time of writing, `Iterable.distinct` actually does `toMutableSet().toList()` internally. So don't worry about performance :-) – Luke Mar 30 '20 at 14:30
  • @Luke, using `.toHashSet()` should be a little faster than `.toMutableSet()` or `.toSet()`, as those create a linked hash set, which has additional links between the elements, consuming a little more time to construct than an ordinary hash set. And also, `.toHashSet()` or `.toSet()` don't spend time converting the result back to a list. But except for hot, performance-critical code, the difference should be negligible. – hotkey Mar 30 '20 at 14:56
  • Please note that `distinct` or `toSet` won't work on a list of arrays. I.e List of ByteArray or List of CharArray – aldok Oct 21 '20 at 02:34
  • @aldok That's because `distinct` and `toSet` rely on `equals` for comparison and arrays don't implement `equals` (they just inherit it form `Object` which uses `==` for equality). I.e. arrays are only equal if they refer to the same array object. You could use `distinctBy { it.toList() }` as a quick solution, but that converts the arrays to lists, then the list of lists to a set, which compares the lists by contents and removes duplicates, and then converts the set back to a list, so it's probably not the most effective way of doing that. – Dario Seidl Jul 27 '21 at 12:51