How to remove duplicates from an Array<String?> in kotlin?
Asked
Active
Viewed 8.4k times
160
-
If someone is looking for consecutive characters to remove then visit https://handyopinion.com/method-to-remove-consecutive-characters-from-a-string-in-kotlin-android/ – Asad Ali Choudhry Apr 20 '20 at 22:08
1 Answers
309
Use the distinct extension function:
val a = arrayOf("a", "a", "b", "c", "c")
val b = a.distinct() // ["a", "b", "c"]
There's also distinctBy function that allows one to specify how to distinguish the items:
val a = listOf("a", "b", "ab", "ba", "abc")
val b = a.distinctBy { it.length } // ["a", "ab", "abc"]
As @mfulton26 suggested, you can also use toSet, toMutableSet and, if you don't need the original ordering to be preserved, toHashSet. These functions produce a Set instead of a List and should be a little bit more efficient than distinct.
You may find useful:
-
9You can also use `toSet` or `toMutableSet` which have less overhead than `distinct` and if ordering does not matter you can use `toHashSet`. – mfulton26 Nov 07 '16 at 14:40
-
@mfulton26, certainly it doesn't always have overhead. E.g. a JPA entity object can have lazy loaded fields, so it's more efficient to distinct its collection by id than perform full comparison – Buckstabue Jan 06 '19 at 20:03
-
1@Buckstabue if you only need a Collection back (and it doesn't matter if it is a List or a Set) then using a Collection optimized for unique elements will be more efficient. The current implementation of distinct uses toMutableSet() in its implementation and then converts it to a List so by using toSet et. al. directly you avoid the extra intermediary Collection instance ([kotlin/_Arrays.kt:9145-9155 at master · JetBrains/kotlin](https://github.com/JetBrains/kotlin/blob/90c787e1023554bd0bc435071a78b9986f001068/libraries/stdlib/common/src/generated/_Arrays.kt#L9148-L9155)). – mfulton26 Jan 07 '19 at 17:37
-
@mfulton26, please re-read my comment. I said "it is NOT ALWAYS efficient" and gave you an example where .distinctBy{ } can be useful. Sometimes calling equals() and hashCode() is a very expensive thing – Buckstabue Jan 08 '19 at 22:27
-
3@Buckstabue I see, I believe we're talking about two different issues: 1) `to*Set` is more efficient (space & time) than `distinct[By]` because it returns the `Set` directly instead of using a `Set` internally and converting it to a `List` as its return value and 2) `distinctBy` is can be more efficient than `distinct` simply because you can avoid full object equality comparison. Both are valid points. I ran with your statement that "certainly it doesn't always have overhead" and I was replying to that and overlooked that you were comparing `distinct` with `distinctBy` (and not with `to*Set`). – mfulton26 Jan 09 '19 at 13:49
-
1@mfulton26, you are correct. I mostly meant that sometimes it's better to use List + distinctBy than Set, because Set intensively use equals/hashCode which potentially might be expensive to call – Buckstabue Jan 10 '19 at 08:39
-
4At time of writing, `Iterable.distinct` actually does `toMutableSet().toList()` internally. So don't worry about performance :-) – Luke Mar 30 '20 at 14:30
-
@Luke, using `.toHashSet()` should be a little faster than `.toMutableSet()` or `.toSet()`, as those create a linked hash set, which has additional links between the elements, consuming a little more time to construct than an ordinary hash set. And also, `.toHashSet()` or `.toSet()` don't spend time converting the result back to a list. But except for hot, performance-critical code, the difference should be negligible. – hotkey Mar 30 '20 at 14:56
-
Please note that `distinct` or `toSet` won't work on a list of arrays. I.e List of ByteArray or List of CharArray – aldok Oct 21 '20 at 02:34
-
@aldok That's because `distinct` and `toSet` rely on `equals` for comparison and arrays don't implement `equals` (they just inherit it form `Object` which uses `==` for equality). I.e. arrays are only equal if they refer to the same array object. You could use `distinctBy { it.toList() }` as a quick solution, but that converts the arrays to lists, then the list of lists to a set, which compares the lists by contents and removes duplicates, and then converts the set back to a list, so it's probably not the most effective way of doing that. – Dario Seidl Jul 27 '21 at 12:51