71

Which data structure in Java has the fastest contains() operation?

E.g. I have a set of numbers { 1, 7, 12, 14, 20... }

Given another arbitrary number x, what's the fastest way (on average) to generate the boolean value of whether x is contained in the set or not? The probability for !contains() is about 5x higher.

Do all the map structures provide an O(1) operation? Is HashSet the fastest way to go?

pnuts
codechobo

4 Answers

132

Look at the set (HashSet, EnumSet) and hash-based (HashMap, LinkedHash..., IdentityHash...) implementations. They have expected O(1) for contains().

This cheatsheet is of great help.
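As a minimal sketch of the HashSet approach, using the numbers from the question: the class and method names below (`HashSetContainsSketch`, `buildSet`) are made up for illustration, and the O(1) behaviour is the expected case, not a guarantee.

```java
import java.util.HashSet;
import java.util.Set;

class HashSetContainsSketch {
    // Builds a HashSet from the given values; each int is boxed to an Integer.
    static Set<Integer> buildSet(int... values) {
        Set<Integer> set = new HashSet<>();
        for (int v : values) {
            set.add(v);
        }
        return set;
    }

    public static void main(String[] args) {
        Set<Integer> numbers = buildSet(1, 7, 12, 14, 20);
        System.out.println(numbers.contains(12)); // prints "true"
        System.out.println(numbers.contains(13)); // prints "false"
    }
}
```

The set is built once and then queried many times, which matches the question's scenario of repeated lookups against a fixed set.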

Aravind Yarram
  • 8
    For what it's worth, hash maps in general aren't O(1) in lookup when hash collisions occur (and they can happen pretty often, if very few at a time). Worst case is O(n) in lookup. – Blindy Jul 17 '10 at 07:27
  • I agree with Blindy. The performance of a hash-based collection is limited by the performance of its hash function. – sbidwai Jul 17 '10 at 07:42
  • When I went recently, the site was down. If this happens to you, you can use this [link](http://web.archive.org/web/20120105103844/http://www.coderfriendly.com/wp-content/uploads/2009/05/java_collections_v2.pdf) – EasilyBaffled Apr 28 '13 at 18:17
  • Page cannot be crawled or displayed due to robots.txt (correct the link plz) – iluvatar_GR Jul 14 '13 at 14:33
9

For your particular case of numbers with a relatively high density, I'd use a BitSet; it will be faster and much smaller than a hash set.
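A minimal sketch of the BitSet idea, assuming the values are non-negative ints (BitSet indices cannot be negative). The class and helper names (`BitSetContainsSketch`, `buildBitSet`, `contains`) are hypothetical.

```java
import java.util.BitSet;

class BitSetContainsSketch {
    // Sets one bit per value; assumes every value is non-negative.
    static BitSet buildBitSet(int... values) {
        BitSet bits = new BitSet();
        for (int v : values) {
            bits.set(v);
        }
        return bits;
    }

    // Guards against negative input, for which BitSet.get would throw.
    static boolean contains(BitSet bits, int x) {
        return x >= 0 && bits.get(x);
    }

    public static void main(String[] args) {
        BitSet numbers = buildBitSet(1, 7, 12, 14, 20);
        System.out.println(contains(numbers, 14)); // prints "true"
        System.out.println(contains(numbers, 15)); // prints "false"
        System.out.println(contains(numbers, -3)); // prints "false"
    }
}
```

Memory use grows with the largest value stored rather than with the number of elements, so this fits dense sets and is a poor match for a few very large, scattered values.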

starblue
5

The only data structure likely to be faster than HashSet is TIntHashSet from Trove4J. It stores primitives, avoiding the need for Integer objects.

If the integers fall in a small range, you can create a boolean[] in which the entry at the index of each value present is set to true. This lookup is O(1). Note: HashSet is not guaranteed to be O(1) and is more likely to be O(log(log(N))).

You will only get O(1) for an ideal hash distribution. In practice, you are more likely to get collisions within hash buckets, so some lookups will need to check more than one value.
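The boolean[] variant could be sketched as follows, assuming all values lie in the range 0..maxValue. The names `BooleanArrayContainsSketch`, `buildTable`, and `contains` are illustrative only.

```java
class BooleanArrayContainsSketch {
    // Marks present[v] = true for each value; assumes 0 <= v <= maxValue.
    static boolean[] buildTable(int maxValue, int... values) {
        boolean[] present = new boolean[maxValue + 1];
        for (int v : values) {
            present[v] = true;
        }
        return present;
    }

    // A bounds check followed by one array read: no hashing and no boxing.
    static boolean contains(boolean[] present, int x) {
        return x >= 0 && x < present.length && present[x];
    }

    public static void main(String[] args) {
        boolean[] table = buildTable(20, 1, 7, 12, 14, 20);
        System.out.println(contains(table, 7));  // prints "true"
        System.out.println(contains(table, 8));  // prints "false"
        System.out.println(contains(table, 99)); // prints "false"
    }
}
```

The bounds check makes values outside the table count as absent, which is the common case here since misses are expected more often than hits.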

Peter Lawrey
-2

Hashing (HashSet) is the best option, with O(1) lookup.

Satish
    There's 8 minutes between your answer and Pangea's. Yours adds no extra value, so why post it? – Bart Kiers Jul 16 '10 at 18:12
  • 1
    @bart real slow internet connection –  Jul 16 '10 at 18:49
  • @Will, perhaps. If so, then by now @Satish has had time enough to remove his/her redundant answer. Yet s/he chooses to let it dangle. Perhaps in the hope of collecting some points? Maybe that was the intention to begin with, who knows? – Bart Kiers Jul 16 '10 at 18:52
  • 9
    @Bart: your up/down vote ratio suggests you're a little hard toward people. Relax ^_^ – Phil Jul 17 '10 at 07:53
  • 2
    @Po, for your information, I'm not being hard on people but on the answers they give that I think are wrong or that do not add any new information. Before I vote such answers down, I usually first leave a comment explaining why I think the answer is wrong/inappropriate. If the OP does not take action or explain him/herself, I cast my down vote. That's what SO is all about after all! If the OP had added some extra information to his/her answer or simply removed it, I wouldn't have down voted in this case. Again, I am not voting people down (or up), but the answers they give. A big difference. – Bart Kiers Jul 17 '10 at 08:24
  • @Po, one more thing, it's all rather subjective me being "harsh" with my down-votes. What if someone who voted 1000 times with just 1 down vote comments on your vote-ratio? In his/her eyes, **you** are also harsh. – Bart Kiers Jul 17 '10 at 08:28
  • Ok ok I'm sorry for my superficial comment :D – Phil Jul 17 '10 at 08:49