Why not allow an external interface to provide hashCode/equals for a HashMap?
With a TreeMap
it's trivial to provide a custom Comparator
, thus overriding the semantics provided by Comparable
objects added to the map. HashMap
s however cannot be controlled in this manner; the functions providing hash values and equality checks cannot be 'side-loaded'.
I suspect it would be both easy and useful to design an interface and to retrofit this into HashMap
(or a new class)? Something like this, except with better names:
interface Hasharator<T> {
int alternativeHashCode(T t);
boolean alternativeEquals(T t1, T t2);
}
class HasharatorMap<K, V> {
HasharatorMap(Hasharator<? super K> hasharator) { ... }
}
class HasharatorSet<T> {
HasharatorSet(Hasharator<? super T> hasharator) { ... }
}
The case insensitive Map
problem gets a trivial solution:
new HasharatorMap(String.CASE_INSENSITIVE_EQUALITY);
Would this be doable, or can you see any fundamental problems with this approach?
Is the approach used in any existing (non-JRE) libs? (Tried google, no luck.)
EDIT: Nice workaround presented by hazzen, but I'm afraid this is the workaround I'm trying to avoid... ;)
EDIT: Changed title to no longer mention "Comparator"; I suspect this was a bit confusing.
EDIT: Accepted answer with relation to performance; would love a more specific answer!
EDIT: There is an implementation; see the accepted answer below.
EDIT: Rephrased the first sentence to indicate more clearly that it's the side-loading I'm after (and not ordering; ordering does not belong in HashMap).
java collections hashmap trove4j
add a comment |
With a TreeMap
it's trivial to provide a custom Comparator
, thus overriding the semantics provided by Comparable
objects added to the map. HashMap
s however cannot be controlled in this manner; the functions providing hash values and equality checks cannot be 'side-loaded'.
I suspect it would be both easy and useful to design an interface and to retrofit this into HashMap
(or a new class)? Something like this, except with better names:
interface Hasharator<T> {
int alternativeHashCode(T t);
boolean alternativeEquals(T t1, T t2);
}
class HasharatorMap<K, V> {
HasharatorMap(Hasharator<? super K> hasharator) { ... }
}
class HasharatorSet<T> {
HasharatorSet(Hasharator<? super T> hasharator) { ... }
}
The case insensitive Map
problem gets a trivial solution:
new HasharatorMap(String.CASE_INSENSITIVE_EQUALITY);
Would this be doable, or can you see any fundamental problems with this approach?
Is the approach used in any existing (non-JRE) libs? (Tried google, no luck.)
EDIT: Nice workaround presented by hazzen, but I'm afraid this is the workaround I'm trying to avoid... ;)
EDIT: Changed title to no longer mention "Comparator"; I suspect this was a bit confusing.
EDIT: Accepted answer with relation to performance; would love a more specific answer!
EDIT: There is an implementation; see the accepted answer below.
EDIT: Rephrased the first sentence to indicate more clearly that it's the side-loading I'm after (and not ordering; ordering does not belong in HashMap).
java collections hashmap trove4j
"This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time." -- HashMap's Javadocs. In other words, HashMap isn't ordered.
– Powerlord
Dec 9 '09 at 20:55
This statement allows ANY hashCode implementation to be used and also allows the Map to resize itself as it goes. So this is a feature and not a problem in this context?
– volley
Dec 9 '09 at 21:01
add a comment |
With a TreeMap
it's trivial to provide a custom Comparator
, thus overriding the semantics provided by Comparable
objects added to the map. HashMap
s however cannot be controlled in this manner; the functions providing hash values and equality checks cannot be 'side-loaded'.
I suspect it would be both easy and useful to design an interface and to retrofit this into HashMap
(or a new class)? Something like this, except with better names:
interface Hasharator<T> {
int alternativeHashCode(T t);
boolean alternativeEquals(T t1, T t2);
}
class HasharatorMap<K, V> {
HasharatorMap(Hasharator<? super K> hasharator) { ... }
}
class HasharatorSet<T> {
HasharatorSet(Hasharator<? super T> hasharator) { ... }
}
The case insensitive Map
problem gets a trivial solution:
new HasharatorMap(String.CASE_INSENSITIVE_EQUALITY);
Would this be doable, or can you see any fundamental problems with this approach?
Is the approach used in any existing (non-JRE) libs? (Tried google, no luck.)
EDIT: Nice workaround presented by hazzen, but I'm afraid this is the workaround I'm trying to avoid... ;)
EDIT: Changed title to no longer mention "Comparator"; I suspect this was a bit confusing.
EDIT: Accepted answer with relation to performance; would love a more specific answer!
EDIT: There is an implementation; see the accepted answer below.
EDIT: Rephrased the first sentence to indicate more clearly that it's the side-loading I'm after (and not ordering; ordering does not belong in HashMap).
java collections hashmap trove4j
With a TreeMap
it's trivial to provide a custom Comparator
, thus overriding the semantics provided by Comparable
objects added to the map. HashMap
s however cannot be controlled in this manner; the functions providing hash values and equality checks cannot be 'side-loaded'.
I suspect it would be both easy and useful to design an interface and to retrofit this into HashMap
(or a new class)? Something like this, except with better names:
interface Hasharator<T> {
int alternativeHashCode(T t);
boolean alternativeEquals(T t1, T t2);
}
class HasharatorMap<K, V> {
HasharatorMap(Hasharator<? super K> hasharator) { ... }
}
class HasharatorSet<T> {
HasharatorSet(Hasharator<? super T> hasharator) { ... }
}
The case insensitive Map
problem gets a trivial solution:
new HasharatorMap(String.CASE_INSENSITIVE_EQUALITY);
Would this be doable, or can you see any fundamental problems with this approach?
Is the approach used in any existing (non-JRE) libs? (Tried google, no luck.)
EDIT: Nice workaround presented by hazzen, but I'm afraid this is the workaround I'm trying to avoid... ;)
EDIT: Changed title to no longer mention "Comparator"; I suspect this was a bit confusing.
EDIT: Accepted answer with relation to performance; would love a more specific answer!
EDIT: There is an implementation; see the accepted answer below.
EDIT: Rephrased the first sentence to indicate more clearly that it's the side-loading I'm after (and not ordering; ordering does not belong in HashMap).
java collections hashmap trove4j
java collections hashmap trove4j
edited May 23 '17 at 11:53
Community♦
11
11
asked Oct 17 '08 at 23:06
volleyvolley
5,77912327
5,77912327
"This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time." -- HashMap's Javadocs. In other words, HashMap isn't ordered.
– Powerlord
Dec 9 '09 at 20:55
This statement allows ANY hashCode implementation to be used and also allows the Map to resize itself as it goes. So this is a feature and not a problem in this context?
– volley
Dec 9 '09 at 21:01
add a comment |
"This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time." -- HashMap's Javadocs. In other words, HashMap isn't ordered.
– Powerlord
Dec 9 '09 at 20:55
This statement allows ANY hashCode implementation to be used and also allows the Map to resize itself as it goes. So this is a feature and not a problem in this context?
– volley
Dec 9 '09 at 21:01
"This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time." -- HashMap's Javadocs. In other words, HashMap isn't ordered.
– Powerlord
Dec 9 '09 at 20:55
"This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time." -- HashMap's Javadocs. In other words, HashMap isn't ordered.
– Powerlord
Dec 9 '09 at 20:55
This statement allows ANY hashCode implementation to be used and also allows the Map to resize itself as it goes. So this is a feature and not a problem in this context?
– volley
Dec 9 '09 at 21:01
This statement allows ANY hashCode implementation to be used and also allows the Map to resize itself as it goes. So this is a feature and not a problem in this context?
– volley
Dec 9 '09 at 21:01
add a comment |
9 Answers
9
active
oldest
votes
A bit late for you, but for future visitors, it might be worth knowing that commons-collections has an AbstractHashedMap
(in 3.2.2 and with generics in 4.0). You can override these protected methods to achieve your desired behaviour:
protected int hash(Object key) { ... }
protected boolean isEqualKey(Object key1, Object key2) { ... }
protected boolean isEqualValue(Object value1, Object value2) { ... }
protected HashEntry createEntry(
HashEntry next, int hashCode, Object key, Object value) { ... }
An example implementation of such an alternative HashedMap
is commons-collections' own IdentityMap
(only up to 3.2.2 as Java has its own since 1.4).
This is not as powerful as providing an external "Hasharator
" to a Map
instance. You have to implement a new map class for every hashing strategy (composition vs. inheritance striking back...). But it's still good to know.
1
PlusOne. You might want to update that link to AbstractHashedMap to point to v4 which finally has generics.
– Nicolai
Dec 30 '14 at 9:59
add a comment |
.NET has this via IEqualityComparer (for a type which can compare two objects) and IEquatable (for a type which can compare itself to another instance).
In fact, I believe it was a mistake to define equality and hashcodes in java.lang.Object or System.Object at all. Equality in particular is hard to define in a way which makes sense with inheritance. I keep meaning to blog about this...
But yes, basically the idea is sound.
And it accounts for the concept that there may be more than one concept of equality for a given type.
– Kenogu Labz
Mar 28 '16 at 20:18
add a comment |
HashingStrategy is the concept you're looking for. It's a strategy interface that allows you to define custom implementations of equals and hashcode.
public interface HashingStrategy<E>
{
int computeHashCode(E object);
boolean equals(E object1, E object2);
}
You can't use a HashingStrategy
with the built in HashSet
or HashMap
. GS Collections includes a java.util.Set called UnifiedSetWithHashingStrategy
and a java.util.Map called UnifiedMapWithHashingStrategy
.
Let's look at an example.
public class Data
{
private final int id;
public Data(int id)
{
this.id = id;
}
public int getId()
{
return id;
}
// No equals or hashcode
}
Here's how you might set up a UnifiedSetWithHashingStrategy
and use it.
java.util.Set<Data> set =
new UnifiedSetWithHashingStrategy<>(HashingStrategies.fromFunction(Data::getId));
Assert.assertTrue(set.add(new Data(1)));
// contains returns true even without hashcode and equals
Assert.assertTrue(set.contains(new Data(1)));
// Second call to add() doesn't do anything and returns false
Assert.assertFalse(set.add(new Data(1)));
Why not just use a Map
? UnifiedSetWithHashingStrategy
uses half the memory of a UnifiedMap
, and one quarter the memory of a HashMap
. And sometimes you don't have a convenient key and have to create a synthetic one, like a tuple. That can waste more memory.
How do we perform lookups? Remember that Sets have contains()
, but not get()
. UnifiedSetWithHashingStrategy
implements Pool
in addition to Set
, so it also implements a form of get()
.
Here's a simple approach to handle case-insensitive Strings.
UnifiedSetWithHashingStrategy<String> set =
new UnifiedSetWithHashingStrategy<>(HashingStrategies.fromFunction(String::toLowerCase));
set.add("ABC");
Assert.assertTrue(set.contains("ABC"));
Assert.assertTrue(set.contains("abc"));
Assert.assertFalse(set.contains("def"));
Assert.assertEquals("ABC", set.get("aBc"));
This shows off the API, but it's not appropriate for production. The problem is that the HashingStrategy constantly delegates to String.toLowerCase()
which creates a bunch of garbage Strings. Here's how you can create an efficient hashing strategy for case-insensitive Strings.
public static final HashingStrategy<String> CASE_INSENSITIVE =
new HashingStrategy<String>()
{
@Override
public int computeHashCode(String string)
{
int hashCode = 0;
for (int i = 0; i < string.length(); i++)
{
hashCode = 31 * hashCode + Character.toLowerCase(string.charAt(i));
}
return hashCode;
}
@Override
public boolean equals(String string1, String string2)
{
return string1.equalsIgnoreCase(string2);
}
};
Note: I am a developer on GS collections.
add a comment |
Trove4j has the feature I'm after and they call it hashing strategies.
Their map has an implementation with different limitations and thus different prerequisites, so this does not implicitly mean that an implementation for Java's "native" HashMap would be feasible.
add a comment |
Note: As noted in all other answers, HashMaps don't have an explicit ordering. They only recognize "equality". Getting an order out of a hash-based data structure is meaningless, as each object is turned into a hash - essentially a random number.
You can always write a hash function for a class (and often times must), as long as you do it carefully. This is a hard thing to do properly because hash-based data structures rely on a random, uniform distribution of hash values. In Effective Java, there is a large amount of text devoted to properly implementing a hash method with good behaviour.
With all that being said, if you just want your hashing to ignore the case of a String
, you can write a wrapper class around String
for this purpose and insert those in your data structure instead.
A simple implementation:
public class LowerStringWrapper {
public LowerStringWrapper(String s) {
this.s = s;
this.lowerString = s.toLowerString();
}
// getter methods omitted
// Rely on the hashing of String, as we know it to be good.
public int hashCode() { return lowerString.hashCode(); }
// We overrode hashCode, so we MUST also override equals. It is required
// that if a.equals(b), then a.hashCode() == b.hashCode(), so we must
// restore that invariant.
public boolean equals(Object obj) {
if (obj instanceof LowerStringWrapper) {
return lowerString.equals(((LowerStringWrapper)obj).lowerString;
} else {
return lowerString.equals(obj);
}
}
private String s;
private String lowerString;
}
add a comment |
good question, ask josh bloch. i submitted that concept as an RFE in java 7, but it was dropped, i believe the reason was something performance related. i agree, though, should have been done.
Hmm. Maybe it's because you miss out on the opportunity to cache the calculated hash codes..
– volley
Oct 18 '08 at 14:21
add a comment |
I suspect this has not been done because it would prevent hashCode caching?
I attempted creating a generic Map solution where all keys are silently wrapped. It turned out that the wrapper would have to hold the wrapped object, the cached hashCode and a reference to the callback interface responsible for equality-checks. This is obviously not as efficient as using a wrapper class, where you'd only have to cache the original key plus one more object (see hazzens answer).
(I also bumped into a problem related to generics; the get-method accepts Object as input, so the callback interface responsible for hashing would have to perform an additional instanceof-check. Either that, or the map class would have to know the Class of its keys.)
add a comment |
This is an interesting idea, but it's absolutely horrendous for performance. The reason for this is quite fundamental to the idea of a hashtable: the ordering cannot be relied upon. Hashtables are very fast (constant time) because of the way in which they index elements in the table: by computing a pseudo-unique integer hash for that element and accessing that location in an array. It's literally computing a location in memory and directly storing the element.
This contrasts with a balanced binary search tree (TreeMap
) which must start at the root and work its way down to the desired node every time a lookup is required. Wikipedia has some more in-depth analysis. To summarize, the efficiency of a tree map is dependent upon a consistent ordering, thus the order of the elements is predictable and sane. However, because of the performance hit imposed by the "traverse to your destination" approach, BSTs are only able to provide O(log(n)) performance. For large maps, this can be a significant performance hit.
It is possible to impose a consistent ordering on a hashtable, but to do so involves using techniques similar to LinkedHashMap
and manually maintaining the ordering. Alternatively, two separate data structures can be maintained internally: a hashtable and a tree. The table can be used for lookups, while the tree can be used for iteration. The problem of course is this uses more than double the required memory. Also, insertions are only as fast as the tree: O(log(n)). Concurrent tricks can bring this down a bit, but that isn't a reliable performance optimization.
In short, your idea sounds really good, but if you actually tried to implement it, you would see that to do so would impose massive performance limitations. The final verdict is (and has been for decades): if you need performance, use a hashtable; if you need ordering and can live with degraded performance, use a balanced binary search tree. I'm afraid there's really no efficiently combining the two structures without losing some of the guarantees of one or the other.
1
I don't think your answer has much to do with the question. volley just wants to use a HashTable where the hash function is user-specified, instead of the default Object.hashCode().
– Adam Rosenfield
Oct 18 '08 at 16:49
No, I think he wants a little more than that. His proposed "solution" is to impose ordering using an alternative hash code, but that's not going to work (hashing into a limited domain). To order a hashtable, some ancillary structure is needed.
– Daniel Spiewak
Oct 18 '08 at 16:53
1
Hmm actually I think Adam is right; note that the interface I suggest contains one method for calculating the hash and one method for checking if two objects are equal. Ordering is not in there! The Comparator is mentioned as an analogy. (By the way, löve the Darwinian logo, Daniel!)
– volley
Oct 21 '08 at 20:09
You're completely wrong, just look at the Hasharator, there's nothing like ordering there. Look atcom.google.common.base.Equivalence
and theirCustomConcurrentHashMap
, that's exactly it.
– maaartinus
Apr 18 '11 at 10:49
1
Before telling me I'm completely wrong, it might be worth looking at the edit history of the question. What I replied to was something very different from what we have now.
– Daniel Spiewak
Apr 18 '11 at 20:00
|
show 2 more comments
There's such a feature in com.google.common.collect.CustomConcurrentHashMap
, unfortunately, there's currently no public way how to set the Equivalence
(their Hasharator
). Maybe they're not yet done with it, maybe they don't consider the feature to be useful enough. Ask at the guava mailing list.
I wonder why it haven't happened yet, as it was mentioned in this talk over two years ago.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f214136%2fwhy-not-allow-an-external-interface-to-provide-hashcode-equals-for-a-hashmap%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
9 Answers
9
active
oldest
votes
9 Answers
9
active
oldest
votes
active
oldest
votes
active
oldest
votes
A bit late for you, but for future visitors, it might be worth knowing that commons-collections has an AbstractHashedMap
(in 3.2.2 and with generics in 4.0). You can override these protected methods to achieve your desired behaviour:
protected int hash(Object key) { ... }
protected boolean isEqualKey(Object key1, Object key2) { ... }
protected boolean isEqualValue(Object value1, Object value2) { ... }
protected HashEntry createEntry(
HashEntry next, int hashCode, Object key, Object value) { ... }
An example implementation of such an alternative HashedMap
is commons-collections' own IdentityMap
(only up to 3.2.2 as Java has its own since 1.4).
This is not as powerful as providing an external "Hasharator
" to a Map
instance. You have to implement a new map class for every hashing strategy (composition vs. inheritance striking back...). But it's still good to know.
1
PlusOne. You might want to update that link to AbstractHashedMap to point to v4 which finally has generics.
– Nicolai
Dec 30 '14 at 9:59
add a comment |
A bit late for you, but for future visitors, it might be worth knowing that commons-collections has an AbstractHashedMap
(in 3.2.2 and with generics in 4.0). You can override these protected methods to achieve your desired behaviour:
protected int hash(Object key) { ... }
protected boolean isEqualKey(Object key1, Object key2) { ... }
protected boolean isEqualValue(Object value1, Object value2) { ... }
protected HashEntry createEntry(
HashEntry next, int hashCode, Object key, Object value) { ... }
An example implementation of such an alternative HashedMap
is commons-collections' own IdentityMap
(only up to 3.2.2 as Java has its own since 1.4).
This is not as powerful as providing an external "Hasharator
" to a Map
instance. You have to implement a new map class for every hashing strategy (composition vs. inheritance striking back...). But it's still good to know.
1
PlusOne. You might want to update that link to AbstractHashedMap to point to v4 which finally has generics.
– Nicolai
Dec 30 '14 at 9:59
add a comment |
A bit late for you, but for future visitors, it might be worth knowing that commons-collections has an AbstractHashedMap
(in 3.2.2 and with generics in 4.0). You can override these protected methods to achieve your desired behaviour:
protected int hash(Object key) { ... }
protected boolean isEqualKey(Object key1, Object key2) { ... }
protected boolean isEqualValue(Object value1, Object value2) { ... }
protected HashEntry createEntry(
HashEntry next, int hashCode, Object key, Object value) { ... }
An example implementation of such an alternative HashedMap
is commons-collections' own IdentityMap
(only up to 3.2.2 as Java has its own since 1.4).
This is not as powerful as providing an external "Hasharator
" to a Map
instance. You have to implement a new map class for every hashing strategy (composition vs. inheritance striking back...). But it's still good to know.
A bit late for you, but for future visitors, it might be worth knowing that commons-collections has an AbstractHashedMap
(in 3.2.2 and with generics in 4.0). You can override these protected methods to achieve your desired behaviour:
protected int hash(Object key) { ... }
protected boolean isEqualKey(Object key1, Object key2) { ... }
protected boolean isEqualValue(Object value1, Object value2) { ... }
protected HashEntry createEntry(
HashEntry next, int hashCode, Object key, Object value) { ... }
An example implementation of such an alternative HashedMap
is commons-collections' own IdentityMap
(only up to 3.2.2 as Java has its own since 1.4).
This is not as powerful as providing an external "Hasharator
" to a Map
instance. You have to implement a new map class for every hashing strategy (composition vs. inheritance striking back...). But it's still good to know.
edited Nov 15 '18 at 0:41
caco3
1,8481719
1,8481719
answered Nov 17 '13 at 12:43
Lukas EderLukas Eder
134k72438967
134k72438967
1
PlusOne. You might want to update that link to AbstractHashedMap to point to v4 which finally has generics.
– Nicolai
Dec 30 '14 at 9:59
add a comment |
1
PlusOne. You might want to update that link to AbstractHashedMap to point to v4 which finally has generics.
– Nicolai
Dec 30 '14 at 9:59
1
1
PlusOne. You might want to update that link to AbstractHashedMap to point to v4 which finally has generics.
– Nicolai
Dec 30 '14 at 9:59
PlusOne. You might want to update that link to AbstractHashedMap to point to v4 which finally has generics.
– Nicolai
Dec 30 '14 at 9:59
add a comment |
.NET has this via IEqualityComparer (for a type which can compare two objects) and IEquatable (for a type which can compare itself to another instance).
In fact, I believe it was a mistake to define equality and hashcodes in java.lang.Object or System.Object at all. Equality in particular is hard to define in a way which makes sense with inheritance. I keep meaning to blog about this...
But yes, basically the idea is sound.
And it accounts for the concept that there may be more than one concept of equality for a given type.
– Kenogu Labz
Mar 28 '16 at 20:18
add a comment |
.NET has this via IEqualityComparer (for a type which can compare two objects) and IEquatable (for a type which can compare itself to another instance).
In fact, I believe it was a mistake to define equality and hashcodes in java.lang.Object or System.Object at all. Equality in particular is hard to define in a way which makes sense with inheritance. I keep meaning to blog about this...
But yes, basically the idea is sound.
And it accounts for the concept that there may be more than one concept of equality for a given type.
– Kenogu Labz
Mar 28 '16 at 20:18
add a comment |
.NET has this via IEqualityComparer (for a type which can compare two objects) and IEquatable (for a type which can compare itself to another instance).
In fact, I believe it was a mistake to define equality and hashcodes in java.lang.Object or System.Object at all. Equality in particular is hard to define in a way which makes sense with inheritance. I keep meaning to blog about this...
But yes, basically the idea is sound.
.NET has this via IEqualityComparer (for a type which can compare two objects) and IEquatable (for a type which can compare itself to another instance).
In fact, I believe it was a mistake to define equality and hashcodes in java.lang.Object or System.Object at all. Equality in particular is hard to define in a way which makes sense with inheritance. I keep meaning to blog about this...
But yes, basically the idea is sound.
answered Oct 17 '08 at 23:40
Jon SkeetJon Skeet
1088k68979438440
1088k68979438440
And it accounts for the concept that there may be more than one concept of equality for a given type.
– Kenogu Labz
Mar 28 '16 at 20:18
add a comment |
And it accounts for the concept that there may be more than one concept of equality for a given type.
– Kenogu Labz
Mar 28 '16 at 20:18
And it accounts for the concept that there may be more than one concept of equality for a given type.
– Kenogu Labz
Mar 28 '16 at 20:18
And it accounts for the concept that there may be more than one concept of equality for a given type.
– Kenogu Labz
Mar 28 '16 at 20:18
add a comment |
HashingStrategy is the concept you're looking for. It's a strategy interface that allows you to define custom implementations of equals and hashcode.
public interface HashingStrategy<E>
{
int computeHashCode(E object);
boolean equals(E object1, E object2);
}
You can't use a HashingStrategy
with the built in HashSet
or HashMap
. GS Collections includes a java.util.Set called UnifiedSetWithHashingStrategy
and a java.util.Map called UnifiedMapWithHashingStrategy
.
Let's look at an example.
public class Data
{
private final int id;
public Data(int id)
{
this.id = id;
}
public int getId()
{
return id;
}
// No equals or hashcode
}
Here's how you might set up a UnifiedSetWithHashingStrategy
and use it.
java.util.Set<Data> set =
new UnifiedSetWithHashingStrategy<>(HashingStrategies.fromFunction(Data::getId));
Assert.assertTrue(set.add(new Data(1)));
// contains returns true even without hashcode and equals
Assert.assertTrue(set.contains(new Data(1)));
// Second call to add() doesn't do anything and returns false
Assert.assertFalse(set.add(new Data(1)));
Why not just use a Map
? UnifiedSetWithHashingStrategy
uses half the memory of a UnifiedMap
, and one quarter the memory of a HashMap
. And sometimes you don't have a convenient key and have to create a synthetic one, like a tuple. That can waste more memory.
How do we perform lookups? Remember that Sets have contains()
, but not get()
. UnifiedSetWithHashingStrategy
implements Pool
in addition to Set
, so it also implements a form of get()
.
Here's a simple approach to handle case-insensitive Strings.
UnifiedSetWithHashingStrategy<String> set =
new UnifiedSetWithHashingStrategy<>(HashingStrategies.fromFunction(String::toLowerCase));
set.add("ABC");
Assert.assertTrue(set.contains("ABC"));
Assert.assertTrue(set.contains("abc"));
Assert.assertFalse(set.contains("def"));
Assert.assertEquals("ABC", set.get("aBc"));
This shows off the API, but it's not appropriate for production. The problem is that the HashingStrategy constantly delegates to String.toLowerCase()
which creates a bunch of garbage Strings. Here's how you can create an efficient hashing strategy for case-insensitive Strings.
public static final HashingStrategy<String> CASE_INSENSITIVE =
new HashingStrategy<String>()
{
@Override
public int computeHashCode(String string)
{
int hashCode = 0;
for (int i = 0; i < string.length(); i++)
{
hashCode = 31 * hashCode + Character.toLowerCase(string.charAt(i));
}
return hashCode;
}
@Override
public boolean equals(String string1, String string2)
{
return string1.equalsIgnoreCase(string2);
}
};
Note: I am a developer on GS collections.
add a comment |
HashingStrategy is the concept you're looking for. It's a strategy interface that allows you to define custom implementations of equals and hashcode.
public interface HashingStrategy<E>
{
int computeHashCode(E object);
boolean equals(E object1, E object2);
}
You can't use a HashingStrategy
with the built in HashSet
or HashMap
. GS Collections includes a java.util.Set called UnifiedSetWithHashingStrategy
and a java.util.Map called UnifiedMapWithHashingStrategy
.
Let's look at an example.
public class Data
{
private final int id;
public Data(int id)
{
this.id = id;
}
public int getId()
{
return id;
}
// No equals or hashcode
}
Here's how you might set up a UnifiedSetWithHashingStrategy
and use it.
java.util.Set<Data> set =
new UnifiedSetWithHashingStrategy<>(HashingStrategies.fromFunction(Data::getId));
Assert.assertTrue(set.add(new Data(1)));
// contains returns true even without hashcode and equals
Assert.assertTrue(set.contains(new Data(1)));
// Second call to add() doesn't do anything and returns false
Assert.assertFalse(set.add(new Data(1)));
Why not just use a Map
? UnifiedSetWithHashingStrategy
uses half the memory of a UnifiedMap
, and one quarter the memory of a HashMap
. And sometimes you don't have a convenient key and have to create a synthetic one, like a tuple. That can waste more memory.
How do we perform lookups? Remember that Sets have contains()
, but not get()
. UnifiedSetWithHashingStrategy
implements Pool
in addition to Set
, so it also implements a form of get()
.
Here's a simple approach to handle case-insensitive Strings.
UnifiedSetWithHashingStrategy<String> set =
new UnifiedSetWithHashingStrategy<>(HashingStrategies.fromFunction(String::toLowerCase));
set.add("ABC");
Assert.assertTrue(set.contains("ABC"));
Assert.assertTrue(set.contains("abc"));
Assert.assertFalse(set.contains("def"));
Assert.assertEquals("ABC", set.get("aBc"));
This shows off the API, but it's not appropriate for production. The problem is that the HashingStrategy constantly delegates to String.toLowerCase()
which creates a bunch of garbage Strings. Here's how you can create an efficient hashing strategy for case-insensitive Strings.
public static final HashingStrategy<String> CASE_INSENSITIVE =
new HashingStrategy<String>()
{
@Override
public int computeHashCode(String string)
{
int hashCode = 0;
for (int i = 0; i < string.length(); i++)
{
hashCode = 31 * hashCode + Character.toLowerCase(string.charAt(i));
}
return hashCode;
}
@Override
public boolean equals(String string1, String string2)
{
return string1.equalsIgnoreCase(string2);
}
};
Note: I am a developer on GS collections.
add a comment |
HashingStrategy is the concept you're looking for. It's a strategy interface that allows you to define custom implementations of equals and hashcode.
public interface HashingStrategy<E>
{
int computeHashCode(E object);
boolean equals(E object1, E object2);
}
You can't use a HashingStrategy
with the built in HashSet
or HashMap
. GS Collections includes a java.util.Set called UnifiedSetWithHashingStrategy
and a java.util.Map called UnifiedMapWithHashingStrategy
.
Let's look at an example.
public class Data
{
private final int id;
public Data(int id)
{
this.id = id;
}
public int getId()
{
return id;
}
// No equals or hashcode
}
Here's how you might set up a UnifiedSetWithHashingStrategy
and use it.
java.util.Set<Data> set =
new UnifiedSetWithHashingStrategy<>(HashingStrategies.fromFunction(Data::getId));
Assert.assertTrue(set.add(new Data(1)));
// contains returns true even without hashcode and equals
Assert.assertTrue(set.contains(new Data(1)));
// Second call to add() doesn't do anything and returns false
Assert.assertFalse(set.add(new Data(1)));
Why not just use a Map
? UnifiedSetWithHashingStrategy
uses half the memory of a UnifiedMap
, and one quarter the memory of a HashMap
. And sometimes you don't have a convenient key and have to create a synthetic one, like a tuple. That can waste more memory.
How do we perform lookups? Remember that Sets have contains()
, but not get()
. UnifiedSetWithHashingStrategy
implements Pool
in addition to Set
, so it also implements a form of get()
.
Here's a simple approach to handle case-insensitive Strings.
UnifiedSetWithHashingStrategy<String> set =
new UnifiedSetWithHashingStrategy<>(HashingStrategies.fromFunction(String::toLowerCase));
set.add("ABC");
Assert.assertTrue(set.contains("ABC"));
Assert.assertTrue(set.contains("abc"));
Assert.assertFalse(set.contains("def"));
Assert.assertEquals("ABC", set.get("aBc"));
This shows off the API, but it's not appropriate for production. The problem is that the HashingStrategy constantly delegates to String.toLowerCase()
which creates a bunch of garbage Strings. Here's how you can create an efficient hashing strategy for case-insensitive Strings.
public static final HashingStrategy<String> CASE_INSENSITIVE =
new HashingStrategy<String>()
{
@Override
public int computeHashCode(String string)
{
int hashCode = 0;
for (int i = 0; i < string.length(); i++)
{
hashCode = 31 * hashCode + Character.toLowerCase(string.charAt(i));
}
return hashCode;
}
@Override
public boolean equals(String string1, String string2)
{
return string1.equalsIgnoreCase(string2);
}
};
Note: I am a developer on GS collections.
HashingStrategy is the concept you're looking for. It's a strategy interface that allows you to define custom implementations of equals and hashcode.
public interface HashingStrategy<E>
{
int computeHashCode(E object);
boolean equals(E object1, E object2);
}
You can't use a HashingStrategy
with the built in HashSet
or HashMap
. GS Collections includes a java.util.Set called UnifiedSetWithHashingStrategy
and a java.util.Map called UnifiedMapWithHashingStrategy
.
Let's look at an example.
public class Data
{
private final int id;
public Data(int id)
{
this.id = id;
}
public int getId()
{
return id;
}
// No equals or hashcode
}
Here's how you might set up a UnifiedSetWithHashingStrategy
and use it.
java.util.Set<Data> set =
new UnifiedSetWithHashingStrategy<>(HashingStrategies.fromFunction(Data::getId));
Assert.assertTrue(set.add(new Data(1)));
// contains returns true even without hashcode and equals
Assert.assertTrue(set.contains(new Data(1)));
// Second call to add() doesn't do anything and returns false
Assert.assertFalse(set.add(new Data(1)));
Why not just use a Map
? UnifiedSetWithHashingStrategy
uses half the memory of a UnifiedMap
, and one quarter the memory of a HashMap
. And sometimes you don't have a convenient key and have to create a synthetic one, like a tuple. That can waste more memory.
How do we perform lookups? Remember that Sets have contains()
, but not get()
. UnifiedSetWithHashingStrategy
implements Pool
in addition to Set
, so it also implements a form of get()
.
Here's a simple approach to handle case-insensitive Strings.
UnifiedSetWithHashingStrategy<String> set =
new UnifiedSetWithHashingStrategy<>(HashingStrategies.fromFunction(String::toLowerCase));
set.add("ABC");
Assert.assertTrue(set.contains("ABC"));
Assert.assertTrue(set.contains("abc"));
Assert.assertFalse(set.contains("def"));
Assert.assertEquals("ABC", set.get("aBc"));
This shows off the API, but it's not appropriate for production. The problem is that the HashingStrategy constantly delegates to String.toLowerCase()
which creates a bunch of garbage Strings. Here's how you can create an efficient hashing strategy for case-insensitive Strings.
public static final HashingStrategy<String> CASE_INSENSITIVE =
new HashingStrategy<String>()
{
@Override
public int computeHashCode(String string)
{
int hashCode = 0;
for (int i = 0; i < string.length(); i++)
{
hashCode = 31 * hashCode + Character.toLowerCase(string.charAt(i));
}
return hashCode;
}
@Override
public boolean equals(String string1, String string2)
{
return string1.equalsIgnoreCase(string2);
}
};
Note: I am a developer on GS collections.
edited Mar 4 '15 at 20:55
answered Dec 31 '14 at 23:31
Craig P. MotlinCraig P. Motlin
18.4k1586120
18.4k1586120
add a comment |
add a comment |
Trove4j has the feature I'm after and they call it hashing strategies.
Their map has an implementation with different limitations and thus different prerequisites, so this does not implicitly mean that an implementation for Java's "native" HashMap would be feasible.
add a comment |
Trove4j has the feature I'm after and they call it hashing strategies.
Their map has an implementation with different limitations and thus different prerequisites, so this does not implicitly mean that an implementation for Java's "native" HashMap would be feasible.
add a comment |
Trove4j has the feature I'm after and they call it hashing strategies.
Their map has an implementation with different limitations and thus different prerequisites, so this does not implicitly mean that an implementation for Java's "native" HashMap would be feasible.
Trove4j has the feature I'm after and they call it hashing strategies.
Their map has an implementation with different limitations and thus different prerequisites, so this does not implicitly mean that an implementation for Java's "native" HashMap would be feasible.
edited Dec 9 '09 at 20:48
answered Dec 9 '09 at 20:42
volleyvolley
5,77912327
5,77912327
add a comment |
add a comment |
Note: As noted in all other answers, HashMaps don't have an explicit ordering. They only recognize "equality". Getting an order out of a hash-based data structure is meaningless, as each object is turned into a hash - essentially a random number.
You can always write a hash function for a class (and often times must), as long as you do it carefully. This is a hard thing to do properly because hash-based data structures rely on a random, uniform distribution of hash values. In Effective Java, there is a large amount of text devoted to properly implementing a hash method with good behaviour.
With all that being said, if you just want your hashing to ignore the case of a String
, you can write a wrapper class around String
for this purpose and insert those in your data structure instead.
A simple implementation:
public class LowerStringWrapper {
public LowerStringWrapper(String s) {
this.s = s;
this.lowerString = s.toLowerString();
}
// getter methods omitted
// Rely on the hashing of String, as we know it to be good.
public int hashCode() { return lowerString.hashCode(); }
// We overrode hashCode, so we MUST also override equals. It is required
// that if a.equals(b), then a.hashCode() == b.hashCode(), so we must
// restore that invariant.
public boolean equals(Object obj) {
if (obj instanceof LowerStringWrapper) {
return lowerString.equals(((LowerStringWrapper)obj).lowerString;
} else {
return lowerString.equals(obj);
}
}
private String s;
private String lowerString;
}
add a comment |
Note: As noted in all other answers, HashMaps don't have an explicit ordering. They only recognize "equality". Getting an order out of a hash-based data structure is meaningless, as each object is turned into a hash - essentially a random number.
You can always write a hash function for a class (and often times must), as long as you do it carefully. This is a hard thing to do properly because hash-based data structures rely on a random, uniform distribution of hash values. In Effective Java, there is a large amount of text devoted to properly implementing a hash method with good behaviour.
With all that being said, if you just want your hashing to ignore the case of a String
, you can write a wrapper class around String
for this purpose and insert those in your data structure instead.
A simple implementation:
public class LowerStringWrapper {
public LowerStringWrapper(String s) {
this.s = s;
this.lowerString = s.toLowerString();
}
// getter methods omitted
// Rely on the hashing of String, as we know it to be good.
public int hashCode() { return lowerString.hashCode(); }
// We overrode hashCode, so we MUST also override equals. It is required
// that if a.equals(b), then a.hashCode() == b.hashCode(), so we must
// restore that invariant.
public boolean equals(Object obj) {
if (obj instanceof LowerStringWrapper) {
return lowerString.equals(((LowerStringWrapper)obj).lowerString;
} else {
return lowerString.equals(obj);
}
}
private String s;
private String lowerString;
}
add a comment |
Note: As noted in all other answers, HashMaps don't have an explicit ordering. They only recognize "equality". Getting an order out of a hash-based data structure is meaningless, as each object is turned into a hash - essentially a random number.
You can always write a hash function for a class (and often times must), as long as you do it carefully. This is a hard thing to do properly because hash-based data structures rely on a random, uniform distribution of hash values. In Effective Java, there is a large amount of text devoted to properly implementing a hash method with good behaviour.
With all that being said, if you just want your hashing to ignore the case of a String
, you can write a wrapper class around String
for this purpose and insert those in your data structure instead.
A simple implementation:
public class LowerStringWrapper {
public LowerStringWrapper(String s) {
this.s = s;
this.lowerString = s.toLowerString();
}
// getter methods omitted
// Rely on the hashing of String, as we know it to be good.
public int hashCode() { return lowerString.hashCode(); }
// We overrode hashCode, so we MUST also override equals. It is required
// that if a.equals(b), then a.hashCode() == b.hashCode(), so we must
// restore that invariant.
public boolean equals(Object obj) {
if (obj instanceof LowerStringWrapper) {
return lowerString.equals(((LowerStringWrapper)obj).lowerString;
} else {
return lowerString.equals(obj);
}
}
private String s;
private String lowerString;
}
Note: As noted in all other answers, HashMaps don't have an explicit ordering. They only recognize "equality". Getting an order out of a hash-based data structure is meaningless, as each object is turned into a hash - essentially a random number.
You can always write a hash function for a class (and often times must), as long as you do it carefully. This is a hard thing to do properly because hash-based data structures rely on a random, uniform distribution of hash values. In Effective Java, there is a large amount of text devoted to properly implementing a hash method with good behaviour.
With all that being said, if you just want your hashing to ignore the case of a String
, you can write a wrapper class around String
for this purpose and insert those in your data structure instead.
A simple implementation:
public class LowerStringWrapper {
public LowerStringWrapper(String s) {
this.s = s;
this.lowerString = s.toLowerString();
}
// getter methods omitted
// Rely on the hashing of String, as we know it to be good.
public int hashCode() { return lowerString.hashCode(); }
// We overrode hashCode, so we MUST also override equals. It is required
// that if a.equals(b), then a.hashCode() == b.hashCode(), so we must
// restore that invariant.
public boolean equals(Object obj) {
if (obj instanceof LowerStringWrapper) {
return lowerString.equals(((LowerStringWrapper)obj).lowerString;
} else {
return lowerString.equals(obj);
}
}
private String s;
private String lowerString;
}
edited Oct 18 '08 at 18:33
answered Oct 17 '08 at 23:09
hazzenhazzen
12.2k53533
12.2k53533
add a comment |
add a comment |
good question, ask josh bloch. i submitted that concept as an RFE in java 7, but it was dropped, i believe the reason was something performance related. i agree, though, should have been done.
Hmm. Maybe it's because you miss out on the opportunity to cache the calculated hash codes..
– volley
Oct 18 '08 at 14:21
add a comment |
good question, ask josh bloch. i submitted that concept as an RFE in java 7, but it was dropped, i believe the reason was something performance related. i agree, though, should have been done.
Hmm. Maybe it's because you miss out on the opportunity to cache the calculated hash codes..
– volley
Oct 18 '08 at 14:21
add a comment |
good question, ask josh bloch. i submitted that concept as an RFE in java 7, but it was dropped, i believe the reason was something performance related. i agree, though, should have been done.
good question, ask josh bloch. i submitted that concept as an RFE in java 7, but it was dropped, i believe the reason was something performance related. i agree, though, should have been done.
answered Oct 18 '08 at 1:35
james
Hmm. Maybe it's because you miss out on the opportunity to cache the calculated hash codes..
– volley
Oct 18 '08 at 14:21
add a comment |
Hmm. Maybe it's because you miss out on the opportunity to cache the calculated hash codes..
– volley
Oct 18 '08 at 14:21
Hmm. Maybe it's because you miss out on the opportunity to cache the calculated hash codes..
– volley
Oct 18 '08 at 14:21
Hmm. Maybe it's because you miss out on the opportunity to cache the calculated hash codes..
– volley
Oct 18 '08 at 14:21
add a comment |
I suspect this has not been done because it would prevent hashCode caching?
I attempted creating a generic Map solution where all keys are silently wrapped. It turned out that the wrapper would have to hold the wrapped object, the cached hashCode and a reference to the callback interface responsible for equality-checks. This is obviously not as efficient as using a wrapper class, where you'd only have to cache the original key plus one more object (see hazzens answer).
(I also bumped into a problem related to generics; the get-method accepts Object as input, so the callback interface responsible for hashing would have to perform an additional instanceof-check. Either that, or the map class would have to know the Class of its keys.)
add a comment |
I suspect this has not been done because it would prevent hashCode caching?
I attempted creating a generic Map solution where all keys are silently wrapped. It turned out that the wrapper would have to hold the wrapped object, the cached hashCode and a reference to the callback interface responsible for equality-checks. This is obviously not as efficient as using a wrapper class, where you'd only have to cache the original key plus one more object (see hazzens answer).
(I also bumped into a problem related to generics; the get-method accepts Object as input, so the callback interface responsible for hashing would have to perform an additional instanceof-check. Either that, or the map class would have to know the Class of its keys.)
add a comment |
I suspect this has not been done because it would prevent hashCode caching?
I attempted creating a generic Map solution where all keys are silently wrapped. It turned out that the wrapper would have to hold the wrapped object, the cached hashCode and a reference to the callback interface responsible for equality-checks. This is obviously not as efficient as using a wrapper class, where you'd only have to cache the original key plus one more object (see hazzens answer).
(I also bumped into a problem related to generics; the get-method accepts Object as input, so the callback interface responsible for hashing would have to perform an additional instanceof-check. Either that, or the map class would have to know the Class of its keys.)
I suspect this has not been done because it would prevent hashCode caching?
I attempted creating a generic Map solution where all keys are silently wrapped. It turned out that the wrapper would have to hold the wrapped object, the cached hashCode and a reference to the callback interface responsible for equality-checks. This is obviously not as efficient as using a wrapper class, where you'd only have to cache the original key plus one more object (see hazzens answer).
(I also bumped into a problem related to generics; the get-method accepts Object as input, so the callback interface responsible for hashing would have to perform an additional instanceof-check. Either that, or the map class would have to know the Class of its keys.)
answered Oct 18 '08 at 15:43
volleyvolley
5,77912327
5,77912327
add a comment |
add a comment |
This is an interesting idea, but it's absolutely horrendous for performance. The reason for this is quite fundamental to the idea of a hashtable: the ordering cannot be relied upon. Hashtables are very fast (constant time) because of the way in which they index elements in the table: by computing a pseudo-unique integer hash for that element and accessing that location in an array. It's literally computing a location in memory and directly storing the element.
This contrasts with a balanced binary search tree (TreeMap
) which must start at the root and work its way down to the desired node every time a lookup is required. Wikipedia has some more in-depth analysis. To summarize, the efficiency of a tree map is dependent upon a consistent ordering, thus the order of the elements is predictable and sane. However, because of the performance hit imposed by the "traverse to your destination" approach, BSTs are only able to provide O(log(n)) performance. For large maps, this can be a significant performance hit.
It is possible to impose a consistent ordering on a hashtable, but to do so involves using techniques similar to LinkedHashMap
and manually maintaining the ordering. Alternatively, two separate data structures can be maintained internally: a hashtable and a tree. The table can be used for lookups, while the tree can be used for iteration. The problem of course is this uses more than double the required memory. Also, insertions are only as fast as the tree: O(log(n)). Concurrent tricks can bring this down a bit, but that isn't a reliable performance optimization.
In short, your idea sounds really good, but if you actually tried to implement it, you would see that to do so would impose massive performance limitations. The final verdict is (and has been for decades): if you need performance, use a hashtable; if you need ordering and can live with degraded performance, use a balanced binary search tree. I'm afraid there's really no efficiently combining the two structures without losing some of the guarantees of one or the other.
1
I don't think your answer has much to do with the question. volley just wants to use a HashTable where the hash function is user-specified, instead of the default Object.hashCode().
– Adam Rosenfield
Oct 18 '08 at 16:49
No, I think he wants a little more than that. His proposed "solution" is to impose ordering using an alternative hash code, but that's not going to work (hashing into a limited domain). To order a hashtable, some ancillary structure is needed.
– Daniel Spiewak
Oct 18 '08 at 16:53
1
Hmm actually I think Adam is right; note that the interface I suggest contains one method for calculating the hash and one method for checking if two objects are equal. Ordering is not in there! The Comparator is mentioned as an analogy. (By the way, löve the Darwinian logo, Daniel!)
– volley
Oct 21 '08 at 20:09
You're completely wrong, just look at the Hasharator, there's nothing like ordering there. Look atcom.google.common.base.Equivalence
and theirCustomConcurrentHashMap
, that's exactly it.
– maaartinus
Apr 18 '11 at 10:49
1
Before telling me I'm completely wrong, it might be worth looking at the edit history of the question. What I replied to was something very different from what we have now.
– Daniel Spiewak
Apr 18 '11 at 20:00
|
show 2 more comments
This is an interesting idea, but it's absolutely horrendous for performance. The reason for this is quite fundamental to the idea of a hashtable: the ordering cannot be relied upon. Hashtables are very fast (constant time) because of the way in which they index elements in the table: by computing a pseudo-unique integer hash for that element and accessing that location in an array. It's literally computing a location in memory and directly storing the element.
This contrasts with a balanced binary search tree (TreeMap
) which must start at the root and work its way down to the desired node every time a lookup is required. Wikipedia has some more in-depth analysis. To summarize, the efficiency of a tree map is dependent upon a consistent ordering, thus the order of the elements is predictable and sane. However, because of the performance hit imposed by the "traverse to your destination" approach, BSTs are only able to provide O(log(n)) performance. For large maps, this can be a significant performance hit.
It is possible to impose a consistent ordering on a hashtable, but to do so involves using techniques similar to LinkedHashMap
and manually maintaining the ordering. Alternatively, two separate data structures can be maintained internally: a hashtable and a tree. The table can be used for lookups, while the tree can be used for iteration. The problem of course is this uses more than double the required memory. Also, insertions are only as fast as the tree: O(log(n)). Concurrent tricks can bring this down a bit, but that isn't a reliable performance optimization.
In short, your idea sounds really good, but if you actually tried to implement it, you would see that to do so would impose massive performance limitations. The final verdict is (and has been for decades): if you need performance, use a hashtable; if you need ordering and can live with degraded performance, use a balanced binary search tree. I'm afraid there's really no efficiently combining the two structures without losing some of the guarantees of one or the other.
1
I don't think your answer has much to do with the question. volley just wants to use a HashTable where the hash function is user-specified, instead of the default Object.hashCode().
– Adam Rosenfield
Oct 18 '08 at 16:49
No, I think he wants a little more than that. His proposed "solution" is to impose ordering using an alternative hash code, but that's not going to work (hashing into a limited domain). To order a hashtable, some ancillary structure is needed.
– Daniel Spiewak
Oct 18 '08 at 16:53
1
Hmm actually I think Adam is right; note that the interface I suggest contains one method for calculating the hash and one method for checking if two objects are equal. Ordering is not in there! The Comparator is mentioned as an analogy. (By the way, löve the Darwinian logo, Daniel!)
– volley
Oct 21 '08 at 20:09
You're completely wrong, just look at the Hasharator, there's nothing like ordering there. Look atcom.google.common.base.Equivalence
and theirCustomConcurrentHashMap
, that's exactly it.
– maaartinus
Apr 18 '11 at 10:49
1
Before telling me I'm completely wrong, it might be worth looking at the edit history of the question. What I replied to was something very different from what we have now.
– Daniel Spiewak
Apr 18 '11 at 20:00
|
show 2 more comments
This is an interesting idea, but it's absolutely horrendous for performance. The reason for this is quite fundamental to the idea of a hashtable: the ordering cannot be relied upon. Hashtables are very fast (constant time) because of the way in which they index elements in the table: by computing a pseudo-unique integer hash for that element and accessing that location in an array. It's literally computing a location in memory and directly storing the element.
This contrasts with a balanced binary search tree (TreeMap
) which must start at the root and work its way down to the desired node every time a lookup is required. Wikipedia has some more in-depth analysis. To summarize, the efficiency of a tree map is dependent upon a consistent ordering, thus the order of the elements is predictable and sane. However, because of the performance hit imposed by the "traverse to your destination" approach, BSTs are only able to provide O(log(n)) performance. For large maps, this can be a significant performance hit.
It is possible to impose a consistent ordering on a hashtable, but to do so involves using techniques similar to LinkedHashMap
and manually maintaining the ordering. Alternatively, two separate data structures can be maintained internally: a hashtable and a tree. The table can be used for lookups, while the tree can be used for iteration. The problem of course is this uses more than double the required memory. Also, insertions are only as fast as the tree: O(log(n)). Concurrent tricks can bring this down a bit, but that isn't a reliable performance optimization.
In short, your idea sounds really good, but if you actually tried to implement it, you would see that to do so would impose massive performance limitations. The final verdict is (and has been for decades): if you need performance, use a hashtable; if you need ordering and can live with degraded performance, use a balanced binary search tree. I'm afraid there's really no efficiently combining the two structures without losing some of the guarantees of one or the other.
This is an interesting idea, but it's absolutely horrendous for performance. The reason for this is quite fundamental to the idea of a hashtable: the ordering cannot be relied upon. Hashtables are very fast (constant time) because of the way in which they index elements in the table: by computing a pseudo-unique integer hash for that element and accessing that location in an array. It's literally computing a location in memory and directly storing the element.
This contrasts with a balanced binary search tree (TreeMap
) which must start at the root and work its way down to the desired node every time a lookup is required. Wikipedia has some more in-depth analysis. To summarize, the efficiency of a tree map is dependent upon a consistent ordering, thus the order of the elements is predictable and sane. However, because of the performance hit imposed by the "traverse to your destination" approach, BSTs are only able to provide O(log(n)) performance. For large maps, this can be a significant performance hit.
It is possible to impose a consistent ordering on a hashtable, but to do so involves using techniques similar to LinkedHashMap
and manually maintaining the ordering. Alternatively, two separate data structures can be maintained internally: a hashtable and a tree. The table can be used for lookups, while the tree can be used for iteration. The problem of course is this uses more than double the required memory. Also, insertions are only as fast as the tree: O(log(n)). Concurrent tricks can bring this down a bit, but that isn't a reliable performance optimization.
In short, your idea sounds really good, but if you actually tried to implement it, you would see that to do so would impose massive performance limitations. The final verdict is (and has been for decades): if you need performance, use a hashtable; if you need ordering and can live with degraded performance, use a balanced binary search tree. I'm afraid there's really no efficiently combining the two structures without losing some of the guarantees of one or the other.
answered Oct 18 '08 at 16:26
Daniel SpiewakDaniel Spiewak
46.5k1098119
46.5k1098119
1
I don't think your answer has much to do with the question. volley just wants to use a HashTable where the hash function is user-specified, instead of the default Object.hashCode().
– Adam Rosenfield
Oct 18 '08 at 16:49
No, I think he wants a little more than that. His proposed "solution" is to impose ordering using an alternative hash code, but that's not going to work (hashing into a limited domain). To order a hashtable, some ancillary structure is needed.
– Daniel Spiewak
Oct 18 '08 at 16:53
1
Hmm actually I think Adam is right; note that the interface I suggest contains one method for calculating the hash and one method for checking if two objects are equal. Ordering is not in there! The Comparator is mentioned as an analogy. (By the way, löve the Darwinian logo, Daniel!)
– volley
Oct 21 '08 at 20:09
You're completely wrong, just look at the Hasharator, there's nothing like ordering there. Look atcom.google.common.base.Equivalence
and theirCustomConcurrentHashMap
, that's exactly it.
– maaartinus
Apr 18 '11 at 10:49
1
Before telling me I'm completely wrong, it might be worth looking at the edit history of the question. What I replied to was something very different from what we have now.
– Daniel Spiewak
Apr 18 '11 at 20:00
|
show 2 more comments
1
I don't think your answer has much to do with the question. volley just wants to use a HashTable where the hash function is user-specified, instead of the default Object.hashCode().
– Adam Rosenfield
Oct 18 '08 at 16:49
No, I think he wants a little more than that. His proposed "solution" is to impose ordering using an alternative hash code, but that's not going to work (hashing into a limited domain). To order a hashtable, some ancillary structure is needed.
– Daniel Spiewak
Oct 18 '08 at 16:53
1
Hmm actually I think Adam is right; note that the interface I suggest contains one method for calculating the hash and one method for checking if two objects are equal. Ordering is not in there! The Comparator is mentioned as an analogy. (By the way, löve the Darwinian logo, Daniel!)
– volley
Oct 21 '08 at 20:09
You're completely wrong, just look at the Hasharator, there's nothing like ordering there. Look atcom.google.common.base.Equivalence
and theirCustomConcurrentHashMap
, that's exactly it.
– maaartinus
Apr 18 '11 at 10:49
1
Before telling me I'm completely wrong, it might be worth looking at the edit history of the question. What I replied to was something very different from what we have now.
– Daniel Spiewak
Apr 18 '11 at 20:00
1
1
I don't think your answer has much to do with the question. volley just wants to use a HashTable where the hash function is user-specified, instead of the default Object.hashCode().
– Adam Rosenfield
Oct 18 '08 at 16:49
I don't think your answer has much to do with the question. volley just wants to use a HashTable where the hash function is user-specified, instead of the default Object.hashCode().
– Adam Rosenfield
Oct 18 '08 at 16:49
No, I think he wants a little more than that. His proposed "solution" is to impose ordering using an alternative hash code, but that's not going to work (hashing into a limited domain). To order a hashtable, some ancillary structure is needed.
– Daniel Spiewak
Oct 18 '08 at 16:53
No, I think he wants a little more than that. His proposed "solution" is to impose ordering using an alternative hash code, but that's not going to work (hashing into a limited domain). To order a hashtable, some ancillary structure is needed.
– Daniel Spiewak
Oct 18 '08 at 16:53
1
1
Hmm actually I think Adam is right; note that the interface I suggest contains one method for calculating the hash and one method for checking if two objects are equal. Ordering is not in there! The Comparator is mentioned as an analogy. (By the way, löve the Darwinian logo, Daniel!)
– volley
Oct 21 '08 at 20:09
Hmm actually I think Adam is right; note that the interface I suggest contains one method for calculating the hash and one method for checking if two objects are equal. Ordering is not in there! The Comparator is mentioned as an analogy. (By the way, löve the Darwinian logo, Daniel!)
– volley
Oct 21 '08 at 20:09
You're completely wrong, just look at the Hasharator, there's nothing like ordering there. Look at
com.google.common.base.Equivalence
and their CustomConcurrentHashMap
, that's exactly it.– maaartinus
Apr 18 '11 at 10:49
You're completely wrong, just look at the Hasharator, there's nothing like ordering there. Look at
com.google.common.base.Equivalence
and their CustomConcurrentHashMap
, that's exactly it.– maaartinus
Apr 18 '11 at 10:49
1
1
Before telling me I'm completely wrong, it might be worth looking at the edit history of the question. What I replied to was something very different from what we have now.
– Daniel Spiewak
Apr 18 '11 at 20:00
Before telling me I'm completely wrong, it might be worth looking at the edit history of the question. What I replied to was something very different from what we have now.
– Daniel Spiewak
Apr 18 '11 at 20:00
|
show 2 more comments
There's such a feature in com.google.common.collect.CustomConcurrentHashMap
, unfortunately, there's currently no public way how to set the Equivalence
(their Hasharator
). Maybe they're not yet done with it, maybe they don't consider the feature to be useful enough. Ask at the guava mailing list.
I wonder why it haven't happened yet, as it was mentioned in this talk over two years ago.
add a comment |
There's such a feature in com.google.common.collect.CustomConcurrentHashMap
, unfortunately, there's currently no public way how to set the Equivalence
(their Hasharator
). Maybe they're not yet done with it, maybe they don't consider the feature to be useful enough. Ask at the guava mailing list.
I wonder why it haven't happened yet, as it was mentioned in this talk over two years ago.
add a comment |
There's such a feature in com.google.common.collect.CustomConcurrentHashMap
, unfortunately, there's currently no public way how to set the Equivalence
(their Hasharator
). Maybe they're not yet done with it, maybe they don't consider the feature to be useful enough. Ask at the guava mailing list.
I wonder why it haven't happened yet, as it was mentioned in this talk over two years ago.
There's such a feature in com.google.common.collect.CustomConcurrentHashMap
, unfortunately, there's currently no public way how to set the Equivalence
(their Hasharator
). Maybe they're not yet done with it, maybe they don't consider the feature to be useful enough. Ask at the guava mailing list.
I wonder why it haven't happened yet, as it was mentioned in this talk over two years ago.
edited Apr 18 '11 at 11:29
answered Apr 18 '11 at 10:54
maaartinusmaaartinus
27.1k2195235
27.1k2195235
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f214136%2fwhy-not-allow-an-external-interface-to-provide-hashcode-equals-for-a-hashmap%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
"This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time." -- HashMap's Javadocs. In other words, HashMap isn't ordered.
– Powerlord
Dec 9 '09 at 20:55
This statement allows ANY hashCode implementation to be used and also allows the Map to resize itself as it goes. So this is a feature and not a problem in this context?
– volley
Dec 9 '09 at 21:01