Map, based on immutable trees

Saturday, 06 February 2010

To wrap up with immutable trees, at least for now, we've implemented IDictionary<K, V>. The implementation is named Map<K, V>. Functionally it looks much like SortedDictionary<K, V>. There are some differences, however:

Map, in contrast to SortedDictionary, is very cheap to copy.
Because Map is based on an AVL tree, which is more rigorously balanced than an RB tree, it should be a little faster on lookup than SortedDictionary and a little slower on modification. Both remain O(log n); the difference is in the constant factor.
Due to its storage structure (node + navigator), Map consumes less memory than SortedDictionary, and is probably cheaper for the GC (simpler garbage graphs).
Because the AVL tree stores left and right subtree sizes, in contrast to the "color" in an RB tree, we are able to index data in two ways: by integer index and by key value.
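The cheap copy and the two ways of indexing can be illustrated with a minimal sketch. It is not the Map<K, V> source: the names AvlNode and AvlSketch are hypothetical, and for brevity each node keeps one total subtree size plus a height rather than separate left and right sizes.

```csharp
using System;

// Hypothetical immutable node; not the actual Map<K, V> implementation.
public sealed class AvlNode<K, V>
{
    public readonly K Key;
    public readonly V Value;
    public readonly AvlNode<K, V> Left, Right;
    public readonly int Size;    // number of nodes in this subtree
    public readonly int Height;  // height of this subtree, a leaf has height 1

    public AvlNode(K key, V value, AvlNode<K, V> left, AvlNode<K, V> right)
    {
        Key = key; Value = value; Left = left; Right = right;
        Size = 1 + AvlSketch.Size(left) + AvlSketch.Size(right);
        Height = 1 + Math.Max(AvlSketch.Height(left), AvlSketch.Height(right));
    }
}

public static class AvlSketch
{
    public static int Size<K, V>(AvlNode<K, V> n) { return n == null ? 0 : n.Size; }
    public static int Height<K, V>(AvlNode<K, V> n) { return n == null ? 0 : n.Height; }

    // Returns a new root. The old root stays valid and shares all untouched subtrees.
    public static AvlNode<K, V> Insert<K, V>(AvlNode<K, V> n, K key, V value)
        where K : IComparable<K>
    {
        if (n == null) return new AvlNode<K, V>(key, value, null, null);
        int c = key.CompareTo(n.Key);
        if (c == 0) return new AvlNode<K, V>(key, value, n.Left, n.Right);
        return c < 0
            ? Balance(new AvlNode<K, V>(n.Key, n.Value, Insert(n.Left, key, value), n.Right))
            : Balance(new AvlNode<K, V>(n.Key, n.Value, n.Left, Insert(n.Right, key, value)));
    }

    // Lookup by key: ordinary binary search.
    public static AvlNode<K, V> ByKey<K, V>(AvlNode<K, V> n, K key) where K : IComparable<K>
    {
        while (n != null)
        {
            int c = key.CompareTo(n.Key);
            if (c == 0) return n;
            n = c < 0 ? n.Left : n.Right;
        }
        return null;
    }

    // Lookup by zero-based position in key order, steered by subtree sizes.
    public static AvlNode<K, V> ByIndex<K, V>(AvlNode<K, V> n, int index)
    {
        while (n != null)
        {
            int leftSize = Size(n.Left);
            if (index < leftSize) n = n.Left;
            else if (index == leftSize) return n;
            else { index -= leftSize + 1; n = n.Right; }
        }
        return null; // index out of range
    }

    static AvlNode<K, V> Balance<K, V>(AvlNode<K, V> n)
    {
        int d = Height(n.Left) - Height(n.Right);
        if (d > 1)
        {
            // Left-heavy: turn a left-right case into left-left first.
            if (Height(n.Left.Left) < Height(n.Left.Right))
                n = new AvlNode<K, V>(n.Key, n.Value, RotateLeft(n.Left), n.Right);
            return RotateRight(n);
        }
        if (d < -1)
        {
            // Right-heavy: turn a right-left case into right-right first.
            if (Height(n.Right.Right) < Height(n.Right.Left))
                n = new AvlNode<K, V>(n.Key, n.Value, n.Left, RotateRight(n.Right));
            return RotateLeft(n);
        }
        return n;
    }

    static AvlNode<K, V> RotateRight<K, V>(AvlNode<K, V> n)
    {
        var l = n.Left;
        return new AvlNode<K, V>(l.Key, l.Value, l.Left,
            new AvlNode<K, V>(n.Key, n.Value, l.Right, n.Right));
    }

    static AvlNode<K, V> RotateLeft<K, V>(AvlNode<K, V> n)
    {
        var r = n.Right;
        return new AvlNode<K, V>(r.Key, r.Value,
            new AvlNode<K, V>(n.Key, n.Value, n.Left, r.Left), r.Right);
    }
}
```

In this sketch a "copy" is just another reference to the same root: Insert allocates new nodes only along the path to the changed key, so both versions remain usable and share everything else.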

Sources are:

Update:

It was impossible to resist the temptation to run a primitive performance comparison. Map outperforms SortedDictionary both in population and in access. This does not agree with pure algorithmic theory, but there may be other unaccounted factors: memory consumption, quality of implementation, and so on.

Program.cs is updated with measurements.
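For readers who want to repeat a comparison of this kind, here is a rough sketch of a population/access timing. It is not the code from Program.cs: MapBenchmark, Measure and RandomKeys are hypothetical names. It is written against IDictionary<int, int>, so a Map could be measured the same way, assuming it supports the indexer setter.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper, for illustration only.
public static class MapBenchmark
{
    // Times population and then lookup of the same keys, in milliseconds.
    public static (long PopulateMs, long LookupMs) Measure(
        IDictionary<int, int> dictionary, int[] keys)
    {
        var watch = Stopwatch.StartNew();
        foreach (int key in keys) dictionary[key] = key;
        long populate = watch.ElapsedMilliseconds;

        watch.Restart();
        long sum = 0;
        foreach (int key in keys) sum += dictionary[key];
        long lookup = watch.ElapsedMilliseconds;

        GC.KeepAlive(sum); // keep the lookup loop from being optimized away
        return (populate, lookup);
    }

    // A fixed seed gives every container the same key sequence.
    public static int[] RandomKeys(int count, int seed)
    {
        var random = new Random(seed);
        var keys = new int[count];
        for (int i = 0; i < count; i++) keys[i] = random.Next();
        return keys;
    }
}
```

A call such as MapBenchmark.Measure(new SortedDictionary<int, int>(), MapBenchmark.RandomKeys(1000000, 1)) gives one data point. A single run like this is noisy, so several repetitions and a warm-up pass are advisable before drawing conclusions.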

Update 2:

More accurate tests show that for some key types Map is faster, and for others SortedDictionary is faster. Usually Map is slower during population (a mutable AVL tree navigator may fix this). The odd thing is that Map<string, int> is faster than SortedDictionary<string, int> both for allocation and for access. See the Excel report.

Update 3:

An interesting observation. The following table shows maximal and average tree heights for different tree sizes (number of nodes) in AVL and RB trees after a random population:

	AVL		RB
Size	Max	Avg	Max	Avg
10	4	2.90	5	3.00
50	7	4.94	8	4.94
100	8	5.84	9	5.86
500	11	8.14	14	8.39
1000	12	9.14	16	9.38
5000	15	11.51	18	11.47
10000	16	12.53	20	12.47
50000	19	14.89	23	14.72
100000	20	15.90	25	15.72
500000	25	18.26	28	18.27
1000000	25	19.28	30	19.27

Here, in accordance with theory, the AVL tree is shorter than the RB tree. But what is most interesting is the depth of an "average node". This value describes the number of steps required to find a random key. The RB tree is very close to the AVL tree in this regard, and is often better.
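The two measured values can be computed with one tree walk. The sketch below uses hypothetical names (TreeNode, DepthStats) and assumes that the root counts as depth 1. Under that convention a completely filled 10-node tree has depths 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, which gives a maximum of 4 and an average of 2.90, the same as the AVL row for size 10.

```csharp
using System;

// Hypothetical minimal node: only the shape of the tree matters here.
public sealed class TreeNode
{
    public TreeNode Left, Right;
}

public static class DepthStats
{
    // Returns the maximal depth and the average node depth, with the root at depth 1.
    public static (int Max, double Average) Measure(TreeNode root)
    {
        int max = 0;
        long total = 0, count = 0;
        Visit(root, 1, ref max, ref total, ref count);
        return (max, count == 0 ? 0.0 : (double)total / count);
    }

    static void Visit(TreeNode node, int depth, ref int max, ref long total, ref long count)
    {
        if (node == null) return;
        if (depth > max) max = depth;
        total += depth;
        count++;
        Visit(node.Left, depth + 1, ref max, ref total, ref count);
        Visit(node.Right, depth + 1, ref max, ref total, ref count);
    }
}
```

The recursion is only as deep as the tree is high, which is safe for balanced trees of the sizes in the table.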

Saturday, 06 February 2010 18:31:13 UTC
