[m-dev.] Hash table performance

Ralph Becket rafe at cs.mu.OZ.AU
Fri Jan 30 17:11:39 AEDT 2004


Zoltan Somogyi, Friday, 30 January 2004:
> On 30-Jan-2004, Ralph Becket <rafe at cs.mu.OZ.AU> wrote:
> > A spot of phorensic profiling shows why: while the hashing functions are
> > fine (virtually no collisions at about 50% occupancy), every version
> > hash table read incurrs at least three array accesses and every version
> > hash table write incurrs at least four array accesses and seven or eight
> > heap allocations.
> 
> What is the code doing with all those accesses and allocations?

The closed addressing hash table structure includes
- a bitmap to indicate bucket occupancy,
- an array of keys to identify bucket ownership,
- an array of values.
Then assuming the first probe is successful...

READING
- we read the bitmap entry to determine that the bucket is occupied,
- we read the key entry to verify bucket ownership,
- we read the value entry as the result.

WRITING
- we read the bitmap entry to determine that the bucket is empty,
- we set bitmap entry to show that the bucket is now occupied,
- we set the key entry,
- we set the value entry.
Each set operation for a version array incurrs one allocation.
There is an allocation for the updated hash table structure.
For a total of four allocations (so I miscounted in my original e-mail.)

> > The lessons of the experience are:
> > - open addressing can be much better than closed addressing;
> 
> "Can be" is not enough for many purposes, and open addressing has much
> worse worst case behavior than separate chaining.

Well, the performance of any hash table implementation relies upon good
hash functions.  It's not obvious to me that closed addressing is that
much more robust in the face of a poor hashing function.

At least in this benchmark, open addressing did much better.

It occurrs to me that I forgot to attach the results files, so I've
attached them to this e-mail.

-- Ralph
-------------- next part --------------
(read 45392 words from /usr/share/dict/words)

benchmark: 33 x creating and populating with 45392 words...
map:                        11840ms
vhash_table:                 7100ms
hash_table:                  4930ms
        map / vhash_table =    1.67
vhash_table / hash_table  =    1.44

benchmark: 33 x searching for 45392 words when populated with 45392...
map:                        16510ms
vhash_table:                11360ms
hash_table:                  7100ms
        map / vhash_table =    1.45
vhash_table / hash_table  =    1.60

benchmark: 33 x searching each key when populated with 45392...
map:                        16920ms
vhash_table:                11090ms
hash_table:                  7300ms
        map / vhash_table =    1.53
vhash_table / hash_table  =    1.52

benchmark: 66 x creating and populating with 22696 words...
map:                        12880ms
vhash_table:                12130ms
hash_table:                  8250ms
        map / vhash_table =    1.06
vhash_table / hash_table  =    1.47

benchmark: 66 x searching for 45392 words when populated with 22696...
map:                        24460ms
vhash_table:                17410ms
hash_table:                 11450ms
        map / vhash_table =    1.40
vhash_table / hash_table  =    1.52

benchmark: 66 x searching each key when populated with 22696...
map:                        22360ms
vhash_table:                14920ms
hash_table:                  9900ms
        map / vhash_table =    1.50
vhash_table / hash_table  =    1.51

benchmark: 133 x creating and populating with 11348 words...
map:                        17760ms
vhash_table:                12740ms
hash_table:                  8150ms
        map / vhash_table =    1.39
vhash_table / hash_table  =    1.56

benchmark: 133 x searching for 45392 words when populated with 11348...
map:                        14250ms
vhash_table:                14660ms
hash_table:                  9790ms
        map / vhash_table =    0.97
vhash_table / hash_table  =    1.50

benchmark: 133 x searching each key when populated with 11348...
map:                         8640ms
vhash_table:                 8410ms
hash_table:                  5550ms
        map / vhash_table =    1.03
vhash_table / hash_table  =    1.52

benchmark: 266 x creating and populating with 5674 words...
map:                         6040ms
vhash_table:                 5230ms
hash_table:                  3380ms
        map / vhash_table =    1.15
vhash_table / hash_table  =    1.55

benchmark: 266 x searching for 45392 words when populated with 5674...
map:                        19300ms
vhash_table:                21650ms
hash_table:                 13980ms
        map / vhash_table =    0.89
vhash_table / hash_table  =    1.55

benchmark: 266 x searching each key when populated with 5674...
map:                         7740ms
vhash_table:                 7490ms
hash_table:                  5070ms
        map / vhash_table =    1.03
vhash_table / hash_table  =    1.48

benchmark: 533 x creating and populating with 2837 words...
map:                         5410ms
vhash_table:                 4680ms
hash_table:                  3040ms
        map / vhash_table =    1.16
vhash_table / hash_table  =    1.54

benchmark: 533 x searching for 45392 words when populated with 2837...
map:                        29250ms
vhash_table:                36000ms
hash_table:                 23330ms
        map / vhash_table =    0.81
vhash_table / hash_table  =    1.54

benchmark: 533 x searching each key when populated with 2837...
map:                         6840ms
vhash_table:                 6720ms
hash_table:                  4350ms
        map / vhash_table =    1.02
vhash_table / hash_table  =    1.54

benchmark: 1066 x creating and populating with 1419 words...
map:                         4570ms
vhash_table:                 4330ms
hash_table:                  2820ms
        map / vhash_table =    1.06
vhash_table / hash_table  =    1.54

benchmark: 1066 x searching for 45392 words when populated with 1419...
map:                        47460ms
vhash_table:                65330ms
hash_table:                 42090ms
        map / vhash_table =    0.73
vhash_table / hash_table  =    1.55

benchmark: 1066 x searching each key when populated with 1419...
map:                         6140ms
vhash_table:                 6450ms
hash_table:                  4180ms
        map / vhash_table =    0.95
vhash_table / hash_table  =    1.54

benchmark: 2133 x creating and populating with 710 words...
map:                         4000ms
vhash_table:                 4240ms
hash_table:                  2710ms
        map / vhash_table =    0.94
vhash_table / hash_table  =    1.56

benchmark: 2133 x searching for 45392 words when populated with 710...
map:                        81240ms
vhash_table:               123550ms
hash_table:                 78910ms
        map / vhash_table =    0.66
vhash_table / hash_table  =    1.57

benchmark: 2133 x searching each key when populated with 710...
map:                         5250ms
vhash_table:                 6250ms
hash_table:                  3980ms
        map / vhash_table =    0.84
vhash_table / hash_table  =    1.57

benchmark: 4266 x creating and populating with 355 words...
map:                         3490ms
vhash_table:                 4230ms
hash_table:                  2690ms
        map / vhash_table =    0.83
vhash_table / hash_table  =    1.57

benchmark: 4266 x searching for 45392 words when populated with 355...
map:                       140090ms
vhash_table:               240740ms
hash_table:                154010ms
        map / vhash_table =    0.58
vhash_table / hash_table  =    1.56

benchmark: 4266 x searching each key when populated with 355...
map:                         4560ms
vhash_table:                 6190ms
hash_table:                  3970ms
        map / vhash_table =    0.74
vhash_table / hash_table  =    1.56
-------------- next part --------------
(read 45392 words from /usr/share/dict/words)

benchmark: 33 x creating and populating with 45392 words...
map:                        11780ms
vhash_table:                 6270ms
hash_table:                  5010ms
        map / vhash_table =    1.88
vhash_table / hash_table  =    1.25

benchmark: 33 x searching for 45392 words when populated with 45392...
map:                        14820ms
vhash_table:                 9050ms
hash_table:                  7130ms
        map / vhash_table =    1.64
vhash_table / hash_table  =    1.27

benchmark: 33 x searching each key when populated with 45392...
map:                        12890ms
vhash_table:                 7790ms
hash_table:                  7350ms
        map / vhash_table =    1.65
vhash_table / hash_table  =    1.06

benchmark: 66 x creating and populating with 22696 words...
map:                        10260ms
vhash_table:                 6200ms
hash_table:                  5840ms
        map / vhash_table =    1.65
vhash_table / hash_table  =    1.06

benchmark: 66 x searching for 45392 words when populated with 22696...
map:                        12120ms
vhash_table:                 7440ms
hash_table:                  7390ms
        map / vhash_table =    1.63
vhash_table / hash_table  =    1.01

benchmark: 66 x searching each key when populated with 22696...
map:                        10240ms
vhash_table:                 6260ms
hash_table:                  5840ms
        map / vhash_table =    1.64
vhash_table / hash_table  =    1.07

benchmark: 133 x creating and populating with 11348 words...
map:                         7230ms
vhash_table:                 4550ms
hash_table:                  3670ms
        map / vhash_table =    1.59
vhash_table / hash_table  =    1.24

benchmark: 133 x searching for 45392 words when populated with 11348...
map:                        13720ms
vhash_table:                 8360ms
hash_table:                  9100ms
        map / vhash_table =    1.64
vhash_table / hash_table  =    0.92

benchmark: 133 x searching each key when populated with 11348...
map:                         8390ms
vhash_table:                 5290ms
hash_table:                  4910ms
        map / vhash_table =    1.59
vhash_table / hash_table  =    1.08

benchmark: 266 x creating and populating with 5674 words...
map:                         5710ms
vhash_table:                 3870ms
hash_table:                  2950ms
        map / vhash_table =    1.48
vhash_table / hash_table  =    1.31

benchmark: 266 x searching for 45392 words when populated with 5674...
map:                        18900ms
vhash_table:                11370ms
hash_table:                 13690ms
        map / vhash_table =    1.66
vhash_table / hash_table  =    0.83

benchmark: 266 x searching each key when populated with 5674...
map:                         7460ms
vhash_table:                 4930ms
hash_table:                  4540ms
        map / vhash_table =    1.51
vhash_table / hash_table  =    1.09

benchmark: 533 x creating and populating with 2837 words...
map:                         5030ms
vhash_table:                 3670ms
hash_table:                  2850ms
        map / vhash_table =    1.37
vhash_table / hash_table  =    1.29

benchmark: 533 x searching for 45392 words when populated with 2837...
map:                        28530ms
vhash_table:                17530ms
hash_table:                 23200ms
        map / vhash_table =    1.63
vhash_table / hash_table  =    0.76

benchmark: 533 x searching each key when populated with 2837...
map:                         6850ms
vhash_table:                 4730ms
hash_table:                  4330ms
        map / vhash_table =    1.45
vhash_table / hash_table  =    1.09

benchmark: 1066 x creating and populating with 1419 words...
map:                         4570ms
vhash_table:                 3570ms
hash_table:                  2780ms
        map / vhash_table =    1.28
vhash_table / hash_table  =    1.28

benchmark: 1066 x searching for 45392 words when populated with 1419...
map:                        47000ms
vhash_table:                29680ms
hash_table:                 41710ms
        map / vhash_table =    1.58
vhash_table / hash_table  =    0.71

benchmark: 1066 x searching each key when populated with 1419...
map:                         6210ms
vhash_table:                 4640ms
hash_table:                  4160ms
        map / vhash_table =    1.34
vhash_table / hash_table  =    1.12

benchmark: 2133 x creating and populating with 710 words...
map:                         4020ms
vhash_table:                 3510ms
hash_table:                  2680ms
        map / vhash_table =    1.15
vhash_table / hash_table  =    1.31

benchmark: 2133 x searching for 45392 words when populated with 710...
map:                        80180ms
vhash_table:                53180ms
hash_table:                 78290ms
        map / vhash_table =    1.51
vhash_table / hash_table  =    0.68

benchmark: 2133 x searching each key when populated with 710...
map:                         5330ms
vhash_table:                 4460ms
hash_table:                  3990ms
        map / vhash_table =    1.20
vhash_table / hash_table  =    1.12

benchmark: 4266 x creating and populating with 355 words...
map:                         3470ms
vhash_table:                 3460ms
hash_table:                  2680ms
        map / vhash_table =    1.00
vhash_table / hash_table  =    1.29

benchmark: 4266 x searching for 45392 words when populated with 355...
map:                       138880ms
vhash_table:                98710ms
hash_table:                152050ms
        map / vhash_table =    1.41
vhash_table / hash_table  =    0.65

benchmark: 4266 x searching each key when populated with 355...
map:                         4620ms
vhash_table:                 4340ms
hash_table:                  3990ms
        map / vhash_table =    1.06
vhash_table / hash_table  =    1.09



More information about the developers mailing list