[m-dev.] hash tables in standard library

Jonathan Morgan jonmmorgan at gmail.com
Fri Aug 15 14:18:42 AEST 2008


On Fri, Aug 15, 2008 at 11:37 AM, Ralph Becket <rafe at csse.unimelb.edu.au> wrote:
> Jonathan Morgan, Friday, 15 August 2008:
>> On Wed, Aug 13, 2008 at 1:27 PM, Zoltan Somogyi <zs at csse.unimelb.edu.au> wrote:
>> > On 07-Aug-2008, Julien Fischer <juliensf at csse.unimelb.edu.au> wrote:
>> >> The two hash table modules in the standard library  (hash_table.m and
>> >> version_hash_table.m) both have broken delete operations.  PeterW
>> >> has suggested that we replacing the existing implementation with
>> >> one that uses separate chaining; I agree with him.
>> >>
>> >> (See the notes for bug #68 for details)
>> >
>> > No. In fact, (on an unrelated note) I always thought the time we spent in
>> > 433-253 on describing open addressing versions of hashing to be time
>> > ill-spent, since I have always thought that in real life, only chaining
>> > is practical.
>>
>> Python is a language which relies very heavily on dictionaries, and
>> its dictionary uses an open addressing hashing scheme.  This is not
>> only practical, but used by hundreds of thousands of people.
>>
>> I personally would almost always use separate chaining, but apparently
>> the open addressing scheme used has better memory usage and better
>> locality, making it ideal for Python where the dictionary must be
>> extremely efficient.
>
> My understanding is that Python uses hash tables as method tables, from
> which you never delete anything.  Does it also use them for associative
> arrays?

It uses them for associative arrays (dictionaries in Python
terminology).  These are used almost everywhere:

If you have keyword arguments to a function, they are passed in a dictionary:

def func(x, y=2):
  # do something.

func(x=1, y=2)

[you can also do things like capture all keyword arguments to a function:
def generic_func(*args, **kwargs):
    # args is a tuple, kwargs is a dictionary.
    pass
]

Class and instance variables are stored in a dictionary:
class X:
    x = 2
    def __init__(self, y):
        self.y = y

X.__dict__ == dict(x=2, __init__ = <init function>)

X(3).__dict__ == dict(y=3)

and of course dictionaries can be (and frequently are) used in code:
x = dict()
x[1] = 3
x['a'] = 4
del x['a']
...

Dictionaries are resized at around 2/3 full.  While elements can be
deleted, it is never resized to take less space.  There are various
other optimisations to make all of these underlying operations work as
fast as can be expected from working with a dictionary rather than a
fixed location in memory.

Jon
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at csse.unimelb.edu.au
Administrative Queries: owner-mercury-developers at csse.unimelb.edu.au
Subscriptions:          mercury-developers-request at csse.unimelb.edu.au
--------------------------------------------------------------------------



More information about the developers mailing list