Categories
Planet PLUG Programming

Extract n largest values from a Python dict

Having stashed a bunch of data in a python dict, one often needs a quick way to extract the n largest.  Inconveniently for the particular problem I was working on, the key value for the dict was an object and the associated value was a floating-point weight.  A quick-and-dirty method is to use insert all the dict-keys into a heap, using the dict-value as the heap-key Fortuitously, Python has both a heap module and convenient functions for using it in exactly this way.

import heapq

def dict_nlargest(d,n):
    return heapq.nlargest(n ,d, key = lambda k: d[k])

d = { 'a':10.0, 'b':2.0, 'c':5.0, 'd':3.0, 'e':15.0 }
print dict_nlargest(d,3)

Which neatly prints a list of the keys associated with the three largest values:

['e', 'a', 'c']

The heapq.nlargest() function can also use arbitrary object attributes as keys, as well, by supplying a one-argument function to the named parameter key. (This is the role of the lambda in the examples above and below.)

import heapq
class Toy:
    def __init__Toy(self, s, i, f):
        self.s = s
        self.i = i
        self.f = f

l = [ Toy('a', 10, 4.4), Toy('b', 4, 12.0),
      Toy('c', 2, 20.0), Toy('d', 1, 5.0) ]

# largest by the 'f' attribute
print heapq.nlargest(2, l, key = lambda o: o.f)

# largest by the 'i' attribute
print heapq.nlargest(2, l, key = lambda o: o.i)

This is fine for a one-off, but you’ll likely want to maintain this information in a data structure if this you’ll be performing this operation multiple times on a large collection.