Dumbo is a project that makes it easy to write and run Hadoop programs in Python. It is named after Disney’s flying circus elephant: the Hadoop logo is an elephant, and Python was named after the BBC series “Monty Python’s Flying Circus”. More generally, Dumbo can be thought of as a convenient Python API for writing MapReduce programs. For example, the classic word count program looks like this in Dumbo:
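def mapper(key, value):
    # value is one line of input text; emit (word, 1) for every word in it
    for word in value.split():
        yield word, 1

def reducer(key, values):
    # key is a word, values are all the counts emitted for it
    yield key, sum(values)

if __name__ == "__main__":
    import dumbo
    # summing is associative, so the reducer can also serve as the combiner
    dumbo.run(mapper, reducer, combiner=reducer)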
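To illustrate what a MapReduce job conceptually does with these three functions, the sketch below simulates the map, shuffle/sort, and reduce phases in memory with plain Python. It is not Dumbo’s implementation and does not talk to Hadoop. The helper `run_locally` is hypothetical, and the combiner is applied once over all map output instead of once per map task, which is a simplification.

```python
from collections import defaultdict
from itertools import chain

def mapper(key, value):
    # Same mapper as above: emit (word, 1) for each word in the line.
    for word in value.split():
        yield word, 1

def reducer(key, values):
    # Same reducer as above: sum the counts for one word.
    yield key, sum(values)

def run_locally(records, mapper, reducer, combiner=None):
    """Hypothetical helper: simulate one MapReduce job in memory.

    records is an iterable of (key, value) input pairs, e.g. (line_no, line).
    """
    def group(pairs):
        # Shuffle/sort phase: collect all values per key, ordered by key.
        groups = defaultdict(list)
        for k, v in pairs:
            groups[k].append(v)
        return sorted(groups.items())

    # Map phase: apply the mapper to every input record.
    mapped = chain.from_iterable(mapper(k, v) for k, v in records)
    if combiner is not None:
        # Simplification: a real combiner runs separately on each map task's output.
        mapped = chain.from_iterable(combiner(k, vs) for k, vs in group(mapped))
    # Reduce phase: one reducer call per distinct key.
    return list(chain.from_iterable(reducer(k, vs) for k, vs in group(mapped)))

lines = ["the quick brown fox", "the lazy dog", "the end"]
print(run_locally(enumerate(lines), mapper, reducer, combiner=reducer))
```

Because summing is associative and commutative, running the job with or without the combiner gives the same counts. The combiner only reduces how much data has to be shuffled between the map and reduce phases.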