
explanation of map reduce

shimondoodkin edited this page Oct 30, 2010 · 26 revisions

the structure

function map() // executed many times, once for each row in the table
{
 this.column_name; // "this" is the current row
 emit(groupkey, value);
}

function reduce(groupkey, rows_for_the_key) // executed for each group, after the map calls were executed
{
 // rows_for_the_key is like an array of rows returned from .find(),
 // except that every entry in it belongs to the same group key.
 return some_value;
}

The result of a map-reduce execution is a temporary collection; it should be dropped when you are done with it.

Suppose you have this SQL statement:

select my_aggregation_function(a) from mytable group by mycolumn

Reduce is the aggregation function.

Map is a function that filters and prepares the data that will be passed to the aggregation function.

Because you group by a column, there will be several rows with the same group key. Map emits a group key and the associated value.

Later the reduce function receives an array of the values that map emitted; all the entries in the array have the same group key. Reduce calculates whatever it needs and returns an answer. The answer is stored in a result row whose _id is the group key and whose value is the answer.
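The flow described above can be imitated in plain JavaScript without a database. This is only a sketch: `runMapReduce`, the module-level `emit`, and the sample rows are made up for illustration and are not MongoDB's implementation. It also assumes that all values for a key reach reduce in a single call, which the real server does not guarantee.

```javascript
// In-memory imitation of map-reduce, for illustration only (not MongoDB code).
var groups; // serialized group key -> { key, values }

// Stand-in for the emit() that MongoDB provides inside map.
function emit(groupkey, value) {
  var id = JSON.stringify(groupkey);
  if (!groups[id]) groups[id] = { key: groupkey, values: [] };
  groups[id].values.push(value);
}

// Hypothetical helper: run map on every row, then reduce once per group.
function runMapReduce(rows, map, reduce) {
  groups = {};
  rows.forEach(function (row) { map.call(row); }); // "this" is the row
  return Object.keys(groups).map(function (id) {
    var g = groups[id];
    // the group key becomes _id, the answer of reduce becomes value
    return { _id: g.key, value: reduce(g.key, g.values) };
  });
}

function map() { emit(this.mycolumn, this.a); }

function reduce(groupkey, values) {
  return values.reduce(function (x, y) { return x + y; }, 0);
}

var rows = [
  { mycolumn: "x", a: 1 },
  { mycolumn: "y", a: 10 },
  { mycolumn: "x", a: 2 }
];

var result = runMapReduce(rows, map, reduce);
console.log(result);
// [ { _id: 'x', value: 3 }, { _id: 'y', value: 10 } ]
```

The two rows with mycolumn "x" end up in one group, so reduce sees [1, 2] for that key and [10] for "y".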

So map-reduce is an extremely detailed way to define an aggregation function and use it.

The SQL statement above can be converted to this map-reduce:

function map() // executed many times, once for each row in the table
{
 emit(this.mycolumn, this.a);
}

function reduce(groupkey, rows_for_the_key) // executed for each group, after the map calls were executed
{
 // suppose my_aggregation_function sums the values
 var sum = 0;
 for (var i = 0; i < rows_for_the_key.length; i++)
 {
  sum += rows_for_the_key[i];
 }
 // rows_for_the_key holds the values emitted for this group key (here the numbers from this.a).
 // return the same kind of value that map emits, because reduce may be called again on its own results.
 return sum;
}
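One thing worth checking in a sum like this: MongoDB may call reduce more than once for the same key, feeding earlier reduce results back in as values, so the return value should have the same shape as the emitted values. A minimal check that runs without a database (the sample numbers and variable names are illustrative):

```javascript
// Reduce for the sum example: the emitted values are numbers, so a number is returned.
function reduce(groupkey, values) {
  var sum = 0;
  for (var i = 0; i < values.length; i++) {
    sum += values[i];
  }
  return sum;
}

// All values in one call ...
var allAtOnce = reduce("x", [1, 2, 3, 4]);

// ... versus reducing in two batches and then reducing the partial results again.
var partialA = reduce("x", [1, 2]);
var partialB = reduce("x", [3, 4]);
var reReduced = reduce("x", [partialA, partialB]);

console.log(allAtOnce, reReduced); // 10 10
```

If reduce returned an object such as { sum: 10 } while map emitted plain numbers, the second round would add objects to numbers and the totals would no longer match.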

advanced map reduce usage:

To group by several columns, put an object holding several values in the group key argument of emit:

function map() // executed many times, once for each row in the table
{
 emit({ mycolumn: this.mycolumn, mycolumn_b: this.mycolumn_b }, this.a);
}
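To see what a compound group key does to the grouping, here is a small in-memory sketch. The `groupByKey` helper and the sample rows are hypothetical; the helper only imitates the grouping step by comparing serialized keys and does not claim to match how MongoDB compares keys internally.

```javascript
// Hypothetical helper: collect the emitted values per serialized group key.
function groupByKey(rows, keyOf, valueOf) {
  var groups = {};
  rows.forEach(function (row) {
    var key = keyOf(row);
    var id = JSON.stringify(key);
    if (!groups[id]) groups[id] = { _id: key, values: [] };
    groups[id].values.push(valueOf(row));
  });
  return Object.keys(groups).map(function (id) { return groups[id]; });
}

var rows = [
  { mycolumn: "x", mycolumn_b: 1, a: 5 },
  { mycolumn: "x", mycolumn_b: 2, a: 7 },
  { mycolumn: "x", mycolumn_b: 1, a: 9 }
];

var grouped = groupByKey(
  rows,
  function (row) { return { mycolumn: row.mycolumn, mycolumn_b: row.mycolumn_b }; },
  function (row) { return row.a; }
);

console.log(JSON.stringify(grouped));
```

Rows only share a group when both columns match, so the first and third rows are grouped together and the second row forms its own group.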

To do an inner join you can call emit several times:

function map() // executed many times, once for each row in the table
{
 // probably you can also do:
 // db.somecollection.find() ... for(...) emit(...) ...
 // (untested; current MongoDB documentation says map should not access the database)

 emit(this.mycolumn, { a: this.a, b: 0 });
 emit(this.mycolumn, { a: this.a, b: 1 });
}

To return all the columns of a row, just emit this:

function map() // executed many times, once for each row in the table
{
 emit(this.mycolumn, this);
}

In reduce you cannot return the vals array itself, because it is technically a cursor, but you can loop over it and copy it into a new array. You also cannot return an array directly, because of a bug in MongoDB; wrap it in an object instead. There seems to be no basis for this restriction; it looks like a leftover from some old code.
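One workaround for the array restriction is to wrap the array in an object, both in what map emits and in what reduce returns, so the two have the same shape. A minimal sketch, assuming a hypothetical `items` field name; it runs as plain JavaScript, with reduce called by hand on sample values:

```javascript
// map would emit each row wrapped like this: emit(this.mycolumn, { items: [this] });
function reduce(groupkey, values) {
  var items = [];
  for (var i = 0; i < values.length; i++) {
    // each value is { items: [...] }, whether it came from map or from an earlier reduce
    items = items.concat(values[i].items);
  }
  return { items: items }; // an object, not a bare array
}

// Sample values, shaped like the output of the map described above.
var fromMap = [{ items: [{ a: 1 }] }, { items: [{ a: 2 }] }, { items: [{ a: 3 }] }];

var once = reduce("x", fromMap);
var twice = reduce("x", [reduce("x", fromMap.slice(0, 2)), fromMap[2]]);

console.log(JSON.stringify(once));  // {"items":[{"a":1},{"a":2},{"a":3}]}
console.log(JSON.stringify(twice)); // same result when reduce runs in two steps
```

Because the wrapper is the same on the way in and on the way out, the result does not depend on how many times reduce is called for a key.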

An example of a grouped count (it is a little messy, copied and pasted from some code I had):

        // where (the query filter), callback and this.collection come from the surrounding code
        var map = function() { emit(this.t12_group, 1); };
        var reduce = function(key, vals) {
            var sum = 0;
            for (var i = 0; i < vals.length; i++) sum += vals[i];
            return sum;
        };
        var options = { query: where };
        var execute = function(err, mrCollection) { // callback, receives the temporary result collection
            if (err || !mrCollection) return;
            mrCollection.find({}, function(err, cursor) {
                cursor.toArray(function(err, items) {
                    callback(items);
                    mrCollection.drop(); // drop the temporary collection at the end of use
                });
            });
        };
        this.collection.mapReduce(map, reduce, options, execute);