-
Notifications
You must be signed in to change notification settings - Fork 11
explanation of map reduce
the structure:
a map function, in it you use an emit function. "this" in map is the data.
function map() //executed many times for each row in the table
{
this.column_name
emit(groupkey,value);
}
the reduce function.
function reduce(groupkey, rows_for_the_key) // executed for each group at the end after all map calls ware executed.
{
//rows_for_the_key, is like an array of rows of data that returned from .find() function but the vals_array[i]._id is the same in several rows and equals to groupkey.
return some_value
}
the result of map-reduce execution is a temporary collection it should be dropped at the end of use.
suppose you have this sql statment:
select my_aggregation_function(a) from mytable group by mycolumn
the reduce is the aggregation function
the map is a filtering/preparing function of data that will be passed to the aggregation function
because you group by the same value there will be several rows with same group key. map emits a group key and the associated value.
later the reduce function receives an array of rows of map emits, all the rows in the array are with same group key. the reduce calculates whatever, and returns an answer. to the answer an _id is added equals to the group key name and the answer is stored as value.
so map reduce is an extremely detailed way to define the aggregation function and use it.
suppose you have this sql statment:
select my_aggregation_function(a) from mytable group by mycolumn
so the sql statement above could be converted to this map reduce:
function map() //executed many times for each row in the table
{
emit(this.mycolumn,this.a);
}
function reduce(groupkey, rows_for_the_key) // executed for each group at the end after all map calls ware executed.
{
// suppose my_aggregation_function will do summery of values
var summery=0;
for(var i=0;i<rows_for_the_key.length;i++)
{
summery+=rows_for_the_key[i];
}
//rows_for_the_key, is like an array of rows of data that returned from .find() function but the vals_array[i]._id is the same in several rows.
return {summery:some_value}
}
advanced map reduce usage:
to group by several columns you can put an object of several values in the group key argument in the emit function function map() //executed many times for each row in the table { emit({this.mycolumn,this.mycolumn_b},this.a); }
to do like inner join you can call emit several times
function map() //executed many times for each row in the table
{
// probably you can do:
//db.somecollection.find() ... for(...) emit(...)...
emit(this.mycolumn, { a:this.a , b:0 });
emit(this.mycolumn, { a:this.a , b:1 });
}
to return all element in the row just emit the this.
function map() //executed many times for each row in the table
{
emit(this.mycolumn,this);
}
a reduce for example:
var reduce = function(key, vals) {
var sum = 0;
for(var i in vals ) sum += parseFloat(vals[i]);
return sum;
};
in reduce you can not return the vals array because it is technically a cursor but you can go over it and convert it to array.
also you cannot return an array because there is a bug in mongodb. reduce must return an object, there is no basis for that, it seems that it is from some old code. as a solution you just wrap the array with an object.
an example of how to do a grouped by count: (it is little messed up, copied and pasted from some code i had)
var map = function() { emit( this.t12_group , 1) ; };
var reduce = function(key, vals) {
var sum = 0;
for(var i in vals ) sum += parseFloat(vals[i]);
return sum;
};
var options = {query:where};
var execute = function(err, mrCollection) {// callback
if (mrCollection){
mrCollection.find({} , function(err, cursor) {
cursor.toArray(function(err, items) {
callback(items);
mrCollection.drop();
})
});
}
};
this.collection.mapReduce(map , reduce , options , execute);
might me incomplete or wrong. feel free to test and correct. the general principle is explained well.