Yesterday I gave a presentation on MapReduce at the Xebia India office. It went really well, and going by their feedback the audience was able to understand the concept. So I was happy that I had done a good job of explaining MapReduce to a technical audience (mainly Java programmers, some Flex programmers and a few testers). After all the hard work and a great dinner at the Xebia India office, I got back home. My wife (Supriya) asked me, “How was your session on …?” I replied that it went well. Next she asked what the session was all about (she is not in the software/programming field). I replied, “MapReduce.” “MapReduce!! What is it? Is it something related to geographical maps?” she asked. I replied, “No, no, it has nothing to do with geographical maps.” “So what is it?” she said. Hmmm… I said, “Let’s go to Domino’s (a pizza chain) and I will explain it over the pizza table.” She said great, and we went to the pizza shop.
After we reached Domino’s and placed our order, the guy at the counter told us that it would take 15 minutes to prepare the pizza. So I asked her, “Do you really want to understand the MapReduce concept?” She replied with a firm yes. So I started.
Shekhar : How do you prepare onion chutney? (This is not the exact recipe, so please don’t try this at home.)
Supriya : I will take an onion and cut it into pieces, then mix in salt, add water and finally grind it all in a mixer-grinder. And you get onion chutney.
Supriya : How is this related to MapReduce?
Shekhar : Wait! Let me build the full story. You will surely understand MapReduce in 15 minutes.
Supriya : Ok.
Shekhar : Now suppose you want to prepare a mixed chutney using mint, onion, tomato, chillies and garlic. How will you do it?
Supriya : I will take a bunch of mint leaves, 1 onion, 1 tomato, 1 chilli and 1 garlic and cut them into pieces. I will add the required salt and water, grind it all in the mixer-grinder, and you get a mixed chutney.
Shekhar : Great. Let’s apply the MapReduce concept to your recipe. Map and Reduce are two operations. Let me explain them in more detail.
Map : Cutting the onion, tomato, chilli and garlic into pieces is a Map operation applied to each of them individually. You pass one onion to the map and it cuts the onion into pieces. Similarly, you pass the chilli, garlic and tomato to the map one by one and you get many pieces. So whenever you cut a vegetable such as an onion into pieces, you are doing a map operation. The Map operation is applied to each vegetable and gives zero or more outputs; in our case, the pieces of that vegetable. It might happen that one of the onions is rotten, and you just throw that onion away. For the rotten onion the Map operation only does filtering and produces no output.
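(For the programmers reading along, here is a rough Python sketch of the cutting step. It is only an illustration of the idea, not real MapReduce framework code; the names map_vegetable and basket, and the "pieces" and "rotten" fields, are made up for this example.)

```python
def map_vegetable(vegetable):
    """Cut one vegetable into pieces; a rotten one produces no output."""
    if vegetable["rotten"]:
        return []  # filtering: this input is simply thrown away
    name = vegetable["name"]
    # Emit (key, value) pairs: the vegetable name plus one piece each.
    return [(name, f"{name} piece {i}")
            for i in range(1, vegetable["pieces"] + 1)]


# Hypothetical basket: two onions (one rotten) and one tomato.
basket = [
    {"name": "onion", "pieces": 3, "rotten": False},
    {"name": "onion", "pieces": 3, "rotten": True},
    {"name": "tomato", "pieces": 2, "rotten": False},
]

# The map is applied to each vegetable on its own, one by one.
mapped = [pair for vegetable in basket for pair in map_vegetable(vegetable)]
```

The good onion gives three pieces, the rotten onion gives none and the tomato gives two, so mapped ends up holding five (name, piece) pairs.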
Reduce : In this phase you pass all the pieces of the different vegetables to the grinder, which grinds them together to give you one chutney. In other words, you reduce all of the ingredients to one output. So a reducer usually aggregates the output of the maps.
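(Again for the programmers: the grinder can be sketched as a fold over the pieces. This is a minimal illustration using Python’s functools.reduce; the grind function, the gram weights and the empty_jar starting value are assumptions made up for this example.)

```python
from functools import reduce


def grind(chutney, piece):
    """Fold one more piece into the chutney made so far."""
    name, grams = piece
    return {
        "ingredients": chutney["ingredients"] | {name},
        "grams": chutney["grams"] + grams,
    }


# Hypothetical pieces coming out of the cutting step: (name, weight in grams).
pieces = [("mint", 10), ("onion", 40), ("onion", 35), ("tomato", 60), ("garlic", 5)]

# Start with an empty jar and aggregate every piece into one output.
empty_jar = {"ingredients": frozenset(), "grams": 0}
chutney = reduce(grind, pieces, empty_jar)
```

Five pieces go in and a single chutney comes out: 150 grams made of mint, onion, tomato and garlic.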
Supriya : So, is this MapReduce?
Shekhar : Yes and no. It is just one part of MapReduce. The power of MapReduce is in distributed computing.
Supriya : Distributed computing... What’s that? Please explain.
Shekhar : Ok..
Shekhar : Suppose that you compete in a chutney competition and your recipe wins the best chutney award. After winning the award, the chutney recipe becomes a hit, so you want to start selling your own branded chutney. Let’s assume you need to produce 10,000 chutney bottles every day. What will you do?
Supriya : I will find a vendor who can provide me with ingredients in bulk.
Shekhar : Yes, that’s correct. Will you be able to do the whole process alone, i.e. cut all the ingredients into pieces? Will a single grinder work now? Also, we now need to support different types of chutney, like onion only, green chilli only, tomato only, etc.
Supriya : No. I will have to hire more workers to cut the vegetables. I will also buy more grinders so that I can produce chutney faster.
Shekhar : Correct. So you have to distribute the work now. You will need multiple people cutting the ingredients into pieces in parallel. Each person has to process a bag full of ingredients, and each person corresponds to a single map. Each person goes through the bag and processes one ingredient at a time, i.e. cuts it into pieces, until the bag is empty.
So after all the workers have done their work, you will have pieces of onion, tomato, garlic, etc. at all the workplaces (where each person is doing his/her work).
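(For the programmers: the workers and their bags might be sketched like this. Threads from Python’s concurrent.futures stand in for the separate machines a real cluster would use, so this shows only the shape of the idea; cut, worker and bags are made-up names.)

```python
from concurrent.futures import ThreadPoolExecutor


def cut(vegetable):
    """Cut one vegetable into two (name, piece) pairs."""
    return [(vegetable, f"{vegetable} piece {i}") for i in (1, 2)]


def worker(bag):
    """One worker is one map task: empty the bag one ingredient at a time."""
    output = []
    for vegetable in bag:
        output.extend(cut(vegetable))
    return output  # the pieces stay at this worker's own workplace


# Hypothetical bags, one per worker.
bags = [
    ["onion", "tomato", "onion"],
    ["garlic", "onion"],
    ["tomato", "chilli"],
]

# All workers cut in parallel; each one returns its own pile of pieces.
with ThreadPoolExecutor(max_workers=len(bags)) as pool:
    outputs_per_worker = list(pool.map(worker, bags))
```

Notice that at the end each worker has a mixed pile: onion pieces are scattered across several workplaces, which is exactly the problem the next question is about.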
Supriya : But how will I create the different types of chutney?
Shekhar : Now you will see the missing phase of MapReduce: the Shuffle phase. MapReduce groups all the outputs written by every map based on their key. This is done for you automatically. You can think of the key as just the name of an ingredient, like onion. So all the outputs with the onion key are grouped together and transferred to a grinder that processes only onions, and you get onion chutney. Similarly, all the tomato pieces are transferred to the grinder marked for tomato and produce tomato chutney.
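(For the programmers: in a real framework the shuffle happens behind the scenes, but the grouping it does can be sketched by hand. This is a simplified, single-machine illustration; shuffle, grind and the sample outputs_per_worker data are made up for this example.)

```python
from collections import defaultdict

# Hypothetical map output sitting at three different workplaces.
outputs_per_worker = [
    [("onion", "onion piece"), ("tomato", "tomato piece")],
    [("onion", "onion piece"), ("garlic", "garlic piece")],
    [("tomato", "tomato piece"), ("onion", "onion piece")],
]


def shuffle(outputs):
    """Group every (key, piece) pair from every worker by its key."""
    grouped = defaultdict(list)
    for worker_output in outputs:
        for key, piece in worker_output:
            grouped[key].append(piece)
    return dict(grouped)


def grind(key, pieces):
    """One grinder per key: reduce all pieces of one ingredient."""
    return f"{key} chutney made from {len(pieces)} pieces"


grouped = shuffle(outputs_per_worker)
chutneys = {key: grind(key, pieces) for key, pieces in grouped.items()}
```

Here the three onion pieces from three different workers all reach the onion grinder, so chutneys["onion"] is "onion chutney made from 3 pieces", and tomato and garlic get their own grinders in the same way.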
Finally the pizza arrived, and she nodded her head, saying that she understood MapReduce. I just hope that the next time she hears about MapReduce she can better understand what I am doing.