December 8, 2011 – Shekhar Gulati

<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>  <configuration> <property> <name>hadoop.tmp.dir</name> <value>/home/shekhar/hadoop-data</value> <description>A base for other temporary directories.</description> </property> <property> <name>fs.default.name</name> <value>hdfs://localhost:54310</value> <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description> </property> </configuration> <strong>hdfs-site.xml</strong> <?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>  <configuration> <property> <name>dfs.replication</name> <value>1</value> <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time. </description> </property> </configuration>

<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>  <configuration> <property> <name>mapred.job.tracker</name> <value>localhost:54311</value> <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task. </description> </property> </configuration>

# Set Hadoop-specific environment variables here. # The only required environment variable is JAVA_HOME. All others are # optional. When running a distributed configuration it is best to # set JAVA_HOME in this file, so that it is correctly defined on # remote nodes. # The java implementation to use. Required. export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.26 # Extra Java CLASSPATH elements. Optional. # export HADOOP_CLASSPATH= # The maximum amount of heap to use, in MB. Default is 1000. # export HADOOP_HEAPSIZE=2000 # Extra Java runtime options. Empty by default. # export HADOOP_OPTS=-server # Command specific options appended to HADOOP_OPTS when specified export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS" export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS" export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS" export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS" export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS" # export HADOOP_TASKTRACKER_OPTS= # The following applies to multiple commands (fs, dfs, fsck, distcp etc) # export HADOOP_CLIENT_OPTS # Extra ssh options. Empty by default. # export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR" # Where log files are stored. $HADOOP_HOME/logs by default. # export HADOOP_LOG_DIR=${HADOOP_HOME}/logs # File naming remote slave hosts. $HADOOP_HOME/conf/slaves by default. # export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves # host:path where hadoop code should be rsync'd from. Unset by default. # export HADOOP_MASTER=master:/home/$USER/src/hadoop # Seconds to sleep between slave commands. Unset by default. This # can be useful in large clusters, where, e.g., slave rsyncs can # otherwise arrive faster than the master can service them. # export HADOOP_SLAVE_SLEEP=0.1 # The directory where pid files are stored. /tmp by default. # export HADOOP_PID_DIR=/var/hadoop/pids # A string representing this instance of hadoop. $USER by default. # export HADOOP_IDENT_STRING=$USER # The scheduling priority for daemon processes. See 'man nice'. # export HADOOP_NICENESS=10

<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <property> <name>hbase.rootdir</name> <value>hdfs://localhost:54310/hbase</value> </property> <property> <name>dfs.replication</name> <value>1</value> </property> <property> <name>hbase.zookeeper.property.clientPort</name> <value>2222</value> <description>Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect. </description> </property> <property> <name>hbase.zookeeper.quorum</name> <value>localhost</value> <description>Comma separated list of servers in the ZooKeeper Quorum. For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com". By default this is set to localhost for local and pseudo-distributed modes of operation. For a fully-distributed setup, this should be set to a full list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh this is the list of servers which we will start/stop ZooKeeper on. </description> </property> <property> <name>hbase.zookeeper.property.dataDir</name> <value>/home/shekhar/zookeeper</value> <description>Property from ZooKeeper's config zoo.cfg. The directory where the snapshot is stored. </description> </property> </configuration>

2011-12-06 13:59:29,979 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. java.io.IOException: Call to localhost/127.0.0.1:54310 failed on local exception: java.io.EOFException 2011-12-06 13:59:30,577 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181 2011-12-06 13:59:30,577 WARN org.apache.zookeeper.ClientCnxn: Session 0x134127deaaf0002 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)

In the first post I talked about how indexes affect the write speed in MongoDB. In this second post I will share my findings on how different write concerns affect the write speed on a single node. Please refer to the first post for the setup related information. A write concern controls the behavior of write operation and gives developers the choice to choose the value matching their requirements. For instance there are some documents which are not very important and if one of them get lost your business will not get screwed. For those you can choose less stricter value of write concern and for objects where you want don’t want your object to be lost you should choose stricter value of write concern. Let’s take a look at different write concern values available in Java driver. Please note in this experiment I used MongoDB java driver 2.7.2 instead of Spring MongoDB.

Normal : This is the default option where every write operation is fire and forget which means it just writes to the driver and return back. It does not wait for write to be available in server. So, if another thread tries to read the document just after the document has been written it might found not find it. There is a very high probability of data loss with this option. I think this should not be considered in cases where data durability is important and you are only using single instance of MongoDB server. Even with replication you can loose data with this option (I will talk about in my future post).
None : This is almost same as Normal with just one different that in Normal if network goes down or there is some other network issue you get an exception but with None you don’t get any exception if there are some network issues. This makes it highly unreliable.
Safe : As suggested by name it is safer than the above two. The write operation waits for the MongoDB server to acknowledge the write but data is still not written to disk. With safe you will not face issue that when another thread tried to read the object you just wrote, the object was not found. So, it provides a guarantee that object once written will be found. That’s Good. But still you can loose data because data is not written to disk and if server died for some reason data will be lost.
Journal Safe : Before we talk about this option. Lets first talk about what is Journaling in MongoDB. Journaling is a feature of MongDB where a write ahead log file of all the operations is maintained. In scenarios when MongoDB is not cleanly shutdown like using kill -9 command the data can be recovered from Journal files. By default data is written to journal files after every 100 milliseconds. You can change it to lie between 2 ms to 300 ms. With version 2.0 journaling is enable by default on 64 bit MongoDB servers. With Journal Safe write concern option your write will wait till the journal file is updated.
Fysnc : With Fsync write concern the write operation waits till the data is not written to disk. This is the safest option on a Single node as only way you can loose data is when the hard disk crashes.

I have left the other values which are not applicable to single node but make more sense when replication is enable. I will cover them in future posts.

Test Case

The test case was very simple I will be doing 1 million writes with each of options except fsync and will find out the writes per second speed for each of the write concern values.

Document

The document is similar to the one used in first post. It is 2395 bytes.

{
"_id" : ObjectId("4eda74ef84ae8b2410f5fa8e"),
"age" : "27",
"lName" : "Gulati1",
"fName" : "Shekhar1"
"bio" : "I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. ",
}

JUnit Test

The JUnit test case is shown below. In each test case it inserts one million records with a different value of write concern.

public class SingleNodeWriteConcernTests {

	private final static int ONE_MILLION = 1000000;

	private final Logger logger = Logger
			.getLogger(SingleNodeWriteConcernTests.class);

	@Test
	public void shouldInsertRecordsInNonCurrentMode() throws Exception {
		ServerAddress serverAddress = new ServerAddress("localhost", 27017);

		Mongo mongo = new Mongo(serverAddress);
		mongo.setWriteConcern(WriteConcern.NONE);
		runASingleTestCase(mongo, "NONE");

		mongo = new Mongo(serverAddress);
		mongo.setWriteConcern(WriteConcern.JOURNAL_SAFE);
		runASingleTestCase(mongo, "JOURNAL_SAFE");

		mongo = new Mongo(serverAddress);
		mongo.setWriteConcern(WriteConcern.NORMAL);
		runASingleTestCase(mongo, "NORMAL");

		mongo = new Mongo(serverAddress);
		mongo.setWriteConcern(WriteConcern.SAFE);
		runASingleTestCase(mongo, "SAFE");

	}

	private void runASingleTestCase(Mongo mongo, String name) throws Exception {
		DB db = mongo.getDB("play");
		DBCollection people = db.getCollection("people");
		if (db.collectionExists("people")) {
			people.drop();
		}
		insertRecords(mongo, name);

		mongo.dropDatabase("play");
	}

	private void insertRecords(Mongo mongo, final String name) throws Exception {

		DB db = mongo.getDB("play");
		final DBCollection collection = db.getCollection("people");
		collection.ensureIndex("fName");
		long startTime = System.currentTimeMillis();
		for (int i = 1; i <= ONE_MILLION; i++) {
			BasicDBObject obj = new BasicDBObject();
			Map<String, String> map = new HashMap<String, String>();
			map.put("fName", "Shekhar" + i);
			map.put("lName", "Gulati" + i);
			map.put("age", String.valueOf(i));
			map.put("bio", StringUtils.repeat("I am a Java Developer. ", 100));
			obj.putAll(map);
			collection.insert(obj);
		}
		long endTime = System.currentTimeMillis();
		double seconds = ((double) (endTime - startTime)) / (1000);
		double rate = ONE_MILLION / seconds;

		String message = String
				.format("WriteConcern %s inserted %d records in %.2f seconds at %.2f (rec/s)",
						name, ONE_MILLION, seconds, rate);
		logger.info(message);

	}

}

Results

As you might have also expected Normal and None are the fastest because of the way they work i.e. fire and forget. Safe writes takes 3.5 times more than Normal writes. With Journal safe value you come down to 24 documents per second which is very low. As you can see as you move towards more write safety you loose a lot on write speed. This is again a decision you have to make depending on your use case.

Can something be done to increase write speed in Safe and Journal Safe options?

The results shown above are based on records being inserted sequentially one at a time. I tried an experiment where in I divided 1 million records to a batch of 100,000 records each. And let 10 threads write 1 million record in parallel. The write speed for Safe and Journal Safe increased but None and Normal decreased as shown below.

The write speed for Safe with 10 threads is 1.4 times the write speed with one thread and similarly write speed for Journal Safe is 10 times of the write speed with one thread. This is because while one thread is waiting other threads can work in parallel which allows to better utilize CPU.

Day: December 8, 2011

Installing HBase over HDFS on a Single Ubuntu Box

How MongoDB Different Write Concern Values Affect Performance On A Single Node?