Home » java » How MongoDB write/read speed varies with or without index on a field?

How MongoDB write/read speed varies with or without index on a field?

Sign up for OpenShift Account

Blog Stats

  • 231,765 hits

My Tweets


Last 3 weeks I have been busy working on a PoC where we are thinking of using MongoDB as our datastore. In this series of blog posts I will be sharing my finding with the community. Please take these experiments with grain of salt and try out these experiments on your dataset and hardware. Also share with me if I am doing something stupid.  In this blog I will be sharing my findings on how index affect the write speed.

Scenario

I will be inserting 60 million documents and will be noting the time taken to write each batch of 10 million records.  The average document size is 2400 bytes (Look at the document in under Document heading). The test will be run first without index on the name field and then with index on the name field.

Conclusion

Write speed with index dropped to 0.27 times of write speed without index after inserting 20 million documents. 

Setup 

Dell Vostro Ubuntu 11.04 box with 4 GB RAM and 300 GB hard disk.

Java 6

MongoDB 2.0.1

Spring MongoDB 1.0.0.M5 which internally uses MongoDB Java driver 2.6.5 version.

Document

The documents I am storing in MongoDB looks like as shown below. The average document size is 2400 bytes.  Please note the _id field also has an index. The index that I will be creating will be on name field.

{
"_id" : ObjectId("4ed89c140cf2e821d503a523"),
"name" : "Shekhar Gulati",
"someId1" : NumberLong(1000006),
"str1" : "U",
"date1" : ISODate("1997-04-10T18:30:00Z"),
"bio" : "I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a
Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java
Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I
am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java
Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I
am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java
Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I
am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java
Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I
am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java
Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I
am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java
Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I
am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java
Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I
am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java
Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I
am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java
Developer. I am a Java Developer. "
}

JUnit TestCases

The first JUnit test inserts 10 million record and after every 10 million records dumps the time taken to write batch of 10 million records. Perform  a find query on an unindexed field name and prints the time taken to perform the find operation. This tests runs for 6 batches so 60 million records are inserted.

@Configurable
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations = "classpath:/META-INF/spring/applicationContext*.xml")
public class Part1Test {

private static final String FILE_NAME = "/home/shekhar/dev/test-data/10mrecords.txt";

	private static final int TOTAL_NUMBER_OF_BATCHES = 6;

	private static final Logger logger = Logger
			.getLogger(SprintOneTestCases.class);

	@Autowired
	MongoTemplate mongoTemplate;

	@Before
	public void setup() {
		mongoTemplate.getDb().dropDatabase();
	}

	@Test
	public void shouldWrite60MillionRecordsWithoutIndex() throws Exception {

		for (int i = 1; i <= TOTAL_NUMBER_OF_BATCHES; i++) {
			logger.info("Running Batch ...." + i);
			long startTimeForOneBatch = System.currentTimeMillis();
		LineIterator iterator = FileUtils.lineIterator(new File(FILE_NAME));
			while (iterator.hasNext()) {
				String line = iterator.next();
				User user = convertLineToObject(line);
				mongoTemplate.insert(user);
			}

			long endTimeForOneBatch = System.currentTimeMillis();
			double timeInSeconds = ((double) (endTimeForOneBatch - startTimeForOneBatch)) / 1000;
			logger.info(String
					.format("Time taken to write %d batch of 10 million records is %.2f seconds",
							i, timeInSeconds));

			Query query = Query.query(Criteria.where("name").is(
					"Shekhar Gulati"));
			logger.info("Unindexed find query for name Shekhar Gulati");
			performFindQuery(query);
			performFindQuery(query);

			CommandResult collectionStats = mongoTemplate
					.getCollection("users").getStats();
			logger.info("Collection Stats : " + collectionStats.toString());

			logger.info("Batch finished running...." + i);
		}

	}

	private void performFindQuery(Query query) {
		long firstFindQueryStartTime = System.currentTimeMillis();
		List<User> query1Results = mongoTemplate.find(query, User.class);
		logger.info("Number of results found are " + query1Results.size());
		long firstFindQueryEndTime = System.currentTimeMillis();
		logger.info("Total Time Taken to do a find operation "
				+ (firstFindQueryEndTime - firstFindQueryStartTime) / 1000
				+ " seconds");
	}

	private User convertLineToObject(String line) {
		String[] fields = line.split(";");
		User user = new User();
		user.setFacebookName(toString(fields[0]));
		user.setSomeId1(toLong(fields[1]));
		user.setStr1(toString(fields[2]));
		user.setDate1(toDate(fields[3]));
		user.setBio(StringUtils.repeat("I am a Java Developer. ", 100));
		return user;
	}

	private long toLong(String field) {
		return Long.parseLong(field);
	}

	private Date toDate(String field) {
		SimpleDateFormat dateFormat = new SimpleDateFormat(
				"yyyy-MM-dd HH:mm:ss");
		Date date = null;
		try {
			date = dateFormat.parse(field);
		} catch (ParseException e) {
			date = new Date();
		}
		return date;
	}

	private String toString(String field) {
		if (StringUtils.isBlank(field)) {
			return "dummy";
		}
		return field;
	}

}

listing 1. 60 million records getting inserted and read without index

In the second test case I first created the index and then started inserting the records. This time find operations were performed on the indexed field name.

	@Test
	public void shouldWrite60MillionRecordsWithIndex()
			throws Exception {

		long startTime = System.currentTimeMillis();
		createIndex();

		for (int i = 1; i <= TOTAL_NUMBER_OF_BATCHES; i++) {
			logger.info("Running Batch ...." + i);
			long startTimeForOneBatch = System.currentTimeMillis();
		LineIterator iterator = FileUtils.lineIterator(new File(FILE_NAME));
			while (iterator.hasNext()) {
				String line = iterator.next();
				User obj = convertLineToObject(line);
				mongoTemplate.insert(obj);
			}

			long endTimeForOneBatch = System.currentTimeMillis();
			logger.info("Total Time Taken to write " + i
					+ " batch of Records in milliseconds : "
					+ (endTimeForOneBatch - startTimeForOneBatch));
	double timeInSeconds = ((double)(endTimeForOneBatch - startTimeForOneBatch))/1000;

                        logger.info(String.format("Time taken to write %d batch of 10 million records is %.2f seconds", i,timeInSeconds));

		Query query = Query.query(Criteria.where("name").is("Shekhar Gulati"));
			logger.info("Indexed find query for name Shekhar Gulati");
			performFindQuery(query);
			performFindQuery(query);

			CommandResult collectionStats = mongoTemplate.getCollection("name").getStats();
			logger.info("Collection Stats : " + collectionStats.toString());

			logger.info("Batch finished running...." + i);
		}

	}

	private void createIndex() {
		IndexDefinition indexDefinition = new Index("name", Order.ASCENDING)
				.named(" name_1");
		long startTimeToCreateIndex = System.currentTimeMillis();
		mongoTemplate.ensureIndex(indexDefinition, User.class);
		long endTimeToCreateIndex = System.currentTimeMillis();
		logger.info("Total Time Taken createIndex "
				+ (endTimeToCreateIndex - startTimeToCreateIndex) / 1000
				+ " seconds");
	}

Write Concern

WriteConcern value was NONE which is fire and forget. You can read more about write concerns here.

Write Resuts

After running the test cases shown above I found out that for the first 10 million i.e. from 0 to 10 million inserts write per second with index was 0.4 times of without index. The more surprising was that for the next batch of 10 million records the write speed with index was reduced to 0.27 times without index.

Looking at the table above you can see that the write speed when we don’t have index is remains consistent and does not degrades. But the write speed when we had index varied a lot from 3492 documents per second to 2281 documents per second. I was not able to complete the test after 20 million as it was taking way too much time to do next 10 million. This can lead to lot of problems in case you added index on a field after you have inserted first 10 million records without index. The write speed is not even consistent and you have to think of sharding to achieve the speed limits you want.

Read Results

Read results don’t show anything interesting except that you should have index on the field you would be querying on otherwise read performance will be very bad. This can be explained very easily because data will not be in RAM and you will be hitting disk. And when you hit disk performance will take a ride.

This is what all I have for this post. I am not making any judgement whether these numbers are good or bad. I think that should be governed by the use case, data, hardware you will working on. Please feel free to comment and share your knowledge.

About these ads

8 Comments

  1. [...] This blog talks about who reads and writes per second varies when you have index on a field and when you don't have any index.    Database Read the original post on DZone… [...]

  2. [...] 8, 2011 Leave a Comment In the first post I talked about how indexes affect the write speed in MongoDB. In this second post I will share my [...]

  3. [...] Last but not least, Shekhar Gulati (@shekhargulati) blogged about how MongoDB write/read speed varies with or without index on a field. [...]

  4. Hey there, Thanks a lot for this amazing posting! I must bookmark How MongoDB write/read speed varies with or without index on a field? I'm Shekhar Gulati , this is my blog. Thank you.

  5. [...] records. The document used is shown under document heading and is the same as the one used in first post just with one extra field index(a simple int field which just auto increments by one) . Before [...]

  6. [...] records. The document used is shown under document heading and is the same as the one used in first post just with one extra field index(a simple int field which just auto increments by one) . Before [...]

  7. Arsalan says:

    Thanks for this article, I need this comparison.

  8. [...] The overhead of indexes greatly effects tables that need to be written to or altered often (even indexes on NoSQL database greatly effect writes). Over indexed tables inflate the database size dramatically which adds to [...]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 265 other followers

%d bloggers like this: