Day 4 : PredictionIO : How to Build A Blog Recommender


Today is the fourth day of my challenge to learn 30 technologies in 30 days. So far I am enjoying it and getting a good response from fellow developers, and I am more than motivated to keep going for the full 30 days. In this blog, I will cover how we can very easily build a blog recommendation engine using PredictionIO. I did not find much documentation on using PredictionIO with Java, so this blog might help people looking for an end-to-end PredictionIO Java tutorial. The full blog series can be tracked on this page.


What is PredictionIO?

PredictionIO is an open source machine learning server application written in Scala. It provides an easy-to-use REST API for building recommendation engines. It also provides client SDKs, which wrap the REST API. The client SDKs are available for Java, Python, Ruby, and PHP. At its core, PredictionIO uses Apache Mahout, a scalable machine learning library that provides various clustering, classification, and filtering algorithms. Apache Mahout can run these algorithms on a distributed Hadoop cluster.

As users, we do not have to worry about all these details. We can just install PredictionIO and start using it. Read the docs to get more information.

Why should I care?

I decided to learn PredictionIO because I wanted a library that can help me add machine learning capabilities to my applications. PredictionIO can help with functionality such as recommending interesting items to users and discovering similar items.

Installing PredictionIO

There are multiple ways to install PredictionIO, as mentioned in the docs. I used the Vagrant approach because it avoids messing up my machine and sets up everything by itself.

1. Download the latest Vagrant package for your operating system. You can get the latest package from http://downloads.vagrantup.com/.

2. Download and install VirtualBox. Please refer to https://www.virtualbox.org/wiki/Downloads.

3. Download the latest PredictionIO Vagrant package from https://github.com/PredictionIO/PredictionIO-Vagrant/releases.

4. Unzip the PredictionIO-x.x.x.zip package. This package contains the scripts necessary to set up PredictionIO. Open a command line terminal and change directory to PredictionIO-x.x.x.

The Vagrant script will first download the Ubuntu Vagrant box and then install all the prerequisites: MongoDB, Java, Hadoop, and the PredictionIO server. This can take a long time depending on your connection. If you are at a location where the internet connection is not stable, I recommend that you download the Ubuntu box using wget, because the wget command supports resuming partial downloads. Download the precise64 box to a convenient location as shown below.

wget -c http://files.vagrantup.com/precise64.box

After the download is complete, open the Vagrantfile and change the config.vm.box_url property to point to the downloaded precise64.box file.

config.vm.box_url = "/Users/shekhargulati/tools/vagrant/precise64.box"

5. Now just run vagrant up to start the installation process. This will take time depending on your internet speed.

6. Next we have to create an administrator account, as mentioned in the docs at http://docs.prediction.io/current/installation/install-predictionio-with-virtualbox-vagrant.html#create-an-administrator-account.

7. The server application will be available at http://localhost:9000. Read more at http://docs.prediction.io/current/installation/install-predictionio-with-virtualbox-vagrant.html#accessing-predictionio-server-vm-from-the-host-machine. You will be asked to log in to the PredictionIO application. Once logged in, you will see the dashboard as shown below.

PredictionIO-Dashboard

Creating PredictionIO Application

We will start by creating the blog-recommender application. Click the Add an App button and enter blog-recommender as the name.

PredictionIO-Add-An-App

After the application is created, you will see it under Applications as shown below.

PredictionIO-Application

Next, click Develop and the application details will be shown. The important information here is the App Key, which is required when we write our application.

Blog-Recommender-Application

Application Usecase

The use case that we are implementing is similar to Amazon's “Customers Who Bought This Item Also Bought” feature. Our use case is: people who viewed this blog also viewed these blogs.

Developing Blog Recommender Java Application

Now that we have created the PredictionIO application, it's time to write our Java application. We will use Eclipse to develop it. I am using Eclipse Kepler, which has m2eclipse integration built in. Create a new Maven-based project by navigating to File > New > Maven Project. Choose maven-archetype-quickstart and then enter the Maven project details. Replace the pom.xml with the one shown below.

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.shekhar</groupId>
  <artifactId>blog-recommender</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>blog-recommender</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>

  </properties>

  <dependencies>

  <dependency>
  	<groupId>io.prediction</groupId>
  	<artifactId>client</artifactId>
  	<version>0.6.1</version>
  </dependency>
  </dependencies>

  <build>
  	<plugins>
  		<plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.1</version>
        <configuration>
          <!-- http://maven.apache.org/plugins/maven-compiler-plugin/ -->
          <source>1.7</source>
          <target>1.7</target>
        </configuration>
      </plugin>
  	</plugins>
  </build>
</project>

The thing to note above is the Maven dependency for the PredictionIO Java API.

Now we will write a class which will insert data into PredictionIO. The class is shown below.

package com.shekhar.blog_recommender;

import io.prediction.Client;
import io.prediction.CreateItemRequestBuilder;

public class BlogDataInserter {

    private static final String API_KEY = "wwoTLn0FR7vH6k51Op8KbU1z4tqeFGZyvBpSgafOaSSe40WqdMf90lEncOA0SB13";

    public static void main(String[] args) throws Exception {
        Client client = new Client(API_KEY);
        addUsers(client);
        addBlogs(client);
        userItemViews(client);
        client.close();

    }

    private static void addUsers(Client client) throws Exception {
        String[] users = { "shekhar", "rahul"};
        for (String user : users) {
            // Create the user first so the message is only printed on success.
            client.createUser(user);
            System.out.println("Added User " + user);
        }

    }

    private static void addBlogs(Client client) throws Exception {
        CreateItemRequestBuilder blog1 = client.getCreateItemRequestBuilder("blog1", new String[]{"machine-learning"});
        client.createItem(blog1);

        CreateItemRequestBuilder blog2 = client.getCreateItemRequestBuilder("blog2", new String[]{"javascript"});
        client.createItem(blog2);

        CreateItemRequestBuilder blog3 = client.getCreateItemRequestBuilder("blog3", new String[]{"scala"});
        client.createItem(blog3);

        CreateItemRequestBuilder blog4 = client.getCreateItemRequestBuilder("blog4", new String[]{"artificial-intelligence"});
        client.createItem(blog4);

        CreateItemRequestBuilder blog5 = client.getCreateItemRequestBuilder("blog5", new String[]{"statistics"});
        client.createItem(blog5);

        CreateItemRequestBuilder blog6 = client.getCreateItemRequestBuilder("blog6", new String[]{"python"});
        client.createItem(blog6);

        CreateItemRequestBuilder blog7 = client.getCreateItemRequestBuilder("blog7", new String[]{"web-development"});
        client.createItem(blog7);

        CreateItemRequestBuilder blog8 = client.getCreateItemRequestBuilder("blog8", new String[]{"security"});
        client.createItem(blog8);

        CreateItemRequestBuilder blog9 = client.getCreateItemRequestBuilder("blog9", new String[]{"ruby"});
        client.createItem(blog9);

        CreateItemRequestBuilder blog10 = client.getCreateItemRequestBuilder("blog10", new String[]{"openshift"});
        client.createItem(blog10);
    }

    private static void userItemViews(Client client) throws Exception {
        client.identify("shekhar");
        client.userActionItem("view","blog1");
        client.userActionItem("view","blog4");
        client.userActionItem("view","blog5");

        client.identify("rahul");
        client.userActionItem("view","blog1");
        client.userActionItem("view","blog4");
        client.userActionItem("view","blog6");
        client.userActionItem("view","blog7");

    }

}

The class shown above does the following:

  1. We create an instance of the Client class. Client is the class that wraps the PredictionIO REST API. We need to provide it with the API_KEY of the PredictionIO blog-recommender application.
  2. Next we create two users using the Client instance. The users get created in the PredictionIO application. The only mandatory field is userId.
  3. After that we add 10 blogs using the Client instance. The blogs get created in the PredictionIO application. When creating an item you only need to pass two things: itemId and itemType. Here blog1, ..., blog10 are the itemIds, and javascript, scala, etc. are the itemTypes.
  4. Next we record some view actions on the items. The user “shekhar” viewed “blog1”, “blog4”, and “blog5”, and the user “rahul” viewed “blog1”, “blog4”, “blog6”, and “blog7”.
  5. Finally, we close the client instance.

Run this class as a Java application. It will insert records into the PredictionIO application instance, which you can verify by looking at the dashboard.
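The ten nearly identical item-creation calls in addBlogs could also be driven from a small table of itemId and itemType pairs. The following is only a sketch of that idea, written for Java 7 like the rest of the project. The ItemCreator interface and the BlogCatalogSketch class are hypothetical names introduced here so that the example runs without a PredictionIO server; they are not part of the PredictionIO SDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: keep the blog catalog as data and loop over it.
class BlogCatalogSketch {

    // Hypothetical stand-in for the client call that creates an item.
    interface ItemCreator {
        void create(String itemId, String[] itemTypes) throws Exception;
    }

    // itemId -> itemType, the same ten blogs as in BlogDataInserter.
    static Map<String, String> catalog() {
        Map<String, String> blogs = new LinkedHashMap<String, String>();
        blogs.put("blog1", "machine-learning");
        blogs.put("blog2", "javascript");
        blogs.put("blog3", "scala");
        blogs.put("blog4", "artificial-intelligence");
        blogs.put("blog5", "statistics");
        blogs.put("blog6", "python");
        blogs.put("blog7", "web-development");
        blogs.put("blog8", "security");
        blogs.put("blog9", "ruby");
        blogs.put("blog10", "openshift");
        return blogs;
    }

    // Passes every catalog entry to the creator, in insertion order.
    static void addBlogs(ItemCreator creator) throws Exception {
        for (Map.Entry<String, String> blog : catalog().entrySet()) {
            creator.create(blog.getKey(), new String[] { blog.getValue() });
        }
    }

    public static void main(String[] args) throws Exception {
        // Demo creator that only prints; no server is contacted.
        addBlogs(new ItemCreator() {
            @Override
            public void create(String itemId, String[] itemTypes) {
                System.out.println("Would create " + itemId + " of type " + itemTypes[0]);
            }
        });
    }
}
```

In the real inserter, the ItemCreator implementation would presumably delegate to the calls already used in BlogDataInserter, that is client.createItem(client.getCreateItemRequestBuilder(itemId, itemTypes)).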

PredictionIO-Added-Data

Now that the data is inserted into our PredictionIO application, we need to add an engine to the application. Click the Add an Engine button and choose Item Similarity Engine as shown below.

Choose Engine

Next, create the Item Similarity Engine by entering “engine1” as the name.

Create Item Similarity Engine

After pressing the Create button you will have an Item Similarity Engine. You could make some configuration changes at this point, but we will use the default settings. Go to the Algorithms tab and you will see that the engine is not running. Run the engine by clicking “Train Data Model Now”.

PredictionIO-Train-Data-Model-Now

Now wait for a few minutes, as training takes time. After the data model is trained, you will see the status as Running.

PredictionIO-Running

The use case that we are solving is to recommend blogs to a user based on a blog they have viewed. In the code shown below, we ask for up to five items similar to blog1 for userId “shekhar”.

import io.prediction.Client;

import java.util.Arrays;

public class BlogRecommender {

    public static void main(String[] args) throws Exception {

        Client client = new Client("wwoTLn0FR7vH6k51Op8KbU1z4tqeFGZyvBpSgafOaSSe40WqdMf90lEncOA0SB13");
        client.identify("shekhar");
        String[] recommendedItems = client.getItemSimTopN("engine1", "blog1", 5);

        System.out.println(String.format("User %s is recommended %s", "shekhar", Arrays.toString(recommendedItems)));

        client.close();

    }
}

Run the Java program and you will see the result as “blog4”, “blog5”, “blog6”, and “blog7”.

As you can see from the above example, it is very easy to add recommendation capabilities to an application. I will use it in my future projects and spend more time on it.
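To get an intuition for why these four blogs come back for blog1, here is a small, self-contained sketch that simply counts how often other blogs were viewed by the same users who viewed blog1. This is a plain co-view count and is not the algorithm that the PredictionIO Item Similarity Engine actually runs; the class name CoViewSketch and the method topCoViewed are made up for this illustration. The view data is the same as what BlogDataInserter sends.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: a naive co-view count, not PredictionIO's algorithm.
class CoViewSketch {

    // Returns up to n items most often viewed by users who also viewed targetItem.
    static List<String> topCoViewed(Map<String, List<String>> viewsByUser,
                                    String targetItem, int n) {
        final Map<String, Integer> counts = new HashMap<String, Integer>();
        for (List<String> viewed : viewsByUser.values()) {
            if (!viewed.contains(targetItem)) {
                continue; // this user never viewed the target item
            }
            for (String item : viewed) {
                if (item.equals(targetItem)) {
                    continue; // do not recommend the item itself
                }
                Integer old = counts.get(item);
                counts.put(item, old == null ? 1 : old + 1);
            }
        }
        List<String> ranked = new ArrayList<String>(counts.keySet());
        Collections.sort(ranked, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                int byCount = counts.get(b).compareTo(counts.get(a)); // higher count first
                return byCount != 0 ? byCount : a.compareTo(b);      // ties: alphabetical
            }
        });
        return new ArrayList<String>(ranked.subList(0, Math.min(n, ranked.size())));
    }

    public static void main(String[] args) {
        // Same views as inserted by BlogDataInserter.
        Map<String, List<String>> views = new LinkedHashMap<String, List<String>>();
        views.put("shekhar", Arrays.asList("blog1", "blog4", "blog5"));
        views.put("rahul", Arrays.asList("blog1", "blog4", "blog6", "blog7"));
        System.out.println(topCoViewed(views, "blog1", 5)); // [blog4, blog5, blog6, blog7]
    }
}
```

Both users viewed blog4 together with blog1, so it ranks first; blog5, blog6, and blog7 were each co-viewed once. The untouched blogs (blog2, blog3, blog8, blog9, blog10) never appear because nobody who viewed blog1 viewed them.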

That’s it for today. Keep giving feedback.

Shameless Plug

If you are a Java, Python, Node.js, Ruby, or PHP developer then you should take a look at OpenShift. OpenShift is a public Platform as a Service, and you can use it to deploy your applications for free.
