Social Network Workshop using Neo4j

The purpose of this tutorial is to learn how to use Neo4j. The tutorial starts off with the basic Neo4j API for creating a social graph and performing some simple traversals of that graph. The next part that's introduced is the concept of indexes for finding nodes in the graph that traversals can start from. We then move on to building a domain API for our social network, before introducing graph algorithms for mining more complex aspects of the social graph. Finally we use the social graph model to make recommendations for the people in the social network.

This is a hands-on tutorial in which you are expected to write code. The source code provided with this tutorial is a skeleton with blanks that you are expected to fill in.

If you get stuck and need to introspect your data to get further, check out the introspection section about the Neo4j Shell.

To get started with the code, just type:

git clone git://github.com/thobe/social-network-workshop

Or you could download a pre-packaged version:

Prerequisites

The first thing you are going to need is a JDK (Java Development Kit), which you can download from the SUN Java website.

While you could compile the sources manually with just javac, the process becomes a lot easier with a build tool such as maven or ant. We recommend maven, since it takes care of dependency management automatically and works well for standard Java projects. The rest of this guide assumes that you are using maven, but if you feel more at home with ant, we have included an ant build file for you. The instructions talk about maven, but using ant should be pretty straightforward.

Step One - a basic social network

The purpose of this assignment is to get a first experience with Neo4j and its core component, the Neo4j Graph Database Kernel. After this assignment you will be able to build a graph using the core Neo4j Graph Database API and perform some simple traversals over that graph.

For resources for this task, please refer to the Javadoc API documentation for the Neo4j Graph Database Kernel. The first interface you will become acquainted with is the Neo4j GraphDatabaseService. This is the main interface through which you access the elements of the graph. The standard implementation of this interface embeds a Neo4j graph database in your application. From this instance you can retrieve Nodes, and from those you can retrieve Relationships to other Nodes. Both Nodes and Relationships have methods for manipulating properties. Properties on nodes and relationships are key/value pairs: the keys can be any strings, and the values can be any primitive value. Primitive values include numbers (integer and floating point), Strings, boolean values (true or false), bytes and arrays of any of the former value types.

The Node type, in addition to the methods for getting, setting and removing properties, contains methods for creating relationships to other nodes and for retrieving relationships by type, by direction or by both (including getting all relationships). Similarly, the Relationship type has methods for getting the start and end nodes of the relationship.

All Relationships in Neo4j have a type and a direction. The type of the relationship has nothing to do with data types (even if it can be used for determining data types). Instead, the type of a relationship is more like a label that is used to navigate the graph more efficiently. The direction of a relationship does not mean that it can only be retrieved from one of its two nodes. Traversing a relationship is equally fast in both directions. The semantics of the direction are up to your application. If the direction of a specific relationship is insignificant, i.e. if you want an undirected relationship, you can simply ignore it: the Neo4j API is able to treat any relationship as undirected.

In this task you are going to use the features of the API described above to build a simple social network. The social network that we are going to model for this example is a small subset of the characters of the movie The Matrix. You will then use the traversal features of Neo4j to compute information about that social graph.
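To make the point about type and direction concrete, here is a minimal sketch in plain Java. It is a stand-in model, not the Neo4j API: the class `DirectionSketch`, its `Dir` enum and the `relate` and `neighbours` methods are all hypothetical names made up for this illustration. It only shows that a relationship stored once, with a start and an end, can be found from either end, and that ignoring the direction gives you an undirected view.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for the idea only; none of these types are Neo4j classes.
class DirectionSketch {
    enum Dir { OUTGOING, INCOMING, BOTH }

    // A relationship is stored once, with a start node, a type label and an end node.
    static final class Rel {
        final String start;
        final String type;
        final String end;

        Rel(String start, String type, String end) {
            this.start = start;
            this.type = type;
            this.end = end;
        }
    }

    private final List<Rel> rels = new ArrayList<>();

    void relate(String start, String type, String end) {
        rels.add(new Rel(start, type, end));
    }

    // Returns the nodes connected to 'node' by relationships of 'type'.
    // 'dir' decides whether the direction is honoured or ignored.
    List<String> neighbours(String node, String type, Dir dir) {
        List<String> result = new ArrayList<>();
        for (Rel r : rels) {
            if (!r.type.equals(type)) {
                continue;
            }
            if (dir != Dir.INCOMING && r.start.equals(node)) {
                result.add(r.end);
            }
            if (dir != Dir.OUTGOING && r.end.equals(node)) {
                result.add(r.start);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        DirectionSketch g = new DirectionSketch();
        g.relate("Thomas Anderson", "FRIENDS", "Morpheus");
        g.relate("Trinity", "FRIENDS", "Morpheus");
        // Both relationships point at Morpheus, so only INCOMING or BOTH finds them.
        System.out.println(g.neighbours("Morpheus", "FRIENDS", Dir.OUTGOING));
        System.out.println(g.neighbours("Morpheus", "FRIENDS", Dir.BOTH));
    }
}
```

In your own solution the graph database keeps track of the relationships for you; the sketch scans a list only to keep the example short.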

Tasks

Here is an image as an example of what the social graph of The Matrix will look like:

http://github.com/thobe/social-network-workshop/raw/master/doc/matrix_small.png

Getting the friends of Thomas Anderson in this graph would yield:

  • Morpheus
  • Trinity

Getting the friends of Thomas Anderson's friends recursively would yield:

  • On depth 1: Morpheus
  • On depth 1: Trinity
  • On depth 2: Seraph
  • On depth 2: Niobe
  • On depth 2: Cypher
  • On depth 2: Tank
  • On depth 2: Dozer
  • On depth 2: Apoc
  • On depth 2: Switch
  • On depth 2: Mouse
  • On depth 3: Ghost
  • On depth 3: Lock
  • ...
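The depth listing above is what a breadth-first traversal produces. The following sketch shows that idea in plain Java, without Neo4j: `FriendsByDepthSketch` and its methods are hypothetical names, friendships are treated as undirected, and the sample data is only a small part of the Matrix graph (the friendships of Thomas Anderson, Trinity and Morpheus that appear in this tutorial). It is meant as a way to reason about the expected result, not as the solution to the task.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Plain-Java stand-in; in the real task the graph lives in Neo4j.
class FriendsByDepthSketch {
    // Adjacency list: person -> friends, treated as undirected.
    private final Map<String, Set<String>> friends = new LinkedHashMap<>();

    void addFriendship(String a, String b) {
        friends.computeIfAbsent(a, k -> new LinkedHashSet<>()).add(b);
        friends.computeIfAbsent(b, k -> new LinkedHashSet<>()).add(a);
    }

    // Breadth-first search: maps each reachable person to the depth at which
    // that person is first found. The start person is not included.
    Map<String, Integer> friendsByDepth(String start, int maxDepth) {
        Map<String, Integer> depth = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        depth.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            int d = depth.get(current);
            if (d == maxDepth) {
                continue; // do not expand beyond the requested depth
            }
            for (String friend : friends.getOrDefault(current, Collections.<String>emptySet())) {
                if (!depth.containsKey(friend)) {
                    depth.put(friend, d + 1);
                    queue.add(friend);
                }
            }
        }
        depth.remove(start);
        return depth;
    }

    // A small part of the sample graph used in this tutorial.
    static FriendsByDepthSketch sample() {
        FriendsByDepthSketch g = new FriendsByDepthSketch();
        g.addFriendship("Thomas Anderson", "Morpheus");
        g.addFriendship("Thomas Anderson", "Trinity");
        g.addFriendship("Trinity", "Morpheus");
        for (String name : new String[] {"Tank", "Dozer", "Apoc", "Cypher"}) {
            g.addFriendship("Morpheus", name);
        }
        return g;
    }

    public static void main(String[] args) {
        Map<String, Integer> result = sample().friendsByDepth("Thomas Anderson", 2);
        for (Map.Entry<String, Integer> e : result.entrySet()) {
            System.out.println("On depth " + e.getValue() + ": " + e.getKey());
        }
    }
}
```

Because each person is recorded the first time they are seen, Morpheus is reported on depth 1 even though he is also reachable through Trinity.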

Intermezzo - Exploring the graph with the Neo4j Shell

To get a feel for the data you have created (and for debugging, should things go wrong), Neo4j comes with a nifty little tool called the Neo4j Shell. The Neo4j Shell is a Unix-like terminal interface for browsing the Neo4j graph.

With all dependencies unpacked in the lib-directory, starting the Neo4j shell is easy. All you have to type is:

java -jar lib/neo4j-shell-1.2-1.2.M03.jar -path target/thematrix/

When using maven you would first have to copy all dependencies to a common directory:

mvn dependency:copy-dependencies
java -jar target/dependency/neo4j-shell-1.2-1.2.M03.jar -path target/thematrix/

Take some time to play around with the Shell. Familiarizing yourself with this tool could come in handy in many situations. The help command is a good place to start: typing just help will give you a list of available commands. In particular, the cd and ls commands are handy for walking around the graph and observing its structure. Read the help page about cd to learn how to navigate to Nodes and Relationships:

help cd

At this point you should be able to navigate a structure similar to this:

neo4j-sh (Thomas Anderson,1)$ ls
*name =[Thomas Anderson]
(me) --<FRIENDS>-> (Trinity,3)
(me) --<FRIENDS>-> (Morpheus,2)
(me) --<INTERESTED_IN>-> (Understanding,36)
(me) --<INTERESTED_IN>-> (Hacking,41)
(me) --<INTERESTED_IN>-> (The future,49)
(me) --<INTERESTED_IN>-> (The Truth,44)
(me) <-<INTERESTED_IN>-- (Persephone,18)
(me) <-<INTERESTED_IN>-- (Trinity,3)
neo4j-sh (Thomas Anderson,1)$ cd 2
neo4j-sh (Morpheus,2)$ ls -v
*name =[Morpheus] (String)
(me) --<FRIENDS,10>-> (Niobe,17)
(me) --<FRIENDS,9>-> (Seraph,19)
(me) --<FRIENDS,8>-> (Switch,21)
(me) --<FRIENDS,7>-> (Mouse,16)
(me) --<FRIENDS,6>-> (Cypher,8)
(me) --<FRIENDS,5>-> (Apoc,7)
(me) --<FRIENDS,4>-> (Dozer,9)
(me) --<FRIENDS,3>-> (Tank,22)
(me) --<INTERESTED_IN,52>-> (Keys,32)
(me) --<INTERESTED_IN,51>-> (Zion,30)
(me) --<INTERESTED_IN,50>-> (The One,47)
(me) --<INTERESTED_IN,49>-> (The Truth,44)
(me) <-<FRIENDS,2>-- (Trinity,3)
(me) <-<FRIENDS,0>-- (Thomas Anderson,1)
neo4j-sh (Morpheus,2)$ cd -r 6
neo4j-sh <FRIENDS,6>$ ls
(Morpheus,2) --<FRIENDS,6>-> (Cypher,8)
neo4j-sh <FRIENDS,6>$ cd ..
neo4j-sh (Morpheus,2)$ cd ..
neo4j-sh (Thomas Anderson,1)$ exit

Step Two - Adding more Relationship types

Different relationship types in Neo4j are used for creating relationships to nodes that represent different kinds of entities.

In our social network, our users want to be able to find new friends based on shared interests. To do this we need to store information about each person's interests in the graph. In order to be able to find persons with common interests, we represent interests as nodes, and the fact that a specific person has a particular interest as a relationship of type "INTERESTED_IN" from the person node to the interest node. This design allows each person to have several interests.

If generalized to other domains, the concept of interests in a social network is like tagging. Each person can have multiple interests (tags), each interest can be shared by multiple persons, and we can use the interest nodes (or tag nodes) to find persons that have the same interest. In fact, tagging in any other domain would be implemented in the same way with Neo4j.
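The following plain-Java sketch illustrates why modelling interests as shared nodes makes "who has the same interests as me" a two-hop walk: from the person to each interest, and from each interest to the other persons attached to it. It is not Neo4j code. `InterestGraphSketch` and its methods are hypothetical names, and the two maps stand in for following INTERESTED_IN relationships in either direction. The sample interests are the ones shown for Thomas Anderson and Morpheus in the shell session earlier in this tutorial.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Plain-Java stand-in; the two maps play the role of INTERESTED_IN
// relationships followed in the outgoing and incoming direction.
class InterestGraphSketch {
    private final Map<String, Set<String>> interestsOf = new HashMap<>();  // person -> interests
    private final Map<String, Set<String>> interestedIn = new HashMap<>(); // interest -> persons

    void addInterest(String person, String interest) {
        interestsOf.computeIfAbsent(person, k -> new TreeSet<>()).add(interest);
        interestedIn.computeIfAbsent(interest, k -> new TreeSet<>()).add(person);
    }

    // Two hops: person -> interest -> every other person with that interest.
    Set<String> personsSharingInterestWith(String person) {
        Set<String> result = new TreeSet<>();
        for (String interest : interestsOf.getOrDefault(person, Collections.<String>emptySet())) {
            result.addAll(interestedIn.get(interest));
        }
        result.remove(person);
        return result;
    }

    // Interests as listed for these two characters in the shell session.
    static InterestGraphSketch sample() {
        InterestGraphSketch g = new InterestGraphSketch();
        for (String i : new String[] {"Understanding", "Hacking", "The future", "The Truth"}) {
            g.addInterest("Thomas Anderson", i);
        }
        for (String i : new String[] {"Keys", "Zion", "The One", "The Truth"}) {
            g.addInterest("Morpheus", i);
        }
        return g;
    }

    public static void main(String[] args) {
        System.out.println(sample().personsSharingInterestWith("Thomas Anderson"));
    }
}
```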

Tasks

Step Three - Introducing indexing to the social network

In order to traverse a graph you need a starting point. Starting points are acquired using indexes in Neo4j.

More information about how to use indexing in Neo4j is available in the API documentation and the Indexing wiki page. The rest of this section will give you an introduction to working with Neo4j indexing.

Indexing in Neo4j is done explicitly and programmatically. It is up to you as a developer to index nodes when they are created, and to update the indexes when the nodes change. This might look like a weakness compared to other database management systems, but it gives you more power and flexibility in what to index and how to index it. It is also worth noting that, unlike relational databases, where all access is done through indexes, Neo4j only uses indexes for getting the start nodes from which a traversal can be started. Traversing the graph does not use indexes, which is why it is faster than joins in a relational database.

A common approach to indexing is to index some property of a node. This is very similar to how indexes work in relational databases. While this is simple and easy to manage, it is not strictly necessary. Since indexing is done programmatically in Neo4j, any value can be used. It could be computed from several of the properties on the node, it could be taken from properties of the node's relationships, it could be aggregated from other nodes that are related to the node, or it could even be an arbitrary value.

The main interface for managing indexes is the IndexManager, which is accessed through the Graph Database interface. The IndexManager is used to retrieve index instances, which are then used for associating Nodes and Relationships with index entries. The recommended pattern is to have one index for each data type, so that all Nodes or Relationships in one index represent entities of the same type. Each index can contain multiple key/value entries for each entity.

Creating an index entry is done using the add method. Updating an index entry is done by removing the current index entry and then creating a new one. There are two methods in an Index for accessing indexed nodes: one for exact lookup and one for more complex queries. Both of these methods return an iterator over all matching entries. For uniquely indexed entities, the returned iterator has a convenience method for accessing the single matching entity.

An entry in an index is (as seen in the add method) made up of not only a Node, but also a key and a value. For the key it is common to use the key of the property being indexed.
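The add, remove-then-add and exact-lookup cycle described above can be pictured with a small plain-Java stand-in. This is not the Neo4j index implementation or its API: `ManualIndexSketch` and all of its methods are hypothetical, entities are represented by bare ids, and the rename to "Neo" is made-up example data. The point is only that the index is a separate structure that you must keep in step with the data yourself.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Plain-Java stand-in for an explicitly maintained index.
class ManualIndexSketch {
    // key -> value -> ids of the entities indexed under that key/value pair
    private final Map<String, Map<Object, Set<Long>>> entries = new HashMap<>();

    void add(long entity, String key, Object value) {
        entries.computeIfAbsent(key, k -> new HashMap<>())
               .computeIfAbsent(value, v -> new LinkedHashSet<>())
               .add(entity);
    }

    void remove(long entity, String key, Object value) {
        Map<Object, Set<Long>> byValue = entries.get(key);
        if (byValue == null) {
            return;
        }
        Set<Long> ids = byValue.get(value);
        if (ids != null) {
            ids.remove(entity);
        }
    }

    // Exact lookup: all entities indexed under this key/value pair.
    Set<Long> get(String key, Object value) {
        Map<Object, Set<Long>> byValue = entries.get(key);
        Set<Long> ids = byValue == null ? null : byValue.get(value);
        return ids == null ? Collections.<Long>emptySet() : Collections.unmodifiableSet(ids);
    }

    // Convenience for unique entries: the single match, or null if there is none.
    Long getSingle(String key, Object value) {
        Set<Long> hits = get(key, value);
        if (hits.size() > 1) {
            throw new IllegalStateException("more than one match for " + key + "=" + value);
        }
        return hits.isEmpty() ? null : hits.iterator().next();
    }

    public static void main(String[] args) {
        ManualIndexSketch persons = new ManualIndexSketch();
        persons.add(1L, "name", "Thomas Anderson");
        // Updating means removing the old entry and adding a new one.
        persons.remove(1L, "name", "Thomas Anderson");
        persons.add(1L, "name", "Neo"); // hypothetical new value
        System.out.println(persons.getSingle("name", "Neo"));
    }
}
```

If you forget the remove step when a name changes, the old value keeps pointing at the entity, which is exactly the kind of bug explicit indexing makes you responsible for.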

In this task you will use the indexing features of Neo4j to add lookup capabilities for persons and interests in the social network. The goal is to be able to look up persons by their name, and to look up interest nodes by their identifying text representation.

Tasks

Step Four - Introducing a domain API

It is time to start turning this example into a proper application. Regardless of how nice the Neo4j API is to work with, managing an application where all entities are of one single data type is a pain. Instead, we want to be able to work with objects that represent the entities of our domain: Persons and Interests.

The recommended way to implement a domain using Neo4j is to define the domain as a set of interfaces, and then create implementations of those interfaces that delegate their state to Neo4j. This is done by letting the implementing class have only one field: the Node or Relationship (depending on what kind of entity it is) that represents it in the graph. Then, for each attribute accessor (Java Bean setter or getter), the value is retrieved from or stored as a property of the underlying node/relationship. Associations to other objects are stored as, and retrieved through, relationships with appropriate RelationshipTypes. Since Neo4j is fully transactional, the effect of implementing domain objects by delegating state to Neo4j is that working with the domain objects is like working with Software Transactional Memory.

For retrieving and creating instances of the domain objects it is a good idea to define a repository interface as well. The repository is responsible for looking up nodes by index and returning the appropriate domain objects, and for creating new domain objects with underlying nodes. In this application the repository interface is going to be SocialNetwork, and the domain object is the Person interface.
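The delegation pattern described above can be sketched in plain Java as follows. Everything here is an assumption made for illustration: `FakeNode` is a stand-in for a graph node (just a bag of properties), the method names on `Person` and on the repository are hypothetical and need not match the skeleton code, and a plain map plays the role of the index. What the sketch does show is the shape of the pattern: the domain object holds a single field, and its accessors read through to the underlying node.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java stand-in for "domain objects that delegate their state".
class DomainDelegationSketch {
    // Stand-in for a graph node: just a bag of properties.
    static final class FakeNode {
        final Map<String, Object> properties = new HashMap<>();
    }

    // Domain interface; the accessor name is an assumption for this sketch.
    interface Person {
        String getName();
    }

    // The only field is the underlying node; all state lives in its properties.
    static final class NodeBackedPerson implements Person {
        private final FakeNode node;

        NodeBackedPerson(FakeNode node) {
            this.node = node;
        }

        @Override
        public String getName() {
            return (String) node.properties.get("name");
        }
    }

    // Repository: creates persons and looks them up again by name.
    // The map stands in for an index over person nodes.
    static final class InMemorySocialNetwork {
        private final Map<String, FakeNode> nodesByName = new HashMap<>();

        Person createPerson(String name) {
            FakeNode node = new FakeNode();
            node.properties.put("name", name);
            nodesByName.put(name, node);
            return new NodeBackedPerson(node);
        }

        Person lookupPerson(String name) {
            FakeNode node = nodesByName.get(name);
            return node == null ? null : new NodeBackedPerson(node);
        }
    }

    public static void main(String[] args) {
        InMemorySocialNetwork network = new InMemorySocialNetwork();
        network.createPerson("Trinity");
        System.out.println(network.lookupPerson("Trinity").getName());
    }
}
```

Note that each lookup wraps the node in a new object. Since the wrapper carries no state of its own, any number of wrappers around the same node see the same data.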

Your task is now to implement the domain for the social network application by delegating state to Neo4j. You should be able to access the same graph that you have used in the previous steps through the new domain API. In fact the test cases for this step also use the social graph of The Matrix as sample data.

Tasks

Step Five - Graph Algorithms

Graph databases excel at deep queries and traversals. Apart from the core traversal API, Neo4j comes with a package that contains implementations of a few graph algorithms for (among other things) searching in the graph. In this task we will use these features to implement a "how do I know this person" feature in our social network. Given two persons, this feature searches the social graph for the shortest chain of friends through which the two know each other.

The Graph Algorithms component has API documentation available online. The Neo4j graph algorithms build on the new traversal features introduced in Neo4j version 1.1. The main interface used for searching in the graph is the PathFinder. Creating instances of PathFinder requires that you provide a RelationshipExpander, these can be instantiated using the static methods on the Traversal class.
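Before reaching for the PathFinder, it can help to see what "the closest chain of friends" means as an algorithm. The sketch below is plain Java and does not use the Neo4j graph algorithms package: `ShortestChainSketch` and its methods are hypothetical names, friendships are treated as undirected, and the sample data is a small part of the Matrix graph. It runs a breadth-first search and remembers, for each person, who they were reached from, so that the chain can be read back once the target is found.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java stand-in for a shortest-path search over FRIENDS relationships.
class ShortestChainSketch {
    private final Map<String, Set<String>> friends = new LinkedHashMap<>();

    void addFriendship(String a, String b) {
        friends.computeIfAbsent(a, k -> new LinkedHashSet<>()).add(b);
        friends.computeIfAbsent(b, k -> new LinkedHashSet<>()).add(a);
    }

    // Returns one shortest chain from 'from' to 'to', both included,
    // or an empty list if the two persons are not connected.
    List<String> shortestChain(String from, String to) {
        Map<String, String> reachedFrom = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        reachedFrom.put(from, null);
        queue.add(from);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(to)) {
                // Walk the "reached from" links back to the start.
                LinkedList<String> chain = new LinkedList<>();
                for (String p = to; p != null; p = reachedFrom.get(p)) {
                    chain.addFirst(p);
                }
                return chain;
            }
            for (String next : friends.getOrDefault(current, Collections.<String>emptySet())) {
                if (!reachedFrom.containsKey(next)) {
                    reachedFrom.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }

    // A small part of the sample graph used in this tutorial.
    static ShortestChainSketch sample() {
        ShortestChainSketch g = new ShortestChainSketch();
        g.addFriendship("Thomas Anderson", "Morpheus");
        g.addFriendship("Thomas Anderson", "Trinity");
        g.addFriendship("Trinity", "Morpheus");
        g.addFriendship("Morpheus", "Cypher");
        return g;
    }

    public static void main(String[] args) {
        System.out.println(sample().shortestChain("Trinity", "Cypher"));
    }
}
```

Breadth-first search visits persons in order of increasing distance, so the first time the target is taken off the queue, the recorded chain is one of the shortest.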

Tasks

Step Six - Recommendations

The final part of this tutorial is to be able to suggest new friends for the people in the social network. We will use a simple recommendation algorithm for this. The algorithm you are to implement for making friend suggestions is simply based on finding persons that have the same interests and recommending them to one another.

One good starting point for creating simple recommendation algorithms is the new traversal API that was introduced in Neo4j 1.1. This API is built around the concept of TraversalDescription objects that describe how a traversal is to be performed. The Wiki page does a good job in describing the different parts of a TraversalDescription. For creating TraversalDescriptions Neo4j provides a static factory method.
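As a way to think about the recommendation logic separately from the traversal API, here is a plain-Java sketch of "suggest persons who share my interests". It is not written against TraversalDescription and is not the expected solution. `InterestRecommendationSketch` and its methods are hypothetical names, the persons and interests in `sample()` are invented example data, and ranking the suggestions by the number of shared interests is an assumption added for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java stand-in for an interest-based friend recommendation.
class InterestRecommendationSketch {
    // person -> interests, kept sorted so that results are deterministic
    private final Map<String, Set<String>> interestsOf = new TreeMap<>();

    void addInterest(String person, String interest) {
        interestsOf.computeIfAbsent(person, k -> new TreeSet<>()).add(interest);
    }

    // Suggests every other person with at least one shared interest,
    // most shared interests first, ties in alphabetical order.
    List<String> suggestFriends(String person) {
        Set<String> mine = interestsOf.getOrDefault(person, Collections.<String>emptySet());
        Map<String, Integer> sharedCount = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : interestsOf.entrySet()) {
            if (e.getKey().equals(person)) {
                continue;
            }
            Set<String> common = new TreeSet<>(e.getValue());
            common.retainAll(mine);
            if (!common.isEmpty()) {
                sharedCount.put(e.getKey(), common.size());
            }
        }
        List<String> result = new ArrayList<>(sharedCount.keySet());
        // List.sort is stable, so equal counts keep their alphabetical order.
        result.sort((a, b) -> sharedCount.get(b) - sharedCount.get(a));
        return result;
    }

    // Invented example data, not taken from the Matrix graph.
    static InterestRecommendationSketch sample() {
        InterestRecommendationSketch s = new InterestRecommendationSketch();
        s.addInterest("alice", "hacking");
        s.addInterest("alice", "chess");
        s.addInterest("bob", "chess");
        s.addInterest("carol", "hacking");
        s.addInterest("carol", "chess");
        s.addInterest("dave", "cooking");
        return s;
    }

    public static void main(String[] args) {
        System.out.println(sample().suggestFriends("alice"));
    }
}
```

In the graph, the same result comes from walking INTERESTED_IN relationships from a person to their interests and back out to other persons, which is the kind of two-step walk a TraversalDescription lets you describe.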

Tasks
