# Knowledge Graph of Wines

To walk you through some of Hume's basic concepts, we will build a Knowledge Graph of popular wines.

To build it, we will automate loading a CSV file containing the data and transforming its rows into Neo4j Cypher queries, and we will then visualise and search the data.

This tutorial assumes you have installed Hume.

Figure: Wines Knowledge Graph Schema

The necessary files to follow the tutorial are present in the examples/wine-tutorial directory of your Hume installation.

# Create your first Knowledge Graph

The first thing to do is to create a Knowledge Graph that will contain your schema, data imports and visualisations. You can think of a Knowledge Graph as a project which encapsulates a particular business use case.

# Create your Schema

The next step is to create your Schema. A Schema is the description of the classes of entities and relationships among them, as well as their attributes. In other words, it represents what your actual domain will look like as a graph.

You can add a new class or relationship by clicking on the Add Class or Add Relationship buttons, or by right-clicking on the blank canvas. Double-clicking on a class or relationship lets you edit its icon, its colour and the list of attributes it contains.

To ease this task during the tutorial, you can delete any element you created and import the schema instead: click the Import Schema button and select the file wines-schema.json from the examples/wine-tutorial directory of your Hume installation.

# Configure your Resources

One of Hume's primary concepts is the Hume Ecosystem, where you can configure a multitude of resources, such as File Systems, Neo4j database connections, RabbitMQ connections and many others, together with Skills (e.g. Natural Language Processing). Details can be found in the Hume Ecosystem documentation, but let's skip them for now and focus on what this tutorial needs.

The Ecosystem is accessible by clicking on the second icon in the left menu.

For this tutorial, we will configure the two resources we need:

  • A local file system from which we can load files
  • A Neo4j database

# Configuring the Local File System Resource

As described in the installation guide, Hume ships with a public directory that is accessible via the /data path. Let's create a Resource referencing its wines subdirectory, /data/wines.

Click on "Create Resource", then select "Local Filesystem" and enter the following settings:

# Configuring the Neo4j Database Resource

Following the same principle, you can create a Neo4j Database Resource:

TIP

The default host and port for Neo4j in a standard Hume installation are neo4j and 7687 respectively, and the default credentials are neo4j/password.

All the resources needed for creating our Knowledge Graph are now configured, well done!

# Visualising the Knowledge Graph

We have already created the schema describing our business domain, and now we need a Perspective of that schema. Perspectives are the elements of a Knowledge Graph that provide a 360-degree view of the data, and each user can be granted a different one. Perspectives have more features than we need here, so let's set them aside for a moment.

To create your perspective, click on the Knowledge Graph icon and select the Perspectives tab next to Schema. As you can see, a Perspective has already been created for you automatically: just select wines-neo4j as its resource and you are good to go for now.

Figure: Wines visualisation perspective

Now click on the Visualisation tab and then on Create a New Visualisation:

Figure: Create visualisation

TIP

A visualisation needs a perspective. Here, it automatically selected the previously created Main perspective as it is the only one available for now.

The visualisation overview shows the Schema graph and, as you can see, the whole schema is greyed out. This means that the Neo4j database resource we selected does not contain any data matching our schema yet.

It's then time to populate the graph!

# Import and Transform data

We will now orchestrate the loading and transformation of our data with Orchestra.

While Orchestra is much more than an ETL tool, we will start by setting up a basic workflow.

Click on the Orchestra tab, create a new workflow and open it to display a blank canvas:

We will now configure the workflow to do the following:

  • Load files from a Filesystem resource
  • Process the file as a CSV file
  • Define a Cypher query that will convert a CSV row to Neo4j data
  • Write the data to a Neo4j resource

Click on "Add Component" and select the "File Reader" component:

Once the component is on the canvas, double-click on it to configure it:

NOTE: The resource to use is the one you created previously in this tutorial, namely wines-files.

Next, add the CSV File Processor component, located under Add Component > Processors. To understand its configuration, have a look at the content of the file we will import:

```csv
number,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimstone and dried herb. The palate isn't overly expressive, offering unripened apple, citrus and dried sage alongside brisk acidity.",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco  (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth while still structured. Firm tannins are filled out with juicy red berry fruits and freshened with acidity. It's  already drinkable, although it will certainly be better from 2016.",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
```

The field delimiter is a comma (,) and the first line of the file contains the column headers. Both settings are reflected in the configuration of the component:

Next, add the Cypher Processor to the canvas and configure it with the following Cypher query. For now, just copy and paste it; we will explain the meaning of the variables in the query shortly:

```cypher
MERGE (winery:Winery {name: $winery})
MERGE (wine:Wine {name: $title})
SET
  wine.description = $description,
  wine.price = toFloat($price)
MERGE (winery)-[:PRODUCES]->(wine)
MERGE (v:Variety {name: $variety})
MERGE (wine)-[:IS_OF]->(v)
MERGE (region:Region {name: coalesce($region_1, "Unknown")})
MERGE (p:Province {name: $province})
MERGE (c:Country {name: $country})
MERGE (wine)-[:PRODUCED_IN]->(region)
MERGE (region)-[:PART_OF]->(p)
MERGE (p)-[:IN_COUNTRY]->(c)
```

Finally, add the Neo4j Writer component and select the wines-neo4j resource created previously:

Once done, link the components from left to right:

Your workflow is now set up and ready to use. Before we start, let's explain some of the Orchestra concepts that will help you understand what will happen when we start the workflow.

# Orchestra Internals

Orchestra is an integration and enrichment workflow manager in which each component passes a message to the next component(s). The content of a message depends on the purpose of the component that produced it.

A message has two parts: headers and a body. As a user you will generally not have to deal with headers, although advanced use cases may require some understanding of the useful ones when debugging.

Taking our workflow above as an example, here is what happens when the workflow starts:

The File Reader component watches for files in the configured directory, /data/wines, and for each file present (or added after the workflow has started) it produces one message with the following structure:

```json
{
  "headers": [
    {"#Hume_File_Location": "/data/wines/file.csv"}
  ],
  "body": {}
}
```
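
To make the "one message per file" behaviour concrete, here is a minimal Python sketch. It is not Hume's implementation: it approximates watching a directory by rescanning it and remembering which files it has already reported, and it builds messages with the same shape as the JSON above. The function name scan_for_new_files is hypothetical.

```python
from pathlib import Path


def scan_for_new_files(directory, seen):
    """Return one message per file in directory that is not yet in seen.

    Rescanning with the same seen set approximates "watching" the directory:
    files added between two calls are reported by the second call only.
    """
    messages = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path not in seen:
            seen.add(path)
            messages.append({
                "headers": [{"#Hume_File_Location": str(path)}],
                "body": {},  # the file content is read by the next component
            })
    return messages
```

Calling scan_for_new_files("/data/wines", seen) a second time with the same seen set only returns the files added since the first call.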

The next component, the CSV File Processor, receives that message and reads the file in streaming mode, meaning it does not load the full content of the file into memory. It then sends one message per CSV row, skipping the first line when the CSV has column headers. Each message it produces looks like this:

```json
{
  "headers": [],
  "body": {
    "number": 0,
    "country": "Italy",
    "description": "Aromas include tropical fruit, broom... (truncated)",
    "designation": "Vulkà Bianco",
    ....
  }
}
```

Again, you will mostly interact with the body, so let's focus on it:

```json
{
  "number": 0,
  "country": "Italy",
  "description": "Aromas include tropical fruit, broom... (truncated)",
  "designation": "Vulkà Bianco",
  ....
}
```

The body is a map whose keys are the CSV column headers and whose values are the corresponding fields of the row. In general, the bodies you interact with will be maps like this one.
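
The streaming behaviour can be sketched with Python's csv.DictReader, which reads one row at a time and uses the header line as the map keys. This is an illustration under assumptions, not Hume's CSV File Processor: the function names are hypothetical and, unlike the JSON above (where number is 0), every value in this sketch stays a string and empty fields arrive as empty strings rather than nulls.

```python
import csv


def file_location(message):
    """Return the file path stored in the message's #Hume_File_Location header."""
    for header in message["headers"]:
        if "#Hume_File_Location" in header:
            return header["#Hume_File_Location"]
    raise KeyError("#Hume_File_Location")


def csv_row_messages(file_message, delimiter=","):
    """Yield one message per CSV row of the file referenced by file_message.

    csv.DictReader pulls rows from the open file one at a time, so the file
    is never fully loaded into memory. It consumes the first line as the
    column headers, so that line is not emitted as a message.
    """
    with open(file_location(file_message), newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter=delimiter):
            # Each body maps column header -> field value (all strings here).
            yield {"headers": [], "body": dict(row)}
```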

The next component, the Cypher Processor, receives the message and adds two entries to its headers: the Cypher query you configured, and a query parameters object holding the content of the previous body. This lets you reference parts of the body as Cypher query parameters, as here:

```cypher
MERGE (c:Country {name: $country})
```

The $country parameter will have the value Italy, as we can see in the message body.
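
Here is a hypothetical sketch of what the Cypher Processor does with each message: it finds the $-prefixed parameter names in the query and attaches the query plus a copy of the body as headers. The header names cypher_query and cypher_parameters are made up for illustration, and the regular expression does not skip $ signs inside string literals. The sketch fails early when the body lacks a parameter, since Neo4j would reject such a query with a missing-parameter error anyway.

```python
import re

# Matches Cypher parameter references such as $country or $region_1.
PARAM_PATTERN = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def referenced_parameters(query):
    """Return the parameter names used in a Cypher query, in order of first use."""
    names = []
    for name in PARAM_PATTERN.findall(query):
        if name not in names:
            names.append(name)
    return names


def enrich_with_cypher(message, query):
    """Return a copy of message with the query and its parameters added as headers.

    The header names "cypher_query" and "cypher_parameters" are hypothetical.
    """
    body = message["body"]
    missing = [name for name in referenced_parameters(query) if name not in body]
    if missing:
        raise KeyError(f"query references parameters missing from the body: {missing}")
    return {
        "headers": message["headers"]
        + [{"cypher_query": query}, {"cypher_parameters": dict(body)}],
        "body": body,
    }
```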

The last component, the Neo4j Writer, opens a transaction against the Neo4j database configured in its resource and sends the query, with the content of the body as query parameters.
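
The writer step can be sketched with the official neo4j Python driver, whose Session.run(query, parameters) runs a query in an auto-commit transaction. In this sketch the session is passed in as an argument, so the code runs without a database; the header names cypher_query and cypher_parameters are hypothetical, not Hume's actual headers.

```python
def header_value(message, name):
    """Return the value of the first header entry called name."""
    for header in message["headers"]:
        if name in header:
            return header[name]
    raise KeyError(name)


def write_message(message, session):
    """Send the message's query and parameters to Neo4j through session.

    session is expected to behave like a neo4j driver Session: run() executes
    the query in an auto-commit transaction and returns a Result whose
    consume() waits for completion and returns the result summary.
    """
    query = header_value(message, "cypher_query")            # hypothetical header name
    parameters = header_value(message, "cypher_parameters")  # hypothetical header name
    return session.run(query, parameters).consume()
```

With the real driver, the session could come from GraphDatabase.driver("bolt://neo4j:7687", auth=("neo4j", "password")).session(), assuming the default host, port and credentials from the TIP earlier in this tutorial.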

# Let's go!

Now that you have a better understanding of what happens once the workflow runs, it's time to start it!

Hover over the STOPPED button to reveal the START action and click on it. After a couple of seconds, the workflow will be in RUNNING mode.

Nothing happens yet, since the /data/wines directory is empty. Copy the example file into it using your file manager, or simply run the following from the root directory of your Hume installation:

```shell
mkdir -p public/wines
cp examples/wine-tutorial/wines-small.csv public/wines/
```

You can now see the workflow processing the CSV file and writing the data to Neo4j.

# Visualise the Knowledge Graph

Go back to the visualisation you created earlier: the overview graph now shows the counts of nodes and relationships matching the schema.

From here you can explore your graph or search for nodes and relationships. When you first created the visualisation, Hume took care, under the hood, of creating the indexes your Neo4j database needs so that you can use Full Text Search in the visualisation.

Let's search for Southern Italy in the search bar:

By default, the search covers all the classes in your Schema and returns the number of matches per class, as well as the top 10 results across all of them.

After clicking on "Southern Italy", the actual node is displayed on the canvas. You can double-click on this node to expand all its relationships, or right-click on it to expand them selectively:

Let's expand its regions, and then expand the Puglia region to find the wines produced there.

NOTE: We originally imported only a very small subset of the real dataset. The full dataset is available for download in our Assets Section.

Let's continue to explore some of the visualisation functionalities.

# Shortest Path

To find out how two wines are related to each other, we can look for the paths between them. To do so, first go back to the schema overview by clicking on the Schema View icon.

Then search for the following two wines:

  • Feudi di San Marzano
  • Tarara 2010

Since the search relies on Full Text Search, you don't have to type the full name before it shows up in the results.

Once the two wines are on the canvas, select both of them by holding the CMD key and left-clicking on the nodes, then click on the Get Interconnections icon.

NOTE: If you don't see the icon, expand the Paths dropdown in the right visualisation toolbar.

The action returns nothing, because there is no path between those two wines within the default depth of 4. Try increasing the depth to 8 in the selector just above the Shortest Path icon.
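
Why does the depth matter? A path search with a maximum depth only considers paths of up to that many relationships. Here is a minimal breadth-first search sketch on a hypothetical toy graph, where two wines from different provinces of the same country are 6 hops apart. Hume's own path search is not described in this tutorial and may work differently; the sketch only illustrates the depth limit.

```python
from collections import deque


def shortest_path(edges, start, goal, max_depth):
    """Breadth-first search for a shortest path of at most max_depth hops.

    edges is a list of (a, b) pairs treated as undirected.
    Returns the list of nodes on the path, or None if no path is that short.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        if len(path) - 1 == max_depth:
            continue  # this path already uses all the allowed hops
        for nxt in neighbours.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical toy graph: wine -> region -> province -> country, twice.
edges = [
    ("wine_a", "region_a"), ("region_a", "province_a"), ("province_a", "country"),
    ("wine_b", "region_b"), ("region_b", "province_b"), ("province_b", "country"),
]
print(shortest_path(edges, "wine_a", "wine_b", 4))  # None: the path needs 6 hops
print(shortest_path(edges, "wine_a", "wine_b", 8))  # 7 nodes, 6 hops, via "country"
```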

# Multiple Expands

Let's say that we want to visualise all the wines produced in Italy. We would have to expand from the Country to the Provinces, then to the Regions and finally to the Wines, and expanding each of those nodes one by one would obviously not be practical.

Go back to the Schema Overview by clicking on the Schema View icon.

Search for Italy, select it so that it is pulled onto the canvas, and double-click on it to expand its provinces.

Now select all nodes by pressing CTRL + A, then right-click on any node and select Expand All:

NOTE: If your mouse sometimes gets stuck in the wrong mode (for example, dragging the canvas draws a selection box instead), disable and re-enable the selection mode by clicking on the Selection Mode icon.

# Styling the Nodes

We can add styling effects that surface insights directly on the screen. For example, we might want a circle drawn around each wine node, filled on a scale from 0 to 100 based on the wine's price.

To do so, select any wine node on the canvas, expand the settings options in the right panel and click on Add New Rule:

Click on the Donut tab, select the price attribute, specify 100 as the maximum and choose a colour. Click Create Rule and then save your styling under a name of your choice:

Inspect the visualisation now: the Wine nodes have a donut around them, and the more the donut is filled, the more expensive the wine.
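
The donut is essentially the ratio between the attribute value and the maximum you configured. Here is a small sketch of that arithmetic, assuming (this is not documented here) that a missing price draws an empty donut and that prices above the maximum are capped at a full donut. The function name donut_fill is hypothetical.

```python
def donut_fill(value, maximum=100.0):
    """Fraction of the donut to fill for a node attribute value.

    Assumptions, not documented Hume behaviour: a missing value gives an
    empty donut, and values outside [0, maximum] are clamped.
    """
    if value is None:
        return 0.0
    return max(0.0, min(float(value) / maximum, 1.0))
```

With the sample file, Quinta dos Avidagos 2011 Avidagos Red (Douro), priced 15.0, gets a 15% fill, while the Nicosia wine, whose price field is empty, gets an empty donut under these assumptions.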

Feel free to play around with the different styling configurations.

# Knowledge Graph Actions

This section will conclude this Getting Started tutorial.

Knowledge Graph Actions are a convenient way of making simple and complex business queries available to the user without having to change anything in the Hume codebase.

For example, let's say that you would like to retrieve the Wineries producing the most wines, together with their wines, regions, provinces and countries. Going through each winery in the visualisation, expanding it and counting its relationships manually would be very impractical.

TIP

What if you could make this available in a single click?

Enter Knowledge Graph Actions!

# Global Actions

The above example can easily be expressed as a Cypher query :

```cypher
MATCH (n:Winery)-[:PRODUCES]->(w:Wine)
WITH n, count(w) AS f                            // count the wines each winery produces
ORDER BY f DESC
LIMIT 10                                         // keep the top 10 wineries
MATCH p=(n)-[:PRODUCES]->(w)-[r*3]->(c:Country)  // expand until the countries
RETURN p                                         // return the resulting paths
```
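
To see what the first half of the query computes, here is the same count, ORDER BY and LIMIT logic in plain Python, over a hypothetical list of (winery, wine) pairs standing in for the PRODUCES relationships. Wineries with equal counts come back in no guaranteed order, just as ORDER BY leaves ties unordered.

```python
from collections import Counter


def top_wineries(produces, limit=10):
    """Rank wineries by the number of distinct wines they produce.

    produces is a list of (winery, wine) pairs; duplicate pairs are ignored,
    mirroring MERGE, which creates each relationship only once.
    """
    counts = Counter(winery for winery, _ in set(produces))
    return counts.most_common(limit)
```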

Now, let's create an Action that makes this query executable in a single click. In the top left corner of the visualisation, click on the Settings button and select the Actions tab:

We will not go over all the possible settings, since Knowledge Graph Actions are a very powerful concept and the only limit is your imagination. Here we will describe just one of them:

  • Scope (GLOBAL): there are multiple scopes in which an action can be visible:
      ◦ GLOBAL means the user does not need to select an element in the graph, so the action is displayed in the search bar
      ◦ LOCAL means the user has to select an element in the graph, and the action is displayed in the context menu when right-clicking on a node or relationship

Configure your action as follows and then click on Create Action:

Now, if you open the search bar, your action is displayed there. Click on it and see the results displayed on the canvas:

Congratulations, you have customised the visualisation in less than a minute!

# Local Actions

Local actions let users explore the context of a local scope, that is, a selected node or relationship.

Say that you found a wine you like, produced in Puglia, and you would like to explore other wines of the same Variety in the same or a similar region. Instead of clicking through your Knowledge Graph to find all such related wines, you can simply define an action:

  • choose the LOCAL scope
  • in the Inclusion section, select Labels and tick the Wine label
  • add a selection parameter of type node, set Resolution to single and choose a parameter name (let's use wine_id); your Cypher query references the selected node's ID (id(node)) through this name, prefixed with the $ sign
  • define your Cypher query using the parameter from the previous step; for example, the query could start with MATCH (wine) WHERE id(wine) = $wine_id ...
  • make the query return a path in the Knowledge Graph, for example: ... MATCH path=()--() WHERE ... RETURN path
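
The logic such a local action implements can be sketched in Python: the selected node's ID arrives as wine_id, and the action looks for other wines sharing its variety and region. The in-memory dictionary below is a hypothetical stand-in for the graph, the names are made up, and the sketch only covers the "same region" case.

```python
def similar_wines(wine_id, wines):
    """Names of other wines with the same variety and region as wine_id.

    wines maps a node ID to {"name", "variety", "region"}; this dictionary
    is a hypothetical in-memory stand-in for the Knowledge Graph.
    """
    selected = wines[wine_id]  # the node the user right-clicked on
    return sorted(
        wine["name"]
        for other_id, wine in wines.items()
        if other_id != wine_id
        and wine["variety"] == selected["variety"]
        and wine["region"] == selected["region"]
    )


# Hypothetical example data: node ID -> wine properties
wines = {
    1: {"name": "Wine A", "variety": "Variety 1", "region": "Region X"},
    2: {"name": "Wine B", "variety": "Variety 1", "region": "Region X"},
    3: {"name": "Wine C", "variety": "Variety 2", "region": "Region X"},
    4: {"name": "Wine D", "variety": "Variety 1", "region": "Region Y"},
}
print(similar_wines(1, wines))  # ['Wine B']
```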

And we're done! Now we have a new local action that can be accessed after right-clicking on any Wine node.

You can define an action on a relationship instead of a node in a similar fashion, except that this time it is the ID of the relationship that is passed to the Cypher query, through a parameter name of your choice.

If you need additional keyboard input from the user, click on ADD INPUT PARAMETER, choose a parameter name by which you will reference the input in your Cypher query (prefixed with the $ sign), and optionally adjust its Label (how the input field is displayed to the user). From then on, each time you invoke the action, a pop-up asking for this input appears first.

Finally, if you prefer to get the result of an action as a pop-up table instead of a graph, set Return Type to TABULAR and return a map (dictionary) with the following fields:

  • id: any ID for your table
  • label: a caption for your table
  • data: the actual content of the table, as a map whose keys are the values of the first column and whose values are the values of the second column (only two-column tables are currently supported)

Example of a return statement for the TABULAR Return Type:

```cypher
...
RETURN {id: "1", label: "Wines of variety " + v.name + " in regions near the province " + p.name, data: apoc.map.fromLists(keys, wine_names)} AS map
LIMIT 10
```
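
To check what shape the data field has, here is a small Python sketch of a TABULAR result: dict(zip(keys, values)) plays the role of apoc.map.fromLists(keys, wine_names), and the data map is rendered as two tab-separated columns. The function names are hypothetical, the rendering only approximates the pop-up table, and in this sketch mismatched list lengths are rejected and a repeated key keeps its last value.

```python
def tabular_result(table_id, label, keys, values):
    """Build the map a TABULAR action returns: id, label and a two-column data map."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    # First list -> first column, second list -> second column.
    return {"id": table_id, "label": label, "data": dict(zip(keys, values))}


def render_two_columns(result):
    """Render the data map as plain-text rows: first column = key, second = value."""
    lines = [result["label"]]
    for key, value in result["data"].items():
        lines.append(f"{key}\t{value}")
    return "\n".join(lines)
```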