Visualization

The Neo4j Graph Analytics app has built-in capabilities that allow you to easily visualize your Snowflake tables as graphs inside Snowflake notebooks. The visualization is interactive, and supports features such as zooming, panning, moving nodes, and hovering over nodes and relationships to see their properties. This functionality is available via the experimental.visualize procedure, which is powered by the neo4j-viz Python library.

Figure 1. A visualization of the Cora dataset

Syntax

This section covers the syntax used to generate an interactive graph visualization from Snowflake tables.

The experimental.visualize procedure takes two mandatory parameters: the Project configuration and the Visualization configuration.

Generate a graph visualization
CALL Neo4j_Graph_Analytics.experimental.visualize(
  {...},  (1)
  {...}   (2)
);
1 Project configuration.
2 Visualization configuration.

The procedure returns a string containing HTML/JavaScript for the desired graph visualization. This string can then be rendered in various ways, for example using streamlit inside a Snowflake notebook (see example below).

Project configuration

The Project configuration is used to specify which data should be included in the visualization. It is the same configuration as the project configuration used for algorithm jobs.

Table 1. Project configuration
Name                Type
nodeTables          List of node tables.
relationshipTables  Map of relationship types to relationship tables.

For more details on the Project configuration, refer to the Project documentation.

Special columns

You can modify the visualization by including columns with specific names in the node and relationship tables.

The complete sets of special columns are listed in the node reference and the relationship reference. Though listed in snake_case there, SCREAMING_SNAKE_CASE and camelCase are also supported. Some of the most commonly used special columns are:

  • Node sizes: The sizes of nodes can be controlled by including a column named "SIZE" in node tables. The values in these columns should be of a numeric type. This can be useful for visualizing the relative importance or size of nodes in the graph, for example using a computed centrality score.

  • Captions: The caption text of nodes and relationships can be controlled by including a column named "CAPTION" in the tables. The values in these columns should be of a string type. This can be useful for displaying additional information about the nodes, such as their names or labels. If no "CAPTION" column is provided, the default captions in the visualization will be the names of the corresponding node and relationship tables.

If the columns you want to use have different names, we recommend using views to rename them to the desired names.
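As an illustration of the renaming approach, the following Python sketch assembles a view definition that exposes an existing column under a special name. The helper `rename_view_sql` is hypothetical, the table, view, and column names are only examples, and the use of Snowflake's `SELECT * RENAME` clause is one possible way to write such a view, not necessarily the form used elsewhere in this documentation.

```python
def rename_view_sql(view_name, source_table, renames):
    """Build a CREATE VIEW statement exposing source_table with renamed columns.

    `renames` maps existing column names to the special names the
    visualization looks for, for example {"PAGERANK": "SIZE"}.
    Identifiers are interpolated as-is, so only pass trusted names.
    """
    if not renames:
        raise ValueError("at least one column rename is required")
    pairs = ", ".join(f"{old} AS {new}" for old, new in renames.items())
    # A single rename is written without parentheses, several renames with them.
    rename_clause = pairs if len(renames) == 1 else f"({pairs})"
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS\n"
        f"SELECT * RENAME {rename_clause}\n"
        f"FROM {source_table};"
    )


# Example names only: expose a PAGERANK column as SIZE.
view_sql = rename_view_sql("PERSONS_VIEW", "PERSONS_PAGERANK", {"PAGERANK": "SIZE"})
print(view_sql)
```

The generated statement can then be pasted into a SQL notebook cell. Remember that the application also needs access to any view used as input.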

Visualization configuration

The Visualization configuration is used to specify how the provided graph should be rendered, and is made up of two main parts.

Table 2. Visualization configuration
Name           Type  Default  Optional  Description
nodeColoring   Map   {}       yes       Configuration for node coloring.
renderOptions  Map   {}       yes       Configuration for rendering options.

Node coloring configuration

The node coloring configuration allows you to specify how the nodes in the graph should be colored based on the values in a specific column.

Table 3. Node coloring configuration
Name        Type    Default     Optional  Description
byColumn    String  n/a         no        The column whose values make up the basis for coloring the nodes.
colorSpace  String  "discrete"  yes       The color space to use for coloring the nodes. Either "discrete" or "continuous".

If the "discrete" color space is used, each unique value in the specified byColumn column is assigned a different color (as long as enough distinct colors are available). This can be useful for visualizing categorical data, like community detection output, where each category is represented by a distinct color.

If the "continuous" color space is used, a gradient of colors will be applied based on the values in the specified byColumn column. This is useful for visualizing numerical data, such as centrality measures, where the color intensity represents the magnitude of the value.

By default, nodes are colored using the "discrete" color space, with one color per unique node caption (captions in turn default to table names).
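To make the two color spaces concrete, here is a minimal Python sketch that builds the nodeColoring part of a Visualization configuration and checks it against the options in Table 3. The helper `node_coloring` and the column names COMMUNITY and PAGERANK are illustrative assumptions. The validation mirrors the documented options but is not the procedure's own implementation.

```python
VALID_COLOR_SPACES = {"discrete", "continuous"}


def node_coloring(by_column, color_space="discrete"):
    """Build a nodeColoring map.

    byColumn is mandatory; colorSpace is optional and defaults to "discrete".
    """
    if not by_column:
        raise ValueError("byColumn is mandatory")
    if color_space not in VALID_COLOR_SPACES:
        raise ValueError(
            f"colorSpace must be one of {sorted(VALID_COLOR_SPACES)}"
        )
    return {"byColumn": by_column, "colorSpace": color_space}


# Categorical data: one color per unique value of a (hypothetical) COMMUNITY column.
by_community = {"nodeColoring": node_coloring("COMMUNITY")}

# Numerical data: a color gradient over a (hypothetical) PAGERANK column.
by_score = {"nodeColoring": node_coloring("PAGERANK", "continuous")}

print(by_community)
print(by_score)
```

Either map would be passed as the second argument of experimental.visualize, written as a SQL object constant.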

Render options configuration

The render options configuration allows you to specify how the graph should be rendered, including the height and width of the rendered graph, the maximum number of nodes allowed in the rendered graph, and the renderer to use.

Table 4. Render options configuration
Name             Type     Default   Optional  Description
height           String   "600px"   yes       The height of the rendered graph.
width            String   "100%"    yes       The width of the rendered graph.
maxAllowedNodes  Integer  10000     yes       The maximum number of nodes allowed in the rendered graph. The rendering will fail if the number of nodes exceeds this limit, to prevent performance issues.
renderer         String   "canvas"  yes       The renderer to use for rendering the graph. Either "webgl" or "canvas".

The WebGL renderer is optimized for performance and handles large graphs better. However, it does not render text, icons, or arrowheads on relationships. The canvas renderer is less performant than the WebGL renderer, making it less suited to rendering large graphs. However, it can render text, icons, and arrowheads on relationships.
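The following Python sketch shows how the defaults in Table 4 combine with user-supplied overrides. The helper `resolve_render_options` is hypothetical and only models the documented defaults and allowed values. How the procedure resolves options internally is not specified here.

```python
# Defaults as listed in Table 4.
DEFAULT_RENDER_OPTIONS = {
    "height": "600px",
    "width": "100%",
    "maxAllowedNodes": 10000,
    "renderer": "canvas",
}
VALID_RENDERERS = ("webgl", "canvas")


def resolve_render_options(overrides=None):
    """Return the documented defaults with any overrides applied on top."""
    overrides = overrides or {}
    unknown = sorted(set(overrides) - set(DEFAULT_RENDER_OPTIONS))
    if unknown:
        raise ValueError(f"unknown render options: {unknown}")
    resolved = {**DEFAULT_RENDER_OPTIONS, **overrides}
    if resolved["renderer"] not in VALID_RENDERERS:
        raise ValueError(f"renderer must be one of {VALID_RENDERERS}")
    return resolved


# A taller canvas rendered with WebGL; width and maxAllowedNodes keep their defaults.
options = resolve_render_options({"height": "800px", "renderer": "webgl"})
print(options)
```

Only the keys that differ from the defaults need to appear in the renderOptions map passed to the procedure.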

Example

In this example, we will use a Snowflake notebook to visualize a small graph representing a group of people and the musical instruments they like.

Setting up the graph

We will start by creating the three tables we need in a SQL notebook cell: one node table for the people, one node table for the musical instruments, and one relationship table representing the "LIKES" relationship from people to instruments.

Creating the graph as tables

Next we need to grant the application access to these tables, so that the experimental.visualize procedure can read them.

Granting table access to the application

Calling the visualize procedure

Now that we have our tables, we can proceed to visualize them. We provide the tables in the Project configuration, and leave the Visualization configuration empty to use the default settings. We call the experimental.visualize procedure within a SQL notebook cell.

Calling `experimental.visualize`

We can see that the procedure returns a string containing HTML/JavaScript for the desired graph visualization. It is inside a table of one row with a single column named "VISUALIZE".
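For readers who prefer to assemble the call programmatically, the Python sketch below turns two configuration maps into a CALL statement with single-quoted SQL object constants. The helper `to_sql_literal` is hypothetical. The inner shape of the relationshipTables entry (the sourceTable and targetTable keys) and the unqualified table names are assumptions made for illustration, so consult the Project documentation for the exact format.

```python
def to_sql_literal(value):
    """Render a Python value as a SQL constant with single-quoted strings."""
    if isinstance(value, bool):  # must be checked before int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace("'", "''")
        return f"'{escaped}'"
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(to_sql_literal(v) for v in value) + "]"
    if isinstance(value, dict):
        items = ", ".join(
            f"{to_sql_literal(str(k))}: {to_sql_literal(v)}"
            for k, v in value.items()
        )
        return "{" + items + "}"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


# Assumed shape: relationship type -> map naming its source and target tables.
project_config = {
    "nodeTables": ["PERSONS", "INSTRUMENTS"],
    "relationshipTables": {
        "LIKES": {"sourceTable": "PERSONS", "targetTable": "INSTRUMENTS"},
    },
}
visualization_config = {}  # empty map: use the default settings

statement = (
    "CALL Neo4j_Graph_Analytics.experimental.visualize(\n"
    f"  {to_sql_literal(project_config)},\n"
    f"  {to_sql_literal(visualization_config)}\n"
    ");"
)
print(statement)
```

The printed statement is meant to be run in a SQL notebook cell. In practice the table names may need to be qualified with their database and schema.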

Rendering the visualization

We can access the output of the previous cell by referencing its cell name, in this case cell7. In our next Python notebook cell, we extract the HTML/JavaScript string we want by interpreting the cell7 output as a Pandas DataFrame, then accessing the first row of the "VISUALIZE" column.

The HTML/JavaScript string we have derived can now be rendered in various ways. In this example we will use the streamlit library. We set the height to be 600 pixels, which is the same as the default height in the renderOptions configuration for experimental.visualize.

Rendering the HTML
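A minimal sketch of these two steps in Python could look as follows. The helper names are hypothetical, and the notebook usage shown in the trailing comments (converting the cell output with `to_pandas()`) is an assumption about how the SQL cell result is exposed to Python cells. The rendering relies on Streamlit's `components.html` function.

```python
def extract_visualization_html(frame):
    """Return the first row of the "VISUALIZE" column of a pandas-style DataFrame."""
    return frame["VISUALIZE"].iloc[0]


def render_visualization(html, height=600):
    """Render the HTML/JavaScript string with Streamlit's HTML component."""
    # Imported lazily so the extraction helper also works where Streamlit
    # is not installed.
    import streamlit.components.v1 as components

    components.html(html, height=height)


# Assumed notebook usage, with cell7 being the name of the SQL cell:
#   html = extract_visualization_html(cell7.to_pandas())
#   render_visualization(html, height=600)
```

Keeping the Streamlit height equal to the height in renderOptions avoids clipping the graph or leaving empty space below it.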

The graph renders nicely, and we see that our two node types, PERSONS and INSTRUMENTS, are colored and captioned differently (according to the default settings). The relationships are rendered as arrows with the "LIKES" caption.

We can zoom in and out, pan around, move nodes, and hover over nodes and relationships to see their properties. The buttons on the top right also allow us to zoom, in addition to taking PNG snapshots of the graph.

Customizing node sizes

By adding a column named "SIZE" to the node tables, we can control the sizes of the nodes in the visualization.

Since we do not have any size columns in our dataset, we will generate appropriate sizes using the popular PageRank algorithm, which computes a centrality score for each node in the graph.

Running the PageRank algorithm

The output of the algorithm is written to the tables PERSONS_PAGERANK and INSTRUMENTS_PAGERANK. Let us inspect the former.

Inspecting 'PERSONS_PAGERANK'

We see that there is a "PAGERANK" column that holds the computed centrality scores.

Since the requirement for customizing node sizes in the visualization is to have columns named "SIZE", we can rename the "PAGERANK" columns to "SIZE" using views.

Creating views

Let us inspect the PERSONS_VIEW view we created, to make sure it has a "SIZE" column like we would expect.

Inspecting 'PERSONS_VIEW'

With PERSONS_VIEW and INSTRUMENTS_VIEW as our new node tables, and the same relationship table as before, we are almost ready to visualize the updated graph. Before doing so, however, we need to grant the application access to the new views, so that the experimental.visualize procedure can read them. We can do so by re-running the SQL notebook cell for granting access above.

In this example, we could instead have set the mutateProperty parameter of graph.page_rank to 'SIZE', and thus obtained the PageRank centrality scores in columns named 'SIZE' (instead of the default 'PAGERANK') directly in the algorithm output tables. This would have saved us the step of creating views.

We are now ready to call experimental.visualize again.

Calling `experimental.visualize` with the views as input

We can now render the HTML/JavaScript string again, but this time referring to the new cell13 output.

Rendering the HTML with node sizes

As we can see, some nodes are larger than others, depending on the PageRank centrality scores we computed earlier.

Performance considerations

The performance of the experimental.visualize procedure depends on the size and complexity of the graph being visualized, as well as the machine it runs on.

The experimental.visualize procedure runs inside a Snowflake warehouse, and as such this warehouse should be sized appropriately for the graph being visualized.

The maxAllowedNodes parameter in the renderOptions part of the Visualization configuration limits the number of nodes in the rendered graph, so that by default the rendering does not take too long or consume too much memory. If you need to visualize large graphs, you can increase this limit, but be aware that this may lead to performance issues. To limit such issues, make sure that you are using an appropriately sized warehouse, and consider setting the renderer option of renderOptions to "webgl".
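The trade-off can be sketched as a small Python helper that picks renderOptions from the number of nodes to display. The function `render_options_for` and its `webgl_threshold` cut-off are hypothetical, and the threshold value is an arbitrary placeholder to tune for your own data. Only the default limit of 10000 comes from the render options table.

```python
DEFAULT_MAX_ALLOWED_NODES = 10000  # documented default of maxAllowedNodes


def render_options_for(node_count, webgl_threshold=5000):
    """Suggest renderOptions overrides for a graph with node_count nodes.

    webgl_threshold is a hypothetical cut-off above which the WebGL
    renderer is preferred, at the cost of losing captions and arrowheads.
    """
    options = {}
    if node_count > DEFAULT_MAX_ALLOWED_NODES:
        # Raise the limit just enough for the graph to be rendered at all.
        options["maxAllowedNodes"] = node_count
    if node_count > webgl_threshold:
        options["renderer"] = "webgl"
    return options


print(render_options_for(100))
print(render_options_for(6000))
print(render_options_for(25000))
```

Small graphs keep the defaults and therefore the canvas renderer with captions and arrowheads, while larger graphs switch to WebGL before the node limit is raised.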