Visualization

The Neo4j Graph Analytics app has built-in capabilities that allow you to easily visualize your Snowflake tables as graphs inside Snowflake notebooks. The visualization is interactive, and supports features such as zooming, panning, moving nodes, and hovering over nodes and relationships to see their properties. This functionality is available via the experimental.visualize procedure, which is powered by the neo4j-viz Python library.

Figure 1. A visualization of the Cora dataset

Syntax

This section covers the syntax used to generate an interactive graph visualization from Snowflake tables.

The experimental.visualize procedure takes two mandatory parameters: the Project configuration and the Visualization configuration.

Generate a graph visualization

CALL Neo4j_Graph_Analytics.experimental.visualize(
  {...},  (1)
  {...}   (2)
);

1	Project configuration.
2	Visualization configuration.

The procedure returns a string containing HTML/JavaScript for the desired graph visualization. This string can then be rendered in various ways, for example using streamlit inside a Snowflake notebook (see example below).

Project configuration

The Project configuration is used to specify which data should be included in the visualization. It is the same configuration as the project configuration used for algorithm jobs.

Table 1. Project configuration
Name	Description
nodeTables	List of node tables.
relationshipTables	Map of relationship types to relationship tables.

For more details on the Project configuration, refer to the Project documentation.

Special columns

It is possible to modify the visualization by including columns of certain specific names in the node and relationship tables.

The complete lists of special columns are documented separately for nodes and for relationships. Although they are listed there in snake_case, SCREAMING_SNAKE_CASE and camelCase are also supported. Some of the most commonly used special columns are:

Node sizes: The sizes of nodes can be controlled by including a column named "SIZE" in node tables. The values in these columns should be of a numeric type. This can be useful for visualizing the relative importance or size of nodes in the graph, for example using a computed centrality score.
Captions: The caption text of nodes and relationships can be controlled by including a column named "CAPTION" in the tables. The values in these columns should be of a string type. This can be useful for displaying additional information about the nodes, such as their names or labels. If no "CAPTION" column is provided, the default captions in the visualization will be the names of the corresponding node and relationship tables.

If the columns you want to use are of different names, we recommend using views to rename them to the desired names.
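As a rough illustration of the view-based renaming, the Python sketch below builds a CREATE OR REPLACE VIEW statement that exposes existing node table columns under special column names. The helper rename_view_sql and all table, view, and column names are hypothetical. The generated SQL would still need to be run in a SQL cell, and the application would need to be granted access to the resulting view.

```python
def rename_view_sql(view: str, table: str, renames: dict) -> str:
    """Build a CREATE VIEW statement that keeps NODEID and renames other columns.

    `renames` maps existing column names to special column names such as
    "SIZE" or "CAPTION". Hypothetical helper: identifiers are not validated.
    """
    columns = ["NODEID"] + [f"{old} AS {new}" for old, new in renames.items()]
    select_list = ",\n    ".join(columns)
    return (
        f"CREATE OR REPLACE VIEW {view} AS\n"
        f"SELECT\n    {select_list}\n"
        f"FROM {table};"
    )


sql = rename_view_sql(
    "MY_DB.MY_SCHEMA.PEOPLE_VIEW",  # hypothetical view name
    "MY_DB.MY_SCHEMA.PEOPLE",       # hypothetical node table
    {"FULL_NAME": "CAPTION", "SCORE": "SIZE"},
)
print(sql)
```

Keeping the renaming in a view leaves the underlying table untouched, so the same data can be visualized with different columns acting as "SIZE" or "CAPTION".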

Visualization configuration

The Visualization configuration is used to specify how the provided graph should be rendered, and is made up of two main parts.

Table 2. Visualization configuration
Name	Type	Default	Optional	Description
nodeColoring	Map	`{}`	yes	Configuration for node coloring.
renderOptions	Map	`{}`	yes	Configuration for rendering options.
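Both parts are optional maps, so a Visualization configuration can contain either, both, or neither. The sketch below assembles one in Python and renders it in the single-quoted object literal style used by the CALL statements on this page. The helper to_sql_literal is hypothetical and handles only the value types that appear in these configurations; the column name and option values are example choices.

```python
def to_sql_literal(value) -> str:
    """Render dicts, lists, strings, booleans and numbers as a SQL-style literal."""
    if isinstance(value, dict):
        items = ", ".join(
            f"{to_sql_literal(k)}: {to_sql_literal(v)}" for k, v in value.items()
        )
        return "{" + items + "}"
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(to_sql_literal(v) for v in value) + "]"
    if isinstance(value, str):
        # Double any single quotes so the string stays a valid SQL constant.
        return "'" + value.replace("'", "''") + "'"
    if isinstance(value, bool):  # checked before int, since bool subclasses int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    raise TypeError(f"unsupported value type: {type(value).__name__}")


visualization_config = {
    "nodeColoring": {"byColumn": "SIZE", "colorSpace": "continuous"},
    "renderOptions": {"height": "800px", "renderer": "webgl"},
}
literal = to_sql_literal(visualization_config)
print(literal)
```

The printed text is meant to be pasted as the second argument of experimental.visualize in a SQL cell.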

Node coloring configuration

The node coloring configuration allows you to specify how the nodes in the graph should be colored based on the values in a specific column.

Table 3. Node coloring configuration
Name	Type	Default	Optional	Description
byColumn	String	`n/a`	no	The column whose values make up the basis for coloring the nodes.
colorSpace	String	`"discrete"`	yes	The color space to use for coloring the nodes. Either "discrete" or "continuous".

If the "discrete" color space is used, each unique value in the specified byColumn column will be assigned a different color (as long as the colors last). This can be useful for visualizing categorical data, like community detection output, where each category is represented by a distinct color.

If the "continuous" color space is used, a gradient of colors will be applied based on the values in the specified byColumn column. This is useful for visualizing numerical data, such as centrality measures, where the color intensity represents the magnitude of the value.

By default, nodes are colored using the "discrete" color space, with one color per unique node caption. Since captions default to table names, nodes are by default colored by the table they come from.
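To make the two color spaces concrete, here is a minimal Python sketch that builds the nodeColoring map for a categorical and a numerical column. The helper node_coloring and the column names COMMUNITY and SCORE are hypothetical; only the keys byColumn and colorSpace and their allowed values come from Table 3.

```python
import json

VALID_COLOR_SPACES = {"discrete", "continuous"}


def node_coloring(by_column: str, color_space: str = "discrete") -> dict:
    """Return a nodeColoring map; byColumn is mandatory, colorSpace is optional."""
    if not by_column:
        raise ValueError("byColumn is required")
    if color_space not in VALID_COLOR_SPACES:
        raise ValueError(f"colorSpace must be one of {sorted(VALID_COLOR_SPACES)}")
    return {"byColumn": by_column, "colorSpace": color_space}


# Categorical data, e.g. a community id per node (hypothetical column name).
by_community = {"nodeColoring": node_coloring("COMMUNITY")}
# Numerical data, e.g. a centrality score per node (hypothetical column name).
by_score = {"nodeColoring": node_coloring("SCORE", "continuous")}

print(json.dumps(by_community))
print(json.dumps(by_score))
```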

Render options configuration

The render options configuration allows you to specify how the graph should be rendered, including the height and width of the rendered graph, the maximum number of nodes allowed in the rendered graph, and the renderer to use.

Table 4. Render options configuration
Name	Type	Default	Optional	Description
height	String	`"600px"`	yes	The height of the rendered graph.
width	String	`"100%"`	yes	The width of the rendered graph.
maxAllowedNodes	Integer	`10000`	yes	The maximum number of nodes allowed in the rendered graph. The rendering will fail if the number of nodes exceeds this limit, to prevent performance issues.
renderer	String	`"canvas"`	yes	The renderer to use for rendering the graph. Either "webgl" or "canvas".

The WebGL renderer is optimized for performance and handles large graphs better, but it does not render text, icons, or arrowheads on relationships. The canvas renderer can render all of these, but it is less performant, which makes it less suited to large graphs.
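The trade-off between the two renderers can be captured in a small helper that picks renderOptions from the expected node count. This is only a sketch: render_options is a hypothetical helper, and webgl_threshold is an assumed cut-off rather than a documented value. The default limit of 10000 nodes is taken from Table 4.

```python
def render_options(node_count: int, webgl_threshold: int = 2000,
                   default_max_nodes: int = 10000) -> dict:
    """Pick renderOptions for a graph with `node_count` nodes.

    webgl_threshold is an assumed cut-off, not a documented value;
    tune it to your own graphs and hardware.
    """
    options = {
        # canvas draws captions and arrowheads; webgl trades them for speed
        "renderer": "webgl" if node_count > webgl_threshold else "canvas",
    }
    if node_count > default_max_nodes:
        # Rendering fails above maxAllowedNodes, so raise the limit explicitly.
        options["maxAllowedNodes"] = node_count
    return options


small = render_options(9)
large = render_options(50_000)
print(small)  # {'renderer': 'canvas'}
print(large)  # {'renderer': 'webgl', 'maxAllowedNodes': 50000}
```

Leaving maxAllowedNodes out for small graphs keeps the default safeguard in place.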

Example

In this example, we will visualize a small graph of people and the musical instruments they like, in a Snowflake notebook.

Setting up the graph

We will start by creating the three tables we need, using a SQL notebook cell: one node table for the people, one node table for the musical instruments, and one relationship table to represent the "LIKES" relationship from people to instruments.

CREATE OR REPLACE TABLE EXAMPLE_DB.DATA_SCHEMA.PERSONS (NODEID VARCHAR);
INSERT INTO EXAMPLE_DB.DATA_SCHEMA.PERSONS VALUES
  ('Alice'),
  ('Bob'),
  ('Carol'),
  ('Dave'),
  ('Eve');

CREATE OR REPLACE TABLE EXAMPLE_DB.DATA_SCHEMA.INSTRUMENTS (NODEID VARCHAR);
INSERT INTO EXAMPLE_DB.DATA_SCHEMA.INSTRUMENTS VALUES
  ('Guitar'),
  ('Synthesizer'),
  ('Bongos'),
  ('Trumpet');

CREATE OR REPLACE TABLE EXAMPLE_DB.DATA_SCHEMA.LIKES (SOURCENODEID VARCHAR, TARGETNODEID VARCHAR);
INSERT INTO EXAMPLE_DB.DATA_SCHEMA.LIKES VALUES
  ('Alice', 'Guitar'),
  ('Alice', 'Synthesizer'),
  ('Alice', 'Bongos'),
  ('Bob',   'Guitar'),
  ('Bob',   'Synthesizer'),
  ('Carol', 'Bongos'),
  ('Dave',  'Guitar'),
  ('Dave',  'Trumpet'),
  ('Dave',  'Bongos');

Next we need to grant the application access to these tables, so that the experimental.visualize procedure can read them.

-- Use a role with granting privileges
USE ROLE ACCOUNTADMIN;

-- Change the app name below if you don't have the default one
GRANT USAGE ON DATABASE EXAMPLE_DB TO APPLICATION Neo4j_Graph_Analytics;
GRANT USAGE ON SCHEMA EXAMPLE_DB.DATA_SCHEMA TO APPLICATION Neo4j_Graph_Analytics;
GRANT SELECT ON ALL TABLES IN SCHEMA EXAMPLE_DB.DATA_SCHEMA TO APPLICATION Neo4j_Graph_Analytics;
GRANT SELECT ON ALL VIEWS IN SCHEMA EXAMPLE_DB.DATA_SCHEMA TO APPLICATION Neo4j_Graph_Analytics;

Calling the `visualize` procedure

Now that we have our tables, we can proceed to visualize them. We provide the tables in the Project configuration, and leave the Visualization configuration empty to use the default settings. We call the experimental.visualize procedure within a SQL notebook cell.

-- Change the app name here if you don't have the default one
CALL Neo4j_Graph_Analytics.experimental.visualize(
    {
        'nodeTables': ['EXAMPLE_DB.DATA_SCHEMA.PERSONS', 'EXAMPLE_DB.DATA_SCHEMA.INSTRUMENTS'],
        'relationshipTables': {
            'EXAMPLE_DB.DATA_SCHEMA.LIKES': {
                'sourceTable': 'EXAMPLE_DB.DATA_SCHEMA.PERSONS',
                'targetTable': 'EXAMPLE_DB.DATA_SCHEMA.INSTRUMENTS'
            }
        }
    },
    {}
);

Output:

We can see that the procedure returns a string containing HTML/JavaScript for the desired graph visualization. It is inside a table of one row with a single column named "VISUALIZE".

Rendering the visualization

We can access the output of the previous notebook cell by referencing its cell name, which we here refer to as <my-cell>. In our next Python notebook cell, we extract the HTML/JavaScript string we want by interpreting the <my-cell> output as a Pandas DataFrame, then accessing the first row of the "VISUALIZE" column.

The HTML/JavaScript string we have derived can now be rendered in various ways. In this example we will use the streamlit library. We set the height to be 600 pixels, which is the same as the default height in the renderOptions configuration for experimental.visualize.

import streamlit.components.v1 as components

components.html(
    <my-cell>.to_pandas().loc[0]["VISUALIZE"],
    height=600
)

And the output:

The graph renders nicely, and we see that our two node types, PERSONS and INSTRUMENTS, are colored and captioned differently (according to the default settings). The relationships are rendered as arrows with the "LIKES" caption.

We can zoom in and out, pan around, move nodes, and hover over nodes and relationships to see their properties. The buttons on the top right also allow us to zoom, in addition to taking PNG snapshots of the graph.
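The height passed to components.html above was chosen to match the default renderOptions height. If a custom height is set in renderOptions, the component height presumably needs to follow it. A minimal sketch, assuming the height is given in pixels; px_to_int is a hypothetical helper, not part of the app or of streamlit:

```python
def px_to_int(css_height: str) -> int:
    """Convert a pixel height such as "600px" to the integer 600."""
    value = css_height.strip().lower()
    if not value.endswith("px"):
        raise ValueError(f"expected a pixel value like '600px', got {css_height!r}")
    return int(value[:-2])


render_height = "800px"  # example value for the height render option
component_height = px_to_int(render_height)
print(component_height)  # 800
```

The resulting integer can then be passed as the height argument of components.html in place of the hard-coded 600.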

Customizing node sizes

By adding a column named "SIZE" to the node tables, we can control the sizes of the nodes in the visualization.

Since we do not have any size columns in our dataset, we will generate appropriate sizes using the popular PageRank algorithm, which computes a centrality score for each node in the graph.

-- Change the app name here if you don't have the default one
CALL Neo4j_Graph_Analytics.graph.page_rank('CPU_X64_XS', {
    'defaultTablePrefix': 'EXAMPLE_DB.DATA_SCHEMA',
    'project': {
        'nodeTables': [ 'PERSONS', 'INSTRUMENTS' ],
        'relationshipTables': {
            'LIKES': {
                'sourceTable': 'PERSONS',
                'targetTable': 'INSTRUMENTS',
                'orientation': 'UNDIRECTED'
            }
        }
    },
    'compute': {},
    'write': [
        {
            'nodeLabel': 'PERSONS',
            'outputTable': 'PERSONS_PAGERANK'
        },
        {
            'nodeLabel': 'INSTRUMENTS',
            'outputTable': 'INSTRUMENTS_PAGERANK'
        }
    ]
});

The output of the algorithm is written to the tables PERSONS_PAGERANK and INSTRUMENTS_PAGERANK. Let us inspect the former.

SELECT * FROM EXAMPLE_DB.DATA_SCHEMA.PERSONS_PAGERANK;

Table 5. Results
NODEID	PAGERANK
Alice	1.207
Bob	0.8466
Carol	0.5104
Dave	1.281
Eve	0.15

We see that there is a "PAGERANK" column that holds the computed centrality scores.

Since the requirement for customizing node sizes in the visualization is to have columns named "SIZE", we can rename the "PAGERANK" columns to "SIZE" using views.

USE DATABASE EXAMPLE_DB;

CREATE OR REPLACE VIEW EXAMPLE_DB.DATA_SCHEMA.PERSONS_VIEW AS
SELECT
    NODEID AS NODEID,
    PAGERANK AS SIZE
FROM EXAMPLE_DB.DATA_SCHEMA.PERSONS_PAGERANK;

CREATE OR REPLACE VIEW EXAMPLE_DB.DATA_SCHEMA.INSTRUMENTS_VIEW AS
SELECT
    NODEID AS NODEID,
    PAGERANK AS SIZE
FROM EXAMPLE_DB.DATA_SCHEMA.INSTRUMENTS_PAGERANK;

Let us inspect the PERSONS_VIEW view we created, to make sure it has a "SIZE" column like we would expect.

SELECT * FROM EXAMPLE_DB.DATA_SCHEMA.PERSONS_VIEW;

Table 6. Results
NODEID	SIZE
Alice	1.207
Bob	0.8466
Carol	0.5104
Dave	1.281
Eve	0.15

With PERSONS_VIEW and INSTRUMENTS_VIEW as our new node tables, and the same relationship table as before, we are almost ready to visualize the updated graph. Before doing so, however, we need to grant the application access to the new views, so that the experimental.visualize procedure can read them. We can do so by rerunning the SQL notebook cell with the grant statements above.

In this example, we could actually have set the mutateProperty parameter of graph.page_rank to 'SIZE', and thus get the PageRank centrality scores in columns named 'SIZE' (instead of the default 'PAGERANK') directly in the algorithm output tables. This would have saved us the step of creating views.

We are now ready to call experimental.visualize again.

-- Change the app name here if you don't have the default one
CALL Neo4j_Graph_Analytics.experimental.visualize(
    {
        'nodeTables': ['EXAMPLE_DB.DATA_SCHEMA.PERSONS_VIEW', 'EXAMPLE_DB.DATA_SCHEMA.INSTRUMENTS_VIEW'],
        'relationshipTables': {
            'EXAMPLE_DB.DATA_SCHEMA.LIKES': {
                'sourceTable': 'EXAMPLE_DB.DATA_SCHEMA.PERSONS_VIEW',
                'targetTable': 'EXAMPLE_DB.DATA_SCHEMA.INSTRUMENTS_VIEW'
            }
        }
    },
    {}
);

This yields:

We can now render the HTML/JavaScript string again, making sure to use the correct cell name reference.

import streamlit.components.v1 as components

components.html(
    <my-cell>.to_pandas().loc[0]["VISUALIZE"],
    height=600
)

And the output:

As we can see, some nodes are now larger than others, reflecting the PageRank centrality scores we computed earlier.

Performance considerations

The performance of the experimental.visualize procedure depends on the size and complexity of the graph being visualized, as well as the machine it runs on.

The experimental.visualize procedure runs inside a Snowflake warehouse, and as such this warehouse should be sized appropriately for the graph being visualized.

The maxAllowedNodes parameter in the renderOptions part of the Visualization configuration limits the number of nodes in the rendered graph, so that by default the rendering does not take too long or consume too much memory. If you need to visualize a large graph, you can increase this limit, but be aware that this may lead to performance issues. To limit such issues, make sure that you are using an appropriately sized warehouse, and consider setting the renderer option of renderOptions to "webgl".
