
A street fighter’s guide to building a graphCNN

How to build neural networks with custom structure and layers: Graph Convolutional Neural Network (GCNN) in Keras.

Shuyi Yang

At a certain point in our lives, the predefined layers in TensorFlow Keras are not enough anymore! We want more! We want to build custom neural networks with creative structures and bizarre layers! Luckily for us, we can easily do this in Keras by defining our own custom layers and models. In this step-by-step tutorial we are going to build a neural network with parallel layers, including a graph convolutional one. Wait a minute! What is a convolution on a graph?

Graph convolutional neural network

In a traditional neural network layer we perform a matrix multiplication between the layer input matrix X and the trainable weights matrix W, and then apply an activation function f. Hence, the input of the next layer (the output of the current one) can be written as f(XW). In a graph convolutional neural network, we suppose that similar instances are connected in a graph (e.g. a citation network, a distance-based network, etc.) and that features coming from the neighborhood could be useful in a supervised or unsupervised task. Let A be the adjacency matrix of the graph; then the operation performed by a convolutional layer is f(AXW): for each node of the graph, we aggregate the features from the connected nodes, multiply this aggregation by the weights matrix, and then apply the activation. This formulation of graph convolution is the simplest one. It’s fine for our tutorial, but graphCNN is much more!
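To make the f(AXW) formula concrete, here is a minimal numpy sketch of a single graph-convolution step on a toy 3-node graph; the matrices and the sigmoid activation are illustrative, not taken from the dataset used below.

```python
import numpy as np

def graph_conv(A, X, W):
    # One simple graph convolution: aggregate neighbor features (AX),
    # project them with the weight matrix (.W), then apply a sigmoid
    return 1.0 / (1.0 + np.exp(-(A @ X @ W)))

# Toy row-normalized adjacency: node 0 averages nodes 1 and 2,
# while nodes 1 and 2 each look only at node 0
A = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
X = np.array([[1.0, 0.0],   # two input features per node
              [0.0, 1.0],
              [1.0, 1.0]])
W = 0.1 * np.ones((2, 4))   # project 2 features to 4 outputs

H = graph_conv(A, X, W)
print(H.shape)  # (3, 4): one 4-dimensional representation per node
```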

Ok! Now, we are ready!

Step 1. Preparations

Firstly, we need to import some packages.

# Import packages
from tensorflow import __version__ as tf_version, float32 as tf_float32, Variable
from tensorflow.keras import Sequential, Model
from tensorflow.keras.backend import variable, dot as k_dot, sigmoid, relu
from tensorflow.keras.layers import Dense, Input, Concatenate, Layer
from tensorflow.keras.losses import SparseCategoricalCrossentropy
from tensorflow.keras.utils import plot_model
from tensorflow.random import set_seed as tf_set_seed
from numpy import __version__ as np_version, unique, array, mean, argmax
from numpy.random import seed as np_seed, choice
from pandas import __version__ as pd_version, read_csv, DataFrame, concat
from sklearn import __version__ as sk_version
from sklearn.preprocessing import normalize
print("tensorflow version:", tf_version)
print("numpy version:", np_version)
print("pandas version:", pd_version)
print("scikit-learn version:", sk_version)

You should receive as output the versions of the imported packages. In my case the output is:

tensorflow version: 2.2.0 
numpy version: 1.18.5
pandas version: 1.0.4
scikit-learn version: 0.22.2.post1

In this tutorial, we are going to use the CORA dataset:

The Cora dataset consists of 2708 scientific publications classified into one of seven classes. The citation network consists of 5429 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 1433 unique words.

Let’s load the data, create the adjacency matrix and prepare the features matrix.

# Load cora data
dtf_data = read_csv("").sort_values(["paper_id"], ascending=True)
dtf_graph = read_csv("")
# Adjacency matrix
array_papers_id = unique(dtf_data["paper_id"])
dtf_graph["connection"] = 1
dtf_graph_tmp = DataFrame({"cited_paper_id": array_papers_id, "citing_paper_id": array_papers_id, "connection": 0})
dtf_graph = concat((dtf_graph, dtf_graph_tmp)).sort_values(["cited_paper_id", "citing_paper_id"], ascending=True)
dtf_graph = dtf_graph.pivot_table(index="cited_paper_id", columns="citing_paper_id", values="connection", fill_value=0).reset_index(drop=True)
A = array(dtf_graph)
A = normalize(A, norm='l1', axis=1)
A = variable(A, dtype=tf_float32)
# Feature matrix
data = array(dtf_data.iloc[:, 1:1434])
# Labels
labels = array(dtf_data.iloc[:, -1].map({  # assuming the last column holds the class name
    'Case_Based': 0,
    'Genetic_Algorithms': 1,
    'Neural_Networks': 2,
    'Probabilistic_Methods': 3,
    'Reinforcement_Learning': 4,
    'Rule_Learning': 5,
    'Theory': 6
}))
# Check dimensions
print("Features matrix dimension:", data.shape, "| Label array dimension:", labels.shape, "| Adjacency matrix dimension:", A.shape)
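A quick aside on the L1 normalization used above: since each row of A then sums to 1, the product AX computes, for every node, the mean of its neighbors’ features. A toy numpy check (the graph and the feature values are made up):

```python
import numpy as np

# Toy graph: node 0 is linked to nodes 1 and 2; nodes 1 and 2 link back to node 0
A = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
A_norm = A / A.sum(axis=1, keepdims=True)   # L1 row-normalization: rows sum to 1
X = np.array([[2.0], [4.0], [6.0]])         # one feature per node
agg = A_norm @ X
print(agg.ravel())   # node 0 receives the mean of its neighbors: (4 + 6) / 2 = 5
```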

Lastly, let’s define some parameters useful for the training of neural networks.

# Training parameters
input_shape = (data.shape[1], )
output_classes = len(unique(labels))
iterations = 50
epochs = 100
batch_size = data.shape[0]
labeled_portion = 0.10

As you can deduce from the code above, for each model, we are going to perform 50 iterations and in each iteration we will randomly choose a 10% labeled set (training set) and train the model for 100 epochs.
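The random 10% split can be sketched as follows; the seed is arbitrary, and the fraction of labeled nodes will only be close to 10% because the mask is drawn independently per node:

```python
from numpy.random import seed, choice

seed(1102)
n = 2708                  # number of papers in CORA
labeled_portion = 0.10
# Boolean mask: True marks a node as labeled (training), False as unlabeled
mask = choice([True, False], size=n, replace=True,
              p=[labeled_portion, 1 - labeled_portion])
print(mask.sum() / n)     # roughly 0.10
```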

It is important to point out that the aim of this tutorial is not to train the most accurate model on the CORA dataset. Instead, we just want to show how to implement custom models with custom Keras layers!

Model 1: neural network with sequential layers

As baseline, we use a standard neural network with sequential layers (a familiar keras sequential model).

# Model 1: standard sequential neural network
tf_set_seed(1102)
model1 = Sequential([
    Dense(32, input_shape=input_shape, activation='relu'),
    Dense(16, activation='relu'),
    Dense(output_classes, activation='softmax')
], name="Model_1")
model1.save_weights("model1_initial_weights.h5")  # so each test iteration restarts from the same initial weights
plot_model(model1, 'model1.png', show_shapes=True)

We can plot the model to see the sequential structure.

Image by author. Structure of the Model 1: sequential dense layers.

Let’s see how this model performs.

# Testing model 1
tf_set_seed(1102)
np_seed(1102)
acc_model1 = []
for _ in range(iterations):
    mask = choice([True, False], size=data.shape[0], replace=True, p=[labeled_portion, 1-labeled_portion])
    labeled_data = data[mask, :]
    unlabeled_data = data[~mask, :]
    labeled_data_labels = labels[mask]
    unlabeled_data_labels = labels[~mask]
    # Restart from the same initial weights at every iteration
    model1.load_weights("model1_initial_weights.h5")
    model1.compile(
        optimizer='adam',  # optimizer not preserved in the original snippet; 'adam' assumed
        loss=SparseCategoricalCrossentropy(from_logits=False),
        metrics=['accuracy']
    )
    model1.fit(labeled_data, labeled_data_labels, epochs=epochs, batch_size=batch_size, verbose=0)
    acc_model1.append(sum(argmax(model1.predict(unlabeled_data, batch_size=batch_size), axis=1) == unlabeled_data_labels) / len(unlabeled_data_labels) * 100)
print("\nAverage accuracy on unlabeled set:", mean(acc_model1), "%")

You should obtain an average accuracy of 55%.

Model 2: neural network with parallel layers

Let’s introduce a small modification to the previous model. This time we want a network with two parallel hidden layers, built with the Keras Functional API. With the functional API we can build models with non-linear topology, shared layers, and multiple inputs or outputs. Basically, we assign each layer to a variable and then refer to that variable when connecting further layers, so that the whole model forms a directed acyclic graph (DAG). The model is then built by passing the input layer(s) and the output layer(s).

# Model 2: neural network with parallel layers
tf_set_seed(1102)
m2_input_layer = Input(shape=input_shape)
m2_dense_layer_1 = Dense(32, activation='relu')(m2_input_layer)
m2_dense_layer_2 = Dense(16, activation='relu')(m2_input_layer)
m2_merged_layer = Concatenate()([m2_dense_layer_1, m2_dense_layer_2])
m2_final_layer = Dense(output_classes, activation='softmax')(m2_merged_layer)
model2 = Model(inputs=m2_input_layer, outputs=m2_final_layer, name="Model_2")
model2.save_weights("model2_initial_weights.h5")  # so each test iteration restarts from the same initial weights
plot_model(model2, 'model2.png', show_shapes=True)

The parallel layers m2_dense_layer_1 and m2_dense_layer_2 depend on the same input layer m2_input_layer, and are then concatenated to form a unique layer in m2_merged_layer. This neural network should look like:

Image by author. Structure of the Model 2: parallel dense layers.

Let’s test this model.

# Testing model 2
tf_set_seed(1102)
np_seed(1102)
acc_model2 = []
for _ in range(iterations):
    mask = choice([True, False], size=data.shape[0], replace=True, p=[labeled_portion, 1-labeled_portion])
    labeled_data = data[mask, :]
    unlabeled_data = data[~mask, :]
    labeled_data_labels = labels[mask]
    unlabeled_data_labels = labels[~mask]
    # Restart from the same initial weights at every iteration
    model2.load_weights("model2_initial_weights.h5")
    model2.compile(
        optimizer='adam',  # optimizer not preserved in the original snippet; 'adam' assumed
        loss=SparseCategoricalCrossentropy(from_logits=False),
        metrics=['accuracy']
    )
    model2.fit(labeled_data, labeled_data_labels, epochs=epochs, batch_size=batch_size, shuffle=False, verbose=0)
    acc_model2.append(sum(argmax(model2.predict(unlabeled_data, batch_size=batch_size), axis=1) == unlabeled_data_labels) / len(unlabeled_data_labels) * 100)
print("\nAverage accuracy on unlabeled set:", mean(acc_model2), "%")

The average accuracy is nearly 60% (+5)!

Model 3: neural network with graph convolutional layer

So far, we have seen how to create custom network structures with the Keras Functional API. What if we need custom layers with user-defined operations? In our case, we would like to define a simple graph convolutional layer, as explained at the beginning of this tutorial. To this end, we need to subclass Layer and define the methods __init__, build and call.

# Graph convolutional layer
class GraphConv(Layer):

    def __init__(self, num_outputs, A, activation="sigmoid", **kwargs):
        super(GraphConv, self).__init__(**kwargs)
        self.num_outputs = num_outputs
        self.activation_function = activation
        self.A = Variable(A, trainable=False)

    def build(self, input_shape):
        # Weights
        self.W = self.add_weight("W", shape=[int(input_shape[-1]), self.num_outputs])
        # Bias
        self.bias = self.add_weight("bias", shape=[self.num_outputs])

    def call(self, input):
        if self.activation_function == 'relu':
            return relu(k_dot(k_dot(self.A, input), self.W) + self.bias)
        else:
            return sigmoid(k_dot(k_dot(self.A, input), self.W) + self.bias)

During the initialization, you can require and save any useful parameter (e.g. activation function, number of output neurons). In our example, we also require the adjacency matrix A. In the build method, the trainable weights of the layer are initialized. In the call method, the forward pass computation is declared.
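To see this lifecycle without running TensorFlow, here is a pure-numpy analogue of GraphConv (the class name and the initialization scheme are illustrative): parameters are stored at initialization, weights are created lazily on the first call, mirroring build(input_shape), and the forward pass reproduces call with a sigmoid activation.

```python
import numpy as np

class NumpyGraphConv:
    def __init__(self, num_outputs, A):
        # Store layer parameters, as in GraphConv.__init__
        self.num_outputs = num_outputs
        self.A = A
        self.W = None
        self.bias = None

    def build(self, num_features):
        # Lazy weight creation, mirroring Keras' build(input_shape)
        rng = np.random.default_rng(0)
        self.W = rng.normal(size=(num_features, self.num_outputs))
        self.bias = np.zeros(self.num_outputs)

    def __call__(self, X):
        if self.W is None:
            self.build(X.shape[-1])
        # Forward pass: aggregate (AX), project (.W), add bias, sigmoid
        z = self.A @ X @ self.W + self.bias
        return 1.0 / (1.0 + np.exp(-z))

A = np.eye(3)                        # trivial 3-node graph (self-loops only)
layer = NumpyGraphConv(16, A)
out = layer(np.ones((3, 8)))         # 3 nodes, 8 features each
print(out.shape)                     # (3, 16)
```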

As in the previous model, we define a network with parallel layers.

# Model 3: neural network with graph convolutional layer
tf_set_seed(1102)
m3_input_layer = Input(shape=input_shape)
m3_dense_layer = Dense(32, activation='relu')(m3_input_layer)
m3_gc_layer = GraphConv(16, A=A, activation='relu')(m3_input_layer)
m3_merged_layer = Concatenate()([m3_dense_layer, m3_gc_layer])
m3_final_layer = Dense(output_classes, activation='softmax')(m3_merged_layer)
model3 = Model(inputs=m3_input_layer, outputs=m3_final_layer, name="Model_3")
model3.save_weights("model3_initial_weights.h5")
model3.summary()
plot_model(model3, 'model3.png', show_shapes=True)

It looks like the previous model, but one layer is convolutional: the intrinsic features of each instance are concatenated with the aggregated features computed from its neighbourhood.

Image by author. Structure of the Model 3: convolutional layer and custom structure.

Further attention should be paid when compiling this model. Since the convolutional layer requires the entire adjacency matrix, we need to pass the entire features matrix (labeled and unlabeled instances), but the model should be trained only on the labeled instances. Therefore, we define a custom loss function where the sparse categorical crossentropy is computed only on the labeled instances. Additionally, we randomize the labels of the unlabeled instances in order to be sure that they are not used during the training.
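The masked-loss idea can be checked in isolation with a small numpy sketch (toy predictions, two labeled and two unlabeled instances): the crossentropy is averaged over the labeled rows only, so whatever labels or predictions the unlabeled instances carry cannot affect it.

```python
import numpy as np

def masked_sparse_crossentropy(y_true, y_pred, mask):
    # Probability assigned to the true class, labeled instances only
    picked = y_pred[mask, y_true[mask]]
    return -np.mean(np.log(picked))

y_true = np.array([0, 1, 1, 0])
y_pred = np.array([[0.9, 0.1],
                   [0.2, 0.8],
                   [0.5, 0.5],
                   [0.5, 0.5]])
mask = np.array([True, True, False, False])  # only the first two are labeled

loss_value = masked_sparse_crossentropy(y_true, y_pred, mask)
print(round(loss_value, 4))  # 0.1643, i.e. -(log(0.9) + log(0.8)) / 2

# Changing the predictions on an unlabeled row leaves the loss unchanged
y_pred_alt = y_pred.copy()
y_pred_alt[2] = [0.99, 0.01]
print(masked_sparse_crossentropy(y_true, y_pred_alt, mask) == loss_value)  # True
```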

# Testing model 3
tf_set_seed(1102)
np_seed(1102)
acc_model3 = []
for i in range(iterations):
    mask = choice([True, False], size=data.shape[0], replace=True, p=[labeled_portion, 1-labeled_portion])
    unlabeled_data_labels = labels[~mask]
    # Randomize the labels of unlabeled instances
    masked_labels = labels.copy()
    masked_labels[~mask] = choice(range(7), size=sum(~mask), replace=True)
    # Restart from the same initial weights at every iteration
    model3.load_weights("model3_initial_weights.h5")
    model3.compile(
        optimizer='adam',  # optimizer not preserved in the original snippet; 'adam' assumed
        # Custom loss: crossentropy computed on the labeled instances only
        loss=lambda y_true, y_pred: SparseCategoricalCrossentropy(from_logits=False)(y_true[mask], y_pred[mask]),
        metrics=['accuracy']
    )
    model3.fit(data, masked_labels, epochs=epochs, batch_size=batch_size, shuffle=False, verbose=0)
    predictions = argmax(model3.predict(data, batch_size=batch_size), axis=1)
    acc_model3.append(sum(predictions[~mask] == unlabeled_data_labels) / len(unlabeled_data_labels) * 100)
print("\nAverage accuracy on unlabeled set:", mean(acc_model3), "%")

This experiment produces an average accuracy of 63% (+3).

Interestingly, in this last experiment we are basically performing semi-supervised learning with a graphCNN: information from the unlabeled instances is used together with the labeled ones to build a graph-based transductive model.

The complete Jupyter Notebook containing the code can be found here.