Contents

- 1 How to build neural networks with custom structure and layers: Graph Convolutional Neural Network (GCNN) in Keras.
- 2 Graph convolutional neural network
- 3 Step 1. Preparations
- 4 Model 1: neural network with sequential layers
- 5 Model 2: neural network with parallel layers
- 6 Model 3: neural network with graph conv layer
- 7 References

## How to build neural networks with custom structure and layers: Graph Convolutional Neural Network (GCNN) in Keras.

At a certain point in our lives, the predefined layers in TensorFlow **Keras** are not enough anymore! We want more! We want to build custom neural networks with creative structures and bizarre layers! Luckily for us, we can easily perform this task within Keras by defining our own custom layers and models. In this step-by-step tutorial we are going to build a neural network with parallel layers, including a graph convolutional one. Wait a minute! What is a convolution on a graph?

## Graph convolutional neural network

In a traditional neural network layer we perform a matrix multiplication between the layer input matrix *X* and the trainable weights matrix *W*. Then we apply an activation function *f*. Hence, the input of the next layer (the output of the current layer) can be represented as *f(XW)*. In a graph convolutional neural network, we suppose that similar instances are connected in a graph (e.g. a citation network, a distance-based network, etc.) and that the features coming from the neighborhood could be useful in a (un)supervised task. Let *A* be the adjacency matrix of the graph; then the operation we are going to perform in a convolutional layer is *f(AXW)*. For each node of the graph, we aggregate the features from the connected nodes, multiply this aggregation by the weights matrix, and then apply the activation. This formulation of graph convolution is the simplest one. It's fine for our tutorial, but graph CNN is much more!
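To make *f(AXW)* concrete, here is a minimal standalone NumPy sketch of one graph convolution on a toy 4-node graph (the adjacency matrix, features, and weights below are made up for illustration):

```python
import numpy as np

# Toy graph: 4 nodes, 3 input features per node, 2 output features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)  # adjacency matrix
X = np.random.rand(4, 3)                   # node feature matrix
W = np.random.rand(3, 2)                   # trainable weights matrix

def relu(x):
    return np.maximum(x, 0)

# Aggregate each node's neighbor features (AX), project them (...W),
# then apply the activation: f(AXW).
H = relu(A @ X @ W)
print(H.shape)  # (4, 2): one 2-dimensional embedding per node
```

Row *i* of *AX* is the sum of the features of the nodes connected to node *i*, which is exactly the neighborhood aggregation described above.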

Ok! Now, we are ready!

## Step 1. Preparations

Firstly, we need to import some packages.

```python
# Import packages
from tensorflow import __version__ as tf_version, float32 as tf_float32, Variable
from tensorflow.keras import Sequential, Model
from tensorflow.keras.backend import variable, dot as k_dot, sigmoid, relu
from tensorflow.keras.layers import Dense, Input, Concatenate, Layer
from tensorflow.keras.losses import SparseCategoricalCrossentropy
from tensorflow.keras.utils import plot_model
from tensorflow.random import set_seed as tf_set_seed
from numpy import __version__ as np_version, unique, array, mean, argmax
from numpy.random import seed as np_seed, choice
from pandas import __version__ as pd_version, read_csv, DataFrame, concat
from sklearn import __version__ as sk_version
from sklearn.preprocessing import normalize

print("tensorflow version:", tf_version)
print("numpy version:", np_version)
print("pandas version:", pd_version)
print("scikit-learn version:", sk_version)
```

You should receive as output the versions of the imported packages. In my case the output is:

```
tensorflow version: 2.2.0
numpy version: 1.18.5
pandas version: 1.0.4
scikit-learn version: 0.22.2.post1
```

In this tutorial, we are going to use the CORA dataset:

> The Cora dataset consists of 2708 scientific publications classified into one of seven classes. The citation network consists of 5429 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 1433 unique words.

Let’s load the data, create the adjacency matrix and prepare the features matrix.

```python
# Load cora data
dtf_data = read_csv(
    "https://raw.githubusercontent.com/ngshya/datasets/master/cora/cora_content.csv"
).sort_values(["paper_id"], ascending=True)
dtf_graph = read_csv(
    "https://raw.githubusercontent.com/ngshya/datasets/master/cora/cora_cites.csv"
)

# Adjacency matrix
array_papers_id = unique(dtf_data["paper_id"])
dtf_graph["connection"] = 1
dtf_graph_tmp = DataFrame({"cited_paper_id": array_papers_id, "citing_paper_id": array_papers_id, "connection": 0})
dtf_graph = concat((dtf_graph, dtf_graph_tmp)).sort_values(["cited_paper_id", "citing_paper_id"], ascending=True)
dtf_graph = dtf_graph.pivot_table(index="cited_paper_id", columns="citing_paper_id", values="connection", fill_value=0).reset_index(drop=True)
A = array(dtf_graph)
A = normalize(A, norm='l1', axis=1)
A = variable(A, dtype=tf_float32)

# Feature matrix
data = array(dtf_data.iloc[:, 1:1434])

# Labels
labels = array(
    dtf_data["label"].map({
        'Case_Based': 0,
        'Genetic_Algorithms': 1,
        'Neural_Networks': 2,
        'Probabilistic_Methods': 3,
        'Reinforcement_Learning': 4,
        'Rule_Learning': 5,
        'Theory': 6
    })
)

# Check dimensions
print("Features matrix dimension:", data.shape, "| Label array dimension:", labels.shape, "| Adjacency matrix dimension:", A.shape)
```

Lastly, let’s define some parameters useful for the training of neural networks.

```python
# Training parameters
input_shape = (data.shape[1], )
output_classes = len(unique(labels))
iterations = 50
epochs = 100
batch_size = data.shape[0]
labeled_portion = 0.10
```

As you can deduce from the code above, for each model we are going to perform 50 iterations; in each iteration we randomly choose a 10% labeled set (the training set) and train the model for 100 epochs.
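The sampling scheme can be illustrated with a standalone NumPy sketch (the generator seed and row count below just mirror the tutorial's setup; it is not part of the pipeline itself):

```python
import numpy as np

rng = np.random.default_rng(1102)
n_rows = 2708  # number of CORA papers

# Boolean mask: roughly 10% of the rows become the labeled training set,
# the remaining ~90% are held out as the unlabeled evaluation set.
mask = rng.choice([True, False], size=n_rows, p=[0.10, 0.90])
print("labeled:", mask.sum(), "| unlabeled:", (~mask).sum())
```

Because the mask is redrawn at every iteration, each of the 50 runs trains on a different random 10% of the nodes, which gives a more robust accuracy estimate than a single split.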

It is important to point out that the scope of this tutorial is not training the most accurate model on the CORA dataset. Instead, we just want to provide an example of implementing custom models with Keras custom layers!

## Model 1: neural network with sequential layers

As baseline, we use a standard neural network with **sequential layers** (a familiar **keras sequential model**).

```python
# Model 1: standard sequential neural network
tf_set_seed(1102)
np_seed(1102)

model1 = Sequential([
    Dense(32, input_shape=input_shape, activation='relu'),
    Dense(16, activation='relu'),
    Dense(output_classes, activation='softmax')
], name="Model_1")

model1.save_weights("model1_initial_weights.h5")

model1.summary()
plot_model(model1, 'model1.png', show_shapes=True)
```

We can plot the model to see the sequential structure.

Let’s see how this model performs.

```python
# Testing model 1
tf_set_seed(1102)
np_seed(1102)

acc_model1 = []

for _ in range(iterations):

    mask = choice([True, False], size=data.shape[0], replace=True, p=[labeled_portion, 1-labeled_portion])
    labeled_data = data[mask, :]
    unlabeled_data = data[~mask, :]
    labeled_data_labels = labels[mask]
    unlabeled_data_labels = labels[~mask]

    model1.load_weights("model1_initial_weights.h5")

    model1.compile(
        optimizer='adam',
        loss=SparseCategoricalCrossentropy(from_logits=False),
        metrics=['accuracy']
    )

    model1.fit(labeled_data, labeled_data_labels, epochs=epochs, batch_size=batch_size, verbose=0)

    acc_model1.append(sum(argmax(model1.predict(unlabeled_data, batch_size=batch_size), axis=1) == unlabeled_data_labels) / len(unlabeled_data_labels) * 100)

print("\nAverage accuracy on unlabeled set:", mean(acc_model1), "%")
```

You should obtain an average accuracy of 55%.

## Model 2: neural network with parallel layers

Let’s introduce a small modification to the previous model. This time we want a network with two parallel hidden layers. We use the **Keras Functional API**. With the functional API we can build models with non-linear topology, models with shared layers, and models with multiple inputs or outputs. Basically, we need to assign each layer to a variable and then refer to that variable to connect further layers, building up a directed acyclic graph (DAG). The model can then be built by passing the input layer(s) and the output layer(s).

```python
# Model 2: neural network with parallel layers
tf_set_seed(1102)
np_seed(1102)

m2_input_layer = Input(shape=input_shape)
m2_dense_layer_1 = Dense(32, activation='relu')(m2_input_layer)
m2_dense_layer_2 = Dense(16, activation='relu')(m2_input_layer)
m2_merged_layer = Concatenate()([m2_dense_layer_1, m2_dense_layer_2])
m2_final_layer = Dense(output_classes, activation='softmax')(m2_merged_layer)

model2 = Model(inputs=m2_input_layer, outputs=m2_final_layer, name="Model_2")

model2.save_weights("model2_initial_weights.h5")

model2.summary()
plot_model(model2, 'model2.png', show_shapes=True)
```

The parallel layers *m2_dense_layer_1* and *m2_dense_layer_2* depend on the same input layer *m2_input_layer* and are then concatenated to form a unique layer in *m2_merged_layer*. This neural network should look like:

Let’s test this model.

```python
# Testing model 2
tf_set_seed(1102)
np_seed(1102)

acc_model2 = []

for _ in range(iterations):

    mask = choice([True, False], size=data.shape[0], replace=True, p=[labeled_portion, 1-labeled_portion])
    labeled_data = data[mask, :]
    unlabeled_data = data[~mask, :]
    labeled_data_labels = labels[mask]
    unlabeled_data_labels = labels[~mask]

    model2.load_weights("model2_initial_weights.h5")

    model2.compile(
        optimizer='adam',
        loss=SparseCategoricalCrossentropy(from_logits=False),
        metrics=['accuracy']
    )

    model2.fit(labeled_data, labeled_data_labels, epochs=epochs, batch_size=batch_size, shuffle=False, verbose=0)

    acc_model2.append(sum(argmax(model2.predict(unlabeled_data, batch_size=batch_size), axis=1) == unlabeled_data_labels) / len(unlabeled_data_labels) * 100)

print("\nAverage accuracy on unlabeled set:", mean(acc_model2), "%")
```

The average accuracy is nearly 60% (+5 percentage points over Model 1)!

## Model 3: neural network with graph conv layer

So far, we have seen how to create a custom network structure with the Keras Functional API. What if we need to define **custom layers** with user-defined operations? In our case, we would like to define a simple **graph convolutional layer** as explained at the beginning of this tutorial. To this end, we need to create a subclass of the class **Layer** and define the methods **__init__**, **build** and **call**.

```python
# Graph convolutional layer
class GraphConv(Layer):

    def __init__(self, num_outputs, A, activation="sigmoid", **kwargs):
        super(GraphConv, self).__init__(**kwargs)
        self.num_outputs = num_outputs
        self.activation_function = activation
        self.A = Variable(A, trainable=False)

    def build(self, input_shape):
        # Weights
        self.W = self.add_weight("W", shape=[int(input_shape[-1]), self.num_outputs])
        # Bias
        self.bias = self.add_weight("bias", shape=[self.num_outputs])

    def call(self, input):
        if self.activation_function == 'relu':
            return relu(k_dot(k_dot(self.A, input), self.W) + self.bias)
        else:
            return sigmoid(k_dot(k_dot(self.A, input), self.W) + self.bias)
```

During the initialization, you can require and save any useful parameter (e.g. the activation function, the number of output neurons). In our example, we also require the adjacency matrix *A*. In the build method, the trainable weights of the layer are initialized. In the call method, the forward-pass computation is declared.

As in the previous model, we define a network with parallel layers.

```python
# Model 3: neural network with graph convolutional layer
tf_set_seed(1102)
np_seed(1102)

m3_input_layer = Input(shape=input_shape)
m3_dense_layer = Dense(32, activation='relu')(m3_input_layer)
m3_gc_layer = GraphConv(16, A=A, activation='relu')(m3_input_layer)
m3_merged_layer = Concatenate()([m3_dense_layer, m3_gc_layer])
m3_final_layer = Dense(output_classes, activation='softmax')(m3_merged_layer)

model3 = Model(inputs=m3_input_layer, outputs=m3_final_layer, name="Model_3")

model3.save_weights("model3_initial_weights.h5")

model3.summary()
plot_model(model3, 'model3.png', show_shapes=True)
```

It looks like the previous model, but one layer is convolutional: the intrinsic features of each instance are concatenated with the aggregated features computed from the neighborhood.

Further attention should be paid when compiling this model. Since the convolutional layer requires the entire adjacency matrix, we need to pass the entire feature matrix (labeled and unlabeled instances), but the model should be trained only on the labeled instances. Therefore, we define a custom loss function where the sparse categorical cross-entropy is computed only on the labeled instances. Additionally, we randomize the labels of the unlabeled instances to make sure they are not used during training.
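The masking idea behind that custom loss can be shown with a standalone NumPy sketch (the mask, labels, and predicted probabilities below are toy values, and the cross-entropy is computed by hand rather than with the Keras loss object):

```python
import numpy as np

mask = np.array([True, False, True])   # which rows are labeled
y_true = np.array([0, 1, 1])           # labels (the unlabeled row holds a dummy)
y_pred = np.array([[0.9, 0.1],         # predicted class probabilities per row
                   [0.5, 0.5],
                   [0.2, 0.8]])

# Sparse categorical cross-entropy restricted to the labeled rows:
# pick each labeled row's probability for its true class, then average -log.
picked = y_pred[mask, y_true[mask]]
loss = -np.mean(np.log(picked))
print(round(loss, 4))  # 0.1643
```

The middle row contributes nothing to the loss, so its (randomized) label never influences the gradients; only its features flow through the adjacency matrix.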

```python
# Testing model 3
tf_set_seed(1102)
np_seed(1102)

acc_model3 = []

for i in range(iterations):

    mask = choice([True, False], size=data.shape[0], replace=True, p=[labeled_portion, 1-labeled_portion])
    unlabeled_data_labels = labels[~mask]

    # Randomize the labels of unlabeled instances
    masked_labels = labels.copy()
    masked_labels[~mask] = choice(range(7), size=sum(~mask), replace=True)

    model3.load_weights("model3_initial_weights.h5")

    model3.compile(
        optimizer='adam',
        loss=lambda y_true, y_pred: SparseCategoricalCrossentropy(from_logits=False)(y_true[mask], y_pred[mask]),
        metrics=['accuracy']
    )

    model3.fit(data, masked_labels, epochs=epochs, batch_size=batch_size, shuffle=False, verbose=0)

    predictions = argmax(model3.predict(data, batch_size=batch_size), axis=1)
    acc_model3.append(sum(predictions[~mask] == unlabeled_data_labels) / len(unlabeled_data_labels) * 100)

print("\nAverage accuracy on unlabeled set:", mean(acc_model3), "%")
```

This experiment produces an average accuracy of 63% (+3 percentage points over Model 2).

Interestingly, in this last experiment we are basically performing **semi-supervised learning with graph CNN**: the information from the unlabeled instances is used together with the labeled ones to build a **graph-based transductive model**.

The complete Jupyter Notebook containing the code can be found here.