1 INTRODUCTION
The graph attention network is abbreviated GAT rather than by its strict initials, which would be easy to confuse with the generative adversarial network (GAN). Throughout the rest of this article, GAT refers to the graph attention network.
Paper: https://arxiv.org/abs/1710.10903
Code: https://github.com/Diego999/pyGAT
1.1 Related Work
- Semi-Supervised Classification with Graph Convolutional Networks, ICLR 2017: graph convolutional networks (GCN)
- Graph Attention Networks, ICLR 2018: graph attention networks (GAT)
- Relational Graph Attention Networks, ICLR 2019: relational graph attention networks, integrating GCN + attention + relational modeling
1.2 Purpose of Introducing Attention
- Assign a different weight to each node
- Focus on the nodes that contribute the most and pay less attention to those that contribute little
- While processing local information, attention can also capture global information: rather than merely weighting the individual nodes taking part in the computation, it can represent a global context that participates in the computation
2 GAT ARCHITECTURE
The authors propose GAT, which stacks masked self-attention layers to address several shortcomings of graph convolution. Without costly matrix operations or prior knowledge of the full graph structure, the stacked self-attention layers let the model assign different importance to different nodes within a neighborhood during the convolution, while handling neighborhoods of varying size. The authors design experiments in both transductive and inductive settings, and GAT achieves state-of-the-art results on the Cora, Citeseer, and Pubmed citation benchmarks as well as on the PPI dataset.
2.1 Graph Attentional Layer
Like any attention mechanism, GAT's computation has two steps: computing the attention coefficients and performing the weighted aggregation (aggregate).
2.2 Computing the Attention Coefficients
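Following the GAT paper, the unnormalized attention score between node $i$ and a neighbor $j$ is computed from their linearly transformed features and then normalized with a softmax over the neighborhood $\mathcal{N}_i$:

$$e_{ij} = \mathrm{LeakyReLU}\!\left(\vec{a}^{\top}\left[W\vec{h}_i \,\Vert\, W\vec{h}_j\right]\right),\qquad \alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k\in\mathcal{N}_i}\exp(e_{ik})}$$

Both implementations below compute exactly this: a shared linear transform $W$, pairwise concatenation, a single-layer scorer $\vec{a}$ with LeakyReLU, and a softmax over neighbors.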
```python
def forward(self, x):
    # x: [B, N, C] = [batch, nodes, in_features]
    B, N, C = x.size()
    # Linear transformation of node features: [B, N, out_features]
    h = torch.matmul(x, self.W)
    # Concatenate the features of every node pair (i, j): [B, N, N, 2 * out_features]
    a_input = torch.cat([h.repeat(1, 1, N).view(B, N * N, self.out_features),
                         h.repeat(1, N, 1)], dim=2).view(B, N, N, 2 * self.out_features)
    # Unnormalized attention scores e_ij: [B, N, N]
    attention = self.leakyrelu(torch.matmul(a_input, self.a).squeeze(3))
    # Normalize each node's scores with a softmax over its neighbors
    attention = F.softmax(attention, dim=2)
    attention = F.dropout(attention, self.dropout, training=self.training)
    # Weighted aggregation: [B, N, N] x [B, N, out_features] -> [B, N, out_features]
    h_prime = torch.bmm(attention, h)
    # Scaled residual connection followed by ELU
    out = F.elu(h_prime + self.beta * h)
    return out
```
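The snippet above only shows `forward`; the constructor that defines `self.W`, `self.a`, `self.leakyrelu`, `self.dropout`, and `self.beta` is not included in the original post. A minimal sketch of what it could look like (the class name `BatchGATLayer` and the default hyperparameters are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BatchGATLayer(nn.Module):
    """Hypothetical container for the batched forward() above."""
    def __init__(self, in_features, out_features, dropout=0.6, negative_slope=0.2, beta=0.05):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.dropout = dropout
        self.beta = beta  # scales the residual term in F.elu(h_prime + self.beta * h)
        self.W = nn.Parameter(torch.empty(in_features, out_features))  # shared linear transform
        self.a = nn.Parameter(torch.empty(2 * out_features, 1))        # attention vector a
        nn.init.xavier_uniform_(self.W)
        nn.init.xavier_uniform_(self.a)
        self.leakyrelu = nn.LeakyReLU(negative_slope)
```

An equivalent message-passing implementation of the graph attention layer in DGL (from the DGL blog listed in the references) follows: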
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GATLayer(nn.Module):
    def __init__(self, g, in_dim, out_dim):
        super(GATLayer, self).__init__()
        self.g = g
        # equation (1)
        self.fc = nn.Linear(in_dim, out_dim, bias=False)
        # equation (2)
        self.attn_fc = nn.Linear(2 * out_dim, 1, bias=False)

    def edge_attention(self, edges):
        # edge UDF for equation (2)
        z2 = torch.cat([edges.src['z'], edges.dst['z']], dim=1)
        a = self.attn_fc(z2)
        return {'e': F.leaky_relu(a)}

    def message_func(self, edges):
        # message UDF for equation (3) & (4)
        return {'z': edges.src['z'], 'e': edges.data['e']}

    def reduce_func(self, nodes):
        # reduce UDF for equation (3) & (4)
        # equation (3)
        alpha = F.softmax(nodes.mailbox['e'], dim=1)
        # equation (4)
        h = torch.sum(alpha * nodes.mailbox['z'], dim=1)
        return {'h': h}

    def forward(self, h):
        # equation (1)
        z = self.fc(h)
        self.g.ndata['z'] = z
        # equation (2)
        self.g.apply_edges(self.edge_attention)
        # equation (3) & (4)
        self.g.update_all(self.message_func, self.reduce_func)
        return self.g.ndata.pop('h')
```
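A minimal usage sketch of this layer (assuming DGL ≥ 0.5; the toy graph, feature dimensions, and variable names are made up for illustration):

```python
import dgl
import torch

# Toy directed graph for illustration: 4 nodes, edges given as (src, dst) lists
src = [0, 1, 2, 3, 0, 2]
dst = [1, 0, 3, 2, 2, 0]
g = dgl.graph((src, dst))
g = dgl.add_self_loop(g)  # let each node also attend to itself, as GAT usually does

layer = GATLayer(g, in_dim=5, out_dim=8)
h = torch.randn(g.num_nodes(), 5)  # random input node features
out = layer(h)
print(out.shape)  # torch.Size([4, 8])
```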
Weighted aggregation (aggregate)
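In the second step, each neighbor's transformed feature is weighted by its normalized attention coefficient and summed, followed by a nonlinearity $\sigma$; this is exactly what `torch.bmm(attention, h)` and the `reduce_func` above compute:

$$\vec{h}_i' = \sigma\!\left(\sum_{j\in\mathcal{N}_i}\alpha_{ij}\, W\vec{h}_j\right)$$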
2.3 Multi-head Attention
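To stabilize training, the paper runs $K$ independent attention heads in parallel and concatenates their outputs (in the final layer the heads are averaged instead of concatenated):

$$\vec{h}_i' = \Big\Vert_{k=1}^{K}\, \sigma\!\left(\sum_{j\in\mathcal{N}_i}\alpha_{ij}^{k}\, W^{k}\vec{h}_j\right)$$

Building on the DGL `GATLayer` above, a multi-head wrapper could look like the following sketch (the class name `MultiHeadGATLayer` and the `merge` argument are assumptions, not part of the original post):

```python
import torch
import torch.nn as nn


class MultiHeadGATLayer(nn.Module):
    def __init__(self, g, in_dim, out_dim, num_heads, merge='cat'):
        super(MultiHeadGATLayer, self).__init__()
        # K independent attention heads, each an instance of the GATLayer defined above
        self.heads = nn.ModuleList([GATLayer(g, in_dim, out_dim) for _ in range(num_heads)])
        self.merge = merge

    def forward(self, h):
        head_outs = [head(h) for head in self.heads]
        if self.merge == 'cat':
            # hidden layers: concatenate head outputs along the feature dimension
            return torch.cat(head_outs, dim=1)
        # output layer: average the head outputs instead of concatenating
        return torch.stack(head_outs, dim=0).mean(dim=0)
```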
Conclusion
This paper proposes a graph model based on self-attention. Overall, GAT has two main characteristics:
- Like GCN, GAT is a local model: training a GAT (in contrast to networks such as GNN or GGNN) does not require knowledge of the entire graph structure, only of each node's neighbors.
- Unlike GCN, the attention mechanism lets GAT assign a different, learned importance to each neighbor of a node instead of weights fixed by the graph structure.
3 Using Similarity
Paper: Attention-based Graph Neural Network for semi-supervised learning
Code: dawnranger/pytorch-AGNN
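In this paper (AGNN), the attention coefficients are not learned from concatenated node features as in GAT; instead, each propagation layer derives them from the cosine similarity of the hidden node representations, roughly:

$$P_{ij} = \frac{\exp\left(\beta \cos(H_i, H_j)\right)}{\sum_{k\in\mathcal{N}(i)\cup\{i\}} \exp\left(\beta \cos(H_i, H_k)\right)}$$

where $\beta$ is a learned scalar per propagation layer.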
References:
1. DGL blog | Understanding the Graph Attention Mechanism