
PyTorch Toolbox

Nets

Linear / MLP

PyTorch Document - Linear

  • Initialization Parameters
    • in_features
    • out_features
    • bias=True
  • input.shape: (*, in_features)
  • output.shape: (*, out_features)
import torch
import torch.nn as nn


if __name__ == '__main__':
    in_features = 64
    out_features = 32

    linear_layer = nn.Linear(in_features, out_features)

    batch_size = 128
    input = torch.rand(batch_size, in_features)    
    output = linear_layer(input)

    print(f"input.shape: {input.shape}\noutput.shape: {output.shape}")
input.shape: torch.Size([128, 64])
output.shape: torch.Size([128, 32])
import torch
import torch.nn as nn


if __name__ == '__main__':
    in_features = 64
    out_features = 32

    linear_layer = nn.Linear(in_features, out_features)

    batch_size1 = 256
    batch_size2 = 128
    input = torch.rand(batch_size1, batch_size2, in_features)
    output = linear_layer(input)

    print(f"input.shape: {input.shape}\noutput.shape: {output.shape}")

input.shape: torch.Size([256, 128, 64])
output.shape: torch.Size([256, 128, 32])

Convolutional Neural Network (CNN)

PyTorch Document - Conv2d

  • Initialization Parameters
    • in_channels / input_channels
    • out_channels / n_filters
    • kernel_size: 5 <=> (5,5)
    • stride: 5 <=> (5,5)
    • padding=int((kernel - 1) / 2): padding that keeps the height and width of the input unchanged (for stride=1 and an odd kernel size): kernel=3, padding=1; kernel=5, padding=2; …
  • input.shape
    • batch_size
    • input_channels
    • input_height
    • input_width
  • output.shape (height and width unchanged here because of the padding choice above)
    • batch_size
    • out_channels / n_filters
    • input_height
    • input_width
import torch
import torch.nn as nn


if __name__ == '__main__':

    input_channels = 26
    n_filters      = 3  # out_channels
    kernel_size    = 5
    stride         = 1

    # padding for keeping the width and height of input unchanged: kernel = 3, padding = 1; kernel = 5, padding = 2; ...
    conv_layer = nn.Conv2d(input_channels, n_filters, kernel_size, stride,
                           padding=int((kernel_size - 1) / 2))

    batch_size = 128
    map_height = 4
    map_width  = 13

    input = torch.rand(batch_size, input_channels,
                       map_height, map_width)

    output = conv_layer(input)

    print(f"input.shape: {input.shape}\noutput.shape: {output.shape}")
input.shape: torch.Size([128, 26, 4, 13])
output.shape: torch.Size([128, 3, 4, 13])

Flatten

PyTorch Document - Flatten

  • Initialization Parameters
    • start_dim=1
    • end_dim=-1
  • input.shape: $(*, S_{\text{start\_dim}}, \dots, S_{\text{end\_dim}}, *)$
  • output.shape: $(*, \prod_{i=\text{start\_dim}}^{\text{end\_dim}} S_i, *)$
import torch
import torch.nn as nn


if __name__ == '__main__':
    batch_size = 256
    input_channels = 4
    input_width = 64
    input_height = 128

    flatten_layer = nn.Flatten()

    input = torch.rand(batch_size, input_channels, input_width, input_height)
    output = flatten_layer(input)

    print(f"input.shape: {input.shape}\noutput.shape: {output.shape}")

    flatten_layer = nn.Flatten(0, 2)

    input = torch.rand(batch_size, input_channels, input_width, input_height)
    output = flatten_layer(input)

    print(f"input.shape: {input.shape}\noutput.shape: {output.shape}")
input.shape: torch.Size([256, 4, 64, 128])
output.shape: torch.Size([256, 32768])
input.shape: torch.Size([256, 4, 64, 128])
output.shape: torch.Size([65536, 128])

Dropout

Generated by ChatGPT-4

How Dropout Works

Dropout is a regularization technique used to prevent neural networks from overfitting. During training it randomly "drops" (switches off) a fraction of the neurons, i.e. sets their outputs to 0. This reduces co-dependence between neurons and encourages each neuron to learn features independently.

Concretely, dropout works as follows:

  1. For each training sample, during the forward pass every neuron is set to 0 with probability p.
  2. During the backward pass, the neurons that were set to 0 do not update their weights.
  3. At test/validation time dropout is not applied; to compensate for the change in output scale caused by dropout, the neuron outputs are multiplied by (1-p). (PyTorch's nn.Dropout actually implements "inverted dropout": it scales the kept activations by 1/(1-p) during training, so no rescaling is needed at evaluation time.)

Dropout Example in PyTorch

Below is a simple example using PyTorch's nn.Dropout module:

import torch
import torch.nn as nn

class SimpleNN(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(SimpleNN, self).__init__()
        
        # Define a simple three-layer network that contains a dropout layer
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.dropout = nn.Dropout(p=0.5)  # dropout probability of 0.5
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.dropout(x)  # apply dropout after the hidden layer
        x = self.fc2(x)
        return x

# Create a simple model instance
model = SimpleNN(input_dim=10, hidden_dim=20, output_dim=2)
input_tensor = torch.randn(5, 10)  # create a random 5x10 input tensor
output = model(input_tensor)
print(output)
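
A small follow-up sketch (my own addition, not from the original post): nn.Dropout is only active in training mode, and it uses inverted dropout, scaling the kept activations by 1/(1-p) during training so that evaluation needs no rescaling.

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()   # training mode: roughly half the entries are zeroed, the rest scaled by 1/(1-p) = 2
print(drop(x))

drop.eval()    # evaluation mode: dropout is a no-op, the input passes through unchanged
print(drop(x))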

Utils

Set Seed

import os
import random

import numpy as np
import torch


def all_seed(env, seed=1):
    env.seed(seed)  # env config (for a gym-style environment)
    np.random.seed(seed)
    random.seed(seed)
    torch.manual_seed(seed)  # config for CPU
    torch.cuda.manual_seed(seed)  # config for GPU
    os.environ['PYTHONHASHSEED'] = str(seed)  # config for python scripts
    # config for cudnn
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.enabled = False

Mask

As indices

import torch

data = torch.arange(5)  # tensor([0, 1, 2, 3, 4])
mask = data <= 2  # tensor([ True,  True,  True, False, False]); any condition is ok
data[mask] = 0  # tensor([0, 0, 0, 3, 4])
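
A boolean mask can also be used to gather elements or to recover the integer indices behind it; a minimal sketch (my own addition):

import torch

data = torch.arange(5)
mask = data <= 2

selected = data[mask]                      # tensor([0, 1, 2]): gather the masked elements
indices = mask.nonzero(as_tuple=True)[0]   # tensor([0, 1, 2]): the integer indices the mask corresponds to
print(selected, indices)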

Retain gradients

import torch

data_shape = 5, 3
data = torch.arange(15, dtype=torch.float64).view(data_shape).requires_grad_(True)

mask = data <= 6  # any condition is ok
data_masked = data * mask

loss = data_masked.sum()
loss.backward()
grad1 = data_masked.grad  # None: data_masked is a non-leaf tensor, so its .grad is not retained
grad2 = data.grad         # populated: data is the leaf tensor that requires grad

'''
data_masked: 
tensor([[0., 1., 2.],
        [3., 4., 5.],
        [6., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]], dtype=torch.float64, grad_fn=<MulBackward0>)

data_masked.grad: None

data.grad:
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]], dtype=torch.float64)
'''
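
As the subsection title suggests, the non-leaf gradient can be kept by calling retain_grad() before backward(); a minimal sketch (my own addition, not part of the output above):

import torch

data = torch.arange(15, dtype=torch.float64).view(5, 3).requires_grad_(True)
data_masked = data * (data <= 6)
data_masked.retain_grad()      # ask autograd to keep .grad on this non-leaf tensor
data_masked.sum().backward()
print(data_masked.grad)        # a 5x3 tensor of ones instead of None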

Per-Sample Gradient

  • $\mathrm{batch\_size} = n$
  • $\boldsymbol x \to \mathrm{net}(\boldsymbol w) \to \boldsymbol y \to \boldsymbol L \to L_{\mathrm{scalar}}$
    • $\boldsymbol w \gets \boldsymbol w - \frac{\alpha}{n}\cdot \frac{\partial L_{\mathrm{scalar}}}{\partial L_i} \cdot \frac{\partial L_i}{\partial \boldsymbol w}$
  • Computing the per-sample gradients with a plain Python for loop is very slow (see the loop sketch after this list).
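
For reference, the slow for-loop version mentioned above looks roughly like this (a sketch with a hypothetical tiny linear model, my own addition):

import torch
import torch.nn as nn

net = nn.Linear(4, 1)
x = torch.randn(8, 4)   # batch_size n = 8
y = torch.randn(8, 1)

per_sample_grads = []
for i in range(x.shape[0]):
    net.zero_grad()
    loss_i = ((net(x[i:i + 1]) - y[i:i + 1]) ** 2).sum()   # L_i for sample i
    loss_i.backward()
    per_sample_grads.append(net.weight.grad.detach().clone())

print(len(per_sample_grads), per_sample_grads[0].shape)    # 8 torch.Size([1, 4])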

Hook

  • In PyTorch you can define your own hook function and register it on an nn.Module
    • once registered, the nn.Module triggers the hook during forward
    • you can also choose to have it triggered during backward
  • the hook signature is fixed: (module, grad_input, grad_output)
    • when the hook fires, these 3 arguments for the current call are collected automatically, so hooks can be used to gather intermediate quantities (see the backward-hook sketch after this list)
    • grad_input is the gradient of the back-propagated quantity with respect to the module's input
  • $\frac{\partial L}{\partial w} = \sum\limits_i \frac{\partial L}{\partial L_i} \cdot\frac{\partial L_i}{\partial y_i}\cdot\frac{\partial y_i}{\partial w}$
    • $\frac{\partial L}{\partial L_i} \cdot\frac{\partial L_i}{\partial y_i}=\mathrm{grad\_output}$
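
A minimal backward-hook sketch (my own example), using register_full_backward_hook, which passes exactly the (module, grad_input, grad_output) arguments described above:

import torch
import torch.nn as nn

captured = []

def hook_fn(module, grad_input, grad_output):
    # grad_output[0] is dL/dy for this module; grad_input holds dL/d(input)
    captured.append(grad_output[0].detach().clone())

layer = nn.Linear(4, 2)
handle = layer.register_full_backward_hook(hook_fn)

x = torch.randn(8, 4)
layer(x).sum().backward()

print(captured[0].shape)   # torch.Size([8, 2]): one grad_output row per sample
handle.remove()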

Opacus

  • a library that enables differentially private training of PyTorch models
  • DP-SGD (Differentially-Private Stochastic Gradient Descent)
    • the gradient of the loss w.r.t. each individual sample is clipped, and noise is added
    • so per-sample gradients are required
  • Opacus also uses hooks for this, but everything is wrapped up so it can be used directly (see the sketch after this list)
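
A rough sketch of how Opacus is typically wired in, based on the Opacus 1.x make_private API (treat the argument names as assumptions; they may differ between versions):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data_loader = DataLoader(TensorDataset(torch.randn(64, 4), torch.randn(64, 2)), batch_size=16)

privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.0,   # scale of the added Gaussian noise
    max_grad_norm=1.0,      # per-sample gradient clipping threshold
)
# training then proceeds as usual; per-sample clipping and noising happen inside the optimizer step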

vmap

  • v = vectorization
  • new_function = vmap(function to apply in batch, the dimension along which the inputs are split)
  • batched result = new_function(batched inputs of the original function)
  • to compute gradients in batch, pass a gradient-computing function to vmap (see the sketch after this list)
  • vmap does not support autograd directly, but there are functional replacements
  • the details are written in the experiment-progress notes of 22.9.14
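
A minimal per-sample-gradient sketch with vmap, assuming PyTorch >= 2.0 where vmap and grad live in torch.func (a toy loss of my own, not the setup from the 22.9.14 notes):

import torch
from torch.func import grad, vmap

def loss_fn(w, x, y):
    return ((x * w).sum() - y) ** 2   # toy per-sample loss

w = torch.randn(3)        # shared parameters
xs = torch.randn(8, 3)    # a batch of 8 inputs
ys = torch.randn(8)       # a batch of 8 targets

# grad(loss_fn) differentiates w.r.t. the first argument (w);
# vmap maps it over dim 0 of xs and ys while keeping w shared (None).
per_sample_grads = vmap(grad(loss_fn), in_dims=(None, 0, 0))(w, xs, ys)
print(per_sample_grads.shape)   # torch.Size([8, 3])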

Memo

Reshape v.s. Transpose

import torch

if __name__ == '__main__':
    x = torch.tensor([[1, 2, 3, 4],
                      [5, 6, 7, 8]])

    shape0 = x.shape
    x2 = x.reshape(shape0[1], shape0[0])
    x3 = x.transpose(0, 1)
    print(f"x2: {x2}\nx3: {x3}")
x2: tensor([[1, 2],
        [3, 4],
        [5, 6],
        [7, 8]])
x3: tensor([[1, 5],
        [2, 6],
        [3, 7],
        [4, 8]])
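
The difference comes from element order: reshape keeps the row-major order of the elements and only reinterprets the shape, while transpose swaps the axes (a view with swapped strides). A small check (my own addition):

import torch

x = torch.tensor([[1, 2, 3, 4],
                  [5, 6, 7, 8]])

print(x.reshape(4, 2).flatten())    # tensor([1, 2, 3, 4, 5, 6, 7, 8]): original order kept
print(x.transpose(0, 1).flatten())  # tensor([1, 5, 2, 6, 3, 7, 4, 8]): order changed by the axis swap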

to()

import torch
import torch.nn as nn


class mynet_class(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv_layer = nn.Sequential(
            nn.Conv2d(27, 4, 6, 1, padding=2,), nn.ReLU(),)

        self.action_mlp_layer = nn.Sequential(
            nn.Linear(1, 32), nn.ReLU(),
            nn.Linear(32, 32), nn.ReLU(),
        )

        # self.to(device, torch.double) # This works, too.


if __name__ == '__main__':
    mynet = mynet_class()

    conv_layer_dtype = next(mynet.conv_layer.parameters()).dtype
    action_mlp_layer_dtype = next(mynet.action_mlp_layer.parameters()).dtype
    print('conv_layer dtype:', conv_layer_dtype)
    print('action_mlp_layer dtype:', action_mlp_layer_dtype)

    mynet.to(torch.double)

    conv_layer_dtype = next(mynet.conv_layer.parameters()).dtype
    action_mlp_layer_dtype = next(mynet.action_mlp_layer.parameters()).dtype
    print('conv_layer dtype:', conv_layer_dtype)
    print('action_mlp_layer dtype:', action_mlp_layer_dtype)

    # mynet.to(device='cpu:1')
    # device = next(mynet.parameters()).device
    # print(device)
    print('haha')
conv_layer dtype: torch.float32
action_mlp_layer dtype: torch.float32
conv_layer dtype: torch.float64
action_mlp_layer dtype: torch.float64
haha

When you call self.to(torch.double) inside a class that inherits from nn.Module, the call converts the module, including all of its parameters and buffers, to the given data type, here torch.double. In this case self.to(torch.double) is effectively an in-place conversion of all parameters and buffers, because nn.Module's to() method is designed to walk over every parameter and buffer in the module and convert it to the given device or data type.

This behaviour differs from the .to() method of a single tensor. In the module context, self.to(torch.double) does not just return a converted copy; it actually changes the data type of all parameters and buffers inside the module itself. Calling self.to(torch.double) during model initialization therefore ensures that every parameter of the model is converted to torch.double.

import torch

a = torch.tensor([1, 2, 3])
# a.dtype: torch.int64
a.to(torch.float64)       # for a plain tensor, .to() is NOT in-place; the converted result is discarded here
# a.dtype: torch.int64
a = a.to(torch.float64)   # rebind the name to the converted tensor
# a.dtype: torch.float64