PyTorch Toolbox
Nets
Linear / MLP
- Initialization parameters: in_features, out_features, bias=True
- input.shape: (*, in_features)
- output.shape: (*, out_features)
import torch
import torch.nn as nn

if __name__ == '__main__':
    in_features = 64
    out_features = 32
    linear_layer = nn.Linear(in_features, out_features)

    batch_size = 128
    input = torch.rand(batch_size, in_features)
    output = linear_layer(input)
    print(f"input.shape: {input.shape}\noutput.shape: {output.shape}")
input.shape: torch.Size([128, 64])
output.shape: torch.Size([128, 32])
import torch
import torch.nn as nn

if __name__ == '__main__':
    in_features = 64
    out_features = 32
    linear_layer = nn.Linear(in_features, out_features)

    batch_size1 = 256
    batch_size2 = 128
    input = torch.rand(batch_size1, batch_size2, in_features)
    output = linear_layer(input)
    print(f"input.shape: {input.shape}\noutput.shape: {output.shape}")
input.shape: torch.Size([256, 128, 64])
output.shape: torch.Size([256, 128, 32])
Convolutional Neural Network (CNN)
- Initialization parameters
  - in_channels / input_channels
  - out_channels / n_filters
  - kernel_size: 5 <=> (5, 5)
  - stride: 5 <=> (5, 5)
  - padding=int((kernel_size - 1) / 2): padding that keeps the width and height of the input unchanged: kernel=3, padding=1; kernel=5, padding=2; …
- input.shape: (batch_size, input_channels, input_height, input_width)
- output.shape: (batch_size, out_channels/n_filters, input_height, input_width)
import torch
import torch.nn as nn

if __name__ == '__main__':
    input_channels = 26
    n_filters = 3  # out_channels
    kernel_size = 5
    stride = 1
    # padding for keeping the width and height of input unchanged: kernel = 3, padding = 1; kernel = 5, padding = 2; ...
    conv_layer = nn.Conv2d(input_channels, n_filters, kernel_size, stride,
                           padding=int((kernel_size - 1) / 2))

    batch_size = 128
    map_height = 4
    map_width = 13
    input = torch.rand(batch_size, input_channels,
                       map_height, map_width)
    output = conv_layer(input)
    print(f"input.shape: {input.shape}\noutput.shape: {output.shape}")
input.shape: torch.Size([128, 26, 4, 13])
output.shape: torch.Size([128, 3, 4, 13])
Flatten
- Initialization parameters: start_dim=1, end_dim=-1
- input.shape: $(*, S_{\mathrm{start\_dim}}, \dots, S_{\mathrm{end\_dim}}, *)$
- output.shape: $(*, \prod_{i=\mathrm{start}}^{\mathrm{end}} S_i, *)$
import torch
import torch.nn as nn

if __name__ == '__main__':
    batch_size = 256
    input_channels = 4
    input_width = 64
    input_height = 128

    flatten_layer = nn.Flatten()
    input = torch.rand(batch_size, input_channels, input_width, input_height)
    output = flatten_layer(input)
    print(f"input.shape: {input.shape}\noutput.shape: {output.shape}")

    flatten_layer = nn.Flatten(0, 2)
    input = torch.rand(batch_size, input_channels, input_width, input_height)
    output = flatten_layer(input)
    print(f"input.shape: {input.shape}\noutput.shape: {output.shape}")
input.shape: torch.Size([256, 4, 64, 128])
output.shape: torch.Size([256, 32768])
input.shape: torch.Size([256, 4, 64, 128])
output.shape: torch.Size([65536, 128])
Dropout
Generated by ChatGPT-4
How Dropout works:
Dropout is a regularization technique used to prevent neural networks from overfitting. During training it randomly "drops" (turns off) a fraction of the neurons, i.e. sets their outputs to 0. This reduces co-adaptation between neurons and encourages each neuron to learn features independently.
Concretely, Dropout works as follows:
- For each training sample, during the forward pass, every neuron is set to 0 with probability p.
- During the backward pass, the neurons that were set to 0 do not update their weights.
- At test/validation time dropout is not applied; to compensate for the change in output scale caused by dropout, the neuron outputs are multiplied by (1-p). (Note that PyTorch's nn.Dropout uses inverted dropout: it scales the kept activations by 1/(1-p) during training and is a no-op in eval mode.)
Dropout example in PyTorch:
Below is a simple example using PyTorch's nn.Dropout module:
import torch
import torch.nn as nn

class SimpleNN(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(SimpleNN, self).__init__()
        # A simple network with one hidden layer and one dropout layer
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.dropout = nn.Dropout(p=0.5)  # dropout probability 0.5
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.dropout(x)  # apply dropout after the hidden layer
        x = self.fc2(x)
        return x

# Create a simple model instance
model = SimpleNN(input_dim=10, hidden_dim=20, output_dim=2)
input_tensor = torch.randn(5, 10)  # a random 5x10 input tensor
output = model(input_tensor)
print(output)
Utils
Set Seed
import os
import random
import numpy as np
import torch

def all_seed(env, seed=1):
    env.seed(seed)  # env config (e.g. a gym environment)
    np.random.seed(seed)
    random.seed(seed)
    torch.manual_seed(seed)  # config for CPU
    torch.cuda.manual_seed(seed)  # config for GPU
    os.environ['PYTHONHASHSEED'] = str(seed)  # config for Python scripts
    # config for cuDNN
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.enabled = False
Mask
As indices
import torch
data = torch.arange(5) # tensor([0, 1, 2, 3, 4])
mask = data <= 2 # tensor([ True, True, True, False, False]); any condition is ok
data[mask] = 0 # tensor([0, 0, 0, 3, 4])
Retain gradients
import torch

data_shape = 5, 3
data = torch.arange(15, dtype=torch.float64).view(data_shape).requires_grad_(True)
mask = data <= 6  # any condition is ok
data_masked = data * mask
# data_masked is a non-leaf tensor: call data_masked.retain_grad() before
# backward() if you want its .grad to be populated.
loss = data_masked.sum()
loss.backward()
grad1 = data_masked.grad  # None (non-leaf tensor without retain_grad())
grad2 = data.grad         # gradient flows only through the unmasked entries
'''
data_masked:
tensor([[0., 1., 2.],
[3., 4., 5.],
[6., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]], dtype=torch.float64, grad_fn=<MulBackward0>)
data_masked.grad: None
data.grad:
tensor([[1., 1., 1.],
[1., 1., 1.],
[1., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]], dtype=torch.float64)
'''
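A minimal sketch of actually retaining the intermediate gradient, using the same mask setup as above: calling retain_grad() on the non-leaf tensor before backward() makes its .grad available.

import torch

data = torch.arange(15, dtype=torch.float64).view(5, 3).requires_grad_(True)
data_masked = data * (data <= 6)
data_masked.retain_grad()       # ask autograd to store the non-leaf gradient
data_masked.sum().backward()
print(data_masked.grad)         # a 5x3 tensor of ones instead of None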
Per-Sample Gradient
- $\mathrm{batch\_size} = n$
- $\boldsymbol x \to \mathrm{net}(\boldsymbol w) \to \boldsymbol y \to \boldsymbol L \to L_{scalar}$
- $\boldsymbol w \gets \boldsymbol w + \frac{\alpha}{n}\cdot \frac{\partial L_{scalar}}{\partial L_i} \cdot \frac{\partial L_i}{\partial \boldsymbol w}$
- Accomplishing this with a for loop over the samples costs a lot of time (see the sketch after this list).
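For illustration, a naive per-sample-gradient sketch with a plain for loop (the model, loss, and shapes here are illustrative assumptions, not from the original notes). It is correct but slow, which is why the hook- and vmap-based approaches below are preferred.

import torch
import torch.nn as nn

model = nn.Linear(4, 1)
x = torch.randn(8, 4)              # a batch of n = 8 samples
y = torch.randn(8, 1)

per_sample_grads = []
for i in range(x.shape[0]):
    model.zero_grad()
    loss_i = ((model(x[i:i + 1]) - y[i:i + 1]) ** 2).mean()  # L_i
    loss_i.backward()
    per_sample_grads.append(model.weight.grad.clone())       # dL_i / dw

print(torch.stack(per_sample_grads).shape)  # torch.Size([8, 1, 4])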
Hook
- In PyTorch you can define your own hook function and register it on an nn.Module.
- Once registered, the nn.Module triggers the hook during the forward pass.
- You can also choose to have the hook triggered during the backward pass instead.
- The hook signature is fixed: a backward hook receives (module, grad_input, grad_output); a forward hook receives (module, input, output).
- When the hook fires, these arguments are collected automatically for the current call, so hooks can be used to collect intermediate quantities (a minimal sketch follows this list).
- grad_input is the gradient of the backpropagated quantity with respect to the module's input.
- $\frac{\partial L}{\partial w} = \sum\limits_i \frac{\partial L}{\partial L_i} \cdot\frac{\partial L_i}{\partial y_i}\cdot\frac{\partial y_i}{\partial w}$
- $\frac{\partial L}{\partial L_i} \cdot\frac{\partial L_i}{\partial y_i}=\mathrm{grad\_output}$
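A minimal sketch of capturing grad_output with a backward hook (the layer, shapes, and variable names are illustrative assumptions):

import torch
import torch.nn as nn

captured = {}

def save_grad_output(module, grad_input, grad_output):
    # grad_output is a tuple; grad_output[0] has shape (batch_size, out_features)
    # and equals dL/dy for every sample in the batch.
    captured['grad_output'] = grad_output[0].detach()

layer = nn.Linear(4, 3)
layer.register_full_backward_hook(save_grad_output)

x = torch.randn(8, 4)
layer(x).sum().backward()
print(captured['grad_output'].shape)  # torch.Size([8, 3])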
Opacus
- A library that enables differential privacy when training models with PyTorch.
- DP-SGD (Differentially-Private Stochastic Gradient Descent)
- The gradient of the loss for each sample has to be clipped, and noise is then added.
- So per-sample gradients are required.
- Opacus also uses hooks to do this, but everything is wrapped up and can be used directly (a rough sketch follows this list).
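A rough sketch of that wrapped-up usage, assuming Opacus >= 1.0; the model, data, and parameter values are placeholders, and the exact API should be checked against the Opacus documentation:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data_loader = DataLoader(TensorDataset(torch.randn(32, 4), torch.randn(32, 1)),
                         batch_size=8)

privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.0,  # amount of noise added to the clipped gradients
    max_grad_norm=1.0,     # per-sample gradient clipping threshold
)

# Training then proceeds as usual; per-sample clipping and noising happen
# inside optimizer.step().
for x, y in data_loader:
    optimizer.zero_grad()
    loss = ((model(x) - y) ** 2).mean()
    loss.backward()
    optimizer.step()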
vmap
- v = vectorization
- new_function = vmap(the function to run in batch, the dimension along which the inputs are split)
- batched result = new_function(the batched inputs of the original function)
- To compute gradients in batch, pass a gradient-computing function to vmap.
- vmap does not support autograd, but there are replacement functions for it (see the sketch after this list).
- The details are written in the 22.9.14 experiment-progress notes.
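As one concrete possibility (an assumption, not necessarily what the 22.9.14 notes used): with torch.func in PyTorch >= 2.0, grad() takes the place of autograd inside vmap, which yields per-sample gradients directly.

import torch
import torch.nn as nn
from torch.func import functional_call, grad, vmap

model = nn.Linear(4, 1)
params = {k: v.detach() for k, v in model.named_parameters()}
x = torch.randn(8, 4)
y = torch.randn(8, 1)

def sample_loss(params, xi, yi):
    # loss L_i for a single sample (xi, yi)
    pred = functional_call(model, params, (xi.unsqueeze(0),))
    return ((pred - yi.unsqueeze(0)) ** 2).mean()

# grad(sample_loss) differentiates w.r.t. params; vmap maps it over dim 0 of x and y.
per_sample_grads = vmap(grad(sample_loss), in_dims=(None, 0, 0))(params, x, y)
print(per_sample_grads['weight'].shape)  # torch.Size([8, 1, 4])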
Memo
Reshape vs. Transpose
import torch

if __name__ == '__main__':
    x = torch.tensor([[1, 2, 3, 4],
                      [5, 6, 7, 8]])
    shape0 = x.shape
    # reshape keeps the row-major element order; transpose swaps the axes,
    # so the two results differ even though both have shape (4, 2).
    x2 = x.reshape(shape0[1], shape0[0])
    x3 = x.transpose(0, 1)
    print(f"x2: {x2}\nx3: {x3}")
x2: tensor([[1, 2],
[3, 4],
[5, 6],
[7, 8]])
x3: tensor([[1, 5],
[2, 6],
[3, 7],
[4, 8]])
to()
import torch
import torch.nn as nn

class mynet_class(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv_layer = nn.Sequential(
            nn.Conv2d(27, 4, 6, 1, padding=2,), nn.ReLU(),)
        self.action_mlp_layer = nn.Sequential(
            nn.Linear(1, 32), nn.ReLU(),
            nn.Linear(32, 32), nn.ReLU(),
        )
        # self.to(device, torch.double)  # This works, too.

if __name__ == '__main__':
    mynet = mynet_class()
    conv_layer_dtype = next(mynet.conv_layer.parameters()).dtype
    action_mlp_layer_dtype = next(mynet.action_mlp_layer.parameters()).dtype
    print('conv_layer dtype:', conv_layer_dtype)
    print('action_mlp_layer dtype:', action_mlp_layer_dtype)

    mynet.to(torch.double)
    conv_layer_dtype = next(mynet.conv_layer.parameters()).dtype
    action_mlp_layer_dtype = next(mynet.action_mlp_layer.parameters()).dtype
    print('conv_layer dtype:', conv_layer_dtype)
    print('action_mlp_layer dtype:', action_mlp_layer_dtype)

    # mynet.to(device='cpu:1')
    # device = next(mynet.parameters()).device
    # print(device)
    print('haha')
conv_layer dtype: torch.float32
action_mlp_layer dtype: torch.float32
conv_layer dtype: torch.float64
action_mlp_layer dtype: torch.float64
haha
When you call self.to(torch.double) inside a class that inherits from nn.Module, the call converts the module, including all of its parameters and buffers, to the specified dtype, here torch.double. In this case self.to(torch.double) performs an in-place conversion of every parameter and buffer inside the model, because nn.Module's to method is designed to walk over all parameters and buffers of the module and convert them to the specified device or dtype.

This behavior differs from the .to() method of a single tensor. In the module context, self.to(torch.double) does not just return a converted copy; it modifies the dtype of all parameters and buffers inside the module itself. Calling self.to(torch.double) during model initialization therefore ensures that every parameter of the model is converted to torch.double.
a = torch.tensor([1, 2, 3])
# a.dtype: torch.int64
a.to(torch.float64)   # returns a new tensor; does not change a in place
# a.dtype: torch.int64
a = a.to(torch.float64)
# a.dtype: torch.float64