Learning PyTorch from Scratch (4): softmax and Its Implementation

Basic concepts of softmax



Classification problems

The softmax function is mainly used for classification problems and is typically applied after a fully connected layer.
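As a standard reference, consistent with the from-scratch implementation later in this post, softmax turns a vector of raw outputs $o_1, \dots, o_q$ into a probability distribution:

$$\hat{y}_j = \mathrm{softmax}(\boldsymbol{o})_j = \frac{\exp(o_j)}{\sum_{k=1}^{q} \exp(o_k)}, \qquad j = 1, \dots, q.$$

Each $\hat{y}_j$ is positive and the entries sum to 1, so they can be read as class probabilities.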

Weight vectors

Since exponentiation preserves the ordering of the outputs, the softmax operation does not change which class receives the highest score, and therefore does not change the predicted class. The vectorized expression of softmax regression for classifying a sample is given below.
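A standard form of this expression, matching the later code where $\boldsymbol{W}$ is $784 \times 10$ and $\boldsymbol{b}$ has 10 entries, is:

$$\boldsymbol{o}^{(i)} = \boldsymbol{x}^{(i)} \boldsymbol{W} + \boldsymbol{b}, \qquad \hat{\boldsymbol{y}}^{(i)} = \mathrm{softmax}\!\left(\boldsymbol{o}^{(i)}\right),$$

where $\boldsymbol{x}^{(i)}$ is the flattened $1 \times 784$ feature vector of sample $i$.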

Cross-entropy loss function
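As a standard reference, matching the cross_entropy function implemented below, the cross-entropy between the label distribution and the prediction is

$$H\!\left(\boldsymbol{y}^{(i)}, \hat{\boldsymbol{y}}^{(i)}\right) = -\sum_{j=1}^{q} y_j^{(i)} \log \hat{y}_j^{(i)},$$

which, when $\boldsymbol{y}^{(i)}$ is a one-hot label, reduces to $-\log \hat{y}^{(i)}_{y^{(i)}}$. The training loss averages this over the $n$ samples: $\ell(\boldsymbol{\Theta}) = \frac{1}{n}\sum_{i=1}^{n} H\!\left(\boldsymbol{y}^{(i)}, \hat{\boldsymbol{y}}^{(i)}\right)$.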

Model training and prediction

Getting the Fashion-MNIST training set and reading the data

The most commonly used image classification dataset is the handwritten digit dataset MNIST[1]. However, most models achieve over 95% classification accuracy on MNIST. To observe the differences between algorithms more clearly, we will use Fashion-MNIST[2], a dataset with more complex image content.

Here we will use the torchvision package, which is mainly used for building computer vision models. torchvision consists of the following parts:

torchvision.datasets: functions for loading data and interfaces to common datasets;

torchvision.models: common model architectures (including pretrained models), such as AlexNet, VGG, and ResNet;

torchvision.transforms: common image transformations, such as cropping and rotation;

torchvision.utils: other useful utilities.

%matplotlib inline
from IPython import display
import matplotlib.pyplot as plt
import torch
import torchvision
import torchvision.transforms as transforms
import time
import sys
sys.path.append("/home/input")
import d2lzh1981 as d2l

# Get the data
mnist_train = torchvision.datasets.FashionMNIST(root="/home/input/FashionMNIST2065", train=True, download=True, transform=transforms.ToTensor())
mnist_test = torchvision.datasets.FashionMNIST(root="/home/input/FashionMNIST2065", train=False, download=True, transform=transforms.ToTensor())

class torchvision.datasets.FashionMNIST(root, train=True, transform=None, target_transform=None, download=False)

root (string) – the root directory of the dataset, where processed/training.pt and processed/test.pt are stored.

train (bool, optional) – if True, create the dataset from training.pt; otherwise create it from test.pt.

download (bool, optional) – if True, download the data from the internet and put it under the root directory. If the data already exists under root, it will not be downloaded again.

transform (callable, optional) – a function/transform that takes a PIL image and returns the transformed data, e.g. transforms.RandomCrop.

target_transform (callable, optional) – a function/transform that takes the target and transforms it.

# Show the results
print(type(mnist_train))
print(len(mnist_train), len(mnist_test))

Output:
60000 10000

# We can access any sample by its index
feature, label = mnist_train[0]
print(feature.shape, label)  # Channel x Height x Width

Output:
torch.Size([1, 28, 28]) 9

mnist_PIL = torchvision.datasets.FashionMNIST(root="/home/kesci/input/FashionMNIST2065", train=True, download=True)
PIL_feature, label = mnist_PIL[0]

# This function is saved in the d2lzh package for later use
def get_fashion_mnist_labels(labels):
    text_labels = ["t-shirt", "trouser", "pullover", "dress", "coat",
                   "sandal", "shirt", "sneaker", "bag", "ankle boot"]
    return [text_labels[int(i)] for i in labels]

def show_fashion_mnist(images, labels):
    d2l.use_svg_display()
    # Here _ denotes a variable we ignore (do not use)
    _, figs = plt.subplots(1, len(images), figsize=(12, 12))
    for f, img, lbl in zip(figs, images, labels):
        f.imshow(img.view((28, 28)).numpy())
        f.set_title(lbl)
        f.axes.get_xaxis().set_visible(False)
        f.axes.get_yaxis().set_visible(False)
    plt.show()

X, y = [], []
for i in range(10):
    X.append(mnist_train[i][0])  # add the i-th feature to X
    y.append(mnist_train[i][1])  # add the i-th label to y
show_fashion_mnist(X, get_fashion_mnist_labels(y))

Output: a row of the first ten Fashion-MNIST images with their text labels.

# Read the data
batch_size = 256
num_workers = 4
train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=num_workers)
test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=num_workers)
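A practical aside (an assumption about the runtime environment, not part of the original code): multi-process data loading with num_workers > 0 can fail on Windows when the script lacks a __main__ guard, so a common fallback is single-process loading there:

import sys

# Use single-process data loading on Windows to avoid worker start-up issues.
num_workers = 0 if sys.platform.startswith("win") else 4
train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=num_workers)
test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=num_workers)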

Implementing softmax from scratch

import torch
import torchvision
import numpy as np
import sys
sys.path.append("/home/kesci/input")
import d2lzh1981 as d2l

# Get the data
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, root="/home/input/FashionMNIST2065")

# Initialize model parameters
num_inputs = 784
num_outputs = 10

W = torch.tensor(np.random.normal(0, 0.01, (num_inputs, num_outputs)), dtype=torch.float)
b = torch.zeros(num_outputs, dtype=torch.float)

W.requires_grad_(requires_grad=True)
b.requires_grad_(requires_grad=True)

Output:
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], requires_grad=True)

Operating on multi-dimensional Tensors along a dimension

X = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(X.sum(dim=0, keepdim=True))   # dim=0: sum within each column, keeping the dimension in the result
print(X.sum(dim=1, keepdim=True))   # dim=1: sum within each row, keeping the dimension in the result
print(X.sum(dim=0, keepdim=False))  # dim=0: sum within each column, dropping the dimension
print(X.sum(dim=1, keepdim=False))  # dim=1: sum within each row, dropping the dimension

Output:
tensor([[5, 7, 9]])
tensor([[ 6],
        [15]])
tensor([5, 7, 9])
tensor([ 6, 15])

Defining the softmax operation

def softmax(X):
    X_exp = X.exp()
    partition = X_exp.sum(dim=1, keepdim=True)
    # print("X size is", X_exp.size())
    # print("partition size is", partition, partition.size())
    return X_exp / partition  # broadcasting is applied here

X = torch.rand((2, 5))
X_prob = softmax(X)
print(X_prob, "\n", X_prob.sum(dim=1))

Output:
tensor([[0.2767, 0.1386, 0.1364, 0.1738, 0.2746],
        [0.1855, 0.1690, 0.1513, 0.3168, 0.1774]])
tensor([1.0000, 1.0000])
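The version above can overflow when entries of X are large, since exp() grows quickly. A common refinement, sketched here as an addition rather than part of the original post, subtracts the per-row maximum before exponentiating; this does not change the result because softmax is invariant to adding a constant to every entry of a row:

import torch

def softmax_stable(X):
    # Subtracting the row-wise max keeps exp() in a safe range
    # without changing the softmax output.
    X_max = X.max(dim=1, keepdim=True)[0]
    X_exp = (X - X_max).exp()
    partition = X_exp.sum(dim=1, keepdim=True)
    return X_exp / partition

X = torch.tensor([[1000.0, 1001.0], [3.0, 4.0]])
print(softmax_stable(X))  # finite values; the naive version would overflow on the first row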

The softmax regression model

def net(X):
    return softmax(torch.mm(X.view((-1, num_inputs)), W) + b)
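As a quick shape check (an added illustration, not from the original post): each image in a batch is flattened to 784 features, so the model maps a batch of shape (batch, 1, 28, 28) to a (batch, 10) matrix of class probabilities.

X = torch.rand((2, 1, 28, 28))
print(net(X).shape)       # torch.Size([2, 10])
print(net(X).sum(dim=1))  # each row sums to 1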

Defining the loss function

y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = torch.LongTensor([0, 2])
y_hat.gather(1, y.view(-1, 1))  # gather along dim 1: pick, for each row, the predicted probability of the true class

def cross_entropy(y_hat, y):
    return -torch.log(y_hat.gather(1, y.view(-1, 1)))
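For the toy y_hat and y above, gather picks y_hat[0, 0] = 0.1 and y_hat[1, 2] = 0.5, so (shown as an added check):

print(y_hat.gather(1, y.view(-1, 1)))
# tensor([[0.1000],
#         [0.5000]])
print(cross_entropy(y_hat, y))
# tensor([[2.3026],
#         [0.6931]])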

Defining accuracy

def accuracy(y_hat, y):
    return (y_hat.argmax(dim=1) == y).float().mean().item()  # compare the row-wise argmax of y_hat with y

# This function is saved in the d2lzh_pytorch package for later use. It will be improved step by step:
# its full implementation is described in the "Image Augmentation" section.
def evaluate_accuracy(data_iter, net):  # data_iter yields the data, net is the network
    acc_sum, n = 0.0, 0
    for X, y in data_iter:
        acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
        n += y.shape[0]
    return acc_sum / n
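Reusing the toy tensors from the loss section as a quick sanity check (an added example): the row-wise argmax of y_hat is [2, 2] while y is [0, 2], so only the second prediction is correct and the accuracy is 0.5.

print(accuracy(y_hat, y))  # 0.5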

Training the model

num_epochs, lr = 5, 0.1

# This function is saved in the d2lzh_pytorch package for later use
def train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size,
              params=None, lr=None, optimizer=None):
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n = 0.0, 0.0, 0
        for X, y in train_iter:
            y_hat = net(X)
            l = loss(y_hat, y).sum()

            # Zero the gradients
            if optimizer is not None:
                optimizer.zero_grad()
            elif params is not None and params[0].grad is not None:
                for param in params:
                    param.grad.data.zero_()

            l.backward()
            if optimizer is None:
                d2l.sgd(params, lr, batch_size)
            else:
                optimizer.step()

            train_l_sum += l.item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
            n += y.shape[0]
        test_acc = evaluate_accuracy(test_iter, net)
        print("epoch %d, loss %.4f, train acc %.3f, test acc %.3f"
              % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))

train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, batch_size, [W, b], lr)

Model prediction

X, y = next(iter(test_iter))

true_labels = d2l.get_fashion_mnist_labels(y.numpy())
pred_labels = d2l.get_fashion_mnist_labels(net(X).argmax(dim=1).numpy())
titles = [true + "\n" + pred for true, pred in zip(true_labels, pred_labels)]

d2l.show_fashion_mnist(X[0:9], titles[0:9])

A concise implementation of softmax

# Load packages and modules
import torch
from torch import nn
from torch.nn import init
import numpy as np
import sys
sys.path.append("/home/input")
import d2lzh1981 as d2l

# Initialization
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, root="/home/input/FashionMNIST2065")

# Define the network model
num_inputs = 784
num_outputs = 10

class LinearNet(nn.Module):
    def __init__(self, num_inputs, num_outputs):
        super(LinearNet, self).__init__()
        self.linear = nn.Linear(num_inputs, num_outputs)

    def forward(self, x):  # x has shape (batch, 1, 28, 28)
        y = self.linear(x.view(x.shape[0], -1))
        return y

# net = LinearNet(num_inputs, num_outputs)

class FlattenLayer(nn.Module):
    def __init__(self):
        super(FlattenLayer, self).__init__()

    def forward(self, x):  # x has shape (batch, *, *, ...)
        return x.view(x.shape[0], -1)

from collections import OrderedDict

net = nn.Sequential(
    # FlattenLayer(),
    # LinearNet(num_inputs, num_outputs)
    OrderedDict([
        ("flatten", FlattenLayer()),
        ("linear", nn.Linear(num_inputs, num_outputs)),
    ])  # or, equivalently, use our own LinearNet(num_inputs, num_outputs)
)

# Initialize model parameters
init.normal_(net.linear.weight, mean=0, std=0.01)
init.constant_(net.linear.bias, val=0)

loss = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

num_epochs = 5
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, None, None, optimizer)
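Note that the network above outputs raw scores (logits) and has no explicit softmax layer. This works because nn.CrossEntropyLoss applies log-softmax and the negative log-likelihood loss internally, which is also more numerically stable than taking the log of an explicit softmax. A small check of this equivalence (an added sketch, not part of the original post):

import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)
y = torch.randint(0, 10, (4,))

# cross_entropy on logits equals nll_loss on log_softmax of the logits
loss_a = F.cross_entropy(logits, y)
loss_b = F.nll_loss(F.log_softmax(logits, dim=1), y)
print(torch.allclose(loss_a, loss_b))  # True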

References

[1] Mu Li et al., Dive into Deep Learning (《动手学深度学习》).

[2] Boyu Education course materials.
