python - Training loss is much lower than test loss, even on the same data

大佬教我写程序

Posted: 2022-05-13 08:47:52


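I'm using the same data for both training and testing (I know that isn't best practice), so in theory the loss should be exactly the same. However, during training my loss is usually around 1e-07, while during testing it is actually around 0.05.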

Ideally, the loss should be very similar between training and testing, but the two are very different.

Here is my training loop:

    import torch
    import torch.nn as nn
    import GPUtil
    import matplotlib.pyplot as plt

    losses = []
    try:
        for epoch in range(args.epochs):
            # `index` holds the batch's targets and `img` its images; one optimizer step per image
            for (index, img) in trainDataloader:
                for i, image in enumerate(img):
                    optimizer.zero_grad()
                    output = circlenet(image)
                    target = torch.tensor(index[i], device=device, dtype=torch.float)
                    loss = nn.functional.mse_loss(output, target)
                    loss.backward()
                    optimizer.step()

            # report the GPU temperature every 50 epochs
            if epoch % 50 == 0:
                if args.cuda:
                    GPUs = GPUtil.getGPUs()
                    print(GPUs[0].temperature, "C")

            # checkpoint the model and optimizer state
            if epoch % args.saveevery == 0:
                circlenet.cpu()
                torch.save({"model": circlenet.state_dict(), "optimizer": optimizer.state_dict()}, f"{args.save_dir}/weights.pth")
                circlenet.to(device)

            # note: `loss` is the loss of the last image in this epoch, not an epoch average
            losses.append(loss.item())
            print(f"Epoch: {epoch + 1: <6} Loss: {loss.item()}")
    except KeyboardInterrupt:
        # save on Ctrl+C so training progress is not lost
        torch.save({"model": circlenet.state_dict(), "optimizer": optimizer.state_dict()}, f"{args.save_dir}/weights.pth")

    plt.plot(losses)
    plt.show()
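The training loop above appends `loss.item()` once per epoch, after the inner loops finish, so each value in `losses` (and each printed `Loss:`) is the loss of the last single image of that epoch rather than an average over the data. A minimal pure-Python sketch of the difference follows. The toy linear model, `train`, `evaluate`, and the synthetic `samples` are hypothetical stand-ins, not the poster's network or data.

```python
import random


def mse(pred, target):
    """Mean of squared element-wise errors (like mse_loss's default 'mean' reduction)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def train(samples, epochs=100, lr=0.1, seed=0):
    """Per-sample SGD on a hypothetical toy model y = w * x + b.

    Returns the final (w, b), the loss of the last sample in each epoch
    (what the loop above records) and the mean loss over each epoch.
    """
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    last_losses, mean_losses = [], []
    for _ in range(epochs):
        epoch_total = 0.0
        for x, y in samples:
            pred = w * x + b
            loss = mse([pred], [y])
            epoch_total += loss
            # d(loss)/d(pred) = 2 * (pred - y); the chain rule gives the w and b gradients
            grad = 2 * (pred - y)
            w -= lr * grad * x
            b -= lr * grad
        last_losses.append(loss)                        # last sample only
        mean_losses.append(epoch_total / len(samples))  # average over the epoch
    return (w, b), last_losses, mean_losses


def evaluate(params, samples):
    """Mean loss over the whole dataset with fixed weights, as a test pass would measure it."""
    w, b = params
    return sum(mse([w * x + b], [y]) for x, y in samples) / len(samples)


# Hypothetical data: y = 2x + 1 plus a small alternating offset, so the fit cannot be perfect.
samples = [(i / 10, 2 * i / 10 + 1 + 0.05 * (-1) ** i) for i in range(10)]
params, last_losses, mean_losses = train(samples)
print("last-sample loss:", last_losses[-1])
print("epoch-mean loss: ", mean_losses[-1])
print("full-data loss:  ", evaluate(params, samples))
```

The epoch mean is still accumulated while the weights change within the epoch; `evaluate` runs once with fixed weights, which is the quantity a test run over the same data would report.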

Here is my test code:

    import os
    import random
    import cv2
    import PIL.Image
    import torch
    import torch.nn as nn
    from torchvision import transforms

    # pick a random image from the data folder
    rand = random.randint(0, len(os.listdir("data/imgs/")) - 1)

    # run the network in inference mode
    circlenet.eval()
    img = PIL.Image.open(f"data/imgs/img{rand}.jpg")
    # pil_to_tensor returns unscaled uint8 values (0-255); cast to float32 and move to the device
    img = transforms.functional.pil_to_tensor(img).float().to(device)
    with torch.no_grad():
        out = circlenet(img)
    out = out.cpu().tolist()

    imgcv = cv2.imread(f"data/imgs/img{rand}.jpg")
    print("Output: ", out)
    print(rand)
    # note: the label index is rand - 1 while the image file is img{rand}.jpg
    ans = data[rand - 1]
    print("Answer: ", ans)
    loss = nn.functional.mse_loss(torch.tensor(out, dtype=torch.float, device=device), torch.tensor(ans, dtype=torch.float, device=device))
    print("Loss: ", loss.item())

    # draw the answer (small cyan circle) and the prediction (green if out[0] rounds to 1, else red)
    cv2.circle(imgcv, (round(ans[1] * 256), round(ans[2] * 144)), 2, (255, 255, 0), 2)
    color = (0, 255, 0) if round(out[0]) == 1 else (0, 0, 255)
    cv2.circle(imgcv, (round(out[1] * 256), round(out[2] * 144)), 4, color, 2)
    imgcv = cv2.resize(imgcv, (480, 270))
    cv2.imshow("output", imgcv)
    cv2.waitKey(0)
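The drawing step in the test code converts the network's normalized coordinates to pixels by multiplying x by 256 and y by 144 and rounding. A small sketch of that mapping, plus a pixel distance that can make a prediction's error easier to read than a raw MSE. `to_pixels` and `pixel_distance` are hypothetical helpers, and the 256x144 frame size is inferred from the constants in the test code.

```python
import math

# Scale factors inferred from the cv2.circle calls in the test code:
# x (index 1) is multiplied by 256 and y (index 2) by 144.
FRAME_W, FRAME_H = 256, 144


def to_pixels(x_norm, y_norm, width=FRAME_W, height=FRAME_H):
    """Hypothetical helper: map normalized (x, y) to integer pixels, as the test code does.

    Python's round() rounds exact halves to the nearest even integer.
    """
    return round(x_norm * width), round(y_norm * height)


def pixel_distance(a, b):
    """Hypothetical helper: Euclidean distance in pixels between two normalized (x, y) points."""
    ax, ay = to_pixels(*a)
    bx, by = to_pixels(*b)
    return math.hypot(ax - bx, ay - by)


print(to_pixels(0.5, 0.5))                      # (128, 72), the centre of the frame
print(pixel_distance((0.0, 0.0), (0.75, 1.0)))  # 240.0
```

Passing `out[1:]` and `ans[1:]` from the test code would give the distance between the predicted and true centres.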

Some output from training:

    Epoch: 94     Loss: 7.115558560144564e-07
    Epoch: 95     Loss: 5.9022491768701e-05
    Epoch: 96     Loss: 2.5865596398944035e-05
    Epoch: 97     Loss: 9.173281227958796e-07
    Epoch: 98     Loss: 8.050536962400656e-06
    Epoch: 99     Loss: 8.39896165416576e-06
    Epoch: 100    Loss: 7.107677788553701e-07

Test output:

    You are running on device: NVIDIA GeForce RTX 3050 Ti Laptop GPU
    Current statistics:
    | ID | GPU | MEM |
    ------------------
    |  0 | 40% | 12% |
    55.0 C
    Output:  [0.9986587166786194, 0.6712906360626221, 0.6456944346427917]
    870
    Answer:  [1.0, 0.3328125, 0.8268518518518518]
    Loss:  0.04912909120321274
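As a sanity check on the test numbers, the printed `Loss` can be reproduced by hand: with its default `reduction='mean'`, `nn.functional.mse_loss` averages the squared errors of the three outputs. A short sketch using the values copied from the test output (the `x`/`y` labels follow how the test code uses indices 1 and 2):

```python
# Values copied from the test output.
output = [0.9986587166786194, 0.6712906360626221, 0.6456944346427917]
answer = [1.0, 0.3328125, 0.8268518518518518]

# Squared error per element; mse_loss with reduction='mean' averages these.
squared_errors = [(o - a) ** 2 for o, a in zip(output, answer)]
loss = sum(squared_errors) / len(squared_errors)

for name, se in zip(["[0]", "[1] (x)", "[2] (y)"], squared_errors):
    print(f"{name:8} squared error: {se:.6f}")
print(f"mean (MSE): {loss:.6f}")
```

This gives about 0.049129, matching the printed loss up to float32 rounding. Nearly all of it comes from the coordinates: the x error is about 0.338 (roughly 87 px across the 256-px width) and the y error about 0.181 (roughly 26 px across 144 px), while `out[0]` is almost exactly 1.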
