FortisCK Blog

Notes on research life

Hung-yi Lee ML2022 HW1: COVID-19 Cases Prediction

Assignment

  1. Objectives
  • Solve a regression problem with deep neural networks (DNN).
  • Understand basic DNN training tips.
  • Familiarize yourself with PyTorch.
  2. Task description
    Given survey results from the past 5 days in a specific U.S. state, predict the percentage of new positive test cases on the 5th day.
    Details: here

Improvements

  1. Select more informative features
    import numpy as np
    import pandas as pd
    from sklearn.feature_selection import SelectKBest, f_regression

    features = pd.read_csv('./covid.train.csv')
    # Columns 0-116 are the inputs; column 117 is the target.
    x_data, y_data = features.iloc[:, 0:117], features.iloc[:, 117]

    # Try different values of k, the number of features to keep.
    k = 24
    selector = SelectKBest(score_func=f_regression, k=k)
    result = selector.fit(x_data, y_data)

    # result.scores_ holds one score per feature.
    # np.argsort returns indices in ascending score order; reverse them for descending order.
    idx = np.argsort(result.scores_)[::-1]

    # Indices of the k highest-scoring features, sorted back into column order.
    selected_idx = list(np.sort(idx[:k]))
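As a quick sanity check of the selection logic above, here is a minimal sketch on synthetic data, so the real covid.train.csv is not needed. The array shape, the seed, and the choice of columns 2 and 7 as the only informative features are assumptions made purely for illustration. The sketch also compares the argsort-based indices with `selector.get_support(indices=True)`, which scikit-learn offers for the same purpose.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic stand-in for the training data (hypothetical, not the homework CSV).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only columns 2 and 7 influence the target; the rest is noise.
y = 3.0 * X[:, 2] - 2.0 * X[:, 7] + 0.1 * rng.normal(size=200)

k = 2
selector = SelectKBest(score_func=f_regression, k=k).fit(X, y)

# Same approach as in the post: rank features by score, keep the top k.
idx = np.argsort(selector.scores_)[::-1]
selected_idx = sorted(int(i) for i in idx[:k])

# Built-in alternative that returns the selected column indices directly.
support_idx = selector.get_support(indices=True).tolist()

# Keep only the selected columns before feeding the data to the model.
X_selected = X[:, selected_idx]
print(selected_idx, support_idx, X_selected.shape)
```

The same `selected_idx` has to be applied to the training, validation, and test features so that the model's `input_dim` matches everywhere.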
  2. Modify the model
    import torch.nn as nn

    class My_Model(nn.Module):
        def __init__(self, input_dim):
            super(My_Model, self).__init__()
            # TODO: modify model's structure, be aware of dimensions.
            # input_dim is the number of features we selected.
            self.layers = nn.Sequential(
                nn.Linear(input_dim, 64),
                nn.LeakyReLU(0.2),
                nn.BatchNorm1d(64),
                nn.Dropout(0.2),

                nn.Linear(64, 16),
                nn.LeakyReLU(0.2),
                # nn.BatchNorm1d(16),  # disabled; if enabled, the size must match the 16 outputs above
                nn.Dropout(0.1),

                nn.Linear(16, 1)
            )

        def forward(self, x):
            x = self.layers(x)
            x = x.squeeze(1)  # (B, 1) -> (B)
            return x

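torch.nn.BatchNorm1d() keeps the inputs to each layer of a deep network on a consistent distribution during training. Deep networks are often hard to train because every parameter update changes the distribution of each layer's outputs, so the following layer has to keep adapting to shifting inputs. This problem is called Internal Covariate Shift, and Batch Normalization is used to address it.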
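To make the effect concrete, here is a minimal sketch of what `nn.BatchNorm1d` does to a batch. The batch size, feature count, and the scale and offset of the random input are assumptions chosen for illustration. In training mode the layer normalizes each feature with the statistics of the current batch; in evaluation mode it uses the running statistics accumulated so far, so the two outputs differ.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm1d(4)                 # 4 features, default eps and momentum
x = torch.randn(32, 4) * 5.0 + 10.0    # batch with mean around 10 and std around 5

# Training mode: normalize with the mean and (biased) variance of this batch.
bn.train()
out_train = bn(x)
train_mean = out_train.mean(dim=0)
train_std = out_train.std(dim=0, unbiased=False)

# Evaluation mode: normalize with the running statistics instead.
bn.eval()
out_eval = bn(x)

print(train_mean, train_std)
```

This is also why `model.train()` and `model.eval()` need to be called at the right moments: both BatchNorm1d and Dropout behave differently in the two modes.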
  3. Replace the SGD optimizer with Adam
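The optimizer swap is a one-line change. The sketch below shows it in a tiny training loop; the stand-in model, the random data, the learning rate of 1e-3, and the number of steps are all assumptions for illustration, not the settings used for the homework submission.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Small stand-in model (hypothetical); in the homework this would be My_Model.
model = nn.Sequential(nn.Linear(24, 16), nn.LeakyReLU(0.2), nn.Linear(16, 1))
criterion = nn.MSELoss()

# Before: optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
# After: Adam adapts the step size per parameter.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Random data standing in for the selected features and targets.
x = torch.randn(64, 24)
y = torch.randn(64)

losses = []
for _ in range(100):
    optimizer.zero_grad()            # clear gradients from the previous step
    pred = model(x).squeeze(1)       # (B, 1) -> (B)
    loss = criterion(pred, y)
    loss.backward()                  # compute gradients
    optimizer.step()                 # update parameters
    losses.append(loss.item())

print(losses[0], losses[-1])
```

The rest of the training loop stays the same; only the learning rate usually needs retuning after the switch.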

Results

Related materials

Training set: here
Test set: here