Effect of Training Set Size on Results
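The fraction of the dataset used for training was varied, with the following results.

At 1%: 7 training samples; 687 test samples, of which AB 504 and NB 190.
Predicted AB: 409; errors: 95 (i.e., 95 AB samples were mistaken for NB).
Predicted NB: 163; errors: 27 (i.e., 27 NB samples were mistaken for AB).
Accuracy: 83.47%*
* 20 test runs with randomly selected training samples:
0.842, 0.524, 0.742, 0.505, 0.841, 0.896, 0.5, 0.746, 0.516, 0.85, 0.841, 0.846, 0.827, 0.874, 0.875, 0.84, 0.49, 0.507, 0.893, 0.857
Mean accuracy: 74.6%

With so few samples the model is clearly unstable. Sometimes the randomly selected training images are not at all "representative", and the prediction accuracy sits near 50%, which is essentially random guessing. At other times the random sample happens to represent the whole dataset well, and the accuracy approaches 90%.

At 2%: 13 training samples; 687 test samples, of which AB 499 and NB 188.
Predicted AB: 476; errors: 23 (i.e., 23 AB samples were mistaken for NB).
Predicted NB: 161; errors: 27 (i.e., 27 NB samples were mistaken for AB).
Accuracy: 90.51%*
* 20 test runs with randomly selected training samples:
0.858, 0.908, 0.896, 0.91, 0.897, 0.913, 0.897, 0.87, 0.854, 0.909, 0.909, 0.943, 0.83, 0.896, 0.928, 0.887, 0.906, 0.892, 0.912, 0.899
Mean accuracy: 89.57%

With 13 training samples, adding only 6 more samples makes the trained model far more stable. This also suggests that the dataset is not very diverse, so a small number of samples can "represent" the whole and give good results.

At 5%: 35 training samples; 665 test samples, of which AB 484 and NB 182.
Predicted AB: 481; errors: 3 (i.e., 3 AB samples were mistaken for NB).
Predicted NB: 149; errors: 33 (i.e., 33 NB samples were mistaken for AB).
Accuracy: 90.62%*
* 20 test runs with randomly selected training samples:
0.96, 0.895, 0.925, 0.866, 0.921, 0.92, 0.907, 0.92, 0.934, 0.937, 0.935, 0.957, 0.921, 0.94, 0.936, 0.904, 0.899, 0.957, 0.916, 0.925
Mean accuracy: 92.38%

At 10%: 70 training samples; 630 test samples, of which AB 459 and NB 172.
Predicted AB: 440; errors: 19 (i.e., 19 AB samples were mistaken for NB).
Predicted NB: 162; errors: 10 (i.e., 10 NB samples were mistaken for AB).
Accuracy: 95.02%*
* 20 test runs with randomly selected training samples:
0.967, 0.926, 0.964, 0.967, 0.948, 0.942, 0.942, 0.959, 0.961, 0.949, 0.952, 0.946, 0.929, 0.949, 0.937, 0.942, 0.961, 0.93, 0.944, 0.939
Mean accuracy: 94.77%

At 20%: 140 training samples; 560 test samples, of which AB 408 and NB 153.
Predicted AB: 401; errors: 7 (i.e., 7 AB samples were mistaken for NB).
Predicted NB: 150; errors: 3 (i.e., 3 NB samples were mistaken for AB).
Accuracy: 98.16%
0.973, 0.974, 0.944, 0.945, 0.962, 0.957, 0.955, 0.978, 0.953, 0.969, 0.973, 0.971, 0.947, 0.954, 0.944, 0.956, 0.98, 0.987, 0.962, 0.972
Mean accuracy: 96.27%

At 30%: 210 training samples; 490 test samples, of which AB 357 and NB 134.
Predicted AB: 349; errors: 8 (i.e., 8 AB samples were mistaken for NB).
Predicted NB: 131; errors: 3 (i.e., 3 NB samples were mistaken for AB).
Accuracy: 97.76%
0.96, 0.974, 0.983, 0.985, 0.981, 0.958, 0.989, 0.975, 0.981, 0.954, 0.972, 0.971, 0.968, 0.979, 0.944, 0.947, 0.985, 0.947, 0.972, 0.969
Mean accuracy: 96.97%

At 50%: 350 training samples; 350 test samples, of which AB 255 and NB 96.
Predicted AB: 255; errors: 0 (i.e., 0 AB samples were mistaken for NB).
Predicted NB: 95; errors: 1 (i.e., 1 NB sample was mistaken for AB).
Accuracy: 99.48%
0.986, 0.989, 0.996, 0.989, 0.984, 0.978, 0.973, 0.989, 0.976, 0.986, 0.991, 0.968, 0.969, 0.989, 0.991, 0.984, 0.977, 0.98, 0.97, 0.988
Mean accuracy: 98.26%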
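The evaluation procedure used above (pick a training fraction, draw the training samples at random, score on the rest, repeat 20 times and average) can be sketched as follows. This is only an illustration under stated assumptions: the notes do not say which model or features were used, so the one-dimensional synthetic data, the class sizes in `make_dataset`, and the nearest-centroid classifier are all hypothetical stand-ins, and the numbers it prints are not the results reported above.

```python
import random
import statistics


def make_dataset(n_ab=500, n_nb=200, seed=0):
    """Hypothetical stand-in data: one numeric feature per sample, two classes."""
    rng = random.Random(seed)
    data = [(rng.gauss(0.0, 1.0), "AB") for _ in range(n_ab)]
    data += [(rng.gauss(2.5, 1.0), "NB") for _ in range(n_nb)]
    return data


def fit_centroids(train):
    """Assumed classifier: store the mean feature value of each class seen in training."""
    by_label = {}
    for x, label in train:
        by_label.setdefault(label, []).append(x)
    return {label: statistics.fmean(xs) for label, xs in by_label.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to x.

    If a class never appeared in the training draw, it has no centroid and
    can never be predicted, which is one way a tiny training set goes wrong.
    """
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def run_once(data, train_fraction, rng):
    """One random split: train on the chosen fraction, test on everything else."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_train = max(1, round(train_fraction * len(shuffled)))
    train, test = shuffled[:n_train], shuffled[n_train:]
    centroids = fit_centroids(train)
    correct = sum(predict(centroids, x) == label for x, label in test)
    return correct / len(test)


def evaluate(data, train_fraction, n_runs=20, seed=1):
    """Repeat the random split n_runs times; return mean, spread, and all accuracies."""
    rng = random.Random(seed)
    accs = [run_once(data, train_fraction, rng) for _ in range(n_runs)]
    return statistics.fmean(accs), statistics.pstdev(accs), accs


if __name__ == "__main__":
    data = make_dataset()
    for fraction in (0.01, 0.02, 0.05, 0.10, 0.20, 0.30, 0.50):
        mean_acc, std_acc, _ = evaluate(data, fraction)
        print(f"{fraction:.0%}: mean={mean_acc:.3f} std={std_acc:.3f}")
```

Reporting the spread next to the mean makes the stability observation measurable: a single run at a small training fraction can land almost anywhere, so the 20-run average and its standard deviation are more informative than any one accuracy figure.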