Articles in the category "Theory"

On the Spectral Bias of Neural Networks

Author: wyli
Date: 2024-07-18
Categories: Theory, AI Theory
413 views
Comments

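The success of deep neural networks at generalizing on natural data is at odds with classical notions of model complexity, and experiments show that the same networks can fit arbitrary random data. The paper On the Spectral Bias of Neural Networks uses Fourier analysis to study the expressivity of deep networks and finds that they are biased toward learning low-frequency functions, that is, functions that vary globally without local fluctuations. This property is consistent with the observation that over-parameterized networks prioritize learning simple patterns that generalize well. The phenomenon is called spectral bias, and it shows up not only in the learning process but also in the parameterization of the model itself.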

- Read the rest -

SigLIP: Sigmoid Loss for Language Image Pre-Training

Author: wyli
Date: 2024-07-17
Categories: Theory, AI Theory
2392 views
Comments

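The contrastive loss in CLIP requires computing the softmax normalization twice, once across images and once across texts. A naive softmax implementation is also numerically unstable, so it is usually stabilized by subtracting the maximum input before the softmax is computed. The sigmoid loss, by contrast, is symmetric and operates on each image-text pair independently; it does not need a global view of all pairwise similarities for normalization. Combined with CLIP-style pre-training, the resulting model is called SigLIP. Combined with LiT (locked-image tuning), the resulting SigLiT model can be trained on only four TPUv4 chips and reaches 84.5% zero-shot accuracy on ImageNet in two days. Decoupling the batch size from the loss also lets the authors study the effect of the ratio of negative to positive pairs, and hence of the batch size, on performance.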
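The contrast between the two losses can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the default `temperature` and `bias` values are illustrative assumptions (in practice both are learned parameters). The softmax version normalizes over rows and over columns, with the max subtracted for stability; the sigmoid version scores every pair independently with a label of +1 for matching pairs and -1 otherwise.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit Euclidean length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def log_softmax(logits, axis):
    """Log-softmax that subtracts the max first for numerical stability."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def softmax_contrastive_loss(img_emb, txt_emb, temperature=10.0):
    """CLIP-style loss: one softmax over rows, one over columns."""
    logits = temperature * l2_normalize(img_emb) @ l2_normalize(txt_emb).T
    diag = np.arange(logits.shape[0])
    image_to_text = -log_softmax(logits, axis=1)[diag, diag].mean()
    text_to_image = -log_softmax(logits, axis=0)[diag, diag].mean()
    return 0.5 * (image_to_text + text_to_image)


def sigmoid_pairwise_loss(img_emb, txt_emb, temperature=10.0, bias=-10.0):
    """Sigmoid loss: every image-text pair is an independent binary problem."""
    logits = temperature * l2_normalize(img_emb) @ l2_normalize(txt_emb).T + bias
    n = logits.shape[0]
    labels = 2.0 * np.eye(n) - 1.0  # +1 on the diagonal, -1 elsewhere
    # log(sigmoid(z)) = -log(1 + exp(-z)), evaluated stably via logaddexp
    log_sigmoid = -np.logaddexp(0.0, -labels * logits)
    return -log_sigmoid.sum() / n


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.normal(size=(8, 16))  # hypothetical image embeddings
    txt = rng.normal(size=(8, 16))  # hypothetical text embeddings
    print("softmax loss:", softmax_contrastive_loss(img, txt))
    print("sigmoid loss:", sigmoid_pairwise_loss(img, txt))
```

Because no term in `sigmoid_pairwise_loss` depends on a sum over the whole batch, the loss for one pair does not change when other pairs are added or removed, which is the decoupling from batch size that the excerpt describes.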

- Read the rest -

The Bitter Lesson

Author: wyli
Date: 2024-06-13
Categories: Theory, AI Theory
443 views
Comments

Rich Sutton
March 13, 2019

The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore's law, or rather its generalization of continued exponentially falling cost per unit of computation. Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available.

- Read the rest -