Recommended papers

  • Learning rate - pretty much solved the problem of adapting the learning rate during training
  • Bag of tricks - Interesting paper about CNNs, which are the building blocks of ResNets and other architectures used in vision
  • Attention Is All You Need - A very important paper about the self-attention mechanism, which is the basis of the Transformer architecture
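
As a rough illustration of the self-attention mechanism mentioned in the last item, here is a minimal sketch of single-head scaled dot-product self-attention in plain Python. The helper names (`softmax`, `matmul`, `self_attention`) and the toy identity projection matrices are assumptions made for this example, not code from the paper. A real model would use learned projections, multiple heads, and a tensor library.

```python
import math

def softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Plain nested-list matrix product: (n x m) @ (m x p) -> (n x p).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def self_attention(x, w_q, w_k, w_v):
    # Project the same input sequence into queries, keys, and values.
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    # Similarity of every token with every other token, scaled by sqrt(d_k).
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    # Each row becomes a probability distribution over the sequence.
    weights = [softmax(row) for row in scores]
    # Each output token is a weighted average of the value vectors.
    return matmul(weights, v), weights

# Toy input: 3 tokens with 2 features each; identity projections (illustrative only).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, weights = self_attention(x, identity, identity, identity)
print(len(out), len(out[0]))  # sequence length and feature size are preserved
```

Each row of `weights` sums to 1, so every output token is a convex combination of the value vectors of all tokens in the sequence.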