BERT

The authors of BERT (Bidirectional Encoder Representations from Transformers) proposed an architecture designed to pretrain deep bidirectional representations from unlabeled text. This is achieved by jointly conditioning on both left and right context in all layers, using a masked language modeling objective: a fraction of input tokens is hidden, and the model must predict them from the surrounding context on both sides.
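To make the masked objective concrete, here is a minimal PyTorch sketch of the input-masking step. The 15% masking rate follows the paper; the vocabulary size and the [MASK] token id are placeholder assumptions, and the paper's further 80/10/10 split of selected tokens is omitted for brevity.

```python
import torch

MASK_ID = 103        # placeholder id for the [MASK] token (assumption)
VOCAB_SIZE = 30000   # placeholder vocabulary size (assumption)
MASK_PROB = 0.15     # fraction of tokens selected for prediction (from the paper)

def mask_tokens(input_ids: torch.Tensor):
    """Replace ~15% of tokens with [MASK]; unselected positions get label -100
    so a loss such as nn.CrossEntropyLoss(ignore_index=-100) skips them."""
    labels = input_ids.clone()
    selected = torch.rand(input_ids.shape) < MASK_PROB
    labels[~selected] = -100            # predict only the masked positions
    masked = input_ids.clone()
    masked[selected] = MASK_ID          # the model sees [MASK] here and must
    return masked, labels               # infer the token from both sides

ids = torch.randint(0, VOCAB_SIZE, (2, 16))   # toy batch of token ids
masked_ids, labels = mask_tokens(ids)
```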

GPT-1

The authors of the GPT-1 paper demonstrated that substantial improvements on various natural language processing tasks, such as textual entailment, question answering, and semantic similarity assessment, can be achieved through a two-stage process. First, they generatively pretrain a language model on a large corpus of unlabeled text; then they fine-tune the same model discriminatively on each target task, adding only a small task-specific output layer.
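A minimal sketch of that two-stage shape, assuming a toy model: the class name, dimensions, and head counts below are illustrative stand-ins, and encoder layers with a causal mask approximate the paper's decoder-only transformer.

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Toy stand-in for GPT-1: one transformer body shared by both stages."""
    def __init__(self, vocab: int = 1000, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.body = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.lm_head = nn.Linear(dim, vocab)   # stage 1: next-token prediction
        self.clf_head = nn.Linear(dim, 2)      # stage 2: e.g. entailment label

    def forward(self, ids: torch.Tensor, task: bool = False) -> torch.Tensor:
        causal = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
        h = self.body(self.embed(ids), mask=causal)  # left-to-right attention
        # Stage 1 returns per-position logits; stage 2 classifies from the
        # final position's representation, as GPT-1 does.
        return self.clf_head(h[:, -1]) if task else self.lm_head(h)

model = TinyLM()
ids = torch.randint(0, 1000, (2, 10))
lm_logits = model(ids)               # pretraining: (2, 10, 1000)
task_logits = model(ids, task=True)  # fine-tuning: (2, 2)
```

The key point the sketch captures is that both stages share the same body; only the head on top changes between pretraining and fine-tuning.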

Transformer

The seminal Transformer paper introduced a groundbreaking neural network architecture specifically designed for processing sequential data, such as natural language. Unlike traditional recurrent neural networks (RNNs) and their variants, which handle input sequences one step at a time, the Transformer dispenses with recurrence entirely and relies on self-attention, allowing every position to attend to every other position in parallel.
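The core operation is the paper's scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. A minimal sketch, with illustrative tensor shapes:

```python
import math
import torch

def attention(q, k, v, mask=None):
    """Scaled dot-product attention over (batch, seq_len, d_k) tensors."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # each row: one position's
    return weights @ v                       # distribution over all others

q = k = v = torch.randn(2, 5, 64)  # toy batch: 2 sequences of 5 positions
out = attention(q, k, v)           # (2, 5, 64), all positions at once
```

Because no step depends on a previous step's output, the whole sequence is processed simultaneously, which is what removes the sequential bottleneck of RNNs.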

ResNet

The ResNet paper addressed the degradation problem in deep neural networks through residual learning with shortcut connections. This approach enables the training of extremely deep networks by letting gradients flow through identity shortcuts that bypass the stacked layers, facilitating better gradient flow and making networks of over a hundred layers practical to optimize.
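A minimal sketch of a basic residual block, assuming equal input and output channel counts (the paper's downsampling and bottleneck variants are omitted):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic block: output is F(x) + x, so the stacked layers only need to
    learn the residual F(x), and the identity shortcut gives gradients a
    direct path through the network."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # shortcut: add the input back

block = ResidualBlock(16)
y = block(torch.randn(1, 16, 8, 8))   # same shape in and out
```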

Highway networks

Highway networks represent a significant advancement in the training of very deep neural networks, serving as a precursor to the widely successful ResNet. By introducing gated shortcuts, highway networks enabled information to flow across layers unimpeded: learned transform and carry gates decide, per unit, how much of the layer's transformation versus the unchanged input to pass forward.
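A minimal sketch of one highway layer, computing y = T(x) · H(x) + (1 − T(x)) · x with the coupled carry gate C = 1 − T used in the paper; the layer width and the ReLU nonlinearity for H are illustrative choices.

```python
import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    """One highway layer: a transform gate T blends H(x) with the input x.
    When T -> 0 the layer passes x through unchanged, which is what makes
    very deep stacks trainable."""
    def __init__(self, dim: int):
        super().__init__()
        self.H = nn.Linear(dim, dim)   # the layer's transformation
        self.T = nn.Linear(dim, dim)   # the transform gate
        # Bias the gate toward carrying the input early in training,
        # as the paper suggests.
        nn.init.constant_(self.T.bias, -2.0)

    def forward(self, x):
        t = torch.sigmoid(self.T(x))                     # gate in (0, 1)
        return t * torch.relu(self.H(x)) + (1 - t) * x   # blend H(x) and x

layer = HighwayLayer(32)
y = layer(torch.randn(4, 32))   # same shape in and out
```

Note the contrast with ResNet's block above: the shortcut here is gated and learned rather than a fixed identity addition.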