Additional Resources 

Groping and Experiment

Alexander Bain's concept of learning by “groping and experiment” is a pivotal aspect of his work in psychology and philosophy, particularly in his approach to understanding how individuals acquire knowledge and develop skills.

read more

Perceptron – the earliest learning neural network model

Inspired by the McCulloch-Pitts neuron, the perceptron was developed by Frank Rosenblatt in 1958. The perceptron is a type of artificial neuron that can learn to classify inputs into different categories by adjusting its weights based on error feedback. This learning capability marked a significant advancement from the fixed McCulloch-Pitts model, enabling the perceptron to adapt to new data.
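The error-feedback weight adjustment described above can be sketched in a few lines. This is a minimal illustration, not Rosenblatt's original formulation: the function names, the learning rate, the epoch count, and the AND dataset are all assumptions chosen for the example.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Train a single perceptron; samples is a list of (inputs, label), label in {0, 1}."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in samples:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            prediction = 1 if activation > 0 else 0
            error = label - prediction  # 0 when correct, +1 or -1 when wrong
            # Shift the weights toward (or away from) the misclassified input.
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


def predict(weights, bias, inputs):
    """Classify inputs with the learned threshold unit."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0


# Hypothetical toy task: learn the logical AND of two binary inputs.
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_DATA)
```

Because AND is linearly separable, the update rule settles on weights that classify all four inputs correctly; a task such as XOR would not converge with a single unit.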

read more

Conditioned Reflexes

The term “reinforcement” first appeared in an English translation of Pavlov's work on conditioned reflexes. The introduction of the term was significant because it helped bridge Pavlov's pioneering research in Russia with the growing field of behaviorism in the English-speaking world.

read more


The Gemini paper by Google DeepMind introduces advanced multimodal AI models capable of processing text, images, audio, and video within a single framework.

read more

Human or Not

An online game inspired by the Turing test, measuring AI chatbots' ability to mimic humans and humans' ability to identify bots, attracted over 1.5 million users in a month.

read more


The LLaMA (Large Language Model Meta AI) paper outlines Meta AI's development of large language models that excel in various NLP tasks.

read more


Launched in November 2022, it quickly became the fastest-growing application in history, reaching 100 million unique users in record time. It also marks the beginning of the LLM race.

read more


An upgrade to GPT-2. The GPT-3 model, with 175 billion parameters, achieved state-of-the-art performance on many NLP tasks by leveraging few-shot, one-shot, and even zero-shot learning.

read more


An upgrade to GPT-1. It demonstrated that large-scale language models, such as GPT-2, can perform a variety of tasks without explicit supervision by leveraging vast amounts of text data for training.

read more

ImageNet challenge

The creation of a large image dataset that allowed the machine learning community to shift its focus from data acquisition and labelling to the development of algorithms.

read more

LeNet

An early instance of a successful gradient-based learning technique. It was also employed commercially for reading bank checks (several million checks per day). The paper provides a detailed description of the neural architecture used, with step-by-step derivations.

read more

First version of dropout

A generalization of dropout, a popular regularization technique. It can be shown that dropout is a particular case of this method. However, it was not applied to the training of neural networks.

read more

Boltzmann Machines

Inspired by their construction in statistical physics, they were popularized in the cognitive sciences by Geoffrey Hinton, Terry Sejnowski and Yann LeCun. Although their practical usability has remained limited, they are still useful under some conditions.

read more


The GPT-4 Technical Report by OpenAI describes GPT-4 as a multimodal model that processes text and image inputs to generate text outputs.

read more


The authors proposed an architecture designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers.

read more


The authors demonstrate that large gains on tasks such as textual entailment, question answering, semantic similarity assessment etc. can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text.

read more


The Transformer model employs an attention mechanism to process entire sequences simultaneously, enabling better parallelization of computation.
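A minimal sketch of scaled dot-product attention, the core operation behind this mechanism, is shown below in plain Python. The function names and the list-of-lists representation are assumptions made for illustration; real implementations use batched matrix operations, multiple heads, and learned projections.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    # Each query is handled independently of the others, which is why the
    # whole sequence can be processed in parallel.
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Hypothetical toy input: identical keys give uniform attention weights,
# so the output is the plain average of the value vectors.
toy_output = attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [3.0, 2.0]])
```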

read more


The authors extended the results presented in the Highway Networks paper to networks with 1000 layers. To date, this is the most cited paper in machine learning.

read more

Highway networks

The first successful training of very deep neural networks, and a precursor to ResNet. By introducing gated shortcuts, highway networks enabled the flow of information across layers without degradation, allowing for the effective training of networks with unprecedented depth.
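The gated shortcut can be sketched for a single scalar unit as follows. This is an illustration under simplifying assumptions: the names `highway_unit`, `gate_weight`, and `gate_bias` are hypothetical, and a real highway layer uses learned weight matrices for both the transform and the gate.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def highway_unit(x, transform, gate_weight, gate_bias):
    """Mix a transformed signal with the untouched input via a gate t in (0, 1)."""
    t = sigmoid(gate_weight * x + gate_bias)
    # t near 1: the unit behaves like an ordinary layer.
    # t near 0: the input is carried through unchanged.
    return t * transform(x) + (1.0 - t) * x


# Hypothetical example: 50 stacked units whose gates are almost closed
# pass the input through nearly unchanged, despite the depth.
signal = 2.0
for _ in range(50):
    signal = highway_unit(signal, math.tanh, gate_weight=0.0, gate_bias=-50.0)
```

A strongly negative gate bias starts the network close to the identity mapping, which is one way to see why depth alone does not degrade the signal.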

read more


The Seq2Seq paper by Sutskever, Vinyals, and Le introduced a model that maps sentences to vector representations and back using LSTM networks, setting a new standard for machine translation.

read more


Alex Krizhevsky et al. managed to significantly improve on the previous results on the ImageNet challenge through parallelization of the training of their network on multiple GPUs.

read more

Ising’s model

The first non-learning recurrent NN architecture (the Ising model or Lenz-Ising model) was introduced and analyzed by physicists Ernst Ising and Wilhelm Lenz in the 1920s. It settles into an equilibrium state in response to input conditions, and is the foundation of the first well-known learning RNNs.

read more

McCulloch-Pitts Neuron

The McCulloch-Pitts neuron, introduced in 1943, is one of the earliest formal models of a neuron. It uses binary outputs and weighted inputs to mimic neural processing and decision-making. It laid the foundation for artificial neural networks, the computation of logical functions, and the modern neural network architectures used in applications such as pattern recognition and control systems.
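As a rough sketch, such a unit can be written as a fixed threshold function that computes simple logical functions. The function names, weights, and thresholds below are illustrative assumptions, and nothing here is learned.

```python
def mcp_neuron(inputs, weights, threshold):
    """Fire (return 1) when the weighted sum of binary inputs reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def logical_and(a, b):
    # Both inputs must be active to reach a threshold of 2.
    return mcp_neuron((a, b), (1, 1), threshold=2)


def logical_or(a, b):
    # A single active input is enough to reach a threshold of 1.
    return mcp_neuron((a, b), (1, 1), threshold=1)
```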

read more

Dartmouth Workshop on AI

The Dartmouth Workshop on AI is considered the founding event of Artificial Intelligence as a field (that's also when the term AI was proposed). A group of around a dozen scientists made important contributions to the then-nascent field throughout this summer workshop.

read more

First Deep Learning (8 layer networks)

The use of neural networks with more than 2 layers was unthinkable due to the lack of training methods. Networks in the 60s and 70s were not trained with the current backpropagation method, and while it was possible to tune weights in 1-layer networks, it was prohibitively difficult to do so in deeper ones.

read more

Adaptive Ising Model

Introduced a method for pattern recognition and sequence learning using self-organizing networks composed of threshold elements. This work laid the foundation for the development of self-organizing neural networks and unsupervised learning algorithms.

read more

Deep Blue Defeats Garry Kasparov

Chess was considered an activity that requires intelligence and in which only humans can excel. The defeat of Garry Kasparov by Deep Blue showed that it is possible to build a highly specialised machine that can defeat the best human players, albeit by performing a largely brute-force tree search of possible moves.

read more


A shallow two-layer neural network that is trained to reconstruct linguistic contexts of words. It was the first word embedding capable of preserving the context: it turned out that, for example (France – Paris = Italy – Rome).

read more

The Turing Machine

The Turing machine, introduced by Alan Turing in 1936, is a theoretical model that formalizes the concept of computation, using an infinitely long tape and a set of rules to simulate any computer algorithm, thereby defining the limits and capabilities of what can be computed.
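The tape-and-rules idea can be illustrated with a tiny simulator. This is a sketch under stated assumptions: the rule format, the state names, and the bit-flipping example machine are all invented for the example, and the step limit stands in for the fact that a real Turing machine may never halt.

```python
def run_turing_machine(rules, tape, state="start", blank="_", max_steps=1000):
    """Run a one-tape machine; rules maps (state, symbol) to (write, move, next_state)."""
    cells = dict(enumerate(tape))  # sparse tape, unbounded in both directions
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = cells.get(head, blank)
        write, move, state = rules[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)


# Hypothetical machine: invert every bit, then halt at the first blank cell.
FLIP_RULES = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", "_"): ("_", "R", "halt"),
}
flipped = run_turing_machine(FLIP_RULES, "1011")
```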

read more

Experimental Psychology

“Experimental Psychology” by Robert Sessions Woodworth and Harold Schlosberg, published in 1954, describes animal behavior using the term “trial-and-error”, which was first introduced by Conwy Lloyd Morgan in 1894.

read more

Linear regression

Linear regression is a statistical model that estimates the linear relationship between a scalar response and one or more explanatory variables. It is still a frequently used approximation for explaining linear dependencies between phenomena, and it is the first off-the-shelf method in, for example, econometrics.
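For a single explanatory variable, the least-squares fit has a closed form. The sketch below assumes that simple case; the function name and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one explanatory variable: y ≈ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical noise-free data generated from y = 2x + 1.
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```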

read more

First analysis of neuronal activity (non-learning)

Provided the first mathematical analysis of the model of neural activity proposed in 1943. S.C. Kleene explored how nerve nets and finite automata can represent events. This work contributed significantly to computer science and neuroscience by establishing foundational concepts in automata theory and neural networks.

read more

Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) updates model parameters using single data points or small batches, which makes it computationally cheaper and faster on large datasets than traditional gradient descent. Its inherent randomness helps it escape local minima and can improve generalization. Enhancements such as mini-batching, momentum, and adaptive methods further improve its performance in applications such as deep learning, reinforcement learning, and online learning.
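The per-sample update can be sketched by fitting a line with squared-error loss. Everything named here (the function, the learning rate, the epoch count, the seed, and the toy data) is an assumption for illustration only.

```python
import random


def sgd_linear_fit(data, lr=0.05, epochs=300, seed=0):
    """Fit y ≈ w * x + b, updating the parameters after every single sample."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    samples = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(samples)  # visit the samples in a fresh random order
        for x, y in samples:
            error = (w * x + b) - y
            # Gradient of 0.5 * error**2 with respect to w and b.
            w -= lr * error * x
            b -= lr * error
    return w, b


# Hypothetical noise-free data generated from y = 2x + 1.
DATA = [(x, 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0, 4.0)]
w, b = sgd_linear_fit(DATA)
```

Each update looks at one sample only, so the cost per step does not grow with the size of the dataset.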

read more

Turing Proposes his Test for thinking

In this work, Alan Turing pondered the question of whether machines can think. The novelty of his approach was to avoid the quicksand of defining thinking and to propose an equivalent 'test' instead. The idea was that a human interrogator communicates with two interrogatees through machine-written text and, based on that interaction, has to judge which of them is a man and which is a woman (in the original proposition). The same idea translates immediately to distinguishing between a human and a machine. Our approach in the Turing Game expands Turing's idea.

read more

Intelligent Machinery

Alan Turing, in his exploration of artificial intelligence and machine learning, introduced the concept of a system that operates based on principles similar to the “Law of Effect,” which he referred to as the “pleasure-pain system”.

read more

Gradient descent technique

Gradient descent is a method for unconstrained mathematical optimization. It is a first-order iterative algorithm for finding a local minimum of a differentiable multivariate function.
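A minimal sketch of the iteration, assuming a fixed step size and a hand-written gradient, looks like this. The function names and the quadratic test function are hypothetical.

```python
def gradient_descent(grad, start, lr=0.1, steps=200):
    """Repeatedly step against the gradient from a starting point."""
    point = list(start)
    for _ in range(steps):
        g = grad(point)
        point = [p - lr * gi for p, gi in zip(point, g)]
    return point


def grad_f(p):
    """Gradient of the example function f(x, y) = (x - 3)^2 + (y + 1)^2."""
    x, y = p
    return [2.0 * (x - 3.0), 2.0 * (y + 1.0)]


# The example function has a single minimum at (3, -1).
minimum = gradient_descent(grad_f, start=[0.0, 0.0])
```

On a non-convex function the same loop only finds a local minimum, and which one depends on the starting point and the step size.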

read more