1. Reinforcement learning (RL)
RL is a paradigm for learning by trial-and-error, inspired by the way humans learn new tasks. In a typical RL setup, an agent observes its current state in a digital environment and takes actions that maximise the long-term reward it has been set. The agent receives feedback from the environment after each action, so it knows whether the action helped or hindered its progress. An RL agent must therefore balance exploring its environment in search of better strategies for accruing reward with exploiting the best strategy it has found so far. This approach was made popular by Google DeepMind in their work on Atari games and Go. An example of RL working in the real world is the task of optimising energy efficiency for cooling Google's data centers, where an RL system achieved a 40% reduction in cooling costs. An important inherent advantage of using RL agents in environments that can be simulated (e.g. video games) is that training data can be generated in troves and at very low cost. This is in stark contrast to supervised deep learning tasks, which often require training data that is expensive and difficult to procure from the real world.
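The explore/exploit trade-off described above can be sketched with tabular Q-learning, one of the simplest RL algorithms. The toy corridor environment, hyperparameters and variable names below are illustrative, not from any of the systems mentioned:

```python
import random

random.seed(0)

# Toy environment: a 1-D corridor of states 0..4. The agent starts in the
# middle and earns reward +1 only upon reaching the rightmost state.
N_STATES = 5
ACTIONS = [-1, +1]                  # move left or right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.2   # learning rate, discount, exploration rate

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def greedy(state):
    # Exploit: pick the highest-value action, breaking ties randomly.
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

for episode in range(300):
    s = 2
    for _ in range(100):
        if s == N_STATES - 1:
            break
        # Explore with probability EPS, otherwise exploit.
        a = random.choice(ACTIONS) if random.random() < EPS else greedy(s)
        nxt, r = step(s, a)
        # Temporal-difference update toward reward plus discounted future value.
        best_next = max(Q[(nxt, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = nxt

# The learned greedy policy should move right toward the reward.
policy = {s: greedy(s) for s in range(N_STATES - 1)}
```

The epsilon-greedy rule makes the trade-off explicit: with probability EPS the agent explores a random action; otherwise it exploits the highest-value action found so far.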
Applications: Multiple agents learning in their own instance of an environment with a shared model or by interacting and learning from one another in the same environment, learning to navigate 3D environments like mazes or city streets for autonomous driving, inverse reinforcement learning to recapitulate observed behaviours by learning the goal of a task (e.g. learning to drive or endowing non-player video game characters with human-like behaviours).
Companies: Google DeepMind, Prowler.io, Osaro, MicroPSI, Maluuba/Microsoft, NVIDIA, Mobileye, OpenAI.
2. Generative models
In contrast to discriminative models, which are used for classification or regression tasks, generative models learn a probability distribution over training examples. By sampling from this high-dimensional distribution, generative models output new examples that are similar to the training data. This means, for example, that a generative model trained on real images of faces can output new synthetic images of similar faces. For more details on how these models work, see Ian Goodfellow’s awesome NIPS 2016 tutorial write-up. The architecture he introduced, generative adversarial networks (GANs), is particularly hot right now in the research world because it offers a path towards unsupervised learning. With GANs, there are two neural networks: a generator, which takes random noise as input and is tasked with synthesising content (e.g. an image), and a discriminator, which has learned what real images look like and is tasked with identifying whether images created by the generator are real or fake. Adversarial training can be thought of as a game in which the generator must iteratively learn how to create images from noise such that the discriminator can no longer distinguish generated images from real ones. This framework is being extended to many data modalities and tasks.
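The adversarial game described above can be made concrete through the two losses it optimises. This is a hedged sketch of the standard GAN objectives (using the common non-saturating generator loss), evaluated on illustrative discriminator outputs rather than a trained model:

```python
import math

def d_loss(d_real, d_fake):
    """Discriminator loss: push its output towards 1 on real data, 0 on fakes."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def g_loss(d_fake):
    """Non-saturating generator loss: make the discriminator call fakes real."""
    return -math.log(d_fake)

# A discriminator that confidently separates real from fake has near-zero loss:
confident = d_loss(0.99, 0.01)

# At the game's equilibrium, generated samples are indistinguishable from real
# ones, the discriminator can only guess d = 0.5, and its loss rises to 2*log(2):
equilibrium = d_loss(0.5, 0.5)
```

Training alternates between minimising `d_loss` with respect to the discriminator's weights and `g_loss` with respect to the generator's, until the discriminator can no longer tell the two apart.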
Applications: Simulate possible futures of a time series (e.g. for planning tasks in reinforcement learning); super-resolution of images; recovering 3D structure from a 2D image; generalising from small labeled datasets; tasks where one input can yield multiple correct outputs (e.g. predicting the next frame in a video); creating natural language in conversational interfaces (e.g. bots); cryptography; semi-supervised learning when not all labels are available; artistic style transfer; synthesising music and voice; image in-painting.
Companies: Twitter Cortex, Adobe, Apple, Prisma, Jukedeck*, Creative.ai, Gluru*, Mapillary*, Unbabel.
3. Networks with memory
In order for AI systems to generalise in diverse real-world environments just as we do, they must be able to continually learn new tasks and remember how to perform all of them into the future. However, traditional neural networks are typically incapable of such sequential task learning without forgetting. This shortcoming is termed catastrophic forgetting. It occurs because the weights in a network that are important to solve for task A are changed when the network is subsequently trained to solve for task B.
There are, however, several powerful architectures that can endow neural networks with varying degrees of memory. These include: long short-term memory (LSTM) networks, a recurrent neural network variant capable of processing and predicting time series; DeepMind’s differentiable neural computer, which combines neural networks and memory systems in order to learn from and navigate complex data structures on its own; the elastic weight consolidation algorithm, which slows down learning on certain weights depending on how important they are to previously seen tasks; and progressive neural networks, which learn lateral connections between task-specific models to extract useful features from previously learned networks for a new task.
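As an illustration of how elastic weight consolidation slows learning on important weights, here is a minimal numpy sketch of its quadratic penalty. The per-weight importance values (a Fisher-information estimate in the original algorithm) and all names are illustrative:

```python
import numpy as np

def ewc_penalty(theta, theta_a, fisher, lam=1.0):
    # Quadratic pull toward the task-A weights, scaled by per-weight importance.
    return 0.5 * lam * np.sum(fisher * (theta - theta_a) ** 2)

theta_a = np.array([1.0, -2.0, 0.5])   # weights after learning task A
fisher  = np.array([10.0, 0.1, 0.0])   # importance of each weight to task A

# While training on task B, moving the important first weight by 0.5 is costly...
cost_important = ewc_penalty(np.array([1.5, -2.0, 0.5]), theta_a, fisher)

# ...whereas moving the unimportant third weight by 2.5 costs nothing.
cost_free = ewc_penalty(np.array([1.0, -2.0, 3.0]), theta_a, fisher)
```

Added to the task-B loss, this penalty lets the network adapt its unimportant weights freely while protecting the ones task A depends on, mitigating catastrophic forgetting.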
Applications: Learning agents that can generalise to new environments; robotic arm control tasks; autonomous vehicles; time series prediction (e.g. financial markets, video, IoT); natural language understanding and next word prediction.
Companies: Google DeepMind, NNaisense (?), SwiftKey/Microsoft Research, Facebook AI Research.
4. Learning from less data and building smaller models
Deep learning models are notable for requiring enormous amounts of training data to reach state-of-the-art performance. For example, the ImageNet Large Scale Visual Recognition Challenge, on which teams benchmark their image recognition models, contains 1.2 million training images hand-labeled with 1,000 object categories. Without large-scale training data, deep learning models won’t converge on their optimal settings and won’t perform well on complex tasks such as speech recognition or machine translation. This data requirement only grows when a single neural network is used to solve a problem end-to-end; that is, taking raw audio recordings of speech as the input and outputting text transcriptions of the speech. This is in contrast to using multiple networks, each providing intermediate representations (e.g. raw speech audio input → phonemes → words → text transcript output; or raw pixels from a camera mapped directly to steering commands). If we want AI systems to solve tasks where training data is particularly challenging, costly, sensitive, or time-consuming to procure, it’s important to develop models that can learn optimal solutions from fewer examples (i.e. one- or zero-shot learning). When training on small datasets, challenges include overfitting, difficulty handling outliers, and differences in the data distribution between training and test sets. An alternative approach is to improve learning of a new task by transferring knowledge a machine learning model acquired from a previous task, using processes collectively referred to as transfer learning.
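A hedged sketch of how transfer can enable one-shot learning: a frozen embedding (here a fixed random projection standing in for features learned on a previous, data-rich task) lets new classes be recognised from a single labelled example each, via nearest-prototype matching. All names and values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
W_pretrained = rng.normal(size=(16, 4))   # frozen "pretrained" feature extractor

def embed(x):
    # Transferred representation: reused as-is, never retrained on the new task.
    return np.tanh(x @ W_pretrained)

# One labelled example (a single "shot") per previously unseen class.
support = {"class_a": np.ones(16), "class_b": -np.ones(16)}
prototypes = {c: embed(x) for c, x in support.items()}

def predict(x):
    # Classify a query by its nearest class prototype in embedding space.
    e = embed(x)
    return min(prototypes, key=lambda c: np.linalg.norm(e - prototypes[c]))
```

Because the heavy lifting was done when the embedding was learned, the new task needs only one example per class instead of millions.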
A related problem is building smaller deep learning architectures that achieve state-of-the-art performance with a similar number of parameters, or significantly fewer. Advantages would include more efficient distributed training, because less data needs to be communicated between servers; less bandwidth needed to export a new model from the cloud to an edge device; and improved feasibility of deploying to hardware with limited memory.
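One route to such smaller models is knowledge distillation: training a compact "student" network to match the softened output distribution of a large "teacher". A minimal sketch, with illustrative logits and temperature:

```python
import numpy as np

def softmax(logits, T=1.0):
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max()                    # numerical stability
    e = np.exp(z)
    return e / e.sum()

teacher_logits = np.array([8.0, 2.0, 1.0])

# A high temperature softens the teacher's output, exposing the relative
# probabilities it assigns to the wrong classes ("dark knowledge").
hard_targets = softmax(teacher_logits, T=1.0)
soft_targets = softmax(teacher_logits, T=4.0)

def distill_loss(student_logits, T=4.0):
    # Cross-entropy between the teacher's soft targets and the student's
    # softened predictions; minimised when the student matches the teacher.
    q = softmax(student_logits, T)
    return -np.sum(soft_targets * np.log(q))
```

Minimising this loss transfers the teacher's behaviour into a student with far fewer parameters, which is the idea behind the shallow-mimics-deep application below.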
Applications: Training shallow networks by learning to mimic the performance of deep networks originally trained on large labeled training data; architectures with fewer parameters but equivalent performance to deep models (e.g. SqueezeNet); machine translation.
Companies: Geometric Intelligence/Uber, DeepScale.ai, Microsoft Research, Curious AI Company, Google, Bloomsbury AI.
5. Hardware for training and inference
A major catalyst for progress in AI is the repurposing of graphics processing units (GPUs) for training large neural network models. Unlike central processing units (CPUs), which compute in a sequential fashion, GPUs offer a massively parallel architecture that can handle multiple tasks concurrently. Given that neural networks must process enormous amounts of (often high-dimensional) data, training on GPUs is much faster than with CPUs. This is why GPUs have veritably become the shovels of the gold rush ever since the publication of AlexNet in 2012, the first neural network implemented on a GPU. NVIDIA continues to lead the charge into 2017, ahead of Intel, Qualcomm, AMD and, more recently, Google.
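A rough illustration of why this workload suits parallel hardware: a dense layer's forward and backward passes over a whole batch each reduce to one large matrix multiplication, in which every output element can be computed independently, which is exactly what a GPU's thousands of cores exploit. The sizes below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, d_in, d_out = 256, 1024, 512

X = rng.normal(size=(batch, d_in))    # a batch of high-dimensional inputs
W = rng.normal(size=(d_in, d_out))    # one dense layer's weights

Y = X @ W                             # forward pass: a single matmul
grad_Y = rng.normal(size=Y.shape)     # gradient flowing back from the loss
grad_W = X.T @ grad_Y                 # backward pass: another matmul
```

Each of the 256 × 512 entries of `Y` is an independent dot product, so the work parallelises across cores almost perfectly, and the same structure repeats at every layer of the network.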