Introduction to Generative Adversarial Networks (GANs) and how they evolved.

Generative adversarial networks (GANs) may be among the most powerful algorithms in AI. The emergence of GANs, the AI technique that makes computers creative, has been called one of the most significant successes in the recent development of AI, one that could make AI applications more creative and powerful. The idea of pitting two algorithms against each other originated with Arthur Samuel, a prominent computer science researcher credited with popularising the term "machine learning." While at IBM, he devised Samuel Checkers, a checkers-playing program that was among the first to successfully self-learn, in part by estimating the chance of each side's victory at a given position. In a seminal 2014 research paper titled "Generative Adversarial Networks," Ian Goodfellow and colleagues described the first working implementation of a generative model based on adversarial networks. The technique has sparked tremendous excitement in the field of machine learning and turned its creator into an AI celebrity.

In the past few years, AI researchers have made significant progress using deep learning. Supplied with sufficient images, a deep learning system learns to recognize, say, a pedestrian who is about to cross a road. This approach has made possible self-driving cars and the conversational technology that powers Siri, Echo, Alexa, and other virtual assistants. But while deep learning AIs can learn to recognize things, they have not been good at creating them. The goal of GANs is to give machines something akin to imagination. Doing so would not merely enable them to draw pretty pictures or compose music; it would make them less reliant on humans to instruct them about the world and the way it functions.

Today, AI programmers often need to tell a machine precisely what is in the training data it is fed – which of a million pictures contain a pedestrian crossing a road, and which don't. This method is not only expensive and labour-intensive, but it also limits how well the system deals with even slight deviations from what it was trained on. In the future, computers will become better at analyzing raw data and working out what they need to learn from it without being told. That will mark a leap forward in what is known in artificial intelligence as unsupervised learning. A self-driving car could teach itself about various road conditions without leaving the garage. A robot could anticipate the obstacles it might encounter in a busy warehouse without needing to be taken around it.

A novel approach of pitting algorithms against each other, called generative adversarial networks, or GANs, is showing promise for improving AI accuracy and for automatically generating objects that typically require human creativity. A generative adversarial network (GAN) is a machine learning (ML) model in which two neural networks compete with each other to become more accurate in their predictions. The two neural networks that make up a GAN are referred to as the generator and the discriminator. In image applications, the generator is typically built from deconvolutional (transposed-convolutional) layers, while the discriminator is a convolutional neural network. The objective of the generator is to produce artificial outputs that could easily be mistaken for real data. The goal of the discriminator is to identify which of the outputs it receives have been artificially generated. In effect, GANs create their own training signal: as the feedback loop between the adversarial networks continues, the generator begins to produce higher-quality output, and the discriminator becomes better at flagging data that has been artificially created.
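As a rough illustration of the two roles just described, the sketch below models the generator and the discriminator as tiny one-parameter functions. This is a minimal stand-in, not the multi-layer convolutional networks a real GAN would use; all names and numbers here are illustrative assumptions:

```python
import math

def generator(z, w, b):
    # Maps a noise sample z to a synthetic data point.
    # In a real GAN this would be a deep network (often built from
    # transposed-convolutional layers); here it is a single affine map.
    return w * z + b

def discriminator(x, v, c):
    # Scores a data point: values near 1.0 mean "looks real",
    # values near 0.0 mean "looks generated" (logistic regression here).
    return 1.0 / (1.0 + math.exp(-(v * x + c)))

fake = generator(0.5, w=2.0, b=1.0)        # a synthetic sample: 2.0*0.5 + 1.0
score = discriminator(fake, v=1.0, c=0.0)  # discriminator's "realness" score
```

During training, the discriminator's score on fake samples becomes the feedback signal the generator uses to improve.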

Both the generator and the discriminator improve in their respective abilities until the discriminator can no longer distinguish real examples from synthesized ones with better than the 50% accuracy expected from chance. GANs train in an unsupervised manner, meaning that they infer patterns within data sets without reference to labelled, known, or annotated results. Interestingly, the discriminator's work informs that of the generator: every time the discriminator correctly identifies a synthesized result, it tells the generator how to tweak its output so that it will be more realistic in the future.

The magic of GANs lies in the rivalry between the two neural networks. It mimics the back-and-forth between a picture forger and an art detective who repeatedly try to outwit one another. Both networks are trained on the same data set. The first, the generator, is charged with producing artificial outputs, such as handwriting or photos, that are as realistic as possible. The second, the discriminator, compares these with genuine images from the original data set and tries to determine which are real and which are fake. Based on these results, the generator adjusts its parameters for creating new images, and so it goes until the discriminator can no longer tell what's genuine and what's fake.
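The forger-versus-detective loop can be sketched end to end on a toy problem. In this hedged example, every choice – the 1-D Gaussian "real" data, the affine generator, the logistic-regression discriminator, the learning rates and step count – is an illustrative assumption, with gradients derived by hand rather than by a deep learning framework:

```python
import math
import random

random.seed(0)

REAL_MEAN, REAL_STD = 4.0, 0.5      # the "true" data distribution (toy assumption)

def g(z, w, b):
    # Generator: turns a noise sample z into a synthetic sample.
    return w * z + b

def d(x, v, c):
    # Discriminator: probability that x is real (logistic regression).
    return 1.0 / (1.0 + math.exp(-(v * x + c)))

w, b = 1.0, 0.0          # generator starts far from the real distribution
v, c = 1.0, 0.0          # discriminator parameters
lr_d, lr_g = 0.1, 0.01   # the detective learns faster than the forger

for _ in range(4000):
    z = random.gauss(0.0, 1.0)
    x_real = random.gauss(REAL_MEAN, REAL_STD)
    x_fake = g(z, w, b)

    # Discriminator step: raise D(x_real) toward 1, lower D(x_fake) toward 0.
    p_real, p_fake = d(x_real, v, c), d(x_fake, v, c)
    v += lr_d * ((1 - p_real) * x_real - p_fake * x_fake)
    c += lr_d * ((1 - p_real) - p_fake)

    # Generator step: adjust (w, b) so that D(x_fake) rises
    # (the non-saturating generator objective, maximizing log D(x_fake)).
    p_fake = d(x_fake, v, c)
    grad_x = (1 - p_fake) * v        # d(log D)/dx at the fake sample
    w += lr_g * grad_x * z
    b += lr_g * grad_x

# After training, b has drifted from 0 toward REAL_MEAN: the generator's
# samples now land where the real data lives, and the discriminator's
# scores on real and fake samples become harder to tell apart.
```

The structure is the same as in a full-scale GAN – alternating discriminator and generator updates, with the discriminator's score driving both – only compressed from deep networks and image data sets into a handful of lines.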

Examples of GAN Applications:

  1. Generative adversarial networks have already shown their worth in creating and modifying imagery. Nvidia (which has taken a keen interest in this AI technique) recently unveiled a research project that uses a GAN to correct images and reconstruct obscured parts. There are many practical applications. StyleGAN, a model developed by Nvidia, has generated high-resolution headshots of fictional people by learning attributes such as facial pose, hair, and freckles. A newly released version, StyleGAN2, improves both the architecture and the training methods, redefining the state of the art in perceived image quality.
  2. In June 2019, Microsoft researchers detailed ObjGAN, a novel GAN that can understand captions and sketch layouts and refine details based on the wording. The co-authors of a related study proposed StoryGAN, a system that synthesizes storyboards from paragraphs. Similar techniques can be used to create random interior designs to give decorators fresh ideas, and in the music industry, where artificial intelligence has already made inroads, to create new compositions in various styles that musicians can later adjust and perfect.
  3. Predicting future events from only a few video frames – a task once considered impossible – is nearly within grasp, thanks to state-of-the-art approaches involving GANs and novel data sets. One of the newest papers on the subject, from DeepMind, details recent advancements in the budding field of AI clip generation, built on "computationally efficient" techniques, components, and custom-tailored data sets. The researchers say their best-performing model, Dual Video Discriminator GAN (DVD-GAN), can generate coherent 256 x 256-pixel videos of "notable fidelity" up to 48 frames in length. In a twist on the video-synthesis formula, Cambridge Consultants last year demoed a model called DeepRay, which invents video frames to mitigate distortion caused by dirt, smoke, rain, and other debris.
  4. GANs are capable of more than generating images and video footage. When trained on the right data sets, they can produce de novo works of art. Researchers at the Indian Institute of Technology (IIT) Hyderabad and the Sri Sathya Sai Institute of Higher Learning devised a GAN, dubbed SkeGAN, that generates stroke-based vector sketches of firetrucks, cats, mosquitoes, and yoga poses.
  5. GANs are also architecturally well suited to generating media such as music. In a paper published in August, researchers from the National Institute of Informatics (NII) in Tokyo describe a system that can generate "lyrics-conditioned" melodies from learned relationships between syllables and notes. Not to be outdone, in December Amazon Web Services detailed DeepComposer, a cloud-based service that taps a GAN to fill in compositional gaps in songs.
  6. In the medical field, GANs have been used to produce data on which other AI models – in some cases, other GANs – might train, and to invent treatments for rare diseases that to date haven't received much attention. In April, researchers from Imperial College London, the University of Augsburg, and the Technical University of Munich sought to synthesize data to fill gaps in real data with a model dubbed Snore-GAN. In a similar vein, researchers from Nvidia, the Mayo Clinic, and the MGH and BWH Center for Clinical Data Science proposed a model that generates synthetic magnetic resonance images (MRIs) of brains with cancerous tumours.

However, while GANs have proven to be a brilliant idea in practice, they are not without limitations. Firstly, GANs show only a form of pseudo-imagination that depends on the task they are performing. They still need a wealth of training data to get started: without sufficient pictures of human faces, for instance, a celebrity-generating GAN cannot come up with new faces, which means GANs cannot be applied in areas where no data exists. GANs cannot invent entirely new things; we can only expect them to combine what they already know in new ways.

To summarize, despite the leaps and bounds made in the past decade of research, GANs still lack fine-grained control, which remains a big challenge. Every new technological advancement comes with its own challenges, and Ian Goodfellow knows well the risks his invention poses. He is working with a team of researchers tasked with finding ways to make deep learning and machine learning more secure, ethical, and trusted, focusing on security, privacy, and other risks that weren't given serious consideration early on. Comparing image generation in 2014 to today, there has been tremendous improvement in quality. If progress continues at the same pace, GANs will remain a critical area of research. It remains to be seen how GANs will be adopted by different industry verticals, especially the gaming industry, which offers vast potential and opportunity.