Perturbations To The Artificial Intelligence Success Story

Artificial Intelligence (AI) seems to be almost constantly in the news, eliciting awe, fear or some combination of the two in the journalists reporting on the subject. It’s hard to overstate quite how much has been achieved in this field, especially in the last decade. But all technology has flaws and limits that engineers must work around to deliver cost-effective solutions to real-world problems, and AI is not exempt. In this article, we will examine the flip-side of the AI success story and consider some of these limitations, which will hopefully give us a little space to evaluate more dispassionately what is, ultimately, still just a computational tool.

Origins

So, what exactly do we mean when we talk about ‘Artificial Intelligence’? To appreciate the nuances, we need to take a little wander back in time. Feel free to imagine the ‘vertical wobble’ visual effect from low-budget 1980’s sci-fi at this point.

We find ourselves in the summer of 1956, at Dartmouth College in New Hampshire. If we head over to the maths department, we will find the top floor occupied by a group of mathematicians and computer scientists who are considering a proposal by a young professor called John McCarthy that "every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it."

August 1956. From left to right: Oliver Selfridge, Nathaniel Rochester, Ray Solomonoff, Marvin Minsky, Trenchard More, John McCarthy, Claude Shannon.

Eight weeks of discussion and deep thought gave rise to the broad-brush ideas that constitute AI as we currently know it. Two main research camps emerged over the next few years. The first was the ‘expert systems’ approach where an ‘Inference Engine’ is used to apply a heuristic form of symbol-based logical reasoning to a data pool about the task the system is designed to carry out, known as the ‘Knowledge Base’. The second camp looked to emulate the operation of biological brains with artificial neural networks.

Expert Systems

Expert systems got their first big break with the Stanford Heuristic Programming Project in 1965, under the leadership of the ‘father of expert systems’ Edward Feigenbaum.

Two separate early expert systems were built. 'Dendral' specialised in analysing and identifying chemical compounds and is widely considered the first expert system. 'MYCIN' was derived from Dendral and was used for identifying infection-causing bacteria and recommending appropriate antibiotics. By focusing on a limited (but deep) foundation of knowledge, they became the first successful examples of AI software.
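
To make the 'inference engine plus knowledge base' idea concrete, here is a minimal Python sketch of a forward-chaining rule system. The facts and rules are invented purely for illustration and are nowhere near the scale or subtlety of Dendral or MYCIN.

# Minimal forward-chaining 'expert system' sketch (illustrative only).
# The knowledge base holds facts and if-then rules; the inference engine
# keeps firing any rule whose conditions are met until nothing new appears.

facts = {"fever", "cough"}                      # observed facts (hypothetical)
rules = [
    ({"fever", "cough"}, "respiratory_infection"),
    ({"respiratory_infection"}, "review_antibiotic_options"),
]

def infer(facts, rules):
    """Forward-chain over the knowledge base until no new facts are derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

print(infer(facts, rules))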

The glory days of expert systems came in the 1980's, when up to two-thirds of Fortune 500 companies used expert systems in some form. Interest was international and a lot of funding went into research across the globe. However, there were difficulties both in managing and maintaining the knowledge base and in writing rules that could properly capture the knowledge of experts. Eventually, the technology fell too far behind the hype and, with no obvious way to expand into a more general form of AI, expert systems were discarded; the research found itself in an AI winter by the early 1990's.

Artificial Neural Networks

In the artificial neural network camp, the U.S. Office of Naval Research unveiled the first example in July 1958. Cornell University psychologist Frank Rosenblatt had used funding from the U.S. Navy to create the perceptron, which he described as a "pattern-recognizing device". The 5-ton machine had 400 light sensors acting together as a retina, feeding data to around 1,000 "neurons" that did the processing to produce a single output. But Rosenblatt's ambitions were stunted by the feeble computing capabilities of his era and he rapidly hit the ceiling, admitting in his inaugural paper that "as the number of connections in the network increases...the burden on a conventional digital computer soon becomes excessive."

Frank Rosenblatt with the perceptron in 1958
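
As a (hugely scaled-down) illustration of the kind of computation the perceptron performed, here is a sketch of a single perceptron unit in Python: a weighted sum of inputs pushed through a hard threshold, trained with the classic perceptron update rule. The toy data and dimensions are invented for illustration.

import numpy as np

# Toy single-output perceptron: weighted sum of inputs, hard threshold,
# and the classic perceptron weight-update rule. The dimensions are tiny;
# Rosenblatt's machine worked with 400 photocell inputs.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(20, 4)).astype(float)   # 20 binary 'images', 4 'pixels' each
y = (X.sum(axis=1) > 2).astype(float)                # invented target: 'mostly bright'

w = np.zeros(4)                                      # weights and bias start at zero
b = 0.0
for epoch in range(10):
    for xi, yi in zip(X, y):
        prediction = 1.0 if xi @ w + b > 0 else 0.0
        error = yi - prediction
        w += error * xi                              # nudge weights toward the right answer
        b += error

print("learned weights:", w, "bias:", b)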

Optimism through the 1960's saw government agencies in the United States and the United Kingdom pour money into speculative research, but by the 1970's this had dried up as AI research failed to live up to the hype. AI was in its first winter.

A Third Wave

Roll on three decades of Moore’s Law, and the number of computations computers can perform per second had increased by roughly 10 million times. In the early 2000's, all this extra horsepower started to be tapped by a new class of algorithms based on probabilistic reasoning for applications like simultaneous localisation and mapping (SLAM), which allows mobile robots to incrementally build maps (that they can place themselves within) as they explore their world.

Momentum gathered in the following decade with the rise of new, deep neural networks learning from massive data sets. Along with the hype we got some staggering achievements. Highlights include: in 2005, a Stanford team won the DARPA Grand Challenge by autonomously driving 211 km on an unrehearsed off-road trail; in 2011, IBM's Watson won at "Jeopardy!"; in 2015, DeepMind's AlphaGo beat a champion at the complex game of Go. Since then, DeepMind has gone on to beat humans at StarCraft II and to predict how proteins fold, while other systems offer superior analysis of medical scans.

DeepMind's AlphaGo beating world champion Lee Sedol

Despite all the success, there are clouds on the horizon that may soon put AI once again into the trough of the Gartner hype cycle and, depending on how deep the trough, maybe into another winter.

Looming Dark Clouds

What could possibly cast a shadow over AI’s land of kittens and unicorns? I think there are two main vectors that will converge in the very near future and severely vex AI research. The first is the very real boundary on what deep learning can be applied to without failure. The second is the spiralling computational cost of ever-diminishing returns on improvement. What compounds these problems is the hyped public expectation that everything really is kittens and unicorns and there is nowhere to go but up.

Boundaries

One key aspect of current AI that is almost always overlooked by breathless journalists is that in any successful deployment, one of two things is true: either there is a human somewhere in the control loop or the cost of failure is low enough that no one cares.

If your robot lawnmower or vacuum cleaner misses a patch of ground, it's a minor irritation. If the AI system deciding what advertisements to show you when you land on a web page shows you something well outside your range of interests, again it's no biggie.

But where life and limb are at stake should something go wrong, a human needs to be in the loop. This is why all deployed self-driving systems on production vehicles are (despite marketing hype) level 2, which means a human driver is meant to keep their hands on the wheel and be attentive enough to take over should the system fail. There have already been multiple fatalities where a human driver failed to play their part and just left the vehicle to its own devices.

The point is, failures happen and the increasing ubiquity of AI means that failures could detrimentally affect not just an individual but millions of people. It is concerning enough that the AI community is cataloguing these failures with a view toward understanding the full breadth of the risks they may pose. While failures are embarrassing and may even have legal ramifications, it is in the industry's best interest to be open about them, as the main failure threat to AI as a whole is a failure of public trust in the technology.

One of the main conundrums is that the neural network technology driving most AI systems often fails in ways that obscure the cause, so the failure mode remains a mystery even to researchers. With that in mind, let’s consider some of the documented failure types.

Bias

Of great concern to civil liberties campaigners is the increased use of AI to offload major decisions, such as who receives a loan, who goes to jail, and who gets health care. Any bias in these decisions can have profound social effects on large numbers of people.

Perhaps one of the most well-documented biases is facial recognition systems’ poor accuracy with people who have darker skin tones, which can lead to very real negative consequences.

In healthcare too, racial bias stemming from flawed assumptions about patients has caused black patients to lose out on intensive-care programs to less sick white patients. The algorithm in question assumed that people with high health care costs were also the sickest patients and therefore most in need of care. However, outside the algorithm, black patients tend to have less insurance coverage and so are less likely to accumulate high costs.

It's pretty rare for malignant intent to be behind this bias, but the effects can be so far-reaching that UC Berkeley drafted a playbook outlining a few basic steps that governments, businesses, and other groups can take to detect and correct any biases in the AI software they use.

Catastrophic Forgetting

Catastrophic forgetting (aka catastrophic interference) is the tendency of an AI to completely forget information it previously learned, rather abruptly, after training on new information. Essentially the new knowledge overwrites the previous knowledge.

The most obvious solution is to retrain on old data when learning something new, an approach known as 'replay'. Although replay goes some way towards solving catastrophic forgetting, constantly retraining on previously learned tasks is highly inefficient and the amount of data that must be stored quickly becomes unmanageable.
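
As a rough illustration of replay (a sketch only; the model, data and buffer size here are all hypothetical), training batches for a new task can be interleaved with stored examples from earlier tasks so that old knowledge is rehearsed rather than overwritten:

import random
import torch
from torch import nn

# Sketch of 'replay' for continual learning: while training on a new task,
# mix in examples remembered from earlier tasks so old knowledge is
# rehearsed rather than overwritten. Everything here is a toy placeholder.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
optimiser = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

replay_buffer = []          # (input, label) pairs kept from earlier tasks
BUFFER_LIMIT = 500          # rehearsal memory is deliberately bounded

def train_on_task(task_data):
    for x_new, y_new in task_data:
        xs, ys = [x_new], [y_new]
        if replay_buffer:                                # rehearse a few old examples
            for x_old, y_old in random.sample(replay_buffer,
                                              min(4, len(replay_buffer))):
                xs.append(x_old)
                ys.append(y_old)
        inputs, labels = torch.stack(xs), torch.tensor(ys)
        optimiser.zero_grad()
        loss_fn(model(inputs), labels).backward()
        optimiser.step()
        if len(replay_buffer) < BUFFER_LIMIT:            # remember for later tasks
            replay_buffer.append((x_new, y_new))

# Hypothetical usage: two tasks presented one after the other.
task_a = [(torch.randn(10), random.randint(0, 1)) for _ in range(100)]
task_b = [(torch.randn(10), random.randint(0, 1)) for _ in range(100)]
train_on_task(task_a)
train_on_task(task_b)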

Finding an elegant solution to this problem is essential for the development of continual learning machines.

Brittleness

Take an image containing an object (e.g. a car or kitten) that a neural network can correctly identify, rotate it 90 degrees and run the image by the AI again. One study that did this found that the AI failed to recognise the object 97 per cent of the time. Without being explicitly taught, AIs are not capable of the mental rotation tasks any human 3-year-old would find obvious. It’s an example of AI brittleness: AIs can excel at their task until, taken into unfamiliar territory or fed deliberately manipulated 'adversarial' input, they break in unpredictable ways.

Brittle failures can be caused by remarkably small changes (known in the community as 'perturbations'), to the point where changing a single pixel in an image can make an AI think a ship is a car or a horse is a frog. One area where this could have major implications is medical imaging: there has been a lot of coverage of how accurate deep learning can be, yet the subtlest modifications to scan images - imperceptible to the human eye - can result in cancer being misdiagnosed 100 per cent of the time.
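
To give a flavour of how such a perturbation is constructed, here is a minimal sketch of the well-known fast gradient sign method (FGSM) in PyTorch. The pretrained model, the image file name and the epsilon value are placeholders, and the usual input normalisation is omitted for brevity.

import torch
from torch import nn
from torchvision import models, transforms
from PIL import Image

# Sketch of a fast-gradient-sign perturbation: nudge every pixel by a tiny
# amount in the direction that most increases the model's loss, so the
# change is near-invisible to a human but can flip the prediction.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
preprocess = transforms.Compose([transforms.Resize(256),
                                 transforms.CenterCrop(224),
                                 transforms.ToTensor()])

x = preprocess(Image.open("some_image.jpg")).unsqueeze(0)   # placeholder image
x.requires_grad_(True)
original_label = model(x).argmax(dim=1)                     # model's first answer

loss = nn.functional.cross_entropy(model(x), original_label)
loss.backward()

epsilon = 0.01                                              # size of the nudge
x_adv = (x + epsilon * x.grad.sign()).clamp(0, 1).detach()

print("before:", original_label.item(),
      "after:", model(x_adv).argmax(dim=1).item())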

Obscurity

The explanation of why an AI thinks a patient has cancer, or why it has identified someone as a criminal suspect, can have many legal, medical, and other consequences. As we mentioned earlier, even when it is working as expected, the way AI reaches its conclusions is an enigmatic black box. This could have a profound effect on the public's ability to trust AI conclusions. However, work is in progress to try to shed light on what’s in the box.

Computational Cost

Like Frank Rosenblatt before them, today's deep-learning researchers may be nearing the frontier of what their tools can achieve, as they chase diminishing returns. As an example, in 2012 AlexNet, the first model to really show the power of training deep-learning image recognition systems on graphics processing units (GPUs), was trained for five days using two GPUs. By 2018, another model, NASNet-A, had cut the error rate of AlexNet in half, but used more than 1,000 times the computing power to do it.

Deep-learning models tend to be overparameterized, meaning they have more parameters than there are data points for training. Deep learning avoids overfitting the data by initializing the parameters randomly and then iteratively adjusting sets of them to better fit the data using a method called stochastic gradient descent. In practice, this tends to produce models that generalize well. Unfortunately, this flexibility is computationally expensive.
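
As a small sketch of that training recipe (random initialisation followed by repeated stochastic-gradient updates), here is a deliberately overparameterized toy model in Python; the data, model and learning rate are all invented for illustration.

import numpy as np

# Stochastic gradient descent on a toy overparameterized linear model:
# 50 parameters fitted to only 20 training points. Weights start random
# and are nudged one example at a time.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 50))                 # 20 data points, 50 features/parameters
true_w = rng.normal(size=50)
y = X @ true_w + 0.1 * rng.normal(size=20)    # noisy invented targets

w = 0.01 * rng.normal(size=50)                # random initialisation
learning_rate = 0.01
for epoch in range(200):
    for i in rng.permutation(len(X)):         # visit examples in random order
        error = X[i] @ w - y[i]
        w -= learning_rate * error * X[i]     # gradient step for one example

print("mean squared training error:", np.mean((X @ w - y) ** 2))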

Part of the reason for this cost is true of all statistical models: to improve performance by a factor of k, at least k² more data points must be used to train the model. The larger part of the computational cost comes from overparameterization: the number of parameters grows with the number of data points, and training compute scales with their product, giving a computational cost for improvement of at least k⁴. That exponent is very expensive: a 10-fold improvement requires at least a 10,000-fold increase in computation.
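
As a quick back-of-the-envelope check of that scaling (treating the exponents as the 'at least' lower bounds quoted above rather than exact laws):

# Rough scaling arithmetic: a k-fold performance improvement needs at least
# k**2 more data, and with overparameterization roughly k**2 more parameters,
# so training compute grows like data x parameters ~ k**4.
for k in (2, 5, 10):
    print(f"{k}-fold improvement: ~{k**2}x data, ~{k**4}x compute")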

Moore’s Law can’t help us here. Of the 1,000-fold difference in computing between AlexNet and NASNet-A, only a six-fold improvement came from better hardware; the rest came from using more processors and running them longer.

To put this into monetary terms, consider that OpenAI trained its highly acclaimed deep-learning language system, GPT-3, at a cost of more than $4 million. And when a mistake was made in implementing the system, it wasn't fixed: "due to the cost of training, it wasn't feasible to retrain the model."

Then there is Google subsidiary DeepMind: training AlphaGo to play Go was estimated to have cost $35 million. These kinds of costs for incremental improvements are not easily carried by most businesses, and we haven’t even considered the environmental cost of powering these monster systems.

Final Thoughts

Many in the AI industry know things are going to have to change, if only because the current trajectory is rapidly advancing on the buffers at the end of the line. This is already leading to research into more efficient algorithms and different approaches.

For example, while deep learning has made incredible advances and appears to have definitively trounced the expert-system approach, the reality is not that simple. Consider the robotic hand from OpenAI that made headlines for manipulating and solving a Rubik's cube. The robot used both neural nets and symbolic AI. It's one of a new breed of neuro-symbolic systems that use neural nets for perception and symbolic AI for reasoning, a hybrid approach that may offer gains in both efficiency and explainability.
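
As a toy illustration of that split (everything here is hypothetical and far simpler than OpenAI's system), a neural network can map raw sensor input to discrete symbols, and a small set of symbolic rules can then decide what to do with them:

import torch
from torch import nn

# Toy neuro-symbolic split: an (untrained, placeholder) neural net handles
# perception, mapping a raw sensor vector to a discrete symbol, and a small
# rule table handles the reasoning over that symbol.
SYMBOLS = ["red_face_up", "white_face_up", "blue_face_up"]
perception_net = nn.Sequential(nn.Linear(64, 32), nn.ReLU(),
                               nn.Linear(32, len(SYMBOLS)))

RULES = {                       # symbolic layer: action to take per perceived state
    "red_face_up": "rotate_top_clockwise",
    "white_face_up": "rotate_right_anticlockwise",
    "blue_face_up": "flip_cube",
}

def decide(sensor_reading: torch.Tensor) -> str:
    symbol = SYMBOLS[perception_net(sensor_reading).argmax().item()]   # perception
    return RULES[symbol]                                               # reasoning

print(decide(torch.randn(64)))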

Maybe the pendulum will swing back and upcoming systems will rely more on experts once again to identify what needs to be learned in order to reduce the ruinous cost of training.

I am pretty confident that new, more efficient methods will be developed. The question is when, and whether they will arrive soon enough to prevent a stagnation that leads to another AI winter. The answer is playing out as we watch.

Mark completed his Electronic Engineering degree in 1991 and worked in real-time digital signal processing applications engineering for a number of years, before moving into technical marketing.