Earlier this year, the Elon Musk-backed artificial intelligence laboratory OpenAI released its latest, much-anticipated language model, the Generative Pre-trained Transformer 3 (GPT-3). Emerging to much fanfare and heralded as ushering in a new age of artificial intelligence, the number of articles, blog posts, and news pieces about this language model is perhaps matched only by the number of parameters GPT-3 learned: 175 billion (OK, this may be an exaggeration, but you get my point).
This blog post will not present "cool" conversations I had with GPT-3, nor will it review the countless (commendable) works authored by this highly advanced robo-Hemingway.
So what will I be covering?
An artificial neuron features incoming connections carrying input signals, a cell body that activates when the weighted inputs cross a certain threshold, and an output. To learn a new task, an artificial neural network is exposed to a vast number of examples.
For instance, if a neural network is tasked with recognizing images of cats and dogs, we would need to expose it to a multitude of images in order to train it to correctly ascertain which images show a dog and which a cat, continually updating the weights (parameters) until the desired output is produced. Although these algorithms have catapulted us to new heights in artificial intelligence, they do have some crucial shortcomings.

By and large, neural networks are colossal in size, which means that rather than learning anything, a network can simply use its weights to store the input data outright. If we showed the network only a few examples of cats and dogs, it would hold that data in its weights and always retrieve the correct answer for those examples. This can become a significant problem: we want the network not merely to memorize the data but to generalize from it to new data, the same way humans do with extreme ease.
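As a rough illustration of the building block described above (and not of how GPT-3 itself is implemented), a single artificial neuron can be sketched as a weighted sum of incoming signals passed through a threshold activation; the weights and inputs below are made up purely for demonstration.

```python
def neuron(inputs, weights, bias):
    """A minimal artificial neuron: weighted sum of inputs plus bias,
    passed through a step activation that 'fires' only when the total
    crosses the threshold (here, zero)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0

# Example with hand-picked (not learned) weights:
# 1.0*0.6 + 0.5*(-0.4) - 0.1 = 0.3 > 0, so the neuron fires.
print(neuron([1.0, 0.5], [0.6, -0.4], -0.1))  # → 1
```

Training, in this picture, is the process of nudging `weights` and `bias` over many examples until the outputs match the desired labels.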
The second term we must establish before diving into GPT-3 is the language model.
In short, a language model is a model trained to take a sentence in natural language and output the probabilities of possible next words or characters. Let's take a look at the example below. This language model is tasked with predicting the probability of each candidate next word: in this sample, dog, mouse, squirrel, boy, and house. As simple as this task is for us humans, for a language model to complete the sentence correctly, it would need to undergo rigorous training and iteration.

What makes GPT-3 so unique as a language model powered by neural networks is its sheer size. The chart below compares different language models by the number of parameters (roughly, weights) that they have learned.
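The idea can be sketched with a toy model: given a fixed context, assign a probability to each candidate continuation. The counts below are hypothetical placeholders; a real language model learns these distributions from billions of training examples rather than a lookup table.

```python
from collections import Counter

# Hypothetical counts of what followed a prompt like
# "The ___ chased the ball" in some imagined corpus.
counts = Counter({"dog": 70, "boy": 20, "squirrel": 6, "mouse": 3, "house": 1})
total = sum(counts.values())

# Normalize counts into a probability distribution over next words.
probs = {word: n / total for word, n in counts.items()}

# The model's prediction is the highest-probability candidate.
best = max(probs, key=probs.get)
print(best, round(probs[best], 2))  # → dog 0.7
```

GPT-3 does conceptually the same thing, only over a vocabulary of tens of thousands of tokens and with the distribution computed by a 175-billion-parameter network instead of a frequency table.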
As you can see, GPT-3 learned more than ten times as many parameters as the runner-up, Turing-NLG: 175 billion compared with 17 billion. Conservative estimates place the cost of a single training run of GPT-3 in the millions of dollars.
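The comparison itself is simple arithmetic, worth spelling out:

```python
gpt3 = 175e9       # GPT-3 parameter count
runner_up = 17e9   # runner-up model's parameter count

ratio = gpt3 / runner_up
print(round(ratio, 2))                   # → 10.29 (about 10x the parameters)
print(f"{(ratio - 1) * 100:.0f}% more")  # → 929% more
```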
The specific architecture of GPT-3 is largely identical to that of its predecessor, GPT-2, but training this gargantuan model is an engineering feat for the history books. OpenAI used an astronomical swath of the internet to train the model, which is a slight exaggeration but not too far removed from reality. The data used to train GPT-3 comprises several corpora, including a filtered version of Common Crawl (a repository of the internet filtered for quality), the entire Wikipedia dump, and several other coding and math databases.

So what can GPT-3 do? Well, for one thing, it can answer, in easily understood natural language, an expansive array of questions on any topic while retaining the context of previous questions. Every single item in the snippet below was answered correctly by the language model, and it was able to carry the individual referred to in one answer (Dwight D. Eisenhower was president of the United States in 1955) over to the following question (He belonged to the Republican Party).
Despite the exorbitant resources and brainpower invested in GPT-3, it is not without fatal flaws. Let's examine the Q&A session below. GPT-3 answers all questions correctly except for one: "Which is heavier, a toaster or a pencil?", to which it replied, wrongly: "A pencil is heavier than a toaster." Now, this may seem like a small chink in GPT-3's armor, but in fact it's incredibly revealing. If you recall, I previously stated that one of the critical shortcomings of any neural network is that rather than learning, generalizing, and inferring the way humans do, it will often just memorize, storing the data in its weights.
In the case study below, we can deduce that GPT-3 was able to provide answers that were saved somewhere on the internet, but when faced with a question that, as it would seem, has no answer online, GPT-3 produces an incorrect output.

GPT-3 may write like a human, but it cannot yet reason like one.
Well, for starters, the independent United States did not exist until the year 1776. As humans, when we do not know the answer to a question or, in this example, a plausible answer does not exist, we can communicate that we do not know the answer. In the case of GPT-3 (and many other AI language models for that matter), it doesn't know that it doesn't know, and will still opt to retrieve an answer even if it’s incorrect.
One could argue that Elizabeth I did indeed rule the United States in 1600, as she was the rightful monarch of what was then a British colony, but in no way was she the president of the United States. This again hints at the fact that GPT-3, with its massive cache of data, is able to memorize an amount of information that would put any encyclopedia to shame. Still, it cannot generalize or infer the way the average person can.
A Question of Dynamics
Businesses are living, breathing organisms, forever developing, evolving, and renewing. If nothing else, 2020 has demonstrated quite exquisitely how quickly things can change in the 21st century and how rapidly new information becomes stale. GPT-3 was trained on data current up to October 2019; thus, it can name any dinosaur from the Mesozoic era, but it cannot tell you who the newly elected president of the United States is.

To demonstrate how problematic this can become, let's use a real-world example. One healthcare provider uses conversational AI to empower its patients to easily find physicians by attributes such as location, insurance, and specialty; schedule appointments online; troubleshoot portal issues; and get the latest updates on COVID-19.
In stark contrast, existing conversational AI solutions, regularly updated either manually or automatically, provide users, whether concerned patients or discerning motorists, with fresh, relevant, and helpful answers to their queries. As a business continues to grow and expand, conversational AI solutions grow and scale with it, serving as the first point of contact for new and existing customers.
Prohibitive Pricing
Ultimately, there's no such thing as a free lunch. While OpenAI may have launched a complimentary version of GPT-3 for a two-month private beta on July 11, October saw early adopters having to choose between a steep scale of pricing plans, including four different offers based on a token system. That's in line with the company's overall transformation from beloved non-profit to revenue-producing startup. One of the first researchers to gain access to the beta version was Gwern Branwen.
While tokens might sound friendly, they actually represent a tricky pricing structure that can break the bank. Let's dig into tokens as a unit of measurement: a token is a small chunk of text, produced by splitting a sequence into smaller semantic units such as word fragments or characters, and billing counts the tokens in both the prompt and the completion.
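Token-based billing lends itself to back-of-the-envelope budgeting. The sketch below uses a tokens-per-word ratio of roughly 1.33 (English prose averages a bit more than one token per word) and a per-token price that is a hypothetical placeholder, not OpenAI's published rate.

```python
# Rough token-budget estimator. Both constants are illustrative
# assumptions, not actual OpenAI figures.
TOKENS_PER_WORD = 1.2e6 / 0.9e6   # ~1.33 tokens per English word
PRICE_PER_1K_TOKENS = 0.01        # hypothetical: $0.01 per 1,000 tokens

def monthly_cost(words_per_month):
    """Estimate a monthly bill from total words generated (prompt +
    completion), via tokens consumed at the assumed rate."""
    tokens = words_per_month * TOKENS_PER_WORD
    return tokens / 1000 * PRICE_PER_1K_TOKENS

# Example: an app generating 300 million words a month
# would consume ~400 million tokens.
print(f"${monthly_cost(300e6):,.0f}")  # → $4,000
```

The point of the exercise: because every prompt and every completion is metered, costs scale linearly with usage, which is exactly why high-volume consumer apps feel the pricing first.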
Robust NLP needs to be fed an alarmingly high number of these tokens. Take the GPT-3 model itself, which consumed 499 billion tokens in order to achieve its current quality threshold. According to Branwen, a GPT-3 enthusiast, "the 2 million tokens of the 2nd tier will correspond to 3,000 pages of text. And to relate it better: Shakespeare's entire work consists of ~900,000 words, or 1.2 million tokens." Murat Ayfer, creator of PhilosopherAI.com, a website that generates philosophical strings from user queries, has run the numbers for his business. With an average of 750,000 prompts per month generating 400 million tokens in two or three weeks, Ayfer is on track to face charges of $4,000 per month, minimum. Keep in mind that PhilosopherAI.com's use case is not considered particularly extensive.

Outside of tech juggernauts and larger players, it is infeasible to believe that startups and smaller entities will be able to offer the NLP experience that OpenAI originally promised under the current pricing structure. Within the natural language community, one fear is that OpenAI customers will be forced to offload the costs onto their users, mixing ads with conversational AI.

At the end of the day, the "holy grail" of artificial intelligence is explainability and control. There may be a time in the not-so-distant future when language models can explain themselves and even argue for or dissect their internal processes; but at present, businesses looking to provide their audiences with engaging, timely, and helpful conversational experiences will continue to rely on existing conversational AI solutions.