
Artificial Intelligence has been around for several decades. It powers Web search engines, foreign language translations, and most apps within your smartphone.

What went viral in late 2022 was Generative AI (GenAI), with ChatGPT emerging as the first widely popular tool. Initially, ChatGPT and its soon-to-emerge competitors focused on generating text, so these models became known as Large Language Models (LLMs).

How Do LLMs Work?

Large Language Models learn by training on enormous amounts of text, including books, articles, websites, and more. During training, the model repeatedly tries to guess the next piece of text in a sequence, compares its guesses to the real text, and adjusts itself to make better predictions next time. Over billions of these attempts, it internalizes patterns about how language is structured and how ideas tend to flow.

To process text, the model breaks everything into tokens, which are small pieces such as whole words, parts of words, or punctuation marks. You can think of tokens as the “atoms” of language for the model. Each token is converted into numbers that represent patterns learned during training, such as relationships between words, meanings, and contexts. When the model generates text, it uses these numerical representations to predict which token should come next based on the ones that came before. An LLM can therefore be thought of, functionally, as a “word predictor.”
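The idea of splitting text into tokens can be sketched in a few lines of code. This is a toy illustration only: real tokenizers (such as the BPE tokenizers used by production LLMs) learn their vocabularies from data, and the vocabulary and example text below are invented for demonstration.

```python
def toy_tokenize(text, vocab):
    """Greedily match the longest known piece at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, shrinking until one matches.
        for length in range(len(text) - i, 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            tokens.append(text[i])  # unknown character becomes its own token
            i += 1
    return tokens

# A tiny, made-up vocabulary: some whole words, some word fragments.
vocab = {"un", "believ", "able", " ", "news", "!"}

pieces = toy_tokenize("unbelievable news!", vocab)
print(pieces)  # ['un', 'believ', 'able', ' ', 'news', '!']

# Each piece is then mapped to a number (a token ID) before the model sees it.
ids = {piece: n for n, piece in enumerate(sorted(vocab))}
print([ids[p] for p in pieces])  # [5, 3, 2, 0, 4, 1]
```

Notice that “unbelievable” is not in the vocabulary at all; the tokenizer reconstructs it from three smaller pieces, which is exactly how real models handle rare or novel words.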

LLMs are probabilistic, which means they never choose the next token with absolute certainty. Instead, they produce a list of possible next tokens along with probabilities for each one. Higher probabilities mean the model believes those tokens are more likely to make sense in the current context.
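That final step, picking one token from a list of weighted candidates, can be sketched directly. The probabilities below are invented for illustration; a real model would assign a probability to every token in its vocabulary, often tens of thousands of them.

```python
import random

# Pretend the model has just read this prompt and produced probabilities
# for a handful of candidate next tokens (numbers invented for illustration).
prompt = "The cat sat on the"
next_token_probs = {
    "mat":   0.55,
    "floor": 0.25,
    "roof":  0.12,
    "moon":  0.08,
}

# Sample one token according to its probability. "mat" wins most often,
# but "moon" is still possible, which is why outputs vary between runs.
tokens = list(next_token_probs)
weights = list(next_token_probs.values())
choice = random.choices(tokens, weights=weights, k=1)[0]
print(prompt, choice)
```

Generation simply repeats this loop: append the chosen token to the context, recompute the probabilities, and sample again, one token at a time.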

Because of this probabilistic nature, the same prompt can produce different outputs, and the model can balance between predictable, safe answers and more imaginative or exploratory ones. You can influence what the output looks like, its length, and even its tone by prompting it effectively.
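Under the hood, one common knob for trading predictability against variety is called “temperature.” Consumer chatbots usually do not expose it directly (prompting is the user-facing control), but the sketch below shows the idea, with invented raw scores standing in for a model’s real output.

```python
import math

def apply_temperature(scores, temperature):
    """Convert raw scores into probabilities, scaled by temperature."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps]

# Invented raw preferences for three candidate tokens.
scores = [2.0, 1.0, 0.5]

# Low temperature sharpens the distribution: the top token dominates,
# giving predictable, "safe" output.
print(apply_temperature(scores, 0.5))

# High temperature flattens it: less likely tokens get a real chance,
# giving more varied, exploratory output.
print(apply_temperature(scores, 2.0))
```

Run both lines and compare the first number in each list: at low temperature the top token takes most of the probability mass, while at high temperature the three candidates end up much closer together.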

LLMs are often called chatbots because they are optimized to hold conversations the way humans do. Some people find the interactions so compelling that they turn to LLMs for companionship, not only for answers or explanations.

Hallucinations

While LLMs are impressive as a technology, their nature as word predictors means they sometimes guess wrong about what comes next. When a chatbot produces this kind of error, it is called a hallucination. Hallucinations were more common in the earliest LLMs, but they still occur in the basic (free) models, especially when a prompt asks the model to analyze a document longer than a few pages, to perform numerical calculations, or to cite published books or articles.

One of the biggest problems with hallucinations is that they are unmarked: LLMs assert their claims without hesitation or hedging language, proclaiming each explanation or answer with confidence and certainty. Specific facts such as dates, locations, and citations should therefore be double-checked rather than assumed to be true.

Differences Between Models

Most LLMs are free to use by visiting their websites or downloading their smartphone apps, often without even needing to create a free account. This type of access leads, not surprisingly, to the most superficial and weakest versions of that LLM. Outputs will be shorter, more likely to contain hallucinations, and will “sound like AI.”

Creating a free account anchored by a contact you verify (such as a mobile number or email address) unlocks a slightly better version of that same AI. Remember the old adage, however: if you aren’t paying for the service, you are not the customer. You are the product. They might sell your contact information to advertisers, for example.

Both types of free versions limit how much AI output can be generated each day. This is done partly because generating AI output costs money, so the company logically wants to limit how much it gives away for free. They also have a vested interest in making the paid versions of their models perform better so people are willing to pay for subscriptions.

Many LLMs offer at least one paid subscription, often priced at a relatively inexpensive monthly rate. Some chatbots also offer a much more expensive tier (one common pricing scheme is 10x the price of the inexpensive tier, also billed monthly).

The paid subscriptions unlock historical releases of the primary LLM and often include entirely new versions that are trained differently. The default model is often the same as in the free accounts, but here it is labeled “fast,” because its output is short, simple, and shallow. That is often enough for everyday queries. The paid tiers, however, include models that can analyze more uploaded data, provide more thorough and deeper analysis, and avoid most hallucinations by slowing down, considering the output from multiple angles, and even double-checking it before delivery. These models are sometimes called “thinking” or “reasoning” models; although their output is more trustworthy, they are slower to deliver it. The most expensive subscriptions sometimes include models that push the analysis and accuracy further still, and may be labeled “deep thinking” or “deep research.”

Multi-Modal AI

Initially, generative AI models focused on just one type of output. ChatGPT, which went viral first, initially provided text as its output. Other types of GenAI provided images as output, and, a few years later, some provided video as their AI output.

Over time, the largest LLMs also began including image and video output options, so that a single website or app could create all types of GenAI output, not just text.

Originally, image and video generation was limited to text inputs only. Later, other types of files could be uploaded and included in the prompt. One could upload an image, for instance, and include directions in the prompt for how the AI should alter it; the output would then be the altered image. The multi-modal nature of such AI models (including text, images, or video in both the input and the output) also led to models continuing to expand the types of files that could be used as input and generated as output. One example might be uploading a PDF and requesting a PowerPoint file based on the PDF’s content as the output.