Large Language Models (LLMs) like ChatGPT have shown impressive results in the past two years. However, people have realized that while these models are incredibly knowledgeable, they often lack humor. Ask any mainstream LLM to tell you a joke, and you’ll likely receive a dull, dad-joke-level response.
For example, here is a joke from ChatGPT:
The problem goes beyond just being funny. LLMs have failed to create memorable personalities, resulting in shallow interactions. This is why most AI companion products feel like role-playing games: because people get bored with a single character quickly, platforms have to encourage users to create many characters to keep them engaged.
Why does this happen? There are two main reasons:
The first is that LLMs lack the capability for deep reasoning. Creating humor is a challenging task. The two key ingredients of humor are surprise and contradiction, which demand a profound understanding of how things logically work, followed by an intentional deviation from the norm.
However, LLMs struggle to understand deep logical connections and context, which are essential for humor. They tend to focus on literal interpretations, missing the subtleties that make language humorous.
The second is the limitation of datasets and evaluation: many models are trained to excel on specific benchmarks and tasks that are outdated. Existing LLM evaluations focus heavily on question answering and academic tests because researchers can easily access those. This has resulted in an overemphasis on one particular subdomain at the expense of more nuanced language understanding and creative expression. Consequently, responses generated by these models lack personality.
What is the likely path ahead? Here is my take:
1. Better Human-AI collaboration. As current LLMs struggle with understanding the deeper logic of language, making them truly funny might require significant advancements in their reasoning capabilities.
Some progress will come as LLMs keep gaining parameters, but there is still a long way to go: it is hard to turn an LLM into a reasoning machine. A more realistic approach is to harness human wisdom and creativity to help LLMs bypass complex logical reasoning and directly generate funny content.
Humans are actually very good at capturing nuance. It is easier to develop an AI system that leverages human capability than to build that capability into the LLM itself. For example, when creating funny comments for videos, drawing on the existing human comments under a video can boost the quality of the jokes. This falls into the domain of Human-based Computation. A famous example is CAPTCHA, which verifies that a user is human rather than a bot while simultaneously teaching machines to solve hard computer vision tasks.
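As a minimal sketch of this idea, the snippet below composes an LLM prompt that reuses existing human comments as few-shot examples, so the model imitates human wit rather than reasoning about humor from scratch. The function name, the length-based ranking heuristic, and the sample comments are all hypothetical illustrations, not a real product's API.

```python
def build_joke_prompt(video_title, human_comments, max_comments=3):
    """Compose an LLM prompt seeded with human-written comments.

    Instead of asking the model to invent humor from first principles,
    we hand it examples of what viewers already found funny.
    """
    # Crude stand-in for ranking by upvotes: prefer longer comments,
    # which in this toy example act as the "wittiest" ones.
    ranked = sorted(human_comments, key=len, reverse=True)[:max_comments]
    examples = "\n".join(f"- {c}" for c in ranked)
    return (
        f"Video: {video_title}\n"
        f"Comments viewers found funny:\n{examples}\n"
        "Write one new comment in the same spirit."
    )

prompt = build_joke_prompt(
    "Cat knocks over a vase",
    [
        "Gravity: 1, Vase: 0",
        "He did a risk assessment and proceeded anyway",
        "The vase had it coming",
    ],
)
```

In a real system the ranking signal would come from actual engagement data (likes, replies) rather than comment length, but the shape of the approach is the same: humans supply the nuance, the model supplies the volume.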
2. Online Learning: Most current LLMs are trained offline, taking months to train and then remaining frozen. This makes it nearly impossible for models to adapt to real-time human feedback. One could argue that retrieval-augmented generation (RAG) is just a poor man's solution for letting LLMs learn facts in real time; however, simple RAG cannot capture nuance. We need to design methods for online learning, allowing models to capture and incorporate human feedback in near real time.
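One lightweight way to approximate this, without retraining the model, is to keep a live store of user reactions and feed the best-rated outputs back into the next prompt as context. The sketch below is a hypothetical design of my own, not an existing library; class and method names are illustrative.

```python
from collections import defaultdict

class FeedbackStore:
    """Toy near-real-time feedback loop: record user reactions to
    generated jokes and surface the best-rated ones as few-shot
    context for the next generation."""

    def __init__(self):
        # joke text -> [upvotes, times shown]
        self.scores = defaultdict(lambda: [0, 0])

    def record(self, joke, liked):
        up, shows = self.scores[joke]
        self.scores[joke] = [up + (1 if liked else 0), shows + 1]

    def top_context(self, k=2):
        # Rank by observed like rate; these jokes seed the next prompt.
        def rate(item):
            up, shows = item[1]
            return up / shows
        best = sorted(self.scores.items(), key=rate, reverse=True)[:k]
        return [joke for joke, _ in best]

store = FeedbackStore()
store.record("Joke A", liked=True)
store.record("Joke A", liked=True)
store.record("Joke B", liked=False)
store.record("Joke C", liked=True)
```

Unlike plain RAG, which only retrieves static facts, this loop retrieves what audiences recently responded to, so the system's sense of "what lands" can shift within minutes rather than waiting for the next training run.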
3. Better Evaluation: Current datasets for evaluating AI-generated humor are too narrowly focused. The AI community needs to overcome this limitation to create more comprehensive assessment tools.
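Since humor has no ground-truth answer key, one plausible assessment tool is pairwise human judgment aggregated with an Elo-style rating, as used in chatbot arenas. The function below is a standard single-game Elo update applied to the question "which of these two jokes is funnier?"; the starting ratings and K-factor are conventional defaults, not values from any published humor benchmark.

```python
def elo_update(r_a, r_b, a_wins, k=32):
    """One Elo update from a single pairwise human judgment:
    a rater picks which of two jokes is funnier."""
    expect_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    score_a = 1.0 if a_wins else 0.0
    new_a = r_a + k * (score_a - expect_a)
    new_b = r_b + k * ((1 - score_a) - (1 - expect_a))
    return new_a, new_b

# Two joke generators start equal; generator A wins one human vote.
ra, rb = elo_update(1000, 1000, a_wins=True)
```

Accumulated over many raters and many joke pairs, such ratings measure relative funniness directly from human preference instead of forcing humor into a multiple-choice test.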
The way we interact with Large Language Models (LLMs) is just as important as the answers they provide. Engaging with AI shouldn't be a dull experience. By pursuing the directions described above, I believe we can create truly funny AI systems that engage in witty, personable interactions with humans in the foreseeable future.
