
Context Modeling: The Future of Personalized AI

Andrej Karpathy, a prominent voice in the AI community, recently brought the term “Context Engineering” to the forefront. It describes the intricate art of manually crafting prompts and data to guide Large Language Models. While the concept is gaining significant attention, I believe it points us in the wrong direction.

The future of personal AI isn’t about endlessly engineering context; it requires a radical shift to what I call “context modeling.”

This isn’t just semantics—it’s the difference between a temporary patch and a real solution.

The Limitations of Current RAG Systems

Today’s Retrieval-Augmented Generation (RAG) systems follow a relatively straightforward paradigm. They retrieve relevant information using rule-based systems—typically employing cosine similarity to find the top-k most relevant results—and then present this context to a large language model for processing. While this approach has proven effective in many scenarios, it suffers from significant limitations.
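To make the paradigm concrete, here is a minimal sketch of that rule-based retrieval step, assuming query and document embeddings already exist (this is an illustration, not any particular production system):

```python
import numpy as np

def top_k_retrieval(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k documents most cosine-similar to the query."""
    # Cosine similarity reduces to a dot product of L2-normalized vectors.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    # argpartition finds the k winners in O(n); we only sort those k.
    top_k = np.argpartition(-scores, k)[:k]
    return top_k[np.argsort(-scores[top_k])]
```

The retrieved passages are then pasted into the LLM's prompt as context; the rules above never adapt to the user or the situation, which is exactly the limitation discussed next.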

Think of current LLMs as exceptionally intelligent but stubborn team members. They excel at processing whatever information is presented to them, but they interpret data through their own fixed worldview. As these models become larger and more complex, they also become increasingly “frozen” in their approaches, making it difficult for developers to influence their internal decision-making processes.

From Engineering to Modeling: A Paradigm Shift

The conventional approach of context engineering focuses on creating more sophisticated rules and algorithms to manage context retrieval. However, this misses a crucial opportunity. Instead of simply engineering better rules, we need to move toward context modeling—a dynamic, adaptive system that generates specialized context based on the current situation.

Context modeling introduces a personalized model that works alongside the main LLM, serving as an intelligent intermediary that understands both the user’s needs and the optimal way to present information to the large language model. This approach recognizes that effective AI systems require more than just powerful models; they need intelligent context curation.

Learning from Recommendation Systems

The architecture for context modeling draws inspiration from the well-established two-stage recommendation systems that power many of today’s most successful platforms. These systems consist of two stages (sketched in code after the list):

  • Retrieval Stage: A fast, efficient system that processes large amounts of data with a focus on recall and speed.
  • Ranking Stage: A more sophisticated system that focuses on accuracy, distilling signal from noise to produce the best results.
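A schematic of the pattern might look like the following; the `retriever` and `ranker` objects here are stand-ins for illustration, not a specific library:

```python
def recommend(user, corpus, retriever, ranker, n_candidates=500, n_results=10):
    """Two-stage recommendation: cheap recall first, expensive precision second."""
    # Stage 1: a fast retriever scans the full corpus, keeping a few hundred candidates.
    candidates = retriever.top_k(user, corpus, k=n_candidates)
    # Stage 2: a heavier model scores only the candidates, so its cost stays bounded.
    scored = [(item, ranker.score(user, item)) for item in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in scored[:n_results]]
```

The design choice is the same one that makes the systems below work: spend cheap compute broadly and expensive compute narrowly.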

RAG systems fundamentally mirror this architecture, with one key difference: they replace the traditional ranking component with large language models. This substitution enables RAG systems to solve open-domain problems through natural language interfaces, moving beyond the limited ranking problems that traditional recommendation systems address.

However, current RAG implementations have largely overlooked the potential for model-based retrieval in the first stage. While the industry has extensively explored rule-based retrieval systems, the opportunity for intelligent, adaptive context modeling remains largely untapped.

The Context Modeling Solution

Context modeling addresses this gap by introducing a specialized model dedicated to generating context dynamically. This model doesn’t need to be large or computationally expensive—it can be a focused, specialized system trained on relevant data that understands the specific domain and user needs.

The key advantages of context modeling include:

  • Adaptability: Unlike rule-based systems, context models can learn and adapt to new patterns and user behaviors over time.
  • Personalization: These models can be trained on user-specific data, creating truly personalized AI experiences that understand individual contexts and preferences.
  • Efficiency: By using smaller, specialized models for context generation, the system maintains efficiency while providing more intelligent context curation.
  • Developer Control: Context modeling provides agent developers with a trainable component they can influence and improve, creating opportunities for continuous learning and optimization.

The Ideal Architecture: Speed and Specialization

For context modeling to be viable, it must satisfy one critical requirement: speed. The latency of the core LLM is already a significant bottleneck in user experience.

Right now, the main workaround is streaming the response. Streaming, however, cannot hide the latency to the first token: the end-to-end latency of the retrieval model adds directly to it. If retrieval takes 300 ms and the LLM’s first token arrives 700 ms later, the user waits a full second before seeing anything. Any context modeling system must be exceptionally fast to avoid compounding this delay.

This brings us to the concept of “thinking” models, which use their own internal mechanisms to retrieve and reason over context before generating a final answer. In a sense, these models perform a specialized form of context modeling. However, their primary challenge is that this “thinking” process is slow and computationally expensive.

I argue that these monolithic “thinking” models are an intermediate step. The optimal, long-term architecture will decouple the two primary tasks. It will feature two distinct models working in tandem, mirroring the two-stage systems that have been so successful in recommendations (a sketch follows the list):

  1. A Fast Context Model: A highly optimized, specialized model dedicated solely to retrieving and generating the most relevant context at incredible speed.
  2. A Powerful Core Model: The large language model that receives this curated context and focuses on the complex task of reasoning, synthesis, and final response generation.
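Here is a hedged sketch of how the two models might compose; the `context_model` and `core_llm` interfaces are hypothetical, meant only to show where each component sits on the latency path:

```python
def answer(query: str, user_profile: dict, context_model, core_llm):
    """Dual-model pipeline: a small fast model curates context, a large model reasons."""
    # Stage 1: the specialized context model runs first, so it must be fast --
    # its end-to-end latency adds directly to time-to-first-token.
    context = context_model.generate_context(query, user_profile)
    # Stage 2: the core LLM sees only curated context and focuses on reasoning,
    # synthesis, and the final streamed response.
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return core_llm.stream(prompt)
```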

This dual-model approach allows for specialization, where each component can be optimized for its specific task, delivering both speed and intelligence without compromise.

The Infrastructure Opportunity

Context modeling represents a common infrastructure need across the AI industry. As more organizations deploy RAG systems and AI agents, the demand for sophisticated context management will only grow. This presents an opportunity to build foundational infrastructure that can support a wide range of applications and use cases.

The development of context modeling systems requires expertise in both machine learning and system design, combining the lessons learned from recommendation systems with the unique challenges of natural language processing and generation.

Looking Forward

The future of personalized AI lies not in building ever-larger language models, but in creating intelligent systems that can effectively collaborate with these powerful but inflexible models. Context modeling represents a crucial step toward this future, enabling AI systems that are both powerful and adaptable.

As we move forward, the organizations that successfully implement context modeling will have a significant advantage in creating AI systems that truly understand and serve their users. The shift from context engineering to context modeling isn’t just a technical evolution—it’s a fundamental reimagining of how we build intelligent systems that can adapt and personalize at scale.

The question isn’t whether context modeling will become the standard approach, but how quickly the industry will recognize its potential and begin building the infrastructure to support it. The future of personalized AI depends on our ability to move beyond static rules and embrace dynamic, intelligent context generation.

Questions or feedback? I’d love to hear your thoughts.

Want more insights? Follow me:

🎙️ Founder Interviews: https://www.youtube.com/@FounderCoHo
Conversations with successful founders and leaders.

🚀 My Journey: https://www.youtube.com/@jingconan
Building DeepVista from the ground up.


Think Wider: AI as Perspective Partners

Everyone’s obsessed with making AI reason deeper — training models to solve complex mathematics and master intricate proofs. But in this race for artificial intelligence, we’ve forgotten something fundamental about human intelligence: how we actually think and work with others.

Deep reasoning is important, but it’s only half the story. What if, instead of treating AI as tools to be prompted, we could engage with them as naturally as we do with brilliant colleagues – each bringing their unique perspective to the conversation? This isn’t just about making AI think deeper – it’s about unlocking its breadth of knowledge through better questions.

Asking good questions is harder than finding answers. Try this mental exercise: give yourself two minutes to write down ten meaningful questions about different topics. Hard, isn’t it? The difficulty lies in where to even begin. With regular problems, we at least have a starting point – a puzzle to solve, a goal to reach. But with questions, we’re creating the map before we know the territory. It’s not about finding the right path; it’s about imagining what paths might exist in the first place.

And this is where LLMs shine: they’re incredible brainstorming partners. Not just because they can process vast amounts of information, but because they can consider countless angles that might never occur to us. Humans are limited by our experiences and cognitive biases. LLMs don’t have these limitations. They can draw connections between seemingly unrelated concepts and suggest possibilities we might never have considered.

The challenge isn’t that LLMs lack knowledge – they have plenty. The real problem is that we don’t know how to extract it effectively. We initially thought prompt engineering was the magic key, but that turned out to be too simplistic. It works, but it feels mechanical and constrained. We don’t think about conversation frameworks or prompt strategies when chatting with friends. We just… talk.

What we need is AI that can match this natural fluidity of human conversation. Imagine an AI system that could seamlessly shift between different types of expertise, like a Renaissance polymath: you could discuss quantum physics with a renowned physicist like Albert Einstein, then transition to a conversation about Renaissance art with an art historian like Giorgio Vasari. This isn’t merely about role-playing – it’s about accessing diverse perspectives and different ways of thinking that enhance your understanding and problem-solving.

I’ve started calling this concept “persona-driven discovery,” and I believe it could revolutionize how we learn and solve problems by acting as a catalyst for serendipity.

We’ve all had those magical moments in libraries where we stumble upon exactly the book we needed but weren’t looking for. These AI systems could create those moments deliberately, suggesting unexpected perspectives and prompting us to explore unfamiliar territories. It’s like having a brilliant friend who knows when to push you out of your intellectual comfort zone.

All of this points toward a future where AI tools aren’t just answering our questions but actively participating in our thinking process. They could help us prototype ideas faster, facilitate group brainstorming sessions, and create personalized learning experiences that adapt to our individual ways of thinking.

The real breakthrough will come when we stop thinking about these systems as tools and start thinking about them as thought partners. This shift isn’t just semantic – it’s fundamental to how we might solve problems in the future. Instead of asking an AI to complete a task, we might engage it in a genuine dialogue that helps us see our challenges from new angles.

The building blocks are already there: we have models that can process and generate human-like text, we have systems that can maintain context in conversations, and we’re developing better ways to keep AI knowledge current and relevant. 

There are still unsolved problems.  When we make AI systems more specialized (like training them to be history experts), they often lose their broader capabilities. It’s the same trade-off you get with human experts – deep knowledge in one area often comes at the cost of breadth. The trick will be creating systems that can maintain both depth and breadth, switching between different modes of thinking without losing their fundamental capabilities.

This is what role-aware AI systems could offer – not just rigid question-and-answer sessions, but fluid conversations where different perspectives emerge organically as needed. Each AI “participant” would bring their unique expertise and viewpoint while staying current with the latest developments in their field. They would build on each other’s insights, challenge assumptions, and help you see problems from angles you might never have considered on your own. The key isn’t just having access to different types of knowledge, but having them work together in a way that mirrors the natural give-and-take of human conversation.
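As a sketch of what such a role-aware system might look like (the `llm.complete` interface and the personas are assumptions for illustration), consider a loop where several personas take turns building on a shared transcript:

```python
personas = {
    "physicist": "You are a physicist. Question the mechanism behind every claim.",
    "historian": "You are a historian. Relate the topic to past precedents.",
    "artist":    "You are an artist. Look for aesthetic and emotional angles.",
}

def brainstorm(topic: str, llm, rounds: int = 2) -> list[str]:
    """Let several personas riff on a topic, each seeing the others' remarks."""
    transcript: list[str] = []
    for _ in range(rounds):
        for name, system_prompt in personas.items():
            history = "\n".join(transcript[-6:])  # only recent turns for brevity
            reply = llm.complete(
                system=system_prompt,
                prompt=(f"Topic: {topic}\nDiscussion so far:\n{history}\n"
                        f"Add one new perspective or question."),
            )
            transcript.append(f"{name}: {reply}")
    return transcript
```

The point of the round-robin design is exactly the give-and-take described above: each persona reads the others' contributions before adding its own.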

The potential impact of this shift could be profound. Just as the internet changed how we access information, these AI thought partners could change how we process and use that information. They could help us break out of our mental ruts, see connections we might have missed, and approach problems from angles we might never have considered.

This is the future I’m excited about – not one where AI replaces human thinking, but one where it enhances and expands it in ways we’re just beginning to imagine.

BTW, this essay was written after a long brainstorming session with the “Paul Graham” Hachi. You can check it out here: https://go.hachizone.ai/pg-think-wider

Please message me at jing AT hachizone.ai if you have any feedback on this essay or Hachi in general.


Rethinking Digital Discovery: From Algorithms to Human Perspectives

In our quest to organize the world’s information, we’ve created two dominant systems: search engines and recommendation algorithms. Both promised to make discovery easier, yet each has introduced its own set of challenges. Let’s examine why these systems fall short and how we might find a better way forward.

The Consensus Trap of Search Engines

In 1998, Google revolutionized the internet with PageRank, an algorithm that organized information through collective wisdom. The premise was elegant: websites with more backlinks were probably more important and trustworthy. It was democracy in action – the internet voting on itself through links.

While this approach works beautifully for factual queries like “what is the speed of light,” it struggles with nuanced topics where diversity of perspective matters more than consensus. The very nature of PageRank creates a self-reinforcing cycle: popular sites become more visible, leading to more backlinks, leading to even greater visibility.
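For intuition, here is a minimal power-iteration sketch of PageRank (a toy version, not Google’s production system); notice how, on every iteration, rank flows toward nodes that are already heavily linked:

```python
import numpy as np

def pagerank(adjacency: np.ndarray, damping: float = 0.85, iters: int = 50) -> np.ndarray:
    """Power iteration over the link graph; adjacency[i, j] = 1 if page i links to page j."""
    n = adjacency.shape[0]
    out_degree = adjacency.sum(axis=1, keepdims=True)
    out_degree[out_degree == 0] = 1            # avoid division by zero for dangling pages
    transition = adjacency / out_degree        # row-stochastic link matrix
    rank = np.full(n, 1.0 / n)                 # start with uniform rank
    for _ in range(iters):
        # Each page's rank is redistributed along its outgoing links.
        rank = (1 - damping) / n + damping * (transition.T @ rank)
    return rank
```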

This system inadvertently flattens the richness of human knowledge into a popularity contest. It’s as if we’re asking the entire world to vote on the best restaurant in your neighborhood – the results might reflect broad appeal, but they’re unlikely to match your specific tastes or needs.

The Echo Chamber of Recommendation Systems

On the other side, we have recommendation systems that promise personalization but often trap us in what we call “rabbit holes.” These algorithms study our behavior and serve us more of what we’ve liked before, creating increasingly narrow feedback loops.

Start watching a few cooking videos, and suddenly your entire feed becomes culinary content. Click on a political article, and your recommendations quickly become an echo chamber of similar viewpoints. While this approach maximizes engagement, it does so at the cost of serendipity – those unexpected discoveries that broaden our horizons.

The problem isn’t just that these systems can be limiting; it’s that they operate as black boxes. Users have little understanding of why they’re seeing certain content and even less control over steering their discovery journey.

Looking Back to Move Forward

Interestingly, the solution to these modern challenges might lie in how we discovered information before these technologies existed. Think back to how we naturally sought out knowledge: through conversations with friends, colleagues, and mentors.

When we wanted to discover new books, we didn’t poll the entire world or rely on an algorithm to analyze our past reading habits. Instead, we talked to friends whose taste in literature we trusted. When we needed restaurant recommendations, we asked colleagues who shared our culinary preferences.

This system worked because:

  1. We understood exactly why we valued each person’s perspective
  2. We could actively choose whose recommendations to seek out
  3. Different friends offered different viewpoints, naturally creating diversity
  4. Serendipitous discoveries happened organically through conversation

The Power of Personal Perspective

What if we bring this human-centered approach to digital discovery? Imagine a system that doesn’t try to replace human judgment with algorithms, but instead helps you find and follow the curators whose perspectives you value.

This isn’t just personalization based on your past behavior – it’s about actively choosing whose lens you want to view the world through. A food critic might have thousands of followers, but you might prefer your friend’s hole-in-the-wall recommendations because they understand your particular palate.

The beauty of this approach is that it preserves what makes human curation special:

  • Natural serendipity through the diverse interests of your trusted curators
  • Full transparency about why you’re seeing certain content
  • Control over whose perspectives influence your discovery
  • The ability to step out of your comfort zone by following curators with different viewpoints

A New Path Forward

The future of information discovery isn’t about achieving perfect consensus through PageRank, nor is it about increasingly sophisticated recommendation algorithms. It’s about recognizing that people – with their unique perspectives, expertise, and ability to surprise us – are the ultimate curators of information.

By bringing the human element back to discovery, we can create a system that offers both personalization and serendipity, both efficiency and understanding. Most importantly, we can build a system that puts users back in control of their discovery journey.

The future of discovery isn’t about finding what algorithms think is best – it’s about connecting with the human perspectives that truly resonate with you.


Beyond the Hype: Three Lessons from a Startup Rollercoaster

My first startup journey was a rollercoaster – a wild ride that began with a spark of an idea.

Drawing from my experience at Google Brain, I had a strong instinct that combining human feedback with reinforcement learning would significantly improve the experience of LLMs in dialogue. I started exploring this idea to build an assistant for knowledge workers in 2021. My gut told me there was a big opportunity in the space, but I wasn’t sure when the mass market would recognize it. Later, it turned out that this was one of the fundamental ideas behind ChatGPT.

However, everyone said the space was tiny back then, and potential investors and peers saw little value in my concept. As a first-time founder, the unanimous skepticism made me doubt my vision. I ended up wasting a lot of time and getting distracted by other directions.

I eventually pushed forward, though not fast enough. We launched in September of 2022, right before ChatGPT was released. When we launched, we were amazed by the positive feedback and delight we got from customers. However, within months, ChatGPT’s release completely transformed the landscape. Suddenly, customers began to have much higher expectations and hesitated to sign contracts.

It was obvious that we had to build more to make ourselves stand out. However, the process felt like a constant chase of whimsical hope. We tried to tackle different parts of our customers’ workflow. Customers told us they liked our AI capability but missed features they longed for from their existing vendors. We tried to build those, but existing incumbents would quickly add AI capabilities similar to ours, making our new solutions seem redundant.

I ended up starting a new startup in a completely different space. But I learned three crucial lessons from the first startup journey.

You need to trust your gut.

I wasted a precious year that could have helped us build a more defensible moat and establish ourselves as the market leader, better preparing us for when ChatGPT’s storm hit. A few months of lead time is not enough—you often need more. When everyone says your idea is impractical, it is the best time to build your competitive advantage. Innovation rarely comes from following the crowd. It emerges from the courage to pursue ideas that seem impossible—until they aren’t.

Unique insight means nothing without a defensible moat.

Consider your competitive moat early on. While talking to customers helps identify valuable problems to solve, it doesn’t guarantee that your solution will be the only one customers choose. You need to figure out why customers would pick your solution over alternatives.

Technology is not a silver bullet; it is actually a very poor moat because ideas diffuse naturally.

At the beginning, the only competitive advantage of a startup is the time you get because of the ignorance of big players. But you need to turn it into an actual moat — reasons why customers should use you, not other players. For B2B companies, the reason is often data or customer relationships. For B2C, the reason is often branding or better user experiences.

Plus, you need to make sure you have the resources to build your moat, which requires strategic planning if you are already in fierce competition. (Unfortunately, it often creates conflicts with the customer-first culture in B2B).

Don’t be afraid to restart from zero.

If you’re facing strong headwinds and haven’t had time to build a moat, take a step back. Rethink what other valuable problems exist that you believe in but others haven’t yet recognized.

My story isn’t unique – it’s a microcosm of the startup ecosystem. Innovation isn’t about having the perfect idea from day one. It’s about resilience, adaptability, and the willingness to transform setbacks into insights.

Ultimately, I started a new venture in a different sector, carrying these lessons like a compass. Each “failure” was actually a learning experience that helped me transform into a true entrepreneur.


Goodbye Faceless Algorithms, Hello Hachi!

It’s no secret that boredom and loneliness are an epidemic. The average American spends three hours a day scrolling through online content — usually solo.

What’s more, your content stream is controlled by a faceless “algorithm” that feeds you content over which you have very little control or knowledge.

I know this game inside out. I was one of the Google Brain researchers who worked on YouTube’s recommendation engine, one of the world’s most popular AI systems.

We built this with good intent: to surface better content to users and keep them engaged – and it worked!

However, there’s one important problem: there exists only one algorithm for everyone in the world. Users have no control over what information they see, and creators have to tailor their content to “game” the algorithm.

Let’s be real, no one’s a fan of this singular, faceless algorithm. That’s why I’m building something new, with the belief that people should be able to choose how they discover information, and be able to put a face to a name.

I’m excited to announce that we are working on Hachi — unique personas that help you search for information, whether it’s text, images, or videos. With Hachi, you will be able to:

  1. Choose your muse. Just like how you choose your friends, you also choose which Hachi you want to spend time with. Imagine having one Hachi for #vanlife, another for the latest Taylor and Travis gossip, and another for Minecraft.
  2. Stay Trendy. Hachis are constantly discovering new creators and content that are trending based on your common interests.
  3. Never Alone. Hachis explore with you, share their own insights, and are ready to chat whenever you like.

To make this a reality, my co-founder Nancy and I have been hard at work building out Hachi over the last few months. If this sounds interesting to you, please join our mailing list: https://go.hachizone.ai/mailinglist and our Discord community: https://go.hachizone.ai/discord.

We’d love to hear your early thoughts and feedback!

– Jing Conan Wang


Why Is It So Hard to Create a Funny AI?

Large Language Models (LLMs) like ChatGPT have shown impressive results in the past two years. However, people have realized that while these models are incredibly knowledgeable, they often lack humor. Ask any mainstream LLM to tell you a joke, and you’ll likely receive a dull, dad-joke-level response.

For example, here is a joke from ChatGPT:

The problem goes beyond just being funny. LLMs have failed to create memorable personalities, resulting in shallow interactions. This is why most AI companion products feel like role-playing games: because people get bored quickly with one character, platforms need to encourage users to create many characters to keep them engaged.

Why does this happen? There are two main reasons:

The first is that LLMs lack the capacity for deep reasoning. Creating humor is a challenging task. The two key ingredients of humor are surprise and contradiction, which demand a profound understanding of how things logically work, followed by intentional deviation from the norm.

However, LLMs struggle to understand deep logical connections and context, which are essential for humor. They tend to focus on literal interpretations, missing the subtleties that make language humorous.

The second is the limitation of datasets and evaluation: many models are trained to excel on specific benchmarks and tasks that are outdated. Existing LLM evaluations focus heavily on question answering and academic tests because researchers can easily access them. This has resulted in an overemphasis on one particular subdomain at the expense of more nuanced language understanding and creative expression. Consequently, responses generated by these models lack personality.

What is the likely path ahead? Here is my take:

1. Better Human-AI collaboration. As current LLMs struggle with understanding the deeper logic of language, making them truly funny might require significant advancements in their reasoning capabilities.

Some progress will come as LLMs keep gaining parameters, but there is a long way to go, as it is hard to turn an LLM into a reasoning machine. A more realistic approach is to harness human wisdom and creativity to help LLMs bypass complex logical reasoning and directly generate funny content.

Humans are actually very good at capturing nuances, and it is easier to build an AI that leverages this human capability than to build the capability into the LLM itself. For example, when creating funny comments for videos, using existing human comments from the video can boost the quality of the jokes. This falls into the domain of Human-based Computation. One famous example is CAPTCHA, which verifies that a user is not a bot while simultaneously teaching machines to solve hard computer vision tasks.
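As an illustration of this human-in-the-loop idea, here is a sketch (the `llm.complete` interface is an assumption) that conditions joke generation on comments humans already found funny:

```python
def funny_comment(video_title: str, human_comments: list[str], llm) -> str:
    """Use top human comments as few-shot evidence of what this audience finds funny."""
    # Humans have already done the hard reasoning (surprise, contradiction);
    # the model only needs to remix their angle, not invent it from scratch.
    examples = "\n".join(f"- {c}" for c in human_comments[:5])
    prompt = (f"Video: {video_title}\n"
              f"Comments viewers found funny:\n{examples}\n"
              f"Write one new comment in the same spirit:")
    return llm.complete(prompt)
```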

2. Online Learning: Most current LLMs are offline, taking months to train and then remaining frozen. This makes it nearly impossible for models to adapt based on real-time human feedback. One could argue that retrieval-augmented generation (RAG) is just a poor man’s solution for letting LLMs learn facts in real time; however, simple RAG cannot capture nuances. We need to design ways for online learning, allowing models to capture and incorporate human feedback in near real time.

3. Better Evaluation: Current datasets for evaluating AI-generated humor are too narrowly focused. The AI community needs to overcome this limitation to create more comprehensive assessment tools.

The way we interact with Large Language Models (LLMs) is just as important as the answers they provide. Engaging with AI shouldn’t be a dull and boring experience. By following the directions described above, I think we will be able to create truly funny AI systems that can engage in witty, personable interactions with humans in the foreseeable future. 


Why You Should Build a Consumer GenAI Startup and How to Make it Happen

While conventional wisdom holds that B2B startups are the safer choice, is this really the case? Let’s delve into why a consumer-focused GenAI startup might actually be your golden ticket.

In 2023, the startup landscape for GenAI applications experienced a remarkable surge, propelled by the advent of ChatGPT and foundation models such as GPT-4 and Anthropic’s Claude. Over the past year, venture capital has invested at least $21 billion into GenAI, and most GenAI applications have primarily targeted B2B, particularly productivity improvement. In the latest Y Combinator batch, 65% of the startups fall within the B2B SaaS and enterprise sectors, whereas only 11% focus on consumer-oriented verticals. The most popular product form is the AI assistant.

Current Challenges in B2B GenAI

However, as we transition into 2024, it has become evident that many startups in the domain face significant challenges. A majority of these B2B GenAI companies are grappling with financial losses and frequently pivoting in an attempt to find product-market fit.

Many startup founders struggle to convert Proof-of-Concept contracts into full annual agreements, often facing significant limitations in their bargaining power over pricing. Despite the $21 billion in VC investment, GenAI startups have generated only around $1 billion in revenue.

Heavy competition is one of the main challenges for startups in converting Proof-of-Concept contracts. But why is there such a strong focus on productivity improvement applications? The reasons are multifaceted and stem from various technology and market dynamics:

First, it is related to the nature of the current foundational models. Foundation models such as GPT-4 are the result of significant research breakthroughs and depend extensively on benchmarks that have been established within the academic community. Historically, these benchmarks have predominantly focused on knowledge-based tasks. For example, the benchmarks used in the GPT-4 technical report primarily consist of academic tests. Essentially, what we are creating with these models are entities akin to exceptionally skilled students or professors. This orientation naturally steers generative AI applications toward productivity enhancements. Consequently, it’s not surprising that students are the primary users of many AI-assisted products like copilots.

Second, there is a B2B-first culture in the American startup ecosystem, which has predominantly favored B2B ventures, with the consumer sector receiving significantly less investment over the past decade. Startup founders in the US are wary of building consumer startups. Although other countries such as China do not exhibit this fixed mindset, the U.S. has been a global leader in generative AI research and substantially influences trends worldwide.

Third, the GenAI infrastructure boom levels the playing field for everyone. In 2023, the majority of investments were directed toward GenAI infrastructure, with many investment firms likening it to a “gold rush.” There’s a prevailing belief that, much like the merchants who sold supplies during a gold rush, those who provide the essential tools and services will profit first. The following figure shows that $16.9 billion of the $21 billion in VC money was spent on GenAI infrastructure. Newer players can always leverage better infrastructure.

Source: Sequoia Capital’s AI Ascent 2024 opening remarks

Due to the factors mentioned above, competition among productivity-focused GenAI applications is intense, undermining the ability of startups in this space to extract value from customers. As a result, the entire ecosystem remains predominantly financed by venture capital.

The Untapped Potential of Consumer GenAI

History often repeats itself. During the Internet boom of the 1990s, emphasis was initially placed on B2B applications. However, it turned out that the integration of the Internet into business contexts would take longer than anticipated. Salesforce pioneered the SaaS model, but it took nearly a decade to reach the $1 billion revenue milestone. In contrast, consumer applications have proven to be a quicker avenue for both creating and capturing value.

Google, Facebook, and Amazon have each developed consumer products that serve billions of people, discovering unique methods to monetize the internet by reaching vast audiences cost-effectively. Additionally, this approach has proven to be an effective strategy for building strong competitive advantages, or moats.

Strategies for Success

The 7 Powers framework is a crucial tool for analyzing business opportunities, identifying seven key levers: Scale Economies, Network Economies, Counter-Positioning, Switching Costs, Branding, Cornered Resource, and Process Power. For B2B GenAI startups, Counter-Positioning and Process Power are typically the only levers available, because incumbents hold advantages in the other areas. In contrast, consumer GenAI startups have the potential to develop competitive moats across almost all of these powers, providing numerous strategic advantages, especially if your founding team has strong technical capability in AI models and infrastructure.

It’s crucial for Consumer GenAI companies to own their AI models and infrastructure. This ownership not only fosters the development of Scale and Network Economies but also secures Cornered Resources, enhancing competitive advantage and market positioning.

On the one hand, controlling costs is crucial for a successful consumer app. The trend toward ever-larger, more powerful models makes them unsuitable for consumer applications because the lifetime value (LTV) of consumer use cases is typically much lower. For example, a user’s LTV is often just $20-30, yet that user might ask hundreds of questions, and a single GPT-4 call using the full context window can cost approximately $1.28. Developing in-house expertise to create models that are both powerful and cost-effective is crucial to bridging this gap.
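A quick back-of-the-envelope calculation shows why this matters; the questions-per-user figure below is my own assumption, while the other numbers come from the paragraph above:

```python
# Back-of-the-envelope unit economics under the article's cited numbers.
ltv_per_user = 25.0        # midpoint of the $20-30 LTV cited above
cost_per_call = 1.28       # full-context GPT-4 call, as cited above
questions_per_user = 200   # "hundreds of questions" -- assumed for illustration

inference_cost = questions_per_user * cost_per_call
print(f"inference cost ${inference_cost:.0f} vs LTV ${ltv_per_user:.0f}")
# => inference cost $256 vs LTV $25: roughly 10x underwater without cheaper models
```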

The good thing is that consumer applications are usually much more tolerant of hallucination and might not need the most powerful model. In addition, the evolution of open-source models has enabled startups to develop their own models cost-effectively. With the recent launch of Llama 3, its 8B model has outperformed the largest Llama 2 model, and there is anticipation that the 400B model, currently in training, will match the performance of GPT-4. These advancements make it feasible for startups to create high-performing models at a fraction of the cost of proprietary models, although significant investment is still necessary to reduce costs enough to support large-scale consumer applications.

On the other hand, current foundational models are not ideally suited for creating robust consumer applications, as most large language models lack personalization and long-term memory capabilities. Developing new foundational models or adapting existing ones to better suit consumer needs is a critical challenge that Consumer GenAI startups must address.

Despite these challenges, startups that successfully tackle these issues can secure a significant competitive edge and establish long-lasting market dominance.

Thanks for reading; I hope this article is useful for you. If you have any questions or thoughts, please don’t hesitate to comment or message me at jing@jingconan.com 🤗


Unlocking the Wonders of Imaginative Play: A Journey into the Magic of Childhood

One day, my two-year-old daughter, Adalyn, approached me with a desk lamp and a handful of blue glass balls. Puzzled, I watched as she arranged them before me and then asked, “Daddy, what animal is this?”

I couldn’t fathom how a desk lamp and some stones could possibly resemble an animal. For a brief moment, I felt utterly perplexed.

However, after pausing for a few seconds, it dawned on me—Adalyn had conjured up a magical world within her imagination.

Though to me it seemed like mere objects, to her, they were the building blocks of an enchanting creature.

Grateful for the power of imagination and the assistance of technology, I decided to enlist the help of AI to bring Adalyn’s creation to life.

I sent a picture to ChatGPT with the query, “What type of animal is this?” and eagerly awaited its response.

In a matter of moments, ChatGPT wove a beautiful tale:

“In the heart of the magical forest, where the trees whispered secrets and the moonlight danced, lived a giraffe named Zara. With each step she took, her footprints left behind a trail of shimmering blue, marking her path through the enchanted woods…”

Reading the story aloud to Adalyn, her eyes lit up with joy. “Zara!” she exclaimed, delighted to have her creation given life and a name.

At that moment, I realized that it wasn’t my understanding alone that made her happy—it was the connection, the validation of her imagination, and the shared experience of storytelling.

Adalyn beamed at me and proclaimed, “You’re the best dad in the world!” But deep down, I knew it wasn’t solely my doing. It was the magic of childhood imagination and the wonders that technology and storytelling can bring to life.

This experience is also known as Imaginative Play, a crucial activity for child development. Sadly, in today’s fast-paced world, few parents engage in imaginative play with their children. While immensely enjoyable, it demands a significant amount of imagination and mental energy. Unfortunately, as adults, many of us lose touch with our imagination over time, as society often prioritizes “correct answers” over creative thinking.

Upon reflection, I realized that Imaginative Play isn’t just beneficial for children—it holds value for adults too. It fosters creativity, problem-solving skills, and emotional intelligence, all of which are essential for navigating the complexities of life.

Eager to share this revelation, I recounted my experience with my Toastmasters club. I shared how a seemingly mundane moment with my daughter sparked a journey into a magical realm of creativity and imagination. Through this story, I hoped to inspire others to embrace their inner child, reclaim their imagination, and rediscover the joy of imaginative play.

Please check out this video below:


How 500 Lines of Code Challenged a $500M AI Giant, and What Moats GenAI Startups Should Have

Recently, there’s been an interesting development in the large-model industry. Perplexity AI, a hot LLM company in Silicon Valley, completed a financing round two months ago at a valuation of over $500 million. However, using Lepton AI’s middleware, Lepton’s co-founder Yangqing Jia managed to create an open-source version of Perplexity with just 500 lines of code over a weekend, sparking a heated discussion in the industry. The related demo on GitHub garnered five thousand stars in just a few days. This incident reflects a broader trend, which I’ll analyze based on my own year and a half of entrepreneurial experience.

Currently, large-model companies can be categorized into three types:

  • Base-model companies, which primarily provide large-model capabilities. This area requires significant capital, and resources are highly concentrated among leading companies and giants.
  • Middleware companies, which offer middleware between large models and applications. Yangqing Jia’s Lepton AI falls into this category.
  • Application-layer companies, which directly provide consumer-facing applications. These can further be divided into platform-type application companies, like Perplexity AI, and vertical application companies focusing on niche markets, such as Harvey AI, which recently completed financing for its legal application.

The incident with Perplexity AI and Lepton AI highlights a pain point for application-layer companies — high competitive pressure with insufficient moats. For instance, Perplexity, which aims to solve general information search problems, faces challenges from four fronts: pressure from giants like Google, competition from vertical knowledge applications like Harvey, market encroachment from other knowledge service companies, and disruptions from middleware companies like Lepton AI. Vertical application companies face slightly less pressure, but they still confront these four forces and have a smaller market size, resulting in fewer resources.

So, what can be done? Many entrepreneurs believe that accumulating proprietary data can create a sufficient moat. This is very sensible, but it presupposes a systematic methodology for acquiring proprietary data. Here, I propose a methodology: using contrarian insights to gain a time advantage, leveraging the founder’s personal strengths for breakthroughs, and focusing on data-first products or operational capabilities.

First, contrarian insights, or insights not commonly understood, are essential. Entrepreneurs must find less-trodden paths to build their core competitive advantage with minimal resources. What was once a contrarian insight can become common knowledge; for example, the combination of large models with chat interfaces was novel before ChatGPT but is now commonplace.

Second, the founder’s personal advantage is crucial. While contrarian insights offer temporary protection, they quickly become common knowledge once proven useful. Here, a deep understanding of user pain points in vertical applications can be a personal advantage. For example, one of Harvey’s co-founders was a lawyer. Even if a team doesn’t have a co-founder from a specific industry, previous experiences that can be leveraged as industry advantages are valuable.

Finally, building a data-first product or operational capability is key. The founder’s personal advantage must be systematized to sustain. There are two strategies:

  • Product-driven: The founder uses their deep understanding of user needs to design a product that naturally accumulates high-quality data, enhancing the product experience and creating a flywheel effect.
  • Operation-driven: The founder uses their resources and experience to build an operational system that continually acquires proprietary data, making operations or sales more efficient and faster.

The former suits products focused on Product-led Growth (PLG), while the latter suits those driven by Sales-led Growth (SLG). Both must prioritize data. If product-driven, each feature should contribute to data accumulation. If operation-driven, operations should focus on data, not just revenue or other metrics.

Returning to Perplexity’s case: Yangqing Jia could replicate Perplexity’s main functions over a weekend, but not its data accumulation. As a middleware company, Lepton likely doesn’t intend this as a core strategy; however, many new application startups may use the same approach to challenge Perplexity further. Whether Perplexity can withstand this depends on its ability to build a moat with proprietary data.


A Personal Story about LLMs and Storytell.ai

My name is Jing Conan Wang, a co-founder and CTO of Storytell.ai. In October 2022, together with two amazing partners DROdio and Erika, we founded Storytell.ai, dedicated to distilling signal from noise to improve the efficiency of knowledge workers. The reason we chose the name Storytell.ai is that storytelling is the oldest tool for knowledge distillation in human history. In ancient times, people sat around bright campfires telling stories, allowing human experiences and wisdom to be passed down through generations.

The past year has been an explosive one for large language models (LLMs). With the meteoric rise of ChatGPT, LLMs have quickly become known to the general public. I hope to share my own personal story to give people a glimpse into the grandeur of entrepreneurship in the field of large language models.

From Google and Beyond

Although ChatGPT comes from OpenAI, the roots of LLMs lie in Google Brain – a deep learning lab founded by Jeff Dean, Andrew Ng, and others. It was during my time at Google Brain that I formed a connection with LLMs. I worked at Google for five years, spending the first three in Ads engineering and the latter two in Google Brain. Not long after joining Google Brain, I noticed that one colleague after another began shifting their focus to research on large language models. That period (2017-2019) was the germination phase for LLMs, with a plethora of new technologies emerging in Google’s labs. Being in the midst of this environment allowed me to gain a profound understanding of the capabilities of LLMs. Particularly, there were a few experiences that made me realize that a true technological revolution in language models was on the horizon:

One was about BERT, one of the best LLMs before ChatGPT. One day in 2017, while I was in a Google cafe, thunderous applause broke out: a group nearby was discussing the results of an experiment. Google provides free lunches for its employees, and lunchtime often brings people together to talk about work. A colleague asked me: “Do you know about BERT?” At the time, I only knew BERT as a character from the American animated show Sesame Street, which I had never watched. My colleague told me: “BERT has increased Google Search revenue by 1% in internal experiments.” Google’s revenue was already over a hundred billion dollars a year, so this was equivalent to several billion dollars in annual revenue. I was quite shocked.

Another was my experience with Duplex. Sundar Pichai released a demo of an AI making phone calls at Google I/O 2018, which caused a sensation in the industry. Our group was responsible for the model work behind the project, known internally as Duplex. The demo showed only a small part of what was possible; internally, there was far more data on similar AI phone calls. We often needed to review the Duplex model’s results, and the outcome was astonishing: I could hardly tell whether a conversation was held by an AI or a human.

Another gain was my reflection on business models. Although I had worked in Google’s commercialization team for a long time and the models I personally worked on generated over two hundred million dollars in annual revenue for Google, I realized that an advertising-driven business model would become a shackle for large language models. The biggest problem with the advertising business model is that it treats users’ attention (time) as a commodity for sale. To users, it seems like they are using the product for free, but in reality, they are giving their attention to the platform. The platform has no incentive to increase user efficiency but rather to capture more attention to sell at a very low price. Valuable users will eventually leave the platform, leading to the platform itself becoming increasingly worthless.

One of the AI applications I worked on at Google Brain was the video recommendation on YouTube’s homepage. The entire business model of Google and YouTube is based on advertising; longer user watch time means more ad revenue. Therefore, for applications like YouTube, the most important goal is to increase the total time users spend on the app. At that time, TikTok had not yet risen, and YouTube was unrivaled in the video domain in the United States. In YouTube’s model review meetings, we often joked that the only way for us to get more usage is to reduce the time people spend eating and sleeping. Although I wanted to improve user experience through better algorithms, no matter how I adjusted, the ultimate goal was still inseparable from increasing user watch time to boost ad revenue.

During my contemplation, I gradually encountered the Software as a Service (SaaS) business model and felt that this was the right model for large models. In SaaS, users pay for subscriptions only if they receive continuous value. SaaS is customer-driven, whereas Google overly emphasizes engineering culture and neglects customer value, making it very difficult to explore this path within Google. Ultimately, I was determined to leave and decided to start my own SaaS company. At the end of 2019, I joined a SaaS startup as a founding member and learned about building a SaaS company from zero to one.

At the same time, I was also looking for good partners. Finally, in 2021 I met two amazing partners, DROdio and Erika, and we started Storytell.ai in 2022.

Build a company of belonging

The first thing we did at the inception of our company was to clarify our vision and culture. The vision and culture of a company truly define its DNA: the vision helps us know where to go, and the culture ensures we work together effectively.

Storytell’s vision is to become the Clarity Layer, using AI to help people distill signal from noise (https://go.storytell.ai/vision).

We have six cultural values: 1) Apply High-Leverage Thinking. 2) Everyone is Crew. 3) Market Signal is our North Star. 4) We Default to Transparency. 5) We Prioritize Courageous Candor in our Interactions. 6) We are a Learning Organization. Please refer to https://go.Storytell.ai/values for details.

We also paid special attention to team culture during the company-building process. From the start, we have wanted to work hard but also play harder. We have offsite gatherings every quarter. The entire team is very fond of outdoor activities and camping, so we often hold various outdoor events (we have a shared album with photos from the very first day of our founding). We call ourselves the Storytell Crew, hoping that we can traverse the stars and oceans together like an astronaut crew.

Build a Product that people love

In the early stages of a startup, finding Product-Market Fit (PMF) is of utmost importance. Traditional SaaS emphasizes specialization and segmentation, with typically only a few companies iterating within each niche, and product stability may take years to achieve. This year, ChatGPT brought a radical market change. Its explosive popularity is a double-edged sword for SaaS entrepreneurs: on one hand, it reduces the cost of educating the market; on the other, the field becomes more competitive, with a surge of entrepreneurs entering the market and diverting customer resources, while the flood of low-intent traffic ChatGPT brings ultimately fails to convert into the product.

Many believe that the moat for startups applying large models is technology or data. We think neither is the case. The real moat is the skill in wielding this double-edged sword. Good swordsmanship can transform both edges of the sword into a force that breaks through barriers:

  1. On one hand, for traditional SaaS, it’s about leveraging the momentum of ChatGPT to maximize the impact on traditional SaaS. Make customers feel the urgency to keep up with the times. Develop AI Native features that incumbents find hard to follow.
  2. On the other hand, use the competition to bring about a thriving ecosystem and have a methodical and steadfast approach in product iteration, ultimately shortening the product iteration cycle to achieve the greatest momentum.

We follow these two principles in our own product iteration.

1) Data-guided: In the iteration process, we use the North Star Metric to guide our general direction. Our North Star Metric is:

Effective Reach = Total Reach × Effective Ratio

Total reach is the number of summaries and questions asked on our platform each day. The Effective Ratio is a number from 0 to 1 that indicates how much of the content we generate is useful for users.
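In code, the metric is a simple product; the numbers below are purely illustrative, not our actual figures:

```python
def effective_reach(total_reach: int, effective_ratio: float) -> float:
    """North Star Metric: raw volume discounted by how useful the output was."""
    assert 0.0 <= effective_ratio <= 1.0, "effective_ratio is a fraction"
    return total_reach * effective_ratio

# e.g. 10,000 summaries/questions a day at 70% usefulness -> NSM of 7,000
print(effective_reach(10_000, 0.7))
```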

2) User-driven. Drive product feature adjustments through in-depth communication with users. For collecting user feedback, we’ve adopted a combination of online and offline methods. Online, we use user behavior analysis tools to identify meaningful user actions and follow up with user interviews to collect specific feedback. Offline, we organize many events to bring users together for brainstorming sessions.

With this approach in mind, our product has undergone multiple rounds of iteration in the past year.

V0: Slack Plugin

Since June 2022, Erika, DROdio and I have been conducting numerous customer discovery calls. During our interviews with users, we often needed to record the conversations. We primarily used Zoom, but Zoom itself did not provide a summarization tool back then. I used the GPT-3 API to create a Slack plugin that automatically generates summaries. Whenever we had a Zoom meeting, it would automatically send the meeting video link to a specific Slack channel. Subsequently, our plugin would reply with an auto-generated summary. Users could also ask some follow-up questions in response.
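For flavor, here is a minimal sketch of what such a plugin could look like, using slack_bolt and the GPT-3-era Completions API; this is an illustration, not our actual implementation, and `fetch_transcript` is a hypothetical helper:

```python
import openai
from slack_bolt import App

app = App(token="xoxb-...", signing_secret="...")  # placeholder credentials

def fetch_transcript(message_text: str) -> str | None:
    """Hypothetical helper: resolve a Zoom recording link to its transcript."""
    return None  # out of scope for this sketch

@app.event("message")
def summarize_meeting(event, say):
    transcript = fetch_transcript(event.get("text", ""))
    if not transcript:
        return
    completion = openai.Completion.create(   # the GPT-3-era Completions API
        engine="text-davinci-002",
        prompt=f"Summarize this meeting transcript:\n{transcript}\n\nSummary:",
        max_tokens=300,
    )
    # Reply in-thread so the summary stays attached to the meeting link.
    say(text=completion.choices[0].text.strip(), thread_ts=event["ts"])
```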

At that time, there weren’t many tools available for automatically generating summaries, and every user we interviewed was amazed by this tool. This made us gradually shift our focus toward automatic summarization. The Slack plugin allowed us to collect a lot of user feedback, but by the end of December 2022, we realized its limitations:

  1. Slack is a high-friction system: only system administrators can install plugins; regular employees cannot install them on their own.
  2. We had almost no usage over the weekends; users were unlikely to bring Slack into their personal workflows.
  3. Slack’s own interface caused a great deal of confusion for our users.

V1: Chrome Extension

We began developing a Chrome extension in December 2022, primarily to address the issues mentioned above. While Chrome extensions also have friction, users have the option to install them individually. Chrome extensions can also automatically summarize pages that users have visited, achieving the effect of AI as a companion. Additionally, Chrome extensions facilitate better synergy between personal and work use. During the iteration process of the Chrome extension, we realized that chat is one of the most important means of interaction. Users can accurately express their needs by asking questions (or using prompt words). Although we allowed users to ask questions during the Slack phase, the main focus was still on providing a series of buttons. In the iteration process of the Chrome extension, we discovered that the chat interface is very flexible and can quickly uncover customer needs that weren’t predefined.

On January 17th, we released our Chrome extension. However, on February 7th, Microsoft released Bing Chat (later known as Copilot), integrated into Microsoft Edge. By March, the Chrome Store was flooded with Copilot copycats. We quickly realized that the direction Copilot was taking would soon become a saturated market. Additionally, during the development of our Chrome extension, we became aware of some bottlenecks. The friction in developing Chrome extensions is quite high. Google’s Web Store review process takes about a week. This wouldn’t be a problem in traditional software development, but it’s very disadvantageous for the development of large models. This year, the iteration speed of large models is essentially daily. If we update only once a week, it’s easy to fall behind.

V2: VirtualMe™ (Digital Twin)

In March 2023, we began developing our own web-based application. Users can upload their documents or audio and video files, and then we generate summaries, allowing users to ask corresponding questions. Our initial intention was to build a user interaction platform that we could control. The development speed of the web-based application was an order of magnitude faster than the Chrome extension. We could release updates four to five times a day without waiting for Google’s approval. Moreover, with the Chrome extension, we could only use a small part of the browser’s right side. There were many limitations in interaction, but with the web-based platform, we have complete control over user interactions, allowing us to create more complex user-product interactions.

During this process, we learned that it is very difficult to retain users with utility applications. Users typically leave as soon as they are done with the tool, showing no loyalty. Costs remain high. Moreover, with a large number of AI utility tools going global, the field is becoming increasingly crowded.

We began deliberately filtering our users to interview enterprise users and understand their feedback. By June 2023, we realized that the best way to increase user stickiness was to integrate tightly with enterprise workflows. Enterprise workflows naturally result in data accumulation, and becoming part of an enterprise’s workflow enhances the product’s moat.

We started thinking about how our product could integrate with enterprise workflows. We came up with the idea of creating a personified agent. Most of the time when we encounter problems at work, we first ask our colleagues. A personified agent could integrate well with this workflow. We quickly developed a prototype and invited some users for beta testing.

Our initial user scenario envisioned that everyone could create their own digital twin. Users could upload their data to their digital twin so that when they are not online, it could answer questions on their behalf. After launching the product, we found that the most common use case was not creating one’s own digital twin, but creating the digital twin of someone else. For instance, we found that product managers were heavy users of our product. They mainly created digital twins of their customers to ask questions and see how the customers would respond.

During the VirtualMe™ phase, we began to refine our enterprise user persona for the first time. We identified several user personas, mainly 1. Product Managers, 2. Marketing Managers, 3. Customer Success Managers. Their common characteristic is the need to better understand others and create accordingly.

At the end of July, we organized an offline event and invited many users to test our VirtualMe product together. They found our product very useful, but they had significant concerns about the personified agent. Personal branding is very important for our user group. They were worried that what the virtual twin says could impact their personal brand, especially since large models generally still have the potential for “hallucination.”

It was also at this event that users mentioned the part of our product they found most useful was the customizable Data Container and the ability to quickly generate a chatbot. At that time, no other product on the market could do this.

V3: SmartChat™

Starting in August, we began to emphasize data management features based on this approach and launched SmartChat™. In SmartChat™, once users upload data, we automatically extract tags from the content. Users can also customize tags for data management. By clicking on a tag, the ChatBot will converse based on the content associated with that tag. At the same time, we also launched an automation system that runs prompts for users automatically, pushing the results to the appropriate audience via Slack or email.
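Schematically, the tag flow might look like this (the `llm` interface and helpers are assumptions for illustration, not our production code):

```python
def extract_tags(document: str, llm) -> list[str]:
    """Ask the model for a handful of topical tags when a document is uploaded."""
    reply = llm.complete(f"List 3-5 short topical tags, comma-separated:\n{document}")
    return [tag.strip() for tag in reply.split(",")]

def chat_within_tag(question: str, tag: str, store: dict[str, list[str]], llm) -> str:
    """Answer only from documents carrying the selected tag."""
    scoped = "\n---\n".join(store.get(tag, []))
    return llm.complete(f"Using only this material:\n{scoped}\n\nQ: {question}\nA:")
```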

The following figure shows our North Star Metric (NSM) up to December 1st of this year. At the beginning of the year, during the Slack plugin phase, our NSM was only averaging around 1. During the Chrome Extension phase, our NSM reached the hundreds. VirtualMe™ pushed our NSM up to 5,000.

By early December, our NSM was close to 20,000. Previously, our growth was entirely organic. By this time, we felt we could start to do a bit of growth hacking. In December, we started some influencer marketing activities, and our NSM grew by 30 times, reaching 550K!

From an NSM of less than 1 at the beginning of the year to 550K by the end of the year, in 2023 we turned Storytell from a demo into a product with a loyal user base. I am proud of our Crew and very grateful to our early users and design partners.

Words at the end

From a young age, I have been particularly fond of reading books on the history of entrepreneurship. The year 2023 marks the beginning of a new era for me as I embark on this journey. I know the road ahead is challenging, but I am fortunate to experience this process firsthand with my two amazing partners and our Crew. Regardless of the outcome, I will forge ahead with all the Storytell Crew, fearless and without regret. Looking forward to Storytell riding the waves in 2024!


Also, Storytell.ai is hiring front-end and full-stack engineers: https://go.storytell.ai/fse-role. If you are interested or know anyone who might be interested, please don’t hesitate to contact me at jingconan@storytell.ai.