
Context Modeling: The Future of Personalized AI

Andrej Karpathy, a prominent voice in the AI community, recently brought the term “Context Engineering” to the forefront. It describes the intricate art of manually crafting prompts and data to guide Large Language Models. While the concept is gaining significant attention, I believe it points us in the wrong direction.

The future of personal AI isn’t about endlessly engineering context; it requires a radical shift to what I call ‘context modeling.’

This isn’t just semantics—it’s the difference between a temporary patch and a real solution.

The Limitations of Current RAG Systems

Today’s Retrieval-Augmented Generation (RAG) systems follow a relatively straightforward paradigm. They retrieve relevant information using rule-based systems—typically employing cosine similarity to find the top-k most relevant results—and then present this context to a large language model for processing. While this approach has proven effective in many scenarios, it suffers from significant limitations.
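The rule-based retrieval step described above can be sketched in a few lines. This is a minimal illustration of cosine-similarity top-k lookup over embedding vectors, with toy data; any real system would use a vector index and learned embeddings.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product over the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k_retrieve(query_vec, doc_vecs, k=3):
    """Indices of the k documents most similar to the query by cosine
    similarity -- the rule-based retrieval step of a typical RAG system."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]

# Toy example: four documents embedded in three dimensions.
docs = [[1.0, 0.0, 0.0],
        [0.9, 0.1, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
query = [1.0, 0.05, 0.0]
print(top_k_retrieve(query, docs, k=2))  # [0, 1]: the two docs aligned with the query
```

The top-k results are then concatenated into the LLM prompt as context; the retrieval rule itself never adapts to the user.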

Think of current LLMs as exceptionally intelligent but stubborn team members. They excel at processing whatever information is presented to them, but they interpret data through their own fixed worldview. As these models become larger and more complex, they also become increasingly “frozen” in their approaches, making it difficult for developers to influence their internal decision-making processes.

From Engineering to Modeling: A Paradigm Shift

The conventional approach of context engineering focuses on creating more sophisticated rules and algorithms to manage context retrieval. However, this misses a crucial opportunity. Instead of simply engineering better rules, we need to move toward context modeling—a dynamic, adaptive system that generates specialized context based on the current situation.

Context modeling introduces a personalized model that works alongside the main LLM, serving as an intelligent intermediary that understands both the user’s needs and the optimal way to present information to the large language model. This approach recognizes that effective AI systems require more than just powerful models; they need intelligent context curation.

Learning from Recommendation Systems

The architecture for context modeling draws inspiration from the well-established two-stage recommendation systems that power many of today’s most successful platforms. These systems consist of:

  • Retrieval Stage: A fast, efficient system that processes large amounts of data with a focus on recall and speed.
  • Ranking Stage: A more sophisticated system that focuses on accuracy, distilling signal from noise to produce the best results.
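The two stages above can be sketched as a cheap, recall-oriented filter followed by a costlier precision-oriented reorder. The scoring functions below (token overlap, Jaccard ratio) are illustrative stand-ins for a production retriever and a learned ranking model.

```python
def retrieval_stage(query, corpus, budget=100):
    # Stage 1: fast, coarse filter -- keep any item sharing a token
    # with the query, capped at a candidate budget for speed.
    q_tokens = set(query.lower().split())
    hits = [doc for doc in corpus if q_tokens & set(doc.lower().split())]
    return hits[:budget]

def ranking_stage(query, candidates, k=3):
    # Stage 2: slower, finer scoring over the few survivors.
    # Jaccard overlap stands in for an expensive learned ranker.
    q_tokens = set(query.lower().split())
    def score(doc):
        d_tokens = set(doc.lower().split())
        return len(q_tokens & d_tokens) / len(q_tokens | d_tokens)
    return sorted(candidates, key=score, reverse=True)[:k]

corpus = [
    "intro to neural networks",
    "neural retrieval for search",
    "gardening tips for spring",
    "search ranking with neural models",
]
candidates = retrieval_stage("neural search", corpus)
print(ranking_stage("neural search", candidates, k=2))
```

The design choice is the same one recommenders made years ago: spend almost nothing per item in stage one so that stage two can afford to be smart.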

RAG systems fundamentally mirror this architecture, with one key difference: they replace the traditional ranking component with large language models. This substitution enables RAG systems to solve open-domain problems through natural language interfaces, moving beyond the limited ranking problems that traditional recommendation systems address.

However, current RAG implementations have largely overlooked the potential for model-based retrieval in the first stage. While the industry has extensively explored rule-based retrieval systems, the opportunity for intelligent, adaptive context modeling remains largely untapped.

The Context Modeling Solution

Context modeling addresses this gap by introducing a specialized model dedicated to generating context dynamically. This model doesn’t need to be large or computationally expensive—it can be a focused, specialized system trained on relevant data that understands the specific domain and user needs.
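One way to picture such a specialized model is a tiny per-user scorer whose weights are updated from feedback. The features and perceptron-style update below are placeholder assumptions, not a prescribed design; the point is that the retrieval component becomes trainable rather than rule-based.

```python
class ContextModel:
    """Toy per-user context scorer: weights over snippet features,
    nudged by feedback on whether retrieved context was useful."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features  # user-specific weights
        self.lr = lr

    def score(self, features):
        return sum(w * f for w, f in zip(self.w, features))

    def rank(self, candidates):
        # candidates: list of (snippet, feature_vector) pairs
        return sorted(candidates, key=lambda c: self.score(c[1]), reverse=True)

    def update(self, features, liked):
        # Perceptron-style update: reinforce features of useful context.
        sign = 1.0 if liked else -1.0
        self.w = [w + self.lr * sign * f for w, f in zip(self.w, features)]

model = ContextModel(n_features=2)
# Feedback: this user found recency-heavy context (feature 0) helpful.
model.update([1.0, 0.0], liked=True)
ranked = model.rank([("old note", [0.0, 1.0]), ("recent note", [1.0, 0.0])])
print(ranked[0][0])  # "recent note"
```

A rule-based retriever would return the same ranking for every user; this one drifts toward each user's demonstrated preferences.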

The key advantages of context modeling include:

  • Adaptability: Unlike rule-based systems, context models can learn and adapt to new patterns and user behaviors over time.
  • Personalization: These models can be trained on user-specific data, creating truly personalized AI experiences that understand individual contexts and preferences.
  • Efficiency: By using smaller, specialized models for context generation, the system maintains efficiency while providing more intelligent context curation.
  • Developer Control: Context modeling provides agent developers with a trainable component they can influence and improve, creating opportunities for continuous learning and optimization.

The Ideal Architecture: Speed and Specialization

For context modeling to be viable, it must satisfy one critical requirement: speed. The latency of the core LLM is already a significant bottleneck in user experience.

Right now, the main workaround is streaming the response. But streaming cannot hide the latency to the first token, and the end-to-end latency of the retrieval model adds directly to it. Any context modeling system must be exceptionally fast to avoid compounding this delay.
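The arithmetic is simple but worth making explicit: retrieval runs serially before the LLM can start prefilling, so its latency adds directly to time-to-first-token. The millisecond figures below are invented for illustration.

```python
def time_to_first_token(retrieval_ms, prefill_ms):
    # Retrieval must finish before the LLM can begin processing the
    # prompt, so the two stages add serially; streaming hides neither.
    return retrieval_ms + prefill_ms

baseline = time_to_first_token(retrieval_ms=50, prefill_ms=400)   # fast context model
slow_ctx = time_to_first_token(retrieval_ms=900, prefill_ms=400)  # slow "thinking" retrieval
print(baseline, slow_ctx)  # 450 1300
```

A context model that takes as long as the core model's prefill roughly doubles the user-perceived delay, which is why speed is the non-negotiable requirement.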

This brings us to the concept of “thinking” models, which use their own internal mechanisms to retrieve and reason over context before generating a final answer. In a sense, these models perform a specialized form of context modeling. However, their primary challenge is that this “thinking” process is slow and computationally expensive.

I argue that these monolithic “thinking” models are an intermediate step. The optimal, long-term architecture will decouple the two primary tasks. It will feature two distinct models working in tandem, mirroring the two-stage systems that have been so successful in recommendations:

  1. A Fast Context Model: A highly optimized, specialized model dedicated solely to retrieving and generating the most relevant context at incredible speed.
  2. A Powerful Core Model: The large language model that receives this curated context and focuses on the complex task of reasoning, synthesis, and final response generation.
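The dual-model wiring can be sketched as a pipeline of two callables. Both stages below are stubbed with toy placeholder implementations I invented for illustration; a real deployment would plug in a small specialized retriever/generator and an LLM.

```python
from typing import Callable, List

def make_pipeline(context_model: Callable[[str], List[str]],
                  core_model: Callable[[str, List[str]], str]):
    """Wire a fast context model in front of a powerful core model."""
    def answer(query: str) -> str:
        context = context_model(query)     # stage 1: fast, specialized
        return core_model(query, context)  # stage 2: powerful, general
    return answer

# Placeholder stage implementations, for demonstration only.
def toy_context_model(query: str) -> List[str]:
    notes = {"deadline": ["Project due Friday", "Demo at 3pm"]}
    return notes.get(query.split()[-1], [])

def toy_core_model(query: str, context: List[str]) -> str:
    return f"Q: {query} | context: {'; '.join(context)}"

pipeline = make_pipeline(toy_context_model, toy_core_model)
print(pipeline("what is my deadline"))
```

Because the two stages sit behind a narrow interface (query in, context out), each can be optimized, retrained, or swapped independently.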

This dual-model approach allows for specialization, where each component can be optimized for its specific task, delivering both speed and intelligence without compromise.

The Infrastructure Opportunity

Context modeling represents a common infrastructure need across the AI industry. As more organizations deploy RAG systems and AI agents, the demand for sophisticated context management will only grow. This presents an opportunity to build foundational infrastructure that can support a wide range of applications and use cases.

The development of context modeling systems requires expertise in both machine learning and system design, combining the lessons learned from recommendation systems with the unique challenges of natural language processing and generation.

Looking Forward

The future of personalized AI lies not in building ever-larger language models, but in creating intelligent systems that can effectively collaborate with these powerful but inflexible models. Context modeling represents a crucial step toward this future, enabling AI systems that are both powerful and adaptable.

As we move forward, the organizations that successfully implement context modeling will have a significant advantage in creating AI systems that truly understand and serve their users. The shift from context engineering to context modeling isn’t just a technical evolution—it’s a fundamental reimagining of how we build intelligent systems that can adapt and personalize at scale.

The question isn’t whether context modeling will become the standard approach, but how quickly the industry will recognize its potential and begin building the infrastructure to support it. The future of personalized AI depends on our ability to move beyond static rules and embrace dynamic, intelligent context generation.

Questions or feedback? I’d love to hear your thoughts.

Want more insights? Follow me:

🎙️ Founder Interviews: https://www.youtube.com/@FounderCoHo
Conversations with successful founders and leaders.

🚀 My Journey: https://www.youtube.com/@jingconan
Building DeepVista from the ground up.


Think Wider: AI as Perspective Partners

Everyone’s obsessed with making AI reason deeper: training models to solve complex mathematics and master intricate proofs. But in this race for artificial intelligence, we’ve forgotten something fundamental about human intelligence: how we actually think and work with others.

Deep reasoning is important, but it’s only half the story. What if, instead of treating AI as tools to be prompted, we could engage with them as naturally as we do with brilliant colleagues, each bringing their unique perspective to the conversation? This isn’t just about making AI think deeper; it’s about unlocking its breadth of knowledge through better questions.

Asking good questions is harder than finding answers. Try this mental exercise: give yourself two minutes to write down ten meaningful questions about different topics. Hard, isn’t it? The difficulty lies in where to even begin. With regular problems, we at least have a starting point – a puzzle to solve, a goal to reach. But with questions, we’re creating the map before we know the territory. It’s not about finding the right path; it’s about imagining what paths might exist in the first place.

And this is where LLMs shine: they’re incredible brainstorming partners. Not just because they can process vast amounts of information, but because they can consider countless angles that might never occur to us. Humans are limited by our experiences and cognitive biases. LLMs don’t have these limitations. They can draw connections between seemingly unrelated concepts and suggest possibilities we might never have considered.

The challenge isn’t that LLMs lack knowledge – they have plenty. The real problem is that we don’t know how to extract it effectively. We initially thought prompt engineering was the magic key, but that turned out to be too simplistic. It works, but it feels mechanical and constrained. We don’t think about conversation frameworks or prompt strategies when chatting with friends. We just… talk.

What we need is AI that can match this natural fluidity of human conversation. Imagine an AI system that could seamlessly shift between different types of expertise, letting you discuss quantum physics with a physicist like Albert Einstein one moment and Renaissance art with an art historian like Giorgio Vasari the next. This isn’t merely about role-playing; it’s about having access to different ways of thinking about problems, and to the specialized knowledge each perspective brings.
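Mechanically, this kind of expertise switch can reduce to swapping the system message while the conversation history carries over. The persona registry and chat-message format below are my own illustrative assumptions, not a description of any particular product.

```python
# Hypothetical persona registry; the descriptions are placeholders.
PERSONAS = {
    "physicist": "You reason like a physicist: first principles, estimates.",
    "art_historian": "You reason like an art historian: context, provenance.",
}

def build_messages(persona, history, user_turn):
    # Prepend the persona as a system message; history is preserved,
    # so the same conversation can be revisited from another angle.
    return ([{"role": "system", "content": PERSONAS[persona]}]
            + history
            + [{"role": "user", "content": user_turn}])

msgs = build_messages("physicist", [], "How would you estimate this market?")
print(msgs[0]["role"], len(msgs))  # system 2
```

The interesting design question is not the prompt plumbing but when to switch personas, which is where the fluidity of real conversation still eludes us.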

I’ve started calling this concept “persona-driven discovery,” and I believe it could revolutionize how we learn and solve problems by acting as a catalyst for serendipity.

We’ve all had those magical moments in libraries where we stumble upon exactly the book we needed but weren’t looking for. These AI systems could create those moments deliberately, suggesting unexpected perspectives and prompting us to explore unfamiliar territories. It’s like having a brilliant friend who knows when to push you out of your intellectual comfort zone.

All of this points toward a future where AI tools aren’t just answering our questions but actively participating in our thinking process. They could help us prototype ideas faster, facilitate group brainstorming sessions, and create personalized learning experiences that adapt to our individual ways of thinking.

The real breakthrough will come when we stop thinking about these systems as tools and start thinking about them as thought partners. This shift isn’t just semantic – it’s fundamental to how we might solve problems in the future. Instead of asking an AI to complete a task, we might engage it in a genuine dialogue that helps us see our challenges from new angles.

The building blocks are already there: we have models that can process and generate human-like text, we have systems that can maintain context in conversations, and we’re developing better ways to keep AI knowledge current and relevant. 

There are still unsolved problems. When we make AI systems more specialized (like training them to be history experts), they often lose their broader capabilities. It’s the same trade-off we see with human experts: deep knowledge in one area often comes at the cost of breadth. The trick will be creating systems that maintain both depth and breadth, switching between different modes of thinking without losing their fundamental capabilities.

This is what role-aware AI systems could offer – not just rigid question-and-answer sessions, but fluid conversations where different perspectives emerge organically as needed. Each AI “participant” would bring their unique expertise and viewpoint while staying current with the latest developments in their field. They would build on each other’s insights, challenge assumptions, and help you see problems from angles you might never have considered on your own. The key isn’t just having access to different types of knowledge, but having them work together in a way that mirrors the natural give-and-take of human conversation.

The potential impact of this shift could be profound. Just as the internet changed how we access information, these AI thought partners could change how we process and use that information. They could help us break out of our mental ruts, see connections we might have missed, and approach problems from angles we might never have considered.

This is the future I’m excited about – not one where AI replaces human thinking, but one where it enhances and expands it in ways we’re just beginning to imagine.

BTW, this essay was written after a long brainstorming session with the Hachi of “Paul Graham”. You can check it here: https://go.hachizone.ai/pg-think-wider

Please message me at jing AT hachizone.ai if you have any feedback on this essay or Hachi in general.