Categories
English

Why You Should Build a Consumer GenAI Startup and How to Make it Happen

While conventional wisdom holds that B2B startups are the safer choice, is this really the case? Let’s delve into why a consumer-focused GenAI startup might actually be your golden ticket.

In 2023, the GenAI application startup landscape experienced a remarkable surge, propelled by the advent of ChatGPT and foundation models such as OpenAI's GPT-4 and Anthropic's Claude. Over the past year, venture capital has invested at least $21 billion into GenAI, and most GenAI applications have primarily targeted B2B, particularly productivity improvement. In the latest Y Combinator batch, 65% of the startups fall within the B2B SaaS and enterprise sectors, whereas only 11% focus on consumer-oriented verticals. The most popular product form is the AI assistant.

Current Challenges in B2B GenAI

However, as we transition into 2024, it has become evident that many startups in this domain are facing significant challenges. A majority of these B2B GenAI companies are grappling with financial losses and are frequently pivoting in an attempt to find product-market fit.

Many startup founders struggle to convert Proof-of-Concept contracts into full annual agreements, often facing significant limitations in their bargaining power over pricing. Despite the $21 billion of VC investment, GenAI startups have generated only around $1 billion in revenue.

Heavy competition is one of the main reasons startups struggle to convert Proof-of-Concept contracts. But why is there such a strong focus on productivity-improvement applications? The reasons are multifaceted and stem from several technology and market dynamics:

First, it is related to the nature of the current foundational models. Foundation models such as GPT-4 are the result of significant research breakthroughs and depend extensively on benchmarks that have been established within the academic community. Historically, these benchmarks have predominantly focused on knowledge-based tasks. For example, the benchmarks used in the GPT-4 technical report primarily consist of academic tests. Essentially, what we are creating with these models are entities akin to exceptionally skilled students or professors. This orientation naturally steers generative AI applications toward productivity enhancements. Consequently, it’s not surprising that students are the primary users of many AI-assisted products like copilots.

Second, there is a B2B-first culture in the American startup ecosystem, which has predominantly favored B2B ventures; the consumer sector has received significantly less investment over the past decade. Startup founders in the US are afraid to build consumer startups. Although other countries such as China do not exhibit this fixed mindset, the U.S. has been a global leader in generative AI research and substantially influences trends worldwide.

Third, the GenAI infrastructure boom levels the playing field for everyone. In 2023, the majority of investments were directed toward GenAI infrastructure, with many investment firms likening it to a "gold rush." There is a prevailing belief that, much like the merchants who sold supplies during a gold rush, those who provide the essential tools and services will profit first. The following figure shows that $16.9 billion of the $21 billion in VC money was spent on GenAI infrastructure. Newer players can always leverage better infrastructure.

Source: Sequoia Capital’s AI Ascent 2024 opening remarks

Due to the factors mentioned above, competition among productivity-focused GenAI applications is intense, undermining the ability of startups in this space to extract value from customers. As a result, the entire ecosystem remains predominantly financed by venture capital.

The Untapped Potential of Consumer GenAI

History often repeats itself. During the Internet boom of the 1990s, emphasis was initially placed on B2B applications. However, it turned out that the integration of the Internet into business contexts would take longer than anticipated. Salesforce pioneered the SaaS model, but it took nearly a decade to reach the $1 billion revenue milestone. In contrast, consumer applications have proven to be a quicker avenue for both creating and capturing value.

Google, Facebook, and Amazon have each developed consumer products that serve billions of people, discovering unique methods to monetize the internet by reaching vast audiences cost-effectively. Additionally, this approach has proven to be an effective strategy for building strong competitive advantages, or moats.

Strategies for Success

The 7 Powers framework is a crucial tool for analyzing business opportunities, identifying seven key levers: Scale Economies, Network Economies, Counter-Positioning, Switching Costs, Branding, Cornered Resource, and Process Power. For B2B GenAI startups, Counter-Positioning and Process Power are typically the only levers they can pull, because incumbents hold advantages in the other areas. In contrast, consumer GenAI startups have the potential to develop competitive moats across almost all of these powers, providing numerous strategic advantages, especially if your founding team has strong technical capability in AI models and infrastructure.

It’s crucial for Consumer GenAI companies to own their AI models and infrastructure. This ownership not only fosters the development of Scale and Network Economies but also secures Cornered Resources, enhancing competitive advantage and market positioning.

On the one hand, controlling costs is crucial to creating a successful consumer app. The historical trend toward ever larger and more powerful models has made them unsuitable for consumer applications because of high costs, since the lifetime value (LTV) of consumer use cases is typically much lower. For example, a user's LTV is often just $20-30, yet that user might ask hundreds of questions, and a single GPT-4 call that uses the full context window can cost approximately $1.28. Developing in-house expertise to create models that are both powerful and cost-effective is crucial to bridging the gap.
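
To make that gap concrete, here is a back-of-envelope sketch of the unit economics; every number is an illustrative assumption drawn from the ranges above.

```python
# Rough consumer GenAI unit economics (all numbers are assumptions).
LTV = 25.0                 # lifetime value of one consumer user, USD
COST_PER_FULL_CALL = 1.28  # one GPT-4 call using the full context window
QUESTIONS_PER_USER = 200   # "hundreds of questions" over a lifetime

inference_cost = QUESTIONS_PER_USER * COST_PER_FULL_CALL
print(f"lifetime inference cost: ${inference_cost:.2f}")  # $256.00
print(f"margin per user: ${LTV - inference_cost:.2f}")    # -$231.00

# The per-call cost at which this user merely breaks even:
print(f"break-even cost per call: ${LTV / QUESTIONS_PER_USER:.4f}")  # $0.1250
```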

The good news is that consumer applications are usually much more tolerant of hallucination and might not need the most powerful model. In addition, the evolution of open-source models has enabled startups to develop their own models cost-effectively. With the recent launch of LLaMa 3, its small 8B model has outperformed the largest LLaMa 2 model, and there is anticipation that the 400B model currently in training will match the performance of GPT-4. These advancements make it feasible for startups to create high-performing models at a fraction of the cost of proprietary models, although significant investment is still necessary to reduce costs enough to support large-scale consumer applications.

On the other hand, current foundational models are not ideally suited for creating robust consumer applications, as most large language models lack personalization and long-term memory capabilities. Developing new foundational models or adapting existing ones to better suit consumer needs is a critical challenge that Consumer GenAI startups must address.

Despite these challenges, startups that successfully tackle these issues can secure a significant competitive edge and establish long-lasting market dominance.

Thanks for reading; I hope this article is useful to you. If you have any questions or thoughts, please don't hesitate to comment or message me at jing@jingconan.com 🤗

Categories
Chinese

2024: Reflections on the Chinese and American Startup Ecosystems

Taking the opportunity of preparing my new startup, I spent the past three weeks visiting several cities in China and having in-depth conversations with many friends. I came away with a great many insights and takeaways, which I would like to share here.

Why go to China at the start of this new venture? My career and entrepreneurial experience have all been in the US; my last startup relied almost entirely on the American startup ecosystem and rarely drew on my advantages as a Chinese founder. But the most accomplished friends and mentors around me all told me: an excellent company should be an international company from Day 1, and the ability to integrate global resources is a necessary ingredient of a successful business. In particular, my HUST senior Yao Xin helped me systematically map out the three major advantages of China's domestic startup ecosystem relative to America's: 1) product thinking, 2) supply-chain advantages, and 3) a talent dividend. Over the past year of building I had met many excellent Chinese companies going global and already understood these advantages to some degree. But hearing is one thing; seeing is believing. This trip was my chance to observe more systematically and to think about how to use these three advantages to build the moat for my next venture.

Let me start with product thinking. The mobile internet wave had a far greater impact on China than on the US, and it forged a large cohort of outstanding product people. Generally speaking, domestic product people think through usage scenarios in remarkable detail. Compare the Google and Baidu map apps: Google Maps is a very minimalist app in which the user selects a destination and gets navigation, with little design beyond that. Baidu Maps, by contrast, includes at least 1) recommended-lane indicators, 2) voice packs, 3) one-tap ride hailing, 4) real-time traffic alerts, and 5) traffic-camera warnings. And that is only a small sample; virtually every common pain point has a complete set of features designed around it. Nor is this an isolated case: Didi, compared with Uber, also thinks through scenarios far more finely. At the same time, China demands many applications that never appeared in the US at all; the entire local-lifestyle category, for instance, simply does not exist there.

At the start of my trip, our project had only gotten as far as these points: 1) build a personalized large model, 2) enter through companionship first, and 3) perhaps include some hardware. After talking with some excellent domestic product people, I realized my thinking was clearly far too coarse. At a minimum I needed answers to these questions: What is the product form? Why do users need companionship? What value do they receive, and which product features embody that value? Who captures the main value? This was a systematic inspiration that got me thinking carefully about product scenarios.

Next, the supply-chain advantage. What impressed me most on this trip is that Shenzhen's hardware ecosystem leads the world by a wide margin. Doing a hardware project in the US is extremely difficult; even a small hardware element defeats 99.999% of founders. In Shenzhen, however, the hardware ecosystem is so complete that the barrier to hardware entrepreneurship drops dramatically. In just a few days, for example, I found at least three suppliers who could roughly produce the hardware I need, and all of them were very open to collaboration.

That said, even with Shenzhen's well-developed hardware ecosystem, I still feel that touching hardware in the earliest stage of a startup is very dangerous, for two main reasons.

  1. Although Shenzhen's hardware ecosystem is well developed, to truly exploit its supply-chain advantage you must already have very clear hardware specifications. Once a mistake is made, the wasted time and money are more than a startup can bear.
  2. Inventory consumes cash at an enormous rate and can easily break a startup's cash flow.

Hardware is better suited to the stage when a company has already found product-market fit and needs to solidify its moat and build a second growth engine, for example after a Series B.

Finally, the talent dividend. The internet's growth over the past decade-plus has accumulated a deep pool of talent. With the industry's shifting fortunes over the last two years and the RMB's depreciation against the dollar, talent costs have broadly fallen by more than 30%, making hiring in China very attractive. Overall, labor costs in China's tier-1 cities are roughly half those of the San Francisco Bay Area, and tier-2 cities (such as Chengdu and Wuhan) run about 70% of tier-1 costs. China itself actually comprises several smaller startup ecosystems, each with a different talent profile. Among tier-1 cities, Beijing has a clear edge in the concentration of internet talent; Shanghai's product managers (especially in local-lifestyle services) are particularly strong; and Shenzhen has abundant hardware talent. Of the tier-2 cities I only visited Chengdu and Wuhan: Chengdu clearly has more software talent than Wuhan, while Wuhan has many optics and hardware people and is relatively balanced.

My other impression is that, in talent density and environment, the gap between China's tier-2 and tier-1 cities is far larger than the gap between the tier-1 cities and Silicon Valley. AI talent in particular is very scarce in tier-2 cities. Tier-1 cities are fairly close to the US in talent density and information flow, but information still circulates poorly in tier-2 cities: much of what is common knowledge in tier-1 cities is barely known there. Relatively speaking, conventional software development or testing can sit in a tier-2 city without problems, but work in fast-moving fields (such as AI roles) should stay in Silicon Valley or a tier-1 city. Canada is also a decent option: costs are slightly higher than China's tier-1 cities, but with no time difference from Silicon Valley, communication is much easier.

In addition, app development in China has reached assembly-line maturity; development speed can easily exceed twice that of a comparable American team. Used wisely, this is an enormous advantage.

Having covered the advantages, I also want to note the caveats I observed:

  1. First, diversity is not highly valued in China. This tilts the ecosystem toward homogeneous competition and insufficient innovation. If you build large teams in China, ensuring their diversity and creativity is something to watch.
  2. Second, many Chinese playbooks cannot be copied directly to the US. A few years ago there was a "Copy From China" wave in which many companies transplanted Chinese payment and local-lifestyle projects to the US, and none succeeded notably. This trip gave me a chance to reflect systematically on why. China is a mono-cultural country: the consumption habits of its 1.4 billion people are fairly similar, and under heavy competitive pressure much product design is overfitted to Chinese consumers' habits. The US is not a mono-cultural country. The population culturally similar to China consists of only about 6 million Chinese Americans and roughly 20 million Asian Americans, and each demographic has markedly different consumption habits. Simply transplanting a Chinese product form overfits to the Chinese community and makes it hard for the product to break into other American demographics. Yet the Chinese-American population is only the size of a single Chinese prefecture-level city, and serving that group alone cannot sustain a large company.

The spillover of Chinese consumer business models into the US is highly likely, but not every consumer business form will fit. To succeed in consumer businesses in the US, you must appeal to universal human emotions rather than over-optimize for a specific demographic. What is a universal emotion? Disney's products are built on a very plain one: truth, goodness, and beauty. That is why Disney could go global. China's parent-child companionship businesses, by contrast, have almost entirely pivoted toward education, such as textbooks, picture books, and learning tablets. Copying that directly will not work: Chinese culture places enormous weight on study, but other groups care far less about education. Over-optimizing for education makes it hard to escape the Chinese community. The fear of loneliness, however, is a universal emotion, so entering from the companionship angle is the better choice.

Coming back to the reflections from this trip: I believe integrating global resources is not only an opportunity for excellent entrepreneurs but also a responsibility. Globalization seems to have hit some turbulence in recent years, but I believe the broad trend of global integration will not change, whether through the many excellent Chinese companies going abroad or through more foreign companies integrating China's resources. Across cultures and ethnicities, the longing for a better life is shared. Only by integrating resources globally can we create high-quality products more effectively and build a better life for everyone.

Link to this post: https://s.jing.me/2024-cn-us-startup-ecosystem

Categories
Chinese

From Zero to One: How Large-Model Technology Was Created

Although large models entered the public eye only after OpenAI released ChatGPT, the technology had been evolving for a long time before that. I worked at Google from 2014 to 2019, the period during which many key large-model technologies matured at Google Brain. The technology did not take shape in one stroke; it was a relay run by many outstanding researchers and engineers. I was fortunate to take part, and as someone who lived through the process, I want to record it here and share some stories from this epoch-making technology's zero-to-one period.

Developing large models requires not only massive computing power but also solutions to three key problems: architecture, training algorithms, and data. Because a search engine is, in essence, a natural-language query over internet data, Google had long invested heavily in natural language processing research and gradually solved all three.

First, architecture. One of Google Brain's important missions was to provide model support for Google Translate. Language is itself a sequence, so it was natural to use a class of neural networks called sequence models for the task. Unlike the previous generation of convolutional neural networks (CNNs), sequence models are built specifically for sequential data: each step's output becomes the input of the next iteration, which lets them process very long sequences.

Google Translate supports translation among many language pairs, so it needed an architecture that could add languages easily without retraining. Intuitively, a language is just a set of symbols for expressing particular meanings, and translation is essentially representing the same meaning in a different symbol system. The idea followed naturally: use one sequence model to extract the meaning a sentence expresses (the encoder), then use another sequence model to convert that meaning into the symbols of another language (the decoder). Thus the encoder-decoder architecture was born.
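
To make the idea concrete, here is a minimal sketch of an encoder-decoder pair in PyTorch; the GRU cells, vocabulary sizes, and dimensions are illustrative choices for exposition, not the architecture Google Translate actually shipped.

```python
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, HIDDEN = 1000, 1200, 256

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(SRC_VOCAB, HIDDEN)
        self.rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)

    def forward(self, src_ids):
        # Compress the whole source sentence into one "meaning" state.
        _, state = self.rnn(self.embed(src_ids))
        return state

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(TGT_VOCAB, HIDDEN)
        self.rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, TGT_VOCAB)

    def forward(self, tgt_ids, state):
        # Unfold the meaning state into target-language symbols.
        h, state = self.rnn(self.embed(tgt_ids), state)
        return self.out(h), state

src = torch.randint(0, SRC_VOCAB, (1, 7))  # a 7-token source sentence
meaning = Encoder()(src)                   # language-agnostic representation
logits, _ = Decoder()(torch.zeros(1, 1, dtype=torch.long), meaning)
print(logits.shape)  # (1, 1, TGT_VOCAB): scores for the first output token
```

Swapping in a different decoder (or adding one per language) reuses the same encoder, which is why the design extends to new languages without retraining everything.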

When the encoder-decoder architecture first appeared (around 2014), it did not work well. The main problem was that sequence models forget information easily, typically remembering only a few surrounding words; this is the origin of the context-window limit we speak of today. One solution already existed: the LSTM (Long Short-Term Memory) model. LSTMs worked well but were extremely complex and hard to implement, so people kept looking for a simpler architecture.

The Transformer architecture was born against this backdrop (2017). Compared with the LSTM it is much simpler, and it works very well; every later large model is based on it. All eight authors of the Transformer paper later left Google, and nearly every one founded a unicorn, hence the nickname "the Transformer Eight." When the Transformer first appeared it drew little attention, mainly because it was very hard to train at the time and only a few people could use it.

The second part is training algorithms. Around 2013, Google invented word embeddings (Word Embedding, or Word2Vec), a technique that maps each word to a vector such that semantically similar words lie close together in vector space. Word embeddings were first used as features in Google's search and ads machine-learning systems and delivered large gains. But a problem soon surfaced: words in natural language are polysemous, and the same word can carry completely different meanings depending on its context.
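
As a quick illustration, the sketch below loads small pretrained GloVe vectors through the gensim library (a stand-in for Word2Vec; the model name and example words are my own choices) and shows both the similarity property and the polysemy problem described above.

```python
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")  # 50-dim pretrained vectors

# Semantically related words sit close together in vector space.
print(vectors.similarity("king", "queen"))   # high
print(vectors.similarity("king", "banana"))  # low

# The limitation: "bank" gets ONE vector even though it can mean a
# riverbank or a financial institution; context is ignored entirely.
print(vectors.most_similar("bank", topn=3))
```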

How, then, to produce a word embedding that accounts for context? The idea was to process the word sequence in order: scanning front to back captures the preceding context, and scanning back to front captures the following context, so both are covered. This contextual approach evolved into BERT (Bidirectional Encoder Representations from Transformers, 2018).

BERT's bidirectionality differs slightly from the left-to-right-only paradigm of OpenAI's later GPT, but the underlying principle is the same. BERT's greatest strength is producing embeddings. When BERT embeddings were deployed in Google's search ads system, they generated a 1% revenue lift; with Google earning roughly $200 billion a year, that single improvement was worth about $2 billion annually. BERT's arrival also directly spurred Google's TPU program.

A major contribution of the BERT paper was popularizing the pretrain-finetune paradigm. The paradigm had actually existed inside Google much earlier, largely because Google Research is a hybrid organization: some people do foundational model research, while others (Applied Scientists) embed in product teams to ship. At first, every team trained its models from scratch, but as models grew, so did the resources required, and the foundational researchers did not know each team's business well. The remedy was for the researchers to build one pretrained model and let each application team fine-tune it for its own problem, which multiplied everyone's effectiveness.

Moreover, pooling all the training resources meant the pretrained model could be made very large. BERT was the first pretrained model released to teams outside Google, and the release open-sourced the model code along with the weights, setting off a wave of enthusiasm for natural language processing (NLP). The famous HuggingFace actually began as an open-source PyTorch implementation of BERT.

The third part is data. Pretraining large models requires vast amounts of data. Traditional machine learning relies on supervised learning, which needs large volumes of text plus labels. Text was the easy part: English Wikipedia offers an enormous corpus, Wikipedia is a nonprofit, and Google was its largest sponsor, so obtaining the data posed little difficulty. Labels, however, require human annotators, and annotation costs in the US and Europe are very high, which made that route impractical. Semi-supervised learning was born here: it uses rules to generate labels automatically. For example, BERT randomly masks out a word in a passage and asks the model to predict it, the same principle as a person doing a cloze test.
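
Here is a toy sketch of that labels-for-free idea: a cloze-style training example can be manufactured from raw text in a few lines. The whitespace tokenizer and single mask are simplifications; real BERT uses WordPiece tokens and masks about 15% of them.

```python
import random

def make_masked_example(sentence: str, mask_token: str = "[MASK]"):
    tokens = sentence.split()
    i = random.randrange(len(tokens))  # pick a random position
    label = tokens[i]                  # the label comes for free
    tokens[i] = mask_token             # hide it from the model
    return " ".join(tokens), label

random.seed(0)
masked, label = make_masked_example(
    "people sat around bright campfires telling stories")
print(masked)  # e.g. "people sat around [MASK] campfires telling stories"
print(label)   # the model is trained to predict this word
```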

As BERT matured, teams inside Google began building many applications on top of it. An important one was the multi-turn Conversational Recommender. Traditional recommender systems are ranking-based, but conversational recommendation is built on natural language processing and uses reinforcement learning for alignment. In effect this was already the principle behind the later ChatGPT, and at the time it achieved comparable results. But some of Google's public releases drew criticism from mainstream American media over bias and diversity issues, which made Google Research's leadership hesitate about the large-model direction. Meanwhile, deploying the models inside Google met strong resistance, since this was a disruptive innovation for existing businesses; business-line leaders either could not see the trend or feared the disruption and declined to adopt it. Apart from some wins on YouTube, every deployment effort foundered. For all these reasons much of the top talent scattered, many later joining OpenAI or founding their own companies to carry the work forward.

OpenAI did not initially focus on large models, but amid Google's internal turmoil in 2019 it took up the banner. Building on Google's work, it innovated by focusing on generalization and zero-shot learning: users would not need to fine-tune the model at all, since a few prompts could make it reasonably usable across many domains. The strategy proved enormously successful, because most developers cannot actually fine-tune models. Prompt engineering lowered the barrier further, putting large models within everyone's reach.

The development of large models was never planned in advance; driven by curiosity, people kept feeling for stones as they crossed the river until a road emerged. Technological innovation cannot be planned, but it can be incubated: given the right environment, new technologies keep emerging. Google Brain then and OpenAI now share the same traits: a high density of talent, ample freedom, and abundant resources. That is why such places can incubate revolutionary technologies and keep pushing the frontier of science and technology.

Categories
English

Unlocking the Wonders of Imaginative Play: A Journey into the Magic of Childhood

One day, my two-year-old daughter, Adalyn, approached me with a desk lamp and a handful of blue glass balls. Puzzled, I watched as she arranged them before me and then asked, “Daddy, what animal is this?”

I couldn’t fathom how a desk lamp and some glass balls could possibly resemble an animal. For a brief moment, I felt utterly perplexed.

However, after pausing for a few seconds, it dawned on me—Adalyn had conjured up a magical world within her imagination.

Though to me it seemed like mere objects, to her, they were the building blocks of an enchanting creature.

Grateful for the power of imagination and the assistance of technology, I decided to enlist the help of AI to bring Adalyn’s creation to life.

I sent a picture to ChatGPT with the query, “What type of animal is this?” and eagerly awaited its response.

In a matter of moments, ChatGPT wove a beautiful tale:

“In the heart of the magical forest, where the trees whispered secrets and the moonlight danced, lived a giraffe named Zara. With each step she took, her footprints left behind a trail of shimmering blue, marking her path through the enchanted woods…”

Reading the story aloud to Adalyn, her eyes lit up with joy. “Zara!” she exclaimed, delighted to have her creation given life and a name.

At that moment, I realized that it wasn’t my understanding alone that made her happy—it was the connection, the validation of her imagination, and the shared experience of storytelling.

Adalyn beamed at me and proclaimed, “You’re the best dad in the world!” But deep down, I knew it wasn’t solely my doing. It was the magic of childhood imagination and the wonders that technology and storytelling can bring to life.

This experience is also known as Imaginative Play, a crucial activity for child development. Sadly, in today’s fast-paced world, few parents engage in imaginative play with their children. While immensely enjoyable, it demands a significant amount of imagination and mental energy. Unfortunately, as adults, many of us lose touch with our imagination over time, as society often prioritizes “correct answers” over creative thinking.

Upon reflection, I realized that Imaginative Play isn’t just beneficial for children—it holds value for adults too. It fosters creativity, problem-solving skills, and emotional intelligence, all of which are essential for navigating the complexities of life.

Eager to share this revelation, I recounted my experience to my Toastmasters club. I shared how a seemingly mundane moment with my daughter sparked a journey into a magical realm of creativity and imagination. Through this story, I hoped to inspire others to embrace their inner child, reclaim their imagination, and rediscover the joy of imaginative play.

Please check out this video below:

Categories
English

How 500 Lines of Code Challenged a $500M AI Giant, and What Moats GenAI Startups Should Have

Recently, there has been an interesting development in the large-model industry. Perplexity AI, a hot large-model company in Silicon Valley, completed a financing round two months ago, valuing it at over $500 million. However, using Lepton AI’s middleware, Lepton’s co-founder Yangqing Jia managed to create an open-source version with just 500 lines of code over a weekend, sparking a heated discussion in the industry. The related demo on GitHub quickly garnered five thousand stars in just a few days. This incident reflects a broader trend, and I’ll analyze it based on my own year and a half of entrepreneurial experience.

Currently, big model companies can be categorized into three types:

  • Base model companies, which primarily provide large-model capabilities. This area requires significant capital, and resources are highly concentrated among leading companies and giants.
  • Middleware companies, which offer middleware between large models and applications, such as vector databases. Yangqing Jia’s Lepton AI falls into this category.
  • Application-layer companies, which directly provide consumer-facing applications. These can further be divided into platform-type application companies, like Perplexity AI, and vertical application companies focusing on niche markets, such as Harvey AI, which recently completed financing for its legal application.

The incident with Perplexity AI and Lepton AI highlights a pain point for application-layer companies — high competitive pressure with insufficient moats. For instance, Perplexity, which aims to solve general information search problems, faces challenges from four fronts: pressure from giants like Google, competition from vertical knowledge applications like Harvey, market encroachment from other knowledge service companies, and disruptions from middleware companies like Lepton AI. Vertical application companies face slightly less pressure, but they still confront these four forces and have a smaller market size, resulting in fewer resources.

So, what can be done? Many entrepreneurs believe that accumulating proprietary data can create a sufficient moat. This is very sensible, but it presupposes a systematic methodology for acquiring proprietary data. Here, I propose a methodology: using contrarian insights to gain a time advantage, leveraging the founder’s personal strengths for breakthroughs, and focusing on data-first products or operational capabilities.

First, contrarian insights, or insights not commonly understood, are essential. Entrepreneurs must find less trodden paths to build their core competitive advantage with minimal resources. What was once a contrarian insight can become common knowledge, such as the combination of big models with chat interfaces, which was novel before ChatGPT but is now common.

Second, the founder’s personal advantage is crucial. While contrarian insights offer temporary protection, they quickly become common knowledge once proven useful. Here, a deep understanding of user pain points in vertical applications can be a personal advantage. For example, one of Harvey’s co-founders was a lawyer. Even if a team doesn’t have a co-founder from a specific industry, previous experiences that can be leveraged as industry advantages are valuable.

Finally, building a data-first product or operational capability is key. The founder’s personal advantage must be systematized to be sustained. There are two strategies:

  • Product-driven: The founder uses their deep understanding of user needs to design a product that naturally accumulates high-quality data, enhancing the product experience and creating a flywheel effect.
  • Operation-driven: The founder uses their resources and experience to build an operational system that continually acquires proprietary data, making operations or sales more efficient and faster.

The former suits products focused on Product-led Growth (PLG), while the latter suits those driven by Sales-led Growth (SLG). Both must prioritize data. If product-driven, each feature should contribute to data accumulation. If operation-driven, operations should focus on data, not just revenue or other metrics.

Returning to Perplexity’s case, Yangqing Jia could replicate Perplexity’s main functions over a weekend, but not its data accumulation. As a middleware company, Lepton likely doesn’t intend this as a core strategy. However, many new application startups may use this approach to challenge Perplexity further. Whether Perplexity can withstand them depends on its ability to build a moat with proprietary data.

Categories
Chinese

Thoughts Prompted by Yangqing Jia Flipping the Table on Perplexity AI

Recently something very interesting happened in large-model circles. Perplexity AI, a red-hot Silicon Valley large-model company, closed a financing round two months ago at a valuation above $500 million. Yet Yangqing Jia, using Lepton AI's middleware, built an open-source version in a single weekend with just 500 lines of code. It set off heated debate in the industry, and the related demo gained five thousand stars on GitHub within days. As one falling leaf heralds autumn, here is some analysis drawing on my own year and a half of entrepreneurial experience.

First, large-model companies currently fall into three categories.

1) Base model companies at the bottom layer, which mainly provide large-model capabilities. This space demands heavy capital, and resources are highly concentrated among leading companies and giants.

2) Middleware companies, which provide the middleware between large models and applications, such as vector databases. Yangqing Jia's Lepton AI is a middleware company.

3) Application companies at the top layer, which provide consumer-facing applications directly. These split into two kinds. One is the platform-type application company; Perplexity AI is one. The other is the vertical application company targeting a niche market; Harvey AI, which just closed a financing round, is a legal vertical application.

This little episode between Perplexity AI and Yangqing Jia strikes a sore spot of application-layer companies: competitive pressure that is too high, and a moat that is not wide enough. Perplexity, for example, solves general-purpose information seeking and sits on contested ground, fighting on four fronts: pressure from giants (Google); demand-side squeeze from vertical knowledge applications (such as Harvey); other knowledge-service companies grabbing at the market; and, finally, middleware companies like Lepton AI flipping the table. Vertical applications face slightly less pressure, but all four forces still apply, so it is not easy; moreover, vertical markets are smaller, so such companies command far fewer resources.

So what can be done? Many founders believe that only accumulated proprietary data forms a wide enough moat. That is certainly sensible, but I think a data moat has an important precondition: a systematic methodology for acquiring proprietary data. Here is the methodology I propose: use contrarian insight to buy a head start, use the founder's personal advantages for a single-point breakthrough, and build data-first product or operational capabilities.

First, contrarian insight: an insight that most people have not yet grasped. A founder's first moat must be a contrarian insight, because only the road less traveled lets you build core competitiveness with very few resources. Enter a red ocean from day one and you face competition before you have grown, and your resources dry up. What counts as common versus contrarian knowledge is not fixed: combining large models with a chat interface was contrarian before ChatGPT and common knowledge afterward.

Second, the founder's personal advantage. A contrarian insight offers protection for a time but not forever: once it proves useful, it quickly becomes common knowledge. That is when the founder's personal advantage matters most. In a vertical application, for instance, a deep understanding of users' pain points is such an advantage. A simple way to obtain it is to bring on a co-founder who has worked in the field; one of Harvey's co-founders, for example, had been a lawyer. Even without a co-founder from the field, prior experience that converts into an advantage in the industry is valuable. A friend I spoke with recently had previously worked at a food-delivery platform, and his new venture combines pet services with large models; on the surface the two seem unrelated, but both demand very strong on-the-ground promotion capabilities.

Finally, build data-first product or operational capabilities. A founder's personal advantage endures only if it is internalized into a systemic advantage. There are two strategies:

1) Product-driven: the founder applies a deep understanding of user needs to product design, so that data settles naturally as users use the product, and the growing data in turn improves the product experience, forming a flywheel.

2) Operation-driven: the founder uses advantages in resources and experience to build an operating system that continuously acquires proprietary data, and that proprietary data makes subsequent operations or sales faster and more efficient.

The former especially suits PLG (Product-led Growth) products; the latter suits SLG (Sales-led Growth) products. What they share is that both must be data-first. If product-driven, the design of every feature must weigh that feature's contribution to data accumulation. If operation-driven, the core yardstick of operations must be data, not revenue or other metrics.

Back to the Perplexity example: Yangqing Jia can reproduce Perplexity's main features in a weekend, but he cannot reproduce its accumulated data in a weekend. As a middleware company, Lepton presumably has no intention of making this a core strategy. But plenty of new application startups will use the same playbook to mount fresh assaults on Perplexity. Whether Perplexity can withstand them depends on whether it can build a proprietary-data moat.

Categories
Chinese

Mastering East and West: The Way of Blended Innovation

Last month I discussed Chinese and Western management culture with two senior schoolmates who are very successful entrepreneurs, and much of their thinking inspired me greatly. One particularly interesting point was the contrast between the traits of great entrepreneurs. In Chinese culture, the great entrepreneur is usually someone like Zhuge Liang, with the following traits:

  • Strategy-driven: as early as the Longzhong Plan, he laid out a clear strategy: hold Jing and Yi provinces and divide the realm into three.
  • Flexible and pragmatic: macro persistence combined with micro flexibility. When Guan Yu carelessly lost Jingzhou, he proactively adjusted strategy, launching northern expeditions aimed at seizing Liangzhou.
  • Persevering: his entire career served the vision of restoring the Han dynasty; through repeated setbacks he never turned back.
  • Hands-on: throughout the northern expeditions Zhuge Liang did not stay at the level of grand strategy; he knew the details of provisions, funds, and military supply intimately.

These traits are also conspicuous among Chinese-American entrepreneurs, Nvidia's Jensen Huang for example. The CUDA strategy was laid out over more than 18 years, spanning the internet and mobile-internet eras before finally bearing fruit in the AI era. Nvidia's strategy has also always kept real flexibility: GPUs began with gaming, moved to blockchain, then to AI, always with concrete applications landing. Through highs and lows, Nvidia persisted in optimizing the GPU, building CUDA's deep, thick moat. And Jensen Huang himself has a strong command of the technology. (If you are interested in Nvidia, see this Acquired episode: https://www.acquired.fm/episodes/jensen-huang)

In Western culture, the great entrepreneur usually resembles a sports coach, with these traits:

  • Clear goals: define an explicit, unambiguous objective.
  • Room to operate: give the team ample space to develop, without prescribing specific methods.
  • Continuous feedback: review every game and deliver detailed feedback down to each individual player.
  • Active management: recruit the best talent, and quickly manage out team members who do not perform.

Western management is a highly rational, scientific process, which is why the West has long held the advantage in business. For that very reason, founders from China (or of Chinese background) often carry an innate lack of confidence, which also shows when Chinese companies going global face American ones. We should acknowledge that the Western business environment has produced a great deal of innovation, while also viewing the question dialectically: the two cultures are naturally complementary. The issue is not which management culture to use, but whether you are willing to learn from the other side. Whoever learns from the other grows stronger; whoever refuses to learn falls behind.

Western management has shortcomings of its own. First, it lacks a certain pragmatism and can feel removed from the ground: in the mobile-app era, Chinese apps going global would optimize specifically for every small country, something I rarely heard of inside American companies. Second, it lacks strategic endurance: few in American startup circles will touch an industry like chips that demands a decade or more of sustained investment.

Why learn at all? Not to follow blindly, but to have seen many excellent models and find the one best suited to the current environment. After more than a year founding a company in the US, I feel this deeply. A landmark event this year is AIGC SaaS going global, and I can feel the vigorous vitality of Chinese founders expanding abroad. In the mobile-internet era, 80% of apps going global were Chinese; in AIGC the share is reportedly already 50%, and it will only rise. Of course, Chinese founders going global have their own growing to do: broadly, they struggle to build brands, remain mired in red-ocean competition, and run on thin margins. Western management culture happens to excel at exactly these problems.

America enjoys the advantage of multiculturalism: with immigrants from many countries founding companies, cultural collisions arise naturally in the US startup ecosystem. There are problems too. Why were so many native American e-commerce companies caught off guard by Temu and TikTok Shop? Often it comes down to an insufficient willingness to learn from other cultures.

For Chinese founders, the task is to keep an open mind while holding on to self-confidence. A great entrepreneur should master both East and West: not merely studying both management cultures, but combining the two organically.

Categories
English

A Personal Story about LLMs and Storytell.ai

My name is Jing Conan Wang, a co-founder and CTO of Storytell.ai. In October 2022, together with two amazing partners DROdio and Erika, we founded Storytell.ai, dedicated to distilling signal from noise to improve the efficiency of knowledge workers. The reason we chose the name Storytell.ai is that storytelling is the oldest tool for knowledge distillation in human history. In ancient times, people sat around bright campfires telling stories, allowing human experiences and wisdom to be passed down through generations.

The past year has been an explosive one for large language models (LLMs). With the meteoric rise of ChatGPT, LLMs have quickly become known to the general public. I hope to share my own personal story to give people a glimpse into the grandeur of entrepreneurship in the field of large language models.

From Google and Beyond

Although ChatGPT comes from OpenAI, the roots of LLMs lie in Google Brain – a deep learning lab founded by Jeff Dean, Andrew Ng, and others. It was during my time at Google Brain that I formed a connection with LLMs. I worked at Google for five years, spending the first three in Ads engineering and the latter two in Google Brain. Not long after joining Google Brain, I noticed that one colleague after another began shifting their focus to research on large language models. That period (2017-2019) was the germination phase for LLMs, with a plethora of new technologies emerging in Google’s labs. Being in the midst of this environment allowed me to gain a profound understanding of the capabilities of LLMs. Particularly, there were a few experiences that made me realize that a true technological revolution in language models was on the horizon:

One was about BERT, one of the best LLMs before ChatGPT. One day in 2017, while I was in a Google cafe, thunderous applause broke out: it turned out that a group nearby was discussing the results of an experiment. Google provides free lunches for its employees, and lunchtime often brings people together to talk about work. A colleague asked me: “Do you know about BERT?” At the time, I only knew Bert as a character from the American animated show Sesame Street, which I had never watched. My colleague told me: “BERT has increased Google Search revenue by 1% in internal experiments.” Google’s revenue was already over a hundred billion dollars a year, meaning this was equivalent to several billion dollars in annual revenue. This was quite shocking to me.

Another was my experience with Duplex: Sundar Pichai released a demo of an AI making phone calls at Google I/O 2018, which caused a sensation in the industry. The project, internally known as Duplex, was one our group supported with the related model work. The demo showed only a small part of what was possible; internally, there was a lot more data from similar AI phone calls. We often needed to review the results of the Duplex model, and the outcome was astonishing: I could hardly differentiate between conversations held by AI and those held by humans.

Another gain was my reflection on business models. Although I had worked in Google’s commercialization team for a long time and the models I personally worked on generated over two hundred million dollars in annual revenue for Google, I realized that an advertising-driven business model would become a shackle for large language models. The biggest problem with the advertising business model is that it treats users’ attention (time) as a commodity for sale. To users, it seems like they are using the product for free, but in reality, they are giving their attention to the platform. The platform has no incentive to increase user efficiency but rather to capture more attention to sell at a very low price. Valuable users will eventually leave the platform, leading to the platform itself becoming increasingly worthless.

One of the AI applications I worked on at Google Brain was the video recommendation on YouTube’s homepage. The entire business model of Google and YouTube is based on advertising; longer user watch time means more ad revenue. Therefore, for applications like YouTube, the most important goal is to increase the total time users spend on the app. At that time, TikTok had not yet risen, and YouTube was unrivaled in the video domain in the United States. In YouTube’s model review meetings, we often joked that the only way for us to get more usage was to reduce the time people spent eating and sleeping. Although I wanted to improve user experience through better algorithms, no matter how I adjusted, the ultimate goal was still inseparable from increasing user watch time to boost ad revenue.

During my contemplation, I gradually encountered the Software as a Service (SaaS) business model and felt that it was the right business model for large models. In SaaS, users only pay for subscriptions if they receive continuous value. SaaS is customer-driven, whereas Google’s culture overly emphasizes engineering culture and neglects customer value, making it very difficult to explore this path within Google. Ultimately, I was determined to leave Google and decided to start my own SaaS company. At the end of 2019, I joined a SaaS startup as a Founding Member and learned about the process of building a SaaS company from zero to one.

At the same time, I was also looking for good partners. Finally, in 2021 I met my two amazing partners DROdio and Erika, and we started Storytell.ai in 2022.

Build a company of belonging

The first thing we did at the inception of our company was to clarify our vision and culture: we want to build a company of belonging. A company’s vision and culture truly define its DNA; the vision helps us know where to go, and the culture ensures we work together effectively.

Storytell’s vision is to become the Clarity Layer, using AI to help people distill signal from noise (https://go.storytell.ai/vision).

We have six cultural values: 1) Apply High-Leverage Thinking. 2) Everyone is Crew. 3) Market Signal is our North Star. 4) We Default to Transparency. 5) We Prioritize Courageous Candor in our Interactions. 6) We are a Learning Organization. Please refer to this https://go.Storytell.ai/values for details. 

We also pay special attention to team culture building during the company’s creation process. From the start, we hope to work hard but also play harder. We have offsite gatherings every quarter. The entire team is very fond of outdoor activities and camping, so we often hold various outdoor events (we have a shared album with photos from the very first day of our establishment). We call ourselves the Storytell Crew, hoping that we can traverse the stars and oceans together like an astronaut crew.

Build a Product that people love

In the early stages of a startup, finding Product-Market Fit (PMF) is of utmost importance. Traditional SaaS software emphasizes specialization and segmentation, with typically only a few companies iterating within each niche, and product stability may take years to achieve. This year, ChatGPT brought about a radical market change. The explosive popularity of ChatGPT is a double-edged sword for SaaS software entrepreneurs. On one hand, it reduces the cost of educating the market; on the other hand, the entire field becomes more competitive, with a surge of entrepreneurs entering the market and diverting customer resources, and much of the traffic ChatGPT brings is ineffective and ultimately fails to convert into product usage.

Many believe that the moat for startups applying large models is technology or data. We think neither is the case. The real moat is the skill in wielding this double-edged sword. Good swordsmanship can transform both edges of the sword into a force that breaks through barriers:

  1. On one hand, leverage the momentum of ChatGPT to maximize the impact on traditional SaaS: make customers feel the urgency of keeping up with the times, and develop AI-native features that incumbents find hard to follow.
  2. On the other hand, use the competition to bring about a thriving ecosystem and have a methodical and steadfast approach in product iteration, ultimately shortening the product iteration cycle to achieve the greatest momentum.

We follow these two principles in our own product iteration.

1) Data-guided: In the iteration process, we use the North Star Metric to guide our general direction. Our North Star Metric is:

Effective Reach = Total Reach × Effective Ratio

Total reach is the number of summaries and questions asked on our platform each day. The Effective Ratio is a number from 0 to 1 that indicates how much of the content we generate is useful for users.
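As a worked example with made-up numbers: if the platform handled 10,000 summaries and questions in a day, and 60% of the generated content proved useful to users, Effective Reach for that day would be 10,000 × 0.6 = 6,000.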

2) User-driven. Drive product feature adjustments through in-depth communication with users. For collecting user feedback, we’ve adopted a combination of online and offline methods. Online, we use user behavior analysis tools to identify meaningful user actions and follow up with user interviews to collect specific feedback. Offline, we organize many events to bring users together for brainstorming sessions.

With this approach in mind, our product has undergone multiple rounds of iteration in the past year.

V0: Slack Plugin

Since June 2022, Erika, DROdio and I have been conducting numerous customer discovery calls. During our interviews with users, we often needed to record the conversations. We primarily used Zoom, but Zoom itself did not provide a summarization tool back then. I used the GPT-3 API to create a Slack plugin that automatically generates summaries. Whenever we had a Zoom meeting, it would automatically send the meeting video link to a specific Slack channel. Subsequently, our plugin would reply with an auto-generated summary. Users could also ask some follow-up questions in response.
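
For flavor, here is a minimal sketch of how such a summarization bot might be wired together, assuming the GPT-3-era OpenAI completions API and the slack_sdk library; the model name, prompt, and channel are illustrative stand-ins, not our actual implementation.

```python
# A minimal sketch, assuming the legacy OpenAI SDK (pre-1.0) and slack_sdk.
import openai
from slack_sdk import WebClient

openai.api_key = "sk-..."            # OpenAI API key
slack = WebClient(token="xoxb-...")  # Slack bot token

def summarize_transcript(transcript: str) -> str:
    # Ask a GPT-3-era completion model for a meeting summary.
    resp = openai.Completion.create(
        engine="text-davinci-003",   # illustrative model choice
        prompt=f"Summarize this meeting transcript:\n\n{transcript}\n\nSummary:",
        max_tokens=300,
    )
    return resp.choices[0].text.strip()

def handle_meeting(transcript: str, channel: str = "#meetings"):
    # Reply in the channel with the auto-generated summary.
    slack.chat_postMessage(channel=channel, text=summarize_transcript(transcript))
```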

At that time, there weren’t many tools available for automatically generating summaries, and every user we interviewed was amazed by this tool. This made us gradually shift our focus towards the direction of automatic summarization. The Slack plugin allowed us to collect a lot of user feedback. By the end of December 2022, we realized the limitations of the Slack plugin. 

  1. Firstly: Slack is a system with high friction. Only system administrators can install plugins; regular employees cannot install plugins themselves. 
  2. We had almost no usage of our Slack plugin over the weekends. The likelihood of users using Slack in their personal workflows was low.
  3. Slack’s own interface caused a great deal of confusion for our users.

V1: Chrome Extension

We began developing a Chrome extension in December 2022, primarily to address the issues mentioned above. While Chrome extensions also have friction, users have the option to install them individually. Chrome extensions can also automatically summarize pages that users have visited, achieving the effect of AI as a companion. Additionally, Chrome extensions facilitate better synergy between personal and work use. During the iteration process of the Chrome extension, we realized that chat is one of the most important means of interaction. Users can accurately express their needs by asking questions (or using prompt words). Although we allowed users to ask questions during the Slack phase, the main focus was still on providing a series of buttons. In the iteration process of the Chrome extension, we discovered that the chat interface is very flexible and can quickly uncover customer needs that weren’t predefined.

On January 17th, we released our Chrome extension. However, on February 7th, Microsoft released Bing Chat (later known as Copilot), integrated into Microsoft Edge. By March, the Chrome Store was flooded with Copilot copycats. We quickly realized that the direction Copilot was taking would soon become a saturated market. Additionally, during the development of our Chrome extension, we became aware of some bottlenecks. The friction in developing Chrome extensions is quite high. Google’s Web Store review process takes about a week. This wouldn’t be a problem in traditional software development, but it’s very disadvantageous for the development of large models. This year, the iteration speed of large models is essentially daily. If we update only once a week, it’s easy to fall behind.

V2: VirtualMe™ (Digital Twin)

In March 2023, we began developing our own web-based application. Users can upload their documents or audio and video files, and then we generate summaries, allowing users to ask corresponding questions. Our initial intention was to build a user interaction platform that we could control. The development speed of the web-based application was an order of magnitude faster than the Chrome extension. We could release updates four to five times a day without waiting for Google’s approval. Moreover, with the Chrome extension, we could only use a small part of the browser’s right side. There were many limitations in interaction, but with the web-based platform, we have complete control over user interactions, allowing us to create more complex user-product interactions.

During this process, we learned that it is very difficult to retain users with utility applications. Users typically leave as soon as they are done with the tool, showing no loyalty. Costs remain high. Moreover, with a large number of AI utility tools going global, the field is becoming increasingly crowded.

We began deliberately filtering our users to interview enterprise users and understand their feedback. By June 2023, we realized that the best way to increase user stickiness was to integrate tightly with enterprise workflows. Enterprise workflows naturally result in data accumulation, and becoming part of an enterprise’s workflow enhances the product’s moat.

We started thinking about how our product could integrate with enterprise workflows. We came up with the idea of creating a personified agent. Most of the time when we encounter problems at work, we first ask our colleagues. A personified agent could integrate well with this workflow. We quickly developed a prototype and invited some users for beta testing.

Our initial user scenario envisioned that everyone could create their own digital twin. Users could upload their data to their digital twin so that when they are not online, it could answer questions on their behalf. After launching the product, we found that the most common use case was not creating one’s own digital twin, but creating the digital twin of someone else. For instance, we found that product managers were heavy users of our product. They mainly created digital twins of their customers to ask questions and see how the customers would respond.

During the VirtualMe™ phase, we began to refine our enterprise user persona for the first time. We identified several user personas, mainly 1. Product Managers, 2. Marketing Managers, 3. Customer Success Managers. Their common characteristic is the need to better understand others and create accordingly.

At the end of July, we organized an offline event and invited many users to test our VirtualMe product together. They found our product very useful, but they had significant concerns about the personified agent. Personal branding is very important for our user group. They were worried that what the virtual twin says could impact their personal brand, especially since large models generally still have the potential for “hallucination.”

It was also at this event that users mentioned the part of our product they found most useful was the customizable Data Container and the ability to quickly generate a chatbot. At that time, no other product on the market could do this.

V3: SmartChat™

Starting in August, we began to emphasize data management features based on this approach and launched SmartChat™. In SmartChat™, once users upload data, we automatically extract tags from the content. Users can also customize tags for data management. By clicking on a tag, the ChatBot will converse based on the content associated with that tag. At the same time, we also launched an automation system that runs prompts for users automatically, pushing the results to the appropriate audience via Slack or email.
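
As an illustration of the general technique rather than Storytell's actual pipeline, prompt-based tag extraction can be sketched in a few lines; the model choice and prompt wording below are assumptions.

```python
# A hedged sketch of prompt-based tag extraction (legacy OpenAI SDK).
import openai

def extract_tags(document: str, max_tags: int = 5) -> list[str]:
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[{
            "role": "user",
            "content": (
                f"Extract up to {max_tags} short topical tags from the "
                f"document below. Return them comma-separated.\n\n{document}"
            ),
        }],
    )
    return [t.strip() for t in resp.choices[0].message.content.split(",")]

print(extract_tags("Meeting notes about the Q3 roadmap, hiring, and pricing."))
```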

The following figure shows our North Star Metric (NSM) up to December 1st of this year. At the beginning of the year, during the Slack plugin phase, our NSM was only averaging around 1. During the Chrome Extension phase, our NSM reached the hundreds. VirtualMe™ pushed our NSM up to 5,000.

By early December, our NSM was close to 20,000. Previously, our growth was entirely organic. By this time, we felt we could start to do a bit of growth hacking. In December, we started some influencer marketing activities, and our NSM grew by 30 times, reaching 550K!

From an NSM of less than 1 at the beginning of the year to 550K by the end of the year, in 2023 we turned Storytell from a demo into a product with a loyal user base. I am proud of our Crew and very grateful to our early users and design partners.

Words at the end

From a young age, I have been particularly fond of reading books on the history of entrepreneurship. The year 2023 marks the beginning of a new era for me to embark on this journey. I know the road ahead is challenging, but I am fortunate to experience this process firsthand with my two amazing partners and our Crew. Regardless of the outcome, I will forge ahead with all the Storytell Crew, fearless and without regret. Looking forward to Storytell riding the waves in 2024!


Also, Storytell.ai is hiring front-end and full-stack engineers: https://go.storytell.ai/fse-role. If you are interested, or you know anyone who might be interested, please don’t hesitate to contact me at my email jingconan@storytell.ai.

Categories
Chinese

The Story of an LLM Entrepreneur

My name is Jing Wang, co-founder and CTO of Storytell.ai. In October 2022 my two partners and I founded Storytell.ai, dedicated to using large language models (LLMs) to distill knowledge and make knowledge workers more effective. We chose the name Storytell precisely because the story is the oldest knowledge-distillation tool in human history: in ancient times, people sat around bright campfires telling stories, passing human experience and wisdom down the generations by word of mouth.

The past year was the year large language models exploded. With ChatGPT's meteoric rise, LLMs quickly became known to the general public. By sharing my own story, I hope to offer a glimpse of the sweep of building a company on large language models.

Leaving Google

A nine-story tower rises from a basket of earth. Although ChatGPT came from OpenAI, large language models were born at Google Brain, a deep learning laboratory founded by Jeff Dean, Andrew Ng, and others. My own bond with large language models formed while I worked there. I spent five years at Google, the first three in the ads monetization organization and the last two at Google Brain. Not long after joining Google Brain, I noticed colleague after colleague shifting to research on large language models. That period (2017-2019) was the germination phase of LLMs, with a flood of new technologies emerging in Google's labs. Being in the middle of it gave me a deep understanding of what large models could do. A few experiences in particular made me realize that a true technological revolution in language models was coming:

  1. A small story about BERT: one day in 2017 I went to the cafeteria for lunch, and suddenly thunderous applause broke out; a nearby group was discussing experiment results. Google provides employees free lunch, and at lunchtime people often gather to talk shop. A colleague asked me: do you know BERT? At the time I only knew Bert from the American cartoon Sesame Street, and I had never even watched it. The colleague told me: in internal experiments, BERT had increased Google Search revenue by 1%. Google's revenue was already over a hundred billion dollars a year, so that meant billions of dollars annually. I found it quite stunning.
  2. An experience with Duplex: at Google I/O 2018, Sundar Pichai demoed an AI making phone calls, and it stunned the industry. The project was internally called Duplex, and our group was responsible for the related model work at the time. The demo showed only a small fragment; internally there was far more data from similar AI calls. We often had to review the Duplex model's results, and they astonished me: I basically could not tell the AI's conversations from a human's.

Another takeaway was my reflection on business models. Although I had worked in Google's monetization organization for a long time, and the models I personally built brought Google more than two hundred million dollars a year, I realized that an advertising-centered business model would become a shackle on large language models. Advertising's biggest problem is that it treats users' attention (their time) as a commodity for sale. Users appear to use the product for free, but in reality they are handing their attention to the platform. The platform has no incentive to make users more efficient, only to capture more attention and sell it at a very low price. Valuable users eventually leave, and the platform itself grows ever less valuable.

One of the AI applications I worked on at Google Brain was the video feed recommendation on the YouTube homepage. Google's and YouTube's entire business model is advertising: longer watch time means more ad revenue, so for an app like YouTube the paramount goal is increasing total time spent in the app. TikTok had not yet risen, and YouTube stood far ahead of the field in American video. In YouTube's model review meetings we often joked that the only way to get more usage was to cut into the time people spend eating and sleeping. I hoped to improve the user experience through better algorithms, but however I tuned them, the end goal never escaped increasing watch time to grow ad revenue.

As I thought this through, I gradually came across the Software as a Service (SaaS) business model and felt it was the right business model for large models. In SaaS, users subscribe and pay only if you deliver real, continuing value. SaaS is about being customer-driven, whereas Google's culture over-emphasizes engineering culture and neglects customer value, which made exploring this path inside Google very hard. In the end I resolved to leave and start my own SaaS company. At the end of 2019 I joined a SaaS startup as a founding member and learned how a SaaS company is built from zero to one, while also searching for the right business partners. Finally, in 2022, my two partners and I founded Storytell.ai.

Product Iteration

In a startup's earliest days, nothing matters more than finding product-market fit (PMF). Traditional SaaS prizes specialization and segmentation: each niche usually has only a handful of companies iterating, and it can take years for a product to stabilize. This year ChatGPT turned the market upside down. Its explosive popularity is a double-edged sword for SaaS founders: on one hand it lowers the cost of educating the market; on the other, the whole track gets more crowded as floods of founders pour in and divert customer resources, and much of the traffic ChatGPT brings is ineffective and never converts into the product.

Many people think the moat for large-model application startups is technology, or data. We think it is neither. The real moat is the swordsmanship to wield this double-edged sword. Good swordsmanship turns both edges into a force that breaks the deadlock:

  • On one side, against traditional SaaS, ride ChatGPT's momentum to maximize the shock to traditional SaaS. Make customers feel the urgency of keeping up with the times. Build AI-native features that incumbents find hard to follow.
  • On the other side, harness the ecosystem prosperity that competition brings, and iterate the product with method and composure, ultimately shortening the iteration cycle to achieve maximum momentum.

In product iteration, we follow these two principles:

1) Data-guided: during iteration, we use a North Star Metric to steer our overall direction. Our North Star Metric (NSM) is:

Daily Usage × Signal-to-Noise Ratio

Daily usage is the number of summaries and user questions on our platform each day. The signal-to-noise ratio is a number from 0 to 1 indicating how much of the content we generate received positive feedback from users.

2) User-driven: we drive product feature adjustments through in-depth communication with users. To collect user feedback we combine online and offline methods. Online, we use behavioral analytics tools to spot interesting user actions and follow up with interviews to gather specific feedback. Offline, we organize many events that bring users together for brainstorming.

With this approach, our product has gone through multiple rounds of iteration.

V0: Slack Plugin

Starting in June 2022, we conducted many customer discovery calls, and we often needed to record those user interviews. We mainly used Zoom, but Zoom itself offered no summarization tool. Using the GPT-3 API, I built a Slack plugin that auto-generates summaries: whenever we held a Zoom meeting, the meeting's video link was automatically posted to a particular Slack channel, and our plugin replied with an auto-generated summary. Users could also reply with follow-up questions.

At the time there were few tools on the market for automatic summarization, and every user we interviewed marveled at this one. That gradually shifted our attention onto automatic summarization. The Slack plugin let us collect a great deal of user feedback, and by the end of December 2022 we recognized its limitations.

  1. First: Slack is a very high-friction system. Only workspace administrators can install plugins; ordinary employees cannot install them on their own.
  2. Our Slack plugin saw almost no usage on weekends; users were unlikely to adopt Slack in their personal workflows.
  3. Slack's own interface caused our users a great deal of confusion.

V1: Chrome Extension

We started building a Chrome extension in December 2022, mainly to solve the problems above. A Chrome extension still has friction, but users can choose to install it individually. It can also automatically summarize pages the user has visited (achieving an AI-as-companion effect), and it bridges personal and work use more easily. While iterating on the extension we realized that chat is the most important mode of interaction: by asking questions (or writing prompts), users can express their needs precisely. We had let users ask questions in the Slack phase too, but the emphasis then was on offering a set of buttons. Iterating on the Chrome extension, we found the chat interface highly flexible and able to quickly surface customer needs we had never predefined.

We released our Chrome extension on January 17. But on February 7, Microsoft released Bing Chat (later Copilot), integrated into Microsoft Edge, and by March the Chrome store was flooded with Copilot copycats. We quickly realized the Copilot direction would soon become a red ocean. We had also hit bottlenecks in extension development: the friction is high, since a Google Web Store review takes about a week. That is fine in traditional software development but very bad for large-model development; this year, large models themselves iterate essentially daily, and shipping once a week means falling behind.

V2: VirtualMe™ (Digital Twin)

In March 2023 we started building our own web application. Users can upload documents or audio and video files, we generate summaries, and users can ask questions about the content. Our original intent was to build a user-interaction platform we ourselves control. Web development proved an order of magnitude faster than the Chrome extension: we could ship four or five releases a day, no longer waiting on Google's approval. And whereas the Chrome extension left us only a small strip on the right of the browser, with many constraints on interaction, on the web we fully control the interaction, letting us build far more complex user-product interactions.

Along the way we learned how hard retention is for utility apps: users leave as soon as the job is done, with no loyalty, and costs stay high. And with masses of AI utility tools going global, the track was getting crowded.

We began deliberately filtering enterprise users out of our user base for interviews to understand their feedback. Around June 2023 we realized that the best way to increase stickiness is tight integration with enterprise workflows. Enterprise workflows naturally deposit data, and becoming part of an enterprise's workflow strengthens a product's moat.

We started thinking about how our product could mesh with enterprise workflows, and hit on the idea of a personified agent. Most of the time, when we run into a problem at work, we first ask a colleague; a personified agent fits that workflow well. We quickly built a prototype and invited users for a beta test.

Our initial scenario was that everyone would create their own digital twin: users upload their data so that when they are offline, the twin can answer colleagues' questions in their place. After launch, we found the most common use was not creating one's own twin but someone else's. For example, product managers turned out to be heavy users: they mainly created digital twins of their customers, then asked questions to see how those customers would answer.

The VirtualMe™ phase was the first time we refined our enterprise user personas. We identified several, chiefly 1. product managers, 2. marketing managers, and 3. customer success managers. Their common trait is the need to understand others better and create for them accordingly.

At the end of July we held an offline event and invited many users to test VirtualMe together. Their feedback was that the product was very useful, but they had major reservations about the personified agent. Personal brand matters a great deal to our user base, and they worried that what their virtual twin said could harm it, especially since large models still commonly risk "hallucination."

It was also at this event that users told us the part of the product they found most useful was the customizable Data Container and the ability to quickly spin up a ChatBot, something no other product on the market could do at the time.

V3: SmartChat™ 

Starting in August, we dug deeper into data management along these lines and launched SmartChat™. In SmartChat™, once users upload data, we automatically extract tags from the content, and users can also define their own tags to manage the data. Clicking a tag makes the ChatBot converse based on the content under that tag. We also launched an automation system that runs prompts for users automatically and pushes the results to the right audience via Slack or email.

The chart below shows our North Star Metric through December 1 of this year. In the Slack plugin phase at the start of the year, our NSM averaged under 1. In the Chrome Extension phase it reached the hundreds. VirtualMe™ pushed it up to 5,000.

By early December our NSM was near twenty thousand. Until then our growth had been entirely organic; at this point we felt we could start doing a bit of growth work. In December we began some influencer marketing, and our NSM grew 30-fold that month, reaching 550K!

From an NSM under 1 at the start of the year to 550K at its end: in 2023 we took Storytell from a demo to a product with a loyal user base. I am proud of our Crew and deeply grateful to our early users and design partners.

Building the Company's Vision and Culture

The very first thing we did at founding was to clarify the company's vision and culture. A company's vision and culture truly define its DNA: the vision tells us where to go, and the culture ensures we collaborate effectively. Storytell's vision is to become the Clarity Layer, using AI to help people pull the threads of signal out of tangled information (https://go.storytell.ai/vision). We have six cultural values: 1) Apply High-Leverage Thinking; 2) Everyone is Crew; 3) Market Signal is our North Star; 4) We Default to Transparency; 5) We Prioritize Courageous Candor in our Interactions; 6) We are a Learning Organization. Those interested can read more here: https://go.Storytell.ai/values. We also paid special attention to team culture as we built the company. From the start we wanted to work hard but also play harder. Every quarter we hold an offsite. The whole team loves the outdoors and camping, so we often run outdoor events (we keep a shared album with photos going back to our very first day). We call ourselves the Storytell Crew, hoping that, like an astronaut crew, we can cross the stars and seas together.

Closing Words

Since childhood I have especially loved reading histories of entrepreneurship; Wu Xiaobo's "Thirty Years of Upheaval" and Lin Jun's "Fifteen Years of Boiling" set my heart racing with longing. When I first heard the song I did not grasp its meaning; hearing it again, I am now inside it. 2023 is already the start of a new era. I know the road ahead is hard, but I am fortunate to live this process firsthand. Whatever the outcome, I will press forward with the whole Storytell Crew, fearless and without regret. Here's to Storytell riding the wind and breaking the waves in 2024!

One small ad as well: Storytell.ai is hiring front-end and full-stack engineers: https://go.storytell.ai/fse-role. If you are interested, feel free to contact me; my email is jingconan@storytell.ai.

Categories
English

The Generative AI Industry: A Blend of Coffee and Winery Dynamics

Last week, an intriguing discussion caught my attention at a fantastic event organized by Leni. The panel discussion revolved around an interesting comparison: Will the Generative AI industry resemble the Coffee industry, with a dominant player like Starbucks, or the Winery industry, characterized by a multitude of providers offering differentiated products?

This thought-provoking question led me to delve deeper into the dynamics of the Generative AI industry. Here are my thoughts.

In any industry, two key factors significantly influence its structure – the fixed and marginal costs of producing the product and the price for each unit of service. Let’s consider the Coffee and Winery industries for context.

In the Coffee industry, the high fixed cost – primarily branding – incentivizes scaling. Starbucks, for instance, has invested heavily in establishing a formidable brand and hence, scales up to distribute the cost. On the contrary, the Winery industry thrives on differentiation, with numerous wineries offering unique products.
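
A toy cost model makes the contrast concrete; all the numbers below are made up for illustration.

```python
# Average cost per unit = fixed cost spread over volume + marginal cost.
def unit_cost(fixed: float, marginal: float, units: int) -> float:
    return fixed / units + marginal

# "Coffee-like" economics: a huge fixed (branding) cost and a tiny
# marginal cost, so unit cost collapses with scale: scaling wins.
print(unit_cost(fixed=1e8, marginal=0.5, units=1_000_000))    # 100.5
print(unit_cost(fixed=1e8, marginal=0.5, units=100_000_000))  # 1.5

# "Winery-like" economics: a modest fixed cost and a real marginal
# cost, so scale barely helps: differentiation wins instead.
print(unit_cost(fixed=1e5, marginal=5.0, units=10_000))       # 15.0
print(unit_cost(fixed=1e5, marginal=5.0, units=1_000_000))    # 5.1
```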

Now, let’s apply these factors to the Generative AI industry. The industry can be divided into three essential layers as per the framework described by A16z:

1) The Infrastructure layer, which runs training and inference workloads for generative AI models.

2) The Foundational Model Layer, which provides the Foundational model via a proprietary API or open-source model checkpoints.

3) The Application Layer, where companies transform generative AI models into user-facing products, either by running their own model pipelines (“end-to-end apps”) or relying on a third-party API.

For the Foundational Model vendors, there’s a high fixed cost involved in training the models, and the marginal cost of providing a unit of service (an API call) is quite low. Moreover, most sales are made through API calls, which have a low unit sale price. This dynamic, coupled with the fierce competition and the rise of competitive open-source alternatives, is causing the pricing power of proprietary API vendors to shrink rapidly. As a result, the Foundational Model market is likely to resemble the Coffee industry, where you either go to Starbucks (OpenAI), or you make your own coffee (Open Source). The Infrastructure layer has very similar dynamics to the Foundational Model layer, so I will skip it in this discussion.

Moving to the Application Layer, it’s essential to differentiate between consumer and enterprise applications. Consumer applications are likely to follow the Coffee industry’s pattern due to the significant fixed cost of creating a consumer-facing brand and the strong incentive to scale. 

However, enterprise applications might mirror the Winery industry. With the wide availability of LLM APIs, creating an enterprise AI application no longer requires a substantial fixed cost. Although some fixed costs are required for enterprises (e.g., data compliance), they are not on the same level as training LLMs and can be sequenced over iterations with customers. Moreover, the price for enterprise applications can be quite high (up to six or seven figures for a single account), fostering an expectation of differentiated services.

In conclusion, the Generative AI industry presents a unique blend of the Coffee and Winery industries’ dynamics. The Foundational Model Layer and consumer applications at the Application Layer are akin to the Coffee industry, while enterprise applications at the Application Layer resemble the Winery industry. As the industry evolves, it will be fascinating to see how these dynamics play out.

This blog was finished with the help of SmartChat™ by Storytell.ai (both in researching the content and in rewriting the final draft). It is available for initial testing: please sign in at storytell.ai and open the dashboard (see https://share.getcloudapp.com/nOuLGPGN) to access this feature.