Building Cyborgs
With generative AI, we are all asking the same questions about how best to build a defensible product (along with everyone on Twitter).
Reibus* is a B2B marketplace for metals founded by three guys all named John. I thought about that company quite a bit as I read Janus’ post on Cyborgism.
John #1 (John Armstrong, the CEO), a veteran of the steel industry, identified a problem: demand forecasting is incredibly difficult, and every industry participant is in a constant state of having too much or too little. A hurricane means you need more steel for roofing; if chicken prices rise, you need more steel for chicken coops.
The person who is the most likely to have what you need is your competitor down the road, but negotiating with competitors directly is challenging. And competitors don’t want to be seen loading their steel into your truck.
We met with Reibus at the seed stage, and they checked many of the boxes on my mental checklist for what makes a great marketplace business:
Large TAM: Over 80 million metric tons of steel produced each year in the US
Shadow market: Essentially all offline / “call your buddy”
Bootstrapping liquidity: a buyer one day is a seller the next, and vice versa
Avoid slippage? Transactions need to be anonymous for the reasons above, and shipping tons of steel is a frustrating process that buyers and sellers are happy to hand off
John cares about having a strong, defensible business model. But Reibus transactions were anonymous not to “avoid slippage,” but because that is what made the most sense to solve the problem. It just so happens that the business model that best fits the problem is a compelling one.
In 2016, we wrote a blog post discussing the fact that, at the time, mobile devices had more users than desktop computers. Value was going to be created by that shift, but to whom would it accrue: startups or incumbents?
We ultimately invested in Braze* and Amplitude*, startups specializing in mobile-focused marketing and analytics that leveraged novel architectures.
We also spent a lot of time thinking about the “Salesforce for mobile,” where a startup would make a mobile UI a priority rather than an add-on.
As it turns out, Braze and Amplitude have both become public companies, but there is no clear “Salesforce for mobile” – that company is just Salesforce.
With generative AI, we are asking all the same questions about how best to build a defensible product (along with everyone on Twitter). Which use cases are a “thin layer” on top of the base model that will just be handled by ChatGPT or its progeny? Which problems are best solved by an existing incumbent that will bundle generative capabilities into their existing offerings?
Figuring out the product characteristics that could comprise a “moat” is the first step; the second, and just as important, is to find the problems where applying those product characteristics is part of the solution rather than an unnatural addition.
In Cyborgism, Janus points out that when inventing something new, people often begin by building something that looks like what they know. The first automobiles, for example, were basically horse-drawn carriages with a mechanical engine in place of the horse.
We have fallen into this trap with GPT.1 Many of the first GPT products do not fully acknowledge how the technology differs from the type of intelligence humans are accustomed to. GPT and human intelligence excel in different types of tasks, and using GPT as a direct replacement for a human can present challenges.
Janus identifies that GPT has the following:
(Rough) superhuman knowledge
Can generate text very quickly
No qualms about doing tedious things
Superhuman skills in unintuitive domains
Useful contexts can be reused (humans can’t “save” a productive brain state)
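The last strength is worth making concrete: a productive context is just text, so it can be saved and prepended to any number of new tasks. Below is a minimal sketch of that idea; `generate()` is a hypothetical stand-in for a real LLM API call, and the saved context is invented for illustration.

```python
# Minimal sketch: a "saved brain state" is just text. A context that
# reliably elicits good behavior can be stored and prepended to any
# number of new tasks. generate() is a hypothetical stand-in for a
# real LLM API call.

SAVED_CONTEXT = (
    "You are a meticulous copy editor. Here are examples of good "
    "edits with explanations: [examples elided]\n"
)

def generate(prompt: str) -> str:
    # Hypothetical model call; a real API client would go here.
    return f"<completion for {len(prompt)} chars of prompt>"

def run_task(task: str) -> str:
    # Reuse the same productive context for every new task --
    # the analogue of a human "saving" a brain state.
    return generate(SAVED_CONTEXT + task)

results = [run_task(t) for t in ["Edit paragraph A.", "Edit paragraph B."]]
```

A human editor would have to rebuild that mental state each morning; the model gets it back for free on every call.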
But it also struggles in the following ways:
Goal-directedness: GPT is not designed to answer a question or pursue a goal; we just try to coax it into doing so.
Long-term coherence: GPT struggles to maintain the long-term thread of a document, becoming sidetracked and disconnected due to its finite context window. Any mistakes made in generation compound. For example, in this fiction-writing example, GPT-3 forgets that Sally is a man.
Staying grounded in reality: GPT's "working memory" relies on its specific prompts. It has general knowledge but not situational awareness.
Robustness: GPT's text generation is naturally chaotic and high variance due to the nature of its training data and its own logical uncertainty.
One way to look at these issues is to think of them as problems that you can try to engineer away while keeping the product framework the same. One method for this is RLHF (Reinforcement Learning from Human Feedback), a training approach that takes human-generated feedback as an input signal to improve the quality, relevance, or coherence of the generated text.
This method helps us to shape the “alien intelligence” of GPT in a way that feels more human and agent-like, but it also collapses its abilities. RLHF-trained systems exhibit more goal-directed behavior than the base models, but every response is filtered through the preferences and biases of that process and the perspective of the agent. This may be fine and appropriate for certain jobs-to-be-done (e.g. a customer service bot that lives on your FAQ page). But losing the full breadth of the model will not be optimal for others.
There is another approach. The struggles of GPT — goal-directedness, long-term coherence, grounding in reality and robustness — are all things that humans do naturally. Instead of RLHF-ing GPT into a familiar agent form, we can embrace the weirdness of GPT and provide the missing pieces with human intelligence instead.2
According to Janus, a “cyborg” is a process that involves a human operating GPT with the benefit of specialized tools, thereby extending human agency rather than replacing it.
An excellent example of a “cyborg” business is Seek AI*, our most recent investment.
Seek uses generative AI to translate natural-language questions into SQL queries. This process allows data analysts to focus on key initiatives rather than being bogged down by ad-hoc requests, and it also gives business users the ability to easily access data that may not be summarized in standard dashboards.
Providing business users with a chatbot where they ask a question and receive an automatic SQL response doesn't work. Because of the issues above, the response needs to be ~100% accurate or it won’t be used: the first time you get garbage back, you’ll revert to tapping the shoulder of your friend on the data team, like you used to.
High precision on a goal-directed task is a poor use case for GPT, as discussed above, but that doesn’t mean that you can’t use GPT to help data analysts solve their problems. Seek built a novel workflow for analysts to sit alongside GPT to do their job of responding to queries more quickly and effectively, and the system gets better the more it is used.
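To make the cyborg pattern concrete, here is a hedged sketch of what such a human-in-the-loop workflow could look like. This is not Seek’s actual implementation; `draft_sql()` is a hypothetical stand-in for an LLM call, and every name here is invented for illustration.

```python
# Sketch of a "cyborg" workflow: the model drafts SQL, a human analyst
# reviews or edits the draft, and every approved (question, SQL) pair is
# kept as feedback so the system improves with use. Not Seek's actual
# implementation; draft_sql() stands in for a real LLM call.

approved_examples: list[tuple[str, str]] = []  # (question, sql) feedback store

def draft_sql(question: str) -> str:
    # Hypothetical LLM call. A real system would build a few-shot prompt
    # from `approved_examples` so drafts improve as feedback accumulates.
    return f"-- draft for: {question}\nSELECT ...;"

def answer(question: str, analyst_review) -> str:
    draft = draft_sql(question)
    final = analyst_review(draft)                # human approves or edits
    approved_examples.append((question, final))  # feedback loop
    return final

# Usage: this toy "analyst" approves the draft unchanged.
sql = answer("How many orders shipped last week?", lambda d: d)
```

The key design choice is that the human stays in the loop at exactly the step where GPT is weakest (high-precision correctness), while the model handles the tedious translation work.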
We can think of GPT businesses in two paradigms:
Bots: An AI that answers questions, follows instructions, and/or pursues a goal to accomplish a task.
Cyborgs: An AI that works with humans, combining their capabilities.
Value in bots will accrue to incumbents, but the value in cyborgs will accrue to startups.
Bots can easily be incorporated into existing apps (e.g. this very cool demo from ZoomInfo). Incumbents are in a strong position to capitalize on their data when the interaction model is “ask a question to get an answer.”
At first, cyborgs will feel as foreign as a car does to a horse-drawn buggy. The interaction mode will be completely different in a way that will better suit the technology. I’m particularly excited about product designs that leverage adaptive branching (like Loom), rather than supply a single output.
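Here is a minimal sketch of what adaptive branching could mean mechanically: generate several candidate continuations, let the human pick one, and extend from the chosen branch. This is an illustration, not Loom’s implementation; `sample()` is a hypothetical stand-in for a stochastic LLM call.

```python
# Sketch of "adaptive branching": instead of a single output, generate
# several candidate continuations, let the human choose one, and branch
# from the choice. sample() stands in for a real LLM call with
# temperature > 0; the continuations here are fake.

import random

def sample(prompt: str, n: int) -> list[str]:
    # Hypothetical: n stochastic continuations of the prompt.
    return [f"{prompt} [continuation {i}, seed {random.randint(0, 99)}]"
            for i in range(n)]

def branch(prompt: str, pick, depth: int = 2, n: int = 3) -> str:
    # At each step the human (the `pick` callback) chooses which branch
    # to extend, steering generation rather than accepting one output.
    for _ in range(depth):
        prompt = pick(sample(prompt, n))
    return prompt

# Usage: this toy "human" always picks the first option.
story = branch("Once upon a time", pick=lambda options: options[0])
```

The human supplies the goal-directedness and coherence; the model supplies breadth at each fork.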
The first GPT cyborgs are just now being created, and entrepreneurs will lead the way in discovering use cases. Generally, opportunities on both ends of the bell curve seem ripe for cyborg businesses: problems that require a high degree of trust and close to perfect accuracy (e.g. the Seek example above), and creative work where there is no answer (e.g. writing a play or musical).
I am excited about how cyborgs have greater defensibility due to the natural data network effects from a “human-in-the-loop” model, and due to their novel UIs requiring a total rethinking from incumbents. But I am just as excited that cyborgs seem to be a better approach to solving specific problems. There need not be “cyborgs for cyborgs’ sake.” This approach will emerge as a natural solution to the problem at hand, as it did for John and Reibus.
1. I will use GPT as shorthand throughout for “Large language models trained on next-token prediction.”
2. This isn’t to say that there is not a place for RLHF/fine-tuning. Of course there is — a base model isn’t going to be optimal for every task. The distinction is around the use of these tools to “cover up the weaknesses” of GPT vs. enhancing its ability as a tool designed to work with humans.
*Denotes a Battery portfolio company. For a full list of all Battery investments, please click here.
The information contained herein is based solely on the opinions of Brandon Gleklen and nothing should be construed as investment advice. This material is provided for informational purposes, and it is not, and may not be relied on in any manner as, legal, tax or investment advice or as an offer to sell or a solicitation of an offer to buy an interest in any fund or investment vehicle managed by Battery Ventures or any other Battery entity.
This information covers investment and market activity, industry or sector trends, or other broad-based economic or market conditions and is for educational purposes. The anecdotal examples throughout are intended for an audience of entrepreneurs in their attempt to build their businesses and not recommendations or endorsements of any particular business.
Content obtained from third-party sources, although believed to be reliable, has not been independently verified as to its accuracy or completeness and cannot be guaranteed. Battery Ventures has no obligation to update, modify or amend the content of this post nor notify its readers in the event that any information, opinion, projection, forecast or estimate included, changes or subsequently becomes inaccurate.