Chang She wore a programmable LED hat to moderate our panel. Anyone in the room could connect to it and change what it displayed. A fitting metaphor for agent security. Nobody hacked it, but the night was young.
This past Thursday, we co-hosted an evening with LanceDB at the Hanwha AI Center in San Francisco. A full house was on hand to hear Harjot Gill (CodeRabbit), João Moura (CrewAI), and our own Devin Stein talk about what it takes to run agentic systems in production. Chang opened with a poll. "Now that agents can do everything for you, do you feel less busy or more busy?" Two people raised their hands for less busy. Everyone else raised theirs for more.

The product you shipped six months ago is gone
Every panelist on stage has been building in this space for over two years, and they all said basically the same thing: they're working on a different product than the one they started with.
CodeRabbit has rebuilt its engineering org around a new codebase written almost entirely by AI, for AI. Humans are no longer expected to operate in it. This isn't a side experiment. It's their production system.
Two years ago at Dosu, the hard problem was fitting complex workflows into 4-8K context windows. Then context windows grew, and that problem evaporated. Instruction-following challenges replaced it. Now models follow instructions well enough that the real problem is building the right sandbox for agents to work inside. "The only constant is that agents are probably going to get better," Devin said. "Having them operate in a computer-like environment and designing your infrastructure around that assumption is probably the most foundational thing you can do right now."
Embeddings have held up better than most of the stack. Set them up well once, and they power use cases across sales, support, and engineering. The model layer and orchestration layer keep shifting underneath.

Every engineer is now a developer experience engineer
As coding becomes more automated, the job shifts toward building environments that enable agents to succeed. That looks a lot like what DevEx teams have always done for humans. CI checks so you trust your code. Portable local environments. Good documentation.
DX's Q4 2025 AI-assisted engineering report found that non-AI bottlenecks like meetings, review delays, and slow CI pipelines still dominate, and that AI amplifies whatever engineering culture already exists. Teams with weak practices accumulate debt faster. Teams with strong foundations ship faster. "AI just amplifies everything," Devin said. "Any friction, you feel a lot more."
Every time you set an agent loose on a codebase, it's like onboarding a very smart intern. It looks for usage examples, conventions, and documentation on how to do things. If those don't exist, the agent fills in the gaps itself. Sometimes well. Often not. This is a big part of what we're building at Dosu. The knowledge layer that agents depend on has to be accurate, current, and structured in ways they can actually navigate.
The inner loop is gone
The inner loop of writing code has mostly disappeared. What remains are the outer loops of planning and review. "Either you are creating a prompt for these agents to go and work on, or you are validating the work," Harjot said. "In the middle, it's all automated."
Planning now means specification and decomposition. If you can break a task into well-specified parts, agents can take them end-to-end. Review has changed, too. CodeRabbit maintains documents that define invariants, things like product flows and security postures that shouldn't change. If an agent touches those files, the change routes to a separate review group. They're investing in formal methods like TLA+ to validate distributed systems algorithms that agents produce. The mistakes agents make differ from human mistakes, so verification has to be different, too.
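The invariant-routing idea can be sketched in a few lines. This is not CodeRabbit's implementation; it's a minimal illustration of the pattern, with hypothetical file globs standing in for a team's real invariant documents:

```python
from fnmatch import fnmatch

# Hypothetical invariant locations; a real team would keep this list in-repo.
INVARIANT_GLOBS = [
    "docs/invariants/*.md",
    "security/postures/*",
]

def needs_invariant_review(changed_paths: list[str]) -> bool:
    """Return True if any changed file matches an invariant glob,
    meaning the change should route to the dedicated review group."""
    return any(
        fnmatch(path, glob)
        for path in changed_paths
        for glob in INVARIANT_GLOBS
    )
```

A CI step would call this on the diff's file list and, when it returns True, require sign-off from the separate review group instead of the default reviewers.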

At CrewAI, the internal Slack bots began conversing with one another. "We had two bots start the conversation in Slack chat," João said. "The bosses are like, what is going on here? Are we paying tokens for the sake of paying tokens?"
When CrewAI rolled out its internal AI assistant, it made the channel public and intentionally didn't allow direct messages. People who wouldn't have tried it saw other people using it, grabbed the same prompts, and started building on each other's workflows. Adoption took off from there.
When code is cheap, the cost of experimentation drops. Product managers are opening PRs. Support engineers are prototyping fixes. The old gatekeeping around what gets built stops making sense as build costs approach zero.
What breaks when you go from one agent to a hundred
Harjot's engineers are running five Codex worktrees simultaneously. The bottleneck is their laptop RAM. CI/CD pipelines had to be refactored because the volume of agent-generated code overwhelmed them. "I am waking up in the middle of the night just to make sure that I keep up with these systems," he said.
Agents also expose previously unnoticed data quality problems. One agent works fine. A hundred agents, and suddenly, mislabeled data shows up everywhere. "No one was looking at it, so it was not an issue," João said. "Now agents have put a huge spotlight on it."
Pricing is getting weird. LinkedIn built a recruiting agent and priced it cheaper than a human seat. Customers started canceling their human seats for the agent. LinkedIn had to roll it back.
And then there's PII. Even with proper authentication and scoping, CrewAI found sensitive HR data showing up in OpenTelemetry traces piped to Datadog. Nobody planned for that. They ended up building an entire feature for PII sanitization on traces because the observation layer became a leak of its own.
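The shape of that fix is a scrub pass over telemetry attributes before they leave the process. The sketch below is not CrewAI's implementation, just a minimal illustration of the idea; the pattern names and attribute keys are hypothetical, and a production sanitizer would cover far more PII classes:

```python
import re

# Hypothetical PII patterns; real sanitizers cover many more classes.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def sanitize_attributes(attributes: dict[str, str]) -> dict[str, str]:
    """Redact PII from span attributes before exporting them,
    so the observability layer doesn't become a leak of its own."""
    clean = {}
    for key, value in attributes.items():
        for name, pattern in PII_PATTERNS.items():
            value = pattern.sub(f"[REDACTED-{name.upper()}]", value)
        clean[key] = value
    return clean
```

In an OpenTelemetry pipeline, a pass like this would hook into span processing so redaction happens before the exporter ships anything to a backend like Datadog.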
The knowledge problem
"If you have static context or stale context, the agents are going to go off track," Harjot said. CodeRabbit builds graph-based knowledge systems that learn from each pull request, keeping summaries up to date for the next agent run.
We see a version of this with Dosu customers. A team plugs in their Confluence or Notion MCP, and the agent immediately finds four docs that say different things about the same topic. Sometimes it picks the right one. Sometimes it doesn't. Better retrieval alone won't fix that. You have to prune, maintain, and structure knowledge so agents can navigate it.
"You can read all the meeting notes you want," João said. "But if you go to Slack and start to capture that data, then you start to see the actual problems that people face today." The documented knowledge tells you what should be true. The conversational knowledge tells you what is.
Everyone on the panel has been gravitating toward the file-system metaphor for structuring agent knowledge, though João pushed on what that actually means. "I don't think people actually mean a file system literally. I think they mean a hierarchy of metadata." Hierarchical knowledge that agents can explore with indexes. Less context of higher quality, every time, rather than dumping everything into a giant prompt.
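João's "hierarchy of metadata" can be made concrete with a toy sketch. This is an illustration of the idea, not anyone's product: each node carries a short summary the agent reads first, and only the chosen branch's documents enter the context window. The index contents here are invented:

```python
# Toy knowledge hierarchy: summaries act as the index an agent explores.
INDEX = {
    "summary": "Company knowledge base",
    "children": {
        "billing": {"summary": "Invoicing and refunds", "docs": ["refund-policy.md"]},
        "auth": {"summary": "Login, SSO, tokens", "docs": ["sso-setup.md", "token-rotation.md"]},
    },
}

def navigate(index: dict, path: list[str]) -> list[str]:
    """Walk the hierarchy along `path` and return only that branch's docs,
    rather than dumping the whole knowledge base into one giant prompt."""
    node = index
    for step in path:
        node = node["children"][step]
    return node.get("docs", [])
```

The agent reads the summaries, picks a branch, and descends; each step narrows the candidate set, which is what "less context of higher quality" looks like in practice.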

Hot takes
Chang asked each panelist for their most contrarian view.
João on technical moats. "You say, here this software is good, everyone's going to copy this in three to five years. I think you now have maybe four weeks."
Devin on hiring. "Going slow to actually understand how things work to build the foundation where you can execute really fast is harder when there is this temptation just to let the agent do it. But I do think that's still very, very important." It's a hard market to be a junior right now.
Harjot on where the bottleneck has moved. "Coding has become a $200/month problem." Architecture is the layer agents handle well because there are only a few right ways to build something. ("If you were to build Unix today, it would look like Unix.") Above that is the product, where taste and decision-making matter and agents struggle. Then marketing. Then sales. "The UX and the product decisions have become a bottleneck. It's really the product managers that are the bottleneck."
Someone in the audience had seen a pitch the night before in which two technical founders showed their use-of-funds deck, with a single line item for engineering. Everything else was marketing. They planned to have one senior manager and 100 agents. No engineering team.

Four months ago, this conversation wouldn't have happened
"We would not have had this conversation back in November, four months ago," João said. "If in four months the way to do coding has changed so much, how can you still be selling the same thing?"
"It's not only Salesforce trying to reinvent itself," he said. "You should be thinking about your company as well."
Devin's last word. "Even if you tried something before and it didn't work, try it again. Everything's moving so fast. Experimentation is the only way to learn what works."
We walked out of the Hanwha AI Center thinking about the same thing we think about every day at Dosu. The models keep getting better. The teams that invest in their knowledge layer, the context and documentation that feed those models, will move faster than those that don't.
This event was co-hosted by Dosu and LanceDB at the Hanwha AI Center in San Francisco. Thanks to Chang for moderating and to Harjot, João, and Devin for the conversation.
If any of this sounds familiar, come talk to us or find us on Discord.



