What is the Software Teardown Series?
There's a classic meme in the Valley: software companies build for software companies build for software companies build for … you get the idea. It’s practically impossible to figure out where the cycle ends.
And like any good meme, it's rooted in very real truth. Most corporate tech talks and deep dives, especially on AI, focus on applications in code writing, code review, developer tooling, and the like. The conversation rarely makes it into the physical world, where real people and real products reside.
At DOSS, we're building operations software for the real world: the people in warehouses doing inventory counts, folks in accounting, supply chain teams, consumer goods companies, and manufacturing businesses. These aren't your typical SaaS buyers. Our customers are figuring out how to do real-world tasks, and hoping software solutions can enhance them.
We started the Software Teardown series to invite other companies and individuals back to the real world - to figure out how the software world meets the physical world.
Another Panel on Agents?
A couple of weeks ago we hosted a panel at our office on agents in production, bringing together Tomasz Tunguz from Theory Ventures, Philip Cerles from Harvey, Abhi Aiyer from Mastra, Matthew Rastovac from Salesforce, and DOSS’s own Arnav Mishra to talk about deploying AI agents in real-world scenarios.
We wanted this panel to be different: focused on agentic applications for industries that have so far been underserved by the AI revolution. The question isn't just how agentic AI works. It's how it can work for people outside the Bay Area tech bubble, and what unique challenges come from building for those use cases.
The classic solutions don’t work here - we need something that lowers the barrier to entry for a relatively non-technical audience while providing outsized value for the effort put in.
The Evolution of AI Agents
Prior to its acquisition by Salesforce, Matt founded Respell to build agentic workflows. They launched research agents back in March 2023, when even internal teams didn't agree on "agentic LLMs" as a standard term. Those agents were primitive: one loop through a search API, web reading, no memory, no guardrails. Yet they were the most popular feature, because the need was clearly there.
Similarly, Abhi shared that when Mastra started in January last year, vertical agents were all the rage. Everyone was building agents for specific use cases. In both cases, those simple single-purpose agents have evolved into a sprawl of skills and tool calls that can be attached to agents as needed.
Eventually, folks have had to decide whether it's best to build one agent that calls sub-agents or one agent with multiple skills and tools. As Matt called out, in many ways they're the same thing. It comes down to what the agent is supposed to do, how your engineering teams are organized, and what the scope of the problem actually is.
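The point that sub-agents and tools are largely the same thing can be shown in a few lines. This is a minimal illustrative sketch, not any panelist's implementation; the function names and SKU format are invented for the example.

```python
# A capability exposed as a plain tool...
def lookup_sku(sku):
    return f"SKU {sku}: 12 units"

# ...or the same capability wrapped as a "sub-agent" that takes a task.
def inventory_sub_agent(task):
    return lookup_sku(task["sku"])

# Either way, the parent agent routes a request to a callable and gets
# text back; the difference is packaging and team ownership, not mechanics.
as_tool = lookup_sku("A-100")
as_sub_agent = inventory_sub_agent({"sku": "A-100"})
```

Whether that callable hides one function or a whole nested agent loop is an organizational choice more than an architectural one.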
Today there's this umbrella category called "agents," but the only system unifying them is an LLM running in a loop. How you do guardrails, context management, and memory depends on whether you're building research agents, service agents, coding agents, or application-specific ones.
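The "LLM running in a loop" core can be sketched concretely. Everything below is a hypothetical skeleton with a stubbed model; real systems differ exactly where the paragraph says they do - guardrails, context management, and memory.

```python
# Minimal "LLM in a loop" agent skeleton. The model is a stub that
# returns action dicts; a real agent would call an LLM here.

def run_agent(model, tools, goal, max_steps=5):
    """Loop: ask the model for an action, execute it, feed back the result."""
    context = [f"goal: {goal}"]
    for _ in range(max_steps):
        action = model(context)       # model decides the next step
        if action["type"] == "finish":
            return action["answer"]
        tool = tools[action["tool"]]  # guardrail: only registered tools run
        result = tool(**action["args"])
        context.append(f"{action['tool']} -> {result}")
    return None  # step budget exhausted

# Stub model: search once, then finish with whatever came back.
def stub_model(context):
    if len(context) == 1:
        return {"type": "tool", "tool": "search", "args": {"q": "inventory"}}
    return {"type": "finish", "answer": context[-1]}

answer = run_agent(stub_model, {"search": lambda q: f"3 results for {q!r}"},
                   "count stock")
```

Research, service, and coding agents all fit this shape; they diverge in what goes into `context`, which tools are registered, and how the loop is bounded.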
Shifting Dynamics
The biggest shift in the last six months has been human familiarity with agents. At Harvey, Philip noted that "agentic" was previously almost a dirty word internally, but that’s since changed. Six months ago, people were building deterministic flows for everything. Someone wants case law research? Write it down step-by-step: upload document, search knowledge base, cross-check, return results.
The problem emerged when trying to compose these flows. Things broke. They didn't work together. The breakthrough came from newer/better models and from switching to agent patterns. The determinism product managers craved wasn't actually there in the first place. Those deterministic flows wouldn't work end-to-end anyway. The agents ended up working better because they could handle inherent variability. As agents took over workflows, companies could actually enforce processes that, it turned out, weren't being enforced before.
An Exercise in Trust
Someone from the audience made the point well: "The world is bigger than San Francisco." Companies outside the Bay Area want experts who can teach them about AI firsthand. But they also need to see it work. Abhi mentioned that for companies adopting AI, having people they trust at those companies championing the technology makes all the difference. It's not just about the capability. It's about seeing someone like them successfully use it.
This trust shift isn't just within tech companies. At DOSS, we're replacing the vast majority - and eventually all - of traditional ERP implementation work with agentic solutions. The biggest problem in ERP delivery has never been technology. It's humans. Put 15 humans in a room working on an ERP installation and you get dramatically less determinism than one AI agent. Traditional ERPs have failure rates exceeding 50%, cost six to seven figures, and take years.
At DOSS, our FDE team’s job is rapidly becoming the first and last 10% of implementations, coupled with an exercise in building trust in the system. They deeply understand the business, drive the agent forward, and handle the human interaction. In doing so, they are positioned to bring trust in agentic systems - especially DOSS’s - to the real world.
Customers, especially outside the tech bubble, still want humans as their primary touch points. However, what happens behind the scenes is a choice for each software company.
The Authorization Problem
Matt asked the audience, “Who uses ‘dangerously skip permissions’ as their default with coding agents?” Lots of hands went up. That's a problem right there.
The naive approach is giving agents the same permissions you have. But as humans, we all have internal alarm bells - "this feels like a bad idea" - that help us make judgments about accessing or using information. Think about your own authorization: what information do you have access to but choose not to use? Agents don't have those guardrails yet.
Abhi outlined the spectrum we're seeing in production: no auth (surprisingly common), giving agents your credentials, creating agent roles in your authorization system, or OAuth 2.1 with agents having their own accounts. However, once you take it out of the technical design doc and into real applications, the complexity grows.
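The middle of that spectrum - a dedicated agent role rather than borrowed credentials - can be sketched as deriving the agent's permissions from two sets. This is an illustrative sketch; the scope names and allow-list are invented for the example.

```python
# Hypothetical agent-role derivation: the agent's scopes are the
# intersection of the human's scopes and an allow-list of scopes that
# agents may ever hold, so the agent never inherits everything.

HUMAN_SCOPES = {"read:orders", "write:orders", "read:payroll"}
AGENT_ALLOWLIST = {"read:orders"}  # agents never see payroll or write paths

def agent_role_for(human_scopes, allowed_for_agents):
    return human_scopes & allowed_for_agents

def authorize(scopes, needed):
    """Check a scope before a tool call; fail closed."""
    if needed not in scopes:
        raise PermissionError(f"agent lacks {needed}")
    return True

agent_scopes = agent_role_for(HUMAN_SCOPES, AGENT_ALLOWLIST)
```

The intersection is the key design choice: it encodes "same user, deliberately smaller blast radius," which is what the agent-role option on the spectrum buys you over credential sharing.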
For example, a single law firm can sometimes represent competitors on different matters. A legal search on Harvey across the company needs to respect ethical walls. Complete competitive separation. Their entire product is built around enforcing these boundaries. Similarly, at DOSS, we have to separate actors across human and agentic systems: determining what an AI can do based on the person who authored it and who actually ran it.
On the flip side, some agents require privileged execution. Matt gave a good example: as a product manager at Salesforce, he can't access or change code. But it would be helpful if he could ask an AI to summarize what an endpoint does or why something can't be done in two weeks. The agent might have read access for summarizing even though he doesn't.
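One way to frame that pattern: the agent holds a scope the human lacks, but only a derived summary ever crosses the boundary back to the human, never the raw resource. The sketch below is a hypothetical illustration; the scope name and summarizer are invented.

```python
# Hypothetical "privileged execution" boundary: the agent reads code the
# human cannot, and returns only a summary, never the source itself.

def summarize_endpoint(source: str) -> str:
    # stand-in for an LLM summary; here, just the doc line at the top
    return source.splitlines()[0]

def ask_agent(agent_scopes, source):
    if "read:code" not in agent_scopes:
        raise PermissionError("agent cannot read code")
    return summarize_endpoint(source)  # summary out, raw source stays inside

summary = ask_agent({"read:code"},
                    "GET /orders -> list open orders\ndef handler(): ...")
```

The safety argument rests on the output channel being constrained, which is why the raw `source` never appears in the return path.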
Ultimately, agentic authorization is far too nascent for anyone to deliver a comprehensive solution yet. Models and model wrappers will keep iterating on best practices for some time to come.
Memory as a Core Tenet
For a chat to have memory of past conversations and who you are is table stakes at this point.
Abhi broke down the types of memory we're seeing: long-term memory that stores every message - better than human memory because it never decays as long as you pay your bill; semantic recall that indexes your memory so you can retrieve it; observational memory, where the agent maintains a scratch pad noting preferences; and procedural memory, using the skills pattern from tools like Cursor, where you complete a task, save how you did it, and reuse that process.
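Those four memory types can be sketched as one toy store. This is an illustrative sketch only: "semantic recall" is reduced to keyword overlap here, where real systems use embedding indexes, and all names are invented.

```python
# Toy sketch of the four memory types: long-term log, semantic recall,
# observational scratch pad, and procedural skills.

class AgentMemory:
    def __init__(self):
        self.log = []          # long-term: every message, never decays
        self.scratchpad = {}   # observational: noted preferences
        self.skills = {}       # procedural: saved ways to do tasks

    def remember(self, message):
        self.log.append(message)

    def recall(self, query):
        # semantic-recall stand-in: return messages sharing any word
        words = set(query.lower().split())
        return [m for m in self.log if words & set(m.lower().split())]

    def note(self, key, value):
        self.scratchpad[key] = value

    def save_skill(self, name, steps):
        self.skills[name] = steps

mem = AgentMemory()
mem.remember("user prefers weekly inventory reports")
mem.note("report_cadence", "weekly")
mem.save_skill("cycle_count", ["pull counts", "diff vs system", "flag gaps"])
```

The separation matters because each store answers a different question: what was said, what this user is like, and how to do a job again.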
Abhi framed it well: if an agent has memory, it should be like a human. Memory for recall, memory for quick math. Who remembers what they wrote on the scratch pad during a math test? But it's all memory. Different types serving different purposes.
As we build at DOSS, the distinction between data memory and procedure memory is crucial. Data is transient. Inventory counts change constantly. But process? That's what you want to remember. How you complete a task must get encoded into sub-agents that execute consistently.
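The data/procedure split can be made concrete: freeze the procedure once, run it over fresh transient data every time. This is a generic sketch under that framing, not DOSS's actual implementation; the reconciliation steps are invented.

```python
# Sketch of encoding a process into a repeatable sub-agent: the saved
# procedure is remembered, while the counts it runs over are transient.

def make_sub_agent(procedure):
    """Freeze an ordered list of step functions into a callable sub-agent."""
    def run(data):
        for step in procedure:
            data = step(data)
        return data
    return run

# Procedure (remembered): drop missing counts, then total what remains.
reconcile = make_sub_agent([
    lambda counts: {k: v for k, v in counts.items() if v is not None},
    lambda counts: sum(counts.values()),
])
```

Tomorrow's inventory counts will differ, but `reconcile` executes the same encoded process consistently - which is exactly the consistency the paragraph asks of sub-agents.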
We’ve all heard the classic maxim: there are only two hard things in computer science - cache invalidation, naming things, and off-by-one errors. Memory opens a whole new world of cache-invalidation-type challenges.
Once you introduce memory, you run into context poisoning and context staleness. Polluted memory can mean information that doesn't lead to the goal; or worse, attackers injecting false memories to make agents operate how they want. That's a real cybersecurity issue.
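One common defense against injected memories is provenance tagging: record where each memory came from and refuse writes from untrusted sources. The sketch below is a minimal hypothetical version; the source labels are invented.

```python
# Hypothetical poisoning guard: every memory carries its provenance,
# and only allow-listed sources may write at all.

TRUSTED_SOURCES = {"user", "verified_tool"}

class GuardedMemory:
    def __init__(self):
        self.entries = []

    def write(self, text, source):
        if source not in TRUSTED_SOURCES:
            raise ValueError(f"rejected memory from untrusted source {source!r}")
        self.entries.append({"text": text, "source": source})

gm = GuardedMemory()
gm.write("prefers weekly reports", "user")
```

This doesn't solve staleness - a trusted memory can still be wrong later - but it blocks the attack path where scraped or adversarial content writes itself into long-term memory.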
Philip mentioned that traceability is a problem with black-box approaches. You need to see the chain of causality when output is incorrect. Harvey can show customers which specific memory or context led to a decision. Furthermore, clients can set memory retention periods.
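The traceability idea reduces to recording, alongside each answer, exactly which memory entries were in context. This is an illustrative sketch, not Harvey's system; the matching rule is a deliberately crude stand-in for real retrieval.

```python
# Sketch of decision traceability: each answer carries the indices of
# the memory entries that produced it, so bad output can be traced back.

def decide(memories, question):
    key = question.split()[0]  # crude retrieval: match on the first word
    used = [i for i, m in enumerate(memories) if key in m]
    answer = memories[used[0]] if used else "unknown"
    return {"answer": answer, "trace": used}

out = decide(["inventory is 42 units", "shipments go out Friday"],
             "inventory count?")
```

When `out["answer"]` is wrong, `out["trace"]` points at the specific memory to audit - the chain of causality the panel called for - and retention policies become straightforward deletes against the same indexed store.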
What This Means
The companies that win will be those that figure out the boring stuff: solid authentication, proper memory management, good observability, clear guardrails. The AI capabilities are evolving rapidly. The infrastructure and discipline around them? That's on us to build in order to bring down the barrier to entry and let new players into the space.
Agents aren't replacing people. They're enabling people to focus on what humans do best: understanding context, making taste/judgment calls, building relationships, handling that final 10-20% where expertise and intuition matter most.
One of our customers told us: "We chose DOSS because no matter how big we get, we have no doubts about the system. It just works without us thinking about the underlying setup."
That's the goal. The technical complexity should be invisible to users who just want to understand their business metrics.