AI Is Becoming System Design

There is a noticeable shift happening in AI conversations lately. Not just in what models can do, but in where attention is moving. A year or two ago, most discussions centered on raw capability. Bigger models, better benchmarks, more impressive demos. Now the more interesting question is what happens when those capabilities begin to settle into real systems, real workflows, and real human environments.

That is what stood out to us while reflecting on the recent *The Future Live* discussion with Matt Berman and Nick Wentz. The guests came from very different domains — enterprise software, voice AI, long-term memory infrastructure, and healthcare — but they were all circling the same reality. AI is no longer just a layer of intelligence sitting on top of software. It is starting to become part of the operating logic underneath it.

That distinction matters.

When Rob Seaman spoke about Slack evolving into an operating system for AI agents, it pointed to something many product teams are beginning to feel firsthand. If agents become regular participants in work, then software has to support more than human interaction. It has to support delegation, coordination, permissions, context, and execution at machine speed. The interface is no longer just designed for a person clicking through a dashboard. Increasingly, it must also accommodate systems that read, decide, act, and hand work back to people only when needed.

This changes how we think about enterprise tools.

For a long time, “good software” often meant a better user interface, cleaner workflows, and faster access to information. Those things still matter, of course. But in an agent-driven environment, the center of gravity shifts toward structure. APIs matter more. Documentation matters more. Permission models matter more. Event systems matter more. The parts of software that used to feel secondary suddenly become the foundation.

There was a striking idea in the conversation that we may soon have 100 to 1,000 times more agents than people using software. Even if that estimate turns out to be optimistic, the direction feels right. And if that is where things are going, then businesses need to stop treating AI readiness as a feature checklist. It is closer to an architectural question. Is your system legible to machines? Can it be acted on safely? Can it expose the right context at the right time? Can it recover gracefully when an agent makes a wrong move?
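To make those questions a little more concrete, here is a minimal Python sketch of what a machine-legible, permissioned, reversible action surface could look like. Every name in it (`AgentAction`, `ActionGateway`, the `tickets:write` scope) is hypothetical and chosen for illustration; none of it comes from the conversation or from any particular product.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentAction:
    """One operation an agent may perform (hypothetical shape)."""
    name: str
    description: str                 # plain-language summary an agent can discover
    required_scope: str              # permission the caller must hold
    run: Callable[[dict], None]      # applies the change to shared state
    undo: Callable[[dict], None]     # reverses the change


@dataclass
class ActionGateway:
    actions: dict = field(default_factory=dict)
    history: list = field(default_factory=list)   # executed actions, newest last

    def register(self, action: AgentAction) -> None:
        self.actions[action.name] = action

    def describe(self) -> list:
        # Legibility: agents can list what exists and which scope each action needs.
        return [
            {"name": a.name, "description": a.description, "required_scope": a.required_scope}
            for a in self.actions.values()
        ]

    def execute(self, name: str, agent_scopes: set, state: dict) -> bool:
        # Safety: unknown actions and missing scopes are refused, not attempted.
        action = self.actions.get(name)
        if action is None or action.required_scope not in agent_scopes:
            return False
        action.run(state)
        self.history.append(action)
        return True

    def rollback_last(self, state: dict) -> bool:
        # Recovery: undo the most recent executed action, if there is one.
        if not self.history:
            return False
        self.history.pop().undo(state)
        return True
```

The point of the sketch is the shape rather than the details: discovery, permission checks, and rollback live in the same place, so an agent's wrong move is both bounded and reversible.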

These are not abstract concerns anymore. They are design requirements.

The segment on voice AI pushed this even further into the real world. Verun Vumati’s perspective on customer support was especially telling because support is one of those domains where the gap between a good demo and a good product is enormous. It is easy to make a voice agent sound fluent for thirty seconds. It is much harder to make it useful across thousands of messy, emotional, repetitive, high-friction interactions.

That is why the most important part of the voice AI discussion was not novelty. It was improvement over time.

A support system that can resolve more issues as it accumulates experience starts to look less like a scripted bot and more like operational infrastructure. The promise is not simply automation. It is compounding competence. Every resolved case becomes part of a feedback loop that makes the next interaction better, faster, and more accurate.

From a product and engineering perspective, this is where things get serious. Voice AI is not just speech recognition paired with an LLM. To work reliably, it needs orchestration, retrieval, memory, escalation paths, monitoring, and clear boundaries. It needs to know when to continue, when to ask for clarification, and when to hand off to a human. In other words, it needs judgment, or at least a close approximation of it.
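As a rough illustration of that continue / clarify / hand-off decision, the sketch below reduces it to a small policy function. The confidence score, the thresholds, and the names (`TurnState`, `next_action`) are all assumptions made for the example; a production system would need far richer signals than a single number.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    CLARIFY = "clarify"
    HANDOFF = "handoff"


@dataclass
class TurnState:
    confidence: float                  # hypothetical 0..1 score for how well the request is understood
    failed_attempts: int               # times the agent has already tried and missed
    user_requested_human: bool = False


def next_action(
    state: TurnState,
    clarify_below: float = 0.75,       # assumed threshold, not a recommended value
    handoff_below: float = 0.4,        # assumed threshold, not a recommended value
    max_failed_attempts: int = 2,
) -> Action:
    # Explicit requests and repeated failures always go to a human.
    if state.user_requested_human or state.failed_attempts >= max_failed_attempts:
        return Action.HANDOFF
    # Very low confidence: do not guess, escalate.
    if state.confidence < handoff_below:
        return Action.HANDOFF
    # Middling confidence: ask before acting.
    if state.confidence < clarify_below:
        return Action.CLARIFY
    return Action.CONTINUE
```

Keeping the boundary in one explicit, testable function is a design choice: it lets a team tune and audit when the agent steps back, rather than leaving that behavior implicit in a prompt.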

That often gets lost in public conversations about AI. We focus on the intelligence, but intelligence alone does not create trust. Trust comes from repeated, predictable performance under imperfect conditions.

And that brings us to what may be one of the most important ideas raised in the show: memory.

Charles Packer’s point that memory, not model size, may be the real bottleneck in AI feels especially relevant right now. In practice, many AI systems still behave like brilliant short-term thinkers. They can reason impressively within a prompt, but they do not naturally accumulate stable, useful understanding over time unless memory is designed deliberately. Without that, every interaction starts from scratch, every task requires re-explaining context, and every workflow remains more fragile than it should be.

Memory changes the equation.

It is what allows software to become adaptive rather than merely responsive. It is what lets an assistant remember preferences, preserve continuity across sessions, and develop a working understanding of a user, a team, or a process. In enterprise settings, it can reduce repetition and make agents genuinely helpful. In consumer products, it can make experiences feel less transactional and more coherent.

But memory is also where many of the hardest questions begin.

The more an AI system remembers, the more we need to ask what it should remember, how long it should retain it, who can inspect it, and how it can be corrected. Memory is useful, but it is also governance. It is personalization, but also risk. As soon as systems become persistent, security is no longer a side concern. It becomes part of the product experience itself.

We have seen this in our own thinking around AI systems. It is tempting to approach memory as a feature that improves output quality, and it certainly does that. But memory also shapes accountability. If an AI agent acts on prior context, then that context has to be auditable. If it makes a recommendation based on a user’s history, then that history needs boundaries. If memory becomes foundational, then so do deletion, review, and consent.
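One way to picture consent, retention, deletion, and auditability as properties of the memory layer itself is the small Python sketch below. It is an in-memory toy under stated assumptions: the class names, the event labels, and the per-record time-to-live are all hypothetical, and a real system would also need durable storage, access control, and encryption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class MemoryRecord:
    key: str
    value: str
    created_at: datetime
    ttl: timedelta                     # how long the record may be retained


@dataclass
class MemoryStore:
    consented_users: set = field(default_factory=set)
    records: dict = field(default_factory=dict)     # user_id -> {key: MemoryRecord}
    audit_log: list = field(default_factory=list)   # (timestamp, user_id, event, key)

    def _log(self, user_id: str, event: str, key: str, now: datetime) -> None:
        self.audit_log.append((now, user_id, event, key))

    def remember(self, user_id: str, key: str, value: str,
                 ttl: timedelta, now: datetime) -> bool:
        # Consent: nothing is stored for users who have not opted in.
        if user_id not in self.consented_users:
            self._log(user_id, "write_refused", key, now)
            return False
        self.records.setdefault(user_id, {})[key] = MemoryRecord(key, value, now, ttl)
        self._log(user_id, "write", key, now)
        return True

    def recall(self, user_id: str, key: str, now: datetime):
        record = self.records.get(user_id, {}).get(key)
        if record is None:
            return None
        # Retention: expired records are dropped at read time and never returned.
        if now - record.created_at >= record.ttl:
            del self.records[user_id][key]
            self._log(user_id, "expired", key, now)
            return None
        self._log(user_id, "read", key, now)
        return record.value

    def forget(self, user_id: str, key: str, now: datetime) -> bool:
        # Deletion: the user can remove a memory; the attempt itself is logged.
        removed = self.records.get(user_id, {}).pop(key, None) is not None
        self._log(user_id, "delete" if removed else "delete_missing", key, now)
        return removed
```

Because every write, read, expiry, and deletion lands in `audit_log`, the question "what prior context did the agent act on?" has an answer that can be inspected afterward.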

This is one reason the healthcare discussion felt so important.

Dr. Dominic King’s focus on consumer health, mental health support, preventive care, and safe accessibility points toward one of the highest-value and highest-responsibility applications of AI. Healthcare makes the stakes impossible to ignore. An AI health experience is not just another convenience layer. It operates in the presence of vulnerability, uncertainty, and asymmetry of knowledge. People use these tools when they are anxious, confused, overwhelmed, or trying to make sense of symptoms at difficult moments.

The number cited — 50 million health-related queries daily — says a lot on its own. The demand is clearly there. People are already looking for health guidance at scale. The question is not whether AI should enter that space. It already has. The deeper question is what kind of systems deserve to stay there.

In healthcare, usefulness is inseparable from safety. A better interface is not enough. A more conversational answer is not enough. Systems need guardrails, calibrated confidence, careful escalation, and a deeply considered approach to uncertainty. They need to know the difference between support and diagnosis, between helping someone navigate information and pretending to replace clinical judgment.

That is also why healthcare tends to reveal broader truths about AI product design. If a system cannot communicate uncertainty clearly, it is not ready. If it cannot preserve context responsibly, it is not ready. If it cannot balance personalization with privacy, it is not ready. Healthcare simply makes these requirements visible earlier and more sharply than other industries.

One thread running quietly through all of this is infrastructure. The show referenced the launch of new model generations and the astonishing increase in computing energy efficiency — 10,000 times, by one estimate. Those numbers are dramatic, and they matter. Better models and cheaper computation are what make all of this possible. But once that foundation is in place, the real differentiator becomes system design.

We are entering a period where many organizations will have access to similarly powerful intelligence. The advantage will not come only from choosing the best model. It will come from integrating intelligence into processes well, responsibly, and with a clear understanding of where autonomy helps and where it should stop.

That is a more grounded phase of AI adoption, and frankly, a healthier one.

It invites fewer grand declarations and more practical questions. What tasks should an agent own end to end? What context does it need? How should it ask for help? What data should it retain? What does success actually look like after six months, not just on launch day? These are less flashy questions than benchmark scores, but they are the questions that determine whether AI becomes durable.

If there is one broader lesson we took from the conversation, it is that AI is maturing from spectacle into systems thinking.

The most important work now is not just making models smarter. It is making products more coherent around them. It is building software that agents can use without breaking things. It is creating voice systems that improve through lived operational feedback. It is designing memory that is genuinely useful without becoming unsafe. It is bringing AI into health and other sensitive domains with humility, not just ambition.

That kind of progress is slower than the headlines suggest, but also more meaningful.

And maybe that is the most encouraging part. Beneath the noise, the industry seems to be asking better questions. Not just what AI can do, but how it should fit into the way people work, ask for help, make decisions, and trust the tools around them. That feels less like a passing technology cycle and more like the beginning of a real software transition.

The future of AI may still be defined by breakthroughs in models. But the future of useful AI will be defined by memory, interfaces, architecture, and restraint.

That is a more demanding challenge.

It is also a much more interesting one.