Written by DeepMerge Team

# How our agents learn from every decision they make

Your best operations person knows things nobody wrote down. Returns on a specific SKU are always defective — just approve them. A particular customer files a dispute every quarter — it's never legitimate. Refunds under $30 aren't worth investigating. When that person leaves, the knowledge walks out the door.

DeepMerge agents build this institutional knowledge automatically. Every decision they make — and every correction you give them — becomes a precedent that informs future decisions.

## The first run is the hardest

The first time an agent runs a new procedure, it has no history. It follows your instructions, calls your connected tools, gathers data, and makes decisions based on what it finds.

After the run completes, every decision gets stored. Not just what it decided, but the full context: what data it looked at, what policy it applied, what the outcome was, and whether you approved or corrected it.

## The second run is already smarter

When a similar situation comes up — another refund request for the same product, another customer with the same dispute pattern — the agent doesn't start from scratch. Before making a decision, it searches for relevant precedents. It finds last week's decision on a similar refund, sees that you approved it, notes the reasoning, and applies the same logic.

This isn't a fixed rule engine. The agent doesn't blindly copy past decisions. It uses precedents as context — one input among many — and still evaluates each case on its merits. But precedents anchor the decision, reducing drift and inconsistency.

## Corrections make it better

When the agent pauses for approval and you choose a different option, that correction is the most valuable signal in the system. Your refund procedure recommends denying a return because it's outside the 30-day window. You override and approve because the customer is a VIP with $2,000 in lifetime spend. Next time a VIP customer requests an out-of-policy return, the agent finds that precedent.
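The store-then-search loop described above can be sketched in a few lines. This is a minimal illustration, not DeepMerge's actual implementation: the `DecisionRecord` fields, the tag-overlap scoring, and the `find_precedents` name are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class DecisionRecord:
    # Hypothetical schema -- every field name here is an assumption.
    case_type: str      # e.g. "refund", "dispute"
    tags: frozenset     # context the agent looked at (SKU, customer tier, ...)
    policy: str         # policy it applied
    outcome: str        # what it decided
    reviewed: str       # "approved" or "corrected" by a human
    decided_on: date

def find_precedents(history, case_type, tags, limit=3):
    """Rank stored decisions by how much context they share with this case."""
    candidates = [r for r in history if r.case_type == case_type]
    # More shared context tags -> more relevant precedent.
    ranked = sorted(candidates, key=lambda r: len(r.tags & tags), reverse=True)
    return ranked[:limit]
```

In this toy version, a new refund request tagged with the same SKU as last week's approved return would surface that approval as the top precedent, which the agent then treats as one input among many.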
Instead of recommending denial, it recommends approval with the precedent cited: "Based on a similar decision from March 12, VIP customers with high lifetime value have been approved for out-of-policy returns."

Every correction teaches. Not through retraining — through accumulated context that the agent retrieves and reasons about in real time.

## The compounding effect

In the first week, your agent calls Shopify and Stripe directly for every decision. Thorough but slow. By the second month, the agent finds relevant precedents for most situations before it even calls an external API. It knows your refund patterns, your customer risk profiles, your exception policies — not because someone programmed them, but because it learned them from your actual decisions.

Faster resolution. Fewer approval requests. More consistent treatment of similar cases. And a knowledge base that captures how your operations team actually works — not how a policy document says it should work.

## The honest limits

The learning loop works best for recurring decisions — refunds, disputes, risk scoring, customer routing. High-volume, pattern-heavy operations where precedents apply directly.

It works less well for novel situations with no precedent. A new product category, a first-time fraud pattern, a policy change. In these cases, the agent falls back to your written instructions and the approval system. The precedent base catches up over the next few runs.

Past decisions can become stale. A precedent from six months ago may reflect a policy you've since changed. The agent prefers recent precedents, and you can update or remove stored knowledge at any time. But stale precedent cleanup is a manual step.
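The preference for recent precedents can be sketched as a recency decay applied to a similarity score: older decisions need proportionally higher similarity to outrank fresh ones. The half-life value and the `(similarity, date)` pairing below are illustrative assumptions, not the product's actual ranking function.

```python
from datetime import date

def recency_weight(decided_on, today, half_life_days=90):
    """Halve a precedent's weight every `half_life_days` (assumed constant)."""
    age_days = (today - decided_on).days
    return 0.5 ** (age_days / half_life_days)

def rank_precedents(scored_precedents, today, half_life_days=90):
    """Order (similarity, decided_on) pairs by recency-discounted similarity.

    A six-month-old precedent keeps well under half its original weight,
    so it only wins when it is a much closer match than anything recent.
    """
    return sorted(
        scored_precedents,
        key=lambda p: p[0] * recency_weight(p[1], today, half_life_days),
        reverse=True,
    )
```

With two equally similar precedents, the more recent one ranks first — which mirrors the behavior described above, while leaving removal of truly stale entries as the manual step it is today.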