The fastest-growing companies in the AI space today are either selling inference or reselling it, acting as the industry's "first derivative." However, a business model built on a markup of token costs is a race to zero. Such projects function more like payment gateways than software companies. According to analyst Tomasz Tunguz, the "cost-plus" model—where pricing is strictly tied to inference costs (e.g., a 30% markup)—is fundamentally fragile. It caps a customer's willingness to pay at the ceiling of compute costs, turning the product into a transparent and highly inconvenient tax.

Price Commoditization

When a startup builds a wrapper of interfaces and workflows around a model, it must justify a price premium. But as inference becomes a cheap commodity, margins in the cost-plus model inevitably collapse toward zero. Eventually, customers will simply compare your bill to raw API rates and find ways to bypass the middleman. To maintain gross margins above 30%, founders must completely decouple pricing from token expenditures.

Reselling inference at cost is a zero-margin business: you are building payment infrastructure, not a software company.

Transitioning to value-based pricing is the only way to survive compute power deflation. As Tunguz notes, Sierra only bills when an agent successfully closes a ticket, while Devin sells "Agent Compute Units," abstracting the client away from raw hardware. Databricks and Snowflake have used this logic for years by selling internal credits. When you charge for a completed task or a generated report, you capture a share of the value created, and the client becomes indifferent to how much inference was consumed.

Protecting Margins: Distillation vs. BYOK

Optimization remains a key lever for cost reduction, but not all methods are created equal. Model routing and caching are tactical moves that competitors can copy within a week. The real advantage lies in distillation. By routing traffic through top-tier frontier models and transferring their "knowledge" into compact student models (under 8 billion parameters), a company can run on cheap hardware. This creates a unique, hard-to-replicate asset and radically lowers the cost of service delivery.

An additional stress test for this strategy is the "Bring Your Own Key" (BYOK) trend. When a client sees inference costs on their own cloud bill, any markup you add feels like an unjustified toll. Under a platform-based or pay-for-results approach, BYOK stops being a threat: the client pays for the outcome or for a service that makes their own compute budget more efficient. Boards of AI startups today must honestly ask: are they building a legitimate tech company or just painting a pretty dashboard over a payment terminal?

Artificial IntelligenceAI in BusinessCost ReductionAI Agents