May 13, 2026

Your compute bill is growing. The pricing model isn't.

Niklas Hjern


AI services run on compute that costs real money per inference. Most manufacturers are pricing those services on fixed-fee guesstimates, and when token costs move, the margin disappears. Sharlic tracks actual usage and turns it into the right customer charge, automatically.

Every time your AI service runs, there is a compute cost: every detection, every analysis, every model call. It hits your infrastructure bill in real time, scales with usage, and, for most manufacturers right now, goes nowhere near an outgoing invoice.

That gap isn't sustainable.

How it happened

Licensing structures were designed for software that costs the same to run whether it's used once a day or a thousand times. Inference doesn't work like that. The more your customers use the service, the more it costs you to deliver it.

Most manufacturers know this. Few have a clean answer for it. Instead, the cost gets absorbed as infrastructure overhead, justified as the cost of staying competitive, and left to grow quietly as adoption increases. According to ICONIQ Capital's 2026 State of AI report, inference alone averages 23% of revenue at scaling-stage AI companies, before any other cost of goods. The customers benefiting most from your AI services are, in many cases, the ones costing you the most to serve.

The guesswork problem

The workaround most manufacturers land on is a fixed service price, built on a rough estimate of average compute consumption, with a margin buffer that feels comfortable enough. It works, until it doesn't.

Compute costs aren't fixed. Per-token prices from the major inference providers have fallen dramatically over the past few years. Stanford HAI documented a 280-fold reduction between 2022 and 2024 for equivalent model performance, but the total cost of running a modern AI feature hasn't followed the same curve. Richer inputs, longer reasoning chains, and higher usage volumes mean that what a workload actually costs to run today looks very different from what it cost when you set your price.

And when your inference provider reprices (which they do, often without much notice), your margin assumptions reprice with them. A pricing model built on last year's token costs can go very wrong very fast, and there's usually no early warning. Just a P&L that stops making sense.

What Sharlic does with it

Sharlic lets you define a price per token and attach it directly to each customer service. Usage is tracked, costs are calculated, and the right figure flows to the right invoice.

If a fully variable model isn't right for your services, Sharlic supports included quotas too: a fixed price up to a defined token threshold, with usage-based pricing kicking in beyond it. This mirrors what the industry is converging on. According to Metronome's 2025 State of Usage-Based Pricing report, 61% of SaaS companies now use some form of hybrid pricing, combining a stable platform fee with metered usage components. It's flexible enough to match how you want to package the service, and precise enough to make sure the economics always hold.
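To make the mechanics concrete, here is a minimal sketch (in Python) of how a charge like this can be computed from tracked token usage. The plan shape, names, and numbers (HybridTokenPlan, base_fee, price_per_token, the example rates) are illustrative assumptions, not Sharlic's actual API or pricing.

```python
from dataclasses import dataclass

@dataclass
class HybridTokenPlan:
    """Illustrative hybrid plan: a flat fee covers an included token quota,
    and usage beyond the quota is billed per token. Not Sharlic's actual API."""
    base_fee: float          # fixed price for the billing period
    included_tokens: int     # tokens covered by the base fee
    price_per_token: float   # charge for each token beyond the quota

    def charge(self, tokens_used: int) -> float:
        """Turn tracked usage for one customer service into an invoice amount."""
        overage = max(0, tokens_used - self.included_tokens)
        return self.base_fee + overage * self.price_per_token

# Hypothetical example: a flat fee of 200 covers the first 5M tokens,
# then each additional token is billed at 0.00004.
plan = HybridTokenPlan(base_fee=200.0, included_tokens=5_000_000, price_per_token=0.00004)
print(plan.charge(3_000_000))   # under the quota -> 200.0
print(plan.charge(12_000_000))  # 7M tokens over the quota -> 200.0 + 280.0 = 480.0
```

Setting base_fee and included_tokens to zero gives the fully variable, price-per-token case described above. The point is simply that the invoice figure is derived from measured usage rather than an up-front estimate.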

The window to fix this is now

AI services in physical security are early. The VSaaS and AI video analytics markets are growing at double-digit CAGRs, and pricing models are still being set. End customers don't yet have strong expectations about what this should cost. That window closes as adoption matures.

The manufacturers who build the billing infrastructure now are the ones who get to price AI services as a real margin line, not a cost centre they're hoping to offset somewhere else.
