Performance

Server performance on a phone.

A Java event processor, compiled ahead-of-time in your browser, pushing 2.75 million events per second through an iPhone with zero garbage collection. The same code on a commodity server is fifty to a hundred times faster.

Designed for efficiency. Server performance on a phone or embedded device — the same compiled dispatcher, the same deterministic dispatch, on whatever hardware you ship.

The design trade

Pay the design cost once. The runtime gets cheaper forever.

Fluxtion is built on three laws — none of them new. Compilers have applied them for fifty years. What's new is applying them to event-graph composition: same trade, same shape, different problem.

Pay the cost at compile time, not at run time.

Compilers exist because runtime decisions are expensive. Fluxtion applies the same principle to event-graph wiring: every dispatch decision is made once, when you build, not millions of times per second.

The artefact is the contract.

A C compiler emits a binary; reviewers read the disassembly. Fluxtion emits a Java class; reviewers read the dispatcher. The thing you deploy is the thing you can sign off on.

Determinism is a property, not a goal.

Compiled code is deterministic because the compiler decided the order. Fluxtion graphs are deterministic for the same reason. You don't test for it; you can't accidentally lose it.

The trade, drawn.

Same workload over its lifetime. One pays the cost at run time, on every event, forever. The other pays it once, at design time, and then ships a dispatcher that's cheap to run on any hardware.

Without Fluxtion

Low design cost. Runtime + latency climb forever.

[Chart: Without Fluxtion. Series: runtime cost, latency, design cost, requests / $. Vertical axis: low to high. Horizontal axis: low volume to high volume. Runtime cost climbs, requests/$ falls.]

Cheap to start. Every event added thereafter costs CPU, latency, and dollars. Twice the volume = roughly twice the bill. Requests-per-dollar falls because every request still runs through the same per-event runtime decisions.

With Fluxtion

Design cost up-front. Runtime + latency stay flat.

[Chart: With Fluxtion. Series: runtime cost, latency, design cost, requests / $. Vertical axis: low to high. Horizontal axis: low volume to high volume. Flat runtime, rising requests/$.]

One spike at design time (compiler does the work once). After that, runtime cost and latency are flat for the life of the processor. Twice the volume costs almost nothing extra. Requests-per-dollar rises linearly with scale.

Cost vs an LLM call per event

A compiled dispatcher vs a per-event prompt + guardrail layer.

Two ways to apply rules to a stream of events. One was decided once, at compile time. The other re-decides every event, with a rate-limited model in the loop and a guardrail pass on the way out. The numbers are stark — and they don't account for the auditability gap, which is the bigger story.

| Dimension | Fluxtion dispatcher | LLM call + guardrails per event | Note |
|---|---|---|---|
| Cost per million events | $0.0001 (CPU cycles only) | $10 – $1,000 | LLM range: cached small model → guardrailed frontier model. |
| Latency per event | 300 – 400 nanoseconds | 300 – 3,000 milliseconds | Six to seven orders of magnitude. Same workload, different physics. |
| Determinism | Same input → same output, every time | Stochastic; even at temperature 0, model upgrades drift behaviour | Regulatory audit asks: can you reproduce the decision? One answer is yes. |
| Audit artefact | A single Java class + GraphML topology | A prompt template + model card + guardrail config + telemetry | The auditor needs to read it. One of these is 200 lines; the other is a moving target. |
| Debugging a misfire | Set a breakpoint on the offending node | Re-prompt with different examples, hope it sticks | Debugging a Fluxtion graph is debugging a normal program. |

LLMs have a real place: open-ended classification, language tasks, anything genuinely fuzzy. None of those are stream-processing workloads in the compute sense. When the rules are knowable, a compiled dispatcher is roughly five to seven orders of magnitude cheaper per event ($0.0001 against $10 to $1,000 per million) and runs the same way every time.

Worked example: a fraud-detection stream at 10,000 events/sec is 26 B events/month. At $10/M (cheap small-model + caching) that's $260,000 per month, before guardrails. The same workload on a single Fluxtion dispatcher fits in well under one core.
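The arithmetic behind the worked example can be checked in a few lines. This is a minimal sketch that assumes a 30-day month; the class and method names are illustrative only.

```java
// Reproduces the worked example's arithmetic. Assumes a 30-day month.
final class LlmCostSketch {

    // Events generated in one month at a constant per-second rate.
    static long eventsPerMonth(long eventsPerSecond, int daysInMonth) {
        return eventsPerSecond * 60L * 60L * 24L * daysInMonth;
    }

    // Monthly bill in USD given a price per million events.
    static double monthlyCostUsd(long events, double usdPerMillionEvents) {
        return events / 1_000_000.0 * usdPerMillionEvents;
    }

    public static void main(String[] args) {
        long events = eventsPerMonth(10_000, 30);
        System.out.println(events);                        // 25920000000 (about 26 B)
        System.out.println(monthlyCostUsd(events, 10.0));  // 259200.0 (about $260,000)
    }
}
```

The rounded figures in the text (26 B events, $260,000) follow from these exact values.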

Why it's fast

Why it runs at near-native speed.

Most stream-processing frameworks are slow because the runtime decides a lot. Fluxtion decides everything at compile time and emits a Java class that just dispatches. The properties below are consequences of that decision, not features to be tuned.

AOT compilation

The event graph is compiled to a single Java dispatcher class before the program runs. No graph traversal at event time, no scheduler thread, no decisions to make. Just method-to-method dispatch baked into straight-line code.
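To make "straight-line dispatch" concrete, here is a hand-written sketch of the shape such a dispatcher might take. It is not Fluxtion's generated output: every class, field, and method name below is hypothetical and chosen only for illustration.

```java
// Hypothetical hand-written sketch of the shape of a compiled dispatcher.
// None of these names come from Fluxtion's real generated code.
final class PriceTick {
    final String symbol;
    final double price;
    PriceTick(String symbol, double price) { this.symbol = symbol; this.price = price; }
}

final class LastPriceNode {
    double last;
    void onTick(PriceTick tick) { last = tick.price; }
}

final class MaxPriceNode {
    private final LastPriceNode source;
    double maxSeen;
    MaxPriceNode(LastPriceNode source) { this.source = source; }
    void onUpstreamChange() { maxSeen = Math.max(maxSeen, source.last); }
}

final class SketchDispatcher {
    // Nodes are created once and wired by plain field references.
    final LastPriceNode lastPrice = new LastPriceNode();
    final MaxPriceNode maxPrice = new MaxPriceNode(lastPrice);

    // Entry point: route by event type; unknown types are ignored.
    void onEvent(Object event) {
        if (event instanceof PriceTick) {
            handlePriceTick((PriceTick) event);
        }
    }

    // The dispatch order is fixed in source: no graph walk, no scheduler.
    private void handlePriceTick(PriceTick tick) {
        lastPrice.onTick(tick);        // step 1: always first
        maxPrice.onUpstreamChange();   // step 2: always second
    }

    public static void main(String[] args) {
        SketchDispatcher dispatcher = new SketchDispatcher();
        dispatcher.onEvent(new PriceTick("ABC", 101.5));
        dispatcher.onEvent(new PriceTick("ABC", 99.0));
        System.out.println(dispatcher.lastPrice.last + " " + dispatcher.maxPrice.maxSeen); // 99.0 101.5
    }
}
```

The point of the sketch is that the call order lives in ordinary source code, so a reader or a debugger sees exactly what runs for each event type.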

Mechanical sympathy

The dispatcher is flat and small enough to stay hot in L1 icache. No reflection in the hot path; the JIT inlines aggressively. Nodes allocate once at startup. The shape of the code matches what the CPU is good at — you don't tune it, you just don't fight it.

Branch prediction

Every event of a given type runs the same code path, dispatching in the same order. The branch predictor sees no surprises. A reactive-stream framework hides each dispatch behind virtual calls and a scheduler — the predictor mispredicts; the pipeline stalls.

Zero garbage collection

Steady-state operation allocates nothing. Nodes are pre-allocated; events pass through them as references; no per-event garbage means no GC pauses. The live demo records zero collections across fifteen million events.
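One way to check a zero-collection claim on a standard JVM is to read the collector counters before and after a run. The sketch below uses the JDK's `GarbageCollectorMXBean` API; the `RunningSum` node is a hypothetical stand-in for real node logic, and this is not the live demo's own instrumentation.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch: count GC cycles across a run of an allocation-free event loop.
final class GcCountSketch {

    // Sum of collections reported by every collector in this JVM.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report a count
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Hypothetical node: state is allocated once and mutated in place per event.
    static final class RunningSum {
        long sum;
        void onValue(long value) { sum += value; }
    }

    // Number of collections observed while processing the given number of events.
    static long collectionsDuring(int events) {
        RunningSum node = new RunningSum();
        long before = totalCollections();
        for (int i = 0; i < events; i++) {
            node.onValue(i); // primitive argument, no per-event allocation
        }
        long after = totalCollections();
        if (node.sum < 0) throw new IllegalStateException("unexpected overflow");
        return after - before;
    }

    public static void main(String[] args) {
        System.out.println("collections during run: " + collectionsDuring(15_000_000));
    }
}
```

A non-zero result does not by itself prove the loop allocated, since other JVM threads can trigger a collection, so treat the counter as a quick check rather than a proof.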

Small call stacks

Direct method invocation between nodes — no reactive trampoline, no scheduler frame, no Future / Promise / Subscriber chain. A stack trace from inside an event handler reads like a normal Java program because it is one.

Factored event dispatching

Dirty-flag propagation: events only do work in the parts of the graph whose state actually changed. The compiler computes this at build time; the runtime just follows the precomputed dispatch order. No run-time graph search and no fan-out lookup, only guard checks the compiler has already laid out.
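A small sketch of the dirty-flag idea: the upstream node reports whether its state changed, and the downstream node runs only when it did. The node names are hypothetical, and the plain boolean guard is an assumption about how such a flag could be encoded, not a copy of the generated code.

```java
// Sketch of dirty-flag propagation with a single upstream/downstream pair.
final class DirtyFlagSketch {

    // Hypothetical upstream node: tracks whether a price is above a limit.
    static final class ThresholdNode {
        final double limit;
        boolean breached;
        ThresholdNode(double limit) { this.limit = limit; }

        // Returns true only when the breach state flips, i.e. downstream must react.
        boolean onPrice(double price) {
            boolean now = price > limit;
            boolean changed = now != breached;
            breached = now;
            return changed;
        }
    }

    // Hypothetical downstream node: counts how often it was asked to react.
    static final class AlertNode {
        int invocations;
        void onThresholdChange() { invocations++; }
    }

    final ThresholdNode threshold = new ThresholdNode(100.0);
    final AlertNode alert = new AlertNode();

    // Fixed dispatch order; the guard skips downstream work when nothing changed.
    void onPrice(double price) {
        boolean thresholdDirty = threshold.onPrice(price);
        if (thresholdDirty) {
            alert.onThresholdChange();
        }
    }

    public static void main(String[] args) {
        DirtyFlagSketch graph = new DirtyFlagSketch();
        double[] prices = {99.0, 101.0, 102.0, 98.0};
        for (double price : prices) {
            graph.onPrice(price);
        }
        System.out.println(graph.alert.invocations); // 2: one breach, one recovery
    }
}
```

Four events arrive but the alert node runs twice, which is the saving the paragraph describes.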

Engineering cost

Performance isn't only CPU. It's engineer-hours too.

The most expensive performance problem in any system is the engineer staring at a flaky production bug. A reactive-stream framework hides the dispatch order inside a scheduler; an LLM-driven flow hides it inside a stochastic model. Either way, the cost of diagnosing a misfire is paid by your senior engineers, in wall-clock hours, on every incident.

A Fluxtion graph is a Java class. The runtime dispatch order is in front of you. The cost of diagnosing a production bug drops by an order of magnitude because the artefact you're debugging is the same thing the JVM is executing.

Reproduce-then-fix, not guess-and-pray

A production misfire reproduces locally because the dispatcher is deterministic. Capture the input event stream once, replay against the same dispatcher, breakpoint on the failing node. No "couldn't reproduce, closing as flaky" backlog.
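The capture-and-replay workflow can be sketched as follows. The `RunningMax` processor is a hypothetical stand-in for a generated dispatcher; the only property that matters is that its output depends solely on the input sequence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: replaying a captured event stream against a deterministic processor.
final class ReplaySketch {

    // Hypothetical processor: emits a value each time a new maximum arrives.
    static final class RunningMax {
        private long max = Long.MIN_VALUE;
        final List<Long> emitted = new ArrayList<>();

        void onEvent(long value) {
            if (value > max) {
                max = value;
                emitted.add(value);
            }
        }
    }

    // Feed a captured stream into a fresh processor and return everything it emitted.
    static List<Long> replay(List<Long> captured) {
        RunningMax processor = new RunningMax();
        for (long value : captured) {
            processor.onEvent(value);
        }
        return processor.emitted;
    }

    public static void main(String[] args) {
        List<Long> captured = Arrays.asList(3L, 1L, 7L, 7L, 9L, 2L);
        List<Long> production = replay(captured); // stands in for the original run
        List<Long> local = replay(captured);      // the local reproduction
        System.out.println(production);               // [3, 7, 9]
        System.out.println(production.equals(local)); // true
    }
}
```

With identical output on every replay, a breakpoint on the failing node sees the same state the production run saw.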

One file to read, not a topology

On-call gets paged at 3 a.m. They open the generated dispatcher class and read straight-line Java — the entire dispatch order in one screen. No stream graph diagram to mentally simulate, no scheduler internals to grok before they can start.

Tests run as fast as JUnit

Every node is a plain Java class. Every test is a JUnit method that feeds events and asserts on output. No embedded broker, no test container, no async-await harness. CI runs the whole suite in seconds.

Reviews stay junior-readable

A junior engineer can review the change because the generated dispatcher is the diff. PRs are about graph topology, not about which Subscriber subclass the framework will pick. Any one engineer can pick up any graph, so no graph depends on a single specialist.

The honest math. A senior engineer fully loaded is around $300 / hour. A single four-hour "couldn't reproduce" investigation costs more than a year of CPU for an entire Fluxtion dispatcher. The cheapest way to make a system faster is to make it easier to understand when it breaks.

The iPhone surprise

2.75 million events per second, on a phone, under Safari.

The headline number from the live capture: 2.75 M events/sec, measured over a five-second wall-clock window, on an iPhone running Safari. No native plugin, no app install — a CheerpJ-hosted Java 8 JVM under WebAssembly, dispatching the same processor class that would run on a server.

Three things are worth understanding about this number.

1. It's an honest measurement, not a synthetic.

The browser clock is coarsened to ~1 ms on iOS Safari to prevent timing-side-channel attacks. The dispatcher processes events in 300–400 nanoseconds, well under the clock's resolution. The benchmark therefore times 10,000-event batches instead of single events: each batch takes roughly 3–4 ms, which the clock can resolve, and the per-event figure is the batch time divided by 10,000. The histogram clumps into one or two buckets because the dispatcher's per-event cost is genuinely stable, not because the chart is rounding.

2. Safari and Chrome are running the same engine.

On iOS, Apple has required every browser to use the system WebKit engine, including its JavaScript engine. Chrome-on-iPhone is Safari-on-iPhone with different chrome. The performance difference between them on this workload is roughly zero. Desktop Chrome, running V8 with its own JIT, pushes more events per second than the phone, but the gap is much smaller than the "desktop crushes phone" intuition suggests, because the dispatcher is small enough to fit cleanly in either CPU's instruction cache.

3. The real JVM is fifty to a hundred times faster.

CheerpJ is a Java 8 JVM compiled to WebAssembly. It has no C2 compiler, no escape analysis, no code-cache warm-up: none of the tricks a real HotSpot JVM applies. The same dispatcher running under OpenJDK on commodity hardware measures single-digit nanoseconds of bare dispatch overhead per event, and about 20 ns per event (around 50 million events per second) once application work is included. The phone number is a live-AOT demonstration, not the upper bound.

JMH reference: telaminai.github.io/fluxtion/reference/performance
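The batch-timing approach from point 1 can be sketched in plain Java. This is an illustration under assumptions: it uses `System.nanoTime` rather than the browser clock the live demo relies on, and `handle` is a hypothetical stand-in for the dispatcher's per-event work.

```java
// Sketch: time a whole batch, then divide, so a coarse clock still gives a usable per-event figure.
final class BatchTimingSketch {

    static final int BATCH_SIZE = 10_000;

    // Written to by handle() so the loop has an observable side effect.
    static long sink;

    // Hypothetical stand-in for the per-event work of a dispatcher.
    static void handle(long event) {
        sink += event ^ (event >>> 3);
    }

    // Per-event cost derived from one batch measurement.
    static double nanosPerEvent(long batchNanos, int batchSize) {
        return (double) batchNanos / batchSize;
    }

    // Times one batch of BATCH_SIZE events and returns the derived per-event cost.
    static double timeOneBatch() {
        long start = System.nanoTime();
        for (int i = 0; i < BATCH_SIZE; i++) {
            handle(i);
        }
        long elapsed = System.nanoTime() - start;
        return nanosPerEvent(elapsed, BATCH_SIZE);
    }

    public static void main(String[] args) {
        // A 3.3 ms batch of 10,000 events works out to 330 ns per event.
        System.out.println(nanosPerEvent(3_300_000L, BATCH_SIZE)); // 330.0
        System.out.println("measured ns/event: " + timeOneBatch());
    }
}
```

The per-event figure is an average over the batch, so this method reports batch-level percentiles rather than the latency of any single event.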

Where the time goes

Same dispatcher, phone to server

The same compiled processor, measured across environments. On a real JVM the framework's dispatch overhead is a handful of nanoseconds — the cost you actually pay is your own node logic.

| Environment | Per-event latency | Throughput |
|---|---|---|
| Phone: CheerpJ (Java 8 → WebAssembly, Safari) | ~330 ns | ~2.75 M/sec |
| Real JVM: bare dispatch overhead | 5–10 ns | |
| Real JVM: with application work (4-node market-data graph) | ~20 ns | ~50 M/sec |

Zero GC in steady state: the hot path allocates nothing; tail latency p99.99 ≈ 83 ns. Real-JVM numbers measured on an Apple M3 MacBook Air, JDK 21, JMH (single-threaded), 10,000-event iterations. Full methodology: fluxtion/reference/performance · runnable benchmark: fluxtion-examples / aot-compiler.

See the dispatcher Fluxtion will give you.

Run the benchmark on the device you're holding. Open the generated Java class and read it. That's the artefact your team will deploy and your auditor will sign off on.