A Java event processor, compiled ahead-of-time in your browser, pushing 2.75 million events per second through an iPhone with zero garbage collection. The same code on a commodity server is fifty to a hundred times faster.
Designed for efficiency. Server performance on a phone or embedded device — the same compiled dispatcher, the same deterministic dispatch, on whatever hardware you ship.
Fluxtion is built on three laws — none of them new. Compilers have applied them for fifty years. What's new is applying them to event-graph composition: same trade, same shape, different problem.
Compilers exist because runtime decisions are expensive. Fluxtion applies the same principle to event-graph wiring: every dispatch decision is made once, when you build, not millions of times per second.
A C compiler emits a binary; reviewers read the disassembly. Fluxtion emits a Java class; reviewers read the dispatcher. The thing you deploy is the thing you can sign off on.
Compiled code is deterministic because the compiler decided the order. Fluxtion graphs are deterministic for the same reason. You don't test for it; you can't accidentally lose it.
Same workload over its lifetime. One pays the cost at run time, on every event, forever. The other pays it once, at design time, and then ships a dispatcher that's cheap to run on any hardware.
Cheap to start. Every event added thereafter costs CPU, latency, and dollars. Twice the volume = roughly twice the bill. Requests-per-dollar falls because every request still runs through the same per-event runtime decisions.
One spike at design time (compiler does the work once). After that, runtime cost and latency are flat for the life of the processor. Twice the volume costs almost nothing extra. Requests-per-dollar rises linearly with scale.
Two ways to apply rules to a stream of events. One was decided once, at compile time. The other re-decides every event, with a rate-limited model in the loop and a guardrail pass on the way out. The numbers are stark — and they don't account for the auditability gap, which is the bigger story.
| Dimension | Fluxtion dispatcher | LLM call + guardrails per event | Note |
|---|---|---|---|
| Cost per million events | $0.0001 (CPU cycles only) | $10 – $1,000 | LLM range: cached small model → guardrailed frontier model. |
| Latency per event | 300 – 400 nanoseconds | 300 – 3,000 milliseconds | Six to seven orders of magnitude. Same workload, different physics. |
| Determinism | Same input → same output, every time | Stochastic: even at temperature 0, model upgrades drift behaviour | Regulatory audit asks: can you reproduce the decision? One answer is yes. |
| Audit artefact | A single Java class + GraphML topology | A prompt template + model card + guardrail config + telemetry | The auditor needs to read it. One of these is 200 lines; the other is a moving target. |
| Debugging a misfire | Set a breakpoint on the offending node | Re-prompt with different examples, hope it sticks | Debugging a Fluxtion graph is debugging a normal program. |
LLMs have a real place: open-ended classification, language tasks, anything genuinely fuzzy. None of those are stream-processing workloads in the compute sense. When the rules are knowable, a compiled dispatcher is roughly six orders of magnitude cheaper and behaves the same way every time.
Worked example: a fraud-detection stream at 10,000 events/sec is 26 B events/month. At $10/M (cheap small-model + caching) that's $260,000 per month, before guardrails. The same workload on a single Fluxtion dispatcher fits in well under one core.
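The arithmetic behind the worked example can be checked in a few lines. This is a minimal sketch, assuming a 30-day month and a constant event rate; the class and method names are made up for illustration.

```java
// Illustrative arithmetic only. Assumptions: a 30-day month and a constant
// event rate. Class and method names are hypothetical.
final class StreamCostSketch {
    static final long SECONDS_PER_DAY = 86_400L;
    static final int DAYS_PER_MONTH = 30; // assumption: 30-day month

    /** Events produced in one month at a constant rate. */
    static long eventsPerMonth(long eventsPerSecond) {
        return eventsPerSecond * SECONDS_PER_DAY * DAYS_PER_MONTH;
    }

    /** Monthly cost in dollars for a given price per million events. */
    static double monthlyCostDollars(long eventsPerMonth, double dollarsPerMillion) {
        return eventsPerMonth / 1_000_000.0 * dollarsPerMillion;
    }

    public static void main(String[] args) {
        long events = eventsPerMonth(10_000);
        System.out.println("events per month: " + events);
        System.out.println("at $10 per million: $" + monthlyCostDollars(events, 10.0));
        System.out.println("at $0.0001 per million: $" + monthlyCostDollars(events, 0.0001));
    }
}
```

At 10,000 events/sec this comes to 25,920,000,000 events and $259,200 at $10 per million, which the text rounds to 26 B and $260,000. At the table's $0.0001 per million, the same volume costs a few dollars.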
Most stream-processing frameworks are slow because the runtime decides a lot. Fluxtion decides everything at compile time and emits a Java class that just dispatches. The properties below are consequences of that decision, not features to be tuned.
The event graph is compiled to a single Java dispatcher class before the program runs. No graph traversal at event time, no scheduler thread, no decisions to make. Just method-to-method dispatch baked into straight-line code.
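To make "straight-line dispatch" concrete, here is a hand-written sketch of the shape such a dispatcher might take for a three-node graph. It is an illustration only: every class and method name is hypothetical, and it is not Fluxtion's generated output.

```java
// Hand-written illustration of the shape of a compiled dispatcher.
// All names are hypothetical; this is not code emitted by Fluxtion.
final class DispatcherSketch {

    /** Input event: a single price update. */
    static final class PriceEvent {
        double price;
    }

    /** Node 1: keeps the latest price. */
    static final class LastPrice {
        double value;
        void onPrice(PriceEvent e) { value = e.price; }
    }

    /** Node 2: running mean of every price seen so far. */
    static final class RunningMean {
        final LastPrice source;
        long count;
        double mean;
        RunningMean(LastPrice source) { this.source = source; }
        void recalculate() {
            count++;
            mean += (source.value - mean) / count;
        }
    }

    /** Node 3: fires when the latest price sits above the running mean. */
    static final class AboveMeanAlert {
        final LastPrice price;
        final RunningMean mean;
        boolean firing;
        AboveMeanAlert(LastPrice price, RunningMean mean) {
            this.price = price;
            this.mean = mean;
        }
        void evaluate() { firing = price.value > mean.mean; }
    }

    // The graph is wired once, as final fields. Nothing is looked up per event.
    final LastPrice lastPrice = new LastPrice();
    final RunningMean runningMean = new RunningMean(lastPrice);
    final AboveMeanAlert alert = new AboveMeanAlert(lastPrice, runningMean);

    /** The whole dispatch order for a PriceEvent: three direct calls, fixed in source. */
    void onEvent(PriceEvent e) {
        lastPrice.onPrice(e);
        runningMean.recalculate();
        alert.evaluate();
    }

    public static void main(String[] args) {
        DispatcherSketch dispatcher = new DispatcherSketch();
        PriceEvent event = new PriceEvent(); // one event instance, reused
        for (double p : new double[] {10, 20, 30, 5}) {
            event.price = p;
            dispatcher.onEvent(event);
            System.out.println("price=" + p + " mean=" + dispatcher.runningMean.mean
                    + " alert=" + dispatcher.alert.firing);
        }
    }
}
```

The point of the sketch is that `onEvent` contains the entire execution order. There is no graph structure to walk and no scheduler between the calls.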
The dispatcher is flat and small enough to stay hot in L1 icache. No reflection in the hot path; the JIT inlines aggressively. Nodes allocate once at startup. The shape of the code matches what the CPU is good at — you don't tune it, you just don't fight it.
Every event of a given type runs the same code path, dispatching in the same order. The branch predictor sees no surprises. A reactive-stream framework hides each dispatch behind virtual calls and a scheduler — the predictor mispredicts; the pipeline stalls.
Steady-state operation allocates nothing. Nodes are pre-allocated; events pass through them as references; no per-event garbage means no GC pauses. The live demo records zero collections across fifteen million events.
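One way to check an allocation-free hot path yourself is to read the JVM's collection counters before and after a run. The sketch below uses the standard `java.lang.management` API; the `Tick` and `Sum` classes are hypothetical stand-ins for an event and a node. Whether the counter stays flat on your machine depends on the JVM, heap settings, and anything else running in the process.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch: pre-allocate once, reuse per event, then compare GC counts.
// Tick and Sum are hypothetical stand-ins for an event and a node.
final class ZeroGcSketch {

    /** Mutable event, allocated once and reused for every dispatch. */
    static final class Tick {
        long value;
    }

    /** Node with pre-allocated state; handling an event allocates nothing. */
    static final class Sum {
        long total;
        void onTick(Tick t) { total += t.value; }
    }

    /** Total collections reported by all collectors in this JVM. */
    static long totalGcCount() {
        long count = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount(); // -1 if the collector reports no count
            if (c > 0) {
                count += c;
            }
        }
        return count;
    }

    /** Pushes `events` ticks through one node, reusing a single event object. */
    static long run(int events) {
        Tick tick = new Tick();
        Sum sum = new Sum();
        for (int i = 0; i < events; i++) {
            tick.value = i;
            sum.onTick(tick);
        }
        return sum.total;
    }

    public static void main(String[] args) {
        long before = totalGcCount();
        long total = run(15_000_000);
        long after = totalGcCount();
        System.out.println("total=" + total + " collections during run=" + (after - before));
    }
}
```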
Direct method invocation between nodes — no reactive trampoline, no scheduler frame, no Future / Promise / Subscriber chain. A stack trace from inside an event handler reads like a normal Java program because it is one.
Dirty-flag propagation: events only do work in the parts of the graph whose state actually changed. The compiler works out at build time which nodes depend on which; at run time the dispatcher follows the precomputed order and skips the branches whose inputs did not change. No graph traversal, no fan-out search.
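As a rough illustration of dirty-flag propagation, the sketch below has an upstream node report whether its state changed, and the dispatcher call the downstream node only when it did. The names are hypothetical, and the plain boolean guard is an assumption about how such a flag could be encoded, not a description of Fluxtion's generated code.

```java
// Sketch of dirty-flag propagation with hypothetical names.
// The boolean guard is an assumed encoding, not Fluxtion's generated code.
final class DirtyFlagSketch {

    /** Upstream node: reports a change only when the rounded price moves. */
    static final class RoundedPrice {
        long rounded = Long.MIN_VALUE;

        /** Returns true when this update changed the node's state. */
        boolean update(double price) {
            long next = Math.round(price);
            boolean changed = next != rounded;
            rounded = next;
            return changed;
        }
    }

    /** Downstream node: counts how often it was actually invoked. */
    static final class ChangeCounter {
        int invocations;
        void onChange() { invocations++; }
    }

    final RoundedPrice roundedPrice = new RoundedPrice();
    final ChangeCounter counter = new ChangeCounter();

    /** Fixed dispatch order; the downstream call is skipped when nothing changed. */
    void onPrice(double price) {
        boolean dirty = roundedPrice.update(price);
        if (dirty) {
            counter.onChange();
        }
    }

    public static void main(String[] args) {
        DirtyFlagSketch graph = new DirtyFlagSketch();
        for (double p : new double[] {10.1, 10.2, 10.4, 11.0, 11.3}) {
            graph.onPrice(p);
        }
        System.out.println("downstream invocations: " + graph.counter.invocations);
    }
}
```

Five events arrive, but the downstream node runs only twice: once when the rounded price first becomes 10 and once when it moves to 11.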
The most expensive performance problem in any system is the engineer staring at a flaky production bug. A reactive-stream framework hides the dispatch order inside a scheduler; an LLM-driven flow hides it inside a stochastic model. Either way, the cost of diagnosing a misfire is paid by your senior engineers, in wall-clock hours, on every incident.
A Fluxtion graph is a Java class. The runtime dispatch order is in front of you. The cost of diagnosing a production bug drops by an order of magnitude because the artefact you're debugging is the same thing the JVM is executing.
A production misfire reproduces locally because the dispatcher is deterministic. Capture the input event stream once, replay against the same dispatcher, breakpoint on the failing node. No "couldn't reproduce, closing as flaky" backlog.
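The capture-and-replay workflow can be sketched in plain Java. The `Processor` below is a hypothetical stand-in for a generated dispatcher, and the captured stream is just an array. The property being illustrated is that two replays of the same input produce identical output.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of capture-and-replay against a deterministic processor.
// Processor is a hypothetical stand-in for a generated dispatcher.
final class ReplaySketch {

    /** Deterministic processor: output depends only on the event sequence. */
    static final class Processor {
        long balance;
        final List<String> audit = new ArrayList<>();

        void onEvent(long delta) {
            balance += delta;
            if (balance < 0) {
                audit.add("overdrawn at " + balance);
            }
        }
    }

    /** Replays a captured event stream against a fresh processor instance. */
    static List<String> replay(long[] captured) {
        Processor processor = new Processor();
        for (long event : captured) {
            processor.onEvent(event); // a breakpoint here lands on the failing event
        }
        return processor.audit;
    }

    public static void main(String[] args) {
        long[] captured = {100, -150, 30, -10}; // stand-in for a production capture
        List<String> first = replay(captured);
        List<String> second = replay(captured);
        System.out.println(first);
        System.out.println("identical on replay: " + first.equals(second));
    }
}
```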
On-call gets paged at 3 a.m. They open the generated dispatcher class and read straight-line Java — the entire dispatch order in one screen. No stream graph diagram to mentally simulate, no scheduler internals to grok before they can start.
Every node is a plain Java class. Every test is a JUnit method that feeds events and asserts on output. No embedded broker, no test container, no async-await harness. CI runs the whole suite in seconds.
A junior engineer can review the change because the generated dispatcher is the diff. PRs are about graph topology, not about which Subscriber subclass the framework will pick. No graph depends on the one engineer who wrote it.
The honest math. A senior engineer fully loaded is around $300 / hour. A single four-hour "couldn't reproduce" investigation costs more than a year of CPU for an entire Fluxtion dispatcher. The cheapest way to make a system faster is to make it easier to understand when it breaks.
The headline number from the live capture: 2.75 M events/sec, measured over a five-second wall-clock window, on an iPhone running Safari. No native plugin, no app install — a CheerpJ-hosted Java 8 JVM under WebAssembly, dispatching the same processor class that would run on a server.
Three things are worth understanding about this number.
The browser clock on iOS Safari is coarsened to ~1 ms resolution to mitigate timing side-channel attacks. The dispatcher processes an event in 300–400 nanoseconds, well under that resolution. The benchmark therefore times 10,000-event batches instead of single events, so each sample spans several clock ticks and every per-event figure and percentile is derived from a measurable interval. The histogram clumps into one or two buckets because the dispatcher's per-event cost is genuinely stable, not because the chart is rounding.
On iOS, Apple requires every browser to use the system WebKit JavaScript engine. Chrome-on-iPhone is Safari-on-iPhone with different chrome. The performance difference between them on this workload is roughly zero. Desktop Chrome — running V8 with its own JIT — pushes more events per second than the phone, but the gap is much smaller than the "desktop crushes phone" intuition suggests, because the dispatcher is small enough to fit cleanly in either CPU's instruction cache.
CheerpJ is a Java 8 JVM compiled to WebAssembly. It has no C2 compiler, no escape analysis, no code-cache warm-up: none of the tricks a real HotSpot JVM applies. The same dispatcher running under OpenJDK on commodity hardware measures single-digit nanoseconds of dispatch overhead per event (around 20 ns once application work is included), which puts throughput closer to 50–100 million events per second. The phone figure is a live-AOT demonstration, not the upper bound.
JMH reference: telaminai.github.io/fluxtion/reference/performance
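The batch-timing approach used for the phone capture can be sketched as follows. This is not the benchmark's actual source: the workload is a stand-in loop and the names are hypothetical. It shows the two conversions involved, from batch time to per-event latency and from per-event latency to throughput. With a 1 ms clock and 10,000 events per batch, one clock tick corresponds to 100 ns per event.

```java
// Sketch of batch timing under a coarse clock. The workload is a stand-in
// loop and all names are hypothetical; this is not the benchmark's source.
final class BatchTimingSketch {
    static final int BATCH_SIZE = 10_000;

    /** Keeps the loop result observable so the JIT cannot discard the work. */
    static long sink;

    /** Average per-event latency for one timed batch. */
    static double nanosPerEvent(long batchElapsedNanos, int batchSize) {
        return (double) batchElapsedNanos / batchSize;
    }

    /** Throughput implied by a per-event latency. */
    static double eventsPerSecond(double nanosPerEvent) {
        return 1_000_000_000.0 / nanosPerEvent;
    }

    /** Times one batch of stand-in work and returns the elapsed nanoseconds. */
    static long timeOneBatch() {
        long acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < BATCH_SIZE; i++) {
            acc += i ^ (acc >>> 3); // stand-in for dispatching one event
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return elapsed;
    }

    public static void main(String[] args) {
        for (int sample = 0; sample < 5; sample++) {
            double perEvent = nanosPerEvent(timeOneBatch(), BATCH_SIZE);
            System.out.println("sample " + sample + ": " + perEvent + " ns/event");
        }
        // A 3.3 ms batch of 10,000 events is 330 ns per event.
        System.out.println(nanosPerEvent(3_300_000L, BATCH_SIZE) + " ns/event");
        System.out.println(eventsPerSecond(400.0) + " events/sec at 400 ns/event");
    }
}
```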
The same compiled processor, measured across environments. On a real JVM the framework's dispatch overhead is a handful of nanoseconds — the cost you actually pay is your own node logic.
| Environment | Per-event latency | Throughput |
|---|---|---|
| Phone — CheerpJ (Java 8 → WebAssembly, Safari) | ~330 ns | ~2.75 M/sec |
| Real JVM — bare dispatch overhead | 5–10 ns | — |
| Real JVM — with application work (4-node market-data graph) | ~20 ns | ~50 M/sec |
Zero GC in steady state — the hot path allocates nothing; tail latency p99.99 ≈ 83 ns. Server numbers measured on an Apple M3 MacBook Air, JDK 21, JMH (single-threaded), 10,000-event iterations. Full methodology: fluxtion/reference/performance · runnable benchmark: fluxtion-examples / aot-compiler.
Run the benchmark on the device you're holding. Open the generated Java class and read it. That's the artefact your team will deploy and your auditor will sign off on.