Fabric

Serverless compute for low-latency AI and robotics.

Fabric is the compute layer of CodecFlow. It pools machines from cloud providers and DePIN networks into one programmable compute mesh — accessible through a simple SDK, billed per second, and tuned for latency-sensitive AI work.

Think of it like Modal Labs, built for agents and real-time inference at the edge.

```mermaid
flowchart TD
    SDK["Developer / Agent\ncalls Fabric SDK"]
    SDK -->|"specify: GPU tier,\nlatency target, budget"| COORD["Fabric Orchestrator"]

    COORD --> SCORE{"Score available nodes"}

    SCORE -->|"best match"| SELECT["Select provider node"]

    SELECT --> CLOUD["Cloud GPU"]
    SELECT --> DEPIN["DePIN nodes"]
    SELECT --> ONPREM["On-prem cluster"]

    CLOUD -->|"QUIC/Iroh tunnel"| EXEC["Execute workload\non selected node"]
    DEPIN -->|"QUIC/Iroh tunnel"| EXEC
    ONPREM -->|"QUIC/Iroh tunnel"| EXEC

    EXEC -->|"stream results"| RESULT["Return result\nto caller"]
    EXEC -->|"per-second metering"| BILL["Facilitator\n(Fiat + Stables)"]

    RESULT --> SDK
    BILL --> SETTLE["On-chain settlement"]

    style COORD fill:#d97706,color:#fff
    style EXEC fill:#059669,color:#fff
    style SETTLE fill:#7c3aed,color:#fff
```
  • Orchestrates compute across cloud providers (AWS, GCP, etc.) and DePIN networks, so users get the best available machine without managing providers themselves.
  • Schedules jobs across the connected network based on availability, latency, and cost.
  • Connects machines over QUIC with Iroh for low-latency peer-to-peer communication, enabling distributed jobs that would be impractical over plain HTTP.
  • Connects on-prem and local hardware too, so teams can plug their own machines into Fabric.
  • Agent-friendly: an AI agent can request compute, get a GPU, and pay with x402 with no human in the loop.
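The scheduling step can be pictured as scoring each candidate node against the caller's latency target and budget. This is an illustrative sketch only; the node names, fields, and weighting are assumptions, not Fabric's actual scoring function:

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    available: bool
    latency_ms: float    # round-trip latency to the caller
    cost_per_sec: float  # provider price in USD per second

def score(node: Node, latency_target_ms: float, budget_per_sec: float) -> float:
    """Lower is better; unavailable or over-budget nodes are excluded."""
    if not node.available or node.cost_per_sec > budget_per_sec:
        return float("inf")
    # Penalize nodes that miss the latency target; weight cost linearly.
    latency_penalty = max(0.0, node.latency_ms - latency_target_ms)
    return latency_penalty + 100 * node.cost_per_sec

nodes = [
    Node("cloud-a100", True, 40.0, 0.0011),   # over budget below
    Node("depin-3090", True, 12.0, 0.0004),
    Node("onprem-h100", False, 5.0, 0.0),     # offline
]
best = min(nodes, key=lambda n: score(n, latency_target_ms=25.0, budget_per_sec=0.001))
print(best.name)  # depin-3090
```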

Because Fabric machines talk over QUIC with Iroh, the delay between nodes is low enough to split jobs in ways that weren’t practical before.

For example: instead of bundling a GPU model inside the same container as its worker, the model can live as its own service and be called remotely — with delay low enough that it feels local. This means GPU resources can be shared across jobs dynamically instead of being locked to one deployment.
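The shared-model pattern above can be shown in miniature. The classes below are purely illustrative stand-ins (not Fabric APIs): one model instance acts as its own service, and multiple jobs call it instead of each bundling a copy:

```python
class ModelService:
    """Stand-in for a GPU model running as its own shared service."""
    def __init__(self):
        self.loaded_copies = 1  # one model instance serves every job

    def predict(self, frames):
        return [f * 2 for f in frames]  # placeholder "inference"

def worker(job_frames, model: ModelService):
    # Each job calls the shared model service rather than loading its own.
    return model.predict(job_frames)

shared_model = ModelService()
results = [worker(frames, shared_model) for frames in ([1, 2], [3, 4])]
print(results)                      # [[2, 4], [6, 8]]
print(shared_model.loaded_copies)   # 1 — one model shared across jobs
```

The low QUIC/Iroh latency is what makes this split viable across machines rather than only in-process, since each `predict` call crosses the network.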

Python and TypeScript SDKs let developers write apps the same way they would with Modal Labs — decorate a function, set the compute requirements, deploy.

```python
from fabric import service, GPU

@service(gpu=GPU.A100)
def run_inference(frames):
    return model.predict(frames)
```

The same code runs locally for development and on Fabric’s distributed network for production.
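One way to picture the local-development path is a decorator that simply runs the function in-process when no cluster is involved. This is a stand-in sketch, not the real SDK; the `service` implementation below is hypothetical:

```python
import functools

def service(gpu=None):
    """Hypothetical stand-in for the Fabric decorator: locally, the
    decorated function just executes in-process."""
    def wrap(fn):
        @functools.wraps(fn)
        def call(*args, **kwargs):
            # In production this call would be shipped to a Fabric node;
            # in development it runs right here.
            return fn(*args, **kwargs)
        return call
    return wrap

@service(gpu="A100")
def run_inference(frames):
    return [f + 1 for f in frames]  # placeholder model

print(run_inference([1, 2, 3]))  # [2, 3, 4]
```

The point of the pattern is that the decorated function's signature and call sites stay identical in both environments; only the execution target changes.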

You pay for what you use — nothing more. Billed per second, no idle charges, no provisioning. Request a machine, run your workload, done.

Agents can request and pay for machines on their own in the same way. The Facilitator handles escrow for long-running sessions: funds are held upfront, drawn down as compute runs, and the remainder is returned when the session ends.
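The escrow flow reduces to simple arithmetic: hold, meter per second, refund the difference. A minimal sketch, with illustrative names and rates (not the Facilitator's actual interface):

```python
class Escrow:
    """Illustrative escrow for a metered session: hold funds upfront,
    draw down per second of compute, refund the remainder."""
    def __init__(self, hold: float, rate_per_sec: float):
        self.hold = hold
        self.rate = rate_per_sec
        self.spent = 0.0

    def meter(self, seconds: float) -> None:
        charge = seconds * self.rate
        if self.spent + charge > self.hold:
            raise RuntimeError("escrow exhausted")
        self.spent += charge

    def settle(self) -> float:
        """Return the unused balance to the payer."""
        return self.hold - self.spent

session = Escrow(hold=1.00, rate_per_sec=0.0005)
session.meter(600)           # 10 minutes of compute -> $0.30 drawn down
refund = session.settle()
print(f"{refund:.2f}")       # 0.70
```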

  • SimArena — sends heavy cloud sim jobs (Isaac Sim, Genesis, Newton) to Fabric when browser physics isn’t enough
  • optr — deploys graph nodes to Fabric for production distributed jobs (graph.deploy())
  • Autonomous agents — request and pay for compute on demand with x402

The closest comparison is Modal Labs, but Fabric adds agent-native x402 payments, DePIN provider integration, and the QUIC/Iroh mesh that powers optr's streaming runtime.

Alpha released. Public release in Q2.