MCP Server Architecture: JSON-RPC, Transports, Primitives

MCP server architecture has three layers: a host-client-server role model, JSON-RPC 2.0 as the message format, and a transport (stdio or Streamable HTTP) that carries those messages. The host application runs clients; each client holds a one-to-one connection to a server; the server exposes primitives — resources, tools and prompts — that the AI can use. Every request and response is a JSON-RPC 2.0 message, sent over either a local stdio pipe or an HTTP endpoint. Understand those three layers and the whole protocol clicks into place.

This article breaks down each layer of MCP architecture — the roles, the message format, the transports, the primitives, and the session lifecycle — in plain terms.

In this guide

What are the roles in MCP architecture?
What message format does MCP use?
What transports does MCP support?
Why stdio and HTTP, and how to choose
What are the core primitives a server exposes?
Client-side primitives: sampling, roots, elicitation
How does an MCP session start?
The lifecycle of a connection
Common architecture mistakes
Why this architecture scales to a whole company

What are the roles in MCP architecture?

MCP defines a host-client-server structure. Each role has one job, which keeps the system modular.

The host is the application the user sees — the chat interface, the coding editor, the assistant. It manages execution and context, and it creates one or more clients. The client handles protocol messaging; importantly, each client maintains a one-to-one relationship with a single server. The server exposes capabilities — data and actions — through the protocol.

So a host with three connected servers runs three clients, one per server. This 1:1 client-to-server binding keeps connections isolated and easy to reason about.

Why insist on one client per server rather than a single client multiplexing everything? Isolation. If one server misbehaves, hangs, or floods notifications, it affects only its own client connection — the others keep working. It also keeps capability negotiation clean: each pairing negotiates its own feature set independently, with no shared state to get tangled. The architecture trades a little redundancy for a lot of robustness.
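The one-client-per-server rule can be pictured with a small model. This is only a sketch: the Host and Client classes, the healthy flag, and the server names are hypothetical illustrations, not part of any MCP SDK.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    """One protocol connection, bound to exactly one server."""
    server_name: str
    healthy: bool = True


@dataclass
class Host:
    """The user-facing application; it owns one client per connected server."""
    clients: dict[str, Client] = field(default_factory=dict)

    def connect(self, server_name: str) -> Client:
        # A new server always gets its own client; clients are never shared.
        if server_name in self.clients:
            raise ValueError(f"already connected to {server_name}")
        client = Client(server_name)
        self.clients[server_name] = client
        return client

    def usable_servers(self) -> list[str]:
        # A failed client is skipped; the others are unaffected.
        return [name for name, c in self.clients.items() if c.healthy]


host = Host()
for name in ("files", "search", "tickets"):
    host.connect(name)

host.clients["search"].healthy = False  # simulate one server hanging
print(len(host.clients))        # 3
print(host.usable_servers())    # ['files', 'tickets']
```

Because no state is shared between clients, marking one connection unhealthy needs no coordination with the rest.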

What message format does MCP use?

MCP uses JSON-RPC 2.0 as its wire format. Every interaction is one of three JSON-RPC message types.

A request carries a method name and parameters and expects a reply.
A response returns a result or an error for a given request.
A notification is a one-way message with no reply — used for lifecycle signals and dynamic updates.

For example, the server may send a notifications/tools/list_changed message when its available tools change. JSON-RPC was chosen because it is simple, language-agnostic and well understood — the transport layer just converts MCP messages to and from this format.

A concrete shape helps. A request to invoke a tool looks roughly like this:

{
  "jsonrpc": "2.0",
  "id": 42,
  "method": "tools/call",
  "params": {
    "name": "search_documents",
    "arguments": { "query": "remote work policy" }
  }
}

The id ties the eventual response back to this request, method names the operation, and params carries typed arguments. A notification looks the same but omits the id — because nobody is waiting for a reply. This uniformity is the point: every interaction, whether listing tools or reading a resource, follows the same envelope, so a client only has to understand one message grammar regardless of which server it’s talking to.
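The three message types can be told apart from their fields alone. The sketch below assumes nothing beyond JSON-RPC 2.0 itself; the classify helper is a hypothetical name, and the response payload is a placeholder rather than a real server reply.

```python
import json


def classify(message: dict) -> str:
    """Return 'request', 'response', or 'notification' for a JSON-RPC 2.0 message."""
    if message.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if "method" in message:
        # A method plus an id expects a reply; a method without one does not.
        return "request" if "id" in message else "notification"
    if "id" in message and ("result" in message) != ("error" in message):
        # A response carries exactly one of result or error.
        return "response"
    raise ValueError("unrecognized message shape")


request = {"jsonrpc": "2.0", "id": 42, "method": "tools/call",
           "params": {"name": "search_documents",
                      "arguments": {"query": "remote work policy"}}}
notification = {"jsonrpc": "2.0", "method": "notifications/tools/list_changed"}
# The result payload here is a placeholder, not a real server reply.
response = {"jsonrpc": "2.0", "id": 42, "result": {"content": []}}

for msg in (request, notification, response):
    wire = json.dumps(msg)           # what actually travels over the transport
    print(classify(json.loads(wire)))
```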

What transports does MCP support?

A transport is the channel that carries the JSON-RPC messages. MCP defines two standard transports.

stdio

With stdio, the host spawns the server as a child process and communicates over standard input and output. The client writes JSON-RPC messages to the server’s stdin; the server writes responses to stdout. It is the simplest transport and well suited to local servers running on the same machine.
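A minimal sketch of that exchange, assuming one JSON message per line on each pipe. The child process here is a hypothetical stand-in that answers every request with an empty result; it is not a real MCP server, and call_over_stdio is an illustrative helper name.

```python
import json
import subprocess
import sys

# Hypothetical stand-in server: reads one JSON-RPC message per line from stdin
# and answers each request with an empty result on stdout.
SERVER_CODE = r"""
import json, sys
for line in sys.stdin:
    msg = json.loads(line)
    if "id" in msg:
        reply = {"jsonrpc": "2.0", "id": msg["id"], "result": {}}
        sys.stdout.write(json.dumps(reply) + "\n")
        sys.stdout.flush()
"""


def call_over_stdio(request: dict) -> dict:
    # The host spawns the server as a child process and owns its pipes.
    proc = subprocess.Popen(
        [sys.executable, "-c", SERVER_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        # One message per line: json.dumps emits no raw newlines by default.
        proc.stdin.write(json.dumps(request) + "\n")
        proc.stdin.flush()
        return json.loads(proc.stdout.readline())
    finally:
        proc.stdin.close()   # EOF on stdin lets the child exit its read loop
        proc.wait(timeout=5)
        proc.stdout.close()


reply = call_over_stdio({"jsonrpc": "2.0", "id": 1, "method": "ping"})
print(reply["id"])
```

A real host would keep the process alive for the whole session instead of spawning it per call; the single round trip just keeps the sketch short.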

Streamable HTTP

Streamable HTTP, introduced in March 2025, is the recommended transport for servers that run over a network. The server exposes a single HTTP endpoint that accepts POST and GET. Clients send messages via POST; the server can reply with a single JSON response or answer with a Server-Sent Events stream for incremental output.
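The client side of one POST can be sketched as follows. The endpoint URL is a placeholder, and build_post and reply_style are hypothetical helper names; the request is built but never sent, so the sketch runs without a server.

```python
import json
import urllib.request

# Hypothetical endpoint; a real server publishes its own URL.
MCP_ENDPOINT = "https://example.com/mcp"


def build_post(message: dict) -> urllib.request.Request:
    """Build (but do not send) the POST that carries one JSON-RPC message."""
    return urllib.request.Request(
        MCP_ENDPOINT,
        data=json.dumps(message).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # The client signals it can handle either reply style.
            "Accept": "application/json, text/event-stream",
        },
    )


def reply_style(content_type: str) -> str:
    """Decide how to read the reply from its Content-Type header."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        return "single"    # one JSON body, then the response is done
    if media_type == "text/event-stream":
        return "stream"    # messages arrive incrementally as events
    raise ValueError(f"unexpected content type: {content_type}")


req = build_post({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
print(req.get_method(), req.full_url)
print(reply_style("text/event-stream; charset=utf-8"))
```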

The rule of thumb: stdio for local processes, Streamable HTTP for remote or shared servers.

Why stdio and HTTP, and how to choose

The two transports exist because MCP servers live in two very different places, and forcing one channel on both would compromise each.

stdio is ideal when the server and host are on the same machine — a coding assistant launching a local file-search server, for instance. There’s no network, no ports to manage, no separate authentication: the host owns the child process and talks to it through pipes. It’s about as simple and low-latency as inter-process communication gets, which is why it dominates desktop and developer-tool scenarios.

Streamable HTTP is for everything that crosses a network boundary — a shared server many people connect to, a hosted service, anything you don’t run as a local child process. It’s worth knowing the history here: Streamable HTTP replaced the earlier HTTP+SSE (Server-Sent Events) transport in the March 2025 spec revision. The change came partly from reliability limitations of an always-open SSE connection and partly from security: a persistent SSE stream was checked once at connection time and then stayed open, creating a blind spot, whereas the request-oriented Streamable HTTP model lets each message be inspected. New networked servers should use Streamable HTTP; the older SSE transport is deprecated, though some servers keep an SSE endpoint for backward compatibility during the transition.

To choose: same machine, single user, simplest setup → stdio. Remote, shared, or hosted → Streamable HTTP. The protocol behavior above it is identical either way; only the channel changes.

What are the core primitives a server exposes?

Primitives are the categories of things a server makes available. On the server side there are three.

Resources

Resources are read-only context the model can load — documents, records, files. They provide information without side effects, so the AI can ground its answers in real data.

Tools

Tools are actions the model can invoke — run a search, create a record, call an operation. Each has a name, a description and a typed input schema, so the model knows when and how to call it.
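As an illustration, a tool definition and a deliberately small argument check might look like this. The tool reuses the search_documents name from the earlier request; its description and schema are hypothetical, and check_arguments covers only required keys and simple types, not full JSON Schema.

```python
# Hypothetical tool, reusing the search_documents name from the earlier request.
SEARCH_TOOL = {
    "name": "search_documents",
    "description": "Full-text search over the document store.",
    "inputSchema": {                      # JSON Schema describing the arguments
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

_JSON_TYPES = {"string": str, "integer": int, "number": (int, float),
               "boolean": bool, "object": dict, "array": list}


def check_arguments(tool: dict, arguments: dict) -> list[str]:
    """Return a list of problems; an empty list means the call looks valid.

    This covers only 'required' and simple 'type' checks, not full JSON Schema.
    """
    schema = tool["inputSchema"]
    problems = [f"missing: {key}" for key in schema.get("required", [])
                if key not in arguments]
    for key, value in arguments.items():
        spec = schema.get("properties", {}).get(key)
        if spec is None:
            problems.append(f"unknown: {key}")
        elif not isinstance(value, _JSON_TYPES[spec["type"]]):
            problems.append(f"wrong type: {key}")
    return problems


print(check_arguments(SEARCH_TOOL, {"query": "remote work policy"}))  # []
print(check_arguments(SEARCH_TOOL, {"q": 5}))  # ['missing: query', 'unknown: q']
```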

Prompts

Prompts are reusable templates the server offers — predefined message structures a user or model can select to standardize a task.

A useful way to remember the split is by who’s in control. Resources are typically application-controlled — the host decides what context to load. Tools are model-controlled — the model decides when to invoke them based on the task. Prompts are user-controlled — a person usually picks them, like choosing a slash command. That control distinction is why the three are separate primitives rather than one catch-all: they’re meant to be governed differently.

It also explains a safety property worth noting. Because resources are read-only and tools are the only primitive with side effects, a host can apply very different policies to each — freely loading resources while requiring confirmation before a tool runs. The architecture makes “reading is safe, acting needs a gate” easy to enforce.
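One way a host could encode that policy is sketched below. The allow function, the confirm callback, and the allowlist are hypothetical; a real host would show a confirmation dialog rather than consult a set.

```python
from typing import Callable


def allow(kind: str, name: str, confirm: Callable[[str], bool]) -> bool:
    """Host-side policy sketch: reads pass, actions need explicit approval."""
    if kind == "resource":
        return True                 # read-only, no side effects
    if kind == "tool":
        return confirm(name)        # may change state, so ask first
    raise ValueError(f"unknown primitive kind: {kind}")


# Stand-in for a real confirmation dialog: approve only an allowlisted tool.
approved = {"search_documents"}


def ask_user(tool_name: str) -> bool:
    return tool_name in approved


print(allow("resource", "file:///handbook.md", ask_user))  # True
print(allow("tool", "search_documents", ask_user))          # True
print(allow("tool", "delete_record", ask_user))             # False
```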

Client-side primitives: sampling, roots, elicitation

The server isn’t the only side that offers capabilities. Clients expose their own primitives, which let a server reach back toward the host in controlled ways.

Sampling lets a server ask the client’s model to generate text. This is powerful: a server can request a completion without bundling its own model or API key, and the host stays in control of which model runs and whether to allow the request.
Roots let a client tell a server which parts of a filesystem (or URI space) it’s allowed to operate within — a scoping boundary the server is expected to respect.
Elicitation lets a server ask the user for additional input mid-task through the client, rather than failing when it’s missing a parameter.
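Of the three, roots is the easiest to sketch in code. The check below is an assumption about how a well-behaved server might honor the boundary; the within_roots helper and the example paths are hypothetical, and it handles only file:// URIs.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def within_roots(target_uri: str, roots: list[str]) -> bool:
    """True if a file:// URI falls inside at least one declared root."""
    target = PurePosixPath(urlparse(target_uri).path)
    if ".." in target.parts:
        return False   # refuse unnormalized paths rather than try to resolve them
    for root_uri in roots:
        root = PurePosixPath(urlparse(root_uri).path)
        if target == root or root in target.parents:
            return True
    return False


# Hypothetical roots a client might declare to a server.
ROOTS = ["file:///home/alex/project"]

print(within_roots("file:///home/alex/project/src/main.py", ROOTS))     # True
print(within_roots("file:///home/alex/.ssh/id_rsa", ROOTS))             # False
print(within_roots("file:///home/alex/project/../.ssh/id_rsa", ROOTS))  # False
```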

The pattern across all three is the same: capabilities flow in both directions, but the host always mediates. A server can request things of the model or the user, but it does so through the client, which keeps the human-facing application as the control point. That bidirectional-but-mediated design is a defining trait of MCP’s architecture, and it’s why the three server primitives above are what most servers are built around while the client primitives quietly enable the richer flows.

How does an MCP session start?

Before any work happens, the client and server perform capability negotiation. During initialization, each side declares what it supports.

A server might declare that it offers tools, resource subscriptions or prompt templates. A client might declare it supports sampling and certain notifications. Capabilities determine which protocol features are available for the rest of the session, so neither side assumes a feature the other lacks.
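A rough picture of the exchange, with illustrative values: the version string, client and server names, and the particular capabilities are placeholders, and supports is a hypothetical helper for reading what the other side declared.

```python
from typing import Optional

# Illustrative values: the version string and names are placeholders.
initialize_request = {
    "jsonrpc": "2.0", "id": 1, "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",
        "capabilities": {"sampling": {}, "roots": {"listChanged": True}},
        "clientInfo": {"name": "example-host", "version": "0.1.0"},
    },
}
initialize_result = {
    "jsonrpc": "2.0", "id": 1,
    "result": {
        "protocolVersion": "2025-03-26",
        "capabilities": {"tools": {"listChanged": True},
                         "resources": {"subscribe": True}},
        "serverInfo": {"name": "example-server", "version": "0.1.0"},
    },
}


def supports(capabilities: dict, feature: str, option: Optional[str] = None) -> bool:
    """Check a declared capability, optionally one of its sub-options."""
    if feature not in capabilities:
        return False
    return option is None or bool(capabilities[feature].get(option))


server_caps = initialize_result["result"]["capabilities"]
print(supports(server_caps, "tools"))                   # True
print(supports(server_caps, "prompts"))                 # False
print(supports(server_caps, "resources", "subscribe"))  # True
```

Checking supports before sending a request is what keeps a client from calling a feature the server never declared.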

This handshake makes MCP stateful and forward-compatible: new capabilities can be added without breaking older clients or servers. For the request flow once a session is live, see how MCP works step by step.

The lifecycle of a connection

It helps to see the whole arc of a connection, not just the handshake, because each phase has a distinct job.

Initialization. The client sends an initialize request stating its protocol version and capabilities; the server replies with its own. They agree on a common feature set. Until this completes, no real work happens.
Operation. The session is live. The client discovers what’s on offer (listing tools, resources, prompts), then issues requests as the task demands. The server may send notifications when its offerings change — say, notifications/tools/list_changed — and a capable client re-fetches the list.
Shutdown. When the host no longer needs the server, the connection is closed cleanly. For stdio that means terminating the child process; for Streamable HTTP it means ending the session.
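The three phases map naturally onto a small state machine. This is a sketch under assumptions: the Session class, its method names, and the stale flag are hypothetical, and no real transport is involved.

```python
from enum import Enum


class Phase(Enum):
    INITIALIZING = "initializing"
    OPERATING = "operating"
    CLOSED = "closed"


class Session:
    """Tracks which phase a connection is in and what is allowed there."""

    def __init__(self) -> None:
        self.phase = Phase.INITIALIZING
        self.tools: list[str] = []
        self.tools_stale = True       # nothing has been listed yet

    def initialized(self) -> None:
        self.phase = Phase.OPERATING  # handshake done, real work may start

    def list_tools(self, fetched: list[str]) -> None:
        self._require_operating()
        self.tools, self.tools_stale = fetched, False

    def on_notification(self, method: str) -> None:
        self._require_operating()
        if method == "notifications/tools/list_changed":
            self.tools_stale = True   # cached list must be fetched again

    def close(self) -> None:
        self.phase = Phase.CLOSED

    def _require_operating(self) -> None:
        if self.phase is not Phase.OPERATING:
            raise RuntimeError(f"not allowed while {self.phase.value}")


session = Session()
try:
    session.list_tools(["search_documents"])
except RuntimeError as err:
    print(err)                                  # not allowed while initializing
session.initialized()
session.list_tools(["search_documents"])
session.on_notification("notifications/tools/list_changed")
print(session.tools_stale)                      # True
session.close()
```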

The version exchange during initialization is what makes the protocol durable over time. Because both sides announce a protocol version and a capability set, a newer client and an older server can find a common subset and still work together. New features get added as new capabilities rather than as breaking changes, so the protocol can evolve without orphaning existing implementations.
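The version agreement can be sketched as two small decisions. This assumes date-style version strings that sort chronologically, and the function names and version lists are illustrative rather than taken from any SDK.

```python
from typing import Optional


def server_pick(requested: str, server_versions: list[str]) -> str:
    """Server side: echo the requested version if supported, else offer its newest."""
    if requested in server_versions:
        return requested
    return max(server_versions)   # date-style strings sort chronologically


def client_accept(offered: str, client_versions: list[str]) -> Optional[str]:
    """Client side: proceed with the offer, or give up (None) if unsupported."""
    return offered if offered in client_versions else None


# A newer client talking to an older server.
client_versions = ["2024-11-05", "2025-03-26"]
server_versions = ["2024-11-05"]

offer = server_pick(max(client_versions), server_versions)
print(offer)                                  # 2024-11-05
print(client_accept(offer, client_versions))  # 2024-11-05
print(client_accept("2023-01-01", client_versions))  # None
```

When client_accept returns None, there is no common version and the client should close the connection instead of guessing.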

Common architecture mistakes

Picking the wrong transport. Reaching for Streamable HTTP for a purely local tool adds networking and auth complexity you didn’t need; trying to run a shared remote server over stdio simply doesn’t fit. Match the transport to where the server lives.

Building a fourth “primitive” by overloading tools. When everything becomes a tool, including things that are really read-only context, you lose the safety and control distinctions that resources and prompts provide. Use the right primitive for the job.

Assuming one client can serve many servers. The 1:1 client-to-server binding is deliberate. Trying to multiplex breaks the isolation and per-pairing negotiation the architecture depends on.

Ignoring negotiated capabilities. Code that assumes a feature exists without checking the negotiated capabilities will break against servers or clients that don’t offer it. Honor what was declared at initialization.

Why this architecture scales to a whole company

The three-layer design is what makes one server reusable across an organization. Because every server speaks the same JSON-RPC dialect over a standard transport, any compatible AI tool can connect with no bespoke integration — so a single server can ground every AI surface in the same knowledge, instead of M tools each wiring N sources by hand.

Once the plumbing is standardized, the real engineering problem moves up a layer: it stops being how do we connect and becomes what slice of context do we return per request. That scoping problem is where a good context layer earns its keep — exposing your knowledge as resources behind the same host-client-server model, returning the relevant slice rather than the whole base. For the bigger picture of that layer see what an MCP server is; for the trade-off of how much to return, see how much context an AI agent needs.

FAQ

What protocol does MCP use under the hood? MCP uses JSON-RPC 2.0 for all messages — requests, responses and notifications. The transport layer converts MCP messages to and from JSON-RPC, which is simple, language-agnostic and widely supported across programming environments.

What transports does an MCP server support? Two standard transports: stdio, where the host spawns the server as a local child process and talks over standard input/output, and Streamable HTTP, a single networked endpoint that accepts POST and GET, recommended for remote or shared servers.

What are MCP primitives? Primitives are the categories a server exposes: resources (read-only context), tools (actions the model can call) and prompts (reusable templates). Clients also expose primitives like sampling, roots and elicitation. They define what the AI can read and do.

What is the host-client-server model in MCP? The host is the user-facing app; it runs clients. Each client holds a one-to-one connection to a server, which exposes capabilities. The roles separate orchestration, messaging and capability, keeping the architecture modular.

What is capability negotiation in MCP? At the start of a session, the client and server each declare which features they support. These declarations determine what is available for the rest of the session, making MCP stateful and forward-compatible without breaking older implementations.

Why was SSE replaced by Streamable HTTP? The earlier HTTP+SSE transport kept a persistent connection that was checked once and stayed open, which hurt reliability and created a security blind spot. The March 2025 spec introduced Streamable HTTP, a request-oriented model that’s easier to inspect and scale. The standalone HTTP+SSE transport is now deprecated, although a Streamable HTTP server can still stream an individual response as Server-Sent Events.

Can a single client connect to multiple servers? No — each client binds one-to-one to a single server. A host that uses several servers simply runs several clients, one per server. This isolation keeps a misbehaving server from affecting the others and lets each pairing negotiate capabilities independently.