How Does MCP Work? A Step-by-Step Walkthrough
MCP works by letting an AI tool open a standard connection to a server, discover what that server offers, and request data or actions on demand using JSON-RPC messages. When you ask a question, the AI’s client connects to a server, negotiates capabilities, and asks what is available. If the model decides it needs outside information, it sends a structured request; the server fetches the data and returns it; the model answers using that real content. Nothing is pulled until the model asks, which keeps every interaction scoped and auditable.
This walkthrough follows a single request through the Model Context Protocol — an open standard Anthropic open-sourced in November 2024 — step by step, then looks at the edges: errors, multiple servers, and what makes the flow different from pasting context.
In this guide
- How does MCP work, step by step?
- A concrete trace of one request
- What makes this different from pasting context?
- What carries the messages between tool and server?
- What happens when something goes wrong?
- How does the model decide which capability to call?
- What about multiple servers at once?
- Common misunderstandings about the flow
- Why this flow matters for a team
How does MCP work, step by step?
Here is the full path of one request, from your question to the model’s grounded answer.
Step 1 — The host connects a client to a server
The host — your chat tool or editor — starts a client for each server it wants to use. The client opens a connection over a transport (a local stdio pipe or a networked HTTP endpoint). Each client talks to exactly one server.
Step 2 — The two sides negotiate capabilities
Before any work, the client and server exchange an initialization handshake. Each declares what it supports — the server might offer tools and readable resources; the client might support certain notifications. This negotiation sets the rules for the session.
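As a rough sketch of what that handshake can look like on the wire: the client sends an initialize request, the server replies with its own capabilities, and the client confirms with an initialized notification. The client and server names, version strings, and the particular capabilities shown are illustrative assumptions; 2024-11-05 is one published protocol revision.

```python
import json

def make_initialize_request(request_id: int) -> dict:
    """Build the client's opening handshake message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",
            # What this client supports (illustrative values).
            "capabilities": {"roots": {"listChanged": True}},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }

def negotiated_features(server_result: dict) -> list[str]:
    """Return the feature groups the server declared, sorted by name."""
    return sorted(server_result.get("capabilities", {}).keys())

# Hypothetical server reply: the "result" part of the initialize response.
server_result = {
    "protocolVersion": "2024-11-05",
    "capabilities": {"tools": {"listChanged": True}, "resources": {}},
    "serverInfo": {"name": "example-server", "version": "0.1.0"},
}

request = make_initialize_request(1)
features = negotiated_features(server_result)

# After a successful reply, the client confirms with a notification.
# Notifications carry no "id" because no response is expected.
initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}

print(json.dumps(request, indent=2))
print(features)
```

Here the server declared tools and resources but not prompts, so for the rest of the session the client has no reason to ask it for prompts.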
Step 3 — The client discovers what the server offers
The client asks the server to list its capabilities — which resources it can read, which tools it can run, which prompts it provides. Now the model knows the menu of options available for this task.
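A minimal sketch of discovery for tools, assuming a server like the code-search example used later in this walkthrough. The tools/list method and the name, description, and inputSchema fields come from the protocol; the descriptions and schemas themselves are made up for illustration. Resources and prompts are listed through the analogous resources/list and prompts/list methods.

```python
# The discovery request: no parameters are needed for a first page of results.
list_request = {"jsonrpc": "2.0", "id": 2, "method": "tools/list"}

# Hypothetical response from a code-search server.
list_response = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {
        "tools": [
            {
                "name": "search_code",
                "description": "Search the repository and return matching file paths and line numbers.",
                "inputSchema": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            },
            {
                "name": "read_file",
                "description": "Return the contents of one file by path.",
                "inputSchema": {
                    "type": "object",
                    "properties": {"path": {"type": "string"}},
                    "required": ["path"],
                },
            },
        ]
    },
}

def build_menu(response: dict) -> dict[str, str]:
    """Map each tool name to its description: the 'menu' the model reads."""
    return {tool["name"]: tool["description"] for tool in response["result"]["tools"]}

menu = build_menu(list_response)
for name, description in menu.items():
    print(f"{name}: {description}")
```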
Step 4 — You ask a question and the model decides
You send a prompt. The model reasons about it and decides whether it needs outside information. If your question can be answered from its own knowledge, it just answers. If not, it picks the right capability from the discovered list.
Step 5 — The client sends a structured request
The client sends a JSON-RPC request to the server — for example, “search the documents for X” or “read this resource.” The request names the capability and includes typed parameters, so the server knows exactly what to do.
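For illustration, a tool invocation could be built like this. The tools/call method and its name and arguments parameters are part of the protocol; the search_code tool and its query parameter are assumptions carried over from the example server, not something every server offers.

```python
import itertools
import json

# Each request needs an id so the response can be matched to it.
_request_ids = itertools.count(1)

def make_tool_call(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for the tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

call = make_tool_call("search_code", {"query": "formatCurrency"})
print(json.dumps(call))
```

The request names exactly one capability and carries only the parameters that capability declared, which is what makes each call easy to log and review.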
Step 6 — The server fetches and returns the result
The server does the work — searching, reading, or running an action — against the source, which stays where it lives. It returns the result through the protocol in a shape the model can use. The data is not copied wholesale into the AI; only the requested slice comes back.
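To make "only the requested slice comes back" concrete, here is a toy server-side handler. It is a sketch under heavy assumptions: the files live in an in-memory dictionary with invented contents, the search is a plain substring match, and only one tool is handled. The result shape (a content list of typed parts plus an isError flag) follows the protocol's tool result format.

```python
# Hypothetical repository contents; a real server would read from disk.
FILES = {
    "README.md": "Project notes\n",
    "src/app.ts": "import { formatCurrency } from './utils/money';\n",
    "src/cart/total.ts": "const label = formatCurrency(sum);\n",
    "src/utils/money.ts": "// money helpers\nexport function formatCurrency(n: number) {}\n",
}

def search_code(query: str) -> list[str]:
    """Return 'path:line' for every line containing the query."""
    hits = []
    for path, text in sorted(FILES.items()):
        for line_number, line in enumerate(text.splitlines(), start=1):
            if query in line:
                hits.append(f"{path}:{line_number}")
    return hits

def handle_tools_call(request: dict) -> dict:
    """Answer a tools/call request for search_code (the only tool in this sketch)."""
    arguments = request["params"]["arguments"]
    hits = search_code(arguments["query"])
    return {
        "jsonrpc": "2.0",
        "id": request["id"],  # echo the id so the client can match the reply
        "result": {
            "content": [{"type": "text", "text": "\n".join(hits)}],
            "isError": False,
        },
    }

response = handle_tools_call({
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {"name": "search_code", "arguments": {"query": "formatCurrency"}},
})
print(response["result"]["content"][0]["text"])
```

The README never leaves the server: the response carries three location strings, not file contents.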
Step 7 — The model answers using the real content
The model folds the returned content into its reasoning and produces an answer grounded in your actual data, not a generic guess. If it needs more, it can send another request and repeat the loop.
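The repeat-until-done loop can be sketched as follows. Everything here is a stand-in: scripted_decide plays the role of the model with a fixed script, and fake_call_tool plays the role of the client sending a request to a server. A real host would put a model call and a protocol client in those two places.

```python
def run_agent_loop(question, decide, call_tool, max_steps=5):
    """Alternate between deciding and fetching until the model answers."""
    observations = []
    for _ in range(max_steps):
        action = decide(question, observations)
        if action["type"] == "answer":
            return action["text"]
        result = call_tool(action["tool"], action["arguments"])
        observations.append((action["tool"], result))
    return "Stopped after reaching the step limit."

def scripted_decide(question, observations):
    """Stand-in for the model: search first, then read, then answer."""
    if not observations:
        return {"type": "call", "tool": "search_code",
                "arguments": {"query": "formatCurrency"}}
    if len(observations) == 1:
        return {"type": "call", "tool": "read_file",
                "arguments": {"path": "src/utils/money.ts"}}
    return {"type": "answer",
            "text": "Yes: formatCurrency is defined in src/utils/money.ts."}

calls = []

def fake_call_tool(tool, arguments):
    """Stand-in for the client: record the call and return placeholder content."""
    calls.append(tool)
    return f"(result of {tool})"

answer = run_agent_loop("Do we have a currency formatter?", scripted_decide, fake_call_tool)
print(calls)
print(answer)
```

The step limit is a design choice worth keeping in any real loop, since a model that never settles on an answer would otherwise keep issuing requests.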
A concrete trace of one request
Abstract steps are easier to trust when you can see them play out. Suppose you ask a coding assistant, “Does our codebase already have a function for formatting currency, and where?”
- Connect & negotiate (steps 1–2). When the assistant launched, it started a client and connected to a local code-search server over stdio, and the two agreed on a capability set.
- Discover (step 3). The client already learned that the server offers a search_code tool and a read_file tool.
- Decide (step 4). The model can’t answer from memory (it has never seen your repo), so it chooses search_code.
- Request (step 5). The client sends a tools/call request naming search_code with the argument formatCurrency.
- Fetch & return (step 6). The server runs the search across your files and returns three matches with file paths and line numbers: not the whole codebase, just the hits.
- Answer (step 7). The model replies, “Yes, there’s a formatCurrency helper in src/utils/money.ts,” then optionally calls read_file on that path to confirm the signature before summarizing it.
That second read_file call is the loop in action: observe the search results, decide more detail is needed, request again. The model drives; the server serves.
What makes this different from pasting context?
The defining trait is on-demand pull. With copy-paste, you front-load everything into the prompt and hope it is relevant. With MCP, the model requests only what it needs, when it needs it.
That difference matters for quality and cost. Stuffing a prompt with everything invites noise, higher cost and worse answers; pulling a scoped slice keeps the model focused. For why that balance is so important, see how much context an AI agent needs.
It also keeps interactions auditable: because each fetch is an explicit request, you can see exactly what the model asked for and when.
There’s a freshness benefit too. Pasted context is a snapshot frozen at the moment you copied it; an on-demand fetch reflects the source as it is right now. Ask the same question tomorrow and the server returns tomorrow’s data, with no stale paste to remember to refresh.
What carries the messages between tool and server?
Every message in the flow is JSON-RPC 2.0 — a simple, language-agnostic format. Requests carry a method and parameters; responses carry a result or error; notifications are one-way signals (for example, “the tool list changed”).
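The three message kinds can be told apart from their fields alone, as this small sketch shows. The classification rule follows JSON-RPC 2.0 (a request has a method and an id, a notification has a method but no id, a response has a result or an error); the sample messages are illustrative.

```python
def classify(message: dict) -> str:
    """Label a JSON-RPC 2.0 message by which fields it carries."""
    if "method" in message:
        # Requests expect a reply and so carry an id; notifications do not.
        return "request" if "id" in message else "notification"
    if "result" in message or "error" in message:
        return "response"
    raise ValueError("not a recognizable JSON-RPC 2.0 message")

samples = [
    {"jsonrpc": "2.0", "id": 1, "method": "tools/list"},
    {"jsonrpc": "2.0", "id": 1, "result": {"tools": []}},
    {"jsonrpc": "2.0", "id": 2, "error": {"code": -32601, "message": "Method not found"}},
    {"jsonrpc": "2.0", "method": "notifications/tools/list_changed"},
]

kinds = [classify(message) for message in samples]
print(kinds)
```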
Those messages travel over a transport: stdio for a local server running as a child process, or Streamable HTTP for a networked server. The flow above is identical regardless of transport — only the channel changes. For the full breakdown of roles, message format and transports, see MCP server architecture.
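On the stdio transport, each message is written as one line of JSON, so framing reduces to splitting on newlines. The sketch below uses an in-memory io.StringIO as a stand-in for the child process's pipes, which is an assumption made to keep the example self-contained.

```python
import io
import json

def write_message(stream, message: dict) -> None:
    """Write one message per line. json.dumps escapes newlines inside strings,
    so the serialized message never contains a raw line break."""
    stream.write(json.dumps(message) + "\n")

def read_messages(stream):
    """Yield one parsed message per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)

pipe = io.StringIO()  # stand-in for the server's stdin or stdout
write_message(pipe, {"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
write_message(pipe, {"jsonrpc": "2.0", "id": 1,
                     "result": {"note": "line one\nline two"}})

pipe.seek(0)
received = list(read_messages(pipe))
print(len(received), received[1]["result"]["note"].splitlines())
```

Swapping the pipe for an HTTP connection changes how bytes move, but the dictionaries passed to write_message stay the same, which is the sense in which the flow is identical across transports.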
What happens when something goes wrong?
The clean walkthrough assumes everything succeeds. Real flows have failure points, and the protocol handles them in defined ways.
- The server can’t do what was asked. JSON-RPC has a structured error response — a code and a message — that the server returns instead of a result. The model sees the error and can adjust: retry with different parameters, try another capability, or tell you it couldn’t complete the request.
- A required parameter is missing. A well-built server validates inputs and returns a clear error rather than guessing. Some servers use the elicitation capability to ask the user for the missing piece mid-task instead of failing outright.
- The source is unavailable. If the underlying system the server wraps is down or a credential has expired, the server surfaces that as an error rather than fabricating data. The honest failure is the point — the model isn’t left to hallucinate a result.
- The capability isn’t offered. Because the client discovered the menu in step 3, it shouldn’t call something that doesn’t exist. If offerings change mid-session, a list_changed notification prompts the client to re-discover.
The throughline is that failures are explicit and structured. The model gets a real error it can reason about, not silence — which is exactly what you want when the alternative is a confident-sounding guess.
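A host might turn each outcome into text the model can reason about, along these lines. The sketch covers two failure shapes: a JSON-RPC error object (code and message), and a tool result whose isError flag is set, which the protocol allows for failures that happen while a tool runs. The function name and the wording of the summaries are assumptions; -32602 is the standard JSON-RPC code for invalid parameters.

```python
def summarize_outcome(response: dict) -> str:
    """Reduce a response to a short string the model can act on."""
    if "error" in response:
        error = response["error"]
        return f"protocol error {error['code']}: {error['message']}"
    result = response["result"]
    text = " ".join(
        part["text"] for part in result.get("content", [])
        if part.get("type") == "text"
    )
    if result.get("isError"):
        return f"tool failed: {text}"
    return text

missing_parameter = {
    "jsonrpc": "2.0", "id": 7,
    "error": {"code": -32602, "message": "Missing required parameter: query"},
}
source_down = {
    "jsonrpc": "2.0", "id": 8,
    "result": {"content": [{"type": "text", "text": "Index unavailable: credential expired"}],
               "isError": True},
}
success = {
    "jsonrpc": "2.0", "id": 9,
    "result": {"content": [{"type": "text", "text": "src/utils/money.ts:2"}]},
}

for response in (missing_parameter, source_down, success):
    print(summarize_outcome(response))
```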
How does the model decide which capability to call?
Step 4 hides a lot of nuance worth unpacking. The model isn’t matching keywords; it’s reasoning over the descriptions the server provided during discovery.
Every tool a server exposes comes with a name, a human-readable description, and a typed input schema. The model reads those descriptions the way a person reads a menu, then maps your request to the best-fitting capability. This is why good MCP servers invest in clear, specific tool descriptions: a vague description like “do stuff with data” gives the model little to go on, while “search internal documents and return matching passages with their source paths” tells it exactly when this tool is the right choice.
The typed schema matters just as much. It constrains what the model can pass, so the request is well-formed before it ever reaches the server. The model fills the schema’s fields from your intent, and the structure keeps the call valid.
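To show how a schema constrains a call, here is a deliberately simplified checker. It handles only required fields and a few primitive types, so it is a stand-in for a full JSON Schema validator rather than a replacement for one; the search_code schema is the hypothetical one from the example server.

```python
# Map JSON Schema primitive type names to Python types (partial, by design).
TYPE_MAP = {"string": str, "integer": int, "number": (int, float), "boolean": bool}

def validate_arguments(schema: dict, arguments: dict) -> list[str]:
    """Return a list of problems; an empty list means the call is well-formed."""
    problems = []
    for name in schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required field: {name}")
    for name, value in arguments.items():
        spec = schema.get("properties", {}).get(name)
        if spec is None:
            continue  # extra fields are allowed unless the schema forbids them
        expected = TYPE_MAP.get(spec.get("type"))
        if expected is None:
            continue  # type not covered by this sketch
        # In Python, bool is a subclass of int, so reject it explicitly.
        bool_mismatch = isinstance(value, bool) and spec["type"] != "boolean"
        if bool_mismatch or not isinstance(value, expected):
            problems.append(f"wrong type for {name}: expected {spec['type']}")
    return problems

search_schema = {
    "type": "object",
    "properties": {"query": {"type": "string"}, "limit": {"type": "integer"}},
    "required": ["query"],
}

print(validate_arguments(search_schema, {"query": "formatCurrency"}))
print(validate_arguments(search_schema, {"limit": 10}))
print(validate_arguments(search_schema, {"query": 42, "limit": True}))
```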
What about multiple servers at once?
A single host commonly connects several servers — one client per server, each over its own connection. When you ask a question, the model sees the combined menu of every connected server’s capabilities and picks across all of them.
So a request might fan across servers in sequence: search a documents server, then call a calendar server to schedule a follow-up, all within one conversational turn. The agent loop in the host orchestrates the order; each server only ever sees the requests addressed to it. The isolation keeps things tidy — a slow or failing server affects only its own calls, not the others — while the model gets a unified set of options to reason over.
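One way a host could route calls across servers is a lookup from tool name to the client that owns it. This is a sketch of that idea, not a description of any particular host: the server names, tool names, and the choice to reject name collisions outright are all assumptions, and the clients are plain functions standing in for real per-server connections.

```python
def build_routes(menus: dict[str, list[str]]) -> dict[str, str]:
    """Merge per-server tool lists into one tool -> server lookup."""
    routes = {}
    for server, tools in menus.items():
        for tool in tools:
            if tool in routes:
                raise ValueError(f"tool name collision: {tool}")
            routes[tool] = server
    return routes

def dispatch(routes, clients, tool, arguments):
    """Send a call to the one client whose server offers the tool."""
    return clients[routes[tool]](tool, arguments)

# Per-server call logs, to show that each server sees only its own requests.
seen = {"docs": [], "calendar": []}

def make_client(server):
    def client(tool, arguments):
        seen[server].append(tool)
        return f"{server} handled {tool}"
    return client

menus = {"docs": ["search_documents"], "calendar": ["create_event"]}
clients = {server: make_client(server) for server in menus}
routes = build_routes(menus)

print(dispatch(routes, clients, "search_documents", {"query": "Q3 review"}))
print(dispatch(routes, clients, "create_event", {"title": "Follow-up"}))
print(seen)
```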
Common misunderstandings about the flow
“The AI scans everything before answering.” It doesn’t. It fetches only what it decides it needs, when it needs it. Most of your data is never touched for any given question.
“MCP uploads my files to the model provider.” No. The source stays where it lives; the server returns only the requested slice, and that slice enters the conversation the same way any context does. There’s no bulk transfer of your corpus.
“Discovery happens on every question.” Discovery (step 3) happens once per session, not per question. The model reuses the known menu and only re-discovers if a list_changed notification says the offerings changed.
“The server decides what to fetch.” The model decides; the server executes. The server has no agenda of its own — it answers the specific request it receives.
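The once-per-session point about discovery can be sketched as a small cache that is cleared only when a list_changed notification arrives. The class name and the fetch_tools callback are hypothetical; notifications/tools/list_changed is the protocol's notification for a changed tool list.

```python
class ToolMenuCache:
    """Hold the discovered tool list until the server says it changed."""

    def __init__(self, fetch_tools):
        self._fetch_tools = fetch_tools  # callable that performs tools/list
        self._tools = None

    def get(self):
        if self._tools is None:  # first use, or invalidated since last use
            self._tools = self._fetch_tools()
        return self._tools

    def on_notification(self, message: dict) -> None:
        if message.get("method") == "notifications/tools/list_changed":
            self._tools = None  # force re-discovery on the next get()

fetch_count = 0

def fake_fetch_tools():
    """Stand-in for a real tools/list round trip; counts how often it runs."""
    global fetch_count
    fetch_count += 1
    return ["search_code", "read_file"]

cache = ToolMenuCache(fake_fetch_tools)
cache.get()
cache.get()  # second question in the same session: no new discovery
fetches_before_change = fetch_count

cache.on_notification({"jsonrpc": "2.0", "method": "notifications/tools/list_changed"})
cache.get()  # offerings changed, so the menu is fetched again
print(fetches_before_change, fetch_count)
```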
Why this flow matters for a team
For one person, on-demand fetching is just convenient. For a team, it is the whole foundation of shared, grounded AI.
Company knowledge is scattered across documents, wikis, tickets and files. Run that same connect → negotiate → discover → request → fetch → answer loop against a server that fronts all of it, and every AI surface gets the same grounded answer to the same question — because they are all walking the same path to the same source. The hard part stops being plumbing and becomes scoping: serving the right slice, with permissions intact, rather than dumping everything. The natural next read is what an MCP server is, the layer that sits at the far end of step six.
FAQ
How does MCP work in simple terms? An AI tool opens a standard connection to a server, learns what the server offers, and requests data or actions on demand. The server fetches the result and returns it, and the model answers using that real content rather than a guess.
Does the AI pull all the data at once? No. MCP works on demand. The model requests only the specific resource or action it needs for the current question, so context stays scoped. This keeps answers focused and the interaction auditable.
What happens during the MCP handshake? At the start of a session, the client and server negotiate capabilities — each declares what it supports. This determines which features are available for the rest of the session and prevents either side from assuming a capability the other lacks.
What format do MCP messages use? JSON-RPC 2.0. Every request, response and notification is a JSON-RPC message. The transport — stdio for local servers or Streamable HTTP for networked ones — carries those messages between the tool and the server.
Does MCP move my data into the AI? No. The source stays where it lives. The server exposes a scoped interface and returns only the slice the model requests, rather than copying everything into the AI tool. That keeps access controlled and reviewable.
What happens if an MCP request fails? The server returns a structured JSON-RPC error — a code and a message — instead of a result. The model can then retry with different parameters, choose another capability, or tell you it couldn’t complete the request. Failures are explicit, not silent.
How does the model know which tool to use? During discovery, each tool arrives with a name, a description, and a typed input schema. The model reads those descriptions and maps your request to the best-fitting capability, then fills the schema’s fields. Clear tool descriptions make this selection more reliable.