Quick Answer: Schema markup for AI search does a different job than rich-snippet markup. It does not push a page up a ranked list; it removes ambiguity for the model. Effective schema in 2026 does three things: it disambiguates your entity so an AI knows who and what the content is about, it structures claims so they can be extracted as standalone facts, and it reinforces author and publisher trust signals. Identity, extractability, and trust, not rich-result eligibility, are what now justify the markup.
When a developer asks whether schema markup "still matters," the honest answer is that the question is dated. Schema mattered for one decade-long reason: it earned eligibility for enhanced search results. That payoff has shrunk. Google has restricted FAQ rich results to a narrow set of authoritative health and government sites and retired HowTo rich results altogether, so the visible reward for marking up a page is smaller every year.
Meanwhile a second payoff appeared, and almost no one re-architected their structured data to claim it. AI answer engines — ChatGPT, Perplexity, Google's AI Overviews, Claude — do not browse a ranked list. They identify entities, extract claims, and synthesize answers. Schema is one of the cleanest inputs to all three steps. The teams winning AI citations are not the ones with the most markup. They are the ones whose markup answers three machine questions correctly: who is this, what does it claim, and can I trust it?
Rich Snippets Were a Google Game. AI Citation Is a Different One.
For ten years, structured data had a single, measurable purpose: enhanced appearance in Google's ranked results. You marked up a recipe to get the cook-time card. You marked up a product to get the price and star rating. You marked up an FAQ to get the expandable accordion under your listing. The markup was a means to a visual end, and the end lived inside one search engine's results page.
That contract has weakened from both sides. Google has reduced rich-result eligibility — FAQ rich results are now limited to a narrow set of authoritative health and government sites, and HowTo results were deprecated outright. At the same time, an increasing share of branded and informational queries never reach a results page at all. They are answered inside an AI interface that quotes a handful of sources and moves on.
This is the reframe that matters. Schema is no longer primarily a formatting request to a search engine. It is a legibility layer for machines that read your page to construct an answer. The same JSON-LD block now serves a different reader with different priorities. Google asked: are you eligible for this visual treatment? An answer engine asks: is this content unambiguous enough to quote without getting it wrong?
That shift changes which markup earns its place. Some schema that never drove a rich result is now high-value because it grounds your entity. Some schema that once won a snippet is now noise. Sorting one from the other requires understanding what these systems actually read.
What LLMs Actually Read From Your Markup
Large language models do not execute your structured data as instructions. They do not see "ratingValue": 4.8 inside an aggregateRating block and obediently believe your product is excellent. Treating schema as a command channel is the single most common misconception, and it leads teams to over-mark and under-think.
What schema does is reduce ambiguity. Natural-language pages are full of it: a name that could refer to three different entities, a claim buried in a paragraph that a model might attribute to the wrong subject, a publication whose authorship is implied but never stated. Structured data resolves these ambiguities into explicit, machine-readable assertions. That resolution is the value.
Three mechanisms do the work. The first is entity grounding. An @id graph plus sameAs links connect the subject of your page to a canonical identity the model already recognizes — a Wikidata record, a LinkedIn profile, an official domain. Grounding is what lets a system confidently say "this page is about the Fernando Angulo who works in search, not a different person with the same name."
The second is claim extraction. Schema types like FAQPage, HowTo, Article, and DefinedTerm present information as discrete, self-contained units — exactly the shape an answer engine wants to lift and quote. A claim wrapped in a clean question-and-answer pair is dramatically easier to extract correctly than the same claim embedded mid-paragraph in narrative prose.
The third is trust attribution. author, publisher, dateModified, and credential properties tell a retrieval system who stands behind a claim and how current it is. For systems that weight source authority — and the major ones do — these properties feed the same trust calculus that decides whether your source makes the citation shortlist. This is why entity and author schema, covered in my piece on relevance engineering, has quietly become more important than any rich-result markup.
The core principle: Schema does not make your content more authoritative. It makes your content less ambiguous — which raises the probability an AI system identifies it correctly, extracts it cleanly, and attributes it to the right trusted source. Disambiguation, not persuasion, is the entire job.
The 4 Schema Tiers for AI Citation
Not all structured data carries equal weight in an AI-search context. The model below sorts schema into four tiers by what it actually contributes to citation. Work top-down: a brand with Tier 1 and Tier 2 solid will outperform a brand with a sprawling, untiered graph of everything. The fourth tier is the one to stop shipping.
Tier 1 — Identity (the foundation)
Identity schema answers who and what is this? It is the highest-leverage tier because every downstream judgment depends on the model correctly resolving your entity.
What to ship: Organization and Person types, each with a stable @id reused identically across every page. Add sameAs links to your authoritative external records — Wikidata first, then LinkedIn, Crunchbase, official social profiles. Wire author and brand together so the model sees a connected identity graph, not isolated fragments. If your organization or key people have Wikidata items, referencing those QIDs through sameAs is among the strongest grounding signals available.
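As a rough illustration of that identity graph, here is a minimal Python sketch that emits one connected Organization and Person pair. Everything specific in it is an assumption for the example: example.com, the names, the LinkedIn paths, and the Wikidata placeholder are hypothetical, and the function name identity_graph is mine.

```python
import json

# Hypothetical identifiers: swap in your own canonical URLs.
ORG_ID = "https://example.com/#organization"
AUTHOR_ID = "https://example.com/about/jane-doe#person"


def identity_graph() -> dict:
    """Build one connected Organization + Person graph with stable @id values."""
    organization = {
        "@type": "Organization",
        "@id": ORG_ID,
        "name": "Example Co",
        "url": "https://example.com/",
        "sameAs": [
            "https://www.wikidata.org/wiki/Q_PLACEHOLDER",  # placeholder, not a real QID
            "https://www.linkedin.com/company/example-co",
        ],
    }
    person = {
        "@type": "Person",
        "@id": AUTHOR_ID,
        "name": "Jane Doe",
        # Point at the organization by @id instead of redefining it inline.
        "worksFor": {"@id": ORG_ID},
        "sameAs": ["https://www.linkedin.com/in/jane-doe-example"],
    }
    return {"@context": "https://schema.org", "@graph": [organization, person]}


print(json.dumps(identity_graph(), indent=2))
```

The design choice worth copying is the reference by @id: every page that mentions the author or the brand points at the same two identifiers rather than restating a slightly different entity each time.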
Tier 2 — Answer (the extraction layer)
Answer schema answers what does this page claim, in units I can quote? This is where you make your knowledge liftable.
What to ship: Article with a precise headline, description, and about; FAQPage where the questions are real and the answers stand alone; DefinedTerm for the concepts you want to own; and a speakable specification pointing at your sharpest summary block. The goal is that any single claim can be extracted without the surrounding paragraph for context. If an answer only makes sense in its original position on the page, it is not yet extractable.
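A sketch of what that extraction layer can look like, again in Python and again under assumptions: the .quick-answer CSS class, the function name answer_markup, and the sample strings are hypothetical. The property names (mainEntity, acceptedAnswer, speakable, cssSelector) are standard schema.org vocabulary.

```python
import json


def answer_markup(headline: str, summary: str, topic: str, faq_pairs: list) -> dict:
    """Wrap a page's key claims as an Article plus a FAQPage in one graph."""
    article = {
        "@type": "Article",
        "headline": headline,
        "description": summary,
        "about": {"@type": "DefinedTerm", "name": topic},
        "speakable": {
            "@type": "SpeakableSpecification",
            "cssSelector": [".quick-answer"],  # hypothetical class on the summary block
        },
    }
    faq = {
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                # Each answer should make sense without the surrounding page.
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in faq_pairs
        ],
    }
    return {"@context": "https://schema.org", "@graph": [article, faq]}


markup = answer_markup(
    headline="Schema markup for AI search",
    summary="Schema reduces ambiguity so answer engines can identify and quote a page.",
    topic="Entity grounding",
    faq_pairs=[
        (
            "What is entity grounding?",
            "Entity grounding connects the subject of a page to a canonical "
            "identity through @id and sameAs links.",
        )
    ],
)
print(json.dumps(markup, indent=2))
```

A quick self-check before shipping: read each "text" value on its own. If it needs the paragraph above it to make sense, rewrite the answer, not the markup.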
Tier 3 — Authority (the trust layer)
Authority schema answers why should I trust and prefer this source? It rarely produces a visible result, which is exactly why most teams under-invest in it.
What to ship: a fully specified author with credentials and a linked Person @id; a clear publisher; honest datePublished and dateModified values maintained as content changes; and citation or isBasedOn properties when your claims rest on documented research. These signals map directly onto the source-quality judgments retrieval systems make. Stale or missing dates and anonymous authorship are quiet disqualifiers.
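One way to keep those trust signals honest is to generate them from real dates instead of typing them by hand. The sketch below assumes a hypothetical author, publisher @id, and citation URL, and the function name article_with_authority is mine; the only logic it adds is refusing a dateModified earlier than datePublished.

```python
import json
from datetime import date


def article_with_authority(headline: str, published: date, modified: date) -> dict:
    """Attach author, publisher, date, and citation signals to an Article."""
    if modified < published:
        raise ValueError("dateModified cannot be earlier than datePublished")
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
        "author": {
            "@type": "Person",
            "@id": "https://example.com/about/jane-doe#person",  # hypothetical
            "name": "Jane Doe",
            "jobTitle": "Head of Research",  # hypothetical credential signal
        },
        # Reuse the site-wide Organization @id rather than redefining it.
        "publisher": {"@id": "https://example.com/#organization"},
        # Placeholder URL standing in for the research the claims rest on.
        "citation": ["https://example.org/reports/placeholder-study"],
    }


article = article_with_authority(
    "Schema markup for AI search", date(2026, 1, 5), date(2026, 2, 1)
)
print(json.dumps(article, indent=2))
```

In a real pipeline the two dates would come from the CMS revision history, so dateModified only moves when the content actually changes.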
Tier 4 — Noise (stop shipping this)
Noise is markup that consumes maintenance budget, adds risk, and returns nothing in an AI-search context. What to cut or never add: self-serving aggregateRating and Review markup that a model will not treat as credible; BreadcrumbList spam on pages with no real hierarchy; irrelevant Product or Offer types bolted onto editorial content to chase a snippet; and keyword-stuffed FAQ entries written for a crawler rather than a reader. Every one of these either fails to help or trips a quality filter. More schema types is not a strategy. It is a liability disguised as thoroughness.
| Tier | Question it answers | What to ship (or cut) | Why it matters for AI |
|---|---|---|---|
| Tier 1: Identity | Who and what is this? | Organization and Person with a stable @id and sameAs links to Wikidata, LinkedIn, Crunchbase | Grounds your entity so the model resolves it correctly; every downstream judgment depends on it |
| Tier 2: Answer | What does this page claim, in units I can quote? | Article, FAQPage, DefinedTerm, and a speakable spec on your sharpest summary | Makes individual claims extractable as standalone facts an answer engine can lift |
| Tier 3: Authority | Why should I trust and prefer this source? | Fully specified author with credentials, clear publisher, honest datePublished/dateModified, citation | Feeds the source-quality calculus retrieval systems use to build the citation shortlist |
| Tier 4: Noise | (Nothing; stop shipping it) | Cut self-serving aggregateRating/Review, BreadcrumbList spam, irrelevant Product/Offer, keyword-stuffed FAQ entries | Returns nothing for citation and can trip quality filters; a liability disguised as thoroughness |
The Schema Most Teams Ship That AI Ignores
Here is the uncomfortable part, and it is worth stating plainly because it contradicts a decade of received practice: a large share of the structured data on most enterprise sites does nothing for AI citation, and a portion of it is mild self-harm.
The clearest example is review and rating markup applied to a brand's own pages. It was always a rich-result play, and answer engines have no reason to ingest a company's self-reported five-star average as a trustworthy fact. Marking it up does not help; if it overstates reality relative to independent sources, it adds a small trust mismatch the system can detect.
The second example is breadcrumb and navigational markup on flat or shallow sites. It made sense when it produced a tidy breadcrumb in a Google listing. On a page with no genuine hierarchy, it is structure for structure's sake — parsing cost with no extraction value.
The honest counterpoint: none of this means you should rip schema out tomorrow. Some of it still serves traditional search, internal site logic, or accessibility tooling, and those are legitimate reasons to keep it. The point is narrower. When you are deciding where to spend engineering time for AI visibility, Tier 4 markup is not where the return lives. The return lives in getting Tier 1 identity grounding genuinely right — which most sites have not done.
What This Means for Your Dev Backlog
For a developer or technical SEO lead, the practical translation is a re-prioritization, not a rewrite. The instinct to "add more schema" is the wrong instinct. The right move is to audit what you have against the four tiers and reallocate.
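A first pass at that audit can be automated. The sketch below walks a parsed JSON-LD document, collects every @type, and buckets it by tier. The mapping is my reading of the four-tier model and is an assumption, not a standard: Tier 3 is about properties rather than types, so it does not appear here, and a Tier 4 hit only means "review this", since Product markup on a genuine product page is legitimate.

```python
from collections import defaultdict

# Assumed mapping of schema.org types to the tiers described in this article.
TIER_BY_TYPE = {
    "Organization": "Tier 1: Identity",
    "Person": "Tier 1: Identity",
    "Article": "Tier 2: Answer",
    "FAQPage": "Tier 2: Answer",
    "DefinedTerm": "Tier 2: Answer",
    "AggregateRating": "Tier 4: review for noise",
    "Review": "Tier 4: review for noise",
    "BreadcrumbList": "Tier 4: review for noise",
    "Product": "Tier 4: review for noise",
    "Offer": "Tier 4: review for noise",
}


def iter_types(node):
    """Yield every @type value found anywhere in a JSON-LD structure."""
    if isinstance(node, dict):
        declared = node.get("@type", [])
        if isinstance(declared, str):
            declared = [declared]
        yield from declared
        for value in node.values():
            yield from iter_types(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_types(item)


def audit(jsonld) -> dict:
    """Group the types used in a document by tier, sorted for stable output."""
    buckets = defaultdict(set)
    for type_name in iter_types(jsonld):
        buckets[TIER_BY_TYPE.get(type_name, "Unclassified")].add(type_name)
    return {tier: sorted(types) for tier, types in buckets.items()}


sample = {
    "@context": "https://schema.org",
    "@graph": [
        {"@type": "Organization", "name": "Example Co"},
        {
            "@type": "Article",
            "author": {"@type": "Person", "name": "Jane Doe"},
            "aggregateRating": {"@type": "AggregateRating", "ratingValue": 4.8},
        },
        {"@type": "BreadcrumbList", "itemListElement": []},
    ],
}
for tier, types in sorted(audit(sample).items()):
    print(tier, types)
```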
Start with one question your current markup probably cannot answer: if an AI system landed on your most important page, could it unambiguously identify the entity behind it and connect that entity to an authoritative external record? For most sites the answer is no — there is an Organization block with a name and logo and nothing that grounds it. Fixing that is a half-day of work with a disproportionate payoff.
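That question can be turned into a crude check. The sketch below treats an entity as grounded when it has an @id and at least one sameAs URL on a host other than your own. Both the rule and the function name is_grounded are assumptions for illustration; it says nothing about whether the external record is actually authoritative.

```python
from urllib.parse import urlparse


def is_grounded(node: dict, own_host: str) -> bool:
    """Return True if the entity has an @id and an external sameAs link."""
    if not node.get("@id"):
        return False
    same_as = node.get("sameAs", [])
    if isinstance(same_as, str):
        same_as = [same_as]
    for url in same_as:
        host = urlparse(url).hostname
        if host is None:
            continue  # not an absolute URL, ignore it
        # Links back to your own domain or its subdomains do not ground anything.
        if host == own_host or host.endswith("." + own_host):
            continue
        return True
    return False


# The typical block described above: a name and a logo, nothing else.
bare = {"@type": "Organization", "name": "Example Co", "logo": "https://example.com/logo.png"}

grounded = {
    "@type": "Organization",
    "@id": "https://example.com/#organization",
    "name": "Example Co",
    "sameAs": ["https://www.wikidata.org/wiki/Q_PLACEHOLDER"],  # placeholder QID
}

print(is_grounded(bare, "example.com"), is_grounded(grounded, "example.com"))
```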
From there, the sequence is straightforward. Make every important claim extractable through Tier 2 answer schema and the FAQ structure that AI engines quote most readily. Specify author and publisher trust signals fully, and keep dateModified honest. Then audit Tier 4 and stop spending maintenance cycles on markup that returns nothing. The structural choices that govern AI visibility extend well beyond schema into how you write the content itself — but schema is the layer where a small, precise change moves the needle fastest.
The Open Question
Structured data is one of the few AI-visibility levers a technical team can pull unilaterally, without a content overhaul or a budget fight. That makes it tempting to treat as a checkbox — validate the JSON-LD, pass the test, move on. The teams that will benefit most are the ones that treat it instead as an identity decision: a deliberate statement of who the brand is, what it claims, and which authoritative records vouch for it.
The validators will tell you your schema is correct. They will not tell you whether it is legible — whether an AI system reading it comes away knowing exactly who you are and what you reliably claim. That is the standard worth holding your markup to.
So the question for your next sprint is narrow and answerable: if an answer engine read only your structured data, would it know which entity you are — and would it have a reason to trust you over the competitor one paragraph away in the same generated answer?
Frequently Asked Questions
Does schema markup directly improve AI citations?
Not directly, and not in the way meta keywords once promised. Large language models do not read your JSON-LD as a ranking instruction. Schema helps indirectly: it disambiguates which entity your content is about, makes individual claims cleanly extractable, and reinforces author and publisher trust signals that retrieval systems and answer engines weight. Think of schema as removing ambiguity, not injecting authority. It raises the probability your content is understood correctly; it does not force a citation.
Should I use Organization schema or Person schema?
Use both, linked by a shared @id graph. Organization schema establishes the brand as an entity; Person schema establishes the author or expert behind the content. The critical move is consistency: the same @id for the same entity across every page, with sameAs links to authoritative external profiles like Wikidata, LinkedIn, and Crunchbase. A single canonical, consistently referenced entity is far more legible to an AI system than a brand whose identity is implied differently on every page.
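Consistency is easy to assert and easy to break, so it is worth checking mechanically. This sketch assumes you have already extracted the top-level JSON-LD nodes per page into a dictionary; the page URLs, entity names, and the function name inconsistent_ids are hypothetical.

```python
from collections import defaultdict


def inconsistent_ids(pages: dict) -> dict:
    """Map each entity name to its @id values when more than one is in use."""
    ids_by_name = defaultdict(set)
    for nodes in pages.values():
        for node in nodes:
            if node.get("@type") in ("Organization", "Person") and "name" in node:
                ids_by_name[node["name"]].add(node.get("@id", "<missing>"))
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}


pages = {
    "https://example.com/": [
        {"@type": "Organization", "@id": "https://example.com/#organization", "name": "Example Co"},
    ],
    "https://example.com/blog/post": [
        # Same brand, different identifier: the drift this check is meant to catch.
        {"@type": "Organization", "@id": "https://example.com/#org", "name": "Example Co"},
        {"@type": "Person", "@id": "https://example.com/about/jane-doe#person", "name": "Jane Doe"},
    ],
}
print(inconsistent_ids(pages))
```

Matching on name is a simplification: two different people can share a name, so treat a hit as a prompt to look, not as proof of an error.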
Is FAQPage schema still worth using?
Yes, but for a different reason than in 2020. Google reduced FAQ rich results in standard search, which led some teams to abandon the markup. For AI answer engines, FAQPage schema remains valuable because it presents content in the exact shape these systems prefer to extract: a discrete question paired with a self-contained answer. The value now is extraction legibility, not a rich snippet. Keep it where the questions are real and the answers stand alone, a point I cover in depth in the FAQ-as-GEO-asset piece.
Which structured data format should I use?
Use JSON-LD. It is the format Google recommends, it is the cleanest for machines to parse, and it keeps structured data separate from your visible markup, which reduces errors. Microdata and RDFa still validate, but they entangle structure with presentation and are more fragile to maintain. For new implementations there is no practical reason to choose anything other than JSON-LD.
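For completeness, a minimal sketch of how a JSON-LD block ends up in a page: serialized and wrapped in a script tag of type application/ld+json. The function name jsonld_script is mine. The one non-obvious step is escaping "</" so that a string containing a closing script tag cannot end the block early; "\/" is a valid JSON escape, so the data round-trips unchanged.

```python
import json


def jsonld_script(data: dict) -> str:
    """Serialize a dict as a JSON-LD script tag that is safe to embed in HTML."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # Prevent any "</script>" inside a string value from closing the tag.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


data = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Why </script> must be escaped",
}
html = jsonld_script(data)
print(html)
```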
What does the sameAs property do?
The sameAs property connects your entity to its canonical records elsewhere: Wikidata, Wikipedia, LinkedIn, official social profiles. This is entity grounding: it tells an AI system that the entity on this page is the same documented entity it already knows from authoritative knowledge bases. Without grounding, a model may confuse your entity with a similarly named one or fail to connect your content to an established identity. A Wikidata QID referenced through sameAs is one of the strongest grounding signals available.
Can schema markup hurt your AI visibility?
It can. Schema that misrepresents page content, marks up invisible or fabricated information, or stuffs irrelevant types is a trust liability. Search engines already penalize spammy structured data, and the same low-quality signals that erode search trust erode the source authority AI systems depend on. More markup is not better. Accurate, content-matching, minimal schema beats a sprawling graph of types that do not reflect what is actually on the page.