Field Guide

How AI Reads Your Page: The Raw HTML, No JavaScript

By Jerome Bilaos · Technical Web Architect · Updated June 2026

Your page looks finished in your browser. There is a quieter version of it, the one most AI crawlers actually read, and the two are often not the same page at all. This tool shows you that quieter version by reading it the way a typical large-language-model crawler does: it fetches your raw HTML and never runs your JavaScript.

That last part is the whole point.
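In code terms, "never runs your JavaScript" means the fetch is a plain HTTP GET whose body is treated as text. Below is a minimal Python sketch of that step, using only the standard library. It illustrates the idea and is not the tool's actual fetcher. The User-Agent string is a made-up placeholder, and the function name is hypothetical.

```python
import urllib.request

# Hypothetical User-Agent string; real AI crawlers each send their own.
USER_AGENT = "example-raw-html-reader/0.1"


def fetch_raw_html(url: str, timeout: float = 10.0) -> str:
    """Return the page's raw HTML exactly as the server sends it.

    Nothing here parses or executes JavaScript: the response body is just
    bytes decoded to text, so client-rendered content never appears.
    """
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
    return body.decode(charset, errors="replace")
```

Calling fetch_raw_html("https://example.com/") returns the markup as a string. Any <script> tags come back as inert text, which is exactly what a no-JavaScript crawler gets.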

What it actually does

When you paste a URL, the tool requests the page and gets back the raw HTML — the bytes your server sends before any framework hydrates anything. Then it processes that HTML the way a text-extracting crawler would:

Strips <script>, <style>, <noscript>, and HTML comments out entirely
Treats block tags (<p>, <div>, <li>, <h1>–<h6>, <br>, <tr>, <section> and the rest) as word separators, so text doesn't fuse together
Removes the remaining tags, decodes HTML entities, and collapses whitespace
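Those three steps can be approximated in a few lines. This is a minimal regex-based sketch in Python, not the tool's own code. The list of block tags is partial and chosen for illustration, and a production extractor would more likely use a real HTML parser.

```python
import html
import re

# Elements dropped entirely, together with everything inside them.
DROP_ELEMENTS = re.compile(
    r"<(script|style|noscript)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)
COMMENTS = re.compile(r"<!--.*?-->", re.DOTALL)
# Block-level tags treated as word separators (a partial list, for illustration).
BLOCK_TAGS = re.compile(
    r"</?(p|div|li|ul|ol|h[1-6]|br|tr|td|th|table|section|article|header|"
    r"footer|nav|main|aside|blockquote)\b[^>]*>",
    re.IGNORECASE,
)
ANY_TAG = re.compile(r"<[^>]+>")
WHITESPACE = re.compile(r"\s+")


def readable_text(raw_html: str) -> str:
    """Approximate the plain text a no-JavaScript crawler extracts."""
    text = COMMENTS.sub(" ", raw_html)       # remove HTML comments
    text = DROP_ELEMENTS.sub(" ", text)      # remove script/style/noscript blocks
    text = BLOCK_TAGS.sub(" ", text)         # block tags become spaces, so words don't fuse
    text = ANY_TAG.sub("", text)             # inline tags vanish without a separator
    text = html.unescape(text)               # &amp; -> &, &nbsp; -> non-breaking space
    return WHITESPACE.sub(" ", text).strip() # collapse all runs of whitespace
```

The separator rule is the detail that matters. "<p>a</p><p>b</p>" comes out as "a b", while inline markup such as "<span>a</span><span>b</span>" stays "ab", just as it reads on screen.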

What's left is the plain text an AI sees. The tool reports the page title it extracted, the word count of that readable text, and a preview of the text itself (up to the first 1,800 characters). No JavaScript is executed at any point. If your content only appears after React, Vue, or Next.js hydrates the page, it will not be in what you see here.
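The reporting step is simple arithmetic on that text. Below is a minimal sketch. It assumes that a "word" is any whitespace-separated token and that the title comes from the first <title> element. build_report and PageReport are hypothetical names.

```python
import html
import re
from dataclasses import dataclass
from typing import Optional

PREVIEW_CHARS = 1800  # preview length stated in the text
TITLE = re.compile(r"<title\b[^>]*>(.*?)</title\s*>", re.IGNORECASE | re.DOTALL)


@dataclass
class PageReport:
    title: Optional[str]  # None when the raw HTML has no usable <title>
    word_count: int
    preview: str


def build_report(raw_html: str, readable: str) -> PageReport:
    """Summarize a page from its raw HTML and its already-extracted text."""
    match = TITLE.search(raw_html)
    title = None
    if match:
        # Decode entities and collapse internal whitespace; empty titles become None.
        title = " ".join(html.unescape(match.group(1)).split()) or None
    # Assumption: a "word" is any whitespace-separated token.
    word_count = len(readable.split())
    return PageReport(title=title, word_count=word_count, preview=readable[:PREVIEW_CHARS])
```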

Why this is different from "view source" or a Lighthouse run

You could open "view source" yourself, but you'd be reading raw markup (nav, inline scripts, class soup) rather than the extracted text. Lighthouse and PageSpeed Insights will happily give a content-rich page a green score while saying nothing about whether a no-JavaScript crawler can read a word of it. Those tools measure speed and rendered output, and they get that output by rendering the page in a real browser engine. AI crawlers, by and large, do not.

That gap is the thing nobody shows you. Google can run JavaScript in a second, slower rendering pass, though that pass is often delayed and not guaranteed for every page. Most AI assistants skip that step entirely. So a single-page app that looks perfect to you, and that Google eventually indexes, can read as a near-blank shell to ChatGPT, Claude, or Perplexity right now. This tool surfaces exactly that mismatch in one fetch.

The flags, and the honesty behind them

Beyond the raw text, the tool raises specific, evidence-based flags:

js-rendered: raised only when it's genuinely warranted. It does not shout "JavaScript problem" just because it sees a framework. The flag fires only when the readable text is thin (under ~200 words) and SPA markers are present: an empty <div id="root"></div>, a __NEXT_DATA__ blob, data-server-rendered, ng-version, and similar. A heavy React site that server-renders its content reads fine and won't be flagged. That combined test is deliberate: I didn't want false alarms on sites that use JavaScript correctly.
thin-content: fewer than ~200 words in the raw HTML, reported with the exact count.
noscript: counts <noscript> blocks, a sign the page leans on JavaScript. The accompanying warning: make sure your real content lives in the HTML, not only inside a <noscript> fallback.
no-h1 and no-title: a missing primary heading (<h1>) or <title> tag, the labels AI and search engines use to understand what a page is about.
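The combined js-rendered test can be sketched as follows. The marker patterns are illustrative approximations of the ones listed above. The extra "app" mount id and the exact 200-word cutoff are assumptions. spa_markers and is_js_rendered are hypothetical names, not the tool's API.

```python
import re

THIN_WORDS = 200  # assumed exact cutoff for "under ~200 words"

# Illustrative patterns for the SPA markers named above; not exhaustive.
SPA_MARKERS = {
    # An empty mount point such as <div id="root"></div> ("app" is an assumed extra).
    "empty-root": re.compile(
        r"""<div\b[^>]*\sid\s*=\s*["']?(?:root|app)["']?(?=[\s>])[^>]*>\s*</div>""",
        re.IGNORECASE,
    ),
    "next-data": re.compile(r"__NEXT_DATA__"),
    "data-server-rendered": re.compile(r"\bdata-server-rendered\b", re.IGNORECASE),
    "ng-version": re.compile(r"\bng-version\b", re.IGNORECASE),
}


def spa_markers(raw_html: str) -> list[str]:
    """Names of the SPA markers found in the raw HTML."""
    return [name for name, pattern in SPA_MARKERS.items() if pattern.search(raw_html)]


def is_js_rendered(raw_html: str, word_count: int) -> bool:
    """Flag only when BOTH hold: thin readable text and at least one SPA marker."""
    return word_count < THIN_WORDS and bool(spa_markers(raw_html))
```

Requiring both conditions is what keeps a server-rendered React page quiet. It may carry the same markers, but its word count clears the threshold.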

If none of these fire, the tool says so: no major extractability problems detected.
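The remaining flags, and the all-clear message, reduce to a handful of checks. This minimal sketch assumes the same 200-word cutoff and uses simple pattern checks against the raw HTML, so an <h1> inside a comment or script string would still count. The flag wording and function names are hypothetical.

```python
import re

THIN_WORDS = 200  # assumed cutoff for "~200 words"
NOSCRIPT = re.compile(r"<noscript\b", re.IGNORECASE)
H1 = re.compile(r"<h1\b", re.IGNORECASE)
# A <title> whose content starts with a non-space, non-tag character (i.e. not empty).
TITLE = re.compile(r"<title\b[^>]*>\s*[^<\s]", re.IGNORECASE)


def extractability_flags(raw_html: str, word_count: int) -> list[str]:
    """Return thin-content, noscript, no-h1 and no-title flags, in that order."""
    flags = []
    if word_count < THIN_WORDS:
        flags.append(f"thin-content: {word_count} words in the raw HTML")
    noscript_blocks = len(NOSCRIPT.findall(raw_html))
    if noscript_blocks:
        flags.append(f"noscript: {noscript_blocks} block(s); keep real content outside them")
    if not H1.search(raw_html):
        flags.append("no-h1")
    if not TITLE.search(raw_html):
        flags.append("no-title")
    return flags


def summary(flags: list[str]) -> str:
    """Join the flags, or report the all-clear when none fired."""
    return "\n".join(flags) if flags else "No major extractability problems detected."
```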

What it does not do

It reads one page, the URL you paste — not your whole site.
It does not run JavaScript, so it cannot tell you what your page looks like after hydration. That's by design — it's mirroring the crawler, not your browser.
It does not score quotability, schema, or authority. Whether your readable content is good enough to be cited is a separate question (the Answer-Readiness Checker covers that side).
A clean result here means the content is present for AI to read. It doesn't promise the content is persuasive.

What you'll see when you run it

At the top: the host and the word count AI can read, with a "(JS-rendered)" tag appended if the combined thin-plus-SPA test tripped. Below that, the title the crawler extracted and a stack of flags, if any. Then the readable-text box: the actual sentences a crawler pulls from your page.
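The header line is plain string formatting. This small sketch assumes a "host: N words" layout, which may differ from what the tool actually renders. header_line is a hypothetical name.

```python
from urllib.parse import urlsplit


def header_line(url: str, word_count: int, js_rendered: bool) -> str:
    """Top-of-report line: host, readable word count, optional JS-rendered tag."""
    host = urlsplit(url).hostname or url  # fall back to the raw input if no host parses
    line = f"{host}: {word_count:,} words AI can read"
    if js_rendered:
        line += " (JS-rendered)"
    return line
```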

A healthy result looks busy. The word count roughly matches the page you wrote, your real headline is the title, and the text box contains your offer, your answers, your proof. A worrying result is thin or empty — a tiny word count, a missing title, and a text box showing only your menu and a logo. That's the signature of content injected by JavaScript after load.

Who should run this

Run it on any important page right after a redesign, a platform migration, or a move to a new front-end framework — those are the exact moments content silently vanishes from the HTML. If you run an SPA, run it on your money pages specifically; that's where the JS-rendering trap bites hardest.

Open How AI Reads Your Page and check your highest-value URL first. To scan every key page for AI readability at once rather than one at a time, use the full AUDXY audit.