Methodology · v1.0

The transparent GEO scoring rubric

Every score a Juma audit produces comes from the rubric on this page — six weighted dimensions, every sub-check documented, every weight versioned. Other GEO tools hide their methodology. We publish ours.


How the overall score is calculated

Each of the six dimensions produces a 0–100 sub-score. The overall score is a weighted average, rounded to the nearest integer.

| Dimension | Weight |
| --- | --- |
| AI Crawler Access | 20% |
| Citability | 25% |
| Schema Markup | 15% |
| Technical SEO | 15% |
| Content Authority | 15% |
| Brand Presence | 10% |
| Overall score | 100% |

Weights reflect what most directly predicts AI citation today. Crawler Access is nearly binary — a block there effectively zeroes out everything else — but Citability carries the highest weight because it is the most actionable lever for content teams.
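In code, the weighted average reduces to a few lines. A TypeScript sketch of the calculation (the weights come from the table above; the function and key names are illustrative, not the analyzer's actual API):

```typescript
// Dimension weights from the rubric above (they sum to 1.0).
const WEIGHTS: Record<string, number> = {
  crawlerAccess: 0.20,
  citability: 0.25,
  schemaMarkup: 0.15,
  technicalSeo: 0.15,
  contentAuthority: 0.15,
  brandPresence: 0.10,
};

// Each dimension contributes its 0–100 sub-score times its weight;
// the overall score is rounded to the nearest integer.
function overallScore(subScores: Record<string, number>): number {
  let total = 0;
  for (const [dimension, weight] of Object.entries(WEIGHTS)) {
    total += (subScores[dimension] ?? 0) * weight;
  }
  return Math.round(total);
}
```

A site scoring 100/80/60/70/50/40 across the six dimensions, in table order, comes out at 71 overall.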

Score bands

| Band | Score range |
| --- | --- |
| Excellent | 80–100 |
| Good | 60–79 |
| Needs Work | 40–59 |
| Critical | 0–39 |
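The band boundaries are inclusive at the bottom of each range, so the mapping is a simple threshold check (a sketch; the function name is illustrative):

```typescript
// Map an overall score (0–100) to its published band.
function scoreBand(score: number): string {
  if (score >= 80) return "Excellent";
  if (score >= 60) return "Good";
  if (score >= 40) return "Needs Work";
  return "Critical";
}
```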

Dimensions, in detail

AI Crawler Access

20% weight

Whether AI search engines are actually allowed to read your site.

If GPTBot, PerplexityBot, ClaudeBot, or Google-Extended are disallowed in robots.txt, the other five dimensions don't matter — your content will never reach the index that answers user prompts.

Data source — Fetches /robots.txt directly; parses user-agent blocks and Disallow/Allow directives with wildcard fallback.

| Sub-check | Points | Criteria |
| --- | --- | --- |
| GPTBot (OpenAI / ChatGPT) | 20 | Allowed in robots.txt |
| ChatGPT-User (live browsing) | 20 | Allowed in robots.txt |
| PerplexityBot | 20 | Allowed in robots.txt |
| ClaudeBot (Anthropic) | 20 | Allowed in robots.txt |
| Google-Extended (Gemini / AI Overviews) | 20 | Allowed in robots.txt |
| Googlebot (baseline crawler) | 5 | Allowed in robots.txt |
| Bingbot (baseline crawler) | 5 | Allowed in robots.txt |

Sub-scores are capped at 100 before the overall weight is applied.
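The raw points above sum to 110, which is why the cap matters for this dimension. A simplified TypeScript sketch of the scoring step, assuming the robots.txt parsing has already produced a set of blocked bots (real parsing must also honor `Allow` overrides and longest-match precedence):

```typescript
// Points per crawler sub-check, from the table above.
const CRAWLER_POINTS: Record<string, number> = {
  "GPTBot": 20,
  "ChatGPT-User": 20,
  "PerplexityBot": 20,
  "ClaudeBot": 20,
  "Google-Extended": 20,
  "Googlebot": 5,
  "Bingbot": 5,
};

// A bot counts as blocked if its user-agent group (or the `*` wildcard
// group, as a fallback) disallows the site root. That determination is
// assumed to have happened upstream; this function only tallies points.
function crawlerAccessScore(blockedBots: Set<string>): number {
  let score = 0;
  for (const [bot, points] of Object.entries(CRAWLER_POINTS)) {
    if (!blockedBots.has(bot)) score += points;
  }
  return Math.min(score, 100); // cap before the 20% weight is applied
}
```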

Citability

25% weight

How easily an AI model can quote a coherent answer from your page.

LLMs preferentially cite content that is scannable, hierarchically structured, and leads with a substantive answer. Thin pages or walls of unstructured text get paraphrased away.

Data source — Cheerio-parsed HTML from Firecrawl; counts tags, inspects first paragraph, detects FAQ markers.

| Sub-check | Points | Criteria |
| --- | --- | --- |
| H1 present | 8 | Exactly one H1 in the document |
| H2 subheadings | 7 / 4 | ≥2 H2 → 7pts; exactly 1 → 4pts |
| H3 sub-sections | 5 | At least one H3 tag |
| Answer-first opening | 25 / 15 | First <p> ≥50 words → 25pts; 30–49 words with question pattern → 15pts |
| Lists (ul/ol) | 12 / 6 | ≥3 lists → 12pts; 1–2 → 6pts |
| Tables | 8 | At least one <table> |
| FAQ section | 15 | FAQPage JSON-LD, FAQ class/id, or FAQ heading detected |
| Content depth | up to 20 | 1000+ words earns full 20; prorated below |

Sub-scores are capped at 100 before the overall weight is applied.
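The answer-first check is the highest-value single item in this table. A TypeScript sketch of one way to implement it, assuming the first paragraph's text has already been extracted (the word-count thresholds mirror the table; the question-pattern regex is a simplified illustration, not the production heuristic):

```typescript
// Score the "answer-first opening" sub-check from the first <p> text.
function answerFirstPoints(firstParagraph: string): number {
  const words = firstParagraph.trim().split(/\s+/).filter(Boolean).length;
  // Illustrative question-pattern heuristic: a question mark, or an
  // opening interrogative word.
  const questionPattern = /\?|^(what|how|why|when|who)\b/i.test(firstParagraph);
  if (words >= 50) return 25;
  if (words >= 30 && questionPattern) return 15;
  return 0;
}
```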

Schema Markup

15% weight

Whether structured data is present and valid for AI parsers.

Schema.org JSON-LD lets models resolve entities, authorship, and answer structure without guessing. Incomplete schemas earn partial credit because they still help, just less reliably.

Data source — All <script type="application/ld+json"> blocks (including @graph); validates required fields per type.

| Sub-check | Points | Criteria |
| --- | --- | --- |
| Organization (name, url, logo) | 20 / 10 | Full → 20pts; missing any required field → 10pts |
| Article (headline, author, datePublished) | 20 / 10 | Full → 20pts; partial → 10pts |
| FAQPage (mainEntity with questions) | 20 / 10 | Full → 20pts; partial → 10pts |
| HowTo (name, step with items) | 15 / 8 | Full → 15pts; partial → 8pts |
| Product (name, offers) | 15 / 8 | Full → 15pts; partial → 8pts |
| BreadcrumbList (itemListElement) | 10 / 5 | Full → 10pts; partial → 5pts |

Sub-scores are capped at 100 before the overall weight is applied.
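Because JSON-LD may arrive as a single node, an array, or an `@graph` container, extraction has to normalize before validating. A TypeScript sketch of that step plus one type check, assuming the raw `<script type="application/ld+json">` bodies have already been collected (function names are illustrative; real validation also handles `@type` arrays):

```typescript
// Collect every JSON-LD node on a page, flattening @graph containers.
function collectJsonLdNodes(rawBlocks: string[]): any[] {
  const nodes: any[] = [];
  for (const raw of rawBlocks) {
    try {
      const parsed = JSON.parse(raw);
      const items = Array.isArray(parsed) ? parsed : [parsed];
      for (const item of items) {
        if (item && Array.isArray(item["@graph"])) nodes.push(...item["@graph"]);
        else if (item) nodes.push(item);
      }
    } catch {
      // Invalid JSON earns no credit; skip the block.
    }
  }
  return nodes;
}

// Full Organization schema earns 20; a partial one still earns 10.
function organizationPoints(nodes: any[]): number {
  const org = nodes.find((n) => n["@type"] === "Organization");
  if (!org) return 0;
  const full = ["name", "url", "logo"].every((field) => field in org);
  return full ? 20 : 10;
}
```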

Technical SEO

15% weight

Baseline hygiene signals that AI crawlers use to trust a page.

Missing titles, broken canonical tags, or 8-second response times won't block an AI model outright, but they make your page a weaker candidate to rank and cite.

Data source — Cheerio HTML inspection + Firecrawl response metadata (timing, content-type).

| Sub-check | Points | Criteria |
| --- | --- | --- |
| Meta title (30–60 chars) | 15 | Passes length check |
| Meta description (120–160 chars) | 15 | Passes length check |
| Canonical tag | 10 | <link rel="canonical"> present |
| Open Graph tags (title + description + image) | 15 | All three present |
| Viewport meta | 10 | <meta name="viewport"> present |
| HTTPS | 10 | URL uses https:// |
| Response time | 10 | First-byte response < 3000ms |
| Content-Type header | 5 | Contains text/html |
| Language attribute | 5 | <html lang="…"> set |
| Single H1 | 5 | Exactly one H1 tag |

Sub-scores are capped at 100 before the overall weight is applied.
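The two length checks are the simplest to reason about. A TypeScript sketch, assuming the tag text has already been pulled from the parsed HTML (both bounds are inclusive, matching the table; function names are illustrative):

```typescript
// Meta title sub-check: 15 points if the trimmed length is 30–60 chars.
function metaTitlePoints(title: string | null): number {
  if (!title) return 0;
  const len = title.trim().length;
  return len >= 30 && len <= 60 ? 15 : 0;
}

// Meta description sub-check: 15 points if 120–160 chars.
function metaDescriptionPoints(description: string | null): number {
  if (!description) return 0;
  const len = description.trim().length;
  return len >= 120 && len <= 160 ? 15 : 0;
}
```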

Content Authority

15% weight

Signals that tell a model this page is worth trusting.

AI engines systematically over-cite pages with named authors, original data, and outbound links to recognized authorities. These are the strongest levers after citability.

Data source — HTML heuristics: author/byline selectors, table density, statistic regexes, outbound-link host matching, credential regexes.

| Sub-check | Points | Criteria |
| --- | --- | --- |
| Author attribution | 25 | rel=author, byline class/id, itemprop=author, or "Written by …" pattern |
| Original data | 20 | Table with >3 rows OR ≥5 statistics (%, $, comma numbers) in body |
| Authority citations | up to 20 | Links to gov/edu/Nature/Reuters/etc. — 5+ earns full 20 |
| Credential signals | up to 15 | Mentions Ph.D., MD, founder, certified, years of experience — 3+ unique earns full 15 |
| Content depth | 20 / 15 / 10 | 2000+ words AND 4+ sections → 20; 2000+ words → 15; 1000+ → 10; prorated below |

Sub-scores are capped at 100 before the overall weight is applied.
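The "original data" check combines table density with statistic counting. A TypeScript sketch of that sub-check; the regexes below are simplified illustrations of the three token classes named in the table (percentages, dollar amounts, comma-grouped numbers), not the production patterns:

```typescript
// Count statistic-like tokens in the page body text.
function countStatistics(text: string): number {
  const patterns = [
    /\d+(?:\.\d+)?%/g,          // 42%, 3.5%
    /\$\d[\d,]*(?:\.\d+)?/g,    // $900, $1,200.50
    /\b\d{1,3}(?:,\d{3})+\b/g,  // 10,000
  ];
  let count = 0;
  for (const re of patterns) count += (text.match(re) ?? []).length;
  return count;
}

// Original-data sub-check: a data table with >3 rows OR ≥5 statistics.
function originalDataPoints(bodyText: string, largestTableRows: number): number {
  return largestTableRows > 3 || countStatistics(bodyText) >= 5 ? 20 : 0;
}
```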

Brand Presence

10% weight

Whether your brand is being talked about where AI models forage for context.

LLMs condition on where a brand is mentioned, not just on your own site. Reddit, LinkedIn, and general web mentions disproportionately influence how models describe you.

Data source — DataForSEO SERP API — quoted-brand searches scoped to reddit.com, linkedin.com, and the open web.

| Sub-check | Points | Criteria |
| --- | --- | --- |
| Reddit mentions | up to 30 | 10+ results earns full 30; prorated below |
| LinkedIn mentions | up to 30 | 10+ results earns full 30; prorated below |
| General web mentions | up to 20 | 10+ results earns full 20; prorated below |
| Platform diversity | 20 / 10 / 5 | 3 platforms → 20; 2 → 10; 1 → 5 |

Sub-scores are capped at 100 before the overall weight is applied.
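A TypeScript sketch of this dimension's scoring, assuming mention counts have already come back from the SERP queries. Linear proration is an assumption here; the table only specifies the full-credit thresholds:

```typescript
// Linear proration up to a full-credit threshold (an assumption).
function prorated(results: number, fullAt: number, maxPoints: number): number {
  return Math.min(results / fullAt, 1) * maxPoints;
}

// Brand Presence: three prorated mention counts plus a diversity bonus
// for appearing on multiple platforms at all.
function brandPresenceScore(reddit: number, linkedin: number, web: number): number {
  const platforms = [reddit, linkedin, web].filter((n) => n > 0).length;
  const diversity = platforms === 3 ? 20 : platforms === 2 ? 10 : platforms === 1 ? 5 : 0;
  const score =
    prorated(reddit, 10, 30) +
    prorated(linkedin, 10, 30) +
    prorated(web, 10, 20) +
    diversity;
  return Math.min(Math.round(score), 100);
}
```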

Versioning

  • v1.0 · 2026-04-15 · Initial published methodology. Six weighted dimensions; rubric pinned to the analyzer source of record.

Rubric changes will be versioned and dated here. Older reports stay tied to the methodology version that produced them.

Audit a site against this rubric

Every score uses the same published rubric. No paywall, no login.

Run a free audit