Conductor
Try for free

AI Brand Recommendation Study: Why Intent Type Predicts AI Output Consistency

Last updated:

In early 2026, a research study concluded that AI brand recommendations are essentially random, and that industry size is the best predictor of consistency. Narrow categories are predictable. Broad ones are chaotic.

Our data tells a different story.

We ran 14,000 API calls across 10 industries, seven intent types, four large language models (LLMs), and five simulated user personas. The finding: AI brand recommendations aren't random. They're predictably inconsistent, and the predictor isn't industry size. It's query intent, the type of prompt your customers are asking.

That finding held across every engine and every industry we tested.

If you're not tracking AI visibility by intent type, you can't see this. You need to know which intent types your customers are actually using, and how consistently your brand shows up in each one.

The fix isn't more content. It's tracking the right intent types, on the engines your customers actually use.

When someone asks an AI engine to recommend a product, compare two brands, or find a solution to a problem, the engine doesn't return a search resultSearch Result
Search results refer to the list created by search engines in response to a query.
Learn More
. It generates a response, and it decides which brands belong in it. That's an AI brand recommendation: your brand either shows up in that response or it doesn't, depending on how the engine weighs your content, authority, and relevance to the prompt at the moment it's asked.

For enterprise brands, that moment is increasingly where purchase decisions start.

In January 2026, a research study concluded that AI brand recommendations are essentially random, and that industry size is the best predictor of consistency. Our data tells a different story.

We ran 14,000 API calls across 10 industries, seven intent types, five simulated user personas, and four large language models (LLMs): ChatGPT, Perplexity, Claude, and Gemini. AI brand recommendations aren't random. They're predictably inconsistent, and the predictor isn't industry size. It's the type of prompt your customers are asking.

Why AI recommendation consistency matters

In traditional search, rankingsRankings
Rankings in SEO refers to a website’s position in the search engine results page.
Learn More
are stable enough to build a strategy around. Your brand holds the top spot for a high-value keywordKeyword
A keyword is what users write into a search engine when they want to find something specific.
Learn More
on Tuesday, it's there on Thursday. You can forecast, allocate budget, and defend a position.

AI search doesn't work that way. LLMs generate responses dynamically. The same prompt can return your brand nine times out of 10 or four times out of 10, and a single manual check would never surface the difference.

That gap is the whole problem. A brand appearing in 90% of AI recommendation runs for a given prompt has a defensible position. One appearing 40% of the time for the exact same prompt is essentially invisible half the time, even if it showed up the one time someone looked.

You can't scale a content strategy, allocate budget, or forecast pipeline based on visibility that shifts every time a customer asks the same question. And you can't fix what you're not measuring consistently.

That's what this study set out to quantify.

How we measured AI recommendation consistency

We sent the exact same prompt to each engine 50 times and recorded which brands it returned each time. Then we measured the overlap between any two runs.

We call this brand overlap: the percentage of brands shared between any two runs of the same prompt. A brand overlap of 63% means that out of every 10 unique brands that appeared across any two runs combined, six appeared in both. If every run returns a completely different set of brands, the overlap is zero.

We also tracked lead brand stability: how often the same brand appears first across runs.

Brand overlap tells you whether the same brands show up at all. Lead brand stability tells you whether the same brand consistently leads. A prompt can have moderate brand overlap but very low lead brand stability. AI agrees on the consideration set but has no consistent view on who should lead it.

Both metrics are reflected as percentages throughout this analysis.

Top takeaways: What the data shows

Intent type, meaning the kind of prompt your customers are asking, predicts AI brand recommendation consistency more than the industry being asked about. The predictor isn't what they're asking about. It's how they're asking it.

  • Seven intent types rank consistently from least to most consistent: Purchase, Support, Navigation, Recommendations, Pricing, Education, and Comparison. The ranking holds across all four LLMs and all 10 industries.
  • Purchase intent: near coin-flip visibility. Purchase is the least consistent intent type by a meaningful margin. The brands recommended to a customer ready to buy change almost every time. Out of every 10 unique brands that appeared across any two Purchase prompt runs, only four showed up in both.
  • Education: high agreement, near-zero mentions. AI agrees on which brands belong in an educational context. But in 45% to 72% of Education prompts (depending on the LLM), AI responded without naming a single brand. The goal of educational content isn't citations. It's getting the LLMs to view your domain as an authority in other intent areas.
  • Recommendation: the hardest position to earn, and the most valuable to hold. These prompts don't name any brands. LLMs are choosing freely. Certain brands show up in 90% to 100% of Recommendation runs across all four engines. If your brand shows up consistently, it earned it. That's the metric to build toward.
Find out which intent categories are driving AI citations for your brand and where the gaps are with Conductor’s free AI visibility analysis.

How intent type predicts AI recommendation consistency

We built the study around seven intent types drawn from Conductor's prompt taxonomy. For each intent, we ran the same prompts 50 times per LLM across all 10 industries and measured brand overlap between runs.

The ranking that emerged was stable across all four engines and 10 industries. Your industry affects your absolute brand overlap score.

A brand in Insurance operates at a higher baseline than one in Cloud Computing. But it doesn't change the rank order.

Purchase is the least consistent. Comparison is the most consistent. That gap doesn't close regardless of what you sell or who your competitors are.

AI Recommendation Consistency by Intent Type

Two numbers tell the story for each intent type. Brand overlap measures how many brands show up across any two runs of the same prompt. Out of every 10 unique brands seen across both runs, how many appeared in both. Lead brand stability measures how often the top brand holds its position across runs. Both are percentages, and higher means more consistent.

Intent type

Brand overlap

Lead brand stability

What it means

Purchase

40%

59%

The moment a customer is ready to buy is the moment AI is least reliable. Out of every 10 brands that appeared across any two runs, only four showed up in both. For brands where AI plays a role in purchase decisions, that inconsistency has direct pipeline implications.

Support

47%

72%

AI picks a primary solution and holds it. Everything after that first position shuffles constantly. Whether your brand owns that primary slot or sits below it determines whether visibility of your support content is stable or nonexistent.

Navigation

47%

71%

AI has one gold-standard first pick and fills the rest variably. The gap between first position and everything below it is wider than the consistency scores suggest.

Recommendations

49%

72%

No brand names appear in the prompt, so AI is choosing freely. The brands that show up consistently here have earned genuine LLM trust, and that's the hardest thing to fake or shortcut.

Pricing

58%

59%

AI is more willing to commit to specific brands on pricing prompts than most intent types. But 20% of responses return no brands at all because the pricing data simply isn't accessible enough to pull from.

Education

60%

30%

Brand overlap is high but 45% to 72% of Education prompts, depending on the LLM, return no brand names at all. AI agrees on who the authorities are. It just rarely says so in outputs.

Comparison

63%

91%

AI picks a winner and rarely changes its mind. The same brand leads 91% of the time. That makes Comparison the most defensible position in the study and the hardest one to break into if you don't already hold it.

Purchase: the moment AI is least reliable is when it matters most

Purchase prompts look like this: "I need a business checking account for a 50-person company." Or: "I need to replace 30 cars in my company with electric options. What are my choices?" A real buyer, with a real budget, and a real purchase decision on the table.

Purchase is the least consistent intent type in our dataset by a meaningful margin. Out of every 10 unique brands that appeared across any two Purchase prompt runs, only four showed up in both. The brands AI recommends to someone ready to buy change almost every time they ask.

For brands where AI plays a role in purchase decisions, that's not a measurement curiosity. It means the visibility you think you have at the bottom of the funnel may not be there the next time a customer asks.

How to win:

Focus on the signals AI pulls when evaluating purchase options. Third-party review platforms, user-generated forums, and authoritative consumer trust networks carry more weight here than your own site content. Those are the sources LLMs draw on to validate purchasing choices, and expanding your presence there moves the needle more than any on-site optimization.

Make your product and service data easy to parse. Clean, structured merchant feeds and service specifications give AI engines what they need to surface your brand accurately. When that data is hard to verify, engines default to competitors whose transactional information is easier to read.

Measure consistency, not snapshots. A single visibility check tells you nothing about Purchase intent reliability. Set up tracking that runs the same prompts independently across multiple sessions to verify whether your brand is consistently in the consideration set or cycling in and out unpredictably. That's the number worth watching.

Support: stable at the top, volatile everywhere else

Support prompts look like this: "Which banks are the easiest to dispute a charge on a business credit card? How does it work?" Or: "Which airlines handle flight cancellation refunds most effectively, what’s the process like?"

Support and Navigation share the same structure. The primary solution AI surfaces is stable. Everything after it changes.

Brands that hold the primary support recommendation in their category have a reliable first mention. Brands in secondary positions shouldn't count on that consistency. Those slots shuffle.

How to win:

Build comprehensive troubleshooting guides, customer service walkthroughs, and technical documentation of your products or services on your domain. LLMs form support verdicts by ingesting the procedural clarity of your public-facing help center. Exhaustive, specific documentation is what moves AI to name your brand first. Vague or incomplete help content leaves the top slot open for a competitor who answered the question better.

Structure that content for easy AI ingestion. Numbered steps, explicit headers, and clean semantic layouts help LLMs parse operational workflows accurately, exact refund timelines, chargeback criteria, escalation paths. The cleaner the structure, the stronger the signal.

Focus your optimization resources on terms where you can realistically hold the top position. Secondary slots in Support don't compensate for a missing first mention. If you're not first, your visibility in this intent type is effectively unstable regardless of how much content you publish.

Navigation prompts look like this: "Which phone company offers the best online portals for business accounts and glassfiber packages?" Or: "Which cloud platforms have the best dashboards for managing storage and computer resources?"

AI has one gold-standard first pick and fills the rest of the list variably. That's two completely different strategic situations inside a single intent type. The primary position is defensible and relatively stable. Everything below isn't. Brands sitting outside that first mention shouldn't assume the remaining slots compensate. They don't.

How to win:

Focus optimizations on winning the top position. That means clean brand name Schema, clear entityEntity
An entity is a thing/concept that search engines and AI models can identify and relate to other entities, forming the foundation of semantic search.
Learn More
disambiguation, logical URL structures, and strong domain authority signals. Secondary slots shuffle constantly, so treating a runner-up ranking as a strategic win misallocates resources that could be building toward a defensible first position.

Back it up with precise Schema markup across your site's backend taxonomy. Organizational, product, and web application Schema help AI crawlersCrawlers
A crawler is a program used by search engines to collect data from the internet.
Learn More
map your domain's specific entry points, portal login pages, dashboard interfaces, and the like, clearly and accurately.

Then build function-specific landing pages that match exactly what users are asking for. High-authority pages that detail specific dashboard tools, portal capabilities, or UX features give LLMs the clearest possible signal when they're looking for a navigation target. The cleaner the correlation between your page and the user's task, the more likely AI lands on you first.

Recommendations: unprompted visibility is earned visibility

Recommendations prompts look like this: "What’s the best home insurance company in the US?" Or: "What’s the best electric vehicle to buy in 2026?" No brands named in the prompt. AI is choosing freely.

Recommendations aren't the most consistent intent type. They're the most valuable to win.

Certain brands show up in 90% to 100% of Recommendation runs across all four engines. Those brands didn't get there by optimizing for Recommendation prompts directly. They got there by building the right content foundation across every other intent type first. If your brand isn't on that list, this is the hardest position to break into.

How to win:

Start by creating and optimizing unbranded, category-level content. Build comprehensive pillars that connect your solutions to broad industry problems. That's what LLMs draw on when no brand is named in the prompt.

Make it easy for AI to associate your brand with the category. Use clear, consistent taxonomy across your technical content, whitepapers, and product pages so engines can map your brand to the problems you solve, not just the products you sell.

Then track it separately. Monitor unbranded, category-level recommendation prompts specifically. That's your true organic baseline and the number worth benchmarking against competitors.

Pricing: high consistency, but one in five prompts returns no brands at all

Pricing prompts look like this: "How much does business class cost from New York to London?" Or: "How much does business wireless cost per line per month in the US?"

AI is more willing to commit to specific brands on Pricing prompts than on most other intent types. The consistency score reflects that. But 20% of Pricing prompt responses returned no brands at all. AI wanted to answer and couldn't. The data simply wasn't accessible enough to pull from.

That 20% is the gap worth closing.

How to win:

Publish clear, structured pricing tables on your public pages. LLMs struggle to pull figures from paragraph descriptions or gated pricing sheets. Clean, programmatic data layouts give engines immediate access to the hard numbers they need to fill a response.

Back that up with Schema markup. Implement up-to-date product and pricing Schema across all product and service pages. Explicitly defining pricing tiers, currency, and feature inclusions in your site's backend gives AI crawlers the structured dataStructured Data
Structured data is the term used to describe schema markup on websites. With the help of this code, search engines can understand the content of URLs more easily, resulting in enhanced results in the search engine results page known as rich results. Typical examples of this are ratings, events and much more. The Conductor glossary below contains everything you need to know about structured data.
Learn More
they need to verify your commercial information accurately.

If your business model relies on custom packaging or variable quotes, publish baseline figures anyway. A starting price or standardized tier option is enough to keep your brand in the engine's consideration set. Competitors with more transparent pricing data will fill the gap if you don't.

Education: AI knows your brand belongs here, it just won't say so

Education prompts look like this: "How does CPG distribution work from manufacturer to retail shelf?" Or: "What’s fractional reserve banking?"

Informational, category-level, no brand in sight.

AI agrees on which brands belong in an educational context. But in 45% to 72% of Education prompts, depending on the LLM, it responded without naming a single brand. High agreement. Near-zero mentions.

Tracking Education for citation volume will give you numbers that look stable. What they're actually measuring is consistent silence. The goal of educational content isn't citations. It's getting LLMs to view your domain as an authority they reach for across every other intent type.

How to win:

Create educational content anyway. Exhaustive glossaries, deep industry overviews, and step-by-step structural guides give LLMs the foundational knowledge they draw on when forming answers across every other intent type. Even when AI doesn't cite you directly, it's using your content to shape its perspective on the category.

Then connect the dots internally. Link your educational content explicitly to your product and comparison pages. Clear, logical internal pathways help LLMs carry the authority your educational content builds straight into the intent types where brand recommendations actually happen.

Comparison: AI picks a winner and sticks with it

Comparison prompts look like this: "Delta vs United vs American Airlines - which is the best airline for frequent business travel?" Or: "AWS vs Azure vs Google Cloud. Which is best for enterprise applications?"

The brands are already in the question. All AI adds is a verdict.

That verdict is remarkably consistent. The same brand leads the response 91% of the time. When a prompt names the brands upfront, LLMs have limited room to vary their answer. They pick a winner and stick with it.

For brands holding the top Comparison position, that position is nearly unshakeable. For brands that don't hold it, Comparison is the hardest intent type to break into.

How to win:

Build detailed, data-dense comparison tables directly on your site. LLMs synthesize existing analytical content to form their verdicts. Transparent, machine-readable feature breakdowns give the engine exactly what it needs to evaluate your brand accurately.

If you already hold the top Comparison position, protect it. Regularly update your case studies, feature lists, and third-party validation so the content AI scrapes never goes stale.

If you don't hold it, don't fight the engine on its current terms. Create authoritative content that introduces new evaluation criteria entirely. Shift the conversation toward dimensions where you win, deployment speed over baseline cost, total cost of ownership over sticker price, and you have a shot at changing how the LLM weighs the matchup before it picks a winner.

Start with Education. It trains AI to associate your brand with the category's core concepts. Build the definitive guides. The goal isn't citations. It's getting LLMs to associate your brand with the category before anyone asks a branded question.

Show up in Recommendations. That's where AI chooses freely, no brand names in the prompt. Consistent visibility there tells AI your brand belongs in the conversation.

Navigation reinforces that status. With 71% lead brand stability, Navigation is where AI decides which brands it treats as category standards.

Give AI the data it needs on Pricing. 20% of Pricing prompt responses return no brands at all. Brands that publish clear, structured pricing and feature data close that gap.

Each intent type builds on the last. The brands that win at Comparison and dominate on Purchase don't get there by optimizing for those intent types directly. They get there by building the right content foundation across all of them.

Conductor lets you monitor AI brand recommendations by intent type and engine so you know where to focus next.

AI output consistency varies by LLM more than most AEO strategies account for

Across all 70 industry and intent type combinations in the study, each engine showed a distinct behavioral profile that held regardless of industry or intent type.

AI recommendation consistency ranking of four LLMs from most to least consistent: Perplexity, ChatGPT, Claude, and Gemini.
Bar chart showing average brands returned per response across 10 industries and seven prompt types. ChatGPT returns the fewest at five. Perplexity returns 8.3, Claude 8.5, and Gemini the most at 9.2.

Gemini is the most inconsistent engine by a significant margin. It also returns the most brands per response, an average of 9.2, nearly double the most consistent ChatGPT. A longer brand list doesn't help your brand. More names in the response means more competitionCompetition
Businesses generally know who their competitors are on the open market. But are they the same companies you need to fight to get the best placement for your website? Not necessarily!
Learn More
for each slot, and on Gemini those slots change constantly. Getting mentioned once in a nine-brand list that reshuffles every time is worth considerably less than holding a consistent spot in ChatGPT's five shortlist.

The one exception is the Automotive sector. Gemini consistently places second across multiple prompt types in that industry. The hypothesis: EV and automotive markets move fast, and Gemini's architecture may surface more recent information. The pattern is confirmed. The cause is still to be determined.

No single LLM leads across all intent types for recommendation consistency. ChatGPT leads on knowledge-based prompts, aka Comparison, Education, Navigation, and Support. Perplexity leads on transactional prompts like Pricing, Recommendations, and Purchase. Which LLM matters most depends entirely on which intent types your customers are finding you through.

ChatGPT returns an average of five brands per response, the fewest of the four. Getting into ChatGPT's shortlist is harder. Once you’re in, staying there is more stable.

The gap between the most and least consistent engines on the same prompt is high. For the same prompt in the same industry, one engine returns near-identical brand lists every time, while another returns almost no overlap at all.

A brand can look strong in the aggregate and be nearly invisible on the engine its customers use most. Tracking across engines separately is the only way to see where you stand and where to focus next.

How AI recommendation consistency differs by industry

We selected industries using the Global Industry Classification Standard (GICS) taxonomy to cover the full range of AI brand consideration set sizes, from concentrated markets like Cloud Computing to fragmented ones like Asset Management. The goal wasn't to represent every sector. It was to stress-test whether intent type predicts consistency the same way regardless of market structure.

Each industry reflects the level we actually studied, not its broader sector label. We studied Cloud Computing, not Software and Services. We studied Pharmaceuticals, not Health Care.

Map the findings to your space, but don't assume the research speaks for your entire sector.

Industries ranked by overall AI recommendation consistency

Rank (Most consistent to least)

Industry

Brand overlap (%)

1

Insurance

62%

2

Asset Management

60%

3

eCommerce & Retail

58%

4

Telecommunications

54%

5

Consumer Packaged Goods

53%

6

Automotive

51%

7

Pharmaceuticals

51%

8

Banking

47%

9

Passenger Airlines

46%

10

Cloud Computing

36%

Insurance ranked first. Cloud Computing ranked last.

Both industries have concentrated brand landscapes. A handful of dominant names. The kind of market where you'd expect AI to reliably return the same brands every time.

It doesn't work that way.

Insurance leads because customers predominantly ask about it through Comparison and Navigation prompts, two of the most consistent intent types. In the prompts we analyzed, State Farm, USAA, Allstate, and Progressive appear unprompted in 83% to 93% of Recommendation runs across all four engines.

Cloud Computing sits last despite having four brands that appear in over 80% of Recommendation runs. The difference is the prompt mix. Cloud Computing is predominantly asked about through open-ended Recommendation and Purchase prompts, the two least consistent intent types in the study.

Same market structure. Opposite results.

The assumption that concentrated industries are easier to win because fewer brands compete is the wrong frameFrame
Frames can be laid down in HTML code to create clear structures for a website’s content.
Learn More
. The prompt mix your industry attracts determines your consistency profile. The number of players doesn't.

For B2B tech brands investing heavily in content to win AI recommendations: Recommendation prompts are exactly where AI has the most room to vary. The challenge isn't the competition. It's the type of question your customers are asking.

Here's what that looks like across all 10 industries. Find the market structure that most closely matches yours and map the findings from there.

Insurance

  • Concentrated market. A handful of nationally recognized brands dominate the consideration set.
  • Consumer-facing, high-trust category where brand recognition drives the shortlist.
  • Customers typically arrive knowing the category and comparing named providers.
Bar chart showing AI recommendation consistency for the Insurance industry across seven intent types and four LLMs.

Asset Management

  • Highly fragmented. Thousands of funds, platforms, and managers compete for visibility.
  • Customers range from institutional buyers to retail investors. What counts as the best depends heavily on individual goals.
  • AI visibility varies significantly by how specific the prompt is.
Bar chart showing AI recommendation consistency for the Asset Management industry across seven intent types and four LLMs.

eCommerce and Retail

  • Wide, shifting consideration set. No single dominant player owns most categories.
  • Price sensitivity is high. Customers often ask "best X under $Y" rather than comparing named brands.
  • Discovery prompts are common (“best women’s running shoes”). Brand recognition matters less than relevance to the task.
Bar chart showing AI recommendation consistency for the eCommerce and Retail industry across seven intent types and four LLMs.

Telecommunications

  • A small number of national players dominate, with local providers creating pockets of fragmentation.
  • Customers rarely discover new brands through AI. They compare known options.
  • Navigation and Comparison prompts make up most of the consideration journey.
Bar chart showing AI recommendation consistency for the Telecommunications industry across seven intent types and four LLMs.

Consumer Packaged Goods

  • Fragmented landscape with category-level loyalty. Many competing brands, strong subcategory identity.
  • Customers ask "best [product type]" more than comparing a named competitor.
  • AI visibility at the category level often matters more than brand-level visibility.
Bar chart showing AI recommendation consistency for the Consumer Packaged Goods industry across seven intent types and four LLMs.

Automotive

  • Concentrated manufacturer landscape, especially in electric vehicles (EVs). Meaningful variation by segment and price tier.
  • High-consideration, infrequent purchase. AI is part of an extended evaluation process.
  • Both early-stage research and final Comparison prompts are common.
Bar chart showing AI recommendation consistency for the Automotive industry across seven intent types and four LLMs.

Pharmaceuticals

  • Dual-brand dynamic. Branded and generic compete to treat the same condition.
  • Regulated category. AI's willingness to name brands shifts sharply by prompt type.
  • Prescription vs. over-the-counter drugs create fundamentally different AI consideration set behavior.
Bar chart showing AI recommendation consistency for the Pharmaceuticals industry across seven intent types and four LLMs.

Banking

  • Moderate concentration. National brands plus regional and niche players.
  • Trust and brand recognition drive the shortlist. Regional players can win specific segments.
  • Wide prompt mix: Navigation, Support, Comparison, and Purchase prompts all common.
Bar chart showing AI recommendation consistency for the Banking industry across seven intent types and four LLMs.

Passenger Airlines

  • A small number of global and regional carriers dominate AI consideration sets.
  • Loyalty programs influence who shows up. Route specificity changes the consideration set.
  • High brand recognition. Discovery is rare. Comparison is common.
Bar chart showing AI recommendation consistency for Passenger Airlines across seven intent types and four LLMs.

Cloud Computing

  • Concentrated. Three to five platforms set the standard. Challengers compete for remaining consideration.
  • B2B, technical or semi-technical buyers with long evaluation cycles and high switching costs.
  • AI is used more for evaluation and comparison than discovery. Recommendation prompts are open-ended and highly variable.
Bar chart showing AI recommendation consistency for the Cloud Computing industry across seven intent types and four LLMs.

Where to go from here

AI visibility isn't one number. It never was.

A brand with strong visibility in Comparison prompts and weak visibility in Purchase has a completely different strategic situation than one with the inverse. A brand that dominates on Perplexity may be nearly invisible in ChatGPT's five-brand shortlist. A brand in Cloud Computing faces a different challenge than one in Insurance, not because the competition is different, but because the prompt mix is.

Most brands don't know which intent types their customers are actually using to find and evaluate them. That's the starting point. It determines whether the visibility you're building is stable or volatile, and which intent types are working for you and which aren't.

Consistency by intent type is what tells you where you actually stand. The gap between how your brand performs on Comparison prompts versus Purchase prompts in the same category can be the difference between a defensible position and a coin flip.

The engine matters too. ChatGPT and Perplexity lead on different intent types and that split is consistent across our entire dataset. If your customers are making purchase decisions on Perplexity and you're only tracking ChatGPT, you're optimizing against the wrong signal.

Setting up tracking with the right prompts by intent type, before you start measuring, is what separates an AEO strategy that builds toward something real from one that just watches numbers move.

Conductor tracks brand recommendations by intent type and engine so you always know exactly where your brand stands and where to focus next.

Methodology

Research design

The study ran 14,000 API calls structured as 10 industries × seven intent types × four engines × 50 runs per cell. Each run was stateless. No conversation history carried over between runs, simulating independent users asking the same question fresh. Fifty runs per cell produces 1,225 unique pairwise comparisons per group, the input for the brand overlap calculation.

Prompts

One prompt was written per industry and intent combination, sourced from Conductor's prompt taxonomy, for 70 prompts total. Prompts were standardized across industries to ensure brand overlap scores were comparable across cells.

Standardized prompts are more controlled and more comparable across industries, but less reflective of the full range of how real users phrase questions.

How brand overlap, aka consistency, is measured

We measured consistency using pairwise Jaccard similarity. For each combination of LLM, industry, and intent type, we calculated the brand list overlap between every possible pair of runs across 50 runs per prompt, 1,225 pairs per group, then averaged the results.

Jaccard similarity is calculated as shared brands divided by all unique brands across both runs. A score of 0.63 means that out of every 10 unique brands that appeared across any two runs combined, six appeared in both. It doesn't measure how often the exact same list repeats. It measures proportional overlap between any two runs.

Throughout this article, we call this brand overlap. Same metric, plain language.

Simulated user personas

Each prompt ran through five defined user profiles passed as system prompts: the Informed Evaluator, the Category Newcomer, the Existing User with a Problem, the Detached Researcher, and the Price-Driven Buyer. Each profile simulated a different type of user asking the same question, introducing controlled variation in how the prompt was framed. Profiles were defined by prompt behavior and purchase context, not job title, so they hold up consistently across all 10 industries.

Share this article

Ready to maximize your visibility everywhere your audience is searching?

Try Conductor free for 3 weeks
TrustRadius logo
G2 logo
SoftwareReviews logo