Tuesday, 28 April 2026

Same Prompt, Four AIs — Why the Answers Aren’t the Same


The differences aren’t just in the answers—they’re in the thinking


[Image: four different ways of doing the same task]




Generative AI tools are often discussed as if they were interchangeable—different interfaces delivering broadly similar outputs. However, when applied to complex intellectual tasks, meaningful differences begin to emerge.

To explore this, I ran the same academically rigorous prompt through four leading systems—Claude, ChatGPT, Google Gemini, and Copilot. The task required a full thematic analysis of a researcher’s career using the framework developed by Virginia Braun and Victoria Clarke.

What followed was not simply variation in output, but variation in how each system approached the act of analysis itself.


Same Input, Different Interpretations

At a high level, the experiment is simple:

One prompt → Four models → Four distinct approaches

What changes is not the instruction, but how each system:

  • Interprets the task
  • Handles uncertainty
  • Applies methodology
  • Defines what “good analysis” looks like

Experiment Overview

Task
Conduct a full thematic analysis of an academic career, using my Google Scholar profile, following Braun & Clarke's method

Models Tested

  • Claude
  • ChatGPT
  • Google Gemini
  • Copilot

Objective
Compare how each system interprets and executes the same analytical task
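The experiment loop itself is simple enough to sketch in code: one fixed prompt dispatched to several model back-ends, with the responses collected for comparison. The adapter functions below are hypothetical stand-ins (in practice, each would wrap the vendor's own SDK or web interface); only the dispatch pattern is the point.

```python
# One prompt, four models: collect each system's response to the identical input.
# The four adapter functions are placeholders, not real API calls.

PROMPT = "From this profile ... carry out a thematic analysis ..."  # abridged

def claude(prompt: str) -> str:    # stand-in for an Anthropic API call
    return f"[Claude] analysis of: {prompt[:30]}"

def chatgpt(prompt: str) -> str:   # stand-in for an OpenAI API call
    return f"[ChatGPT] analysis of: {prompt[:30]}"

def gemini(prompt: str) -> str:    # stand-in for a Google Gemini API call
    return f"[Gemini] analysis of: {prompt[:30]}"

def copilot(prompt: str) -> str:   # stand-in for Microsoft Copilot
    return f"[Copilot] analysis of: {prompt[:30]}"

MODELS = {"Claude": claude, "ChatGPT": chatgpt,
          "Gemini": gemini, "Copilot": copilot}

def run_experiment(prompt: str) -> dict[str, str]:
    """Send the identical prompt to every model and collect the responses."""
    return {name: call(prompt) for name, call in MODELS.items()}

for name, output in run_experiment(PROMPT).items():
    print(name, "->", output)
```

The instruction never varies; only the model behind each adapter does, which is what makes the downstream differences attributable to the systems themselves.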


Four Models, Four Approaches

Claude — Methodical and Process-Driven

Claude takes a structured, research-oriented approach. It prioritises clarification, explicitly defines assumptions, and follows a clear analytical sequence. It did use the profile, but also drew on other sources.

Notable pattern: strong emphasis on methodological discipline and balanced thematic output. It also produced a downloadable document.


ChatGPT — Balanced and Interpretive

ChatGPT proceeds with minimal clarification, making reasonable assumptions and focusing on producing a clear, coherent synthesis. It could not access Google Scholar, so it drew on a number of other sources instead.

Notable pattern: strong narrative flow and accessibility without losing analytical depth.


Gemini — Conceptual but Variable

Gemini initially misidentified the researcher, then corrected itself and completed the analysis. Its final output leans more toward conceptual and philosophical interpretation. It did access Google Scholar.

Notable pattern: generates abstract insights, but shows variability in grounding.


Copilot — Formal and Comprehensive

Copilot follows Braun & Clarke closely, explicitly labelling each phase and producing a detailed, academic-style report. Unlike the others, it did not say it was using the profile, but we have to assume it did.

Notable pattern: high transparency and completeness, with extensive methodological detail.


Comparison Table 1: Behavioural Differences

Dimension               | Claude   | ChatGPT                   | Gemini     | Copilot
Clarification behaviour | High     | Moderate                  | Moderate   | Very high
Handling uncertainty    | Cautious | Assumptive but controlled | Variable   | Highly cautious
Method adherence        | Strong   | Flexible                  | Conceptual | Very strict
Output length           | Medium   | Medium                    | Medium     | Long

Comparison Table 2: Thinking Style

Model   | Dominant Style | Emphasis                | Less Emphasised
Claude  | Procedural     | Method and structure    | Speed and brevity
ChatGPT | Interpretive   | Clarity and synthesis   | Formal process detail
Gemini  | Conceptual     | Abstract insight        | Consistent grounding
Copilot | Formal         | Completeness and rigour | Concision

Comparison Table 3: Alignment by Use Case

Use Case               | Alignment | Why
Academic reporting     | Copilot   | Explicit methodological structure
Research design        | Claude    | Strong process discipline
Communication          | ChatGPT   | Clear and accessible synthesis
Conceptual exploration | Gemini    | Emphasis on abstract interpretation

What This Means in Practice

A few patterns emerge from this comparison:

  • Different models frame problems differently, even with identical prompts
  • Structured tasks benefit from models that foreground process
  • Communication-focused outputs benefit from clarity and synthesis
  • Conceptual exploration can surface deeper insights—but may require validation

These are tendencies rather than rules, but they are consistent.


The Common Prompt Used

From this profile https://scholar.google.co.uk/citations?user=ghQedZAAAAAJ&hl=en carryout a thematic analysis. Before doing the analysis please ask any question you need, then do the analysis, please CONSTRAINTS. Please use the steps in Baun and Clarke (V. Braun, V. Clarke Using thematic analysis in psychology Qual. Res. Psychol., 3 (2) (2006), pp. 77-101, 10.1191/1478088706qp063oa ) to do this.

CONSTRAINTS
Scope: Whole career
Depth of Data: All available data including publication title, abstract, etc.
Analytical lens: No specific research question
Target themes: Look for semantic themes and then a separate analysis of Latent themes


Final Reflection

The key takeaway is not which model is “best,” but that each embodies a different interpretation of what it means to analyse.

Understanding those differences is what turns generative AI from a tool into a capability.


Personally, I preferred the results that Claude.ai produced: the report it generated was closest to what I was hoping for. All four did the job, though not all actually used the Google Scholar profile.



All opinions in this blog are the Author's and should not in any way be seen as reflecting the views of any organisation the Author has any association with. Twitter @scottturneruon

