Tuesday, 28 April 2026

Same Prompt, Four AIs — Why the Answers Aren’t the Same


The differences aren’t just in the answers—they’re in the thinking


[Image: four different ways of doing the same task]




Generative AI tools are often discussed as if they were interchangeable—different interfaces delivering broadly similar outputs. However, when applied to complex intellectual tasks, meaningful differences begin to emerge.

To explore this, I ran the same academically rigorous prompt through four leading systems—Claude, ChatGPT, Google Gemini, and Copilot. The task required a full thematic analysis of a researcher’s career using the framework developed by Virginia Braun and Victoria Clarke.

What followed was not simply variation in output, but variation in how each system approached the act of analysis itself.


Same Input, Different Interpretations

At a high level, the experiment is simple:

One prompt → Four models → Four distinct approaches

What changes is not the instruction, but how each system:

  • Interprets the task
  • Handles uncertainty
  • Applies methodology
  • Defines what “good analysis” looks like

Experiment Overview

Task
Conduct a full thematic analysis of an academic career, using my Google Scholar profile, following Braun & Clarke's method

Models Tested

  • Claude
  • ChatGPT
  • Google Gemini
  • Copilot

Objective
Compare how each system interprets and executes the same analytical task
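The experiment loop itself is simple enough to sketch in code: one fixed prompt dispatched to several model back-ends, with the responses collected for comparison. The adapter functions below are hypothetical stand-ins (in practice, each would wrap the vendor's own SDK or web interface); only the dispatch pattern is the point.

```python
# One prompt, four models: collect each system's response to the identical input.
# The four adapter functions are placeholders, not real API calls.

PROMPT = "From this profile ... carry out a thematic analysis ..."  # abridged

def claude(prompt: str) -> str:    # stand-in for an Anthropic API call
    return f"[Claude] analysis of: {prompt[:30]}"

def chatgpt(prompt: str) -> str:   # stand-in for an OpenAI API call
    return f"[ChatGPT] analysis of: {prompt[:30]}"

def gemini(prompt: str) -> str:    # stand-in for a Google Gemini API call
    return f"[Gemini] analysis of: {prompt[:30]}"

def copilot(prompt: str) -> str:   # stand-in for Microsoft Copilot
    return f"[Copilot] analysis of: {prompt[:30]}"

MODELS = {"Claude": claude, "ChatGPT": chatgpt,
          "Gemini": gemini, "Copilot": copilot}

def run_experiment(prompt: str) -> dict[str, str]:
    """Send the identical prompt to every model and collect the responses."""
    return {name: call(prompt) for name, call in MODELS.items()}

for name, output in run_experiment(PROMPT).items():
    print(name, "->", output)
```

The instruction never varies; only the model behind each adapter does, which is what makes the downstream differences attributable to the systems themselves.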


Four Models, Four Approaches

Claude — Methodical and Process-Driven

Claude takes a structured, research-oriented approach. It prioritises clarification, explicitly defines assumptions, and follows a clear analytical sequence. It did use the profile, but also drew on other sources.

Notable pattern: strong emphasis on methodological discipline and balanced thematic output. It also produced a downloadable document.


ChatGPT — Balanced and Interpretive

ChatGPT proceeds with minimal clarification, making reasonable assumptions and focusing on producing a clear, coherent synthesis. It could not access Google Scholar, so it drew on a number of other sources instead.

Notable pattern: strong narrative flow and accessibility without losing analytical depth.


Gemini — Conceptual but Variable

Gemini initially misidentified the researcher, then corrected itself and completed the analysis. Its final output leans more toward conceptual and philosophical interpretation. It did access Google Scholar.

Notable pattern: generates abstract insights, but shows variability in grounding.


Copilot — Formal and Comprehensive

Copilot follows Braun & Clarke closely, explicitly labelling each phase and producing a detailed, academic-style report. Unlike the others, it did not say it was using the profile, but we have to assume it did.

Notable pattern: high transparency and completeness, with extensive methodological detail.


Comparison Table 1: Behavioural Differences

Dimension               | Claude   | ChatGPT                   | Gemini     | Copilot
Clarification behaviour | High     | Moderate                  | Moderate   | Very high
Handling uncertainty    | Cautious | Assumptive but controlled | Variable   | Highly cautious
Method adherence        | Strong   | Flexible                  | Conceptual | Very strict
Output length           | Medium   | Medium                    | Medium     | Long

Comparison Table 2: Thinking Style

Model   | Dominant Style | Emphasis                | Less Emphasised
Claude  | Procedural     | Method and structure    | Speed and brevity
ChatGPT | Interpretive   | Clarity and synthesis   | Formal process detail
Gemini  | Conceptual     | Abstract insight        | Consistent grounding
Copilot | Formal         | Completeness and rigour | Concision

Comparison Table 3: Alignment by Use Case

Use Case               | Alignment | Why
Academic reporting     | Copilot   | Explicit methodological structure
Research design        | Claude    | Strong process discipline
Communication          | ChatGPT   | Clear and accessible synthesis
Conceptual exploration | Gemini    | Emphasis on abstract interpretation

What This Means in Practice

A few patterns emerge from this comparison:

  • Different models frame problems differently, even with identical prompts
  • Structured tasks benefit from models that foreground process
  • Communication-focused outputs benefit from clarity and synthesis
  • Conceptual exploration can surface deeper insights—but may require validation

These are tendencies rather than rules, but they are consistent.


The Common Prompt Used

From this profile https://scholar.google.co.uk/citations?user=ghQedZAAAAAJ&hl=en carryout a thematic analysis. Before doing the analysis please ask any question you need, then do the analysis, please CONSTRAINTS. Please use the steps in Baun and Clarke (V. Braun, V. Clarke Using thematic analysis in psychology Qual. Res. Psychol., 3 (2) (2006), pp. 77-101, 10.1191/1478088706qp063oa ) to do this.

CONSTRAINTS
Scope: Whole career
Depth of Data: All available data including publication title, abstract, etc.
Analytical lens: No specific research question
Target themes: Look for semantic themes and then a separate analysis of Latent themes


Final Reflection

The key takeaway is not which model is “best,” but that each embodies a different interpretation of what it means to analyse.

Understanding those differences is what turns generative AI from a tool into a capability.


Personally, I preferred the results that Claude.ai produced: the report it generated was closest to what I was hoping for. All four did the job, though not all actually used the Google Scholar profile.



All opinions in this blog are the Author's and should not in any way be seen as reflecting the views of any organisation the Author has any association with. Twitter @scottturneruon

