Same Prompt, Four AIs — Why the Answers Aren’t the Same

The differences aren’t just in the answers—they’re in the thinking


[Image: four different ways of doing the same task]




Generative AI tools are often discussed as if they were interchangeable—different interfaces delivering broadly similar outputs. However, when applied to complex intellectual tasks, meaningful differences begin to emerge.

To explore this, I ran the same academically rigorous prompt through four leading systems—Claude, ChatGPT, Google Gemini, and Copilot. The task required a full thematic analysis of a researcher’s career using the framework developed by Virginia Braun and Victoria Clarke.

What followed was not simply variation in output, but variation in how each system approached the act of analysis itself.


Same Input, Different Interpretations

At a high level, the experiment is simple:

One prompt → Four models → Four distinct approaches

What changes is not the instruction, but how each system:

  • Interprets the task
  • Handles uncertainty
  • Applies methodology
  • Defines what “good analysis” looks like

Experiment Overview

Task
Conduct a full thematic analysis of an academic career, based on my Google Scholar profile, using Braun & Clarke’s method

Models Tested

  • Claude
  • ChatGPT
  • Google Gemini
  • Copilot

Objective
Compare how each system interprets and executes the same analytical task
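The setup above can be sketched as a tiny harness: one unchanged prompt, several models, replies collected by name. This is only an illustration. The experiment itself was run through each tool's own interface, and `fake_model` and `run_same_prompt` are hypothetical names standing in for whatever way you actually reach each tool.

```python
from typing import Callable, Dict


def fake_model(name: str) -> Callable[[str], str]:
    """Hypothetical stand-in for 'send the prompt to this tool and return its reply'."""
    def ask(prompt: str) -> str:
        return f"[{name}] reply to a {len(prompt)}-character prompt"
    return ask


def run_same_prompt(prompt: str, models: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Send one unchanged prompt to every model and collect the replies by model name."""
    return {name: ask(prompt) for name, ask in models.items()}


# The four tools compared in this post, each backed here by a placeholder function.
MODELS = {name: fake_model(name) for name in ("Claude", "ChatGPT", "Google Gemini", "Copilot")}

if __name__ == "__main__":
    replies = run_same_prompt("Carry out a thematic analysis of this profile.", MODELS)
    for name, reply in replies.items():
        print(name, "->", reply)
```

Keeping the prompt as a single value that is passed to every model is the point: any difference in the replies then comes from the model, not from the wording.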


Four Models, Four Approaches

Claude — Methodical and Process-Driven

Claude takes a structured, research-oriented approach. It prioritises clarification, explicitly defines assumptions, and follows a clear analytical sequence. It did use the profile, but also drew on other sources.

Notable pattern: strong emphasis on methodological discipline and balanced thematic output. It also produced a downloadable document.


ChatGPT — Balanced and Interpretive

ChatGPT proceeds with minimal clarification, making reasonable assumptions and focusing on producing a clear, coherent synthesis. It could not access Google Scholar, so it drew on a number of other sources.

Notable pattern: strong narrative flow and accessibility without losing analytical depth.


Gemini — Conceptual but Variable

Gemini initially misidentified the researcher, then corrected itself and completed the analysis. Its final output leans more toward conceptual and philosophical interpretation. It did access Google Scholar.

Notable pattern: generates abstract insights, but shows variability in grounding.


Copilot — Formal and Comprehensive

Copilot follows Braun & Clarke closely, explicitly labelling each phase and producing a detailed, academic-style report. Unlike the others, it did not say whether it was using the profile, but I have to assume it did.

Notable pattern: high transparency and completeness, with extensive methodological detail.
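One rough way to compare how explicitly a reply follows the method is to check which of the six phases from Braun & Clarke (2006) it mentions. The sketch below is an assumption-laden illustration, not something used in this experiment: the keyword lists and the `phases_mentioned` function are hypothetical, and simple keyword matching will miss paraphrases.

```python
from typing import List

# The six phases of thematic analysis as named in Braun & Clarke (2006),
# each paired with illustrative (hypothetical) keywords to search for.
PHASES = [
    ("Familiarising yourself with your data", ["familiaris", "familiariz"]),
    ("Generating initial codes", ["initial codes"]),
    ("Searching for themes", ["searching for themes"]),
    ("Reviewing themes", ["reviewing themes"]),
    ("Defining and naming themes", ["defining and naming"]),
    ("Producing the report", ["producing the report"]),
]


def phases_mentioned(reply: str) -> List[str]:
    """Return the phase names whose keywords appear in the reply (case-insensitive)."""
    text = reply.lower()
    return [name for name, keywords in PHASES if any(k in text for k in keywords)]


if __name__ == "__main__":
    sample = "Phase 1: Familiarisation. Phase 2: Generating initial codes. Phase 6: Producing the report."
    print(phases_mentioned(sample))
```

A reply that names all six phases is not necessarily a better analysis; the check only shows how much of the process the model chose to make visible.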


Comparison Table 1: Behavioural Differences

| Dimension | Claude | ChatGPT | Gemini | Copilot |
|---|---|---|---|---|
| Clarification behaviour | High | Moderate | Moderate | Very high |
| Handling uncertainty | Cautious | Assumptive but controlled | Variable | Highly cautious |
| Method adherence | Strong | Flexible | Conceptual | Very strict |
| Output length | Medium | Medium | Medium | Long |

Comparison Table 2: Thinking Style

| Model | Dominant Style | Emphasis | Less Emphasised |
|---|---|---|---|
| Claude | Procedural | Method and structure | Speed and brevity |
| ChatGPT | Interpretive | Clarity and synthesis | Formal process detail |
| Gemini | Conceptual | Abstract insight | Consistent grounding |
| Copilot | Formal | Completeness and rigour | Concision |

Comparison Table 3: Alignment by Use Case

| Use Case | Alignment | Why |
|---|---|---|
| Academic reporting | Copilot | Explicit methodological structure |
| Research design | Claude | Strong process discipline |
| Communication | ChatGPT | Clear and accessible synthesis |
| Conceptual exploration | Gemini | Emphasis on abstract interpretation |

What This Means in Practice

A few patterns emerge from this comparison:

  • Different models frame problems differently, even with identical prompts
  • Structured tasks benefit from models that foreground process
  • Communication-focused outputs benefit from clarity and synthesis
  • Conceptual exploration can surface deeper insights—but may require validation

These are tendencies rather than rules, but they were consistent across this comparison.


The Common Prompt Used

From this profile https://scholar.google.co.uk/citations?user=ghQedZAAAAAJ&hl=en carryout a thematic analysis. Before doing the analysis please ask any question you need, then do the analysis, please CONSTRAINTS. Please use the steps in Baun and Clarke (V. Braun, V. Clarke Using thematic analysis in psychology Qual. Res. Psychol., 3 (2) (2006), pp. 77-101, 10.1191/1478088706qp063oa ) to do this.

CONSTRAINTS
Scope: Whole career
Depth of Data: All available data including publication title, abstract, etc.
Analytical lens: No specific research question
Target themes: Look for semantic themes and then a separate analysis of Latent themes
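The prompt follows a simple template: an instruction, a request to ask questions first, a named method, and a CONSTRAINTS block. A minimal sketch of assembling it programmatically is shown here. The `build_prompt` function is hypothetical, and the instruction sentences are a tidied paraphrase of the prompt, not the exact wording that was sent to the models.

```python
from typing import Dict

# The four constraints, taken from the CONSTRAINTS block of the prompt.
CONSTRAINTS = {
    "Scope": "Whole career",
    "Depth of Data": "All available data including publication title, abstract, etc.",
    "Analytical lens": "No specific research question",
    "Target themes": "Look for semantic themes and then a separate analysis of Latent themes",
}


def build_prompt(profile_url: str, constraints: Dict[str, str]) -> str:
    """Assemble the instruction and a CONSTRAINTS block into one prompt string."""
    lines = [
        # Paraphrased instruction text (an assumption, not the verbatim prompt).
        f"From this profile {profile_url} carry out a thematic analysis.",
        "Before doing the analysis, ask any questions you need, then do the analysis.",
        "Use the steps in Braun and Clarke (2006), Using thematic analysis in psychology.",
        "",
        "CONSTRAINTS",
    ]
    lines += [f"{key}: {value}" for key, value in constraints.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    url = "https://scholar.google.co.uk/citations?user=ghQedZAAAAAJ&hl=en"
    print(build_prompt(url, CONSTRAINTS))
```

Holding the constraints in one place makes it easy to rerun the comparison with, say, a narrower scope while keeping everything else fixed.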


Final Reflection

The key takeaway is not which model is “best,” but that each embodies a different interpretation of what it means to analyse.

Understanding those differences is what turns generative AI from a tool into a capability.


Personally, I liked the results that Claude.ai produced. The report generated was more in line with what I was hoping for. All did the job, though not all used the Google Scholar profile.



All opinions in this blog are the Author's and should not in any way be seen as reflecting the views of any organisation the Author has any association with. Twitter @scottturneruon
