Same Prompt, Four AIs — Why the Answers Aren’t the Same
The differences aren’t just in the answers—they’re in the thinking
Generative AI tools are often discussed as if they were interchangeable—different interfaces delivering broadly similar outputs. However, when applied to complex intellectual tasks, meaningful differences begin to emerge.
To explore this, I ran the same academically rigorous prompt through four leading systems—Claude, ChatGPT, Google Gemini, and Copilot. The task required a full thematic analysis of a researcher’s career using the framework developed by Virginia Braun and Victoria Clarke.
What followed was not simply variation in output, but variation in how each system approached the act of analysis itself.
Same Input, Different Interpretations
At a high level, the experiment is simple:
One prompt → Four models → Four distinct approaches
What changes is not the instruction, but how each system:
- Interprets the task
- Handles uncertainty
- Applies methodology
- Defines what “good analysis” looks like
Experiment Overview
Task
Conduct a full thematic analysis of an academic career, based on my Google Scholar profile, using Braun & Clarke's method
Models Tested
- Claude
- ChatGPT
- Google Gemini
- Copilot
Objective
Compare how each system interprets and executes the same analytical task
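The setup can be sketched as a small harness: one prompt dispatched unchanged to each model, with the replies collected for side-by-side comparison. The model callables below are placeholders, not real API clients; in practice each entry would wrap the relevant vendor's SDK.

```python
# Minimal sketch: send the identical prompt to several models and collect
# the replies for comparison. The model callables here are placeholders;
# swap in real API clients (vendor SDKs) as needed.

PROMPT = "Conduct a thematic analysis of this Google Scholar profile ..."

def run_experiment(prompt, models):
    """Dispatch one unchanged prompt to every model and keep the answers."""
    return {name: ask(prompt) for name, ask in models.items()}

# Placeholder clients -- each would normally call a different API.
models = {
    "Claude":  lambda p: f"[Claude's analysis of: {p[:30]}...]",
    "ChatGPT": lambda p: f"[ChatGPT's analysis of: {p[:30]}...]",
    "Gemini":  lambda p: f"[Gemini's analysis of: {p[:30]}...]",
    "Copilot": lambda p: f"[Copilot's analysis of: {p[:30]}...]",
}

results = run_experiment(PROMPT, models)
for name, answer in results.items():
    print(f"{name}: {answer}")
```

The point of the harness is that only the `models` mapping varies: the instruction itself is held constant, which is what makes the behavioural differences attributable to the systems rather than the prompt.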
Four Models, Four Approaches
Claude — Methodical and Process-Driven
Claude takes a structured, research-oriented approach. It prioritises clarification, explicitly defines assumptions, and follows a clear analytical sequence. It used the Google Scholar profile but also drew on other sources.
Notable pattern: strong emphasis on methodological discipline and balanced thematic output. It also produced a downloadable document.
ChatGPT — Balanced and Interpretive
ChatGPT proceeds with minimal clarification, making reasonable assumptions and focusing on producing a clear, coherent synthesis. It couldn't access Google Scholar, so it drew on a number of other sources instead.
Notable pattern: strong narrative flow and accessibility without losing analytical depth.
Gemini — Conceptual but Variable
Gemini initially misidentified the researcher, then corrected itself and completed the analysis. Its final output leans more toward conceptual and philosophical interpretation. It did access the Google Scholar profile.
Notable pattern: generates abstract insights, but shows variability in grounding.
Copilot — Formal and Comprehensive
Copilot follows Braun & Clarke closely, explicitly labelling each phase and producing a detailed, academic-style report. Unlike the others, it did not say whether it was using the profile, so we have to assume it did.
Notable pattern: high transparency and completeness, with extensive methodological detail.
Comparison Table 1: Behavioural Differences
| Dimension | Claude | ChatGPT | Gemini | Copilot |
|---|---|---|---|---|
| Clarification behaviour | High | Moderate | Moderate | Very high |
| Handling uncertainty | Cautious | Assumptive but controlled | Variable | Highly cautious |
| Method adherence | Strong | Flexible | Conceptual | Very strict |
| Output length | Medium | Medium | Medium | Long |
Comparison Table 2: Thinking Style
| Model | Dominant Style | Emphasis | Less Emphasised |
|---|---|---|---|
| Claude | Procedural | Method and structure | Speed and brevity |
| ChatGPT | Interpretive | Clarity and synthesis | Formal process detail |
| Gemini | Conceptual | Abstract insight | Consistent grounding |
| Copilot | Formal | Completeness and rigour | Concision |
Comparison Table 3: Alignment by Use Case
| Use Case | Alignment | Why |
|---|---|---|
| Academic reporting | Copilot | Explicit methodological structure |
| Research design | Claude | Strong process discipline |
| Communication | ChatGPT | Clear and accessible synthesis |
| Conceptual exploration | Gemini | Emphasis on abstract interpretation |
What This Means in Practice
A few patterns emerge from this comparison:
- Different models frame problems differently, even with identical prompts
- Structured tasks benefit from models that foreground process
- Communication-focused outputs benefit from clarity and synthesis
- Conceptual exploration can surface deeper insights—but may require validation
These are tendencies rather than rules, but they are consistent.
The Common Prompt Used
> From this profile https://scholar.google.co.uk/citations?user=ghQedZAAAAAJ&hl=en carryout a thematic analysis. Before doing the analysis please ask any question you need, then do the analysis, please CONSTRAINTS. Please use the steps in Baun and Clarke (V. Braun, V. Clarke Using thematic analysis in psychology Qual. Res. Psychol., 3 (2) (2006), pp. 77-101, 10.1191/1478088706qp063oa ) to do this.
>
> CONSTRAINTS
> Scope: Whole career
> Depth of Data: All available data including publication title, abstract, etc.
> Analytical lens: No specific research question
> Target themes: Look for semantic themes and then a separate analysis of Latent themes
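The method the prompt invokes has a fixed structure: Braun & Clarke (2006) define six phases of thematic analysis. One rough way to compare the four outputs is to check which phases each report explicitly names, as sketched below; the matching heuristic is illustrative only, not part of the original experiment.

```python
# The six phases of Braun & Clarke's (2006) thematic analysis, encoded as a
# simple checklist. A crude comparison heuristic: check which phase names a
# given model's report mentions verbatim (illustrative only).

PHASES = [
    "familiarising yourself with your data",
    "generating initial codes",
    "searching for themes",
    "reviewing themes",
    "defining and naming themes",
    "producing the report",
]

def phases_mentioned(report_text):
    """Return the subset of phase names appearing verbatim in a report."""
    text = report_text.lower()
    return [phase for phase in PHASES if phase in text]

# Hypothetical snippet of a model's output that labels two phases explicitly.
sample = ("Phase 1: familiarising yourself with your data. "
          "Phase 3: searching for themes.")
print(phases_mentioned(sample))
```

Copilot, which labelled every phase explicitly, would score highest on a check like this; that is exactly the "very strict" method adherence the tables above describe.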
Final Reflection
The key takeaway is not which model is “best,” but that each embodies a different interpretation of what it means to analyse.
Understanding those differences is what turns generative AI from a tool into a capability.
Personally, I liked the results that Claude.ai produced. The report generated was more in line with what I was hoping for. All did the job, though not all used the Google Scholar profile.
