
Buyer Beware: How AI Got My Simplest Task Wrong

A simple country list became five spreadsheet versions, two AI tools, missing entries and confidently wrong explanations. A practical reminder to check AI’s homework before it feeds into reports, strategy decks or executive decisions.



How many countries are there in the world?

Before getting into proper market analysis, I wanted a basic country list. Nothing sophisticated. Just country name, continent, population, and a short comment. The kind of simple dataset you build before doing the more interesting work.

In my head, this was admin. Ask the tools, get the list, clean it up, move on. Luckily, I was using both Gemini and ChatGPT to cross-check the work. Even when using thinking or reasoning modes, I now lean on validation for important data-based tasks. If the tools do not agree with each other, that is usually a sign to slow down and check the AI’s homework. Many people do not.

A 2025 global study by KPMG and the University of Melbourne found that 66% of people rely on AI output at work without evaluating its accuracy. The study covered more than 48,000 people across 47 countries, which makes it hard to dismiss as a niche problem. (KPMG)

That is why this bothered me. If AI outputs are already feeding into workplace reports, strategy decks, management updates, and executive decisions, blind trust becomes a real risk. What started as a boring country list turned into five versions of a spreadsheet, two AI tools giving different answers, missing entries, inconsistent definitions, and one confidently wrong explanation that sounded completely plausible.


Why AI gave me 232 countries instead of 195


If you Google how many countries there are, the answer usually comes back as 195. That is the familiar number: 193 United Nations member states, plus two observer states, the Holy See and Palestine.

That is broadly where Gemini started. ChatGPT, meanwhile, gave me 232. At first, that felt ridiculous. How can a basic country list be nearly 40 entries apart?

The reason was not random. ChatGPT had quietly used a broader “countries and territories” style definition, including dependencies, overseas territories, and special jurisdictions. Gemini was closer to a sovereign-country framing. Neither tool paused at the start to ask which definition I actually wanted.

That was the first problem. The answer looked factual, but the definition underneath was different.
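One fix is to make the scope a parameter instead of an assumption. The sketch below is not the exact logic from my spreadsheet, just a minimal illustration with a handful of made-up entries and statuses: the same raw data produces different counts depending on which definition you apply, which is exactly the 195-versus-232 gap.

```python
# Minimal sketch: the same raw data gives different "country counts"
# depending on which scope definition is applied.
# The sample entries and their statuses are illustrative, not a full dataset.

entries = [
    {"name": "France",      "status": "un_member"},
    {"name": "Palestine",   "status": "un_observer"},
    {"name": "Holy See",    "status": "un_observer"},
    {"name": "Taiwan",      "status": "other"},      # not a UN member
    {"name": "Hong Kong",   "status": "territory"},  # special jurisdiction
    {"name": "Puerto Rico", "status": "territory"},  # dependency
]

SCOPES = {
    # strict, UN-style framing: members plus the two observer states
    "sovereign": {"un_member", "un_observer"},
    # broad "countries and territories" framing
    "countries_and_territories": {"un_member", "un_observer", "other", "territory"},
}

def count_for_scope(scope: str) -> int:
    """Count the entries that fall inside the named scope definition."""
    allowed = SCOPES[scope]
    return sum(1 for e in entries if e["status"] in allowed)

for scope in SCOPES:
    print(scope, count_for_scope(scope))
# On the real data, the strict framing lands near 195 and the broad one
# near 232 -- same world, different definition.
```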

How definitions and naming conventions created false gaps

Once I compared the outputs, it looked like countries were missing. Some were. ChatGPT left out Taiwan in one version, which may be explainable in a narrow diplomatic dataset, but makes no sense for a practical country or market analysis.

Many of the other gaps were not missing countries at all. They were naming conventions. One list used official names, another used recognisable English names. One used abbreviations, another used full names. Sometimes accents or brackets made the same place look different.

A human can usually see through that. A spreadsheet cannot. Unless you build the logic in, two versions of the same place become different entries. That is how a naming issue becomes a data issue.
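Building that logic in does not take much. The sketch below is illustrative rather than the cleanup I actually ran: the alias table and the sample names are assumptions, but the idea is to normalise every name to one canonical English form before comparing, so spelling variants stop looking like missing countries.

```python
import unicodedata

# Hypothetical alias table: map the variants each tool produced
# onto one canonical, readable English name.
ALIASES = {
    "cote d'ivoire": "Ivory Coast",
    "republic of korea": "South Korea",
    "uk": "United Kingdom",
}

def normalise(name: str) -> str:
    """Strip accents, brackets and case so spelling variants compare equal."""
    name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    name = name.split("(")[0].strip().lower()
    return ALIASES.get(name, name.title())

list_a = ["Côte d'Ivoire", "Republic of Korea", "UK"]
list_b = ["Ivory Coast", "South Korea", "United Kingdom", "Taiwan"]

set_a = {normalise(n) for n in list_a}
set_b = {normalise(n) for n in list_b}

print("Only in A:", set_a - set_b)  # -> set()       (the spelling gaps disappear)
print("Only in B:", set_b - set_a)  # -> {'Taiwan'}  (a genuinely missing entry)
```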

How AI gave the wrong answer when challenged

The strangest moment came near the end, when there was still a one-row difference between the lists. Gemini explained that the difference was caused by Somaliland.

At first glance, that sounded plausible. Somaliland is exactly the kind of awkward geopolitical edge case that could confuse a country list. It has real-world political complexity, but is not widely recognised as a sovereign state.

Except it was not. Somaliland was not the cause of the one-row mismatch in the spreadsheet.

That was the real warning. AI is useful, but it can become dangerous when it sounds certain and we trust it blindly. The tool had not just made a mistake. It had produced a clean explanation for the wrong problem. It had the right tone, the right shape, and the right confidence, but it did not match the data.

How I built a cleaner 209-country master list

The useful version only emerged once the logic was made explicit. I started with the familiar 195-country baseline, then added a small number of practical exceptions that mattered for the analysis, including Taiwan and Kosovo, plus selected territories and special jurisdictions such as Hong Kong and Puerto Rico.

At the same time, I removed the long tail of smaller territories and edge cases that inflated the earlier ChatGPT list to 232. Guernsey and Jersey were bundled into the UK for this stage. Smaller places with limited relevance were left out. The naming was standardised into practical English so the list would be readable and reusable.

That gave me a master list of around 209 entries. Not 195, because I did not want a strict UN-style diplomatic list. Not 232, because I did not want every dependency or special area. Something in the middle: practical, readable, and fit for purpose.
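If you want to reproduce the shape of that decision, the sketch below is roughly how it can be expressed. It is not my actual build script; the baseline here is a small placeholder sample standing in for the full 195-entry list, but the point is that every addition and exclusion becomes an explicit, reviewable line rather than something a tool decided silently.

```python
# Placeholder for the strict 195-entry baseline (193 UN members plus the
# Holy See and Palestine); in practice this would be loaded from a source
# you trust. A short sample stands in here so the sketch runs.
BASELINE_195 = {"France", "Germany", "Japan", "Holy See", "Palestine"}

# Practical additions that matter for this analysis.
ADDITIONS = {
    "Taiwan",
    "Kosovo",
    "Hong Kong",
    "Puerto Rico",
    # ...other selected territories and special jurisdictions
}

# Entries folded into a parent market or dropped as out of scope.
EXCLUSIONS = {
    "Guernsey",  # bundled into the United Kingdom for this stage
    "Jersey",    # bundled into the United Kingdom for this stage
}

master_list = sorted((BASELINE_195 | ADDITIONS) - EXCLUSIONS)
print(len(master_list))  # around 209 with the real baseline
```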

What this taught me about checking AI outputs

The lesson was not that AI is bad. It was that AI needs boundaries.

If the definition is vague, it will often fill in the gaps for you. Sometimes it will mix scopes. Sometimes it will miss something obvious. Sometimes it will explain an error with a confident answer that sounds right but does not match the data.

That is why the boring first step matters. A wrong list becomes a wrong lookup. A wrong lookup becomes a wrong total. A wrong total becomes a wrong market view. By the time the final output looks polished, the error can be buried several layers deep.

The better approach is not to stop using AI. It is to stop treating the first answer as the final answer. Check the AI’s homework. Ask what definition it used. Ask what it excluded. Ask why the counts differ. Check the comparison. Force the logic into the open.
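In practice, checking the comparison can be as simple as computing the difference yourself rather than accepting the tool's account of it. The sketch below is illustrative, with made-up lists, but it is the kind of check that exposes a Somaliland-style explanation immediately: the rows behind the gap are printed, not narrated.

```python
def explain_gap(list_a, list_b):
    """Print the actual rows behind a count difference between two lists."""
    set_a, set_b = set(list_a), set(list_b)
    print(f"Counts: {len(set_a)} vs {len(set_b)}")
    print("Only in the first list: ", sorted(set_a - set_b))
    print("Only in the second list:", sorted(set_b - set_a))

# Illustrative only: a one-row gap a tool might blame on the wrong entry.
gemini_list = ["Kenya", "Nigeria", "Ghana"]
chatgpt_list = ["Kenya", "Nigeria", "Ghana", "Ethiopia"]
explain_gap(gemini_list, chatgpt_list)
```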

Because if a country list can drift from 195 to 232 without the issue being clearly explained, imagine what happens when the same habits are applied to a strategy paper, a market forecast, or a board presentation.
