Our 13-category UK scam taxonomy

How CyberShield classifies fraud into thirteen fixed categories, why a closed taxonomy beats free-text labels, and where UK Finance's headline numbers map onto our bucket list.

7 min read · Navdeep Singh
Tags: engineering, cybershield, taxonomy, fraud, ai

CyberShield is our UK scam-check product. A user pastes a link, a screenshot, a text message or a voice clip, and gets back a verdict in seconds: is this a scam, what kind of scam, what to do next. The interesting engineering decision sits underneath that: the verdict is not free text. Every output is mapped to exactly one of thirteen categories.

This note is about why that taxonomy exists, what the thirteen buckets are, where they come from, and why we will resist the temptation to add a fourteenth without a very good reason.

Why a fixed taxonomy beats free-text labels

The first version of CyberShield used the obvious approach: ask the LLM to describe the scam in its own words. The output was useful for a single user reading their own result. It was useless for everything else we cared about.

You cannot aggregate over free text. We could not tell a council partner "impersonation scams are up 40% in your postcode this month" because the model would write "HMRC impersonation" one day and "fake tax-office message" the next. We could not route the case to the right safeguarding partner because the labels were not stable. We could not train evaluators because there was no ground truth to evaluate against. The free-text approach made the product feel smart while making it operationally dumb.

The fix is the same one taxonomists have used for a hundred years: pick a closed set, write down the inclusion criteria, train the classifier to commit. The categories may be imperfect, but the system above them suddenly works.
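A minimal sketch of the aggregation problem, assuming verdict labels are stored as plain strings and counted per month. The function name `percent_change` and all the counts are hypothetical, chosen only to reproduce the "up 40%" example above.

```python
from collections import Counter


def percent_change(previous: Counter, current: Counter, label: str) -> float:
    """Percentage change in how often `label` appears between two months."""
    before = previous[label]
    if before == 0:
        raise ValueError(f"no baseline count for {label!r}")
    return (current[label] - before) * 100 / before


# Free text: the model describes the same scam differently each month,
# so the two months share no label and no trend can be computed.
free_text_last_month = Counter({"HMRC impersonation": 50})
free_text_this_month = Counter({"fake tax-office message": 70})

# Closed set: the same verdicts land on one stable label.
closed_last_month = Counter({"impersonation": 50})
closed_this_month = Counter({"impersonation": 70})

print(percent_change(closed_last_month, closed_this_month, "impersonation"))
```

With the free-text counters, the same call either raises (no baseline for the new wording) or reports a 100% drop for the old wording. Neither is a statement you could put in front of a partner.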

The thirteen categories

These are the buckets a CyberShield verdict can land in. They are written into the model prompt, the database enum, the analytics dashboard and the partner reports. Changing one of them is a migration.

1. Impersonation (gov / brand)

Messages claiming to come from HMRC, DVLA, Royal Mail, Companies House, the NHS, a bank, a delivery firm, or a household brand. The single largest bucket by volume and the easiest to spot programmatically. Indicators: spoofed sender domain, urgent payment language, plausible-but-wrong URLs, requests to "verify" details.

2. Smishing (SMS phishing)

Phishing delivered specifically by SMS or RCS. We split this from generic impersonation because the medium materially changes the defence advice (block the number, forward the message to 7726, do not tap any link). Smishing is now one of the dominant attack vectors hitting UK consumers.

3. Romance / relationship

Long-running social-engineering attacks where the attacker builds a fake relationship before requesting money. Distinguished from "investment" (below) by the emotional vector. The mean loss per victim is among the highest of any UK fraud category; Action Fraud reports that five-figure losses are common.

4. Investment / crypto

Cold-pitched investment opportunities, "guaranteed" returns, fake crypto exchanges, recovery-room scams (where a second attacker pretends to recover money lost in the first). Frequently uses cloned celebrity endorsements.

5. Marketplace / classified

Fraud on eBay, Facebook Marketplace, Gumtree, Vinted, Depop. Includes the buyer-asks-to-pay-outside-platform pattern and the seller-ships-empty-box pattern. Distinguished from impersonation because the platform itself is real.

6. Job / recruitment

Fake job offers, advance-fee scams ("pay £80 for your DBS check before we hire you"), task-platform scams, and mule recruitment ("we'll pay you to receive transfers"). Rising fast post-pandemic. In the mule-recruitment sub-type, the victim may themselves be committing a criminal offence by moving the money, which makes the safeguarding response different.

7. Tech support

Pop-ups, cold calls, or remote-access requests claiming the user's computer is infected. Often impersonates Microsoft, Apple or a UK ISP. Distinguished from generic impersonation because the resolution involves immediate device-level safety advice (disconnect, scan, change passwords).

8. Authorised push payment (APP) — bank impersonation

The fraud type that triggered the UK's mandatory reimbursement scheme, run by the Payment Systems Regulator (PSR). A criminal calls or texts pretending to be the victim's bank, claims the account is compromised, and walks the victim through "moving the money to a safe account." This is its own category because reimbursement law treats it differently.

9. Subscription / continuity

Hidden recurring charges, fake free-trial offers, undisclosed renewal terms. Less existentially threatening than APP fraud but the highest-volume consumer complaint to Trading Standards.

10. AI-cloned voice or video

Synthetic media used to impersonate a real person — typically a family member ("Mum, I've lost my phone, can you send money to this number?") or a CEO authorising an unusual transfer. We split this out because the indicators are different (voice artefacts, lip-sync glitches, unusual phrasing) and the advice to victims is different (verify on a second channel before acting).

11. Charity / disaster

Fake fundraisers exploiting current events — earthquakes, conflicts, NHS appeals, food-bank impersonators. Spikes around news cycles.

12. Ticket / event

Fake event tickets, gigs, sporting fixtures. Lloyds Banking Group's annual fraud report flags this as the single fastest-growing category for under-25s. Sub-type: secondary-market exit scams where a real listing is taken over by a criminal.

13. Unknown / mixed

The honest bucket. The verdict is "looks suspicious but does not fit cleanly above." We chose to make this an explicit category, rather than a silent fall-through, so that:

  • We can surface the count to ourselves and watch it. A growing "unknown" bucket is the signal that we need a new category.
  • Partners reading reports see "12 unknowns this month" instead of an implausibly tidy distribution.
  • The user gets honest copy: "This doesn't match a pattern we recognise. Treat it with caution and report to Action Fraud."
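The list above can be sketched as a closed enum with a strict parser. This is an illustration under assumptions, not our production schema: the identifier `ScamCategory`, the string values, and the helpers `parse_verdict` and `unknown_share` are all hypothetical names. The point it shows is that "unknown" is a value the classifier chooses, while a label outside the set is an error and not a silent fall-through.

```python
from collections import Counter
from enum import Enum


class ScamCategory(Enum):
    # Hypothetical identifiers for the thirteen buckets described above.
    IMPERSONATION = "impersonation"
    SMISHING = "smishing"
    ROMANCE = "romance"
    INVESTMENT_CRYPTO = "investment_crypto"
    MARKETPLACE = "marketplace"
    JOB_RECRUITMENT = "job_recruitment"
    TECH_SUPPORT = "tech_support"
    APP_BANK_IMPERSONATION = "app_bank_impersonation"
    SUBSCRIPTION = "subscription"
    AI_CLONED_MEDIA = "ai_cloned_media"
    CHARITY_DISASTER = "charity_disaster"
    TICKET_EVENT = "ticket_event"
    UNKNOWN_MIXED = "unknown_mixed"


def parse_verdict(raw_label: str) -> ScamCategory:
    """Map raw model output to exactly one category, or fail loudly."""
    try:
        return ScamCategory(raw_label.strip().lower())
    except ValueError:
        raise ValueError(f"label outside the closed set: {raw_label!r}") from None


def unknown_share(verdicts: list[ScamCategory]) -> float:
    """Fraction of verdicts in the explicit 'unknown' bucket."""
    if not verdicts:
        return 0.0
    return Counter(verdicts)[ScamCategory.UNKNOWN_MIXED] / len(verdicts)
```

Tracking `unknown_share` over time is one way to watch the bucket; any alert threshold would be a separate decision.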

Where this lines up with UK Finance and Action Fraud

The UK Finance Annual Fraud Report (the edition published in May 2024, covering 2023 data) uses a slightly different vocabulary. It splits "authorised" and "unauthorised" at the top level, then drills into APP sub-types: purchase, investment, romance, impersonation (police / bank / other), advance fee, invoice / mandate, CEO fraud. Action Fraud's NFIB (National Fraud Intelligence Bureau) classification is broader and includes types we do not see in consumer flow (insurance fraud, application fraud).

Our taxonomy is deliberately consumer-facing. We do not have an "invoice / mandate" category, because the messages we see do not include B2B accounts-payable attacks. We do have "smishing" as a top-level category, because the defence advice is medium-specific in a way UK Finance's report does not need to model. We deliberately do not split "investment" from "crypto", because the advice and indicators have converged in 2025–2026 — every investment scam we have seen this year has had a crypto wrapper somewhere in it.

When we share data with a council or charity partner, we always provide the mapping document next to the figures. Knowing your categories don't line up perfectly with the published baseline is fine. Pretending they do is not.
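One way to keep the mapping next to the figures is to carry it in the same rows. The sketch below is illustrative only: the mapping entries are our guess at plausible correspondences with the UK Finance sub-types named above, not the actual mapping document, and `partner_rows` and the category strings are hypothetical names.

```python
from typing import Optional

# Illustrative entries only; NOT the real mapping document.
# None means "no direct equivalent in the published baseline".
UK_FINANCE_APP_SUBTYPE: dict[str, Optional[str]] = {
    "romance": "romance",
    "investment_crypto": "investment",
    "marketplace": "purchase",
    "smishing": None,       # medium-specific, so no single counterpart
    "unknown_mixed": None,
}


def partner_rows(counts: dict[str, int]) -> list[tuple[str, int, str]]:
    """Return (our category, count, baseline sub-type) rows for a report."""
    rows = []
    for category, count in sorted(counts.items()):
        if category not in UK_FINANCE_APP_SUBTYPE:
            # Refuse to report a figure that has no documented mapping.
            raise KeyError(f"no mapping entry for {category!r}")
        mapped = UK_FINANCE_APP_SUBTYPE[category]
        rows.append((category, count, mapped or "no direct equivalent"))
    return rows
```

Printing "no direct equivalent" in the report, instead of forcing a nearest match, keeps the mismatch visible to the partner.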

When we will add a fourteenth category

The bar is high on purpose. A new category requires:

  1. At least 200 verdicts in the "unknown" bucket that all share the new shape. Not 200 we think fit — 200 we've manually reviewed and confirmed.
  2. Distinct defence advice. If we cannot write different copy for the user, the category does not pay rent.
  3. A migration of the enum + the dashboards + the partner reports + the training prompts. It is a real piece of work, on purpose, so we don't reach for a new label every time the news cycle changes.
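The three requirements above can be written as an explicit gate. This is a sketch under assumptions: the threshold of 200 comes from the list, while `CandidateCategory`, its field names, and `unmet_requirements` are hypothetical.

```python
from dataclasses import dataclass

MIN_CONFIRMED_VERDICTS = 200  # threshold stated in requirement 1


@dataclass
class CandidateCategory:
    name: str
    confirmed_unknown_verdicts: int  # manually reviewed, not just suspected
    has_distinct_advice: bool        # can we write different user copy?
    migration_planned: bool          # enum, dashboards, reports, prompts


def unmet_requirements(candidate: CandidateCategory) -> list[str]:
    """List the requirements a proposed category still fails."""
    unmet = []
    if candidate.confirmed_unknown_verdicts < MIN_CONFIRMED_VERDICTS:
        unmet.append("fewer than 200 manually confirmed verdicts")
    if not candidate.has_distinct_advice:
        unmet.append("no distinct defence advice")
    if not candidate.migration_planned:
        unmet.append("migration not planned")
    return unmet


proposal = CandidateCategory("ai_text_impersonation", 140, True, False)
print(unmet_requirements(proposal))
```

An empty list means the proposal clears the bar. Returning the reasons, not just a boolean, makes the rejection explainable.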

We expect, on current trajectory, to add an "AI-text impersonation" category within twelve months as LLM-generated phishing emails start to look meaningfully different from human-written ones. We are watching the "unknown" bucket for that signal now.

What you can borrow

If you are building any kind of classifier — fraud, safeguarding, content moderation, support-ticket triage — three things travel:

  • Commit to a closed set early. It will feel restrictive. It will save you later.
  • Make "unknown" an explicit category, not a fall-through. Watch its size. It is your most honest metric.
  • Write down the inclusion criteria. Not "what does the model think this is" — what would you call it if you were classifying by hand. Then evaluate the model against that.
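The last point can be sketched as a per-category comparison between hand labels and model labels. This is a minimal illustration, assuming both are lists of category strings in the same order; `per_category_recall` is a hypothetical helper and recall is only one of several metrics you might choose.

```python
from collections import defaultdict


def per_category_recall(hand_labels: list[str], model_labels: list[str]) -> dict[str, float]:
    """For each hand-assigned category, the share the model also assigned."""
    if len(hand_labels) != len(model_labels):
        raise ValueError("label lists must be the same length")
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for truth, predicted in zip(hand_labels, model_labels):
        totals[truth] += 1
        if truth == predicted:
            hits[truth] += 1
    return {category: hits[category] / totals[category] for category in totals}


hand = ["smishing", "smishing", "romance", "unknown_mixed"]
model = ["smishing", "impersonation", "romance", "unknown_mixed"]
print(per_category_recall(hand, model))
```

Breaking the score down by category shows which inclusion criteria the model is missing, which a single overall accuracy figure would hide.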

Taxonomies are unglamorous engineering. They are also the difference between a clever demo and a system you can run a real partnership on.
