Methodology — how Newsylist measures trends

1 · Detection

Every hour Newsylist collects the current headlines from public Google News category feeds (World, Business, Technology, Science, Health, Sports, Entertainment) and from a curated set of publisher RSS feeds (BBC, NYT, Ars Technica, The Verge, ESPN, Variety and more). Both sets go through the same clustering pass, which gives a broader, faster signal. Headlines are normalized and clustered: two headlines belong to the same story when their significant-word overlap (Jaccard similarity) reaches 50%, or their character-level similarity reaches 62%. A cluster becomes a trend, and earns a permanent page, only when at least 4 independent news sources are covering it. This threshold is our quality gate: Newsylist does not create pages for single-source or thinly sourced stories.
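
The matching rule and the 4-source gate can be illustrated with a minimal Python sketch. It is not Newsylist's actual code. The stop-word list, the use of difflib's ratio as the character-level measure, and the greedy clustering loop are all assumptions made for illustration; only the 50%, 62% and 4-source thresholds come from the text above.

```python
import re
from difflib import SequenceMatcher

# Assumption: a tiny stop-word list stands in for "significant word" filtering.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "on", "for",
              "and", "or", "as", "at", "is", "by", "with"}

JACCARD_THRESHOLD = 0.50  # significant-word overlap, from the text
CHAR_THRESHOLD = 0.62     # character-level similarity, from the text
MIN_SOURCES = 4           # independent sources needed to become a trend


def significant_words(headline):
    """Lowercase the headline and keep its non-stop-word tokens as a set."""
    words = re.findall(r"[a-z0-9]+", headline.lower())
    return {w for w in words if w not in STOP_WORDS}


def jaccard(a, b):
    """Jaccard similarity of the two headlines' significant-word sets."""
    wa, wb = significant_words(a), significant_words(b)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def char_similarity(a, b):
    # Assumption: difflib's ratio stands in for the unspecified
    # character-level similarity measure.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def same_story(a, b):
    """Two headlines match when either threshold is reached."""
    return (jaccard(a, b) >= JACCARD_THRESHOLD
            or char_similarity(a, b) >= CHAR_THRESHOLD)


def cluster(items):
    """Greedily group (source, headline) pairs into story clusters."""
    clusters = []
    for source, headline in items:
        for c in clusters:
            if any(same_story(headline, h) for _, h in c):
                c.append((source, headline))
                break
        else:
            clusters.append([(source, headline)])
    return clusters


def is_trend(c):
    """A cluster is a trend once it has at least MIN_SOURCES distinct sources."""
    return len({source for source, _ in c}) >= MIN_SOURCES
```

In this sketch, four differently worded headlines about the same product launch from four publishers would form one cluster that passes is_trend, while an unrelated single-source headline would form its own cluster and stay below the gate.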

2 · Velocity

Velocity measures how fast a story is spreading: velocity = (articles in last 6h + 0.25 × total articles) × √(distinct sources). Source diversity is weighted because ten newsrooms covering a story independently means more than one newsroom publishing ten articles. Velocity is snapshotted hourly; the charts on every page are drawn from these snapshots. Newsylist also tracks acceleration (the hour-over-hour change in velocity) and flags a young, broadly sourced, fast-climbing story as a breakout. The live board is ranked by a composite heat score (time-decayed velocity plus an acceleration kicker), so a fast-emerging story can out-rank an older high-volume one.
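
As a worked illustration, the velocity formula translates directly into a few lines of Python. The function and parameter names are hypothetical, and acceleration is taken here as the difference between the two most recent hourly snapshots, which is one plausible reading of "hour-over-hour change". The heat score is not sketched because its decay and kicker terms are not specified.

```python
import math


def velocity(articles_last_6h, total_articles, distinct_sources):
    """velocity = (articles in last 6h + 0.25 * total articles) * sqrt(distinct sources)."""
    return (articles_last_6h + 0.25 * total_articles) * math.sqrt(distinct_sources)


def acceleration(snapshots):
    """Assumed definition: latest hourly velocity snapshot minus the previous one."""
    if len(snapshots) < 2:
        return 0.0
    return snapshots[-1] - snapshots[-2]


# Ten articles, all within the last 6 hours, in both cases:
one_newsroom = velocity(10, 10, 1)    # (10 + 2.5) * sqrt(1)  = 12.5
ten_newsrooms = velocity(10, 10, 10)  # (10 + 2.5) * sqrt(10), roughly 39.5
```

With the same article count, the story covered by ten newsrooms scores about three times higher than the one covered by a single newsroom, which is the effect the square-root term is meant to produce.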

3 · Lifecycle

Each trend carries a status, recomputed hourly: rising (velocity climbing), peaking (at or near its maximum), cooling (below half of peak), and archived (no new coverage for 48 hours). Archived pages are never deleted — they become the historical record.
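
The status rules can be sketched as a small decision function. The order in which the rules are checked is an assumption (archived first, then cooling, then rising, with peaking as the remaining case), as are the function and parameter names; the text above fixes only the 48-hour and half-of-peak thresholds.

```python
ARCHIVE_AFTER_HOURS = 48  # from the text: no new coverage for 48 hours
COOLING_FRACTION = 0.5    # from the text: below half of peak


def lifecycle_status(velocity_now, velocity_prev, peak_velocity,
                     hours_since_last_article):
    """Return 'archived', 'cooling', 'rising' or 'peaking' for one trend.

    Assumed precedence: archived > cooling > rising > peaking.
    """
    if hours_since_last_article >= ARCHIVE_AFTER_HOURS:
        return "archived"
    if velocity_now < COOLING_FRACTION * peak_velocity:
        return "cooling"
    if velocity_now > velocity_prev:
        return "rising"
    # Not climbing, but still at or above half of peak: treated as peaking here.
    return "peaking"
```

Under these assumptions, a trend whose velocity has dropped from 6 to 4 against a peak of 10 is cooling, and the same trend becomes archived once 48 hours pass without a new article.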

4 · Predictions — and self-grading

Once per day Newsylist predicts, for every live trend, whether it will still be receiving coverage tomorrow (holds) or not (fades). Rather than a single rule, a feature-based model combines source breadth, momentum (velocity vs. its own peak), acceleration, 12-hour source growth, lifecycle state and age into a 0–1 survival score; the trend holds when the score clears 0.5. Each call also carries a stated confidence (low / medium / high) from how far the score sits from that boundary. The following day each prediction is graded against what actually happened, the result is stamped on the trend's page (✓ or ✗, permanently), and the global accuracy figure is updated. Newsylist cannot hide a bad call.
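
The step from survival score to a graded call can be sketched as follows. The model that produces the score is not reproduced, since its features are listed above but its weights are not. The confidence margins (0.15 and 0.30) and all names are hypothetical, and "clears 0.5" is read here as strictly greater than 0.5.

```python
HOLD_THRESHOLD = 0.5  # from the text: the trend holds when the score clears 0.5


def call_from_score(score):
    """Map a 0-1 survival score to a 'holds' or 'fades' call."""
    return "holds" if score > HOLD_THRESHOLD else "fades"


def confidence_from_score(score, medium_margin=0.15, high_margin=0.30):
    """Confidence grows with distance from the boundary (margins are assumptions)."""
    margin = abs(score - HOLD_THRESHOLD)
    if margin >= high_margin:
        return "high"
    if margin >= medium_margin:
        return "medium"
    return "low"


def grade(call, got_coverage_next_day):
    """A call is correct when it matches what actually happened the next day."""
    actual = "holds" if got_coverage_next_day else "fades"
    return call == actual


def accuracy(grades):
    """Share of graded predictions that were correct, or None if none are graded."""
    return sum(grades) / len(grades) if grades else None
```

For example, a score of 0.9 gives a high-confidence holds call; if the trend receives no coverage the next day, grade returns False and the miss counts against the accuracy figure.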

5 · Briefs and the no-invention contract

Trend briefs are written by an AI layer operating under a strict contract: it may use only the facts present in the collected headlines. Inventing numbers, quotes, names or causes is explicitly forbidden; when headlines don't establish a fact, the brief must say so. Every brief is labeled with how it was generated. If no AI provider is available, a deterministic extractive engine builds the brief from coverage metadata alone — pages are never empty and never fabricated.

6 · Recaps, entities & newsrooms

Every day Newsylist writes a short narrative recap of the biggest trends (and a weekly round-up), under the same no-invention contract. It also auto-extracts the people, organizations and places in the news into entity hubs, and ranks publishers on a newsroom leaderboard by how often each was the earliest source on a trend — a public measure of who actually breaks the news first.
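
The leaderboard count can be illustrated with a short sketch. The data shape (a mapping from trend name to a list of source and timestamp pairs) and the function names are assumptions for illustration, not the site's actual schema.

```python
from collections import Counter
from datetime import datetime


def earliest_source(articles):
    """Return the source of the earliest article in a list of (source, published_at)."""
    return min(articles, key=lambda a: a[1])[0]


def leaderboard(trends):
    """Rank sources by how many trends they were the earliest source on."""
    counts = Counter(earliest_source(arts) for arts in trends.values() if arts)
    return counts.most_common()


# Hypothetical sample data: two trends, each with two articles.
sample = {
    "trend-a": [("NYT", datetime(2024, 1, 1, 9, 30)),
                ("BBC", datetime(2024, 1, 1, 9, 0))],
    "trend-b": [("BBC", datetime(2024, 1, 2, 8, 0)),
                ("ESPN", datetime(2024, 1, 2, 8, 45))],
}
ranking = leaderboard(sample)  # BBC was earliest on both trends
```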

7 · Data reuse

All Newsylist metrics may be cited freely with attribution to "Newsylist" and a link. Machine-readable data: https://www.newsylist.com/api/trends.json · AI-assistant orientation: https://www.newsylist.com/llms.txt.