Signal Methodology

HantaData is committed to transparency. This page explains exactly how signal counts are generated, which sources feed the system, and the limitations you should be aware of.

What is a "Signal"?

A signal is one news article or health report that (a) mentions at least one hantavirus keyword and (b) can be attributed to a specific country based on its text content.

Signal counts are not case counts. One article about 50 cases still counts as one signal. Ten articles about the same single case can count as up to ten signals, because deduplication only merges articles whose titles are identical or near-identical.

Important limitation: High signal counts indicate high media attention — not necessarily high disease burden. A high-profile outbreak may generate 50 articles about 3 cases. A large but under-reported outbreak may generate 5 articles about 200 cases.
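To make the distinction concrete, here is a minimal Python sketch of title-based deduplication followed by per-country counting. The function names and the normalization rule (lowercasing and collapsing whitespace) are assumptions for illustration; the actual near-identical title matching used by HantaData is not specified on this page.

```python
from collections import Counter


def normalize_title(title: str) -> str:
    """Lowercase and collapse whitespace so trivially different titles compare equal."""
    return " ".join(title.lower().split())


def count_signals(articles):
    """Count one signal per unique (title, country) pair.

    `articles` is a list of (title, country) tuples. Case numbers mentioned
    in a title are never read, so they cannot influence the count.
    """
    seen = set()
    counts = Counter()
    for title, country in articles:
        key = (normalize_title(title), country)
        if key in seen:
            continue  # duplicate title for this country: merged, not recounted
        seen.add(key)
        counts[country] += 1
    return counts


articles = [
    ("Hantavirus outbreak: 50 cases confirmed", "Argentina"),           # 50 cases, 1 signal
    ("Hiker contracts hantavirus", "Chile"),                            # 1 case
    ("hiker contracts  Hantavirus", "Chile"),                           # same title, merged
    ("Health officials confirm single hantavirus case", "Chile"),       # same case, new title
]
signal_counts = count_signals(articles)
```

In this sketch Argentina ends with one signal despite 50 cases, while Chile ends with two signals for a single case.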

Data Pipeline

1. Fetch RSS feeds: 29 sources are fetched in parallel, with an 8-second timeout per feed.

2. Keyword filter: each article's title and description are checked against 19 hantavirus keywords.

3. Country detection: 30+ country regex patterns are matched against the article text.

4. Deduplication: articles with identical or near-identical titles are merged.

5. Signal counting: one matched article = one signal for the detected country.

6. Cache + serve: results are cached for 30 minutes in /tmp and served via the API.
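The first step, fetching many feeds in parallel with a per-feed timeout, could look roughly like the Python sketch below. This is not HantaData's actual code: the function names, the thread-pool approach, and the choice to silently skip failed feeds are all assumptions. Note also that the `timeout` argument of `urlopen` bounds individual socket operations rather than the total download time.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

FEED_TIMEOUT_SECONDS = 8  # per-feed timeout stated in the pipeline


def fetch_feed(url: str, timeout: float = FEED_TIMEOUT_SECONDS) -> bytes:
    """Download one feed and return its raw bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetch=fetch_feed, max_workers=29):
    """Fetch every feed in parallel and return {url: body} for the ones that succeed."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {url: pool.submit(fetch, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception:
                # Assumption: a failed or timed-out feed is dropped
                # rather than failing the whole fetch cycle.
                continue
    return results
```

Passing the `fetch` callable as a parameter keeps the sketch easy to exercise with a stub instead of live network calls.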

Data Sources (29 feeds total)

Source                                Count
Google News RSS (targeted queries)    19
ProMED-mail                           1
WHO News Releases                     1
Reuters Health                        1
CDC RSS                               1
HealthMap                             1
BBC Health                            1
CNN Health                            1
NYT Health                            1
The Guardian Health                   1
AP News Health                        1

Hantavirus Keywords

An article must contain at least one of these terms to be counted as a signal:

hantavirus, hantaan, puumala, andes virus, sin nombre, seoul virus, hfrs, hemorrhagic fever with renal syndrome, nephropathia epidemica, hantavirus pulmonary, hantavirose, hantaviral, dobrava, juquitiba, laguna negra, choclo, bank vole fever, nephropathia, rodent-borne hemorrhagic
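As an illustration of the keyword filter, the sketch below checks an article's title and description against the 19 terms listed above. The function name and the matching rule (case-insensitive substring search) are assumptions; the real filter may use word boundaries or other rules, and plain substring matching can produce false positives for short terms such as "hfrs" or "choclo".

```python
KEYWORDS = [
    "hantavirus", "hantaan", "puumala", "andes virus", "sin nombre",
    "seoul virus", "hfrs", "hemorrhagic fever with renal syndrome",
    "nephropathia epidemica", "hantavirus pulmonary", "hantavirose",
    "hantaviral", "dobrava", "juquitiba", "laguna negra", "choclo",
    "bank vole fever", "nephropathia", "rodent-borne hemorrhagic",
]


def matches_keyword(title: str, description: str) -> bool:
    """Return True if the title or description contains at least one keyword."""
    text = f"{title} {description}".lower()
    return any(keyword in text for keyword in KEYWORDS)
```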

Country Detection

Country attribution uses regular expression matching against the full article text (title + description). 30+ country patterns are checked, with careful attention to ambiguous terms (e.g., “Chile pepper” is excluded from Chile detection).

A single article can be attributed to multiple countries if it mentions more than one geographic region. This is intentional — a comparative study citing Argentina and Chile generates signals for both.

Articles with no country match are discarded and do not count toward any signal total.
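The rules above (regex matching, exclusion of ambiguous terms, multi-country attribution, and discarding unmatched articles) can be sketched in Python as follows. The three patterns shown are hypothetical examples, not the 30+ patterns HantaData actually uses; the negative lookahead is one possible way to exclude "Chile pepper".

```python
import re

# Hypothetical patterns for illustration only.
COUNTRY_PATTERNS = {
    "Argentina": re.compile(r"\bargentin(?:a|e|ian)\b", re.IGNORECASE),
    # Negative lookahead skips "Chile pepper" and "Chile peppers".
    "Chile": re.compile(r"\bchile(?:an)?\b(?!\s+pepper)", re.IGNORECASE),
    "Finland": re.compile(r"\b(?:finland|finnish)\b", re.IGNORECASE),
}


def detect_countries(text: str) -> list[str]:
    """Return every country whose pattern matches the text, sorted by name.

    An empty list means the article would be discarded.
    """
    return sorted(
        country
        for country, pattern in COUNTRY_PATTERNS.items()
        if pattern.search(text)
    )


both = detect_countries("Comparative study of hantavirus in Argentina and Chile")
none = detect_countries("Recipe: roasted Chile pepper salsa")
```

Here `both` contains Argentina and Chile, so the article would generate a signal for each, while `none` is empty and the article would be dropped.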

Update Cycle & Caching

Data refresh: every 10 minutes (frontend auto-polls)

Cache TTL: 30 minutes (stale-while-revalidate)

Feed timeout: 8 seconds per source

Cache storage: /tmp directory (Vercel serverless)

Deduplication window: within a single fetch cycle, by title

Historical data: not retained (current cycle only)
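A file-based cache with a 30-minute TTL and stale-while-revalidate behavior might be sketched as below. The file name, JSON layout, and function names are assumptions made for illustration, not the service's actual implementation.

```python
import json
import os
import tempfile
import time

CACHE_TTL_SECONDS = 30 * 60  # 30-minute TTL from the table above
# Hypothetical file name inside the system temp directory (/tmp on most Linux hosts).
CACHE_PATH = os.path.join(tempfile.gettempdir(), "hanta_signals_cache.json")


def write_cache(data, path=CACHE_PATH, now=None):
    """Store the data together with the time it was saved."""
    payload = {"saved_at": time.time() if now is None else now, "data": data}
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)


def read_cache(path=CACHE_PATH, now=None):
    """Return (data, is_stale); (None, True) when no usable cache file exists."""
    now = time.time() if now is None else now
    try:
        with open(path, encoding="utf-8") as handle:
            payload = json.load(handle)
    except (OSError, ValueError):
        return None, True
    age = now - payload["saved_at"]
    return payload["data"], age > CACHE_TTL_SECONDS
```

Under stale-while-revalidate, a caller would serve the returned data immediately even when `is_stale` is true and start a fresh fetch cycle in the background to replace it.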

Citation Format

For academic / journalistic use

HantaData (2026). Hantavirus Signal Tracker [Data set]. Retrieved May 10, 2026, from https://hantadata.com/api/fetch-hanta

Please accompany any use of HantaData signals in published work with the above citation and a disclaimer that signals represent media mentions, not confirmed case counts.