pyscan
Sampling for Region-Aggregated Spatial Scan Statistics
Spatial scan statistics are a core tool for anomaly detection in geospatial data — locating regions where a measured quantity (disease cases, crime, and so on) is significantly elevated relative to a baseline. The most efficient scan algorithms operate on point data, but real-world data is usually aggregated into predefined regions such as census tracts, zip codes, or counties. The standard workaround, used by widely adopted tools like SaTScan, collapses each region to its centroid — convenient, but it discards the region’s spatial extent and substantially reduces statistical power.
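To make the centroid workaround concrete, here is a minimal pure-Python sketch. It collapses each polygonal region to its area-weighted centroid (shoelace formula) and then runs a point-based scan over axis-aligned rectangles. Several pieces are assumptions for illustration. The discrepancy is a standard Kulldorff-style log-likelihood ratio, not necessarily pyScan's exact implementation. The names `polygon_centroid`, `kulldorff`, and `rectangle_scan` are hypothetical. The brute-force scan runs in O(n^5) time, so it bears no resemblance to pyScan's fast algorithms.

```python
import math

def polygon_centroid(poly):
    """Area-weighted centroid of a simple polygon [(x, y), ...] via the shoelace formula."""
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # centroid = sum / (6 * area) = sum / (3 * area2)
    return (cx / (3.0 * area2), cy / (3.0 * area2))

def kulldorff(m, b):
    """Kulldorff-style score for measured share m vs. baseline share b (0 unless m > b)."""
    if m <= b:
        return 0.0
    if b <= 0.0:
        return math.inf
    score = m * math.log(m / b)
    if m < 1.0:
        score += (1.0 - m) * math.log((1.0 - m) / (1.0 - b))
    return score

def rectangle_scan(points):
    """Brute-force scan over rectangles whose sides pass through point coordinates.

    points: list of (x, y, measured, baseline). Returns (best_score, (x_lo, x_hi, y_lo, y_hi)).
    """
    total_m = sum(p[2] for p in points)
    total_b = sum(p[3] for p in points)
    xs = sorted({p[0] for p in points})
    ys = sorted({p[1] for p in points})
    best, best_rect = 0.0, None
    for i, x_lo in enumerate(xs):
        for x_hi in xs[i:]:
            for j, y_lo in enumerate(ys):
                for y_hi in ys[j:]:
                    inside = [p for p in points
                              if x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi]
                    m = sum(p[2] for p in inside) / total_m
                    b = sum(p[3] for p in inside) / total_b
                    score = kulldorff(m, b)
                    if score > best:
                        best, best_rect = score, (x_lo, x_hi, y_lo, y_hi)
    return best, best_rect

# Four unit-square regions in a 2x2 grid; the top-right one has elevated counts.
regions = [
    ([(0, 0), (1, 0), (1, 1), (0, 1)], 10, 25),
    ([(1, 0), (2, 0), (2, 1), (1, 1)], 10, 25),
    ([(0, 1), (1, 1), (1, 2), (0, 2)], 10, 25),
    ([(1, 1), (2, 1), (2, 2), (1, 2)], 40, 25),
]
# Centroid reduction: each region becomes one weighted point, discarding its extent.
centroid_points = [(*polygon_centroid(poly), m, b) for poly, m, b in regions]
score, rect = rectangle_scan(centroid_points)
print(rect, round(score, 4))
```

With only one point per region, the best rectangle shrinks to a single centroid. It carries no information about how far the anomalous region actually extends.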
This work proposes a simple, scalable alternative, Geom k: replace each region with k points (k between 20 and 50) sampled uniformly from its geometry, and spread the region’s baseline and measured values evenly across them. This preserves the region’s spatial structure while staying fully compatible with fast point-based scan algorithms. Paired with pyScan’s C++ backend and adaptive gridding, even a 50× increase in the number of points adds little runtime.
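The conversion can be sketched in a few lines of pure Python, assuming each region is a simple polygon given as a vertex list. For illustration, uniform sampling here uses rejection sampling from the bounding box with an even-odd point-in-polygon test. The paper's implementation may sample differently (for example, via triangulation). The helpers `sample_region` and `regions_to_points` are hypothetical, and this sketch does not use pyScan's API.

```python
import random

def point_in_polygon(x, y, poly):
    """Even-odd (ray-casting) test: is (x, y) inside the simple polygon poly = [(x, y), ...]?"""
    inside = False
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside

def sample_region(poly, measured, baseline, k, rng):
    """Hypothetical helper: k uniform points inside poly, each carrying 1/k of the region's values."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    points = []
    while len(points) < k:
        # Rejection sampling: draw from the bounding box, keep draws that land inside.
        x, y = rng.uniform(x_lo, x_hi), rng.uniform(y_lo, y_hi)
        if point_in_polygon(x, y, poly):
            points.append((x, y, measured / k, baseline / k))
    return points

def regions_to_points(regions, k, seed=0):
    """Hypothetical helper: convert [(poly, measured, baseline), ...] into weighted points."""
    rng = random.Random(seed)
    points = []
    for poly, measured, baseline in regions:
        points.extend(sample_region(poly, measured, baseline, k, rng))
    return points

# An L-shaped region: k sampled points cover its extent, which a single centroid cannot.
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
pts = regions_to_points([(l_shape, 12, 30)], k=30)
print(len(pts), sum(p[2] for p in pts), sum(p[3] for p in pts))
```

The result is an ordinary list of weighted points, so any point-based scan can consume it unchanged. Each region's totals are preserved because every one of its k points carries exactly 1/k of the region's measured and baseline values.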
A convergence analysis shows that the recovery error shrinks in proportion to 1/√k, where k is the number of sample points per region. Perhaps surprisingly, it also shows that the more regions a map is divided into, the fewer sample points each region needs. Across six datasets (NYC zip codes and the counties of Arkansas, Utah, California, Georgia, and the continental U.S.), the method recovers planted anomalies at far smaller effect sizes than the centroid baseline while running orders of magnitude faster than connected-region methods such as FlexScan.
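The 1/√k rate is the familiar Monte Carlo rate. The sketch below illustrates it; it is not the paper's analysis. It assumes a unit-square region with uniformly spread mass. It then measures how well k uniform samples estimate the fraction of that mass falling inside a fixed query rectangle, which is the quantity a scan sees when it evaluates that rectangle. The function names, the rectangle, and the trial count are arbitrary choices.

```python
import math
import random

def fraction_in_rect(k, rect, rng):
    """Estimate the share of a unit-square region's mass inside rect from k uniform samples."""
    x_lo, x_hi, y_lo, y_hi = rect
    hits = 0
    for _ in range(k):
        x, y = rng.random(), rng.random()
        if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
            hits += 1
    return hits / k

def rms_error(k, trials=400, seed=1):
    """Root-mean-square error of the k-sample estimate over repeated trials."""
    rect = (0.2, 0.7, 0.1, 0.7)  # area 0.5 * 0.6 = 0.3
    true_frac = 0.3
    rng = random.Random(seed)
    sq = 0.0
    for _ in range(trials):
        sq += (fraction_in_rect(k, rect, rng) - true_frac) ** 2
    return math.sqrt(sq / trials)

for k in (25, 100, 400, 1600):
    err = rms_error(k)
    print(f"k={k:5d}  rms error={err:.4f}  err*sqrt(k)={err * math.sqrt(k):.3f}")
```

The err*sqrt(k) column should stay roughly constant, near √(0.3 × 0.7) ≈ 0.46 (the binomial standard deviation). Quadrupling k should therefore roughly halve the error.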
On a real public-health dataset — county-level Valley Fever incidence in California — it recovers the known San Joaquin Valley endemic region much more accurately than the centroid approach, approaching the best overlap an axis-aligned rectangle can achieve.
We recommend this sampling-based conversion as the default way to apply point-based spatial scan statistics to region-aggregated data.
Authors: Foad Namjoo, Drew McClelland, Michael Matheny, Jeff M. Phillips
Library: pyScan
Paper: under review (arXiv link coming soon)