[WIP] Attempt Wikidata POIs by migurski · Pull Request #580 · protomaps/basemaps

migurski · 2026-03-13T01:33:48Z

Wait for Continue Overture support: POIs, Places, Airports #579

Add Overture `theme=base / type=infrastructure / subtype=airport` runway and taxiway linestrings to the roads layer, matching OSM parity where `aeroway=runway` appears with `kind=aeroway`, `kind_detail=runway`, `min_zoom=9`.

- Add `overtureAerowayKindsIndex` mapping Overture class=runway/taxiway/taxilane to kind=aeroway with appropriate kind_detail values
- Extend `processOverture()` with a new branch for base/infrastructure features, emitting line geometries at min_zoom=9 (runway) or min_zoom=10 (taxiway)
- Add two Overture unit tests: kind_aeroway_fromRunwayClass and kind_aeroway_fromTaxiwayClass

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
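The class-to-kind mapping described in this commit could look roughly like the sketch below. It is not the PR's `overtureAerowayKindsIndex`: the class name `AerowaySketch`, the `classify` method, and the whole taxilane row (its kind_detail and zoom) are assumptions for illustration. Only the runway and taxiway zooms come from the commit message.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of an Overture class -> aeroway lookup; not the PR's code. */
class AerowaySketch {
  /** Attributes emitted on the roads layer for one aeroway line. */
  record Aeroway(String kind, String kindDetail, int minZoom) {}

  // Runway and taxiway zooms are taken from the commit message.
  // The taxilane row (kind_detail and zoom) is an assumption for illustration.
  private static final Map<String, Aeroway> BY_CLASS = Map.of(
      "runway", new Aeroway("aeroway", "runway", 9),
      "taxiway", new Aeroway("aeroway", "taxiway", 10),
      "taxilane", new Aeroway("aeroway", "taxilane", 10));

  /** Returns aeroway attributes, or empty when the feature is not an airport line. */
  static Optional<Aeroway> classify(String theme, String type, String subtype, String clazz) {
    boolean airportInfrastructure =
        "base".equals(theme) && "infrastructure".equals(type) && "airport".equals(subtype);
    if (!airportInfrastructure || clazz == null) {
      return Optional.empty();
    }
    return Optional.ofNullable(BY_CLASS.get(clazz));
  }
}
```

Returning an empty `Optional` for unknown classes lets a caller skip the feature without a separate membership check.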

- Fix populationFallback in processOverture: use 0 when population > 0, so cities/towns with real population data no longer get forced into fallback zoom levels (was hardcoded to 1, causing city minZoom=8 instead of 7)
- Add wikidata lookup block to processOverture mirroring the OSM path, so entries in places.csv (Q62, Q16553, Q169943, etc.) now override minZoom and populationRank for Overture locality features
- Output wikidata attribute on Overture features that have it
- Add failing-first tests for SF (Q62→minZoom=2), San Jose (Q16553→4), San Mateo (Q169943→6), Saratoga (Q927163→7, pop present no fallback)
- Update testOaklandCity: min_zoom 9→8 (population present, no fallback)
- Use real Overture UUIDs from Oakland-visualtests.parquet in all new tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
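A minimal sketch of how the fallback fix and the Wikidata override might interact, under stated assumptions. The commit message does not show how the fallback feeds into the zoom, so adding it to a base zoom is a guess based on the "minZoom=8 instead of 7" note. `PlaceZoomSketch`, `OVERRIDES`, and `minZoom` are hypothetical names, and the override table holds only the three Q-IDs the commit names as places.csv entries.

```java
import java.util.Map;

/** Hypothetical sketch of the place zoom logic; not the PR's processOverture code. */
class PlaceZoomSketch {
  // Q-ID -> min_zoom overrides, using only values named in the commit message.
  static final Map<String, Integer> OVERRIDES = Map.of("Q62", 2, "Q16553", 4, "Q169943", 6);

  /** The fix: real population data means no fallback penalty. */
  static int populationFallback(long population) {
    return population > 0 ? 0 : 1;
  }

  /**
   * Assumption: the fallback is added to a base zoom for the place kind.
   * A matching Wikidata override replaces the computed value entirely.
   */
  static int minZoom(int baseZoom, long population, String wikidata) {
    Integer override = wikidata == null ? null : OVERRIDES.get(wikidata);
    if (override != null) {
      return override;
    }
    return baseZoom + populationFallback(population);
  }
}
```

With a base zoom of 7, a city with a known population stays at 7, one without population data moves to 8, and a feature tagged Q62 resolves to 2 regardless of population.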

Add WebsiteQidDb: domain→QID lookup parsed from a gzipped CSV (wikidata-website-qid-2026-03.csv.gz). Overture places features have no native wikidata field, but often carry websites URLs. This enables a two-hop lookup: websites[0] → domain → Q-ID → QRank score → min_zoom.

- WebsiteQidDb.java: HashMap<String,Long> backed, fromCsv uses lastIndexOf(',') to handle domain values containing commas; getQid() strips protocol/www/path before lookup
- Basemap.java: download + load websiteQidDb after qrankDb; pass to Pois
- Pois.java: add websiteQidDb field; fallback website→QID lookup in processOverture when wikidata tag is absent; add zoo/college/museum qrankGrading entries; recalibrate aerodrome/university thresholds so Oakland Airport→zoom 11, Oakland Zoo→zoom 12, UCB→zoom 13, OMCA→zoom 14
- Tests: WebsiteQidDbTest (9 tests), 4 new PoisOvertureTest cases with real Overture UUIDs (f66024a2 airport, a74a40ae zoo, 67e4f788 UCB, 474b271e OMCA), LayerTest fixture expanded with all four Q-IDs

Prompt: "Implement the following plan: WebsiteQidDb + QRank-based Overture POI Zoom [...] when you add unit tests concerning Overture features, always include their full UUID so we can trace them back to the original dataset [...] just use CLI duckdb, we already have it"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
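The domain→QID lookup described above can be sketched as follows. This is not the PR's `WebsiteQidDb.java`: the class name `WebsiteQidSketch`, the `domainOf` helper, and the assumed CSV row shape (`domain,Q123` with an optional header row) are illustrative guesses. The `lastIndexOf(',')` split and the protocol/www/path stripping follow the commit message.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.OptionalLong;

/** Hypothetical sketch of a domain -> Q-ID lookup; not the PR's WebsiteQidDb. */
class WebsiteQidSketch {
  private final Map<String, Long> qidByDomain = new HashMap<>();

  /** Parses rows assumed to look like "domain,Q123"; unparseable rows are skipped. */
  static WebsiteQidSketch fromCsv(Reader reader) {
    WebsiteQidSketch db = new WebsiteQidSketch();
    new BufferedReader(reader).lines().forEach(db::addLine);
    return db;
  }

  private void addLine(String line) {
    // Split on the LAST comma so a domain value containing commas stays intact.
    int comma = line.lastIndexOf(',');
    if (comma <= 0) {
      return;
    }
    String domain = line.substring(0, comma).trim().toLowerCase(Locale.ROOT);
    String qid = line.substring(comma + 1).trim();
    if (qid.startsWith("Q")) {
      qid = qid.substring(1);
    }
    try {
      qidByDomain.put(domain, Long.parseLong(qid));
    } catch (NumberFormatException e) {
      // Header row or malformed Q-ID: ignore it.
    }
  }

  /** Strips protocol, a leading "www.", and any path, query, or fragment. */
  static String domainOf(String url) {
    String s = url.trim().toLowerCase(Locale.ROOT);
    int scheme = s.indexOf("://");
    if (scheme >= 0) {
      s = s.substring(scheme + 3);
    }
    if (s.startsWith("www.")) {
      s = s.substring(4);
    }
    int end = s.length();
    for (char c : new char[] {'/', '?', '#'}) {
      int i = s.indexOf(c);
      if (i >= 0 && i < end) {
        end = i;
      }
    }
    return s.substring(0, end);
  }

  /** Looks up the Q-ID for a website URL, or empty when the domain is unknown. */
  OptionalLong getQid(String url) {
    if (url == null || url.isBlank()) {
      return OptionalLong.empty();
    }
    Long qid = qidByDomain.get(domainOf(url));
    return qid == null ? OptionalLong.empty() : OptionalLong.of(qid);
  }
}
```

To read the gzipped file, the `Reader` passed to `fromCsv` could be built as `new InputStreamReader(new GZIPInputStream(in), StandardCharsets.UTF_8)`. Storing Q-IDs as `long` keeps the map compact, at the cost of re-adding the `Q` prefix on output.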

…ence

Two guards prevent brand websites from inflating POI zoom levels:

1. Category allowlist: only apply website→QID when basic_category is an institution-level feature (airport, zoo, museum, college_university, etc.). Excludes air_transport_facility_service, travel_service, transportation_location, etc. where the website resolves to a brand entity (e.g. jetblue.com → Q161086 JetBlue Airways) rather than the specific place.
2. Confidence threshold (0.9): low-confidence features are often brand counters or services miscategorised as the institution. Real airports, zoos, etc. cluster at 0.90+; junk like JetBlue-as-airport appears at 0.32.

Tests: websiteQid_ineligibleCategory_noEarlyZoom (category guard) and websiteQid_lowConfidence_noEarlyZoom (confidence guard), both using real Overture UUID e67dea74 / 8b6a937e for JetBlue features at OAK.

Prompt: "Do option B [...] Comment about why they are eligible in the code [...] and test [...] I still see JetBlue appearing at z12 or even z11, why? [...] good yes and test"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
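The two guards above can be sketched as a single predicate. The names `WebsiteGuardSketch` and `websiteLookupAllowed` are hypothetical, the allowlist contains only the four categories the commit message names (the real list is longer, per its "etc."), and treating the 0.9 threshold as inclusive is an assumption.

```java
import java.util.Set;

/** Hypothetical sketch of the website -> QID eligibility guards; not the PR's code. */
class WebsiteGuardSketch {
  // Only the categories named in the commit message; the real allowlist has more.
  static final Set<String> ELIGIBLE = Set.of("airport", "zoo", "museum", "college_university");

  // Assumption: the 0.9 threshold is inclusive.
  static final double MIN_CONFIDENCE = 0.9;

  /** True when a feature's website may be used to look up a Q-ID. */
  static boolean websiteLookupAllowed(String basicCategory, double confidence) {
    return basicCategory != null
        && ELIGIBLE.contains(basicCategory)
        && confidence >= MIN_CONFIDENCE;
  }
}
```

Under this sketch, a JetBlue counter categorised as `air_transport_facility_service` fails the category guard, and the JetBlue-as-airport feature at confidence 0.32 fails the confidence guard.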

Drop features below confidence 0.65 (junk tier: ~127k features dominated by real estate listings, beauty salons, ATMs from uncertain sources). Within the remaining features, use confidence to break sort key ties so higher-confidence POIs win label collision resolution at the same zoom.

Sort key: minZoom * 1000 - (int)(confidence * 100), so confidence=0.99 subtracts 99 points and confidence=0.65 subtracts 65; the lower key gets higher priority.

Tests updated: websiteQid_ineligibleCategory_dropped and websiteQid_lowConfidence_dropped now correctly expect zero features. kind_nationalPark_fromBasicCategory switched to Pinnacles National Park (4d619bc0, confidence=0.917) since the previous Alcatraz fixture (814b8a78, confidence=0.639) falls below the new cutoff.

Prompt: "Let's bring more Overture confidence into POI rendering: make higher-confidence POIs higher rendering priority, and simply omit ones below 0.65 (junk tier)"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
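As a rough illustration of the cutoff and the tie-breaking sort key, assuming the 0.65 cutoff is inclusive; `ConfidenceSortSketch`, `keep`, and `sortKey` are hypothetical names, and only the formula itself comes from the commit message.

```java
/** Hypothetical sketch of the confidence cutoff and sort key; not the PR's code. */
class ConfidenceSortSketch {
  // Assumption: features at exactly 0.65 are kept.
  static final double MIN_CONFIDENCE = 0.65;

  /** Features below the junk-tier cutoff are dropped entirely. */
  static boolean keep(double confidence) {
    return confidence >= MIN_CONFIDENCE;
  }

  /**
   * Lower keys win label collisions. Zoom dominates (steps of 1000);
   * confidence only breaks ties among features at the same zoom.
   */
  static int sortKey(int minZoom, double confidence) {
    return minZoom * 1000 - (int) (confidence * 100);
  }
}
```

Because confidence contributes at most 100 points, it can never outweigh a one-level zoom difference, which keeps the zoom ordering intact. The `(int)` cast truncates, so confidences that differ by less than 0.01 can still tie.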

… conflicts

Kept HEAD (full WebsiteQidDb machinery) in Pois.java; the only conflict was a trivial comment difference on the QRank block. In PoisTest.java, kept HEAD's full test suite (both JetBlue drop tests + all four website→QID tests) over the cherry-pick's slimmed-down version.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

….java

Let's turn the repetition there into a new private method called getZoomsPops that results in minZoom, populationRank, etc. assigned. Does getZoomsPops() need both sf and sf2 args, or can it get by with just sf2?

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

migurski and others added 13 commits March 12, 2026 14:44

Added some Oakland-specific visual test areas

9449593

Add WIKIDATA.md summarizing findings on Wikidata ID availability in O…

bf2aace

…verture data Summarize your findings up to this point in WIKIDATA.md and commit it Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Update WIKIDATA.md with QLever bulk export approach and match rate fi…

c0c6640

…ndings Summarize this exploration into WIKIDATA.md and commit Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Update WIKIDATA.md with disambiguation findings; lowest Q-number heur…

423144d

…istic wins Compare and contrast your two proposed disambiguation approaches; Try that combined approach Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Update WIKIDATA.md with integration design: distribution format and r…

9ec5166

…untime lookup chain Update WIKIDATA.md with these findings. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add generate-wikidata-website-qid.sh; update WIKIDATA.md

cbcef95

Propose a script that will generate a fresh copy of wikidata-website-qid.csv.gz when it is run on a schedule; yes, and if it does update WIKIDATA.md and commit both Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Move generate-wikidata-website-qid.sh to tiles/; output to data/sources/

74edd5c

Hold on, let's have scripts live here in tiles/ and resulting data live under data/sources/ with others Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Reformat WIKIDATA.md table (Spotless)

099c3d0

Spotless reformatted the markdown table during make lint; committed separately since it was missed from the previous commit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

migurski self-assigned this Mar 13, 2026

migurski and others added 2 commits March 12, 2026 18:34

Linted and change-logged

0ef1b8a

migurski changed the title ~~Attempt Wikidata POIs~~ [WIP] Attempt Wikidata POIs Mar 13, 2026

migurski and others added 2 commits March 12, 2026 18:51

Base automatically changed from migurski/continue-overture to main March 31, 2026 18:54

Merge commit 'b2072f5aa7a38dc8282249b4a01e9c4c8d0dc129' into migurski…

4c21d4a

…/attempt-wikidata-pois

migurski wants to merge 18 commits into main from migurski/attempt-wikidata-pois
