[WIP] Attempt Wikidata POIs #580

Draft
migurski wants to merge 18 commits into main from migurski/attempt-wikidata-pois
@migurski migurski commented Mar 13, 2026

migurski and others added 13 commits March 12, 2026 14:44
Add Overture `theme=base / type=infrastructure / subtype=airport` runway and
taxiway linestrings to the roads layer, matching OSM parity where
`aeroway=runway` appears with `kind=aeroway`, `kind_detail=runway`, `min_zoom=9`.

- Add `overtureAerowayKindsIndex` mapping Overture class=runway/taxiway/taxilane
  to kind=aeroway with appropriate kind_detail values
- Extend `processOverture()` with a new branch for base/infrastructure features,
  emitting line geometries at min_zoom=9 (runway) or min_zoom=10 (taxiway)
- Add two Overture unit tests: kind_aeroway_fromRunwayClass and
  kind_aeroway_fromTaxiwayClass

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix populationFallback in processOverture: use 0 when population > 0,
  so cities/towns with real population data no longer get forced into
  fallback zoom levels (was hardcoded to 1, causing city minZoom=8
  instead of 7)
- Add wikidata lookup block to processOverture mirroring the OSM path,
  so entries in places.csv (Q62, Q16553, Q169943, etc.) now override
  minZoom and populationRank for Overture locality features
- Output wikidata attribute on Overture features that have it
- Add failing-first tests for SF (Q62→minZoom=2), San Jose (Q16553→4),
  San Mateo (Q169943→6), Saratoga (Q927163→7, pop present no fallback)
- Update testOaklandCity: min_zoom 9→8 (population present, no fallback)
- Use real Overture UUIDs from Oakland-visualtests.parquet in all new tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…verture data

Summarize your findings up to this point in WIKIDATA.md and commit it

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ndings

Summarize this exploration into WIKIDATA.md and commit

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…istic wins

Compare and contrast your two proposed disambiguation approaches; Try that combined approach

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…untime lookup chain

Update WIKIDATA.md with these findings.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Propose a script that will generate a fresh copy of wikidata-website-qid.csv.gz when it is run on a schedule; yes, and if it does update WIKIDATA.md and commit both

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Hold on, let's have scripts live here in tiles/ and resulting data live under data/sources/ with others

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add WebsiteQidDb: domain→QID lookup parsed from a gzipped CSV
(wikidata-website-qid-2026-03.csv.gz). Overture places features have no
native wikidata field, but often carry website URLs. This enables a
two-hop lookup: websites[0] → domain → Q-ID → QRank score → min_zoom.

- WebsiteQidDb.java: HashMap<String,Long> backed, fromCsv uses
  lastIndexOf(',') to handle domain values containing commas; getQid()
  strips protocol/www/path before lookup
- Basemap.java: download + load websiteQidDb after qrankDb; pass to Pois
- Pois.java: add websiteQidDb field; fallback website→QID lookup in
  processOverture when wikidata tag is absent; add zoo/college/museum
  qrankGrading entries; recalibrate aerodrome/university thresholds so
  Oakland Airport→zoom 11, Oakland Zoo→zoom 12, UCB→zoom 13, OMCA→zoom 14
- Tests: WebsiteQidDbTest (9 tests), 4 new PoisOvertureTest cases with
  real Overture UUIDs (f66024a2 airport, a74a40ae zoo, 67e4f788 UCB,
  474b271e OMCA), LayerTest fixture expanded with all four Q-IDs
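The first hop of that lookup (URL → domain → Q-ID) might look roughly like the sketch below. It is not the PR's `WebsiteQidDb.java`: the class name `WebsiteQidSketch`, the `fromLines` and `normalize` helpers, and the assumed row format `<domain>,Q<digits>` are all hypothetical, and it reads plain strings instead of a gzipped file to stay short. The `jetblue.com` → Q161086 pair in the usage note is the one quoted elsewhere in this PR.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; not the PR's WebsiteQidDb.java.
final class WebsiteQidSketch {
  private final Map<String, Long> qidByDomain = new HashMap<>();

  // ASSUMED row format: "<domain>,Q<digits>". Splitting on the LAST comma
  // keeps any commas inside the domain column intact.
  static WebsiteQidSketch fromLines(List<String> lines) {
    WebsiteQidSketch db = new WebsiteQidSketch();
    for (String line : lines) {
      int comma = line.lastIndexOf(',');
      if (comma < 0) continue;
      String domain = line.substring(0, comma).trim().toLowerCase(Locale.ROOT);
      String qid = line.substring(comma + 1).trim();
      // Rows without a "Q" prefix (such as a header row) are skipped.
      if (qid.length() < 2 || qid.charAt(0) != 'Q') continue;
      try {
        db.qidByDomain.put(domain, Long.parseLong(qid.substring(1)));
      } catch (NumberFormatException e) {
        // Malformed Q-ID: ignore the row.
      }
    }
    return db;
  }

  // Strips the protocol, a leading "www.", and any path, query, or fragment.
  static String normalize(String url) {
    String s = url.trim().toLowerCase(Locale.ROOT);
    int scheme = s.indexOf("://");
    if (scheme >= 0) s = s.substring(scheme + 3);
    if (s.startsWith("www.")) s = s.substring(4);
    int end = s.length();
    for (char c : new char[] {'/', '?', '#'}) {
      int i = s.indexOf(c);
      if (i >= 0 && i < end) end = i;
    }
    return s.substring(0, end);
  }

  // Returns the numeric part of the Q-ID, or null when the domain is unknown.
  Long getQid(String url) {
    return url == null ? null : qidByDomain.get(normalize(url));
  }
}
```

With a table built from the single row `jetblue.com,Q161086`, calling `getQid("https://www.jetblue.com/flights")` returns `161086`, and an unknown domain returns `null` so the caller can fall through to its default zoom.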

Prompt: "Implement the following plan: WebsiteQidDb + QRank-based
Overture POI Zoom [...] when you add unit tests concerning Overture
features, always include their full UUID so we can trace them back to
the original dataset [...] just use CLI duckdb, we already have it"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Spotless reformatted the markdown table during make lint; committed
separately since it was missed from the previous commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ence

Two guards prevent brand websites from inflating POI zoom levels:

1. Category allowlist: only apply website→QID when basic_category is an
   institution-level feature (airport, zoo, museum, college_university,
   etc.). Excludes air_transport_facility_service, travel_service,
   transportation_location, etc. where the website resolves to a brand
   entity (e.g. jetblue.com → Q161086 JetBlue Airways) rather than the
   specific place.

2. Confidence threshold (0.9): low-confidence features are often brand
   counters or services miscategorised as the institution. Real airports,
   zoos, etc. cluster at 0.90+; junk like JetBlue-as-airport appears at
   0.32.
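The two guards above can be sketched as a single predicate. This is illustrative only: the class name `WebsiteQidGuardSketch` is hypothetical, the allowlist contains just the four categories named in the commit message (the real list continues with "etc."), and treating the 0.9 threshold as inclusive is an assumption.

```java
import java.util.Set;

// Hypothetical sketch of the two guards; not the PR's Pois.java.
final class WebsiteQidGuardSketch {
  // Only the categories named in the commit message; the real list is longer.
  static final Set<String> ELIGIBLE_CATEGORIES =
      Set.of("airport", "zoo", "museum", "college_university");

  // ASSUMPTION: the threshold is inclusive (confidence >= 0.9 passes).
  static final double MIN_CONFIDENCE = 0.9;

  // True only when BOTH guards pass: institution-level category and high confidence.
  static boolean eligible(String basicCategory, double confidence) {
    return basicCategory != null
        && ELIGIBLE_CATEGORIES.contains(basicCategory)
        && confidence >= MIN_CONFIDENCE;
  }
}
```

Under these assumptions a JetBlue counter tagged `airport` at confidence 0.32 fails the confidence guard, and a `travel_service` feature fails the category guard regardless of its confidence.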

Tests: websiteQid_ineligibleCategory_noEarlyZoom (category guard) and
websiteQid_lowConfidence_noEarlyZoom (confidence guard), both using real
Overture UUID e67dea74 / 8b6a937e for JetBlue features at OAK.

Prompt: "Do option B [...] Comment about why they are eligible in the
code [...] and test [...] I still see JetBlue appearing at z12 or even
z11, why? [...] good yes and test"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Drop features below confidence 0.65 (junk tier: ~127k features dominated
by real estate listings, beauty salons, ATMs from uncertain sources).
Within the remaining features, use confidence to break sort key ties so
higher-confidence POIs win label collision resolution at the same zoom.

Sort key: minZoom * 1000 - (int)(confidence * 100), so confidence=0.99
subtracts 99 and confidence=0.65 subtracts 65: at the same zoom the 0.99
feature scores 34 points lower (higher priority).
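As a worked illustration of the cutoff and the sort key formula quoted above, here is a small sketch. The class name `PoiConfidenceSketch` and its method names are hypothetical, and it assumes a feature at exactly 0.65 is kept, since the commit only says features below 0.65 are dropped.

```java
// Hypothetical sketch of the confidence cutoff and sort key; not the PR's Pois.java.
final class PoiConfidenceSketch {
  static final double MIN_CONFIDENCE = 0.65;

  // Features below the cutoff are dropped; exactly 0.65 is kept (ASSUMPTION).
  static boolean keep(double confidence) {
    return confidence >= MIN_CONFIDENCE;
  }

  // Lower key = higher priority. The zoom term dominates because confidence
  // contributes at most 100, so confidence only breaks ties within one zoom.
  static int sortKey(int minZoom, double confidence) {
    return minZoom * 1000 - (int) (confidence * 100);
  }
}
```

For example, at `minZoom` 12 a confidence of 0.75 gives 12000 - 75 = 11925, and any feature at zoom 11 still sorts ahead of every feature at zoom 12. The fixtures mentioned in this commit behave as described: 0.917 passes `keep` and 0.639 does not.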

Tests updated: websiteQid_ineligibleCategory_dropped and
websiteQid_lowConfidence_dropped now correctly expect zero features.
kind_nationalPark_fromBasicCategory switched to Pinnacles National Park
(4d619bc0, confidence=0.917) since the previous Alcatraz fixture
(814b8a78, confidence=0.639) falls below the new cutoff.

Prompt: "Let's bring more Overture confidence into POI rendering: make
higher-confidence POIs higher rendering priority, and simply omit ones
below 0.65 (junk tier)"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@migurski migurski self-assigned this Mar 13, 2026
migurski and others added 2 commits March 12, 2026 18:34
@migurski migurski changed the title Attempt Wikidata POIs [WIP] Attempt Wikidata POIs Mar 13, 2026
migurski and others added 2 commits March 12, 2026 18:51
… conflicts

Kept HEAD (full WebsiteQidDb machinery) in Pois.java; the only conflict
was a trivial comment difference on the QRank block. In PoisTest.java,
kept HEAD's full test suite (both JetBlue drop tests + all four
website→QID tests) over the cherry-pick's slimmed-down version.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
….java

Let's turn the repetition there into a new private method called getZoomsPops that results in minZoom, populationRank, etc. assigned. Does getZoomsPops() need both sf and sf2 args, or can it get by with just sf2?

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Base automatically changed from migurski/continue-overture to main March 31, 2026 18:54