-
Notifications
You must be signed in to change notification settings - Fork 83
Size-Based Geometry Prioritization for Tile Generation #244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
bertt
wants to merge
64
commits into
master
Choose a base branch
from
md5_implementation
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 51 commits
Commits
Show all changes
64 commits
Select commit
Hold shift + click to select a range
57a23e4
initial coding size based geometry prioritization
bertt 4443711
limit geometries to retrieve
bertt 3ce5544
improve performance
bertt 6cc9ebc
add md5 queries
bertt 8c0d428
move md5 queries
bertt c164c8a
Update md5_queries.md
bertt 8fff1f7
Update md5_queries.md
bertt 5415785
Enhance md5_queries.md with issue and todo sections
bertt 05ab363
Update md5_queries.md
bertt d0debce
Fix formatting issue in md5_queries.md
bertt 28a9333
Propose exception for first tile on z=0
bertt ec867b3
filter where hashes based on tile envelope
bertt c99b726
Document spatial indexing recommendations for MD5 hashes
bertt 4137dac
Fix header formatting and clean up index creation examples
bertt 0b5ae57
Numbered list formatting for query patterns
bertt 05d00bb
Fix heading format in md5_queries.md
bertt c86df8c
Enhance md5_queries.md with performance notes
bertt a76f0b1
Update md5_queries.md
bertt 4d7c892
fix octreetiler
bertt e20e7cf
Merge branch 'md5_implementation' of https://github.com/Geodan/pg2b3d…
bertt a507f9c
refactor writing tile
bertt 2dc043b
Update md5_queries.md
bertt c22f184
Initial plan
Copilot 9a5a388
Replace string concatenation with parameterized queries using ANY ope…
Copilot 96c10af
Update md5_queries.md to reflect parameterized query solution
Copilot b2cdb4d
Improve connection management with try-finally blocks
Copilot 4182510
Update src/b3dm.tileset/SpatialIndexChecker.cs
bertt d082ce3
Merge pull request #247 from Geodan/copilot/sub-pr-244
bertt 4be180c
Update README.md
bertt 80085d2
add md5 to index check
bertt d4a1cd7
fix releative path
bertt b94d6e5
skip redundant check
bertt 3dbcfbd
Update src/b3dm.tileset/QuadtreeTiler.cs
bertt 39d262f
Update src/b3dm.tileset/OctreeTiler.cs
bertt c73accf
Update src/b3dm.tileset/QuadtreeTiler.cs
bertt 88d5f9e
Update src/b3dm.tileset/QuadtreeTiler.cs
bertt b67f67b
Update src/b3dm.tileset/QuadtreeTiler.cs
bertt 47d38ed
update octreetiler
bertt 7748bd4
Update src/b3dm.tileset/SpatialIndexChecker.cs
bertt bf838b7
Update src/b3dm.tileset/QuadtreeTiler.cs
bertt f893dc0
Update src/b3dm.tileset/QuadtreeTiler.cs
bertt 1145436
Update src/b3dm.tileset/OctreeTiler.cs
bertt 1dc997c
fix typo
bertt ea33e06
Initial plan
Copilot 9b0a0cb
remove todo
bertt 03c8865
update readme
bertt 4ea012d
Pass processedGeometries to LOD child tiles to maintain deduplication
Copilot 797ed34
Merge branch 'md5_implementation' into copilot/sub-pr-244-again
bertt 48aa307
Update QuadtreeTiler.cs
bertt 777f8b2
Merge pull request #248 from Geodan/copilot/sub-pr-244-again
bertt f141ec7
remove octree tilehashes
bertt cd0c4bf
update readme
bertt ce0434f
Update src/b3dm.tileset/SpatialIndexChecker.cs
bertt b833424
Merge branch 'md5_implementation' of https://github.com/Geodan/pg2b3d…
bertt cbf1522
Update README.md
bertt 028a19c
improve error handling
bertt fe04bf2
Merge branch 'md5_implementation' of https://github.com/Geodan/pg2b3d…
bertt d5495a4
improve md5 hash index check
bertt 22b9fe3
update solution
bertt 03297e6
update octreetiler
bertt 5a6b4b4
remove todo
bertt 45c749d
add option sortby - area or volume (default area)
bertt 87b3fd2
filter boundingboxes based on original projection (not epsg:4326)
bertt ec0a1b7
fix merge conflicts
bertt File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,134 @@ | ||
| ## Queries for MD5 | ||
|
|
||
|
|
||
| ### Initial | ||
|
|
||
| 1] Get bounding box whole table (1.9 s) | ||
|
|
||
| ```sql | ||
| SELECT st_xmin(geom1),st_ymin(geom1), st_xmax(geom1), st_ymax(geom1), st_zmin(geom1), st_zmax(geom1) FROM (select st_transform(ST_3DExtent(geom), 4979) as geom1 from bertt.nantes_reconstructed_buildings ) as t | ||
| ``` | ||
| Result: | ||
|
|
||
| ``` | ||
| -1.8471041030488762 47.14626298148698 -1.1473131952502678 47.62268076404559 34.427586472817715 475.03764899302183 | ||
| ``` | ||
|
|
||
| ## Tile 0_0_0.glb | ||
|
|
||
| 2] Count geometries in bounding box (0.2s) | ||
|
|
||
| ```sql | ||
| SELECT COUNT(geom) FROM bertt.nantes_reconstructed_buildings WHERE ST_Centroid(ST_Envelope(geom)) && st_transform(ST_MakeEnvelope(-1.847105103048876, 47.14626198148698, -1.1473121952502678, 47.62268176404559, 4326), 5698) | ||
| ``` | ||
|
|
||
| Result: 385856 | ||
|
|
||
| 3] Get geometries for tile 0_0_0.glb - 1000 largest geometries in whole table (2 s) | ||
|
|
||
| ```sql | ||
| SELECT ST_AsBinary(st_transform(geom, 4978)), id , MD5(ST_AsBinary(geom)::text) as geom_hash FROM bertt.nantes_reconstructed_buildings where ST_Centroid(ST_Envelope(geom)) && st_transform(ST_MakeEnvelope(-1.847105103048876, 47.14626198148698, -1.1473121952502678, 47.62268176404559, 4326), 5698) ORDER BY ST_Area(ST_Envelope(geom)) DESC LIMIT 1000 | ||
| ``` | ||
|
|
||
| md5 hashes (for example '9759cdee666f512a0c13df8245b667f9') are remembered to be excluded in higher level (z) tile | ||
|
|
||
| potential improvement: make exception for first tile on z=0 - do not filter on envelope (all features are included) | ||
|
|
||
| ## Tile 1_0_0.glb (level 1, x=0, y=0) | ||
|
|
||
| 4] Filter the hashes from previous list to only geometries within tile 1_0_0 | ||
|
|
||
| ```sql | ||
| SELECT MD5(ST_AsBinary(geom)::text) as geom_hash | ||
| FROM bertt.nantes_reconstructed_buildings | ||
| WHERE MD5(ST_AsBinary(geom)::text) = ANY($1) | ||
| AND ST_Within( | ||
| ST_Centroid(ST_Envelope(geom)), | ||
| ST_Transform(ST_MakeEnvelope($2, $3, $4, $5, 4326), 5698) | ||
| ) | ||
| ``` | ||
|
|
||
| Note: Using parameterized query with array parameter instead of string concatenation. | ||
|
|
||
| 5] Count geometries in bounding box on level 1 excluding the geometries from tile 0_0_0.glb, including only the geometries within the tile | ||
|
|
||
| ```sql | ||
| SELECT COUNT(geom) FROM bertt.nantes_reconstructed_buildings WHERE ST_Centroid(ST_Envelope(geom)) && st_transform(ST_MakeEnvelope(-1.847105103048876, 47.14626198148698, -1.497208649149572, 47.384471872766284, 4326), 5698) AND MD5(ST_AsBinary(geom)::text) != ALL($1) | ||
| ``` | ||
|
|
||
| Note: Using parameterized query with array parameter instead of string concatenation. | ||
|
|
||
| Result: 235787 | ||
|
|
||
| 6] Get geometries for tile 1_0_0.glb - 1000 largest geometries in tile 1_0_0 | ||
|
|
||
| ```sql | ||
| SELECT ST_AsBinary(st_transform(geom, 4978)), id , MD5(ST_AsBinary(geom)::text) as geom_hash FROM bertt.nantes_reconstructed_buildings where ST_Centroid(ST_Envelope(geom)) && st_transform(ST_MakeEnvelope(-1.847105103048876, 47.14626198148698, -1.497208649149572, 47.384471872766284, 4326), 5698) AND MD5(ST_AsBinary(geom)::text) != ALL($1) ORDER BY ST_Area(ST_Envelope(geom)) DESC LIMIT 1000 | ||
| ``` | ||
|
|
||
| Note: Using parameterized query with array parameter instead of string concatenation. | ||
|
|
||
| ## Issue | ||
|
|
||
| List of hashes can get long (maximum z*1000 items). Previously this was handled with string concatenation which could lead to performance issues and potential SQL injection vulnerabilities. | ||
|
|
||
| **Solution**: Now using parameterized queries with PostgreSQL's `= ANY()` and `!= ALL()` operators for better performance and security. | ||
|
|
||
| ## Spatial indexing | ||
|
|
||
| Recommended Indexes | ||
|
|
||
| 1. Spatial Index with MD5 Hash (Composite) | ||
|
|
||
| CREATE INDEX idx_geom_centroid_hash ON the_table | ||
| USING btree(MD5(ST_AsBinary(geom_triangle)::text)); | ||
|
|
||
| 2. Spatial Index (GIST) - Still Required | ||
|
|
||
| CREATE INDEX idx_geom_centroid_spatial ON the_table | ||
| USING gist(ST_Centroid(ST_Envelope(geom_triangle))); | ||
|
|
||
| Rationale | ||
|
|
||
| The queries now use three main patterns: | ||
|
|
||
| 1] Spatial filtering with MD5 hash exclusion (GetGeometrySubset): WHERE ST_Centroid(ST_Envelope(geom_triangle)) && <envelope> | ||
| AND MD5(ST_AsBinary(geom_triangle)::text) != ALL($1) | ||
|
|
||
| 2] MD5 hash filtering with spatial validation (FilterHashesByEnvelope): WHERE MD5(ST_AsBinary(geom_triangle)::text) = ANY($1) | ||
| AND ST_Within(ST_Centroid(ST_Envelope(geom_triangle)), <envelope>) | ||
|
|
||
| 3] Hash-only filtering (GetGeometriesBoundingBox): WHERE MD5(ST_AsBinary(geom_triangle)::text) = ANY($1) | ||
|
|
||
| Performance Notes: | ||
|
|
||
| 1] The GIST spatial index handles the ST_Centroid(ST_Envelope(geom_triangle)) predicates | ||
|
|
||
| 2] The MD5 hash BTREE index handles the MD5(ST_AsBinary(geom_triangle)::text) = ANY/!= ALL predicates | ||
|
|
||
| 3] PostgreSQL will use both indexes (bitmap index scan) for queries with both predicates | ||
|
|
||
| 4] Using parameterized queries with ANY/ALL operators provides better performance than string-concatenated IN/NOT IN clauses | ||
|
|
||
| Optional: Materialized Hash Column | ||
|
|
||
| ## Solution | ||
|
|
||
| The hash filtering now uses PostgreSQL's `= ANY(@param)` operator with array parameters instead of string concatenation: | ||
|
|
||
| 1. **Hash Inclusion (IN clause)**: Changed from `MD5(...) IN ('hash1', 'hash2', ...)` to `MD5(...) = ANY(@hashes)` with parameterized array | ||
| 2. **Hash Exclusion (NOT IN clause)**: Changed from `MD5(...) NOT IN ('hash1', 'hash2', ...)` to `MD5(...) != ALL(@excludeHashes)` with parameterized array | ||
|
|
||
| Benefits: | ||
| - Eliminates SQL injection risk (even though MD5 hashes are predictable) | ||
| - Better performance with large hash lists | ||
| - Cleaner, more maintainable code | ||
| - Proper use of parameterized queries | ||
|
|
||
| ## Todo | ||
|
|
||
| - ~~idea: make a temporary blacklist table with the to be exluded hashes?~~ (Solved using parameterized arrays) | ||
|
|
||
| - idea: force use of id column (longs)? | ||
|
|
||
| - ~~Other solutions?~~ (Implemented using `ANY` and `ALL` operators) |
Empty file.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,23 +1,36 @@ | ||
| using Npgsql; | ||
| using System.Collections.Generic; | ||
| using System.Linq; | ||
| using Npgsql; | ||
| using Wkx; | ||
|
|
||
| namespace B3dm.Tileset; | ||
|
|
||
| public static class FeatureCountRepository | ||
| { | ||
| public static int CountFeaturesInBox(NpgsqlConnection conn, string geometry_table, string geometry_column, Point from, Point to, string query, int source_epsg, bool keepProjection = false) | ||
| public static int CountFeaturesInBox(NpgsqlConnection conn, string geometry_table, string geometry_column, Point from, Point to, string query, int source_epsg, bool keepProjection = false, HashSet<string> excludeHashes = null) | ||
| { | ||
| var select = $"COUNT({geometry_column})"; | ||
| var where = GeometryRepository.GetWhere(geometry_column, from, to, query, source_epsg, keepProjection); | ||
|
|
||
| // Add hash exclusion filter using parameterized query | ||
| if (excludeHashes != null && excludeHashes.Count > 0) { | ||
| where += $" AND MD5(ST_AsBinary({geometry_column})::text) != ALL(@excludeHashes)"; | ||
| } | ||
|
|
||
| var sql = $"SELECT {select} FROM {geometry_table} WHERE {where}"; | ||
| conn.Open(); | ||
| var cmd = new NpgsqlCommand(sql, conn); | ||
| var reader = cmd.ExecuteReader(); | ||
| reader.Read(); | ||
| var count = reader.GetInt32(0); | ||
| reader.Close(); | ||
| conn.Close(); | ||
| return count; | ||
| try { | ||
| using var cmd = new NpgsqlCommand(sql, conn); | ||
| if (excludeHashes != null && excludeHashes.Count > 0) { | ||
| cmd.Parameters.AddWithValue("excludeHashes", excludeHashes.ToArray()); | ||
| } | ||
| using var reader = cmd.ExecuteReader(); | ||
| reader.Read(); | ||
| var count = reader.GetInt32(0); | ||
| return count; | ||
| } | ||
| finally { | ||
| conn.Close(); | ||
| } | ||
| } | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.