Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SqlGeometry validation flag behaviour should be documented #12

Closed
stijnherreman opened this issue Sep 2, 2020 · 17 comments
Closed

SqlGeometry validation flag behaviour should be documented #12

stijnherreman opened this issue Sep 2, 2020 · 17 comments
Labels
bug Something isn't working

Comments

@stijnherreman
Copy link

There is an annoying behaviour regarding validation of SqlGeometry that may cause surprises. Validation of the data does not always happen, depending on the method of construction. SqlGeometry.Deserialize (as recommended in #10) is one method where validation does not happen.

This is what I documented in our project, but it may need some rewording to explain it in a better way.

SqlGeometry.STIsValid() does not perform validation, it simply checks the 'V' flag of the geometry.
SqlGeometry.IsValidDetailed() only performs validation if the 'V' flag is already set to false. Otherwise, it returns early and says the geometry is valid.

SqlServerBytesWriter maps Geometry.IsValid to the 'V' vlag, so now STIsValid() represents the NTS validity instead of the SQL Server validity.

This means that STIsValid() may return false while IsValidDetailed() says the geometry is valid.

It also means that both STIsValid() and IsValidDetailed() may indicate that the geometry is valid, while it is actually considered invalid by SQL Server.
If the geometry is later on serialized and deserialized (e.g. to WKT and back), only then will STIsValid() and IsValidDetailed() indicate that the geometry is invalid.

Users who are dealing with possibly invalid geometries and have SqlGeometry as their final output, should therefore avoid this module and use a different method like WKB:

var writer = new WKBWriter(ByteOrder.LittleEndian, handleSRID: false, emitZ: true, emitM: true);
var sqlGeometry = SqlGeometry.STGeomFromWKB(new SqlBytes(writer.Write(source)), source.SRID);
@airbreather
Copy link
Member

Can you please provide an example of a geometry that NTS claims is valid, but which SQL Server would claim is invalid?

@stijnherreman
Copy link
Author

Sure, here is one example: LINESTRING (231181.381131 203495.841423, 231200.520569 203503.928265, 231200.52167 203503.928742, 231200.520569 203503.928265, 231200.52167 203503.928742, 231208.708012 203506.628348, 231197.573664 203519.33647)

A bit easier to read:

231181.381131 203495.841423,
231200.520569 203503.928265,
231200.52167 203503.928742,
231200.520569 203503.928265,
231200.52167 203503.928742,
231208.708012 203506.628348,
231197.573664 203519.33647

NTS considers this valid, but SQL Server does not:

24413: Not valid because of two overlapping edges in curve (1).

If needed, I can also find some test data for the other case, where NTS claims it is invalid but SQL Server does not. I would need some more time for this though, I deal with a lot of test data and I don't remember exactly which test cases this occurred in.

@stijnherreman
Copy link
Author

For reference, here's a related issue that I unfortunately did not manage to get back to: NetTopologySuite/NetTopologySuite#267

The person who last commented also observed this behaviour of the 'V' flag.

@airbreather
Copy link
Member

OK, I see -- based on some testing, MSSQL seems to think that a line string is invalid if there exists a segment (of nonzero length) that appears between more than one pair of consecutive points.

I'm not sure if that's an OGC rule or not, which matters here because:

  1. if this is an OGC rule, then there's a bug in NTS (and probably JTS too) and should be fixed there
  2. if it's not an OGC rule, then it's this library's responsibility to change this bit to incorporate any extra rules that MSSQL has above and beyond the ones that we have, and I'd probably want to expose some way for callers to run those rules directly so that these "extra" invalid situations can at least be detected before pushing them to the database.

Even if it is an OGC thing that NTS + JTS are missing, it still might make sense to add a workaround here in the meantime.

@airbreather airbreather added the bug Something isn't working label Sep 2, 2020
@stijnherreman
Copy link
Author

Sounds good! I wasn't sure if performing extra validation in this library would be OK, because I assume that speed is the focus of this library.

@airbreather
Copy link
Member

I wasn't sure if performing extra validation in this library would be OK, because I assume that speed is the focus of this library.

Not at the expense of correctness.


Some examples that we can use as test cases:

-- intersects itself in 3 places, multiple times, with one segment intersecting the line multiple times
-- still valid, because the intersections are all at individual points
SELECT geometry::STGeomFromText('LINESTRING (0 0, 0 2, 1 1, -1 1, 2 2)', 0).IsValidDetailed()

-- topologically equal to the previous one, but with some points repeated
-- still valid, because it still describes each segment only once
SELECT geometry::STGeomFromText('LINESTRING (0 0, 0 1, 1 1, 0 2, 0 1, -1 1, 2 2)', 0).IsValidDetailed()

-- essentially the same line as the others, but it crosses over one of its segments
-- not valid, because the segment between (0, 0) and (0, 1) appears multiple times
SELECT geometry::STGeomFromText('LINESTRING (0 1, 1 1, 0 2, 0 0, 0 1, -1 1, 2 2)', 0).IsValidDetailed()

@bjornharrtell
Copy link

This can also be the other way round, that NTS will flag the geometry as not valid but if imported into MSSQL through other means and validated inside MSSQL it doesn't see it is invalid. The case where I found this is for a polygon with self intersecting (touching) shell at a point (creating a "hole") which is an OGC invalid case detected by NTS.

I find it hard to determine if this is by design or a bug in MSSQL as they carefully seem to avoid to be clear about what they mean with "well-formed" at https://docs.microsoft.com/en-us/sql/t-sql/spatial-geometry/stisvalid-geometry-data-type?view=sql-server-ver15.

FObermaier added a commit that referenced this issue Apr 13, 2021
@FObermaier
Copy link
Member

How about creating a IsValidSqlSeverOp class derived from NTS' IsValidOp and use that in order to check validity?
@bjornharrtell's comment can be fixed by using NTS' IsValidOp and setting IsSelfTouchingRingFormingHoleValid = true.

@bjornharrtell
Copy link

@FObermaier are you sure? To me the definition of validity as per MSSQL is undefined, the case I found was just one example.

@FObermaier
Copy link
Member

Invalidity for SQLServer (geography) is described here:

@bjornharrtell
Copy link

And it lists Not valid, reason unknown. as one possible reason. :D

@bjornharrtell
Copy link

Seriously, my point is that we cannot assume JTS/NTS validity is the same as MSSQL validity.

@FObermaier
Copy link
Member

Seriously, my point is that we cannot assume JTS/NTS validity is the same as MSSQL validity.

That is why I suggested to create and use IsValidSqlServerOp for this library.

@bjornharrtell
Copy link

@FObermaier but you propose to derive it from IsValidOp which I'm saying is a bad idea, you will need Microsofts implementation.

@stijnherreman
Copy link
Author

Last year I started writing examples for each validation rule in the OGC Simple Features standard, with the goal of then comparing NTS and MSSQL to see where they differ.
Unfortunately I didn't get very far and I won't be doing any GIS related work for the foreseeable future, but maybe this can serve as a starting point for someone else to look into the issue.


6.1.4.1
A Point is a 0-dimensional geometric object and represents a single location in coordinate space. A Point has an x-coordinate value, a y-coordinate value.

6.1.5
A MultiPoint is simple if no two Points in the MultiPoint are equal (have identical coordinate values in X and Y).

Valid
MULTIPOINT ((0 0), (1 0))

Invalid
MULTIPOINT ((0 0), (0 0))

6.1.6.1
A Curve is simple if it does not pass through the same Point twice with the possible exception of the two end points [...]

Valid
LINESTRING ((0 0), (1 0))
LINESTRING ((0 0), (1 0), (1 1), (0 0))

Invalid
LINESTRING ((0 0), (1 0), (1 1), (0, -1))

6.1.8.1
A MultiCurve is simple if and only if all of its elements are simple and the only intersections between any two elements occur at Points that are on the boundaries of both elements.

Valid

6.1.10.1
A simple Surface may consists of a single “patch” that is associated with one “exterior boundary” and 0 or more “interior” boundaries.

6.1.11.1
The assertions for Polygons (the rules that define valid Polygons) are as follows:
a) Polygons are topologically closed;
b) The boundary of a Polygon consists of a set of LinearRings that make up its exterior and interior boundaries;
c) No two Rings in the boundary cross and the Rings in the boundary of a Polygon may intersect at a Point but only as a tangent [...]; Note: This last condition says that at a point common to the two curves, nearby points cannot be common. This forces each common point to be a point of tangency.
d) A Polygon may not have cut lines, spikes or punctures [...];
e) The interior of every Polygon is a connected point set;
f) The exterior of a Polygon with 1 or more holes is not connected. Each hole defines a connected component of the exterior.

6.1.14
The assertions for MultiPolygons are as follows.
a) The interiors of 2 Polygons that are elements of a MultiPolygon may not intersect.
b) The boundaries of any 2 Polygons that are elements of a MultiPolygon may not “cross” and may touch at only a finite number of Points.
c) A MultiPolygon is defined as topologically closed.
d) A MultiPolygon may not have cut lines, spikes or punctures, a MultiPolygon is a regular closed Point set [...]
e) The interior of a MultiPolygon with more than 1 Polygon is not connected; the number of connected components of the interior of a MultiPolygon is equal to the number of Polygons in the MultiPolygon.

@bjornharrtell
Copy link

I was hit by this again very recently with the result of geometries in SQL Server flagged as invalid but not invalid according to SQL Server when checked with IsValidDetailed(). I agree with the initial assement in this issue that writing geometries should go through WKB to make SQL Server determine the validity.

@FObermaier
Copy link
Member

OP request served with bc17118

kristofdegrave added a commit to kristofdegrave/NetTopologySuite.IO.SqlServerBytes that referenced this issue Feb 22, 2022
…n written to SQL.

This allows to override the default IsValid of the NTS Geometry and provide an own strategy. This way you can use the IsValidOp with the option SelfTouchingRingFormingHoleValid flagged. Or just make them all valid following the JTS way. Either way the developer will have more control, and if we would manage to provide a IsValidSqlServerOp, it could be passed this way.
FObermaier pushed a commit that referenced this issue Mar 4, 2022
This allows to override the default IsValid of the NTS Geometry and provide an own strategy. This way you can use the IsValidOp with the option SelfTouchingRingFormingHoleValid flagged. Or just make them all valid following the NTS way. Either way the developer will have more control, and if we would manage to provide a IsValidSqlServerOp, it could be passed this way.

relates to #12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants