Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

un-mixing SSURGO / STATSGO in SDA #36

Closed
smroecker opened this issue Nov 14, 2017 · 8 comments
Closed

un-mixing SSURGO / STATSGO in SDA #36

smroecker opened this issue Nov 14, 2017 · 8 comments
Assignees

Comments

@smroecker
Copy link
Member

get_component_from_SDA(WHERE = "compname = 'Miami'") is returning duplicates even though the duplicates argument is set to FALSE. Apparently it's also pulling STATSGO mapunit types. Need to include mapunit type in get or remove and write separate get for STASTGO.

@dylanbeaudette
Copy link
Member

Original query, with mu.mukey added, ~ 5 seconds run-time:

SELECT DISTINCT mu.nationalmusym, compname, comppct_r, compkind, majcompflag, localphase, slope_r, tfact, wei, weg, drainagecl, elev_r, aspectrep, map_r, airtempa_r, reannualprecip_r, ffd_r, nirrcapcl, nirrcapscl, irrcapcl, irrcapscl, frostact, hydgrp, corcon, corsteel, taxclname, taxorder, taxsuborder, taxgrtgroup, taxsubgrp, taxpartsize, taxpartsizemod, taxceactcl, taxreaction, taxtempcl, taxmoistscl, taxtempregime, soiltaxedition, cokey, mu.mukey 
FROM legend l 
INNER JOIN mapunit mu ON mu.lkey = l.lkey 

INNER JOIN (
  SELECT MIN(nationalmusym) nationalmusym2, MIN(mukey) AS mukey2 
    FROM mapunit
    GROUP BY nationalmusym
 ) AS mu2 ON mu2.nationalmusym2 = mu.nationalmusym 

 INNER JOIN (
SELECT compname, comppct_r, compkind, majcompflag, localphase, slope_r, tfact, wei, weg, drainagecl, elev_r, aspectrep, map_r, airtempa_r, reannualprecip_r, ffd_r, nirrcapcl, nirrcapscl, irrcapcl, irrcapscl, frostact, hydgrp, corcon, corsteel, taxclname, taxorder, taxsuborder, taxgrtgroup, taxsubgrp, taxpartsize, taxpartsizemod, taxceactcl, taxreaction, taxtempcl, taxmoistscl, taxtempregime, soiltaxedition, cokey , mukey AS mukey2 
FROM component
) AS c ON c.mukey2 = mu2.mukey2 
WHERE compname = 'Amador' 
ORDER BY cokey, compname, comppct_r DESC ;

Modified, based on my interpretation of the intended result, ~ 0.5 seconds run-time:

SELECT mu.nationalmusym, compname, comppct_r, compkind, majcompflag, localphase, slope_r, tfact, wei, weg, drainagecl, elev_r, aspectrep, map_r, airtempa_r, reannualprecip_r, ffd_r, nirrcapcl, nirrcapscl, irrcapcl, irrcapscl, frostact, hydgrp, corcon, corsteel, taxclname, taxorder, taxsuborder, taxgrtgroup, taxsubgrp, taxpartsize, taxpartsizemod, taxceactcl, taxreaction, taxtempcl, taxmoistscl, taxtempregime, soiltaxedition, cokey, mu.mukey
FROM legend l 
INNER JOIN mapunit mu ON mu.lkey = l.lkey 
INNER JOIN component co ON mu.mukey = co.mukey
WHERE compname = 'Amador' 
ORDER BY cokey, compname, comppct_r DESC ;

And, further simplified:

SELECT mu.nationalmusym, compname, comppct_r, compkind, majcompflag, localphase, slope_r, tfact, wei, weg, drainagecl, elev_r, aspectrep, map_r, airtempa_r, reannualprecip_r, ffd_r, nirrcapcl, nirrcapscl, irrcapcl, irrcapscl, frostact, hydgrp, corcon, corsteel, taxclname, taxorder, taxsuborder, taxgrtgroup, taxsubgrp, taxpartsize, taxpartsizemod, taxceactcl, taxreaction, taxtempcl, taxmoistscl, taxtempregime, soiltaxedition, cokey, mu.mukey
FROM 
mapunit mu
INNER JOIN component co ON mu.mukey = co.mukey
WHERE compname = 'Amador' 
ORDER BY cokey, compname, comppct_r DESC ;

All three queries return the same data, 58 rows worth.

I get 48 rows of data when querying the same data in the SoilWeb copy of FY2018 SSURGO, and 10 rows from the SoilWeb copy of STATSGO.

Ok, jut as Stephen suspected--only took me that much typing to figure out what was clearly stated in the description.

Next query, this one works as expected:

SELECT mu.nationalmusym, compname, comppct_r, compkind, majcompflag, localphase, slope_r, tfact, wei, weg, drainagecl, elev_r, aspectrep, map_r, airtempa_r, reannualprecip_r, ffd_r, nirrcapcl, nirrcapscl, irrcapcl, irrcapscl, frostact, hydgrp, corcon, corsteel, taxclname, taxorder, taxsuborder, taxgrtgroup, taxsubgrp, taxpartsize, taxpartsizemod, taxceactcl, taxreaction, taxtempcl, taxmoistscl, taxtempregime, soiltaxedition, cokey, mu.mukey
FROM legend
INNER JOIN mapunit mu ON mu.lkey = legend.lkey
INNER JOIN component co ON mu.mukey = co.mukey
WHERE compname = 'Amador'
-- critical: filter out STATSGO data
AND legend.areasymbol != 'US'
ORDER BY cokey, compname, comppct_r DESC ;

That is better!

image

Looks like either something recently changed in SDA, or, we have been doing it wrong for a couple of years now. Yikes.

@dylanbeaudette
Copy link
Member

A couple of suggestions / questions for this and related SDA functions:

  • there aren't many cases when DISTINCT is required and typically suggests a non-specific join condition
  • I don't understand the purpose of the duplicates argument: does this imply some level of additional filtering for a non-specific WHERE clause?

Opened #37 for the follow-up work in manual pages and tutorials.

@dylanbeaudette
Copy link
Member

I don't mind fixing this in a branch and then submitting a pull request or changing directly. Unfortunately we just missed the last submission to CRAN (Monday).

@dylanbeaudette
Copy link
Member

dylanbeaudette commented Dec 4, 2017

Update from Phil on geometry

Yes, all spatial intersection tests apply to only SSURGO data - the spatial intersections are performed against the sapolygon, mupoint, muline and/or mupolygon tables as appropriate.

Both SSURGO and STATSGO non-spatial data are in the “tabular” tables (legend, mapunit, component, etc.), the STATSGO spatial data are only in the gsmmupolygon table. In order to work directly with STATSGO data you have the option to use the intrinsic SQL Server geometry functions against the gsmmupolygon. For example, given a WGS84 WKT string describing an area in Wayne County OH you can perform the equivalent of “SDA_Get_Mukey_from_intersection_with_WktWgs84()” with the following:

select distinct mukey from gsmmupolygon 
where mupolygongeo.STIntersects(
       geometry::STPolyFromText(
              'polygon(
                     (-81.92624 40.82873,-81.92624 40.90403,-81.75872 40.90403,-81.75872 40.82873,-81.92624 40.82873))', 
              4326)
) = 1

The intrinsic functions are documented by Microsoft, start at https://technet.microsoft.com/en-us/library/bb933973%28v=sql.110%29.aspx

@dylanbeaudette dylanbeaudette changed the title get_component_from_SDA() returning duplicates un-mixing SSURGO / STATSGO in SDA Dec 4, 2017
@dylanbeaudette
Copy link
Member

Should queries that hit SDA implicitly filter STATSGO (tabular) records or leave it to a toggle in the function arguments?

@dylanbeaudette
Copy link
Member

Any updates? Happy to test / check.

@smroecker
Copy link
Member Author

I don't see this as an issue anymore. I think originally I was confused as to why I was seeing STASTGO and SSURGO data being returned together, and the second issue now being discussed in issue #38 . If users want to exclude STATSGO from their results, they can simply include "areasymbol != 'US'" in their WHERE argument. Although we could add a toggle as you suggest above. I think the best course I action is to describe in the details and example section of the help file. I will get on that, and then close this issue.

example:
test <- get_mapunit_from_SDA(WHERE = "areasymbol != 'US' AND muname LIKE '%Miami%'")

@smroecker
Copy link
Member Author

Done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants