This repository has been archived by the owner on Oct 8, 2019. It is now read-only.

pixels are being dropped during estimation #389

Closed
robinkraft opened this issue Jul 20, 2013 · 7 comments

@robinkraft
Contributor

Summary

About 300k pixels are being dropped during estimation. Here's the map (it's the "missing pixels" visualization in CartoDB):

http://cdb.io/1aybQpl

Initial analysis

It almost looks like specific tiles are being dropped - the edges of these groups of pixels are way too clean, and don't correspond to the edges of ecoregions or GADM admin areas.

But I don't think that's what's actually happening - these pixels show up as gaps in the data. The hole in India is surrounded by non-missing FORMA data from the same tile, for example.

Missing pixels

[image: missing_pixels___cartodb] https://f.cloud.github.com/assets/428784/829322/6eacd1ce-f0d3-11e2-9590-247cb404738d.png

Gaps in GFW site

[image: global_forest_watch_2 0-3] https://f.cloud.github.com/assets/428784/829320/4b984ccc-f0d3-11e2-98f6-db9a9aac42aa.png

Indian gap only affects part of tile

[image: image] https://f.cloud.github.com/assets/428784/829402/ee973e34-f0d7-11e2-9da6-559021858c58.png

Test data/info

Missing tiles

Here are the missing tiles I uncovered (there could be more):

8 6
10 6
13 11
13 12
22 11
27 7
29 7
25 7
33 10

Debugging

There's about 1.6 GB of data for the pixels that get dropped. To run the estimation step for just these pixels, do this from a cluster:

;; Load the FORMA estimation namespace and switch into it.
(use 'forma.hadoop.jobs.forma)
(in-ns 'forma.hadoop.jobs.forma)

;; Re-run estimation on only the pixels that were never estimated.
;; Don't forget to change the sink to something useful.
(let [beta-src (hfs-seqfile "s3n://pailbucket/all-betas")
      dynamic-src (hfs-seqfile "s3n://pailbucket/output/run-2013-07-18/neighbors-never-estimated")
      static-src (hfs-seqfile "s3n://pailbucket/output/run-2013-07-18/static-never-estimated")
      sink (hfs-seqfile "s3n://mybucket/mypath" :sinkmode :replace)]
  (?- sink (forma-estimate {:nodata -9999.0} beta-src dynamic-src static-src)))

Conclusion

We really need to fix this since it's a long-standing bug (since late last year at least!) and so our coverage is incomplete! But we can still do updates and whatnot until that happens - no one has noticed the holes so far, and hopefully we can fix this before they do.

@eightysteele
Contributor

Nice issue dude. Also I like the red arrows. So big.

@robinkraft
Contributor Author

Turns out these pixels are dropped because their trends values are bad - the short-stat is always the same value, and the long-stat and t-stat contain missing values.

["500" 9 6 38 1592 827
 [-61.55873192436046 -61.55873192436046 -61.55873192436046 -61.55873192436046 -61.55873192436046 ...]
 [-9999.0 -9999.0 -9999.0 -9999.0 -9999.0 ...]
 [-9999.0 -9999.0 -9999.0 -9999.0 -9999.0 ...]
 [0.7689173428291797 0.807434980881081 0.8212616584246215 0.8566016818004791 0.8321855724693694 ...]]

Anything with a nodata value (-9999.0) is dropped in forma-estimate.

Those nodata values are generated when the trends values are added to the pail - the thrift schema doesn't allow nil values.

The nils are generated if trends-characteristics encounters a singular matrix.
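
To make that chain concrete (nil from the trends step, nil replaced by nodata for the pail, nodata dropped at estimation), here is a minimal sketch in plain Clojure. The function names and map keys below are hypothetical and are not taken from forma-clj; only the -9999.0 sentinel comes from this thread.

```clojure
(def nodata -9999.0) ; sentinel value quoted in this thread

(defn nils->nodata
  "Replace nils with the nodata sentinel, since the schema cannot store nil."
  [nodata coll]
  (mapv #(if (nil? %) nodata %) coll))

(defn has-nodata?
  "True if any of the given series contains the nodata sentinel."
  [nodata & series]
  (boolean (some #(== nodata %) (apply concat series))))

(defn keep-estimable
  "Keep only pixels whose trend series are free of nodata (hypothetical keys)."
  [nodata pixels]
  (filterv (fn [{:keys [short-stat long-stat t-stat]}]
             (not (has-nodata? nodata short-stat long-stat t-stat)))
           pixels))
```

A pixel whose long-stat came back as nils would be stored as -9999.0 values and then silently filtered out, which is consistent with the holes on the map.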

Since a singular matrix is quite rare, it's strange that these pixels are all clustered near each other. It makes me think that the raw NDVI values are screwed up somehow.

@robinkraft
Contributor Author

So it's actually an issue with rain - the rain values are all -999, the nodata value for that dataset. That seems unlikely given the location of the pixels (why no data on the Indian coast?!?) so I'm running the rain preprocessing to see how far back the issue actually goes in the workflow.

@robinkraft
Contributor Author

I'll be examining pixel "500" 25 7 20 664, at lat/lon 17.23124999999999, 73.37888587401405, in Maharashtra, India.
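
For anyone who wants to check other pixels, here is a rough sketch of how a MODIS pixel maps to lat/lon. It assumes the pixel is given as resolution, tile h, tile v, sample, line, and uses the standard MODIS sinusoidal grid (10-degree tiles, 2400 pixels per side at 500 m, pixel centers, spherical Earth). `modis->latlon` and `pixels-per-tile` are hypothetical names written for this comment, not forma-clj functions.

```clojure
(def pixels-per-tile
  "Pixels along one side of a MODIS sinusoidal tile, keyed by nominal resolution."
  {"250" 4800, "500" 2400, "1000" 1200})

(defn modis->latlon
  "Approximate [lat lon] in degrees of the center of a MODIS sinusoidal pixel."
  [res mod-h mod-v sample line]
  (let [n       (pixels-per-tile res)
        deg-pix (/ 10.0 n)                           ; degrees per pixel
        ;; rows count down from the north pole, 10 degrees per tile
        lat     (- 90.0 (* 10.0 mod-v) (* deg-pix (+ line 0.5)))
        ;; x position in equatorial degrees, counted from the antimeridian
        x-deg   (+ -180.0 (* 10.0 mod-h) (* deg-pix (+ sample 0.5)))
        ;; undo the sinusoidal squeeze: x = lon * cos(lat)
        lon     (/ x-deg (Math/cos (Math/toRadians lat)))]
    [lat lon]))
```

Calling `(modis->latlon "500" 25 7 20 664)` should land on roughly the coordinates quoted above, up to floating-point rounding.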

@robinkraft
Contributor Author

Turns out that the rain data is actually all -999s for that location. We're investigating whether this is an issue in the raw data or an artifact of our workflow.

@robinkraft
Contributor Author

Looks like there may be an issue with how we handle the rain data, possibly with reprojection. That spot in India should have data, even if only zeros. The image below shows data from 2013 on the right (anything grey is good data, i.e. not NODATA), and the point in India in Google Maps on the left.

[image: previewscreensnapz008]

@robinkraft
Contributor Author

We now have a workaround (#393) that ignores constant cofactors in the long-trends function that was causing this issue.
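
For context, here is a hedged sketch of why a constant cofactor (such as rain that is -999 everywhere) leads to a singular matrix, and one way to screen it out. It assumes the regression includes an intercept term, and it is not the actual code from #393; the function names are hypothetical.

```clojure
(defn constant-series?
  "True if every value in the series is the same (or the series is empty)."
  [coll]
  (<= (count (distinct coll)) 1))

(defn drop-constant-cofactors
  "Remove cofactor series that carry no variation."
  [cofactors]
  (vec (remove constant-series? cofactors)))

(defn gram-det
  "Determinant of X'X for a design matrix made of an intercept column of
  ones plus one cofactor column. Zero means X'X cannot be inverted."
  [cofactor]
  (let [n   (double (count cofactor))
        sx  (reduce + cofactor)
        sxx (reduce + (map #(* % %) cofactor))]
    (- (* n sxx) (* sx sx))))
```

A constant column is an exact multiple of the intercept column, so `(gram-det [-999.0 -999.0 -999.0])` is zero, while a varying series such as `[1.0 2.0 3.0]` gives a non-zero determinant. Dropping constant series before the fit avoids the singular case.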

#391 is still open so that we will also address any reprojection issues that could be a more fundamental problem.
