This repository has been archived by the owner on Oct 8, 2019. It is now read-only.

fire counts should be by day #2

Open
robinkraft opened this issue Jan 25, 2012 · 8 comments

@robinkraft
Contributor

The fires data are combined from Terra and Aqua. For a given pixel and fire, the fire could be included twice or more in the same day if both sensors picked it up. Indeed, in some places the same sensor might pass over twice - once during the day, once at night. Additional detections don't necessarily represent additional fires, so it seems best to have booleans for the different fire filter rules for a given day.

Corner case: a fire is picked up by Terra with 335K and 35 confidence. It's picked up by Aqua with 329K and 54 confidence. So we need to decide whether we should count ANY detection above a threshold as a hit for that day, or whether we only count a fire day if a single detection meets the thresholds of interest. In the above case, for one fire we'd get a hit on the >=330K filter, and one on the >=50 confidence filter, but no hit for the >=330K AND >=50 conf case.

I vote for having a boolean be flipped to true for a given day as each criterion is met, even if by "different" fire detections. But I don't have a theoretical basis for thinking that, it just feels more right.
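
To make the corner case concrete, here is a minimal Python sketch (not the project's Clojure code). The `Detection` class, the flag names, and both function names are hypothetical; only the thresholds and the Terra/Aqua values come from the comment above. It contrasts a strict reading, where the combined filter must be satisfied by a single detection, with a pooled reading, where each criterion may be met by a different detection on the same day.

```python
from dataclasses import dataclass

TEMP_THRESHOLD_K = 330  # brightness threshold quoted in the issue
CONF_THRESHOLD = 50     # confidence threshold quoted in the issue


@dataclass(frozen=True)
class Detection:
    """Hypothetical record for one satellite detection of a pixel."""
    sensor: str
    brightness_k: float
    confidence: int


def detection_flags(d):
    """Evaluate each filter rule against a single detection."""
    hot = d.brightness_k >= TEMP_THRESHOLD_K
    sure = d.confidence >= CONF_THRESHOLD
    return {"temp": hot, "conf": sure, "both": hot and sure}


def day_flags_strict(detections):
    """A rule is true for the day only if some single detection satisfies it."""
    flags = [detection_flags(d) for d in detections]
    return {key: any(f[key] for f in flags) for key in ("temp", "conf", "both")}


def day_flags_pooled(detections):
    """Like the strict version, but 'both' may be met by different detections."""
    strict = day_flags_strict(detections)
    return {**strict, "both": strict["temp"] and strict["conf"]}


day = [Detection("Terra", 335, 35), Detection("Aqua", 329, 54)]
print(day_flags_strict(day))  # {'temp': True, 'conf': True, 'both': False}
print(day_flags_pooled(day))  # {'temp': True, 'conf': True, 'both': True}
```

The two readings differ only on the combined rule, which is exactly the decision the corner case asks for.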

@sritchie
Contributor

We'll store the full counts in the database; when we do the aggregation we can set anything above 1 down to 1 to get a count of days that had fires. This is more flexible, as we always have the "how many were counted" information around.

1 and 0 beats true and false since we can (apply merge-with + fire-seq)
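
A rough Python analogue of that idea, as a sketch only: `merge_with_add` mimics what `(apply merge-with + fire-seq)` does to a sequence of maps, and `to_fire_day` clamps the summed counts to 1. The function names, the key names, and the sample records are assumptions for illustration, not the project's schema.

```python
# Hypothetical 1/0 records for one pixel on one day, one per detection.
fire_seq = [
    {"temp": 1, "conf": 0, "both": 0, "all": 1},  # Terra detection
    {"temp": 0, "conf": 1, "both": 0, "all": 1},  # Aqua detection
]


def merge_with_add(records):
    """Sum values key by key, similar in spirit to Clojure's merge-with +."""
    merged = {}
    for record in records:
        for key, value in record.items():
            merged[key] = merged.get(key, 0) + value
    return merged


def to_fire_day(counts):
    """Clamp every count above 1 down to 1 to get a per-day indicator."""
    return {key: min(value, 1) for key, value in counts.items()}


daily_counts = merge_with_add(fire_seq)
print(daily_counts)               # {'temp': 1, 'conf': 1, 'both': 0, 'all': 2}
print(to_fire_day(daily_counts))  # {'temp': 1, 'conf': 1, 'both': 0, 'all': 1}
```

Storing the summed counts and clamping only at aggregation time keeps the raw detection counts available for later analysis.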

@danhammer
Contributor

This is really interesting. The current code may actually reflect the data
generating process better than any other (data reducing) option.

Let A be the event that brightness >= 330K, let B be the event that
confidence >= 50, and let C be the event that a single detection meets
both. We expect all coefficients on the variables representing A, B, and C
to be positive (at this point, we know this to be true). Suppose that
A=True and B=True, but C=False. This would add to the probability of
clearing, but not as much as if both attributes were observed by a single
satellite. The question, then, is whether the probability increment of
(A and B) should be higher than that of (A or B). I think yes, since we
know that there is measurement error in the satellites (and this would
induce downward bias). I say leave it as is. But that's my opinion. To do
anything else would require adding many, many more variables (representing
more combinations of different fire strata). I don't think we want to go
down that path -- for many reasons.
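
One way to write the distinction down, as a sketch with notation that is not in the thread: index the detections for one pixel on one day by d, and let a_d and b_d be 0/1 indicators that detection d meets the brightness threshold and the confidence threshold respectively.

```latex
\begin{aligned}
A &= \max_d a_d, \qquad B = \max_d b_d, \qquad C = \max_d \, a_d b_d, \\
C &\le A \cdot B \le \max(A, B).
\end{aligned}
```

The first inequality is strict exactly when the two thresholds are met on the same day but never by the same detection, which is the A=True, B=True, C=False case.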


@robinkraft
Contributor Author

Interesting ... So currently the period aggregation adds up all the fires that happened in a given period. Each detection is stored separately, and a detection's rule attributes (>=330, >=50, both, no filter) add to the period's counts for each of these rules.

Does this reflect what you're describing above? Aggregating 1/0 by day before aggregating by period is similar, but without explicit double counting. Yes a fire could be detected at 11pm and 1am and get counted on different days, but that seems less likely. We could test this.

I guess the question is: Do we care about double counting? Or put another way, does knowing that a fire was detected more than once on the same day add meaningful information to the system? It's possible that multiple detections in a day reflect multiple fires within a 1km pixel, but to me it seems that the intensity and duration of fires is more meaningful than the raw count.

Here's the aggregation function for your reference.

https://github.com/sritchie/forma-clj/blob/develop/src/clj/forma/schema.clj#L106
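
For comparison, the two period aggregations discussed in this comment can be sketched in Python as follows. This is not the linked aggregation function; the sample detections, the function names, and the assumption that all four dates fall in one period are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical detections for one pixel within a single period:
# (acquisition date, whether the detection passes the filter of interest).
detections = [
    (date(2012, 1, 1), True),   # Terra
    (date(2012, 1, 1), True),   # Aqua, same day as the Terra hit
    (date(2012, 1, 3), True),
    (date(2012, 1, 5), False),
]


def period_count_raw(dets):
    """Sum every passing detection, so same-day repeats are counted twice."""
    return sum(1 for _, hit in dets if hit)


def period_count_by_day(dets):
    """Collapse to a 1/0 flag per day first, then count the flagged days."""
    hit_by_day = defaultdict(bool)
    for day, hit in dets:
        hit_by_day[day] = hit_by_day[day] or hit
    return sum(hit_by_day.values())


print(period_count_raw(detections))     # 3
print(period_count_by_day(detections))  # 2
```

The gap between the two numbers is the amount of same-day double counting, which is one way to test how often it actually happens in the data.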

@danhammer
Contributor

Thanks. I think, for now, this reflects what we'd like to do. There are a
lot of strange things, random things that may happen -- but that's what our
spatial smoothing has been built for. I think that we would have to think
hard about moving from the default -- which we should and will do.


@sritchie
Contributor

sritchie commented Feb 4, 2012

Is this still an issue?

@robinkraft
Contributor Author

It's tabled for now, but it's something we might want to revisit. I'm not convinced raw fire counts are the right way to go, but at the end of the day if 500m data look good then I guess it doesn't matter.

That said, the other day I learned about a 250m fires product that's supposed to be coming out soon. It's based on MODIS thermal anomalies but has some additional secret sauce.


@danhammer
Contributor

@robinkraft is this still an issue? should we reframe, given today's convo?

@robinkraft
Contributor Author

I think it's still an issue, even if we use additional info on fires like recentness (#125). There's no way to distinguish fires detected twice vs. separate fires the same day. I recommend switching to simply removing duplicate fire days.
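
A minimal Python sketch of what removing duplicate fire days could look like, assuming each detection carries a pixel identifier and an acquisition date. The tuple layout, the pixel names, and `unique_fire_days` are hypothetical and not taken from the project.

```python
from datetime import date

# Hypothetical detections: (pixel id, acquisition date, sensor).
detections = [
    ("pixel-a", date(2012, 1, 1), "Terra"),
    ("pixel-a", date(2012, 1, 1), "Aqua"),   # same pixel and day as above
    ("pixel-a", date(2012, 1, 2), "Terra"),
    ("pixel-b", date(2012, 1, 1), "Aqua"),
]


def unique_fire_days(dets):
    """Keep one entry per (pixel, day), discarding repeat detections."""
    return sorted({(pixel, day) for pixel, day, _sensor in dets})


fire_days = unique_fire_days(detections)
print(len(detections), len(fire_days))  # 4 3
```

Keying on pixel and day drops the second same-day detection whether it came from a repeat overpass or from a separate fire, which matches the point that the two cannot be told apart.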


3 participants