Data request example (from Markus Erhard Schorn) and my solution strategy #45

tz05 · 2019-09-24T13:13:01Z

The original request (from Markus Erhard Schorn):

a) a tree-level datafile

- one row for each measurement (i.e. if a tree was remeasured once, the file contains two lines for this tree)
- each row contains a unique tree ID, which is the same for all measurements of a single tree (currently not the case in raw FIA data)
- each row contains a unique plot ID, which is the same for all the trees in a single plot and for all the measurements of those trees, matching the plot ID in the plot-level datafile (see below)
- if possible, data from plot-level datafile (important columns as mentioned below) already merged in this file

b) a plot-level datafile

- in the best case one file containing merged data from XX_PLOT.csv and XX_COND.csv
- important columns for me: FORTYPCD, STDAGE, SITECLCD from XX_COND.csv; INVYR, MEASYEAR, MEASMON, MEASDAY from XX_PLOT.csv; but it could be all the columns from both XX_PLOT.csv and XX_COND.csv as well
- one row for each census
- each row contains a unique plot ID (see above)

I need this for data from WA and OR. For training purposes, the data could be filtered for FORTYPCD == 201 already, but I'd rather have the full dataset and do that myself later on. Also you could filter for remeasured plots only, but the same applies here.

Description of my data product (with R):

It is tree-level data with plot-level information merged in (as Markus preferred). All fields from the TREE data are included in the files.
The file contains an “ID” field for trees. It is unique to each tree and stays the same across surveys: it is the tree’s CN in the EARLIEST survey in which the tree appears.
The fields from the PLOT and COND data include those mentioned in Markus’ email.
Because Markus wanted the PLOT and COND fields merged (and then merged with the TREE fields), with one row for each census of a plot, I excluded plots containing more than one condition. This avoids cases where the merged PLOT and COND data contain multiple rows for a single plot.
PLT_CN and COND_CN (the IDs for plot and condition records) are the same as those used in the PLOT and COND data.
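The optional filters Markus mentioned could then be applied to this merged table afterwards. The following is only a minimal sketch in base R on made-up toy values, not the script used for the data product. The data frame name `trees` is hypothetical, and the second filter keeps trees with more than one measurement, which is a tree-level stand-in for the "remeasured plots only" filter rather than an exact equivalent.

```r
# Toy merged tree-level table (made-up values) with the ID and FORTYPCD fields.
trees <- data.frame(ID       = c(2, 2, 3, 3, 7),
                    CN       = c(2, 5, 3, 4, 7),
                    FORTYPCD = c(201, 201, 221, 221, 201))

# Keep only records with forest type code 201.
trees_201 <- trees[trees$FORTYPCD == 201, ]

# Count the measurements per tree via the shared ID,
# then keep only trees that were measured more than once.
n_meas <- ave(seq_along(trees$ID), trees$ID, FUN = length)
trees_remeasured <- trees[n_meas > 1, ]
```

Keeping the filters separate from the merge means the full dataset stays available, which is what the request asked for.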

ethanwhite · 2019-09-24T13:21:46Z

Thanks @tz05! You can also add the code here either by dragging and dropping the file or by putting it in a comment using ``` (3 backticks) on the line before and the line after the code.

For example:

data <- read.csv("mydatafile.csv")

henrykironde · 2019-09-25T21:03:06Z

Thanks @tz05, I will have a look at this and get back to you if I have any questions.

henrykironde · 2019-11-03T02:45:55Z

@tz05 I want to confirm what I think is the goal of the code:
get tree data from XX_TREE.csv, merge it with XX_PLOT.csv using PLT_CN,
and merge XX_COND.csv using XX_TREE.csv's CN.
If so, we are supposed to end up with 969305 records.
Is this correct?

tz05 · 2019-11-03T09:11:15Z

My suggestion is merging PLOT.csv and COND.csv first and then merging it with TREE.csv.

Merging PLOT.csv and COND.csv uses CN (in PLOT) and PLT_CN (in COND); the CN field in COND is irrelevant to this merge. In this step, I filtered out the plots that contain multiple (>1) conditions, because for these plots the tree records cannot easily be merged with the plot+condition records. A tree should be associated with only one condition, but for these plots it is hard (if not impossible) to tell which "condition" a tree is associated with.

Merging this plot+condition data with TREE data needs to use PLT_CN (in TREE) and CN (in PLOT).
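The two merge steps above can be sketched in base R as follows. This is a minimal illustration on made-up toy tables, not my actual script: the data frame names and all values are hypothetical, and only the key columns (CN, PLT_CN) plus a few example fields are shown. With real FIA files it may be safer to read the CN columns as character, since the identifiers are long.

```r
# Toy stand-ins for XX_PLOT.csv, XX_COND.csv and XX_TREE.csv (values are made up).
plot <- data.frame(CN = c(44, 45), INVYR = c(2001, 2011))
cond <- data.frame(CN       = c(900, 901, 902),
                   PLT_CN   = c(44, 45, 45),
                   FORTYPCD = c(201, 201, 221))
tree <- data.frame(CN = c(1, 2, 3), PLT_CN = c(45, 44, 44), DIA = c(10.2, 8.1, 12.4))

# Count conditions per plot and keep only plots with exactly one condition.
n_cond <- table(cond$PLT_CN)
single <- names(n_cond)[n_cond == 1]
cond1  <- cond[as.character(cond$PLT_CN) %in% single, ]

# COND's own CN is not a join key; rename it so it does not clash with PLOT's CN.
names(cond1)[names(cond1) == "CN"] <- "COND_CN"

# Step 1: merge PLOT and COND on CN (in PLOT) and PLT_CN (in COND).
plot_cond <- merge(plot, cond1, by.x = "CN", by.y = "PLT_CN")
names(plot_cond)[names(plot_cond) == "CN"] <- "PLT_CN"

# Step 2: merge TREE with the plot+condition table on PLT_CN.
tree_full <- merge(tree, plot_cond, by = "PLT_CN")
```

In this toy example the tree in plot 45 drops out, because that plot has two conditions, while the two trees in plot 44 are kept.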


henrykironde · 2019-11-06T02:08:59Z

henrykironde · 2019-11-06T02:12:12Z

I hope this is what the end goal is. I will have to add some code to perform specific attribute filtering at this level for individual files.

tz05 · 2019-11-06T04:15:23Z

About your diagram, two things need attention:

The tree with CN of 1 in the TREE table should not be in the final table, because its plot (CN of 45 in the PLOT table) has multiple conditions. But the same tree's record from the previous survey might be in the final table if, at that time, the plot had only a single condition.
The final table needs an ID field, and each tree must have one unique ID. That means the tree records with CN of 2 and 5 in the TREE table have to share the same ID, as do the tree records with CN of 3 and 4. In my R script, my strategy for the ID values was to use a tree's CN in its earliest survey, so the value won't change over time even when new surveys are included and the dataset is updated.
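The ID strategy could be sketched roughly as below. This is an illustration under stated assumptions, not the R script itself: it assumes the TREE table has a column (called `PREV_TRE_CN` here) holding the CN of the same tree's previous measurement, with NA for a first measurement. The helper `earliest_cn` is hypothetical, and the toy values only mimic the CN pairs from the diagram (which record of each pair is the earlier one is assumed).

```r
# Toy TREE records (made-up values). PREV_TRE_CN is assumed to point to the
# CN of the same tree's previous measurement, or NA if there is none.
tree <- data.frame(CN          = c(2, 3, 4, 5),
                   PREV_TRE_CN = c(NA, NA, 3, 2),
                   INVYR       = c(2001, 2001, 2011, 2011))

# Hypothetical helper: follow the PREV_TRE_CN chain back until a record
# with no known predecessor is reached, and return that record's CN.
earliest_cn <- function(cn, tree) {
  repeat {
    prev <- tree$PREV_TRE_CN[match(cn, tree$CN)]
    if (is.na(prev) || !(prev %in% tree$CN)) return(cn)
    cn <- prev
  }
}

# Every measurement of a tree gets the CN of its earliest record as ID.
tree$ID <- vapply(tree$CN, earliest_cn, numeric(1), tree = tree)
```

Here the records with CN 2 and 5 both get ID 2, and the records with CN 3 and 4 both get ID 3. Computing the ID on the full TREE table before any plots are filtered out would probably be the safer order, since otherwise a dropped earlier record could change which CN counts as the earliest.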

I hope these comments are helpful.

tz05 · 2020-02-12T16:59:49Z

For the record, I have uploaded my demonstration PDF file here for reference.
illustration.pdf

tz05 mentioned this issue Nov 6, 2019

![UNADJUSTEDNONRAW_thumb_22a](https://user-images.githubusercontent.com/5192965/68262168-6f990f00-0010-11ea-9a39-265e294ce4b8.jpg) #46

Closed
