Skip to content

Commit

Permalink
Added readme
Browse files Browse the repository at this point in the history
  • Loading branch information
mdaeron committed May 11, 2020
1 parent 453ebf7 commit a2beacf
Showing 1 changed file with 373 additions and 0 deletions.
373 changes: 373 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,373 @@
# D47crunch

Python library for processing and standardizing carbonate clumped-isotope analyses, from low-level data out of a dual-inlet mass spectrometer to final, “absolute” Δ<sub>47</sub> values with fully propagated analytical error estimates.

All questions and suggestions are welcome and should be directed at [Mathieu Daëron](mailto:[email protected]?subject=[D47crunch]).

## 1. Requirements

Python 3, [numpy], [lmfit]. We recommend installing the [Anaconda] distribution.

[numpy]: https://numpy.org
[lmfit]: https://lmfit.github.io
[Anaconda]: https://www.anaconda.com/distribution

## 2. Installation

This should do the trick:

```bash
pip install D47crunch
```

Alternatively:

1. download [D47crunch-master.zip]
2. unzip it
3. rename the resulting directory to `D47crunch`
4. move the `D47crunch` directory to somewhere in your `PYTHONPATH` or to your current working directory

[D47crunch-master.zip]: https://github.com/mdaeron/D47crunch/archive/master.zip

## 3. Documentation

For the full API documentation, see [https://github.com/mdaeron/D47crunch/docs/index.html].

For a short tutorial see below.

[https://github.com/mdaeron/D47crunch/docs/index.html]: https://github.com/mdaeron/D47crunch/docs/index.html

## 4. Usage

### 4.1 Import data

Start with some raw data stored as CSV in a file named `rawdata.csv` (spaces after commas are optional). Each line corresponds to a single analysis.

The only required fields are a sample identifier (`Sample`), and the working-gas delta values `d45`, `d46`, `d47`. If no session information is provided, all analuses will be treated as belonging to a single analytical session. Alternatively, to group analyses into sessions, provide session identifiers in a `Session` field. If not specified by the user, a unique identifier (`UID`) will be assigned automatically to each analysis. Independently known oxygen-17 anomalies may be provided as `D17O` (in ‰ relative to VSMOW, with λ equal to `D47data.lambda_17`), and are assumed to be zero otherwise. Working-gas deltas `d48` and `d49` may also be provided, and are otherwise treated as `NaN`.

Example `rawdata.csv` file:

```
UID, Session, Sample, d45, d46, d47, d48, d49
A01, Session1, ETH-1, 5.79502, 11.62767, 16.89351, 24.56708, 0.79486
A02, Session1, IAEA-C1, 6.21907, 11.49107, 17.27749, 24.58270, 1.56318
A03, Session1, ETH-2, -6.05868, -4.81718, -11.63506, -10.32578, 0.61352
A04, Session1, IAEA-C2, -3.86184, 4.94184, 0.60612, 10.52732, 0.57118
A05, Session1, ETH-3, 5.54365, 12.05228, 17.40555, 25.96919, 0.74608
A06, Session1, ETH-2, -6.06706, -4.87710, -11.69927, -10.64421, 1.61234
A07, Session1, ETH-1, 5.78821, 11.55910, 16.80191, 24.56423, 1.47963
A08, Session1, IAEA-C2, -3.87692, 4.86889, 0.52185, 10.40390, 1.07032
A09, Session1, ETH-3, 5.53984, 12.01344, 17.36863, 25.77145, 0.53264
A10, Session1, IAEA-C1, 6.21905, 11.44785, 17.23428, 24.30975, 1.05702
A11, Session2, ETH-1, 5.79958, 11.63130, 16.91766, 25.12232, 1.25904
A12, Session2, IAEA-C1, 6.22514, 11.51264, 17.33588, 24.92770, 2.54331
A13, Session2, ETH-2, -6.03042, -4.74644, -11.52551, -10.55907, 0.04024
A14, Session2, IAEA-C2, -3.83702, 4.99278, 0.67529, 10.73885, 0.70929
A15, Session2, ETH-3, 5.53700, 12.04892, 17.42023, 26.21793, 2.16400
A16, Session2, ETH-2, -6.06820, -4.84004, -11.68630, -10.72563, 0.04653
A17, Session2, ETH-1, 5.78263, 11.57182, 16.83519, 25.09964, 1.26283
A18, Session2, IAEA-C2, -3.85355, 4.91943, 0.58463, 10.56221, 0.71245
A19, Session2, ETH-3, 5.52227, 12.01174, 17.36841, 26.19829, 1.03740
A20, Session2, IAEA-C1, 6.21937, 11.44701, 17.26426, 24.84678, 0.76866
```

First create a `D47data` object named `foo` and import `rawdata.csv`:

```python
import D47crunch

foo = D47crunch.D47data()
foo.read('rawdata.csv')

print('foo contains:')
print(f'{len(foo)} analyses')
print(f'{len({r["Sample"] for r in foo})} samples')
print(f'{len({r["Session"] for r in foo})} sessions')

# output:
# foo contains:
# 20 analyses
# 5 samples
# 2 sessions
```

We can inspect the elements of `foo`:

```python
r = foo[0]
for k in r:
print(f'r["{k}"] = {repr(r[k])}')

# output:
# r["UID"] = 'A01'
# r["Session"] = 'Session1'
# r["Sample"] = 'ETH-1'
# r["d45"] = 5.79502
# r["d46"] = 11.62767
# r["d47"] = 16.89351
# r["d48"] = 24.56708
# r["d49"] = 0.79486
```

### 4.2 Working gas composition

There are two ways to define the isotpic composition of the working gas.

#### 4.2.1 Option 1: explicit definition

Directly writing to fields `d13Cwg_VPDB` and `d18Owg_VSMOW`:

```python
for r in foo:
if r['Session'] == 'Session1':
r['d13Cwg_VPDB'] = -3.75
r['d18Owg_VSMOW'] = 25.14
elif r['Session'] == 'Session2':
r['d13Cwg_VPDB'] = -3.74
r['d18Owg_VSMOW'] = 25.17
```

#### 4.2.2 Option 2: based on the known composition of a sample:

```python
# The 2 code lines below are the default settings. It is thus not
# necessary to include them unless you wish to use different values.

foo.SAMPLE_CONSTRAINING_WG_COMPOSITION = ('ETH-3', 1.71, -1.78)
foo.ALPHA_18O_ACID_REACTION = 1.00813 # (Kim et al., 2007), calcite at 90 °C

# Compute the WG composition for each session:
foo.wg()

```

### 4.3 Crunch the data

Now compute δ<sup>13</sup>C, δ<sup>18</sup>Ο, and raw Δ<sub>47</sub>, Δ<sub>48</sub>, Δ<sub>49</sub> values. Note that δ<sup>18</sup>Ο is the CO<sub>2</sub> composition. The user is responsible for any acid fractionation correction.

```python
foo.crunch()

r = foo[0]
for k in r:
print(f'r["{k}"] = {r[k]}')

# output:
# r["UID"] = A01
# r["Session"] = Session1
# r["Sample"] = ETH-1
# r["d45"] = 5.79502
# r["d46"] = 11.62767
# r["d47"] = 16.89351
# r["d48"] = 24.56708
# r["d49"] = 0.79486
# r["d13Cwg_VPDB"] = -3.7555729459832765
# r["d18Owg_VSMOW"] = 25.1145492463934
# r["D17O"] = 0.0
# r["d13C_VPDB"] = 1.9948594073404546
# r["d18O_VSMOW"] = 37.03357105550355
# r["D47raw"] = -0.5746856128030498
# r["D48raw"] = 1.1496833191546596
# r["D49raw"] = -27.690248970251407
```

### 4.4 Oxygen-17 correction parameters

Note that this crunching step uses the IUPAC oxygen-17 correction parameters, as recommended by [Daëron et al. (2016)](https://dx.doi.org/10.1016/j.chemgeo.2016.08.014) and [Schauer et al. (2016)](https://dx.doi.org/10.1002/rcm.7743):

```python
R13_VPDB = 0.01118 # (Chang & Li, 1990)
R18_VSMOW = 0.0020052 # (Baertschi, 1976)
lambda_17 = 0.528 # (Barkan & Luz, 2005)
R17_VSMOW = 0.00038475 # (Assonov & Brenninkmeijer, 2003, rescaled to R13_VPDB)
R18_VPDB = R18_VSMOW * 1.03092
R17_VPDB = R17_VSMOW * 1.03092 ** lambda_17
```

To use different numerical values for these parameters, change them before performing `foo.crunch()`:

```python
# to change the lambda value to 0.5164,
# leaving the other parameters unchanged:
foo.lambda_17 = 0.5164
```

### 4.5 Reference frame

The nominal Δ<sub>47</sub> values assigned to the anchor samples are defined in `foo.Nominal_D47`, which may be redefined arbitrarily:

```python
print(foo.Nominal_D47) # default values from Bernasconi et al. (2018)
# output:
# {'ETH-1': 0.258, 'ETH-2': 0.256, 'ETH-3': 0.691}

foo.Nominal_D47 = {
"Foo-1": 0.232,
"Foo-2": 0.289,
"Foo-3": 0.455,
"Foo-4": 0.704,
}

print(foo.Nominal_D47)
# output:
# {'Foo-1': 0.232, 'Foo-2': 0.289, 'Foo-3': 0.455, 'Foo-4': 0.704}
```

### 4.6 Standardization (`pooled`)

### 4.6.1 Default method (`pooled`)

The default standardization approach computes the best-fit standardization parameters (a,b,c) for each session, along with the best-fit Δ<sub>47</sub> values of unknown samples, using a pooled regression model taking into account the relative mapping of all samples (anchors and unknowns) in (δ<sub>47</sub>, Δ<sub>47</sub>) space.

```python
foo.standardize()
```

The following text is output:

```
-------------------------------- -----------
N samples (anchors + unknowns) 5 (3 + 2)
N analyses (anchors + unknowns) 20 (12 + 8)
Repeatability of δ13C_VPDB 13.8 ppm
Repeatability of δ18O_VSMOW 41.9 ppm
Repeatability of Δ47 (anchors) 10.7 ppm
Repeatability of Δ47 (unknowns) 3.4 ppm
Repeatability of Δ47 (all) 8.6 ppm
Model degrees of freedom 12
Student's 95% t-factor 2.18
-------------------------------- -----------
-------- -- -- ----------- ------------ ------ ------ ------ ------------- ------------- --------------
Session Na Nu d13Cwg_VPDB d18Owg_VSMOW r_d13C r_d18O r_D47 a ± SE 1e3 x b ± SE c ± SE
-------- -- -- ----------- ------------ ------ ------ ------ ------------- ------------- --------------
Session1 6 4 -3.756 25.115 0.0035 0.0415 0.0066 0.838 ± 0.016 3.340 ± 0.247 -0.859 ± 0.007
Session2 6 4 -3.743 25.117 0.0174 0.0490 0.0119 0.815 ± 0.015 4.601 ± 0.246 -0.847 ± 0.007
-------- -- -- ----------- ------------ ------ ------ ------ ------------- ------------- --------------
------- - --------- ---------- ------ ------ -------- ------ --------
Sample N d13C_VPDB d18O_VSMOW D47 SE 95% CL SD p_Levene
------- - --------- ---------- ------ ------ -------- ------ --------
ETH-1 4 2.00 37.00 0.2580 0.0096
ETH-2 4 -10.03 20.18 0.2560 0.0154
ETH-3 4 1.71 37.45 0.6910 0.0039
IAEA-C1 4 2.46 36.88 0.3624 0.0061 ± 0.0133 0.0031 0.901
IAEA-C2 4 -8.04 30.18 0.7246 0.0082 ± 0.0178 0.0037 0.825
------- - --------- ---------- ------ ------ -------- ------ --------
```

### 4.6.1 `D47data.sessions`

Under the hood, the normalization step does many things. It stores session information in `foo.sessions`:

```python
print([k for k in foo.sessions])
# output: ['Session1', 'Session2']

for k in foo.sessions['Session1']:
if k == 'data':
print(f"{k:>16}: [...] (too large to print)")
else:
print(f"{k:>16}: {foo.sessions['Session1'][k]}")
# output:
# data: [...] (too large to print)
# scrambling_drift: False
# slope_drift: False
# wg_drift: False
# d13Cwg_VPDB: -3.7555729459832765
# d18Owg_VSMOW: 25.1145492463934
# Na: 6
# Nu: 4
# a: 0.8381700022050721
# SE_a: 0.015603758280720111
# b: 0.0033401762623331823
# SE_b: 0.00024740622188942793
# c: -0.8586982120784981
# SE_c: 0.006737855778815339
# a2: 0.0
# b2: 0.0
# c2: 0.0
# r_d13C_VPDB: 0.0035270933192414504
# r_d18O_VSMOW: 0.04146499779427257
# r_D47: 0.006638319347677248
```

each element of `foo.sessions` has the following attributes:

+ `data`: list of all the analyses in this session
+ `scrambling_drift`, `slope_drift`, `wg_drift`: whether parameters `a`, `b`,`c` are allowed to drift (change linearly with with time)
+ `d13Cwg_VPDB`, `d18Owg_VSMOW`: working gas composition
+ `Na`: number of anchor analyses in this session
+ `Nu`: number of unknown analyses in this session
+ `a`,`SE_a`: best-fit value and model SE of scrambling factor
+ `b`,`SE_b`: best-fit value and model SE of compositional slope
+ `c`,`SE_c`: best-fit value and model SE of working gas offset
+ `a2`,`b2`,`c2`: drift rates (per unit of `TimeTag`) of `a`,`b`, `c`. If `TimeTag` is one of the fields in the raw data, this will be used, otherwise `TimeTag` starts at 0 for each session and increases by 1 for each analysis, in the listed order (thus beware of datasets ordered by sample name).
+ `r_d13C_VPDB`, `r_d18O_VSMOW`, `r_D47`: repeatabilities for `d13C_VPDB`, `d18O_VSMOW`, `D47` in this session

### 4.6.2 `D47data.samples`, `D47data.anchors`, and `D47data.unknowns`

Additional information about the samples is stored in `foo.samples` (the same information can also be accessed via `foo.anchors` and `foo.unknowns`):

```python
print([k for k in foo.samples])
# output:
# ['ETH-1', 'ETH-2', 'ETH-3', 'IAEA-C1', 'IAEA-C2']

for k in foo.samples['IAEA-C1']:
if k == 'data':
print(f"{k:>12}: [...] (too large to print)")
else:
print(f"{k:>12}: {foo.samples['IAEA-C1'][k]}")
# output:
# data: [...] (too large to print)
# N: 4
# SD_D47: 0.003120794222015294
# d13C_VPDB: 2.4606390899379327
# d18O_VSMOW: 36.87682448377142
# D47: 0.36241877475632883
# SE_D47: 0.006107113137661028
# p_Levene: 0.9011524351870661
```

Each element of `foo.samples` has the following attributes:

+ `N`: total number of analyses in the whole data set
+ `SD_D47`: the sample SD of Δ<sub>47</sub> for this sample
+ `d13C_VPDB`, `d18O_VSMOW`: average δ<sup>13</sup>C, δ<sup>18</sup>Ο values for the analyte CO<sub>2</sub>.
+ `D47`, `SE_D47`: best-fit value and model SE for the Δ<sub>47</sub> of this sample
+ `p_Levene`: p-value for a [Levene's test](https://en.wikipedia.org/wiki/Levene%27s_test) of whether the observed Δ<sub>47</sub> variance for this sample is significantly larger than that for ETH-3 (to change the reference sample to compare with, e.g. to ETH-1: `foo.LEVENE_REF_SAMPLE = 'ETH-1'`).

### 4.6.3 `D47data.repeatability`

The overall analytical repeatabilities are now saved to `foo.repeatability`:

```python
for k in foo.repeatability:
print(f"{k:>12}: {foo.repeatability[k]}")

# output:
# r_d13C_VPDB: 0.013821704833171146
# r_d18O_VSMOW: 0.04191487414887982
# r_D47a: 0.010690471302409636
# r_D47u: 0.0034370447628642863
# r_D47: 0.008561367687546161
```

+ `r_d13C_VPDB`: Analytical repeatability of δ<sup>13</sup>C for all samples
+ `r_d18O_VSMOW`: Analytical repeatability of δ<sup>18</sup>O for all samples (CO<sub>2</sub> values)
+ `r_D47a`: Analytical repeatability of Δ<sub>47</sub> for anchor samples only
+ `r_D47u`: Analytical repeatability of Δ<sub>47</sub> for unknown samples only
+ `r_D47`: Analytical repeatability of Δ<sub>47</sub> for all samples.

### 4.6.4 `D47data.result`

By default `foo.normalize()` uses the [`lmfit.Minimizer.leastsq()`](https://lmfit.github.io/lmfit-py/fitting.html#lmfit.minimizer.Minimizer) method, which returns an instance of [`lmfit.MinimizerResult`](https://lmfit.github.io/lmfit-py/fitting.html#lmfit.minimizer.MinimizerResult). This `MinimizerResult`instance is stored in `foo.result`. A detailed report may be printed using `foo.report()`

```python
print(type(foo.normalization))
# output:
# <class 'lmfit.minimizer.MinimizerResult'>
```

0 comments on commit a2beacf

Please sign in to comment.