Gallery: Add Censored and Truncated distributions. #490

Merged
merged 4 commits into from
Jul 8, 2024
97 changes: 97 additions & 0 deletions docs/examples/censored_distribution.md
@@ -0,0 +1,97 @@
---
jupytext:
text_representation:
extension: .md
format_name: myst
kernelspec:
display_name: Python 3
language: python
name: python3
---
# Censored Distribution

This is not a distribution per se, but a modifier of univariate distributions.

A censored distribution arises when values can only be observed within a certain range: a value outside the range is still recorded, but only as lying at or beyond the nearest bound. For instance, in a study aiming to measure the impact of a drug on mortality rates, it may be known that an individual's age at death is at least 75 years (but may be more). Such a situation could occur if the individual withdrew from the study at age 75, or if the individual is currently alive at the age of 75. Censoring can also happen when a value falls outside the range of a measuring instrument. For example, if a bathroom scale only measures up to 140 kg, and a 160-kg person is weighed, the observer would only know that the individual's weight is at least 140 kg.


## Probability Density Function (PDF):

```{code-cell}
---
tags: [remove-input]
mystnb:
image:
alt: Censored Distribution PDF
---

import arviz as az
from preliz import Normal, Censored
az.style.use('arviz-doc')
Censored(Normal(0, 1), -1, 1).plot_pdf(support=(-4, 4))
Normal(0, 1).plot_pdf(alpha=0.5);
```

## Cumulative Distribution Function (CDF):

```{code-cell}
---
tags: [remove-input]
mystnb:
image:
alt: Censored Distribution CDF
---

Censored(Normal(0, 1), -1, 1).plot_cdf(support=(-4, 4))
Normal(0, 1).plot_cdf(alpha=0.5);
```


## Key properties and parameters:


**Probability Density Function (PDF):**

Given a base distribution with cumulative distribution function (CDF) and probability density/mass function (PDF), the PDF of a Censored distribution is:

$$
\begin{cases}
0 & \text{for } x < \text{lower}, \\
\text{CDF}(\text{lower}) & \text{for } x = \text{lower}, \\
\text{PDF}(x) & \text{for } \text{lower} < x < \text{upper}, \\
1-\text{CDF}(\text{upper}) & \text{for } x = \text{upper}, \\
0 & \text{for } x > \text{upper},
\end{cases}
$$

where `lower` and `upper` are the lower and upper bounds of the censored distribution, respectively.
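
The piecewise definition above can be sketched with the Python standard library alone. This is a minimal illustration, not PreliZ's implementation: `censored_pdf` is a hypothetical helper, and `statistics.NormalDist` stands in for the base distribution. It checks that the two point masses at the bounds plus the probability of the interior add up to one.

```python
from statistics import NormalDist


def censored_pdf(x, base, lower, upper):
    """Density/mass of `base` censored to [lower, upper] (hypothetical helper)."""
    if x < lower or x > upper:
        return 0.0
    if x == lower:
        return base.cdf(lower)        # point mass collected from the left tail
    if x == upper:
        return 1.0 - base.cdf(upper)  # point mass collected from the right tail
    return base.pdf(x)                # interior: unchanged base density


base = NormalDist(0, 1)
lower, upper = -1.0, 1.0

mass_lower = censored_pdf(lower, base, lower, upper)
mass_upper = censored_pdf(upper, base, lower, upper)
interior = base.cdf(upper) - base.cdf(lower)  # probability strictly inside the bounds
total = mass_lower + interior + mass_upper

print(f"mass at lower: {mass_lower:.4f}")
print(f"mass at upper: {mass_upper:.4f}")
print(f"total probability: {total:.4f}")
```

Because the values at the bounds are probabilities rather than densities, the censored distribution is a mixture of a continuous part and two point masses.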

**Cumulative Distribution Function (CDF):**

The CDF of a Censored distribution is:

$$
\begin{cases}
0 & \text{for } x < \text{lower}, \\
\text{CDF}(x) & \text{for } \text{lower} \leq x < \text{upper}, \\
1 & \text{for } x \geq \text{upper},
\end{cases}
$$

where `lower` and `upper` are the lower and upper bounds of the censored distribution, respectively.
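
As a rough check of the censored CDF, the sketch below clips draws from a standard normal to the bounds and compares the fraction that lands exactly on each bound with the base CDF. It uses only the standard library. `censored_cdf` is a hypothetical helper that assumes the CDF is right-continuous at the bounds, and the seed and sample size are arbitrary choices.

```python
import random
from statistics import NormalDist


def censored_cdf(x, base, lower, upper):
    """CDF of `base` censored to [lower, upper] (hypothetical helper)."""
    if x < lower:
        return 0.0
    if x >= upper:
        return 1.0
    return base.cdf(x)


base = NormalDist(0, 1)
lower, upper = -1.0, 1.0

# Censoring a sample amounts to clipping each draw to the nearest bound.
rng = random.Random(0)
draws = [min(max(rng.gauss(0, 1), lower), upper) for _ in range(20_000)]

frac_at_lower = sum(d == lower for d in draws) / len(draws)
frac_at_upper = sum(d == upper for d in draws) / len(draws)

print(f"fraction at lower: {frac_at_lower:.3f}  vs CDF(lower):     {base.cdf(lower):.3f}")
print(f"fraction at upper: {frac_at_upper:.3f}  vs 1 - CDF(upper): {1 - base.cdf(upper):.3f}")
```

The jump of the censored CDF at `lower` equals the probability mass that the base distribution places below `lower`, and the jump to 1 at `upper` equals the mass above `upper`.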


```{seealso}
:class: seealso


**Related Distributions:**

- [Truncated](truncated_distribution.md) - In a truncated distribution, values outside the range are not recorded at all, while in a censored distribution, they are set to the nearest bound.

```

## References

- Wikipedia - [Censoring (statistics)](https://en.wikipedia.org/wiki/Censoring_(statistics))
96 changes: 96 additions & 0 deletions docs/examples/truncated_distribution.md
@@ -0,0 +1,96 @@
---
jupytext:
text_representation:
extension: .md
format_name: myst
kernelspec:
display_name: Python 3
language: python
name: python3
---
# Truncated Distribution

This is not a distribution per se, but a modifier of univariate distributions.

Truncated distributions arise in cases where the ability to record, or even to know about, occurrences is limited to values which lie above or below a given threshold or within a specified range. For example, if the dates of birth of children in a school are examined, these would typically be subject to truncation relative to those of all children in the area given that the school accepts only children in a given age range on a specific date. There would be no information about how many children in the locality had dates of birth before or after the school's cutoff dates if only a direct approach to the school were used to obtain information.

## Probability Density Function (PDF):

```{code-cell}
---
tags: [remove-input]
mystnb:
image:
alt: Truncated Distribution PDF
---

import arviz as az
from preliz import Gamma, Truncated
az.style.use('arviz-doc')
Truncated(Gamma(mu=2, sigma=1), 1, 4.5).plot_pdf()
Gamma(mu=2, sigma=1).plot_pdf();
```

## Cumulative Distribution Function (CDF):

```{code-cell}
---
tags: [remove-input]
mystnb:
image:
alt: Truncated Distribution CDF
---

Truncated(Gamma(mu=2, sigma=1), 1, 4.5).plot_cdf()
Gamma(mu=2, sigma=1).plot_cdf();
```


## Key properties and parameters:


**Probability Density Function (PDF):**

Given a base distribution with cumulative distribution function (CDF) and probability density/mass function (PDF), the PDF of a Truncated distribution is:

$$
\begin{cases}
0 & \text{for } x < \text{lower}, \\
\frac{\text{PDF}(x)}{\text{CDF}(\text{upper}) - \text{CDF}(\text{lower})}
& \text{for } \text{lower} \leq x \leq \text{upper}, \\
0 & \text{for } x > \text{upper},
\end{cases}
$$

where `lower` and `upper` are the lower and upper bounds of the truncated distribution, respectively.
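
The renormalisation in the formula above can be checked numerically. The sketch below is illustrative only: `truncated_pdf` is a hypothetical helper, and a `statistics.NormalDist(2, 1)` base is assumed instead of the Gamma used in the plots, because the standard library has no Gamma CDF. It integrates the truncated density over the bounds with a midpoint rule.

```python
from statistics import NormalDist


def truncated_pdf(x, base, lower, upper):
    """Density of `base` truncated to [lower, upper] (hypothetical helper)."""
    if x < lower or x > upper:
        return 0.0
    # Probability mass that the base distribution keeps inside the bounds.
    norm = base.cdf(upper) - base.cdf(lower)
    return base.pdf(x) / norm


base = NormalDist(2, 1)  # stand-in base distribution (assumption)
lower, upper = 1.0, 4.5

# Midpoint-rule integral of the truncated density over [lower, upper].
n = 10_000
width = (upper - lower) / n
area = width * sum(
    truncated_pdf(lower + (i + 0.5) * width, base, lower, upper) for i in range(n)
)

print(f"area under truncated PDF: {area:.6f}")
print(f"base PDF at 2: {base.pdf(2.0):.4f}, truncated PDF at 2: "
      f"{truncated_pdf(2.0, base, lower, upper):.4f}")
```

Inside the bounds the truncated density is always at least as large as the base density, since the mass that was cut off is redistributed proportionally over the remaining range.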

**Cumulative Distribution Function (CDF):**

The CDF of a Truncated distribution is:

$$
\begin{cases}
0 & \text{for } x < \text{lower}, \\
\frac{\text{CDF}(x) - \text{CDF}(\text{lower})}{\text{CDF}(\text{upper}) - \text{CDF}(\text{lower})} & \text{for } \text{lower} \leq x \leq \text{upper}, \\
1 & \text{for } x > \text{upper},
\end{cases}
$$
$$

where `lower` and `upper` are the lower and upper bounds of the truncated distribution, respectively.
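
One common use of the truncated CDF is inverse-transform sampling: draw a uniform value between CDF(lower) and CDF(upper), then map it back through the inverse CDF of the base distribution. The sketch below is a standard-library illustration under the same assumptions as before (a normal base, hypothetical helper names, arbitrary seed). It is not necessarily how PreliZ draws samples.

```python
import random
from statistics import NormalDist


def truncated_cdf(x, base, lower, upper):
    """CDF of `base` truncated to [lower, upper] (hypothetical helper)."""
    if x < lower:
        return 0.0
    if x > upper:
        return 1.0
    lo, hi = base.cdf(lower), base.cdf(upper)
    return (base.cdf(x) - lo) / (hi - lo)


def truncated_rvs(base, lower, upper, size, rng):
    """Inverse-transform samples from the truncated distribution."""
    lo, hi = base.cdf(lower), base.cdf(upper)
    return [base.inv_cdf(lo + rng.random() * (hi - lo)) for _ in range(size)]


base = NormalDist(2, 1)  # stand-in base distribution (assumption)
lower, upper = 1.0, 4.5

rng = random.Random(0)
samples = truncated_rvs(base, lower, upper, 20_000, rng)

# Compare the empirical CDF at x = 2 with the analytical truncated CDF.
empirical = sum(s <= 2.0 for s in samples) / len(samples)
print(f"empirical CDF at 2:  {empirical:.3f}")
print(f"analytical CDF at 2: {truncated_cdf(2.0, base, lower, upper):.3f}")
```

Unlike censoring, no sample lands exactly on a bound with positive probability: every draw falls inside the range, and nothing outside it is ever observed.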


```{seealso}
:class: seealso


**Related Distributions:**

- [Censored](censored_distribution.md) - In a censored distribution, values outside the range are set to the nearest bound, while in a truncated distribution, they are not recorded at all.
- [TruncatedNormal](truncated_normal_distribution.md) - A truncated normal distribution is a normal distribution that has been restricted to a specific range.

```

## References

- Wikipedia - [Truncated distribution](https://en.wikipedia.org/wiki/Truncated_distribution)
10 changes: 5 additions & 5 deletions preliz/distributions/censored.py
@@ -19,11 +19,11 @@ class Censored(DistributionTransformer):
 .. math::

 \begin{cases}
-0 & \text{for } x < lower, \\
-\text{CDF}(lower) & \text{for } x = lower, \\
-\text{PDF}(x) & \text{for } lower < x < upper, \\
-1-\text{CDF}(upper) & \text {for} x = upper, \\
-0 & \text{for } x > upper,
+0 & \text{for } x < \text{lower}, \\
+\text{CDF}(\text{lower}) & \text{for } x = \text{lower}, \\
+\text{PDF}(x) & \text{for } \text{lower} < x < \text{upper}, \\
+1-\text{CDF}(\text{upper}) & \text {for } x = \text{upper}, \\
+0 & \text{for } x > \text{upper},
 \end{cases}

 .. plot::
8 changes: 4 additions & 4 deletions preliz/distributions/truncated.py
@@ -17,10 +17,10 @@ class Truncated(DistributionTransformer):
 .. math::

 \begin{cases}
-0 & \text{for } x < lower, \\
-\frac{\text{PDF}(x, dist)}{\text{CDF}(upper, dist) - \text{CDF}(lower, dist)}
-& \text{for } lower <= x <= upper, \\
-0 & \text{for } x > upper,
+0 & \text{for } x < \text{lower}, \\
+\frac{\text{PDF}(x)}{\text{CDF}(upper) - \text{CDF}(lower)}
+& \text{for } \text{lower} <= x <= \text{upper}, \\
+0 & \text{for } x > \text{upper},
 \end{cases}

 .. plot::