Merge pull request #52 from Forced-Alignment-and-Vowel-Extraction/dev
v0.3.0
JoFrhwld authored Apr 3, 2024
2 parents c05c64f + ce07e95 commit 655bc81
Showing 30 changed files with 731 additions and 94 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -162,4 +162,5 @@ cython_debug/

notebooks/
.vscode/
poetry.lock
poetry.lock
cov.xml
6 changes: 5 additions & 1 deletion README.md
@@ -1,11 +1,14 @@
# Getting started with `fave-recode`


![PyPI](https://img.shields.io/pypi/v/fave-recode.png)
[![codecov](https://codecov.io/gh/Forced-Alignment-and-Vowel-Extraction/fave-recode/graph/badge.svg?token=C23B1H3DAX)](https://codecov.io/gh/Forced-Alignment-and-Vowel-Extraction/fave-recode)
[![Maintainability](https://api.codeclimate.com/v1/badges/2375ddfef5d77ba1681d/maintainability.png)](https://codeclimate.com/github/Forced-Alignment-and-Vowel-Extraction/fave-recode/maintainability)
[![FAVE Python
CI](https://github.com/Forced-Alignment-and-Vowel-Extraction/fave-recode/actions/workflows/test-and-run.yml/badge.svg?branch=dev)](https://github.com/Forced-Alignment-and-Vowel-Extraction/fave-recode/actions/workflows/test-and-run.yml)
[![Build
Docs](https://github.com/Forced-Alignment-and-Vowel-Extraction/fave-recode/actions/workflows/build-docs.yml/badge.svg)](https://forced-alignment-and-vowel-extraction.github.io/fave-recode/)
[![DOI](https://zenodo.org/badge/605740158.svg)](https://zenodo.org/badge/latestdoi/605740158)

The idea behind `fave-recode` is that no matter how much you may adjust
the dictionary of a forced-aligner, you may still want to make
@@ -41,6 +44,7 @@ fave_recode --help
-d, --output_dest PATH An output directory

Other options:
-a, --parser TEXT Label set parser. Built in options are cmu_parser
-s, --scheme TEXT Recoding scheme. Built in options are cmu2labov
and cmu2phila [required]
-r, --recode_stem TEXT Stem to append to recoded TextGrid file names
@@ -59,7 +63,7 @@ ls data
KY25A_1.TextGrid josef-fruehwald_speaker.TextGrid

``` bash
fave_recode -i data/josef-fruehwald_speaker.TextGrid -s cmu2phila
fave_recode -i data/josef-fruehwald_speaker.TextGrid -s cmu2phila -a cmu_parser

ls data
```
7 changes: 4 additions & 3 deletions README.qmd
@@ -3,11 +3,12 @@ title: Getting started with `fave-recode`
engine: jupyter
format: gfm
---

![PyPI](https://img.shields.io/pypi/v/fave-recode)
[![codecov](https://codecov.io/gh/Forced-Alignment-and-Vowel-Extraction/fave-recode/graph/badge.svg?token=C23B1H3DAX)](https://codecov.io/gh/Forced-Alignment-and-Vowel-Extraction/fave-recode)
[![Maintainability](https://api.codeclimate.com/v1/badges/2375ddfef5d77ba1681d/maintainability)](https://codeclimate.com/github/Forced-Alignment-and-Vowel-Extraction/fave-recode/maintainability)
[![FAVE Python CI](https://github.com/Forced-Alignment-and-Vowel-Extraction/fave-recode/actions/workflows/test-and-run.yml/badge.svg?branch=dev)](https://github.com/Forced-Alignment-and-Vowel-Extraction/fave-recode/actions/workflows/test-and-run.yml)
[![Build Docs](https://github.com/Forced-Alignment-and-Vowel-Extraction/fave-recode/actions/workflows/build-docs.yml/badge.svg)](https://forced-alignment-and-vowel-extraction.github.io/fave-recode/)
[![DOI](https://zenodo.org/badge/605740158.svg)](https://zenodo.org/badge/latestdoi/605740158)

The idea behind `fave-recode` is that no matter how much you may adjust the dictionary of a forced-aligner, you may still want to make programmatic changes to the output.

@@ -47,13 +48,13 @@ ls data
```

```bash
fave_recode -i data/josef-fruehwald_speaker.TextGrid -s cmu2phila
fave_recode -i data/josef-fruehwald_speaker.TextGrid -s cmu2phila -a cmu_parser

ls data
```
```{python}
#| echo: false
!fave_recode -i docs/getting-started/data/josef-fruehwald_speaker.TextGrid -s cmu2phila
!fave_recode -i docs/getting-started/data/josef-fruehwald_speaker.TextGrid -s cmu2phila -a cmu_parser
!ls docs/getting-started/data
```

8 changes: 8 additions & 0 deletions docs/_extensions/jofrhwld/codeblocklabel/_extension.yml
@@ -0,0 +1,8 @@
title: Codeblocklabel
author: Josef Fruehwald
version: 1.0.0
quarto-required: ">=1.3.0"
contributes:
  filters:
    - codeblocklabel.lua

10 changes: 10 additions & 0 deletions docs/_extensions/jofrhwld/codeblocklabel/codeblocklabel.css
@@ -0,0 +1,10 @@
.langname {
  margin-bottom: 0%;
  padding-bottom: 0%;
  font-style: italic;
  font-size:smaller;
}

.sourceCode[id]{
  margin-top: 0%;
}
30 changes: 30 additions & 0 deletions docs/_extensions/jofrhwld/codeblocklabel/codeblocklabel.lua
@@ -0,0 +1,30 @@

-- function Div(el)
-- if el.content[1].t == "CodeBlock" then
-- return pandoc.Para("CodeBlock!")
-- end
-- end

quarto.doc.add_html_dependency({
  name = 'codenamelabel',
  stylesheets = {'codeblocklabel.css'}
})

function CodeBlock(block)
  local newblock = block
  if (FORMAT:match "html") and
     (block.classes[1]) then
    local langname = block.classes[1]
    out = {pandoc.Div(
        pandoc.RawInline("html",
          "<pre class='langname'>"..block.classes[1].."</pre>"
        ),
        pandoc.Attr("", {"langname"}, {})
      ),
      newblock
    }
  else
    out = newblock
  end
  return out
end
17 changes: 14 additions & 3 deletions docs/_quarto.yml
@@ -5,6 +5,7 @@ project:
website:
page-navigation: true
image: assets/logo.png
favicon: assets/logo.png
navbar:
left:
- file: getting-started/overview.qmd
@@ -35,14 +36,19 @@ website:
contents:
- getting-started/condition-attributes.qmd
- getting-started/condition-relations.qmd

- section: Labelset Parsing
contents:
- getting-started/label_set_parser.qmd

format:
html:
theme:
light: flatly
dark: darkly
light: [flatly, styles/light.scss]
dark: [darkly, styles/dark.scss]
css: styles/styles.css
toc: true
filters:
- codeblocklabel

# tell quarto to read the generated sidebar
metadata-files:
@@ -75,6 +81,11 @@ quartodoc:
- rule_classes.Condition
- rule_classes.Rule
- rule_classes.RuleSet
- title: Label Set Parsers
desc: Label set parsers
contents:
- labelset_parser.LabelSetParser
- labelset_parser.LabelSetParserProperties
- title: Relations
- subtitle: "`in`, `not in`"
contents:
69 changes: 69 additions & 0 deletions docs/getting-started/label_set_parser.qmd
@@ -0,0 +1,69 @@
---
title: Label Set Parsers
engine: jupyter
toc: true
---

There are some properties of label sets that you might want to include in your output labels.
For example, the CMU dictionary encodes vowel stress like so:

| label | meaning |
| ---- | ---- |
| `AY0` | unstressed /ay/ |
| `AY2` | secondary stressed /ay/ |
| `AY1` | primary stressed /ay/ |

A labelset parser can make these properties available so you can write a recoding rule like so:

```yaml
- rule: ay
  conditions:
    - attribute: label
      relation: contains
      set: AY
  return: ay_{stress}
```
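
As a rough illustration of what the `{stress}` placeholder in the rule above implies, here is a minimal Python sketch. It assumes the placeholder is filled from a mapping of parsed property names to values; `fill_template` is a hypothetical helper written for this page, not a function from `fave_recode`.

```python
# Hypothetical illustration only; this is not fave_recode's implementation.
def fill_template(template: str, properties: dict) -> str:
    """Substitute parsed property values into a rule's `return` template."""
    return template.format(**properties)


# A primary-stressed AY1 would carry stress == "1" once parsed.
print(fill_template("ay_{stress}", {"stress": "1"}))  # ay_1
```
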
`fave_recode` has a built-in parser for CMU labels called `cmu_parser`, which you can include like so:

```bash
fave_recode \
-i data/josef-fruehwald_speaker.TextGrid \
-s cmu2phila \
-a cmu_parser
```

## Label Set Parser Basics

A labelset parser has two top-level attributes:

```yaml
parser: CMU
properties: []
```

- `parser` just names the parser
- `properties` is a list of properties you wish to make available.

### A property

A single property that parses primary stress out of the CMU label would look like this:

```yaml
name: stress
updates: stress
default: ""
rules:
  - rule: "1"
    conditions:
      - attribute: label
        relation: contains
        set: "1"
    return: "1"
```

Each entry under `rules` is written the same way as the [rules for recoding](rule-scheme-basics.qmd).

The `updates` field defines the variable name you want to use to access the value "1" in our recoding rule.

Unlike a recoding rule, every segment will be given some value for "stress", so a `default` value also needs to be provided.
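
To make the role of `default` concrete, here is a minimal Python sketch of how a property like the one above could be evaluated. It rests on two assumptions that this page does not confirm: that `contains` is a plain substring check, and that the first matching rule wins. `SketchRule` and `SketchProperty` are hypothetical stand-ins, not the classes in `fave_recode.labelset_parser`.

```python
from dataclasses import dataclass, field


@dataclass
class SketchRule:
    """Hypothetical stand-in for one entry under `rules`."""
    substring: str  # the `set` value of a `contains` condition
    returns: str    # the `return` value


@dataclass
class SketchProperty:
    """Hypothetical stand-in for one parser property."""
    updates: str
    default: str
    rules: list = field(default_factory=list)

    def parse(self, label: str) -> str:
        # Assumption: the first matching rule wins; otherwise use the default.
        for rule in self.rules:
            if rule.substring in label:
                return rule.returns
        return self.default


stress = SketchProperty(updates="stress", default="", rules=[SketchRule("1", "1")])

for label in ["AY1", "AY0", "T"]:
    # Every label gets some value, which is why a default is required.
    print(f"{label!r} -> {stress.updates}={stress.parse(label)!r}")
```

Under these assumptions, only `AY1` yields `"1"`; `AY0` and `T` fall through to the empty-string default.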
7 changes: 5 additions & 2 deletions docs/getting-started/overview.qmd
@@ -4,7 +4,7 @@ aliases:
- ../index.html
engine: jupyter
---

![PyPI](https://img.shields.io/pypi/v/fave-recode)
[![codecov](https://codecov.io/gh/Forced-Alignment-and-Vowel-Extraction/fave-recode/graph/badge.svg?token=C23B1H3DAX)](https://codecov.io/gh/Forced-Alignment-and-Vowel-Extraction/fave-recode)
[![Maintainability](https://api.codeclimate.com/v1/badges/2375ddfef5d77ba1681d/maintainability)](https://codeclimate.com/github/Forced-Alignment-and-Vowel-Extraction/fave-recode/maintainability)
[![FAVE Python CI](https://github.com/Forced-Alignment-and-Vowel-Extraction/fave-recode/actions/workflows/test-and-run.yml/badge.svg?branch=dev)](https://github.com/Forced-Alignment-and-Vowel-Extraction/fave-recode/actions/workflows/test-and-run.yml)
@@ -48,7 +48,10 @@ ls data
```

```bash
fave_recode -i data/josef-fruehwald_speaker.TextGrid -s cmu2phila
fave_recode \
-i data/josef-fruehwald_speaker.TextGrid \
-a cmu_parser \
-s cmu2phila

ls data
```
2 changes: 1 addition & 1 deletion docs/objects.json
@@ -1 +1 @@
{"project": "fave_recode", "version": "0.0.9999", "count": 16, "items": [{"name": "fave_recode.rule_classes.Condition.check_condition", "domain": "py", "role": "function", "priority": "1", "uri": "reference/rule_classes.Condition.html#fave_recode.rule_classes.Condition.check_condition", "dispname": "-"}, {"name": "fave_recode.rule_classes.Condition.validate_condition", "domain": "py", "role": "function", "priority": "1", "uri": "reference/rule_classes.Condition.html#fave_recode.rule_classes.Condition.validate_condition", "dispname": "-"}, {"name": "fave_recode.rule_classes.Condition", "domain": "py", "role": "class", "priority": "1", "uri": "reference/rule_classes.Condition.html#fave_recode.rule_classes.Condition", "dispname": "-"}, {"name": "fave_recode.rule_classes.Rule.apply_rule", "domain": "py", "role": "function", "priority": "1", "uri": "reference/rule_classes.Rule.html#fave_recode.rule_classes.Rule.apply_rule", "dispname": "-"}, {"name": "fave_recode.rule_classes.Rule.validate_rule", "domain": "py", "role": "function", "priority": "1", "uri": "reference/rule_classes.Rule.html#fave_recode.rule_classes.Rule.validate_rule", "dispname": "-"}, {"name": "fave_recode.rule_classes.Rule", "domain": "py", "role": "class", "priority": "1", "uri": "reference/rule_classes.Rule.html#fave_recode.rule_classes.Rule", "dispname": "-"}, {"name": "fave_recode.rule_classes.RuleSet.apply_ruleset", "domain": "py", "role": "function", "priority": "1", "uri": "reference/rule_classes.RuleSet.html#fave_recode.rule_classes.RuleSet.apply_ruleset", "dispname": "-"}, {"name": "fave_recode.rule_classes.RuleSet.map_ruleset", "domain": "py", "role": "function", "priority": "1", "uri": "reference/rule_classes.RuleSet.html#fave_recode.rule_classes.RuleSet.map_ruleset", "dispname": "-"}, {"name": "fave_recode.rule_classes.RuleSet.read_ruleset", "domain": "py", "role": "function", "priority": "1", "uri": "reference/rule_classes.RuleSet.html#fave_recode.rule_classes.RuleSet.read_ruleset", 
"dispname": "-"}, {"name": "fave_recode.rule_classes.RuleSet", "domain": "py", "role": "class", "priority": "1", "uri": "reference/rule_classes.RuleSet.html#fave_recode.rule_classes.RuleSet", "dispname": "-"}, {"name": "fave_recode.relations.in_relation", "domain": "py", "role": "function", "priority": "1", "uri": "reference/relations.in_relation.html#fave_recode.relations.in_relation", "dispname": "-"}, {"name": "fave_recode.relations.not_in_relation", "domain": "py", "role": "function", "priority": "1", "uri": "reference/relations.not_in_relation.html#fave_recode.relations.not_in_relation", "dispname": "-"}, {"name": "fave_recode.relations.equals_relation", "domain": "py", "role": "function", "priority": "1", "uri": "reference/relations.equals_relation.html#fave_recode.relations.equals_relation", "dispname": "-"}, {"name": "fave_recode.relations.not_equals_relation", "domain": "py", "role": "function", "priority": "1", "uri": "reference/relations.not_equals_relation.html#fave_recode.relations.not_equals_relation", "dispname": "-"}, {"name": "fave_recode.relations.rematches_relation", "domain": "py", "role": "function", "priority": "1", "uri": "reference/relations.rematches_relation.html#fave_recode.relations.rematches_relation", "dispname": "-"}, {"name": "fave_recode.relations.reunmatches_relation", "domain": "py", "role": "function", "priority": "1", "uri": "reference/relations.reunmatches_relation.html#fave_recode.relations.reunmatches_relation", "dispname": "-"}]}
{"project": "fave_recode", "version": "0.0.9999", "count": 23, "items": [{"name": "fave_recode.rule_classes.Condition.check_condition", "domain": "py", "role": "function", "priority": "1", "uri": "reference/rule_classes.Condition.html#fave_recode.rule_classes.Condition.check_condition", "dispname": "-"}, {"name": "fave_recode.rule_classes.Condition.validate_condition", "domain": "py", "role": "function", "priority": "1", "uri": "reference/rule_classes.Condition.html#fave_recode.rule_classes.Condition.validate_condition", "dispname": "-"}, {"name": "fave_recode.rule_classes.Condition", "domain": "py", "role": "class", "priority": "1", "uri": "reference/rule_classes.Condition.html#fave_recode.rule_classes.Condition", "dispname": "-"}, {"name": "fave_recode.rule_classes.Rule.apply_rule", "domain": "py", "role": "function", "priority": "1", "uri": "reference/rule_classes.Rule.html#fave_recode.rule_classes.Rule.apply_rule", "dispname": "-"}, {"name": "fave_recode.rule_classes.Rule.validate_rule", "domain": "py", "role": "function", "priority": "1", "uri": "reference/rule_classes.Rule.html#fave_recode.rule_classes.Rule.validate_rule", "dispname": "-"}, {"name": "fave_recode.rule_classes.Rule", "domain": "py", "role": "class", "priority": "1", "uri": "reference/rule_classes.Rule.html#fave_recode.rule_classes.Rule", "dispname": "-"}, {"name": "fave_recode.rule_classes.RuleSet.apply_ruleset", "domain": "py", "role": "function", "priority": "1", "uri": "reference/rule_classes.RuleSet.html#fave_recode.rule_classes.RuleSet.apply_ruleset", "dispname": "-"}, {"name": "fave_recode.rule_classes.RuleSet.map_ruleset", "domain": "py", "role": "function", "priority": "1", "uri": "reference/rule_classes.RuleSet.html#fave_recode.rule_classes.RuleSet.map_ruleset", "dispname": "-"}, {"name": "fave_recode.rule_classes.RuleSet.read_ruleset", "domain": "py", "role": "function", "priority": "1", "uri": "reference/rule_classes.RuleSet.html#fave_recode.rule_classes.RuleSet.read_ruleset", 
"dispname": "-"}, {"name": "fave_recode.rule_classes.RuleSet", "domain": "py", "role": "class", "priority": "1", "uri": "reference/rule_classes.RuleSet.html#fave_recode.rule_classes.RuleSet", "dispname": "-"}, {"name": "fave_recode.labelset_parser.LabelSetParser.apply_parser", "domain": "py", "role": "function", "priority": "1", "uri": "reference/labelset_parser.LabelSetParser.html#fave_recode.labelset_parser.LabelSetParser.apply_parser", "dispname": "-"}, {"name": "fave_recode.labelset_parser.LabelSetParser.map_parser", "domain": "py", "role": "function", "priority": "1", "uri": "reference/labelset_parser.LabelSetParser.html#fave_recode.labelset_parser.LabelSetParser.map_parser", "dispname": "-"}, {"name": "fave_recode.labelset_parser.LabelSetParser.read_parser", "domain": "py", "role": "function", "priority": "1", "uri": "reference/labelset_parser.LabelSetParser.html#fave_recode.labelset_parser.LabelSetParser.read_parser", "dispname": "-"}, {"name": "fave_recode.labelset_parser.LabelSetParser.validate_parser", "domain": "py", "role": "function", "priority": "1", "uri": "reference/labelset_parser.LabelSetParser.html#fave_recode.labelset_parser.LabelSetParser.validate_parser", "dispname": "-"}, {"name": "fave_recode.labelset_parser.LabelSetParser", "domain": "py", "role": "class", "priority": "1", "uri": "reference/labelset_parser.LabelSetParser.html#fave_recode.labelset_parser.LabelSetParser", "dispname": "-"}, {"name": "fave_recode.labelset_parser.LabelSetParserProperties.validate_property", "domain": "py", "role": "function", "priority": "1", "uri": "reference/labelset_parser.LabelSetParserProperties.html#fave_recode.labelset_parser.LabelSetParserProperties.validate_property", "dispname": "-"}, {"name": "fave_recode.labelset_parser.LabelSetParserProperties", "domain": "py", "role": "class", "priority": "1", "uri": "reference/labelset_parser.LabelSetParserProperties.html#fave_recode.labelset_parser.LabelSetParserProperties", "dispname": "-"}, {"name": 
"fave_recode.relations.in_relation", "domain": "py", "role": "function", "priority": "1", "uri": "reference/relations.in_relation.html#fave_recode.relations.in_relation", "dispname": "-"}, {"name": "fave_recode.relations.not_in_relation", "domain": "py", "role": "function", "priority": "1", "uri": "reference/relations.not_in_relation.html#fave_recode.relations.not_in_relation", "dispname": "-"}, {"name": "fave_recode.relations.equals_relation", "domain": "py", "role": "function", "priority": "1", "uri": "reference/relations.equals_relation.html#fave_recode.relations.equals_relation", "dispname": "-"}, {"name": "fave_recode.relations.not_equals_relation", "domain": "py", "role": "function", "priority": "1", "uri": "reference/relations.not_equals_relation.html#fave_recode.relations.not_equals_relation", "dispname": "-"}, {"name": "fave_recode.relations.rematches_relation", "domain": "py", "role": "function", "priority": "1", "uri": "reference/relations.rematches_relation.html#fave_recode.relations.rematches_relation", "dispname": "-"}, {"name": "fave_recode.relations.reunmatches_relation", "domain": "py", "role": "function", "priority": "1", "uri": "reference/relations.reunmatches_relation.html#fave_recode.relations.reunmatches_relation", "dispname": "-"}]}
4 changes: 4 additions & 0 deletions docs/reference/_sidebar.yml
@@ -7,6 +7,10 @@ website:
- reference/rule_classes.Rule.qmd
- reference/rule_classes.RuleSet.qmd
section: Rule Classes
- contents:
- reference/labelset_parser.LabelSetParser.qmd
- reference/labelset_parser.LabelSetParserProperties.qmd
section: Label Set Parsers
- contents:
- contents:
- reference/relations.in_relation.qmd
9 changes: 9 additions & 0 deletions docs/reference/index.qmd
@@ -10,6 +10,15 @@ Rule application classes
| [rule_classes.Rule](rule_classes.Rule.qmd#fave_recode.rule_classes.Rule) | _A rule class_ |
| [rule_classes.RuleSet](rule_classes.RuleSet.qmd#fave_recode.rule_classes.RuleSet) | A rule set class |

## Label Set Parsers

Label set parsers

| | |
| --- | --- |
| [labelset_parser.LabelSetParser](labelset_parser.LabelSetParser.qmd#fave_recode.labelset_parser.LabelSetParser) | A labelset parser object |
| [labelset_parser.LabelSetParserProperties](labelset_parser.LabelSetParserProperties.qmd#fave_recode.labelset_parser.LabelSetParserProperties) | A property of the labelset, including rules that |

## Relations

### `in`, `not in`