-
Notifications
You must be signed in to change notification settings - Fork 5
/
Copy pathREADME.Rmd
174 lines (139 loc) · 5.86 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
---
output:
rmarkdown::github_document: default
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-"
)
```
# rlena
[![Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.](http://www.repostatus.org/badges/latest/wip.svg)](http://www.repostatus.org/#wip)
[![Travis build status](https://travis-ci.org/HomeBankCode/rlena.svg?branch=master)](https://travis-ci.org/HomeBankCode/rlena)
[![Coverage status](https://coveralls.io/repos/github/HomeBankCode/rlena/badge.svg)](https://coveralls.io/r/HomeBankCode/rlena?branch=master)
## Overview
The rlena package makes it easy to work with LENA `.its` files in R.
The *Language Environment ANalysis (LENA)* system produces automatic annotations
for audio recordings. Its annotations can be exported as `.its` files, that
contain an `xml` structure. The rlena package helps with importing and preparing
`.its` files for further anlysis in R.
The `read_its_file()` fuction takes the path to an `.its` file and returns an
XML object. All other functions work with this object to extract information.
They return tidy data frames that can easily be manipulated
with the usual tools from the tidyverse:
* `gather_recordings()` - Returns all `<Recording>` nodes
* `gather_blocks()` - Returns all `<Conversation>` and `<Pause>` nodes
* `gather_conversations()`- Returns all `<Conversation>` nodes
* `gather_pauses()` - Returns all `<Pause>` nodes
* `gather_segments()` - Returns all `<Segment>` nodes
* `gather_ava_info()` - Returns AVA (Automatic Vocalization Assessment) info
* `gather_child_info()` - Returns child info (e.g. birth data, age, gender)
:warning: ***Warning:** This package is early work in progress and subject to
potentially code-breaking change.*
## Installation
```{r eval = FALSE}
# rlena is not (yet) available on CRAN, but you can install the developmental
# version from GitHub (requires devtools):
if(!require(devtools) install.packages("devtools")
devtools::install_github("HomeBankCode/rlena", dependencies = TRUE)
```
## Usage
- Load an `.its` file. For this example we download a file from
[HomeBankCode/lena-its-tools](https://github.com/HomeBankCode/lena-its-tools).
```{r, warning = FALSE}
library(rlena)
library(dplyr, warn.conflicts = FALSE)
# Download the example ITS file
url <- "https://cdn.rawgit.com/HomeBankCode/lena-its-tools/master/Example/e20160420_165405_010572.its"
tmp <- tempfile()
download.file(url, tmp)
its <- read_its_file(tmp)
```
- Extract child info and results of the Automatic Vocalization Assessment (an
index of the child's language development).
```{r}
gather_child_info(its)
gather_ava_info(its)
```
- Extract the recording information.
```{r}
# Each row corresponds to an uninterrupted recording. There is a long pause
# inbetween the two recordings. This means, that the LENA recorder was paused.
recordings <- gather_recordings(its)
recordings
```
- Extract all conversations.
```{r}
# Each row of the returned data frame corresponds to
# one conversation node in the `.its` file. The columns contain the node's
# attributes, such as the number of adult words and child vocalizations.
conversations <- gather_conversations(its)
conversations
```
- Plot male and female adult word counts.
```{r conversation-demo, fig.width = 7, fig.height = 4, warning = FALSE}
library(tidyr)
library(ggplot2)
# Create long data-frame of word counts
word_counts <- conversations %>%
select(conversation_nr = blkTypeId,
time = startClockTimeLocal,
female = femaleAdultWordCnt,
male = maleAdultWordCnt) %>%
gather(key = speaker, value = count, female, male) %>%
filter(count != 0)
# Add acumulated word count
word_counts <- word_counts %>%
group_by(speaker) %>%
arrange(conversation_nr) %>%
mutate(count_acc = cumsum(count))
# Plot word counts per conversations
word_counts %>%
ggplot(aes(conversation_nr, count, color = speaker)) +
geom_point() +
labs(title = "Adult Word Count per Conversation",
x = "Conversation Number",
y = "Words")
# Plot accumulating word count over time
word_counts %>%
ggplot(aes(time, y = count_acc, color = speaker)) +
geom_rect(data = recordings, inherit.aes = FALSE,
aes(xmin = startClockTimeLocal, xmax = endClockTimeLocal,
ymin=0, ymax = max(word_counts$count_acc), group = recId,
fill = "recorder on"), alpha=0.2) +
scale_fill_manual(NULL, values = 'skyblue') +
geom_line() +
labs(title = "Accumulated Adult Word Count throughout Recording",
x = "Time",
y = "Words (Accumulated)")
```
- What happened between 3pm and 4.30pm? We can have a closer look at the
segments to see what happened.
```{r warning = FALSE}
library(lubridate)
# extract segments
segments <- gather_segments(its) %>%
filter(startClockTimeLocal >= ymd_hms("2016-04-02 15:00:00"),
startClockTimeLocal <= ymd_hms("2016-04-02 16:30:00"))
segments
```
```{r segment-demo, fig.width = 7, fig.height = 4, warning = FALSE}
segments %>%
mutate(duration = endTime - startTime,
Label = forcats::fct_collapse(
spkr,
KeyChild = c("CHN", "CHF"),
Speech = c("FAN", "MAN", "CXN", "FAF", "MAF", "CXF"),
TV = c("TVN", "TVF"),
Noise = c("NOF", "NON", "OLF", "OLN"),
Silence = c("SIL"))) %>%
group_by(Label) %>%
summarize(duration = sum(duration)) %>%
ggplot(aes(Label, duration / 60, fill = Label)) +
geom_col(show.legend = FALSE) +
labs(title = "The sound Environment between 3pm and 4.30pm",
y = "Duration (minutes)", x = NULL)
```