Understanding patterns of loneliness in older long-term care users using natural language processing with free text case notes
This repository accompanies the 2024 work Understanding patterns of loneliness in older long-term care users using natural language processing with free text case notes, by Sam Rickman, Jose-Luis Fernandez, and Juliette Malley. It will be updated with a link to the paper upon publication.
Loneliness and social isolation are distressing for individuals and a predictor of mortality. Evidence about the impact of loneliness and isolation on publicly funded long-term care usage is limited as there is little data indicating whether individuals using care services are lonely or socially isolated. Recent developments in natural language processing have made it possible to extract information from electronic care records, which contain large quantities of free text notes. We identify loneliness or social isolation from free text by analysing pseudonymised administrative care records containing 1.1m free text case notes about 3,046 older adults recorded in a London council between November 2008 and August 2020. We use three natural language processing methods to represent the labelled notes as vectors: document-term matrices, word embeddings and transformers. The most accurate model used a bidirectional transformers architecture. Evaluated on a test set of unseen sentences this model had an
The paper trains a machine learning model to determine whether free text about users of adult social care indicates that they are lonely or socially isolated. This repository includes:
- The final classification model, which can be run on large volumes of free text to generate such classifications, as well as some synthetic data.
- A minimal data set for the tables and figures in the paper, i.e. csv files of values used to build the figures and tables, and R code to reproduce the tables.
The model is reproducible as it is encapsulated in a Docker container. The requirements are:
- Docker: To install Docker, follow the instructions at Docker's official site.
- API Request Tool: A method for making API requests, such as curl. The container runs the model on an open port (by default, 8000) on the local host, and you can make API requests to the model from that machine.
- Clone the repository:
git clone https://github.com/samrickman/lonelinessmodel.git
- Navigate into the directory:
cd lonelinessmodel
- Build the Docker image:
docker build . -t lonelinessimage
- Run the Docker container:
docker run -d --rm --name lonelinessmodel -p 8000:8000 lonelinessimage
The -p 8000:8000 option in the last step maps port 8000 in the container to port 8000 on the local machine. To use a different local port, run this instead:
docker run -d --rm --name lonelinessmodel -p <local-port>:8000 lonelinessimage
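Because -d runs the container in the background, docker run returns as soon as the container starts, possibly before the model is ready to accept requests. The following is a minimal Python sketch, not part of this repository, that polls the server until it answers. The helper name wait_until_ready is hypothetical, and the default /docs path assumes the app is built with FastAPI and keeps its default documentation page. Any path would work, since any HTTP response, even an error status, shows the server is answering. Change base_url if you mapped a different local port.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url="http://127.0.0.1:8000", path="/docs",
                     timeout=120.0, interval=1.0):
    """Poll base_url + path until the server sends any HTTP response.

    Hypothetical helper. Returns True as soon as a response arrives (any
    status code) and False if `timeout` seconds pass first. The /docs
    path is an assumption (FastAPI's default docs page).
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(base_url + path, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # An error status (e.g. 404) still means the server answered.
            return True
        except (OSError, http.client.HTTPException):
            # Refused, reset or timed out: the server is not ready yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Checking for any HTTP response, rather than just an open TCP port, matters here because Docker's port forwarding may accept connections on the local port before the application inside the container is listening.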
You can send API requests to the model by uploading the file to be classified to the upload endpoint. The parameters the upload route takes are:
- file: Required. String (file path). The notes to classify. For an example of the expected format, see app/sample_notes.csv.
- out_file: Optional. String (file path). A file in the container where the output is saved. By default, this is ./csv_out/sentence_df.csv. We recommend redirecting the API response to a local file instead of using this parameter.
- anon_mask_file: Optional. String (file path). If the data is pseudonymized, this provides words to replace pseudonymized tokens during pre-processing, to aid tokenization.
- overwrite: Optional. Boolean. Whether to overwrite data in the container if it has previously been used for classification.
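To illustrate the kind of pre-processing anon_mask_file supports, here is a minimal Python sketch of replacing pseudonymized tokens with ordinary words before tokenization. It is not the repository's implementation: the function names are hypothetical, and it assumes the mask is a JSON object mapping each token to its replacement word, which may not match the actual layout of app/example_diff_constants.json.

```python
import json
import re


def load_mask(path):
    """Read a mask file, assumed to be a JSON object of {token: word}."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def apply_anon_mask(text, mask):
    """Replace each pseudonymized token in `text` with its mask word.

    Hypothetical helper. Longer tokens are tried first so that a token
    which is a prefix of another cannot partially replace it.
    """
    if not mask:
        return text
    tokens = sorted(mask, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in tokens))
    return pattern.sub(lambda m: mask[m.group(0)], text)
```

Matching longer tokens first matters when one token is a prefix of another: with the mask {"NAME": "Alex", "NAME2": "Sam"}, the text "NAME2 met NAME" becomes "Sam met Alex" rather than "Alex2 met Alex".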
Personally, I find this easier to understand when I see the code. If you feel similarly, the function signature for the upload method is as follows:
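from fastapi import File, Form, UploadFile  # assumed: these names come from FastAPI

def upload(
    file: UploadFile = File(...),
    out_file: str = "./csv_out/sentence_df.csv",
    anon_mask_file: UploadFile = None,
    overwrite: bool = Form(False)
):
    ...  # body omitted; only the parameters are shown here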
Assuming the app is running on the default port:
- Basic Usage: Upload a file (app/sample_notes.csv), generate predictions, and write the output to a local file (results.json).
curl -X 'POST' \
  'http://127.0.0.1:8000/upload' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@app/sample_notes.csv;type=text/csv' \
  > results.json
- Using a Different Set of Constants: Replace pseudonymized tokens with a different set of constants (./app/example_diff_constants.json).
curl -X 'POST' \
  'http://127.0.0.1:8000/upload' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@app/sample_notes.csv;type=text/csv' \
  -F 'anon_mask_file=@app/example_diff_constants.json;type=application/json' \
  > results.json
- Overwriting Data: Overwrite any data already in the container.
curl -X 'POST' \
  'http://127.0.0.1:8000/upload' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@app/sample_notes.csv;type=text/csv' \
  -F 'anon_mask_file=@app/example_diff_constants.json;type=application/json' \
  -F 'overwrite=true' \
  > results.json
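If you would rather call the endpoint from Python than from curl, the sketch below builds the same kind of multipart/form-data request as the curl examples, using only the standard library. The helper build_upload_request is hypothetical and not part of this repository. The form field names (file, anon_mask_file, overwrite) come from the upload parameters, and sending overwrite as the string "true" mirrors the curl example.

```python
import urllib.request
import uuid
from pathlib import Path


def build_upload_request(notes_path, base_url="http://127.0.0.1:8000",
                         mask_path=None, overwrite=False):
    """Build (but do not send) a multipart/form-data POST to /upload.

    Hypothetical helper: field names follow the upload parameters; the
    default base URL assumes the default port mapping.
    """
    boundary = uuid.uuid4().hex
    parts = []

    def add_field(name, value, filename=None, content_type=None):
        # One part per form field, each opened by the boundary line.
        disposition = f'form-data; name="{name}"'
        if filename is not None:
            disposition += f'; filename="{filename}"'
        header = f"--{boundary}\r\nContent-Disposition: {disposition}\r\n"
        if content_type is not None:
            header += f"Content-Type: {content_type}\r\n"
        # A blank line separates the part's headers from its content.
        parts.append(header.encode() + b"\r\n" + value + b"\r\n")

    notes = Path(notes_path)
    add_field("file", notes.read_bytes(), notes.name, "text/csv")
    if mask_path is not None:
        mask = Path(mask_path)
        add_field("anon_mask_file", mask.read_bytes(), mask.name,
                  "application/json")
    if overwrite:
        # Sent as the string "true", as in the curl example.
        add_field("overwrite", b"true")

    # The closing boundary ends the multipart body.
    body = b"".join(parts) + f"--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{base_url}/upload",
        data=body,
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned request to urllib.request.urlopen and writing the response body to results.json corresponds to the basic curl example. An Internal Server Error from the container would surface as urllib.error.HTTPError.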
The container should return an appropriate error in most cases, such as "file not found". However, if there is an unexpected error, you may see Internal Server Error. In that case, more detailed output should be available in the container's logs, which you can view by running:
docker logs lonelinessmodel
(assuming your container is named lonelinessmodel). If the error is not immediately clear, please raise a GitHub issue.
This project builds upon several works licensed under the GNU General Public License v3.0 and is therefore also licensed under the GPL.
The full text of the GNU General Public License can be found in the LICENSE file.
This project also makes use of several libraries licensed under the MIT, BSD and Apache 2.0 licenses. For the full text of these licenses and the software to which each applies, please refer to the LICENSE file.
If you use this code in your research, please cite:
Rickman, S. (2024). Understanding patterns of loneliness in older long-term care users using natural language processing with free text case notes (v1.0.0). Zenodo. https://doi.org/10.5281/zenodo.13934375