Jiwar is an open-source Python tool for calculating orthographic, phonological, and phonographic neighborhood measures across 40+ languages.
- Supports 40+ languages
- Calculates orthographic, phonological, and phonographic neighborhood measures
- User-friendly command-line interface
- Includes built-in and custom corpus files
- An interactive Google Colab notebook is available, so you can start using Jiwar without installing it on your device.
- Clone the repository:

      git clone https://github.com/AlaaAlzahrani/Jiwar.git
      cd Jiwar

- Create and activate a virtual environment:
  - For Windows:

        virtualenv -p python3 venv
        .\venv\Scripts\activate.ps1

  - For macOS and Linux:

        python3 -m venv venv
        source venv/bin/activate

- Install dependencies:

      pip install --upgrade pip
      pip install -r requirements.txt

- Run Jiwar:

      python jiwar.py
- Prepare your input file (csv, xlsx, txt, tsv) with a 'word' column.
- Run `python jiwar.py` and follow the prompts.
- Select your desired language and measures.
- Jiwar will process your input and save the results.
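
As a minimal sketch of the first step, the snippet below writes a small CSV input file with the required 'word' column using Python's standard `csv` module. The file name `my_words.csv`, the helper `write_word_list`, and the example words are illustrative assumptions, not part of Jiwar.

```python
import csv

def write_word_list(path, words):
    """Write words to a CSV file with a single 'word' column."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["word"])              # header row: the 'word' column
        writer.writerows([w] for w in words)   # one word per row

# Hypothetical file name and word list, for illustration only.
write_word_list("my_words.csv", ["cat", "hat", "house"])
```

The resulting file can then be supplied when `python jiwar.py` prompts for input.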
Measure | Description |
---|---|
N (Neighborhood Size) | Number and forms of words that differ from the target word by one letter/phoneme via substitution only |
Density | Number and forms of words that differ from the target word by one letter/phoneme via substitution, addition, or deletion |
OLD20/PLD20/PGLD20 | Average Levenshtein distance of the 20 closest neighbors to the target word |
C (Clustering Coefficient) | Measures the extent to which a given word's neighbors are also neighbors of each other |
Neighborhood Frequency | Descriptive statistics (Mean, SD) about the frequencies of neighboring words |
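
To make the definitions in the table concrete, here is a minimal sketch of how such measures could be computed over a toy lexicon. It is not Jiwar's implementation: the function names, the toy lexicon, and the frequencies are assumptions made for illustration. Because the toy lexicon has fewer than 20 words, the OLD20-style average simply uses all available words.

```python
from statistics import mean

def levenshtein(a, b):
    """Edit distance allowing substitution, insertion, and deletion."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def n_neighbors(word, lexicon):
    """N: same-length words differing by exactly one substitution."""
    return sorted(w for w in lexicon
                  if len(w) == len(word)
                  and sum(x != y for x, y in zip(w, word)) == 1)

def density_neighbors(word, lexicon):
    """Density: words one substitution, addition, or deletion away."""
    return sorted(w for w in lexicon if levenshtein(word, w) == 1)

def old_k(word, lexicon, k=20):
    """Mean Levenshtein distance to the k closest other words."""
    dists = sorted(levenshtein(word, w) for w in lexicon if w != word)
    return mean(dists[:k])

def clustering_coefficient(word, lexicon):
    """Share of neighbor pairs that are themselves neighbors."""
    nbrs = density_neighbors(word, lexicon)
    pairs = [(a, b) for i, a in enumerate(nbrs) for b in nbrs[i + 1:]]
    if not pairs:
        return 0.0
    linked = sum(levenshtein(a, b) == 1 for a, b in pairs)
    return linked / len(pairs)

# Toy lexicon with made-up frequencies (illustration only).
freqs = {"cat": 120, "hat": 80, "cot": 15, "cart": 30, "at": 500, "dog": 90}
lexicon = list(freqs)

print(n_neighbors("cat", lexicon))        # ['cot', 'hat']
print(density_neighbors("cat", lexicon))  # ['at', 'cart', 'cot', 'hat']
print(old_k("cat", lexicon))
print(clustering_coefficient("cat", lexicon))
# Neighborhood frequency: mean frequency of the density neighbors
print(mean(freqs[w] for w in density_neighbors("cat", lexicon)))
```

The same functions would apply to phonological measures if each word were represented as a sequence of phonemes rather than letters.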
- Jiwar supports 40 languages with built-in corpora and around 90 language varieties through custom corpora.
- For languages without a built-in corpus, you'll need to provide a custom corpus to use Jiwar.
For more detailed instructions and examples, check out our fully documented guide here:
Jiwar is licensed under the GNU General Public License v3.0.
Copyright 2024 Alaa Alzahrani
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
If you use Jiwar in your research, please cite:
@preprint{Alzahrani:2024:jiwar,
title = "{Jiwar: A database and calculator for word neighborhood measures in 40 Languages}",
author = {Alaa Alzahrani},
year = "2024",
note = "Preprint"
}