
The Arabic Syntactic Analyzer (ARSA): an NLP tool to analyze the syntactic features of Arabic written texts.


AlaaAlzahrani/ARSA


License: MIT


Overview

The Arabic Syntactic Analyzer (ARSA) is an open-source Natural Language Processing (NLP) tool for analyzing syntactic features in Arabic written texts. It is built on Python and uses the camel_parser library to identify and measure 13 distinct syntactic indices: 9 syntactic complexity indices and 4 syntactic fluency indices.

The ARSA tool can be applied to study the following topics:

  • Writing assessment: evaluating syntactic features in Arabic compositions
  • Text readability: investigating the linguistic accessibility of Arabic texts
  • Second language acquisition: analyzing syntactic development in Arabic learners' writing

Notable Features

  • Automatic Analysis: automatically evaluates 13 syntactic indices
  • Batch Processing: capable of analyzing multiple text files simultaneously
  • User-Friendly Interface: implemented as an interactive command-line interface (CLI) for ease of use
  • Dual Functionality: operates as both a local application and a cloud-based tool via Google Colab

Installation and Setup for Windows users

  1. Download ARSA_notebook.ipynb from this repository

  2. Open Google Colab

  3. Upload the notebook:

    • Go to 'File' -> 'Upload notebook'
    • Select the downloaded 'ARSA_notebook.ipynb'
  4. Follow the step-by-step instructions within the notebook

Note

The notebook also runs on Mac and Linux devices.

Installation and Setup for Mac/Linux users

  1. Clone this repository:
git clone https://github.com/AlaaAlzahrani/ARSA.git
  2. Install the required packages:
cd ARSA/camel_parser
pip install -r requirements.txt
python download_models.py
camel_data -i morphology-db-msa-s31
camel_data -i disambig-bert-unfactored-msa
cd ..
pip install -r ARSA_requirements.txt
pip install --upgrade huggingface_hub
pip install camel-tools
  3. Analyze your texts using the ARSA tool

Run the following command:

cd path/to/ARSA/directory # change the working directory to the ARSA repository folder
python get_analysis.py

The script prompts you for the input folder:

Please select the text file(s) folder: <write-the-input-folder-name-here>

It then prompts you for the output folder:

Please select the output folder: <write-the-output-folder-name-here>
  4. Example
cd D:/my_projects/ARSA 
python get_analysis.py
Please select the text file(s) folder: example/corpus
Please select the output folder: example/results
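
Before starting a batch run, it can help to confirm that the input folder contains readable text files. The following is a minimal sketch, not part of ARSA itself: the helper name `check_corpus` is hypothetical, and the `.txt` extension and UTF-8 encoding are assumptions about the input format rather than documented requirements of the tool.

```python
from pathlib import Path


def check_corpus(folder):
    """Split the .txt files in `folder` into readable and problematic ones.

    Assumption: input texts are UTF-8 encoded .txt files. ARSA's own
    requirements may differ.
    """
    ok, problems = [], []
    for path in sorted(Path(folder).glob("*.txt")):
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            problems.append((path.name, "not valid UTF-8"))
            continue
        if not text.strip():
            problems.append((path.name, "empty file"))
        else:
            ok.append(path.name)
    return ok, problems


if __name__ == "__main__":
    # "example/corpus" is the input folder used in the example above.
    ok, problems = check_corpus("example/corpus")
    print(f"{len(ok)} file(s) look ready for analysis")
    for name, reason in problems:
        print(f"check {name}: {reason}")
```

Run it from the ARSA repository folder before `python get_analysis.py`. Files reported as problematic are only flagged, never modified or deleted.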

Note

This local installation method is currently unsupported on Windows because some dependencies of the camel_parser library are incompatible with Windows. Windows users should use the Google Colab notebook instead.
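
A quick environment check can catch the most common setup problems before a local run. This is a hedged sketch rather than an official ARSA script: the function names are hypothetical, and the import names `camel_tools` and `huggingface_hub` are assumed to correspond to the `camel-tools` and `huggingface_hub` packages installed in step 2. It does not verify that the parser models or CAMeL data were downloaded.

```python
import importlib.util
import platform

# Assumed import names for the pip packages installed in step 2:
# camel-tools -> camel_tools, huggingface_hub -> huggingface_hub.
REQUIRED_MODULES = ("camel_tools", "huggingface_hub")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def environment_report():
    """Collect human-readable problems that would block a local run."""
    problems = []
    if platform.system() == "Windows":
        problems.append("Windows detected: use the Google Colab notebook instead")
    for name in missing_modules():
        problems.append(f"missing Python module: {name}")
    return problems


if __name__ == "__main__":
    problems = environment_report()
    if problems:
        print("\n".join(problems))
    else:
        print("Environment looks ready for a local run")
```

An empty report only means the platform and the listed modules passed these checks. The full set of dependencies is defined by `requirements.txt` and `ARSA_requirements.txt`.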

License

This work is licensed under the MIT License.

Citation

If you use this tool, please cite the following papers to support the authors and encourage the development of open-source Arabic language processing tools:

@inproceedings{Elshabrawy:2023:camelparser,
    title = "{CamelParser2.0: A State-of-the-Art Dependency Parser for Arabic}",
    author = {Elshabrawy, Ahmed and AbuOdeh, Muhammed and Inoue, Go and Habash, Nizar},
    booktitle = {Proceedings of The First Arabic Natural Language Processing Conference (ArabicNLP 2023)},
    year = "2023"
}
@misc{Alzahrani:2024:ARSA,
    title = "{Arabic Syntactic Analyzer (ARSA): An Automated Tool for the Analysis of Arabic Written Texts}",
    author = {Alzahrani, Alaa and Alfaify, Adel},
    year = "2024",
    note = {Preprint}
}
