This repository focuses on the implementation of the n-gram algorithm, a widely used technique in natural language processing (NLP) and computational linguistics. An n-gram model predicts the next word in a sequence of text from the preceding n-1 words, and is used in applications such as language modeling, speech recognition, and spelling correction.
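As a minimal illustration, assuming whitespace-tokenized text and plain maximum-likelihood counts with no smoothing, an n-gram model estimates P(w_i | w_{i-n+1}, ..., w_{i-1}) as the count of the full n-gram divided by the total count of n-grams that start with the same (n-1)-word context. The Python sketch below is not taken from this repository's code; the function names ngrams, train_ngram_model, next_word_probability and predict_next are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical helper names for illustration; not taken from this repository.
Context = Tuple[str, ...]


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous (overlapping) n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train_ngram_model(tokens: Sequence[str], n: int) -> Dict[Context, Counter]:
    """Count how often each word follows each (n-1)-word context."""
    if n < 1:
        raise ValueError("n must be at least 1")
    model: Dict[Context, Counter] = defaultdict(Counter)
    for gram in ngrams(tokens, n):
        context, word = gram[:-1], gram[-1]  # split the n-gram into context and next word
        model[context][word] += 1
    return model


def next_word_probability(model: Dict[Context, Counter], context: Sequence[str], word: str) -> float:
    """MLE of P(word | context): count(context + word) / count(context followed by any word)."""
    counts = model.get(tuple(context), Counter())  # .get does not insert unseen keys
    total = sum(counts.values())
    return counts[word] / total if total else 0.0


def predict_next(model: Dict[Context, Counter], context: Sequence[str]) -> Optional[str]:
    """Return the most frequent continuation of the context, or None if it was never seen."""
    counts = model.get(tuple(context))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    tokens = "the cat sat on the mat the cat ran".split()
    bigram_model = train_ngram_model(tokens, n=2)
    print(predict_next(bigram_model, ["the"]))
    print(next_word_probability(bigram_model, ["the"], "cat"))
```

In this toy corpus "the" is followed by "cat" twice and by "mat" once, so predict_next returns "cat" and next_word_probability returns 2/3. Unseen contexts return None or 0.0; a practical model would usually add smoothing or backoff.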
The primary reference for this implementation is the paper "Real-World Trajectory Sharing with Local Differential Privacy" by Teddy Cunningham, Graham Cormode, Hakan Ferhatosmanoglu, and Divesh Srivastava, published in the Proceedings of the VLDB Endowment (PVLDB), Volume 14, Issue 11, 2021. The paper introduces a locally differentially private mechanism based on perturbing hierarchically structured, overlapping n-grams of trajectory data. DOI: 10.14778/3476249.3476280.
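To make "overlapping n-grams" concrete, the sketch below slides a window of length n over a trajectory that is assumed, for illustration, to be a sequence of discrete location labels (for example, region IDs). It only extracts the n-grams; it does not implement the hierarchical structure or the local differential privacy perturbation described in the paper. The names overlapping_ngrams and ngram_coverage are hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple

# Hypothetical helpers for illustration; not taken from this repository.
# A trajectory is assumed to be a sequence of discrete location labels.


def overlapping_ngrams(trajectory: Sequence[Hashable], n: int) -> List[Tuple[Hashable, ...]]:
    """Slide a length-n window over the trajectory with step 1.

    Consecutive n-grams share n - 1 points, so a trajectory of length L
    yields L - n + 1 n-grams (none if L < n).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(trajectory[i:i + n]) for i in range(len(trajectory) - n + 1)]


def ngram_coverage(trajectory: Sequence[Hashable], n: int) -> List[int]:
    """Count how many of the overlapping n-grams contain each point position."""
    if n < 1:
        raise ValueError("n must be at least 1")
    counts = [0] * len(trajectory)
    for start in range(len(trajectory) - n + 1):
        for pos in range(start, start + n):
            counts[pos] += 1
    return counts


if __name__ == "__main__":
    trip = ["A", "B", "C", "D", "E"]  # hypothetical region labels
    print(overlapping_ngrams(trip, 2))
    print(ngram_coverage(trip, 3))
```

For the five-point example, the bigrams are (A, B), (B, C), (C, D) and (D, E), and with n = 3 the coverage is [1, 2, 3, 2, 1]: interior points appear in up to n different n-grams, which is what makes the n-grams overlapping.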
The implementation is intended to help readers understand the n-gram model, which is valued for its simplicity and its ability to capture local sequential patterns, and to apply it in NLP-related tasks.
The repository structure is organized as follows:
- code: Contains the primary implementation of the n-gram algorithm.
  - util: Stores utility functions essential for the algorithm implementation.
- data: Manages datasets for experimentation and evaluation.
  - preprocessed: Stores preprocessed text data in preparation for n-gram analysis.
  - results: Holds the output files generated during the algorithm execution.
- notebooks: Contains Jupyter notebooks demonstrating the n-gram algorithm in action.
Before running the implementation, install all prerequisites listed in the 'requirements.txt' file (for example, with pip install -r requirements.txt).
Run the provided scripts or Jupyter notebooks to apply the n-gram algorithm to the included datasets. Detailed instructions are given in the respective files.
We extend our gratitude to the authors of the referenced paper. The code implementation in this repository draws inspiration from their groundbreaking work.
- Arun Ashok Badri
- Sandhya V
- Pavan Kumar J