Welcome to Transport and Climate Policy Miner, a powerful tool for analyzing and extracting insights from climate policy documents. This repository is designed to streamline the processing of complex textual data, making it easier to uncover key information for research, decision-making, and advocacy. 🌱📜
Transport and Climate Policy Miner leverages cutting-edge natural language processing (NLP) tools and APIs to help you:
- 🔍 Analyze climate policy documents efficiently.
- ✨ Highlight relevant text passages.
- 📊 Generate structured outputs for further analysis.
Whether you're a researcher, policymaker, or climate advocate, this tool is here to simplify your workflow and empower data-driven decisions.
Let’s get started! 🚀
Before you can run the application, you need to have the following installed on your system:
- Git:
  - Go to the Git for Windows download page.
  - Click on the "Download" button.
  - Once the installer has downloaded, open it to start the installation process.
- Anaconda:
  - Go to the Anaconda Distribution page.
  - Download the installer for your operating system (Windows, macOS, or Linux).
  - Run the installer and follow the on-screen instructions.
If you are a first-time user, skip ahead to the first-time setup steps further down in this guide (starting with cloning the repository). If you have already cloned the repository, installed the dependencies, and gathered the API keys, follow the steps below:
Press the "Windows" button and open the "Anaconda Prompt".
Navigate to the root directory of the repository:
cd climate-policy-miner
Activate the conda environment by running:
conda activate textmining_venv
With the environment activated, you can now run the application by executing the pipeline.py script:
python -i src/pipeline.py
- File Selection:
  - When prompted: "Paste the path or URL to the file you would like to be analyzed"
  - Action: Insert the path to the policy document located in your local repository. Alternatively, you can try pasting a URL to the document, but this option is unstable at the moment.
- Pre-processing Step:
  - When prompted: "Is the file already pre-processed? [y/n]"
  - Action: Type n and press ENTER if it's your first time analyzing the document.
- Troubleshooting:
  - If the application stops after the pre-processing step without displaying results:
    - Action: Type quit() to exit the CLI.
    - Then, return to Step 5 and try again.
    - This is a known issue, and in such cases, you may type y when prompted if the file was already pre-processed.
- Output:
  - Once the analysis is complete:
    - A folder will be created in your repository, named after your document.
    - Inside this folder, you will find the results in a subfolder called "output":
      - .csv and .xlsx files containing the retrieved data.
      - A .pdf file containing highlighted text passages.
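The file-selection prompt accepts either a local path or a URL. As a minimal sketch of how such an input could be told apart, assuming nothing about how pipeline.py actually does it (the helper name classify_source is hypothetical), the standard library is enough:

```python
from urllib.parse import urlparse


def classify_source(user_input: str) -> str:
    """Return "url" for http(s) links and "path" for everything else.

    Hypothetical helper for illustration only; the real pipeline may
    distinguish the two cases differently.
    """
    # Drop surrounding whitespace and double quotes, which often come
    # along when a path is copied from a file manager.
    cleaned = user_input.strip().strip('"')
    parsed = urlparse(cleaned)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    return "path"


if __name__ == "__main__":
    for example in (
        "https://example.org/policy.pdf",
        r"C:\docs\policy.pdf",
        "data/policy.pdf",
    ):
        print(example, "->", classify_source(example))
```

Because the URL option is described as unstable, downloading the document first and passing its local path is likely the more reliable route.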
First, you need to clone the repository to your local machine. Press the "Windows" button and open the "Anaconda Prompt". Run the following command (Copy + Paste + Enter):
git clone https://github.com/nicolas-becker/climate-policy-miner.git
Navigate to the root directory of the cloned repository:
cd climate-policy-miner
Create a new conda environment using the environment.yml file included in the repository:
conda env create -f environment.yml
Activate the newly created conda environment by running:
conda activate textmining_venv
Gather the necessary API keys for
- Unstructured.io
- Pinecone.io
- OpenAI.com (directly or via Azure)
and insert them into the corresponding entries of the .env file.
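Before running the pipeline, it can help to confirm that no key was left empty. The following is a minimal sketch, assuming the .env file uses simple KEY=VALUE lines. The three key names are hypothetical placeholders, so replace them with the attribute names that actually appear in the repository's .env file (an Azure setup will likely use different ones).

```python
from pathlib import Path

# Hypothetical key names for illustration; use the names from your .env file.
REQUIRED_KEYS = ("UNSTRUCTURED_API_KEY", "PINECONE_API_KEY", "OPENAI_API_KEY")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list:
    """Return the required keys that are absent or have an empty value."""
    env = parse_env(text)
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        print("Missing keys:", missing_keys(env_file.read_text(encoding="utf-8")))
    else:
        print(".env not found in the current directory")
```

Run it from the root directory of the repository so that the relative path to .env resolves.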
With the environment activated, you can now run the application by executing the pipeline.py script:
python -i src/pipeline.py
- File Selection:
  - When prompted: "Paste the path or URL to the file you would like to be analyzed"
  - Action: Insert the path to the policy document located in your local repository. Alternatively, you can try pasting a URL to the document, but this option is unstable at the moment.
- Pre-processing Step:
  - When prompted: "Is the file already pre-processed? [y/n]"
  - Action: Type n and press ENTER if it's your first time analyzing the document.
- Troubleshooting:
  - If the application stops after the pre-processing step without displaying results:
    - Action: Type quit() to exit the CLI.
    - Then, return to Step 5 and try again.
    - This is a known issue, and in such cases, you may type y when prompted if the file was already pre-processed.
- Output:
  - Once the analysis is complete:
    - A folder will be created in your repository, named after your document.
    - Inside this folder, you will find the results in a subfolder called "output":
      - .csv and .xlsx files containing the retrieved data.
      - A .pdf file containing highlighted text passages.
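To check quickly whether a run produced all expected results, the "output" subfolder can be inspected from Python. This is a minimal sketch based only on the layout described above. The folder name my_policy_document and the helper collect_outputs are hypothetical, and the exact file names the pipeline writes are not assumed.

```python
from pathlib import Path


def collect_outputs(document_folder: Path) -> dict:
    """Group the files in <document_folder>/output by result type.

    Returns a dict with the keys ".csv", ".xlsx" and ".pdf"; each value
    is a sorted list of file names (empty if nothing was found).
    """
    output_dir = document_folder / "output"
    grouped = {".csv": [], ".xlsx": [], ".pdf": []}
    if not output_dir.is_dir():
        return grouped
    for file in sorted(output_dir.iterdir()):
        suffix = file.suffix.lower()  # treat .PDF and .pdf alike
        if file.is_file() and suffix in grouped:
            grouped[suffix].append(file.name)
    return grouped


if __name__ == "__main__":
    # Hypothetical folder name; use the folder named after your document.
    results = collect_outputs(Path("my_policy_document"))
    for suffix, names in results.items():
        print(suffix, names if names else "(none found)")
```

An empty list for every type after a run is a hint that the application stopped early, which is the case covered under Troubleshooting.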
If you've previously cloned the repository and want to pull in the latest updates, follow these steps:
- Navigate to the Repository Folder
  Open your terminal (Anaconda Prompt) and change to the directory where the repository is located:
  cd climate-policy-miner
- Fetch and Merge the Latest Changes
  Run the following command to fetch the latest updates and merge them into your local branch:
  git pull origin main
  Note: If your local repository has changes that conflict with the update, Git may prompt you to resolve them before proceeding.
- Update the Conda Environment
  If the new update includes changes to the environment.yml file, update your Conda environment:
  conda env update -f environment.yml
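Whether the last step is needed depends on whether environment.yml changed during the pull. One way to find out, sketched here with hypothetical helper names, is to compare a hash of the file taken before and after git pull:

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, or "" if it does not exist."""
    if not path.is_file():
        return ""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def needs_env_update(digest_before: str, digest_after: str) -> bool:
    """True when environment.yml differs between the two snapshots."""
    return digest_before != digest_after


if __name__ == "__main__":
    env_file = Path("environment.yml")
    digest = file_digest(env_file)
    print("Current digest:", digest or "(environment.yml not found)")
```

Note the digest before pulling, run the script again afterwards, and update the Conda environment only if the two values differ. Running the update regardless is also harmless; it just takes longer.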
Demo video: CP-Miner.Demo_V3_S.mp4
- Purpose (Unstructured.io API): Pre-processes and extracts clean, structured text from raw files (PDFs, Word documents, etc.).
- Usage Tip: Ensure that your documents are in formats supported by the API. Refer to the Unstructured.io documentation for more details.
- Key Note: For large files or complex documents, processing times may vary. This project uses the Free API. Please refer to the Free Unstructured API documentation for further information on API access and limitations.
- Purpose (OpenAI API): Performs advanced natural language processing tasks, including summarization and extracting key insights from text.
- Usage Tip: Use this API to tailor analyses to specific questions or objectives. For example, you can extract sections of text related to "emissions targets" or "policy impacts."
- Key Note: Keep track of your token usage when using the OpenAI API, especially for large-scale analyses. Refer to the OpenAI documentation for managing your API calls effectively. Please refer to OpenAI Pricing or Azure OpenAI Service for pricing information.
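For a rough idea of token usage before sending a large document, a character-based estimate can serve as a first check. The sketch below assumes the common rule of thumb of about four characters per token for English text. It is a heuristic, not the tokenizer the API uses, and the price is a parameter you look up yourself on the pricing pages mentioned above. Both function names are hypothetical.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from the character count (heuristic only)."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    return math.ceil(len(text) / chars_per_token)


def estimate_cost(tokens: int, price_per_1k_tokens: float) -> float:
    """Cost for a token count, given a per-1,000-token price you supply."""
    return tokens / 1000 * price_per_1k_tokens


if __name__ == "__main__":
    sample = "Reduce transport emissions through electrification of bus fleets."
    print("Estimated tokens:", estimate_tokens(sample))
```

Actual counts depend on the model and the language of the document, so treat the result as an order-of-magnitude figure and rely on the usage reported by the API for billing.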