
nlp-tlp/MaintKG


MaintKG: Automated Maintenance Knowledge Graph Construction

Python 3.9 | License: MIT | Code style: black | pre-commit

MaintKG (Maintenance Knowledge Graph) is a framework for automatically constructing knowledge graphs from maintenance work order records. It processes CMMS (Computerized Maintenance Management System) records to create structured, graph-based knowledge representations.

🚀 Features

  • Automated knowledge graph construction from maintenance records
  • Built-in normalization and information extraction (NoisIE)
  • Neo4j integration for graph storage and querying
  • Comprehensive data processing pipeline


🔧 Installation

  1. Clone the Repository

    git clone https://github.com/nlp-tlp/maintkg.git
    cd maintkg
  2. Set Up Virtual Environment

    python -m venv env
    # On Unix/macOS:
    source env/bin/activate
    # On Windows:
    .\env\Scripts\activate
  3. Install Dependencies

    pip install -e .
    pip install -r requirements.txt

📦 Prerequisites

  • Python 3.9+
  • Neo4j Database Server
  • PyTorch (CUDA-enabled recommended)
  • Virtual Environment (recommended)

🔑 Environment Variables

Update the .env file in the project root with your own configuration if you want to build MaintKG from your own data. Otherwise, the default settings build the default graph.

# Input Settings
INPUT__CSV_FILENAME='your_file.csv'
INPUT__ID_COL='id'
INPUT__TYPE_COL='type'
# ... other settings

# Full configuration example available in `.env`
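The double-underscore keys (for example INPUT__CSV_FILENAME) read like SECTION__FIELD pairs. The minimal sketch below, using only the standard library, shows how such a flat .env file could be grouped into nested settings. It assumes that interpretation of the delimiter; the actual loading logic lives in src/maintkg/settings.py and may differ (pydantic-settings with its `env_nested_delimiter` option is one common way to get this behaviour). `read_dotenv` and `parse_nested_env` are hypothetical helpers.

```python
"""Hypothetical sketch: group SECTION__FIELD .env keys into nested dicts."""
from typing import Dict, Mapping


def read_dotenv(text: str) -> Dict[str, str]:
    """Minimal .env reader: KEY=VALUE lines, skipping blanks and # comments."""
    pairs: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        pairs[key.strip()] = value.strip()
    return pairs


def parse_nested_env(env: Mapping[str, str], delimiter: str = "__") -> Dict[str, Dict[str, str]]:
    """Split keys like 'INPUT__CSV_FILENAME' into {'input': {'csv_filename': ...}}.

    Keys without the delimiter are ignored; surrounding quotes are stripped.
    """
    settings: Dict[str, Dict[str, str]] = {}
    for key, raw_value in env.items():
        if delimiter not in key:
            continue
        section, name = key.split(delimiter, 1)
        value = raw_value.strip().strip("'\"")
        settings.setdefault(section.lower(), {})[name.lower()] = value
    return settings


if __name__ == "__main__":
    sample = """
    # Input Settings
    INPUT__CSV_FILENAME='your_file.csv'
    INPUT__ID_COL='id'
    INPUT__TYPE_COL='type'
    """
    print(parse_nested_env(read_dotenv(sample)))
    # {'input': {'csv_filename': 'your_file.csv', 'id_col': 'id', 'type_col': 'type'}}
```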

💻 Usage

  1. Prepare Your Data

    • Place your CMMS data in the ./input directory
    • Configure the column mappings in the .env file
  2. Run the Pipeline

    python ./src/maintkg/main.py
  3. View Results

    • Generated knowledge graphs are stored in Neo4j
    • Output files are saved in ./output/YYYY-MM-DD_HH_MM-SS-MM/
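Because each run writes to its own timestamped folder under ./output/, a small helper can locate the most recent one. This sketch assumes the folder names are zero-padded, year-first timestamps in the YYYY-MM-DD_HH_MM-SS-MM pattern above, so that sorting by name matches sorting by time; `latest_run_dir` is a hypothetical helper, not part of MaintKG.

```python
from pathlib import Path
from typing import Optional


def latest_run_dir(output_root: Path = Path("./output")) -> Optional[Path]:
    """Return the newest timestamped run folder, or None if there are none.

    Assumes zero-padded, year-first folder names, so lexicographic order
    equals chronological order. Hidden entries such as .gitkeep are skipped.
    """
    if not output_root.is_dir():
        return None
    runs = [p for p in output_root.iterdir() if p.is_dir() and not p.name.startswith(".")]
    return max(runs, key=lambda p: p.name, default=None)


if __name__ == "__main__":
    run = latest_run_dir()
    print(run if run is not None else "No runs found in ./output")
```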

📁 Project Structure

maintkg/
├── cache/                          # Cache directory
│   └── .gitkeep                    # Placeholder for git
├── input/                          # Input data directory
│   └── README.md                   # Input data specifications
├── notebooks/                      # Jupyter notebooks
│   ├── assets/                     # Notebook resources
│   │   ├── images/                 # Visualization images
│   │   └── data/                   # Sample datasets
│   └── example_queries.ipynb       # MaintKG competency queries
├── output/                         # Generated artifacts
│   ├── .gitkeep
│   └── YYYY-MM-DD_HH_MM-SS-MM/     # Timestamped outputs
├── src/                            # Source code
│   ├── maintkg/                    # Core MaintKG package
│   │   ├── __init__.py             # Package initialization
│   │   ├── builder.py              # Graph construction logic
│   │   ├── main.py                 # Entry point script
│   │   ├── models.py               # Data models and schemas
│   │   ├── settings.py             # Configuration management
│   │   └── utils/                  # Utility functions
│   └── noisie/                     # NoisIE package
│       ├── __init__.py
│       ├── download_checkpoint.py  # Model checkpoint downloader
│       ├── lightning_logs/         # Model checkpoints
│       │   └── .gitkeep
│       └── data/                   # MaintNormIE corpus
│           └── README.md           # Data documentation
├── .git/                           # Git repository
├── .gitignore                      # Git ignore patterns
├── .pre-commit-config.yaml         # Pre-commit hooks
├── requirements.txt                # Project dependencies
├── pyproject.toml                  # Project configuration
├── LICENSE                         # MIT License
└── README.md                       # Project documentation

🤖 NoisIE Model

NoisIE is a sequence-to-sequence normalization and semantic information extraction model. It converts raw maintenance text into high-quality, semantically structured output using specialised tags for normalisations, entities, and relations.

Pretrained Model Setup

By default, the MaintKG pipeline uses a pretrained NoisIE checkpoint. To download it, run:

python ./src/noisie/download_checkpoint.py

This will:

  • Create the ./src/noisie/lightning_logs/ directory
  • Download and verify the model checkpoints
  • Make the model available for the MaintKG pipeline
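After the download, a quick check can confirm that a checkpoint is actually on disk. This sketch assumes the files use PyTorch Lightning's .ckpt extension and searches ./src/noisie/lightning_logs/ recursively (which also covers Lightning's default version_N/checkpoints/ subfolders), returning the most recently modified file. The layout produced by download_checkpoint.py may differ, and `find_latest_checkpoint` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def find_latest_checkpoint(log_dir: Path = Path("./src/noisie/lightning_logs")) -> Optional[Path]:
    """Return the most recently modified *.ckpt file under log_dir, if any.

    Assumes PyTorch Lightning's .ckpt extension; the search is recursive so
    nested version_N/checkpoints/ folders are included.
    """
    if not log_dir.is_dir():
        return None
    checkpoints = [p for p in log_dir.rglob("*.ckpt") if p.is_file()]
    return max(checkpoints, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    ckpt = find_latest_checkpoint()
    if ckpt is None:
        raise SystemExit("No checkpoint found; run download_checkpoint.py first.")
    print(f"Using checkpoint: {ckpt}")
```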

Training Custom Models

Prerequisites

  • Dataset Access: The original MaintNormIE dataset used in the thesis research requires special access. Please contact us to:
    • Access the MaintNormIE dataset
    • Use MaintNormIE for pretraining your own models
    • Discuss custom training requirements

To retrain NoisIE on the MaintNormIE dataset or to use it as pretraining for your own dataset, please contact us.

Dataset Format

Training data should be in JSONL format with paired input-output examples:

{"input": "1570-3week service 2-3/3/10", "output": "<entity> service <activity>"}
{"input": "pedestal bearing 3 guage faulty", "output": "<norm> guage [ gauge ] <relation> faulty <state> gauge <object> has patient <relation> pedestal bearing <object> bearing <object> is a <relation> pedestal bearing <object> gauge <object> has part"}
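In JSONL each record sits on its own line, so a dataset can be validated before training by reading it line by line and checking that every record has string `input` and `output` fields. `parse_pairs` and `load_pairs` are hypothetical helpers for this check; they are not the loader used by train.py.

```python
import json
from pathlib import Path
from typing import Iterable, List, Tuple


def parse_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse JSONL lines into (input, output) pairs, skipping blank lines."""
    pairs: List[Tuple[str, str]] = []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        record = json.loads(line)  # raises json.JSONDecodeError on malformed lines
        if not isinstance(record, dict):
            raise ValueError(f"line {line_no}: expected a JSON object")
        source, target = record.get("input"), record.get("output")
        if not isinstance(source, str) or not isinstance(target, str):
            raise ValueError(f"line {line_no}: 'input' and 'output' must both be strings")
        pairs.append((source, target))
    return pairs


def load_pairs(path: Path) -> List[Tuple[str, str]]:
    """Read and validate a .jsonl file (one JSON object per line)."""
    with path.open(encoding="utf-8") as handle:
        return parse_pairs(handle)


if __name__ == "__main__":
    sample = ['{"input": "1570-3week service 2-3/3/10", "output": "<entity> service <activity>"}']
    print(parse_pairs(sample))
    # [('1570-3week service 2-3/3/10', '<entity> service <activity>')]
```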

The input-output pairs follow these conventions:

  • Input: Raw maintenance text
  • Output: Linearized text with semantic tags:
    • <norm>: Normalization annotations
    • <entity>: Entity spans
    • <relation>: Relationship markers

For detailed information about the tagging scheme, please refer to the thesis documentation.
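Read against the two examples above, the linearized output appears to follow a flat pattern: `<norm> raw [ normalized ]`, `<entity> span <type>`, and `<relation> head <type> tail <type> label`, with type tags such as <activity>, <state>, and <object>. The sketch below decodes that pattern into Python tuples. The grammar is inferred from these two examples only, not taken from the thesis, so `decode_linearized` is a hypothetical illustration rather than NoisIE's actual decoder.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical grammar inferred from the two README examples, not from the thesis.
TAG = re.compile(r"(<[^<>\s]+>)")
NORM_SPAN = re.compile(r"^(?P<raw>.+?)\s*\[\s*(?P<norm>.+?)\s*\]$")


@dataclass
class Decoded:
    norms: List[Tuple[str, str]] = field(default_factory=list)  # (raw, normalized)
    entities: List[Tuple[str, str]] = field(default_factory=list)  # (span, type)
    # (head, head_type, relation, tail, tail_type)
    relations: List[Tuple[str, str, str, str, str]] = field(default_factory=list)


def decode_linearized(text: str) -> Decoded:
    """Decode a linearized output string into norms, entities, and relations."""
    # Splitting on a capturing group keeps the tags; blank gaps are dropped.
    segments = [s.strip() for s in TAG.split(text) if s.strip()]
    decoded = Decoded()
    pos = 0

    def take(want_tag: bool) -> str:
        """Consume the next segment, which must be a tag (want_tag) or plain text."""
        nonlocal pos
        if pos >= len(segments) or bool(TAG.fullmatch(segments[pos])) != want_tag:
            kind = "tag" if want_tag else "text"
            raise ValueError(f"expected {kind} at segment {pos} in {text!r}")
        value = segments[pos]
        pos += 1
        return value[1:-1] if want_tag else value  # strip the angle brackets from tags

    while pos < len(segments):
        marker = take(want_tag=True)
        if marker == "norm":
            match = NORM_SPAN.match(take(want_tag=False))
            if match is None:
                raise ValueError("malformed <norm> span; expected 'raw [ normalized ]'")
            decoded.norms.append((match["raw"], match["norm"]))
        elif marker == "entity":
            span = take(want_tag=False)
            decoded.entities.append((span, take(want_tag=True)))
        elif marker == "relation":
            head, head_type = take(want_tag=False), take(want_tag=True)
            tail, tail_type = take(want_tag=False), take(want_tag=True)
            label = take(want_tag=False)
            decoded.relations.append((head, head_type, label, tail, tail_type))
        else:
            raise ValueError(f"unknown marker <{marker}>")
    return decoded


if __name__ == "__main__":
    example = (
        "<norm> guage [ gauge ] <relation> faulty <state> gauge <object> has patient "
        "<relation> pedestal bearing <object> bearing <object> is a "
        "<relation> pedestal bearing <object> gauge <object> has part"
    )
    for triple in decode_linearized(example).relations:
        print(triple)
```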

Training Steps

  1. Data Preparation:

    • Place your JSONL dataset in ./src/noisie/data/
    • Update the data path in train.py:
      # In ./src/noisie/train.py
      data_path = base_dir / "data" / "your_dataset.jsonl"
  2. Start Training:

    python ./src/noisie/train.py
  3. Monitor Progress:

    • Checkpoints and logs are saved in ./src/noisie/lightning_logs/
    • Track training progress with TensorBoard (for example, tensorboard --logdir ./src/noisie/lightning_logs/)
    • Model checkpoints are saved at regular intervals

Evaluating NoisIE

Important

Status Update: The evaluation pipeline is currently undergoing final refinements and code review. For immediate evaluation needs, please see ./model_data.py::evaluate_model.

🗄️ Neo4j Database

Installation

  1. Download Neo4j

    • Get Neo4j Desktop or use Docker:
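      # -d runs the container in the background; -v exposes ./imports at /imports for the restore step
      docker run -d \
        --name maintkg-neo4j \
        -p 7474:7474 -p 7687:7687 \
        -e NEO4J_AUTH=neo4j/password \
        -v "$(pwd)/imports:/imports" \
        neo4j:4.4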
  2. Configure Database

    # Default credentials in .env
    NEO4J__URI=bolt://localhost:7687
    NEO4J__USERNAME=neo4j
    NEO4J__PASSWORD=password
    NEO4J__DATABASE=neo4j
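With the database running, the same NEO4J__* values can be used from Python through the official `neo4j` driver (pip install neo4j). This sketch reads the variables from the environment, falls back to the defaults shown above, and counts nodes as a connectivity check. `Neo4jSettings`, `neo4j_settings`, and `count_nodes` are hypothetical helpers, not part of MaintKG; the driver is imported lazily so the settings code runs without it installed.

```python
import os
from typing import Mapping, NamedTuple, Optional


class Neo4jSettings(NamedTuple):
    uri: str
    username: str
    password: str
    database: str


def neo4j_settings(env: Optional[Mapping[str, str]] = None) -> Neo4jSettings:
    """Read NEO4J__* variables, defaulting to the values from the .env example."""
    env = os.environ if env is None else env
    return Neo4jSettings(
        uri=env.get("NEO4J__URI", "bolt://localhost:7687"),
        username=env.get("NEO4J__USERNAME", "neo4j"),
        password=env.get("NEO4J__PASSWORD", "password"),
        database=env.get("NEO4J__DATABASE", "neo4j"),
    )


def count_nodes(settings: Neo4jSettings) -> int:
    """Connect with the official driver and return the total number of nodes."""
    from neo4j import GraphDatabase  # lazy import: only needed when connecting

    driver = GraphDatabase.driver(settings.uri, auth=(settings.username, settings.password))
    try:
        with driver.session(database=settings.database) as session:
            record = session.run("MATCH (n) RETURN count(n) AS nodes").single()
            return record["nodes"]
    finally:
        driver.close()


if __name__ == "__main__":
    print(count_nodes(neo4j_settings()))
```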

Thesis Reference Database

To explore the exact database used in the MaintKG thesis:

  1. Download the dump file:

  2. Restore the database:

    # Neo4j 4.x syntax: stop the target database first; add --force to overwrite existing data
    # Using neo4j-admin
    neo4j-admin load --from=/path/to/dump.dump --database=neo4j
    
    # Or with Docker
    docker exec maintkg-neo4j \
      neo4j-admin load --from=/imports/dump.dump --database=neo4j
  3. Access the database:

Example Queries

Example queries that correspond to the competency questions (CQs) outlined in the MaintKG thesis chapter can be found in ./notebooks/example_queries.ipynb.

🤝 Contributing

We welcome contributions! Please follow these steps:

  1. Fork & Clone

  2. Create Feature Branch

    git checkout -b feature/amazing-feature
  3. Follow Commit Convention

    <type>(<scope>): <subject>
    
    Types:
    - feat: New feature
    - fix: Bug fix
    - docs: Documentation
    - style: Formatting
    - refactor: Code restructuring
    - test: Testing
    - chore: Maintenance
    
  4. Submit PR

    • Ensure tests pass
    • Update documentation
    • Follow code style guidelines
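The commit convention in step 3 can be checked mechanically, for example from a local commit-msg hook. This sketch validates only the first line of a message against `<type>(<scope>): <subject>` using the seven listed types. Treating the scope as optional is an assumption (it is optional in Conventional Commits), and nothing here implies the repository's pre-commit configuration enforces this; `is_valid_commit_message` is a hypothetical helper.

```python
import re
import sys

ALLOWED_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

# <type>(<scope>): <subject>; the "(<scope>)" part is treated as optional here.
COMMIT_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r": (?P<subject>\S.*)$"
)


def is_valid_commit_message(message: str) -> bool:
    """Check the first line of a commit message against the convention."""
    lines = message.strip().splitlines()
    return bool(lines) and COMMIT_PATTERN.match(lines[0]) is not None


if __name__ == "__main__":
    # As a commit-msg hook, git passes the path of the message file as argv[1].
    with open(sys.argv[1], encoding="utf-8") as handle:
        if not is_valid_commit_message(handle.read()):
            sys.exit("Commit message does not match <type>(<scope>): <subject>")
```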

📄 License

This project is licensed under the MIT License - see LICENSE for details.

📝 Attribution

If you use MaintKG in your research, please cite:

COMING SOON

🙏 Acknowledgments

This work was made possible by the Australian Research Centre for Transforming Maintenance through Data Science.

📧 Contact

For questions, support, or collaboration:
