This project, as detailed in finale.ipynb, focuses on three primary aspects: Entity Merging, Relationship Extraction, and Knowledge Graph Validation and Refinement. These components are integral to processing and analyzing complex data structures in a specific domain.
The Entity Merging part of the project covers entity canonicalization, duplicate detection, and the merging of entities, as outlined in the report.
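The three steps can be illustrated with a minimal sketch. It is not the notebook's implementation: the normalization rules, the use of `difflib.SequenceMatcher`, the 0.9 threshold, and all function names are assumptions chosen only to show how canonicalization, duplicate detection, and merging fit together.

```python
import re
from difflib import SequenceMatcher


def canonicalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace.

    Hypothetical rule set; the notebook may normalize differently.
    """
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(cleaned.split())


def is_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    """Treat two canonical names as duplicates when their string similarity is high."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def merge_entities(names: list[str], threshold: float = 0.9) -> dict[str, list[str]]:
    """Group surface forms under the first canonical form they match."""
    clusters: dict[str, list[str]] = {}
    for name in names:
        canon = canonicalize(name)
        for representative in clusters:
            if is_duplicate(canon, representative, threshold):
                clusters[representative].append(name)
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters[canon] = [name]
    return clusters


merged = merge_entities(["Neural Network", "neural-network", "Neural networks", "Graph"])
print(merged)
```

A character-level similarity like this is cheap but ignores meaning; an embedding-based comparison could be swapped into `is_duplicate` without changing the rest of the sketch.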
The project employs an ensemble method, incorporating elements of Stanford OpenIE, to extract relationships from the data.
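One common way to build such an ensemble is to keep only the triples that several extractors agree on. The sketch below assumes that approach; the notebook may combine its extractors differently. The three input lists are hand-written stand-ins for extractor output (no Stanford OpenIE call is made), and `ensemble_triples` and `min_votes` are hypothetical names.

```python
from collections import Counter

Triple = tuple[str, str, str]


def normalize(triple: Triple) -> Triple:
    """Strip whitespace and lowercase each part so equivalent triples compare equal."""
    return tuple(part.strip().lower() for part in triple)


def ensemble_triples(extractor_outputs: list[list[Triple]], min_votes: int = 2) -> list[Triple]:
    """Keep triples proposed by at least `min_votes` different extractors."""
    votes: Counter = Counter()
    for output in extractor_outputs:
        # A set ensures each extractor votes at most once per triple.
        votes.update({normalize(t) for t in output})
    return sorted(t for t, n in votes.items() if n >= min_votes)


# Stand-in outputs from three hypothetical extractors.
openie_like = [("BERT", "is", "a language model"), ("BERT", "uses", "attention")]
pattern_based = [("bert", "is", "a language model")]
dependency_based = [("BERT", "uses", "attention "), ("BERT", "beats", "baseline")]

agreed = ensemble_triples([openie_like, pattern_based, dependency_based])
print(agreed)
```

Raising `min_votes` trades recall for precision: fewer triples survive, but each one is backed by more extractors.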
The knowledge graph is validated manually and then refined by passing an additional argument, such as the publication ID.
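As a rough illustration of the refinement idea, the sketch below deduplicates entities per publication ID and pairs each set with that publication's abstract. The data layout, the function name, and the sample records are all assumptions made for this example, not taken from the notebook.

```python
from collections import defaultdict


def group_entities_by_publication(mentions, abstracts):
    """Deduplicate entities per publication and attach the matching abstract.

    mentions:  iterable of (publication_id, entity) pairs (assumed layout)
    abstracts: dict mapping publication_id -> abstract text (assumed layout)
    """
    unique = defaultdict(set)
    for publication_id, entity in mentions:
        unique[publication_id].add(entity)
    return {
        pub_id: {"abstract": abstracts[pub_id], "entities": sorted(entities)}
        for pub_id, entities in unique.items()
        if pub_id in abstracts  # skip publications with no abstract available
    }


# Made-up sample data for illustration only.
mentions = [("P1", "graph"), ("P1", "graph"), ("P1", "entity"), ("P2", "graph")]
abstracts = {"P1": "Abstract of publication P1.", "P2": "Abstract of publication P2."}

refined_input = group_entities_by_publication(mentions, abstracts)
print(refined_input)
```

Keying on the publication ID keeps the same entity name from two different publications separate, so each can be checked against its own abstract.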
- Python 3.x
- Jupyter Notebook
- Required Python libraries (ast, gensim, ipywidgets, itertools, json, nltk, numpy, networkx, os, pandas, matplotlib, random, signal, scikit-learn, stanza, stanford_openie, subprocess, tensorflow)
- Clone the repository or download the finale.ipynb notebook.
- Install Python 3.x and Jupyter Notebook if not already installed.
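- Install the required third-party Python libraries by running pip install library-name for each one. The standard-library modules (ast, itertools, json, os, random, signal, subprocess) ship with Python and need no installation.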
- Download glove/GoogleNews-vectors-negative300.bin, which is a pre-trained Word2Vec model.
- Install stanza, stanford_openie, and CoreNLPClient. Also make sure to change the localhost port.
- Open Jupyter Notebook in your environment.
- Navigate to the location of finale.ipynb.
- Open the notebook and run the cells sequentially to see the results.
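Before running the cells, it can help to confirm that the third-party libraries are importable in the current environment. The helper below is a small sketch using only the standard library. The mapping from install name to import name is an assumption (in particular, that scikit-learn imports as `sklearn` and stanford_openie as `openie`), and `missing_packages` is a hypothetical helper, not part of the notebook.

```python
from importlib.util import find_spec

# Install name -> top-level import name (assumed mapping; adjust if your setup differs).
THIRD_PARTY = {
    "gensim": "gensim",
    "ipywidgets": "ipywidgets",
    "nltk": "nltk",
    "numpy": "numpy",
    "networkx": "networkx",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "stanza": "stanza",
    "stanford_openie": "openie",
    "tensorflow": "tensorflow",
}


def missing_packages(packages: dict) -> list:
    """Return the install names whose import name cannot be found."""
    return sorted(
        pip_name for pip_name, module in packages.items() if find_spec(module) is None
    )


missing = missing_packages(THIRD_PARTY)
print("Missing libraries:", missing if missing else "none")
```

Running this in the first notebook cell gives a list of anything still to be installed before the later sections fail on import.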
The notebook is divided into three main sections:
- Entity Merging: Here, the notebook goes through the process of entity canonicalization, followed by duplicate detection and entity merging.
- Relationship Extraction: This section describes the ensemble method used for extracting relationships, utilizing Stanford OpenIE.
- Validation of KG and Refinement: The KG is validated manually and then refined. Refinement uses the publication ID: entities are made unique per publication ID, and those entities are passed along with the abstract of that particular publication.
No License
The project was developed by three developers: two used macOS and one used Windows. Sections 9-14 may cause problems on macOS, so install compatible library versions there.