Project Overview:
Entity extraction, also called Named Entity Recognition (NER), is an information extraction technique that identifies key elements in unstructured text and classifies them into predefined categories. Entities can be of many types, such as names, places, organizations, products, dates, email addresses, and phone numbers. Identifying the entities in textual documents helps us understand their context and find relevant information in large amounts of unstructured text. In this project, we will extract information using several baseline NLP techniques, which are integrated with state-of-the-art deep learning models to classify the extracted information into custom labels. We focus mainly on extracting entities of the type ‘Innovation’, that is, scientific inventions or discoveries. The project is carried out by Master's students at Otto von Guericke University in collaboration with Mapegy GmbH, a software company in Berlin.
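The sketch below shows how the output of an NER model can be turned into labeled entities. It assumes the common BIO tagging scheme (B- begins an entity, I- continues it, O is outside any entity) and a hypothetical INNOVATION label. The sentence, tags, and the bio_to_spans helper are illustrative only. They are not project data or the project's actual model.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, label) pairs (hypothetical helper)."""
    spans: List[Tuple[str, str]] = []
    current: List[str] = []
    label = ""
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; close any entity that is still open
            if current:
                spans.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            # Continue the open entity
            current.append(token)
        else:
            # An "O" tag, or an I- tag that does not continue the open entity, closes it
            if current:
                spans.append((" ".join(current), label))
            current, label = [], ""
    if current:
        spans.append((" ".join(current), label))
    return spans


# Made-up example sentence with assumed model tags
tokens = ["Researchers", "developed", "a", "solid-state", "battery", "in", "Berlin", "."]
tags = ["O", "O", "O", "B-INNOVATION", "I-INNOVATION", "O", "B-LOC", "O"]
print(bio_to_spans(tokens, tags))
# [('solid-state battery', 'INNOVATION'), ('Berlin', 'LOC')]
```

The same decoding step works whatever model produces the tags. Only the label set, here assumed to include INNOVATION, changes between projects.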
Project Goals:
Construct a pipeline that processes the data and extracts ‘Innovations’ using Named Entity Recognition. Evaluate how much value the pipeline adds toward the company’s end goal.
Project Use:
To identify technical innovations in unstructured text in order to support technology and trend analysis. To adapt and evaluate NLP methods for capturing concepts that are not strictly entities but closely resemble them. To address cases where the start and end of an entity are hard to define, or where the entity is interrupted by other words in the sentence.
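A single BIO tag per token cannot describe an entity that is interrupted by other words. One common workaround is to store each entity as a list of token-index fragments. The sketch below shows this approach, but it is an assumption and not the project's chosen method. The Entity class and the example sentence are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entity:
    # Each fragment is a half-open token range [start, end)
    fragments: List[Tuple[int, int]]
    label: str

    def text(self, tokens: List[str]) -> str:
        """Join the tokens of every fragment, skipping the interrupting words."""
        return " ".join(
            token
            for start, end in self.fragments
            for token in tokens[start:end]
        )

    def is_contiguous(self) -> bool:
        """True if each fragment ends exactly where the next one starts."""
        return all(
            prev_end == next_start
            for (_, prev_end), (next_start, _) in zip(self.fragments, self.fragments[1:])
        )


# Hypothetical phrase in which "lithium-ion ... battery" is interrupted
tokens = ["a", "lithium-ion", "and", "solid-state", "battery"]
entities = [
    Entity(fragments=[(1, 2), (4, 5)], label="INNOVATION"),  # "lithium-ion battery"
    Entity(fragments=[(3, 5)], label="INNOVATION"),          # "solid-state battery"
]
for entity in entities:
    print(entity.text(tokens), entity.is_contiguous())
# lithium-ion battery False
# solid-state battery True
```

In this example, "battery" belongs to both entities. That kind of overlap also cannot be expressed with one BIO tag per token.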