MITIE - named-entity recognition, binary relation detection, and text categorization - for PHP
- Finds people, organizations, and locations in text
- Detects relationships between entities, like PERSON was born in LOCATION
Run:
composer require ankane/mitie
Add scripts to composer.json to download the shared library:
"scripts": {
    "post-install-cmd": "Mitie\\Vendor::check",
    "post-update-cmd": "Mitie\\Vendor::check"
}
Run:
composer install
Then download the pre-trained models for your language.
Load an NER model
$model = new Mitie\NER('ner_model.dat');
Create a document
$doc = $model->doc('Nat works at GitHub in San Francisco');
Get entities
$doc->entities();
This returns
[
['text' => 'Nat', 'tag' => 'PERSON', 'score' => 0.3112371212688382, 'offset' => 0],
['text' => 'GitHub', 'tag' => 'ORGANIZATION', 'score' => 0.5660115198329334, 'offset' => 13],
['text' => 'San Francisco', 'tag' => 'LOCATION', 'score' => 1.3890524313885309, 'offset' => 23]
]
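The returned array can be post-processed with plain PHP. The sketch below works on a copy of the sample output above rather than calling the library. `filterEntities` and `annotate` are hypothetical helpers, not part of the package. The 0.5 threshold is arbitrary. The code also assumes that `offset` is a byte offset into the original string, which holds for this ASCII sample but is not confirmed for multibyte text.

```php
<?php

// Sample data copied from the output above; in real use this would be the
// return value of $doc->entities().
$text = 'Nat works at GitHub in San Francisco';
$entities = [
    ['text' => 'Nat', 'tag' => 'PERSON', 'score' => 0.3112371212688382, 'offset' => 0],
    ['text' => 'GitHub', 'tag' => 'ORGANIZATION', 'score' => 0.5660115198329334, 'offset' => 13],
    ['text' => 'San Francisco', 'tag' => 'LOCATION', 'score' => 1.3890524313885309, 'offset' => 23],
];

// Hypothetical helper: keep entities scoring at least $minScore,
// optionally restricted to a single tag.
function filterEntities(array $entities, float $minScore, ?string $tag = null): array
{
    return array_values(array_filter(
        $entities,
        fn (array $e): bool => $e['score'] >= $minScore && ($tag === null || $e['tag'] === $tag)
    ));
}

// Hypothetical helper: wrap each entity in "[TAG text]" using its offset.
// Assumes offsets are byte offsets into $text.
function annotate(string $text, array $entities): string
{
    // Replace from the last entity to the first so earlier offsets stay valid.
    usort($entities, fn (array $a, array $b): int => $b['offset'] <=> $a['offset']);
    foreach ($entities as $e) {
        $text = substr_replace($text, "[{$e['tag']} {$e['text']}]", $e['offset'], strlen($e['text']));
    }
    return $text;
}

$confident = filterEntities($entities, 0.5); // GitHub and San Francisco
$annotated = annotate($text, $entities);
// [PERSON Nat] works at [ORGANIZATION GitHub] in [LOCATION San Francisco]
```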
Get tokens
$doc->tokens();
Get tokens and their offset
$doc->tokensWithOffset();
Get all tags for a model
$model->tags();
Load an NER model into a trainer
$trainer = new Mitie\NERTrainer('total_word_feature_extractor.dat');
Create training instances
$tokens = ['You', 'can', 'do', 'machine', 'learning', 'in', 'PHP', '!'];
$instance = new Mitie\NERTrainingInstance($tokens);
$instance->addEntity(3, 4, 'topic'); // machine learning
$instance->addEntity(6, 6, 'language'); // PHP
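Judging by the comments in the example above, the two numbers passed to `addEntity` are zero-based token indices, with the end index inclusive. Counting indices by hand is error-prone, so a small lookup can help. `findSpan` below is a hypothetical helper, not part of the library, and it only finds the first occurrence of a phrase.

```php
<?php

// Hypothetical helper: return the [start, end] token indices of the first
// occurrence of $phrase in $tokens, or null if it does not occur.
function findSpan(array $tokens, array $phrase): ?array
{
    $n = count($phrase);
    if ($n === 0) {
        return null;
    }
    for ($start = 0; $start + $n <= count($tokens); $start++) {
        if (array_slice($tokens, $start, $n) === $phrase) {
            return [$start, $start + $n - 1]; // end index is inclusive
        }
    }
    return null;
}

$tokens = ['You', 'can', 'do', 'machine', 'learning', 'in', 'PHP', '!'];
$topic = findSpan($tokens, ['machine', 'learning']); // [3, 4]
$language = findSpan($tokens, ['PHP']);              // [6, 6]
```

The result could then be passed on as `$instance->addEntity($topic[0], $topic[1], 'topic')`.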
Add the training instances to the trainer
$trainer->add($instance);
Train the model
$model = $trainer->train();
Save the model
$model->saveToDisk('ner_model.dat');
Detect relationships between two entities, like:
- PERSON was born in LOCATION
- ORGANIZATION was founded in LOCATION
- FILM was directed by PERSON
There are 21 detectors for English. You can find them in the binary_relations directory in the model download.
Load a detector
$detector = new Mitie\BinaryRelationDetector('rel_classifier_organization.organization.place_founded.svm');
And create a document
$doc = $model->doc('Shopify was founded in Ottawa');
Get relations
$detector->relations($doc);
This returns
[['first' => 'Shopify', 'second' => 'Ottawa', 'score' => 0.17649169745814464]]
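The result rows do not say which relation was detected, which matters once several detectors run over the same document. One option is to label each row with a name derived from the detector file. This is a minimal sketch with hypothetical helpers (`relationName`, `labelRelations`), and it assumes that detector files follow the `rel_classifier_<name>.svm` pattern of the single file name shown above.

```php
<?php

// Hypothetical helper: derive a relation name from a detector file name,
// assuming the "rel_classifier_<name>.svm" pattern seen in the example above.
function relationName(string $path): string
{
    $name = basename($path, '.svm');
    $prefix = 'rel_classifier_';
    return strncmp($name, $prefix, strlen($prefix)) === 0
        ? substr($name, strlen($prefix))
        : $name;
}

// Hypothetical helper: attach the relation name to each result row.
function labelRelations(array $relations, string $detectorPath): array
{
    $name = relationName($detectorPath);
    return array_map(fn (array $r): array => $r + ['relation' => $name], $relations);
}

// Sample data copied from the output above; in real use this would be the
// return value of $detector->relations($doc).
$relations = [['first' => 'Shopify', 'second' => 'Ottawa', 'score' => 0.17649169745814464]];
$labeled = labelRelations($relations, 'rel_classifier_organization.organization.place_founded.svm');
```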
Load an NER model into a trainer
$trainer = new Mitie\BinaryRelationTrainer($model);
Add positive and negative examples to the trainer
$tokens = ['Shopify', 'was', 'founded', 'in', 'Ottawa'];
$trainer->addPositiveBinaryRelation($tokens, [0, 0], [4, 4]);
$trainer->addNegativeBinaryRelation($tokens, [4, 4], [0, 0]);
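In the example above, each range is a `[start, end]` pair of token indices, and the negative example is the positive one with the two entities swapped. For a directional relation like "was founded in", both argument lists can be built from one labeled sentence. `relationExamples` is a hypothetical helper, not part of the library. Treating the swapped pair as a negative example is an assumption that only makes sense when the relation is not symmetric.

```php
<?php

// Hypothetical helper: build the argument lists for a positive example and
// its reversed counterpart, used here as a negative example.
function relationExamples(array $tokens, array $first, array $second): array
{
    return [
        'positive' => [$tokens, $first, $second],
        'negative' => [$tokens, $second, $first], // same entities, order swapped
    ];
}

$tokens = ['Shopify', 'was', 'founded', 'in', 'Ottawa'];
$examples = relationExamples($tokens, [0, 0], [4, 4]);
```

The lists could then be unpacked into the trainer calls, for example `$trainer->addPositiveBinaryRelation(...$examples['positive'])` and `$trainer->addNegativeBinaryRelation(...$examples['negative'])`.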
Train the detector
$detector = $trainer->train();
Save the detector
$detector->saveToDisk('binary_relation_detector.svm');
Load a model into a trainer
$trainer = new Mitie\TextCategorizerTrainer('total_word_feature_extractor.dat');
Add labeled text to the trainer
$trainer->add('This is super cool', 'positive');
Train the model
$model = $trainer->train();
Save the model
$model->saveToDisk('text_categorization_model.dat');
Load a saved model
$model = new Mitie\TextCategorizer('text_categorization_model.dat');
Categorize text
$model->categorize('What a super nice day');
View the changelog
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/mitie-php.git
cd mitie-php
composer install
composer test