Releases: nicolay-r/AREkit
AREkit-0.25.1
Changeset
Major
Native batching is now enabled in document parsing: all queries are grouped into batches.
Components that support batching handle each batch as a whole, while the other components process its items sequentially.
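The dispatch rule described above can be pictured with a minimal sketch. The names `PipelineItem`, `supports_batching` and `run_pipeline` are hypothetical and chosen for illustration only; they are not taken from the AREkit API.

```python
from typing import Any, Callable, List


class PipelineItem:
    """Hypothetical pipeline component (illustration only, not an AREkit class)."""

    def __init__(self, func: Callable[[Any], Any], supports_batching: bool = False):
        self.func = func
        self.supports_batching = supports_batching

    def apply(self, batch: List[Any]) -> List[Any]:
        if self.supports_batching:
            # Batch-aware component: one call receives the whole batch.
            return self.func(batch)
        # Fallback: the component is called once per element, in order.
        return [self.func(element) for element in batch]


def run_pipeline(items: List[PipelineItem], batch: List[Any]) -> List[Any]:
    """Pass the batch through every component, one after another."""
    for item in items:
        batch = item.apply(batch)
    return batch


# A batch-aware component and a purely sequential one in the same pipeline.
lowercase = PipelineItem(lambda texts: [t.lower() for t in texts], supports_batching=True)
tokenize = PipelineItem(lambda text: text.split())

result = run_pipeline([lowercase, tokenize], ["Native Batching", "Document Parsing"])
```

Under these assumptions, a component that cannot handle batches still works unchanged; only batch-aware components benefit from a single call per batch.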
List of the release related updates #550
Minor changes
🪶 Making the framework more lightweight
Moved Resources
Removed sampling-related components
Full Changelog: v0.25.0-rc...v0.25.1-rc
AREkit-0.25.0
Release notes
Full Changelog: v0.24.0-rc...v0.25.0-rc
Support Batching
for effectively integrating LLMs into text processing pipelines
Previously, the whole text processing pipeline operated on a single sentence / text part at a time.
Now we have overcome that limitation, so the pipeline can consider multiple sentences gathered in a list, i.e. a batch.
This step is important for LLMs, LMs, and neural networks in general, for which batching improves throughput.
As a result, the overall pipeline is expected to run faster.
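As a rough illustration of forming batches from a stream of sentences, the sketch below uses only the Python standard library. `iter_batches` and `count_words` are hypothetical helpers, with `count_words` standing in for a model call that accepts a list of sentences; neither is part of AREkit.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def iter_batches(sentences: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `batch_size` sentences."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(sentences)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def count_words(batch: List[str]) -> List[int]:
    """Stand-in for a batched model call: one result per sentence."""
    return [len(sentence.split()) for sentence in batch]


sentences = ["a b", "c", "d e f", "g h", "i"]
batches = list(iter_batches(sentences, batch_size=2))
word_counts = [n for batch in batches for n in count_words(batch)]
```

The last batch may be shorter than `batch_size`; the order of the sentences is preserved, so results can be mapped back to their inputs by position.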
Source collections are no longer a part of AREkit ✨
This allows us to make the overall framework more lightweight 🪶 and to focus purely on data processing techniques
- #537
- Remove `requests` library dependency 🪶
- Move all the tutorials 📚 to the `AREkit-ss` project. 🪶
Flexibility and Performance Enhancements
Fixed bugs
- 🔧 `RowCacheStorageProvider`: fixed bug with mismatching sizes of the type list and the columns list in case of other `force` collected columns (ad4312c)
Minor Updates
- ❌ Removed `OpinionsIO` (76b4c1f)
- ❌ Removed suffix `-0` in filenames for samples. (76b4c1f)
- ❌ #543
- ❌ #544
- ❌ #547
Minor
- #135 (No longer available)
- Appropriate formatting of unit tests (https://github.com/nicolay-r/ARElight/blob/main/test/test_translation.py)
- 🔧 #137
- 🔧 #138 (No longer available)
Changeset
Implemented enhancements:
- `SamplesIO.create_target` -- provide this parameter as function [ARElight backlog] #547
- No input support for pipelines Launcher #546
- `_get_text` is no longer needed #544
- `TermsSplitterParser` -- is no longer required [ARElight backlog] #543
- `Partitioning` -- fancy last operations of the `SentenceObjectsParserPipelineItem` which has no longer application [ARElight backlog] #542
- `SentenceObjectsParserPipelineItem` -- rename to the `ObjectsParserPipelineItem` concept #541
- `Pipelines` -- refactoring core concept, `source` customization selection for ppl items #539
- Pipelines -- Batching sentences in document parser [ARElight backlog] #535
- Graph-based sampler #495
Closed issues:
- Provide link to the DEMO ARElight as a technical reference documentation #549
- Pipeline.run might be just a concept of launchers, there is no need to combine storage of items with `run` operation #540
- ➕ `SQlite`-based readers and storage providers #538
- Sources Movement in AREkit-ss [including the related dependencies] #537
* This Changelog was automatically generated by github_changelog_generator
AREkit-0.24.0
- 🔥 #296
Improvements
- #527
- automate `NoFolding` support, easy API usage 🔥 #466
- 🔧 #417
- 🔧 #489
- 🔧 #502 because of #501 were fixed
- 🔧 #510
- #503
- #507
- remove everything related to applications and related framework if everything will be OK with the paper (0.23.1 as well)
- #520
- 🔧 #526
Generalization
Changes and Simplifications
- ❌ #517
- ❌ Drop support of reading grouped Opinions (#491 and #492 related) (the related unit-test was optional and has been removed as well)
- ❌ #483
- ❌ #376
Minor
AREkit-0.23.1
Main Updates
- #439
- #447
- fixed : #440
- new: 🔥 #459
- moving `evaluation` module outside 🔧 #449 (new separate project)
- utils: #467
- universal API for proof-of-concept
Implemented enhancements:
- `NativeCsvWriter` -- sync delimiter with other CSV formatters #486
v0.23.1-rc (2023-06-02)
Implemented enhancements:
- `filters=[]` -- consider the case of None by default [Paper feedback] #479
- `opinions=[]` -- simplify usage of API [paper feedback] #478
- `BaseSerializerPipelineItem` -- required by `arekit-ss` #476
- `Neural Network Serializer` -- `rows_provider` should be declared outside [paper backlog/arekit_ss project] #475
- Streaming -- support `JSON` output format #474
- `RuAttitudesDocumentProvider` -- refactor to follow the structure of the rest resources #470
- Support `None` for `get_doc_existed_opinion_func` [user/paper feedback] #469
- `SynonymsCollection` -- setup default value of `iter_group_values_lists` to `[]` #468
- `DOC_ID` column -- remove `int` type limitation #463
- Streaming -- provide header column names for CSV #462
- `tqdm` -- display amount of processed documents in progress-bar [Project Gutenberg backlog] #461
- `OpinionCollection` -- `iter_sentiment` method is not in use anymore #456
- `OpinionCollection` -- the case of `None` for `opinion` results in incomplete initialization #455
- `OpinionCollection` -- consider `opinions=[]` by default in, i.e. empty collection. #453
- `OpinionCollection` -- `copy` method is not in use anymore #454
- synonyms.py -- is empty and might be removed [QUICK check and fix] #451
- Pandas -- completely remove dependencies #450
- `BertTextBTemplates` -- switch name to prompts #446
- RuSentRel -- embed train and test indices in collection #444
- SentiNEREL -- entity filter #443
- SentiNEREL -- move from another project [NIVTS project backlog, RuSentNE competitions] #439
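Several of the enhancements listed above concern defaults where `None` or an omitted argument should behave like an empty collection (#479, #478, #455, #453). How AREkit implements this is not shown in these notes; the sketch below only illustrates the common Python idiom, using a hypothetical `OpinionCollectionSketch` class.

```python
from typing import Iterable, List, Optional


class OpinionCollectionSketch:
    """Hypothetical stand-in, not the AREkit OpinionCollection."""

    def __init__(self, opinions: Optional[Iterable[str]] = None):
        # None is treated as "empty collection". A fresh list is created per
        # instance, which avoids sharing one mutable default between objects.
        self._opinions: List[str] = [] if opinions is None else list(opinions)

    def add(self, opinion: str) -> None:
        self._opinions.append(opinion)

    def __len__(self) -> int:
        return len(self._opinions)


first = OpinionCollectionSketch()
second = OpinionCollectionSketch()
first.add("A->B")
```

With this idiom the caller may omit the argument entirely, and modifying one collection never affects another one created with the default.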
Fixed bugs:
- `Network` module -- context constant has a predefined `text` value which is limited for networks only #485
- `read_ruattitudes_to_brat_in_memory` -- case of `keep_doc_ids_only==True` causes exception #482
- `prompt` -- object non subscriptable #481
- `fill` -- in case of `None` rows count `tqdm` throws exception #458
- `create_sample_provider` -- misused parameter #445
- `CroppedBertSampleRowProvider` -- might crop with references outside of the bounds [googletranslate-feedback] #440
Closed issues:
- Shortening to `RuSentRelOpinions.iter_from_doc` #480
- `InputTextOpinionProvider` -- rename to `ContentsProvider` #473
- `RuSentiFramesCollection.read` -- rename method `read_collection` to `read` [paper feedback] #472
- `DocumentOperation` -- provide directory-based document provider by default [Project Gutenberg feedback] #467
- Stream writing #459
- `dist_in_sent=0` by default #452
- Evaluation -- is not a part of the AREkit soon #449
- Prompting -- collect base classes that allows such input processing #447
- SentiNEREL -- move `split_fixed.txt` into the data SentiNEREL data archive. #442
- What's new in 0.23.0 #401
Merged pull requests:
- CVE-2007-4559 Patch #412 (TrellixVulnTeam)
AREkit-0.23.0-ChineseNY
What's new: Globalization and Internationalization
Globalization for any language is the major aspect of 0.23.0, since we announce AREnets and sample-transfer.
We tend to generalize some aspects in order to consider languages other than the original one (Russian).
We introduce CompoundEntities, which may include other entities.
Major
- Nested/Compound entities support! #398
- Detaching `networks` contrib module #423 -> AREnets
- Appearance of transfer: https://github.com/nicolay-r/arekit-googletrans-sampler
Fixed bugs
- Refactored BRAT parser, fixed bugs for other languages/collections.
Minor
Implemented enhancements:
- `PipelineContext` -- support `parent` contexts in case of the nested pipelines. #433
- Idle mode -- provide such flag into main pipeline #432
- `MapPipelineItem` -- provide `ctx` parameter in order to reach out parent Pipeline Context [Idle mode] #431
- NetworkSerializer -- support the case of `Vectorizers==Null` [Without embedding, google-trans-sampler backlog] #430
- ParsedRow -- depends on `pandas`, while it might be switched to `dict` type instead [AREnets backlog] #427
- Remove unused code after AREnets movement #425
- `AREnets` -- separated project for `networks` contrib part, which provides NN implementation based on Tensorflow #423
- `Entity` -- Adopt `DisplayValue` property for CSV serialization #419
- TsvWriter -- Remove `Dataframe` dependency #408
- OpenNREJsonWriter -- `df.sort` is not an inplace by default #407
- NeuralNetworkModelIO -- simplify implementation #406
- Brat -- support nested entities (`CompoundEntity` type) [simple implementation] #398
- What's New -- 0.22.1 Release #323
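The idea behind parent contexts for nested pipelines (#433, #431) can be sketched as a lookup that falls back to the enclosing context. The class `ContextSketch` and its `provide` and `update` methods are assumptions made for this illustration and should not be read as the AREkit `PipelineContext` interface.

```python
from typing import Any, Dict, Optional


class ContextSketch:
    """Hypothetical key-value context with an optional parent (illustration only)."""

    def __init__(self, values: Optional[Dict[str, Any]] = None,
                 parent: Optional["ContextSketch"] = None):
        self._values = dict(values or {})
        self._parent = parent

    def provide(self, key: str) -> Any:
        # Look in the own values first, then walk up the chain of parents.
        ctx: Optional["ContextSketch"] = self
        while ctx is not None:
            if key in ctx._values:
                return ctx._values[key]
            ctx = ctx._parent
        raise KeyError(key)

    def update(self, key: str, value: Any) -> None:
        # Updates stay local to this context and never modify the parent.
        self._values[key] = value


main_ctx = ContextSketch({"idle_mode": True, "lang": "en"})
nested_ctx = ContextSketch({"lang": "ru"}, parent=main_ctx)
```

In this sketch a nested pipeline can read a flag such as an idle mode from the main pipeline while still overriding its own values locally.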
Fixed bugs:
- Brat -- incorrect parsing approach may sometimes results in a wrong value might be mismatched (use `t`) #437
- `VocabRepositoryUtils` -- `numpy` API considers `#` by default in vocabulary on load #428
- LabelsScaler -- uint dict and dict might have different sizes #426
Closed issues:
- `read_ruattitudes_to_brat_in_memory` -- no need to pass label scaler #436
- `PosTags` -- make them optional parameter for neural networks #435
- RuSentiFrames -- clarify `tqdm` caption when loading (ARElight backlog) #434
- Sync with AREnets updates #429
- `BERT` -- provide cropped sampler #422
- `googletrans` -- move to the separated project #421
- `_provide_sentence_terms` -- consider `s_ind` and `t_ind` as well since they may combined with and modified at the same time [nivts_project backlog] #420
- Entity -- provide DisplayValue property (which is `Value` by default) #418
- `googletrans` -- TranslatorPipelineItem for parsed texts #416
- Instant downloading -- simplify data downloading #413
- PandasBasedRowsStorage -- implement the nested type from the `BaseRowsStorage` #410
- Readers/Writers -- make a part of the contrib #409
- TextOpinion Annotation -- particular filtering rules for SentiNEREL and Russian texts. [pipeline items] #404
- Evaluation -- enhancing error log analysis #400
- Statistical Folding provided via file #399
- Balancing as a side part of the Storage #380
Merged pull requests:
- CVE-2007-4559 Patch #412 (TrellixVulnTeam)
arekit-0.22.1
Release Notes 🎉
WHAT'S NEW:
- 📓 Provide `BRAT-based reader` (refactoring) of documents and the entities mentioned in them! 🥳
- 🔧 Provide verbose treatment of values for SynonymsCollection (#327)
- 🔧 Fixed embedding issues for `Entity` type for neural networks (#308)
- 🔧 Refactoring `RuSentRel` reader, which now represents a build on top of BRAT. (#287)
- 🔧 Attitude annotation performed on the fly within a pipeline! (#281)
- 🔧 Opinion annotation does not depend on the experiment (#250)
- 🔧 #347
- 🆕 added `utils` contrib part, into which the following were moved 🥳
- evaluation (2-3 scale)
- cv-splittings (#324)
- entity formatters
- synonyms collections templates: stemmer-based
- experiment handlers (#325)
- np_utils -- utils to interact with np-serialized data (#348)
- pipelines ➿ for opinions extraction and data serialization, text processing: we are now able to declare a custom pipeline and adopt serialization for a variety of RE tasks (#322), (#326), (#351)
- 🆕 API for conversion of external `text_opinions` into `parsed_news` (#338)
- 🆕 API for a variety of pipelines for data preparation, depending on `DataType` (#343)
- 🆕 `DataType` now includes `Dev` and `Etalon` by default (#345)
- 🆕 Evaluation refactoring, and support `TextOpinion` level results evaluation (#355)
- 🗑️ `experimential_rusentrel` contrib part removed (#321)
- 🗑️ `OpinionRowsProvider` should be removed [ARElight backlog] (#282)
- fixed: #356
Implemented enhancements:
- RuSentiFrames stat -- move script from `source` to the related UnitTest dir #391
- Vocabulary for Embedding -- save it in `.txt` format. #388
- BratSentence -- entities should be initialized via parameter #383
- ModelIO -- move vocab and embedding related API to EmbeddingIO #382
- BERT -- formatter differs only in TextB. #381
- Provide JSON writer for OpenNRE library #378
- ExperimentSerializationContext -- some parameters might be optional [Remove them] #369
- `ExperimentSerializationContext` -- `Annotator` property is not used. #368
- DocumentOperations -- `iter_doc_ids` actually wraps the ExperimentContext functionality #367
- `iter_tagget_doc_ids` -- this might be treated as `iter_doc_ids` of another instance #366
- `ExperimentIterationHandler` -- switch to the PipelineItem for NN and BERT serialization [Remove `ExperimentEngine` and `ExperimentHandler`] #365
- `FixedFolding` -- intersected parts are not supported [NIVTS project backlog] #364
- `InputDataSerializationHelper` -- refactoring #362
- `exp_io.balance_samples` -- remove Dependency from `DataType.Train` #360
- NeuralNetwork -- for the fine-tuning it is impossible to pick a default embedding/vocabulary. #359
- Evaluation -- support results evaluation for `TextOpinion` #355
- `DefaultOpinionAnnotator` -- `etalon_opinion` logic might be moved outside [Remove `DataType` dependency, backlog] #354
- `StatesCount`, `StateIndex` and `iter_states` of `BaseDataFolding` -- this is a part of CV-based method #353
- Evaluator refactoring #352
- Processing module -- Multiple Languages Scaling [Eng/Rus] [Contents Relocation] #351
- ExperimentContext -- remove Evaluator from the base class. #349
- `np_utils` -- move from `networks` to `utils` contrib part #348
- `StringWithEmbeddingNetworkTermMapping` -- has hard-coded algorithms for tokens and terms embedding creation. #347
- Existed in Embedding -- log (remove print) #346
- DataType -- provide `Dev` and `Etalon` default types [QUICK fix] #345
- Data Serialization -- update API that allow to provide a particular pipeline processor for each `DataType` [Backlog] #343
- Model io utils -- move into `contrib` part #342
- `Engine` -- provide states iterator as a parameter instead of `DataFolding` #341
- Brat -- provide stability #340
- BaseParsedNewsServiceProvider -- support conversion from `Entity` to `DocumentEntity` #338
- OpinionEntityType -- this should be generalized #335
- BratTextEntitiesParser and StringPartitioning -- nested entities are not supported. [Temp fix] #334
- RuAttitudesLabelConverter -- required only for conversion (not for parsing) #332
- SentenceOpinion -- no need to store entity values #331
- Utils -- provide opinion converters from brat #330
- RuAttitudes -- move `SentenceOpinion` to brat #329
- BratEntityCollectionHelper -- `extract_entities` considering for rows prefixed with `T` #328
- SynonymsCollection -- `value_to_group_id_func` does not support expansion by default. #327
- BERT and Network Serialization -- refactoring duplicated serialization implementations #322
- `exp_joined` -- removed such experiment at `experiment_rusentrel` contrib #321
- `rusentrel_experiment` -- organize a separated python project #320
- "Uknown}" -- specific to RuSentRel entity case #319
- `BertExperimentInputSerializerIterationHandler` -- Simplify API [Blog example backlog] #318
- BaseRowsStorage -- consider rows shuffling [ARElight backlog] #316
- EntityIds -- expected to be a part of the BaseSampleRowProvider [ARElight backlog] #312
- `iter_synonym_groups` [Sources] -- refactor to common method [ARElight backlog] #310
- term-embedding-pairs -- refactor chain of the parameter dependencies. #304
- Move EntityFormatters outside #302
- Sources -- RusentRel collection based on brat toolkit serialization format #287
- `BaseOpinionsRowProvider` -- useless class and hence should be removed [refactoring IOUtils] #282
- IOUtils -- replace `experiment` instance (and dependency) with string provider. #252
- Annotator and algorithm is not related to experiment. #250
- DocumentOperations -- parsed docs related API is not related to the experiment concepts. #249
- Remove `sep_doc_id` variable #131
- Update Framework Description #74
Fixed bugs:
- `StringWithEmbeddingNetworkTermMapping` -- `map_token` is expected a particular type of embedding which return embedding only #395
- NetworksTrainingPipelineItem -- pass labels count #379
- `BertDefaultStringTextTermsMapper` -- non masked entity values might be with
- `iter_rows_linked_by_text_opinions` -- fixed bug with incorrect check. Removed doc-related check. #356
- TextOpinion should be a part of a single sentence -- this limitation is not emphasized in any way of exceptions and assertions #339
- BaseParsedNewsServiceProvider -- incorrect IDs assignation #337
- Example -- Documents become mixed [RuAt...
arekit-0.22.0
Release Notes 🎉
- Pipelines integration!
- Now utilized in text processing, which can now be split into tokenization, entity assignment, and frame assignment stages.
- Repositories for opinions and network input samples!
- Storage kernel customizations support for opinion and samples! Using Pandas by default.
- Opinion-related services turned into providers: pairs, opinions, text-opinions, etc.
NOTE: issue #232 has been moved to the next release.
This version does not support RuAttitudes collection news parsing!
Will be fixed in the upcoming project.
Changelog
v0.22.0-rc (2022-03-17)
Changes
Implemented enhancements:
- `create_term_embedding` -- Embedding algorithm based on parts requires useless check #298
- UnitTests -- BertOntoNotes is no longer below the core processing #293
- SingleLabelScaler -- provide [QUICK] #291
- BRAT visualization -- support processing in case of multiple documents. #286
- Entity -- IDs Refactoring #280
- BaseSampleRowProvider -- provide sentence id #279
- BRAT tool -- adopt ui as a callback for the predict pipeline #275
- ExperimentIterationHandler -- add Labeled Output Samples conversion to OpinionCollection #270
- InferenceContext -- split bags and samples extraction from a single method [Quick] #268
- DataFolding -- organize united data folding. #267
- BaseDataFolding -- iter_index is not related to the base implementation #266
- DataFolding -- move into experiment context #264
- DataIO (exp_data var) -- rename it to `ExperimentContext` #263
- ExperimentIterationHandler (Callback before) -- organize ExperimentEvaluationCallback #262
- NetworkCallback -- this callback should not inherit experiment base Callback #261
- Neural Network Hidden states writers and providers refactoring #260
- TrainingCallback -- separate onto `TrainingTerminationCallback` and `HiddenWriterCallback` classes. #259
- BaseTensorflowModel -- simplify `fit` and `predict` operations. #258
- LabeledCollection -- remove `is_empty` and `reset_labels` api #257
- NetworkCallback -- move train/predict notification info into callback #256
- Tensorflow saver -- move the related logic outside of the model implementation #255
- DefaultSingleLabelAnnotationAlgorithm -- single label is not a part of the algo #244
- `ThreeScaleTaskAnnotator` -- rename and move into core. #243
- Data/output -- create pipelines directory with the related output processing #240
- Examples -- document parsing executes twice #239
- Might be utilized pipeline implementation #238
- OpinionsProvider -- performs two actions, including ids assignation #236
- entity_to_group_func -- `BaseExperiment` should not provide this method. #235
- TextOpinionHelper -- to news/parsed/providers (implement the latter as a provider) #233
- DefaultSingleLabelAnnotationAlgorithm -- iter_opinion duplicates the generalized pair opinion pair creation approach #231
- Common `languages` dir -- move its contents into processing contrib. #229
- Linked Text Opinions Refactoring. #228
- Lemmatization should be a part of the frames processing pipeline stage #226
- DefaultTextParser -- this class is actually a Tokenizer #225
- News -- text-opinions provider and entities access API might be a part of a `ParsedNews` by means of `NewsParser` (new class) #224
- StringLabelsFormatter -- switch to label_types instead of label instances. #223
- AnnotationAlgorithm -- iter_opinions requires EntitiesCollection while the latter utilized for entities iteration #222
- TextParseOptions -- add `keep_tokens` #221
- FrameVariantsParser -- return modified terms only #218
- FramesAnnotation -- `is_inverted` flag and processing should be a pipeline item #217
- FramesCollection -- use `FrameConnotationProvider` instead #216
- FrameVariantsParser -- move into processing subfolder. #215
- OpinionOperations -- remove `try_read_annotated_opinion_collection` #213
- DocumentOperation -- unify iter_doc_ids operation into one with `tag` parameter. #212
- OpinionOperations -- move readers* into IO. #211
- OpinionCollectionsProvider -- serialization should not be a part of this class #210
- data -- separate data-related information from the experiment #209
- BaseInputReader -- class stores `_df`, however it should be replaced with `BaseRowsStorage` #207
- Repositories -- fill method should be a part of a `storage` rather than provider. #204
- BaseStorage -- exclude `save` method into separated class BaseRowsWriter #202
- Experiments -- rename `formats` to `api` (QUICK) #201
- Embedding and Vocabulary -- organize Storage/Repository with `serialize`/`load` operations. #200
- Sample -- remove dependency from DefaultNetworkConfig. #199
- BaseOutputFormatter -- both provider and formatter mixes `df` usage #198
- OpinionProvider -- remove dependency from Opinion and Document Operation instances. #197
- Repositories -- add this class which unite all the providers for data writing #195
- Add column providers #194
- NetworkSampleFormatter -- switch to provider #193
- BaseSampleStorage -- use `store_labels` instead of `data_type` passing (QUICK) #192
- NetworkOutputEncoder -- separate formatting from serialization. #191
- BaseSampleFormatter -- `__create_row` is not related to the Formatter, should be moved. #190
- BaseDocumentStatGenerator -- provider depends on IO files. #189
- OpinonFormatter -- use the latter in experiment io. #188
- News -- remove `return_text` parameter from iter_sentences method (QUICK) #187
- BaseRowsFormatter -- move `format` method in another class #185
- BaseSampleFormatter -- `_iter_sentence_terms` should not be a part of this class. (QUICK) #184
- BaseSampleFormatter -- `_provide_rows` behavior depends on row_ids_provider instance type. #182
- BaseSampleFormatter -- remove `data_type` parameter from ctor #181
- BaseObjectParser -- `parse` method should return object of the same type as `sentence` #179
- News -- remove `entities_parser` instance from News class. #178
- BaseEntitiesParser -- generalize to BaseObjectsParser. [#177](https://github.com/nicolay-r/AREkit/issu...
arekit-0.21.0
Changelog
v0.21.0-rc (2021-08-15)
Implemented enhancements:
- Sources -- clarify `do_overwrite` and refactor `check_uniqueness` flags RuSentiFrames #150
- Compose Python Library #145
- Sources -- provide local storage at home directory #144
- Enum -- clarify enum34 package using instead of the enum. #143
- OpinionCollectionsFormatter -- support to save/load only supported by label_formatter opinions #139
- UnitTests -- gather all tests into single folder #125
- BaseAnnotator -- initialize method is useless as the passed parameters are required only at `serialize_missed_collections` method. #123
- NeutralAnnotator -- Rename to annotator, as neutral prefix is related to a specifics of the particular task #122
- NeutralAnnot -- use a predefined template for names, based on labels count, instead of Name property #121
- DefaultNeutralAlgo -- provide dist in sentence parameter #120
- NeutralAnnot -- Two/Three scale annotators considered to be a part of the related experiment #119
- Evaluation Metrics -- such functions considered to be a part of the particular experiment #115
- Embedding -- set_stemmer method is not declared in base class #114
- FrameVariantsCollection -- remove stemmer from __init__ params. #113
- Bag (NeuralNetworks) -- label could be presented as uint. #110
- experiment_rusentrel -- Group all folders by a single `exp` prefix #108
- BaseModel -- Replace epochs_count parameter with generalized parameter structure. #107
- OpinionCollection -- provide set of supported labels (opinion filtration by labels) #106
- LabelCalculationMode -- make it enum #105
- BaseModel -- replace epochs_count with model options #104
- ThreeLabelsScaler -- remove dependencies of the latter in NeuralNetwork contrib #103
- RuAttitudes -- use int_to_label function instead of label scaler #102
- Labels -- Move Scaler into common/labels #101
- Labels -- Provide unique labels for the particular experiment in contrib #100
- Experiments -- reorganize rusentrel experiments data within the related new folder #97
Fixed bugs:
- RuAttitudes-v1.2. -- fix downloading link #155
- sources -- Remove data folder #149
- Entity -- type could be `None` while there is no restriction for that #148
- RuSentRelOpinionCollectionFormatter -- label could not be found during neural network training. #137
- frame_variant -- label scaler receives `NoLabel` while experiment based on `NeutralLabel` #136
- BaseEvaluator -- opinion labels might be incompatible with the one utilized in ResultEvaluator. #124
Closed issues:
AREkit-0.20.5
Release
Fixed:
- Using custom check of duplicated opinions during `OpinionCollection` initialization.
Changes:
- Speed-up and engine optimizations:
- Optionally loading neutral annotator.
- Multi-Instance networks: now we consider that the next appeared context always continues the prior one (check out multi-instance bags creation for details).
- Now shuffling in models is performed for bags, not for bag groups.
- Networks: added `allow_growth=True` flag for tensorflow based neural networks. Memory fraction parameter has been removed.
- Collection of parsed news is now detached from text opinions collection.
- News parsing is now assumed to be performed using `TextParser.parse(news, options)` call. Related refactoring.
  - Stemmer application from `RuAttitudes` parser has been removed.
- Removed dependency from `RelatedParsedNewCollection` in TextOpinionCollection.
- Labeling is now separated from LinkedTextOpinion collection.
- `ParsedText` class has been refactored, removed unused methods. Keep tokens has been discarded.
- BERT tsv-format-encoders are now in a Factory (at contrib directory).
- Fixed: `RuSentRelTextOpinion` replaced with `TextOpinion`, and independent from `OpinionRef`.
- `Single`/`Multi` models no longer exist, as these prefixes only affect batch type selection. Refactoring.
AREkit-0.20.4
Release Notes
- Labels conversion `to_str` and `from_str` is now a part of external formatters (unique for each source, experiment, etc.).
- Added labels-scaler; label casting (to int or uint) now depends on the scaler.
- Added bert exporter in contribution folder, with related formatters according to the [paper]:
  - NLI -- (Natural language inference) format, assumes an additional sentence which describes the attitude that should be extracted
  - QA -- (Question answering) provides an additional question about the attitude sentiment.
  With label encoding in the following formats:
  - Multiple -- all the supported sentiment labels (positive, negative, neutral)
  - Binary -- (YES, NO) according to the mention (additional sentence) provided by NLI and QA formatters.
- Refactoring experiments in order to apply the latter also for classifiers (models from scikit-learn)
- Updated nn-engine API
- Refactoring tf-based neural network implementation.
- Bert now moved into separated folder from `contrib` directory.
- frame_variants moved to `frames` directory.
- Frame variants labeling in news is now performed during `parse` operation.
- `DataType` is now an enumeration. List of supported data-types is now a part of experiment. The latter were moved onto sample level.
- `Service` folder removed, as it is assumed to be kept apart from this repository.
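The NLI and QA input formats and the Multiple and Binary label encodings listed for 0.20.4 can be illustrated with a small sketch. The sentence templates and the function names below are hypothetical and were written for this example only; the actual templates of the bert exporter are not given in these notes.

```python
from typing import Callable, List, Optional, Tuple

LABELS = ("positive", "negative", "neutral")
Row = Tuple[str, str, str]  # (text_a, text_b, label)


def nli_text_b(source: str, target: str, label: Optional[str] = None) -> str:
    """Additional sentence that describes the attitude (hypothetical template)."""
    if label is None:
        return f"{source} towards {target}"
    return f"{source} is {label} towards {target}"


def qa_text_b(source: str, target: str, label: Optional[str] = None) -> str:
    """Additional question about the attitude (hypothetical template)."""
    if label is None:
        return f"What is the attitude of {source} towards {target}?"
    return f"Is the attitude of {source} towards {target} {label}?"


def multiple_rows(text_a: str, text_b: Callable[..., str],
                  source: str, target: str, gold: str) -> List[Row]:
    # Multiple: a single row whose label is one of the sentiment classes.
    return [(text_a, text_b(source, target), gold)]


def binary_rows(text_a: str, text_b: Callable[..., str],
                source: str, target: str, gold: str) -> List[Row]:
    # Binary: one row per candidate label, answered with YES or NO.
    return [(text_a, text_b(source, target, label), "YES" if label == gold else "NO")
            for label in LABELS]


qa_binary = binary_rows("X criticized Y.", qa_text_b, "X", "Y", "negative")
nli_multiple = multiple_rows("X criticized Y.", nli_text_b, "X", "Y", "negative")
```

In the Binary encoding each sample expands into as many rows as there are sentiment labels, while the Multiple encoding keeps one row per sample.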