
Releases: nicolay-r/AREkit

AREkit-0.25.1

07 Dec 11:55


Changeset

Major

Native batching has been enabled in document parsing.

This means that all queries are grouped into batches. Components that support batching process each batch as a whole, while the others process its elements sequentially.
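
The dispatch between batch-capable and sequential components might look roughly like the sketch below. It is only an illustration: `PipelineItem`, `supports_batching`, `apply_item` and `run_pipeline` are hypothetical names chosen for this example and are not taken from the AREkit API.

```python
from typing import Callable, List


class PipelineItem:
    """Hypothetical pipeline item (an assumption, not the AREkit class)."""

    def __init__(self, func: Callable, supports_batching: bool = False):
        self.func = func
        self.supports_batching = supports_batching


def apply_item(item: PipelineItem, batch: List[str]) -> List[str]:
    if item.supports_batching:
        # The item receives the whole batch in a single call.
        return item.func(batch)
    # Fallback: the item is applied to every element one by one.
    return [item.func(text) for text in batch]


def run_pipeline(items: List[PipelineItem], batch: List[str]) -> List[str]:
    # Each item consumes the output of the previous one.
    for item in items:
        batch = apply_item(item, batch)
    return batch


items = [
    # Batch-aware item: lowercases all texts in one call.
    PipelineItem(lambda texts: [t.lower() for t in texts], supports_batching=True),
    # Sequential item: strips whitespace from a single text.
    PipelineItem(lambda text: text.strip()),
]
result = run_pipeline(items, ["  Hello ", " WORLD"])
```

With this layout, adding batching to a component only requires switching its flag and accepting a list, while the remaining components stay unchanged.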

List of the release related updates #550

Minor changes

🪶 Made the framework more lightweight

Moved Resources

Removed sampling-related components

Full Changelog: v0.25.0-rc...v0.25.1-rc

AREkit-0.25.0

27 Feb 11:07

Release notes

Full Changelog: v0.24.0-rc...v0.25.0-rc

Batching support for effectively integrating LLMs into text processing pipelines

Previously, the whole text processing pipeline operated on a single sentence / text part at a time.
Now we have overcome that limitation, so the pipeline can consider multiple sentences formed into a list, i.e. a batch.
This step is important for LLMs, LMs, and other neural networks, for which batching accelerates processing.
As a result, the overall pipeline is expected to run faster.
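
Forming batches from a stream of sentences can be sketched as follows. This is a minimal illustration under stated assumptions: `iter_batches` and `batch_size` are hypothetical names introduced here, not part of the AREkit API.

```python
from typing import Iterable, Iterator, List


def iter_batches(sentences: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group sentences into lists of at most `batch_size` items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    batch: List[str] = []
    for sentence in sentences:
        batch.append(sentence)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # The last batch may be shorter than batch_size.
        yield batch


batches = list(iter_batches(["s1", "s2", "s3", "s4", "s5"], batch_size=2))
```

Each yielded list can then be passed to a model in a single call instead of invoking the model once per sentence.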

Sources collections are no longer going to be a part of AREkit ✨
This allows us to lighten 🪶 the overall framework and focus purely on data processing techniques.

  • #537
  • Remove requests library dependency 🪶
  • Move all the tutorials 📚 to the AREkit-ss project. 🪶

Flexibility and Performance Enhancements

Fixed bugs

  • 🔧 RowCacheStorageProvider fixed bug with mismatching size of type list and columns list in case of other force collected columns (ad4312c)

Minor Updates

Minor

Changeset

Implemented enhancements:

  • SamplesIO.create_target -- provide this parameter as function [ARElight backlog] #547
  • No input support for pipelines Launcher #546
  • _get_text is no longer needed #544
  • TermsSplitterParser -- is no longer required [ARElight backlog] #543
  • Partitioning -- fancy last operations of the SentenceObjectsParserPipelineItem which has no longer application [ARElight backlog] #542
  • SentenceObjectsParserPipelineItem -- rename to the ObjectsParserPipelineItem concept #541
  • Pipelines -- refactoring core concept, source customization selection for ppl items #539
  • Pipelines -- Batching sentences in document parser [ARElight backlog] #535
  • Graph-based sampler #495

Closed issues:

  • Provide link to the DEMO ARElight as a technical reference documentation #549
  • Pipeline.run might be just a concept of launchers, there is no need to combine storage of items with run operation #540
  • SQlite-based readers and storage providers #538
  • Sources Movement in AREkit-ss [including the related dependencies] #537

* This Changelog was automatically generated by github_changelog_generator

AREkit-0.24.0

07 Nov 12:43

Improvements

  • #527
  • automate NoFolding support, easy API usage 🔥 #466
  • 🔧 #417
  • 🔧 #489
  • 🔧 #502 because of #501 were fixed
  • 🔧 #510
  • #503
  • #507
  • remove everything related to applications and related framework if everything will be OK with the paper (0.23.1 as well)
  • #520
  • 🔧 #526

Generalization

Changes and Simplifications

  • #517
  • ❌ Drop support of reading grouped Opinions (#491 and #492 related) [the related unit-test was optional and has been removed as well]
  • #483
  • #376

Minor

AREkit-0.23.1

19 Jun 14:07

Main Updates

  • #439
  • #447
  • fixed : #440
  • new: 🔥 #459
  • moving evaluation module outside 🔧 #449 (new separate project)
  • utils: #467
  • universal API for proof-of-concept

Full Changelog

Implemented enhancements:

  • NativeCsvWriter -- sync delimiter with other CSV formatters #486

v0.23.1-rc (2023-06-02)

Full Changelog

Implemented enhancements:

  • filters=[] -- consider the case of None by default [Paper feedback] #479
  • opinions=[] -- simplify usage of API [paper feedback] #478
  • BaseSerializerPipelineItem -- required by arekit-ss #476
  • Neural Network Serializer -- rows_provider should be declared outside [paper backlog/arekit_ss project] #475
  • Streaming -- support JSON output format #474
  • RuAttitudesDocumentProvider -- refactor to follow the structure of the rest resources #470
  • Support None for get_doc_existed_opinion_func [user/paper feedback] #469
  • SynonymsCollection -- setup default value of iter_group_values_lists to [] #468
  • DOC_ID column -- remove int type limitation #463
  • Streaming -- provide header column names for CSV #462
  • tqdm -- display amount of processed documents in progress-bar [Project Gutenberg backlog] #461
  • OpinionCollection -- iter_sentiment method is not in use anymore #456
  • OpinionCollection -- the case of None for opinion results in incomplete initialization #455
  • OpinionCollection -- copy method is not in use anymore #454
  • OpinionCollection -- consider opinions=[] by default in, i.e. empty collection. #453
  • synonyms.py -- is empty and might be removed [QUICK check and fix] #451
  • Pandas -- completely remove dependencies #450
  • BertTextBTemplates -- switch name to prompts #446
  • RuSentRel -- embed train and test indices in collection #444
  • SentiNEREL -- entity filter #443
  • SentiNEREL -- move from another project [NIVTS project backlog, RuSentNE competitions] #439

Fixed bugs:

  • Network module -- context constant has a predefined text value which is limited for networks only #485
  • read_ruattitudes_to_brat_in_memory -- case of keep_doc_ids_only==True causes exception #482
  • prompt -- object non subscriptable #481
  • fill -- in case of None rows count tqdm throws exception #458
  • create_sample_provider -- misused parameter #445
  • CroppedBertSampleRowProvider -- might crop with references outside of the bounds [googletranslate-feedback] #440

Closed issues:

  • Shortening to RuSentRelOpinions.iter_from_doc #480
  • InputTextOpinionProvider -- rename to ContentsProvider #473
  • RuSentiFramesCollection.read -- rename method read_collection to read [paper feedback] #472
  • DocumentOperation -- provide directory-based document provider by default [Project Gutenberg feedback] #467
  • Stream writing #459
  • dist_in_sent=0 by default #452
  • Evaluation -- is not a part of the AREkit soon #449
  • Prompting -- collect base classes that allows such input processing #447
  • SentiNEREL -- move split_fixed.txt into the data SentiNEREL data archive. #442
  • What's new in 0.23.0 #401

Merged pull requests:

* This Changelog was automatically generated by github_changelog_generator

AREkit-0.23.0-ChineseNY

21 Jan 11:15
a2f6fe8

What's new: Globalization and Internalization


Globalization for any language is the major aspect of 0.23.0, since we announce AREnets and sample-transfer.
We generalize several aspects in order to support languages other than the original one (Russian).
We introduce CompoundEntities, which may include other entities.
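
The idea of an entity that may include other entities can be illustrated with the sketch below. It is not the AREkit implementation: the `Entity` and `CompoundEntity` classes, their fields, and the `iter_nested` method are assumptions made purely for this example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Entity:
    """Hypothetical plain entity: a text value with a type."""
    value: str
    entity_type: str


@dataclass
class CompoundEntity(Entity):
    """Hypothetical entity that may include other (possibly compound) entities."""
    children: List[Entity] = field(default_factory=list)

    def iter_nested(self) -> Iterator[Entity]:
        # Depth-first traversal over all nested entities.
        for child in self.children:
            yield child
            if isinstance(child, CompoundEntity):
                yield from child.iter_nested()


bank = CompoundEntity("Bank of England", "ORG", children=[Entity("England", "LOC")])
governor = CompoundEntity("Governor of the Bank of England", "PROFESSION", children=[bank])
nested_values = [e.value for e in governor.iter_nested()]
```

Keeping the nested entities reachable from the outer one lets a parser treat the whole span as a single entity while still exposing the inner mentions when needed.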

Major

Fixed bugs

  • Refactored BRAT parser, fixed bugs for other languages/collections.

Minor

Full Changelog

Implemented enhancements:

  • PipelineContext -- support parent contexts in case of the nested pipelines. #433
  • Idle mode -- provide such flag into main pipeline #432
  • MapPipelineItem -- provide ctx parameter in order to reach out parent Pipeline Context [Idle mode] #431
  • NetworkSerializer -- support the case of Vectorizers==Null [Without embedding, google-trans-sampler backlog] #430
  • ParsedRow -- depends on pandas, while it might be switched to dict type instead [AREnets backlog] #427
  • Remove unused code after AREnets movement #425
  • AREnets -- separated project for networks contrib part, which provides NN implementation based on Tensorflow #423
  • Entity -- Adopt DisplayValue property for CSV serialization #419
  • TsvWriter -- Remove Dataframe dependency #408
  • OpenNREJsonWriter -- df.sort is not an inplace by default #407
  • NeuralNetworkModelIO -- simplify implementation #406
  • Brat -- support nested entities (CompoundEntity type) [simple implementation] #398
  • What's New -- 0.22.1 Release #323

Fixed bugs:

  • Brat -- incorrect parsing approach may sometimes result in a mismatched value (use t) #437
  • VocabRepositoryUtils -- numpy API considers # by default in vocabulary on load #428
  • LabelsScaler -- uint dict and dict might have different sizes #426

Closed issues:

  • read_ruattitudes_to_brat_in_memory -- no need to pass label scaler #436
  • PosTags -- make them optional parameter for neural networks #435
  • RuSentiFrames -- clarify tqdm caption when loading (ARElight backlog) #434
  • Sync with AREnets updates #429
  • BERT -- provide cropped sampler #422
  • googletrans -- move to the separated project #421
  • _provide_sentence_terms -- consider s_ind and t_ind as well since they may combined with and modified at the same time [nivts_project backlog] #420
  • Entity -- provide DisplayValue property (which is Value by default) #418
  • googletrans -- TranslatorPipelineItem for parsed texts #416
  • Instant downloading -- simplify data downloading #413
  • PandasBasedRowsStorage -- implement the nested type from the BaseRowsStorage #410
  • Readers/Writers -- make a part of the contrib #409
  • TextOpinion Annotation -- particular filtering rules for SentiNEREL and Russian texts. [pipeline items] #404
  • Evaluation -- enhancing error log analysis #400
  • Statistical Folding provided via file #399
  • Balancing as a side part of the Storage #380

Merged pull requests:

* This Changelog was automatically generated by github_changelog_generator

arekit-0.22.1

06 Sep 08:36

Release Notes 🎉


Full Changelog

WHAT'S NEW:

  • 📓 Provide BRAT-based reader (refactoring) of documents and mentioned entities in it! 🥳
  • 🔧 Provide verbose treatment of values for SynonymsCollection (#327)
  • 🔧 Fixed embedding issues for Entity type for neural networks (#308)
  • 🔧 Refactoring RuSentRel reader, which now represents a layer built on top of BRAT. (#287)
  • 🔧 Attitude annotation is performed on the fly within a pipeline! (#281)
  • 🔧 Opinion annotation does not depend on the experiment (#250)
  • 🔧 #347
  • 🆕 added the utils contrib part, into which the following were moved 🥳
    - evaluation (2-3 scale)
    - cv-splittings (#324)
    - entity formatters
    - synonyms collections templates: stemmer-based
    - experiment handlers (#325)
    - np_utils -- utils to interact with np-serialized data (#348)
    - pipelines ➿ for opinions extraction and data serialization, text processing: we are now able to declare a custom pipeline and adopt serialization for a variety of RE tasks (#322), (#326), (#351)
  • 🆕 API for conversion of external text_opinions into parsed_news (#338)
  • 🆕 API for a variety of pipelines for data preparation, depending on DataType (#343)
  • 🆕 DataType now includes Dev and Etalon by default (#345)
  • 🆕 Evaluation refactoring, and support TextOpinion level results evaluation (#355)
  • 🗑️ experimential_rusentrel contrib part removed (#321)
  • 🗑️ OpinionRowsProvider should be removed [ARElight backlog] (#282)
  • fixed: #356

Implemented enhancements:

  • RuSentiFrames stat -- move script from source to the related UnitTest dir #391
  • Vocabulary for Embedding -- save it in .txt format. #388
  • BratSentence -- entities should be initialized via parameter #383
  • ModelIO -- move vocab and embedding related API to EmbeddingIO #382
  • BERT -- formatter differs only in TextB. #381
  • Provide JSON writer for OpenNRE library #378
  • ExperimentSerializationContext -- some parameters might be optional [Remove them] #369
  • ExperimentSerializationContext -- Annotator property is not used. #368
  • DocumentOperations -- iter_doc_ids actually wraps the ExperimentContext functionality #367
  • iter_tagget_doc_ids -- this might be treated as iter_doc_ids of an another instance #366
  • ExperimentIterationHandler -- switch to the PipelineItem for NN and BERT serialization [Remove ExperimentEngine and ExperimentHandler] #365
  • FixedFolding -- intersected parts are not supported [NIVTS project backlog] #364
  • InputDataSerializationHelper -- refactoring #362
  • exp_io.balance_samples-- remove Dependency from DataType.Train #360
  • NeuralNetwork -- for the fine-tuning it is impossible to pick a default embedding/vocabulary. #359
  • Evaluation -- support results evaluation for TextOpinion #355
  • DefaultOpinionAnnotator -- etalon_opinion logic might be moved outside [Remove DataType dependency, backlog] #354
  • StatesCount, StateIndex and iter_states of BaseDataFolding -- this is a part of CV-based method #353
  • Evaluator refactoring #352
  • Processing module -- Multiple Languages Scaling [Eng/Rus] [Contents Relocation] #351
  • ExperimentContext -- remove Evaluator from the base class. #349
  • np_utils -- move from networks to utils contrib part #348
  • StringWithEmbeddingNetworkTermMapping -- has hard-coded algorithms for tokens and terms embedding creation. #347
  • Existed in Embedding -- log (remove print) #346
  • DataType -- provide Dev and Etalon default types [QUICK fix] #345
  • Data Serialization -- update API that allow to provide a particular pipeline processor for each DataType [Backlog] #343
  • Model io utils -- move into contrib part #342
  • Engine -- provide states iterator as a parameter instead of DataFolding #341
  • Brat -- provide stability #340
  • BaseParsedNewsServiceProvider -- support conversion from Entity to DocumentEntity #338
  • OpinionEntityType -- this should be generalized #335
  • BratTextEntitiesParser and StringPartitioning -- nested entities are not supported. [Temp fix] #334
  • RuAttitudesLabelConverter -- required only for conversion (not for parsing) #332
  • SentenceOpinion -- no need to store entity values #331
  • Utils -- provide opinion converters from brat #330
  • RuAttitudes -- move SentenceOpinion to brat #329
  • BratEntityCollectionHelper -- extract_entities considering for rows prefixed with T #328
  • SynonymsCollection -- value_to_group_id_func does not support expansion by default. #327
  • BERT and Network Serialization -- refactoring duplicated serialization implementations #322
  • exp_joined -- removed such experiment at experiment_rusentrel contrib #321
  • rusentrel_experiment -- organize a separated python project #320
  • "Uknown}" -- specific to RuSentRel entity case #319
  • BertExperimentInputSerializerIterationHandler -- Simplify API [Blog example backlog] #318
  • BaseRowsStorage -- consider rows shuffling [ARElight backlog] #316
  • EntityIds -- expected to be a part of the BaseSampleRowProvider [ARElight backlog] #312
  • iter_synonym_groups [Sources]-- refactor to common method [ARElight backlog] #310
  • term-embedding-pairs -- refactor chain of the parameter dependencies. #304
  • Move EntityFormatters outside #302
  • Sources -- RusentRel collection based on brat toolkit serialization format #287
  • BaseOpinionsRowProvider -- useless class and hence should be removed [refactoring IOUtils] #282
  • IOUtils -- replace experiment instance (and dependency) with string provider. #252
  • Annotator and algorithm is not related to experiment. #250
  • DocumentOperations -- parsed docs related API is not related to the experiment concepts. #249
  • Remove sep_doc_id variable #131
  • Update Framework Description #74

Fixed bugs:

  • StringWithEmbeddingNetworkTermMapping -- map_token is expected a particular type of embedding which return embedding only #395
  • NetworksTrainingPipelineItem -- pass labels count #379
  • BertDefaultStringTextTermsMapper -- non masked entity values might be with separation between words #377
  • iter_rows_linked_by_text_opinions -- fixed bug with incorrect check. Removed doc-related check. #356
  • TextOpinion should be a part of a single sentence -- this limitation is not emphasized in any way of exceptions and assertions #339
  • BaseParsedNewsServiceProvider -- incorrect IDs assignation #337
  • Example -- Documents become mixed [RuAt...

arekit-0.22.0

17 Mar 11:46

Release Notes 🎉

  • Pipelines integration!
    • Utilized now in text processing, which can now be divided into tokenization, entity assignment, and frame assignment stages.
  • Repositories for opinions and network input samples!
  • Storage kernel customizations support for opinion and samples! Using Pandas by default.
  • Opinion-related services turned into providers: pairs, opinions, text-opinions, etc.

NOTE: issue #232 has been moved to the next release.
This version does not support RuAttitudes collection news parsing!
This will be fixed in the upcoming project.

Changelog

v0.22.0-rc (2022-03-17)

Full Changelog

Changes

Implemented enhancements:

  • create_term_embedding -- Embedding algorithm based on parts requires useless check #298
  • UnitTests -- BertOntoNotes is no longer below the core processing #293
  • SingleLabelScaler -- provide [QUICK] #291
  • BRAT visualization -- support processing in case of multiple documents. #286
  • Entity -- IDs Refactoring #280
  • BaseSampleRowProvider -- provide sentence id #279
  • BRAT tool -- adopt ui as a callback for the predict pipeline #275
  • ExperimentIterationHandler -- add Labeled Output Samples conversion to OpinionCollection #270
  • InferenceContext -- split bags and samples extraction from a single method [Quick] #268
  • DataFolding -- organize united data folding. #267
  • BaseDataFolding -- iter_index is not related to the base implementation #266
  • DataFolding -- move into experiment context #264
  • DataIO (exp_data var) -- rename it to ExperimentContext #263
  • ExperimentIterationHandler (Callback before) -- organize ExperimentEvaluationCallback #262
  • NetworkCallback -- this callback should not inherit experiment base Callback #261
  • Neural Network Hidden states writers and providers refactoring #260
  • TrainingCallback -- separate onto TrainingTerminationCallback and HiddenWriterCallback classes. #259
  • BaseTensorflowModel -- simplify fit and predict operations. #258
  • LabeledCollection -- remove is_empty and reset_labels api #257
  • NetworkCallback -- move train/predict notification info into callback #256
  • Tensorflow saver -- move the related logic outside of the model implementation #255
  • DefaultSingleLabelAnnotationAlgorithm -- single label is not a part of the algo #244
  • ThreeScaleTaskAnnotator -- rename and move into core. #243
  • Data/output -- create pipelines directory with the related output processing #240
  • Examples -- document parsing executes twice #239
  • Might be utilized pipeline implementation #238
  • OpinionsProvider -- performs two actions, including ids assignation #236
  • entity_to_group_func -- BaseExperiment should not provide this method. #235
  • TextOpinionHelper -- to news/parsed/providers (implement the latter as a provider) #233
  • DefaultSingleLabelAnnotationAlgorithm -- iter_opinion duplicates the generalized pair opinion pair creation approach #231
  • Common languages dir -- move its contents into processing contrib. #229
  • Linked Text Opinions Refactoring. #228
  • Lemmatization should be a part of the frames processing pipeline stage #226
  • DefaultTextParser -- this class is actually a Tokenizer #225
  • News -- text-opinions provider and entities access API might be a part of a ParsedNews by means of NewsParser (new class) #224
  • StringLabelsFormatter -- switch to label_types instead of label instances. #223
  • AnnotationAlgorithm -- iter_opinions requires EntitiesCollection while the latter utilized for entities iteration #222
  • TextParseOptions -- add keep_tokens #221
  • FrameVariantsParser -- return modified terms only #218
  • FramesAnnotation -- is_inverted flag and processing should be a pipeline item #217
  • FramesCollection -- use FrameConnotationProvider instead #216
  • FrameVariantsParser -- move into processing subfolder. #215
  • OpinionOperations -- remove try_read_annotated_opinion_collection #213
  • DocumentOperation -- unify iter_doc_ids operation into one with tag parameter. #212
  • OpinionOperations -- move readers* into IO. #211
  • OpinionCollectionsProvider -- serialization should not be a part of this class #210
  • data -- separate data-related information from the experiment #209
  • BaseInputReader -- class stores _df, however it should replaced with BaseRowsStorage #207
  • Repositories -- fill method should be a part of a storage rather than provider. #204
  • BaseStorage -- exclude save method into separated class BaseRowsWriter #202
  • Experiments -- rename formats to api (QUICK) #201
  • Embedding and Vocabulary -- organize Storage/Repository with serialize/load operations. #200
  • Sample -- remove dependency from DefaultNetworkConfig. #199
  • BaseOutputFormatter -- both provider and formatter mixes df usage #198
  • OpinionProvider -- remove dependency from Opinion and Document Operation instances. #197
  • Repositories -- add this class which unites all the providers for data writing #195
  • Add column providers #194
  • NetworkSampleFormatter -- switch to provider #193
  • BaseSampleStorage -- use store_labels instead of data_type passing (QUICK) #192
  • NetworkOutputEncoder -- separate formatting from serialization. #191
  • BaseSampleFormatter -- __create_row is not related to the Formatter, should be moved. #190
  • BaseDocumentStatGenerator -- provider depends on IO files. #189
  • OpinonFormatter -- use the latter in experiment io. #188
  • News -- remove return_text parameter from iter_sentences method (QUICK) #187
  • BaseRowsFormatter -- move format method in another class #185
  • BaseSampleFormatter -- _iter_sentence_terms should not be a part of this class. (QUICK) #184
  • BaseSampleFormatter -- _provide_rows behavior depends on row_ids_provider instance type. #182
  • BaseSampleFormatter -- remove data_type parameter from ctor #181
  • BaseObjectParser -- parse method should return object of the same type as sentence #179
  • News -- remove entities_parser instance from News class. #178
  • BaseEntitiesParser -- generalize to BaseObjectsParser. [#177](https://github.com/nicolay-r/AREkit/issu...

arekit-0.21.0

15 Aug 10:14

Changelog

v0.21.0-rc (2021-08-15)

Full Changelog

Implemented enhancements:

  • Sources -- clarify do_overwrite and refactor check_uniqueness flags RuSentiFrames #150
  • Compose Python Library #145
  • Sources -- provide local storage at home directory #144
  • Enum -- clarify enum34 package using instead of the enum. #143
  • OpinionCollectionsFormatter -- support to save/load only supported by label_formatter opinions #139
  • UnitTests -- gather all tests into single folder #125
  • BaseAnnotator -- initialize method is useless, as the passed parameters are required only by the serialize_missed_collections method. #123
  • NeutralAnnotator -- Rename to annotator, as neutral prefix is related to a specifics of the particular task #122
  • NeutralAnnot -- use a predefined template for names, based on labels count, instead of Name property #121
  • DefaultNeutralAlgo -- provide dist in sentence parameter #120
  • NeutralAnnot -- Two/Three scale annotators considered to be a part of the related experiment #119
  • Evaluation Metrics -- such functions considered to be a part of the particular experiment #115
  • Embedding -- set_stemmer method is not declared in base class #114
  • FrameVariantsCollection -- remove stemmer from __init__ params. #113
  • Bag (NeuralNetworks) -- label could be presented as uint. #110
  • experiment_rusentrel -- Group all folders by a single exp prefix #108
  • BaseModel -- Replace epochs_count parameter with generalized parameter structure. #107
  • OpinionCollection -- provide set of supported labels (opinion filtration by labels) #106
  • LabelCalculationMode -- make it enum #105
  • BaseModel -- replace epochs_count with model options #104
  • ThreeLabelsScaler -- remove dependencies of the latter in NeuralNetwork contrib #103
  • RuAttitudes -- use int_to_label function instead of label scaler #102
  • Labels -- Move Scaler into common/labels #101
  • Labels -- Provide unique labels for the particular experiment in contrib #100
  • Experiments -- reorganize rusentrel experiments data within the related new folder #97

Fixed bugs:

  • RuAttitudes-v1.2. -- fix downloading link #155
  • sources -- Remove data folder #149
  • Entity -- type could be None while there is no restriction for that #148
  • RuSentRelOpinionCollectionFormatter -- label could not be found during neural network training. #137
  • frame_variant -- label scaler receives NoLabel while experiment based on NeutralLabel #136
  • BaseEvaluator -- opinion labels might be incompatible with the one utilized in ResultEvaluator. #124

Closed issues:

  • UnitTests -- Run all unit tests via bash script #156
  • Remove release_notes.md file and move the related content into Releases descriptions. #146
  • Tutorial -- Clarify on how we perform optimization #90

AREkit-0.20.5

29 Jul 12:08

Release

Fixed:

  • Using custom check of duplicated opinions during OpinionCollection initialization.

Changes:

  • Speed-up and engine optimizations:
    • Optionally loading neutral annotator.
  • Multi-Instance networks: now we consider that the next appeared context always continues the prior one
    (check out multi-instance bags creation for details).
  • Shuffling in models is now performed for bags, not for bag groups.
  • Networks: added allow_growth=True flag for tensorflow-based neural networks.
    Memory fraction parameter has been removed.
The collection of parsed news is now detached from the text opinions collection.

  • News parsing now is assumed to be performed using TextParser.parse(news, options) call. Related refactoring.
    • Stemmer application from RuAttitudes parser has been removed.
  • Removed dependency from RelatedParsedNewCollection in TextOpinionCollection.
  • Labeling now separated from LinkedTextOpinion collection.
  • ParsedText class has been refactored, removed unused methods. Keep tokens has been discarded.
  • BERT tsv-format-encoders are now in a Factory (at contrib directory).
  • Fixed: RuSentRelTextOpinion replaced with TextOpinion, and independent from OpinionRef.
  • Single/Multi models no longer exist, as these prefixes affect only the selection of batch types. Refactoring.

AREkit-0.20.4

29 Jul 11:32
7e92387

Release Notes

  • Labels conversion to_str and from_str is now a part of external formatters (unique for each source, experiment, etc.).
  • Added labels-scaler; label casting (to int or uint) now depends on the scaler.
  • Added a BERT exporter in the contribution folder, with related formatters according to the [paper]:
    • NLI -- (Natural language inference) format, which provides an additional sentence describing
      the attitude that should be extracted.
    • QA -- (Question answering) format, which provides an additional question about the attitude sentiment.
      Labels are encoded in one of the following formats:
    • Multiple -- all the supported sentiment labels (positive, negative, neutral).
    • Binary -- (YES, NO) according to the mention (additional sentence), provided by NLI and QA formatters.
  • Refactored experiments in order to apply them to classifiers as well (models from scikit-learn).
  • Updated nn-engine API
  • Refactoring tf-based neural network implementation.
  • BERT is now moved into a separate folder from the contrib directory.
  • frame_variants moved to the frames directory.
  • Frame variants labeling in news is now performed during the parse operation.
  • DataType is now an enumeration. The list of supported data-types is now a part of the experiment.
    The latter were moved onto sample level.
  • Service folder removed, as the latter is assumed to be kept apart from this repository.
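
The NLI and QA formats with the Multiple and Binary label encodings listed above can be sketched as follows. The sentence and question templates, the `LABELS` list, and the function names are all assumptions made for illustration; the actual templates used by the exporter are not given in these notes.

```python
from typing import List, Tuple

# Sentiment labels named in the notes; the ordering is an assumption.
LABELS = ["positive", "negative", "neutral"]


def make_text_b(source: str, target: str, mode: str) -> str:
    """Compose the additional sentence (NLI) or question (QA); templates are hypothetical."""
    if mode == "nli":
        return f"attitude of {source} towards {target}"
    if mode == "qa":
        return f"What is the attitude of {source} towards {target}?"
    raise ValueError(f"unsupported mode: {mode}")


def multiple_row(source: str, target: str, label: str, mode: str) -> Tuple[str, str]:
    # Multiple: one row per pair, labeled with one of the sentiment labels.
    return make_text_b(source, target, mode), label


def binary_rows(source: str, target: str, label: str, mode: str) -> List[Tuple[str, str]]:
    # Binary: one row per candidate label, answered with YES or NO.
    rows = []
    for candidate in LABELS:
        if mode == "nli":
            text_b = f"attitude of {source} towards {target} is {candidate}"
        elif mode == "qa":
            text_b = f"Is the attitude of {source} towards {target} {candidate}?"
        else:
            raise ValueError(f"unsupported mode: {mode}")
        rows.append((text_b, "YES" if candidate == label else "NO"))
    return rows


qa_multiple = multiple_row("A", "B", "negative", mode="qa")
nli_binary = binary_rows("A", "B", "negative", mode="nli")
```

The Binary encoding triples the number of rows per pair, but turns the task into a yes/no decision for each candidate label.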