What's New in Baleen 2.7.0
Baleen 2.7.0 was released on 8th May 2019. The release notes for v2.7.0 list all the changes in this release; this page provides additional examples and information on the significant changes.
This enables content extractors to use Baleen Resources, which was not previously possible.
This is a potentially breaking change for third-party collection readers that depend specifically on a content extractor.
The YAML pipeline syntax has also changed, but in this case backwards compatibility has been maintained, so existing pipeline YAML files should still be valid. For example, the following two YAML files are equivalent and valid:
collectionreader:
  class: FolderReader
  folders: input
contentextractor: TikaContentExtractor

collectionreader:
  class: FolderReader
  folders: input
  contentExtractor: TikaContentExtractor
Using the new syntax, content extractor parameters may be specified as follows:
contentextractor:
  class: CsvContentExtractor
  contentColumn: 2
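To show where this block sits in a whole pipeline file, the sketch below places it alongside the collection reader from the earlier example. Pairing FolderReader with CsvContentExtractor is an assumption made purely for illustration; the class names and parameters are the ones already used on this page, and the annotators and consumers sections are omitted.

```yaml
# Illustrative sketch only: combines snippets shown elsewhere on this page.
collectionreader:
  class: FolderReader
  folders: input
# New-style top-level content extractor, with its own parameters
contentextractor:
  class: CsvContentExtractor
  contentColumn: 2
```

Keeping the content extractor at the top level means its parameters live in their own map rather than being mixed in with the collection reader's parameters.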
Baleen 2.6.0 introduced the ability to include common sets of pipeline entities from a nested YAML file. For example, the following YAML file would include a list of annotators:
annotators:
- include: path/to/my/annotators.yml
It is now also possible to include an entire map section, for example:
include: collectionreader.yml
annotators:
- insert (or include) annotators here
consumers:
- insert (or include) consumers here
where collectionreader.yml contains
collectionreader:
  class: FolderReader
  folders: input
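As a rough illustration, and assuming the included map is simply merged into the top level of the including file, the pipeline above would be equivalent to the following single file. This is a sketch of the expected result, not output produced by Baleen.

```yaml
# Assumed effective pipeline once "include: collectionreader.yml" is resolved
collectionreader:
  class: FolderReader
  folders: input
annotators:
- insert (or include) annotators here
consumers:
- insert (or include) consumers here
```

Splitting the collection reader into its own file in this way would let several pipelines share one input configuration while defining their own annotators and consumers.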