Releases: IBM/unitxt
Unitxt 1.16.4
What's Changed
- Fix bootstrap condition to handle cases with insufficient instances by @elronbandel in #1490
Unitxt 1.16.3
What's Changed
- Add Azure support and expand OpenAI model options in inference engine by @elronbandel in #1485
- Benjams/fix bioasq card by @BenjSz in #1486
- add separator to csv loader by @BenjSz in #1488
- Fix bug in metrics loading in tasks by @elronbandel in #1487
Unitxt 1.16.2
What's Changed
- extend choices arrangement functionality with ReorderableMultipleChoi… by @eliyahabba in #1464
- Add GPQA dataset by @elronbandel in #1474
- Add simple QA dataset by @elronbandel in #1475
- Add LongBench V2 dataset by @elronbandel in #1476
- Adding typed recipe test by @antonpibm in #1473
- Add place_correct_choice_position to set the correct choice index and… by @eliyahabba in #1481
- Add MapReduceMetric, a new base class to integrate all metrics into by @elronbandel in #1459
- Add multi document support and FRAMES benchmark by @elronbandel in #1477
New Contributors
- @eliyahabba made their first contribution in #1464
Unitxt 1.16.1
- Fix typing notation for python 3.8 by @elronbandel in #1453
- Instance_metric and apply_metric keep only one instance at a time in memory, at the expense of repeated passes over the input stream (two passes for instance_metric, one pass per metric for apply_metric) by @dafnapension in #1448
- simplify class parameter listing on web page by @dafnapension in #1454
- Bring code coverage tests back to life by @elronbandel in #1455
- Fix coverage tests by @elronbandel in #1456
- make demos_pool a local var rather than a separate stream by @dafnapension in #1436
- Add uppercase and last-non-empty-line processors by @antonpibm in #1458
- performance by bluebench by @dafnapension in #1457
- Add UNITXT_MOCK_INFERENCE_MODE environment variable to performance workflow by @elronbandel in #1461
- remove redundant lines from performance.yml by @dafnapension in #1462
- Benjams/add bioasq miniwiki datasets by @BenjSz in #1460
- Add SocialIQA dataset by @elronbandel in #1468
- Add parallelization to RITS inference by @arielge in #1441
- Fix the type handling for tasks to support string types by @elronbandel in #1470
1.16.0
Main Changes
What's Changed
Usability
- Add error message when saving artifacts that got changed by @elronbandel in #1417
- A simple way to create a dataset and evaluate it, given a 'task' in the catalog and a Python data structure by @yoavkatz in #1413
- Evaluation results class for easier access to results by @elronbandel in #1326
- Eval Assist integration by @martinscooper in #1409
Documentation
- Update to new logo by @elronbandel in #1427
- Fix indentation within docstrings to improve their appearance on web pages, eliminating two red lines from "make docs-server" along the way by @dafnapension in #1429
- Add catalog search with tags filtering by @elronbandel in #1430
- Update catalog search engine by @elronbandel in #1431
- Add custom titles to catalog items by @elronbandel in #1432
- Change card to dataset in the catalog search tags by @elronbandel in #1433
- Updated documentation to show use of the installed version and the chat API by @yoavkatz in #1435
- Fix documentation for task registration example by @Etelis in #1443
Bug Fixes
- fix mistral format used in llmaj (when not using chat_api) by @lilacheden in #1425
- Fix LMMSEval Inference Engine to work with chat api and fix examples by @elronbandel in #1440
- metadata is set only once in recipe by @dafnapension in #1437
- verify only fresh artifacts are fetched by @dafnapension in #1444
- Add data_classification_policy to ClapNQ by @BenjSz in #1451
CI/CD
- Eliminate errors for exceeding line_limit, and many red lines from "make docs-server" by @dafnapension in #1434
New Contributors
Full Changelog: 1.15.10...1.16.0
1.15.10
What's Changed
- Fix arenahard bluebench template by @perlitz in #1405
- Fixed formal types of infer() and also added runtime check by @yoavkatz in #1406
- Do not use "score" as a metric's main_score by @lilacheden in #1407
- Fix model strings for Llama 3 on Together AI by @yifanmai in #1411
- Adjust binary llmaj to new engines and add rits support by @lilacheden in #1408
- Granite Guardian RAG metrics by @arielge in #1393
- Solved many red lines in 'make docs-server' by @dafnapension in #1418
- Fix artifact dict assignment bug by @elronbandel in #1419
- Remove top-level imports from the Guardian metric (as they add dependencies to unitxt) by @elronbandel in #1420
- Make types compatible with python 3.8 by @elronbandel in #1423
- Benjams/loaders fix separator by @BenjSz in #1424
- Update version to 1.15.10 by @elronbandel in #1426
Full Changelog: 1.15.9...1.15.10
Unitxt 1.15.9
Main changes
- Artifacts in the catalog can now be links to other artifacts and can also be marked deprecated.
What's Changed
- artifact link by @dafnapension in #1363
- Add processors also as operators by @antonpibm in #1397
- Added 'add_link_to_catalog' for easily adding artifact links with or without a deprecation message by @dafnapension in #1398
- Safety updates by @bnayahu in #1391
- Reduce error message clutter by @yoavkatz in #1401
- Update version to 1.15.9 by @yoavkatz in #1404
Full Changelog: 1.15.8...1.15.9
Unitxt 1.15.8
Main changes
- Added support for the RITS inference engine.
Inference Engines
- Add inference engines to the catalog by @martinscooper in #1394
- Add support for OpenAI custom base url and default headers + RITS Inference engine by @martinscooper in #1385
Assets
- Add vectara's hhem2.1 faithfulness model as a metric by @lilacheden in #1382
Bug Fixes
- fix template in Arena Hard card and example by @OfirArviv in #1390
Full Changelog: 1.15.7...1.15.8
1.15.7
Assets
- add llama-3-405b-instruct wml classification engine by @lilacheden in #1383
Usability
- Support MetricsList to store a list of metrics by @lilacheden in #1379
Bug fixes
- Made sure null augmentor works as expected by @yoavkatz in #1381
- Fixes and improvements to task based llm as judge by @lilacheden in #1366
- Fix package dir in settings by @yoavkatz in #1387
Documentation
- Fix typos in the rst files by @dafnapension in #1380
- Chat api blog post by @elronbandel in #1371
Inference Engine
- Tests and minor changes to GenAI, WML and HF inference engines by @pawelknes in #1290
Full Changelog: 1.15.6...1.15.7
Unitxt 1.15.6 - Chat Inference
Main changes
- Added support for generating output in ChatAPI format (user/assistant turns) and for inference engines to process ChatAPI input. See details in the blog.
- Improved catalog browsing experience with cleaner formatting of catalog assets, and clickable hyperlinks between catalog assets and between catalog assets and code. See for example.
New Features
Inference Engines that support the ChatAPI interface
- Add target_prefix erasing post processor by default by @elronbandel in #1361
- Add multi api inference engine by @elronbandel in #1343
- Add chat api format with standard open ai chat format by @elronbandel in #1314
- Add option selecting huggingface inference engine by @elronbandel in #1357
Improved multi-modal support
- Add seed bench dataset and support for videos by @elronbandel in #1309
- Add LMMSEvalInferenceEngine by @elronbandel in #1301
- Vision robustness blog by @elronbandel in #1318
New Assets
- added QTSUMM taskcard for query-focused table summarization task by @csrajmohan in #1304
- Add OptionSelectingByLogProbsInferenceEngine by @martinscooper in #1317
- Replace 20 newsgroup with a shorter version in bluebench by @perlitz in #1347
- Bluebench Update by @perlitz in #1342
- Update Blue Bench description by @elronbandel in #1354
- Batched multi class classification by @yoavkatz in #1340
- move rag binary llmaj under rag metrics by @lilacheden in #1338
- adding generic inference binary+idk judges by @Roni-Friedman in #1316
- Add table augmentors by @elronbandel in #1328
- Align augmenters with task and types mechanisms by @elronbandel in #1356
- add serializers to catalog + new table operators by @ShirApp in #1365
Performance
- Add loaders cache by @elronbandel in #1333
Usability
- Allow turning single stream to dataset by @elronbandel in #1335
Documentation
- Add ability to load_dataset without a template for simpler usage for beginners by @elronbandel in #1350
- add score name prefix for judge_raw_output/input in llmaj metric by @OfirArviv in #1323
- Add link to source in catalog assets by @elronbandel in #1362
- Fix docs compilation and links from docs to github by @elronbandel in #1359
- Fix website docs-code links by @elronbandel in #1360
- Update error checking and documentation of processors by @yoavkatz in #1325
- Unified catalog terminology by @yoavkatz in #1355
- Improved documentation formatting by @dafnapension in #1334
- Fix catalog links by @elronbandel in #1348
- Print catalog entries as yamls by @dafnapension in #1351
CI/CD
- A more elaborate message from performance-test-summary, and a docstring for card_profiler by @dafnapension in #1307
- Make package requirements compatible with requirements.txt like format by @elronbandel in #1310
- Make inference engine tests run only when inference.py has changed by @elronbandel in #1311
- Separate examples tests by @elronbandel in #1322
- Fix pyproject.toml to be standalone and comply with modern standards by @elronbandel in #1324
- Fix GitHub Actions concurrent execution by @elronbandel in #1349
- Make tests faster and clearer by @dafnapension in #1345
New Contributors
- @martinscooper made their first contribution in #1317
Full Changelog: 1.14.1...1.15.6