John Snow Labs NLP Test 1.2.0: Announcing Support for Cohere, AI21, Azure OpenAI and Hugging Face Inference API
Overview
NLP Test 1.2.0 adds support for testing Cohere, AI21, Hugging Face Inference API and Azure OpenAI LLMs with robustness, bias, accuracy and representation tests on the BoolQ and Natural Questions datasets, along with many other enhancements and bug fixes!
A big thank you to our early-stage community for their contributions, feedback, questions, and feature requests.
Make sure to give the project a star right here.
New Features & Enhancements
- Adding support for 4 new LLM APIs for the Question Answering task #388
- Adding support for bias tests for testing LLMs on Question Answering #404
- Adding support for representation tests for testing LLMs on Question Answering #405
- Adding support for accuracy tests for testing LLMs on Question Answering #394
- Adding a new robustness test called number_to_word #377
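To give a feel for what a number_to_word style perturbation does, here is a minimal sketch that spells out standalone integers in a sentence. It is an illustration only and not the library's implementation: the helper names `int_to_words` and `number_to_word`, the one-million cutoff, and the choice to leave decimals and comma-grouped numbers untouched are all assumptions made for this example.

```python
import re

# Illustrative sketch only; names and behavior here are assumptions,
# not the nlptest implementation of the number_to_word test.
_ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty",
         "sixty", "seventy", "eighty", "ninety"]


def int_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999_999."""
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return _TENS[tens] + ("-" + _ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return _ONES[hundreds] + " hundred" + (" " + int_to_words(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    return int_to_words(thousands) + " thousand" + (" " + int_to_words(rest) if rest else "")


# Match standalone integers; skip digits that are part of a word,
# a decimal such as 3.5, or a comma-grouped number such as 1,000.
_INT_PATTERN = re.compile(r"(?<![\w.,])\d+(?!\w|[.,]\d)")


def number_to_word(text: str) -> str:
    """Replace standalone integers below one million with their spelled-out form."""
    def _replace(match: re.Match) -> str:
        value = int(match.group())
        return int_to_words(value) if value < 1_000_000 else match.group()

    return _INT_PATTERN.sub(_replace, text)


print(number_to_word("I have 3 apples and 21 pears"))
```

A robustness test of this kind would then compare the model's answer on the original sentence with its answer on the perturbed one.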
Bug Fixes
- Fixed bias tests to enable multi-token name replacements #400
- Fixed issue in ethnicity/religion-names #393
- Fixed issue in default HF text classification model #402
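The multi-token name replacement fix (#400) concerns names that span more than one token. As a rough illustration of why this needs care, the sketch below swaps names using a single regular expression that tries longer names first, so that a two-word name is not partially replaced by a shorter one. The function `replace_names` and the example names are hypothetical and do not reflect how the library implements its bias tests.

```python
import re
from typing import Dict


def replace_names(text: str, mapping: Dict[str, str]) -> str:
    """Replace each name in `mapping` with its substitute.

    Hypothetical helper for illustration; not part of nlptest.
    """
    if not mapping:
        return text
    # Longest names first so "Mary Ann" is matched before "Mary".
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in names) + r")\b"
    )
    return pattern.sub(lambda match: mapping[match.group()], text)


# Example names are made up for this sketch.
swaps = {"Mary": "Aisha", "Mary Ann": "Noor Jahan"}
print(replace_names("Mary Ann met Mary.", swaps))
```

Doing the replacement in one pass also avoids re-replacing text that an earlier substitution has just inserted.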
How to Use
Get started now!
pip install nlptest
Create your test harness in 3 lines of code
# Set the OpenAI API key (replace the empty string with your own key)
import os
os.environ['OPENAI_API_KEY'] = ''
# Import and create a Harness object
from nlptest import Harness
h = Harness(task='question-answering', model='gpt-3.5-turbo', hub='openai', data='BoolQ-test', config='config.yml')
# Generate test cases, run them and view a report
h.generate().run().report()
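Since the snippet above sets OPENAI_API_KEY to an empty placeholder, it can help to fail early with a clear message when the key has not been filled in. The following is a minimal sketch using only the Python standard library; `require_env` is a hypothetical helper and not part of nlptest.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank.

    Hypothetical helper for illustration; not part of nlptest.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set or is empty")
    return value


# Possible usage before creating the Harness:
# require_env("OPENAI_API_KEY")
```

Calling it before creating the Harness surfaces a missing key immediately, rather than at the first API request.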
Documentation
Community support
- Slack: For live discussion with the NLP Test community, join the #nlptest channel
- GitHub: For bug reports, feature requests, and contributions
- Discussions: To engage with other community members, share ideas, and show off how you use NLP Test!
We would love to have you join the mission: open an issue, a PR, or give us some feedback on features you'd like to see!
Changelog
What's Changed
- fix/task test supoort check by @alytarik in #378
- Add boolq dev dataset by @alytarik in #390
- Issue 374 add representation tests by @ArshaanNazir in #381
- Issue in ethnicity religion names by @ArshaanNazir in #393
- Feature: Add representation tests for LLMs by @ArshaanNazir in #405
- Fix: default HF text classification model issue by @chakravarthik27 in #402
- Feature: Add support for bias tests for question answering by @ArshaanNazir in #404
- Chore: Adding supported hubs as logos to landing page by @luca-martial in #403
- Fix/bias_tests Enable multi-token name replacements by @ArshaanNazir in #400
- Feature: Add support for number to words robustness test by @RakshitKhajuria in #377
- Feature: Adding support for 4 new LLM APIs by @chakravarthik27 in #388
- DRAFT: Feature/accuracy for qa task by @alytarik in #394
- fix typo and order of columns by @alytarik in #406
- Fix/llm accuracy bug fix by @alytarik in #407
- Fix prompt template llm and transformer version by @ArshaanNazir in #408
- added number_to_words test to robustness nb by @RakshitKhajuria in #410
- notebooks and default_config paths updated. by @chakravarthik27 in #411
- Fix: switch default HF classifier dataset from tweet to imdb by @luca-martial in #409
- Chore: Website updates for new LLMs and pages by @luca-martial in #401
- Release/1.2.0 by @ArshaanNazir in #415
New Contributors
- @RakshitKhajuria made their first contribution in #377
Full Changelog: v1.1.0...v1.2.0