v2.4.0 #213

Merged on Jul 18, 2024 (58 commits)
Commits
3155ea3
Bump codecov/codecov-action from 3 to 4
dependabot[bot] Feb 1, 2024
bda5217
update pre-commit hooks and reformat code
nobu-g Feb 1, 2024
bdb34b7
Merge pull request #203 from ku-nlp/dependabot/github_actions/dev/cod…
nobu-g Feb 1, 2024
1f11d76
eliminate the escape of hydra variable interpolation: ${}
nobu-g Feb 1, 2024
26f008a
fix configs
nobu-g Feb 5, 2024
6f2c66c
fix analyze.py
nobu-g Feb 5, 2024
e7659b5
fix predict.py
nobu-g Feb 5, 2024
a3cb37c
refactor benchmark.sh
nobu-g Feb 5, 2024
f9d5779
Merge pull request #204 from ku-nlp/eliminate-escape
omukazu Feb 8, 2024
ed3057b
refactor word_normalization.py
omukazu Apr 6, 2024
320b4dc
make char module accept whitespace
omukazu Apr 7, 2024
19da2f5
fix bug
omukazu Apr 7, 2024
a104b51
make typo module accept whitespace
omukazu Apr 13, 2024
010dd8a
make word module accept whitespace
omukazu Apr 13, 2024
e0bb34e
add preprocess_reading.py
omukazu Apr 13, 2024
66b1ffb
tweak
omukazu Apr 13, 2024
9532593
fix bug
omukazu Apr 13, 2024
919310d
fix tests
omukazu Apr 13, 2024
2516c39
fix test workflow
nobu-g Apr 15, 2024
a6e091c
update deps
nobu-g Apr 15, 2024
ec652c5
disable caching for poetry install
nobu-g Apr 15, 2024
92077e1
fix test workflow
nobu-g Apr 15, 2024
40141a0
Bump softprops/action-gh-release from 1 to 2
dependabot[bot] Apr 15, 2024
2e0fda1
Merge remote-tracking branch 'origin/dev' into add-whitespace-token
omukazu Apr 15, 2024
043d883
tweak
omukazu Apr 20, 2024
054ee32
fix bug
omukazu Apr 20, 2024
049c232
Merge pull request #206 from ku-nlp/dependabot/github_actions/dev/sof…
nobu-g May 1, 2024
68ba9f4
tweak
omukazu May 2, 2024
b0d7654
add control characters
omukazu May 9, 2024
a658b0c
remove full-width space and triple dot tokens from seq2seq module
omukazu May 14, 2024
78a11ab
Merge remote-tracking branch 'origin/dev' into add-whitespace-token
omukazu Jun 4, 2024
d169be8
tweak
omukazu Jun 5, 2024
5c91560
fix test
omukazu Jun 6, 2024
1fd4946
refactor seq2seq
omukazu Jun 24, 2024
14b32ed
fix test
omukazu Jun 24, 2024
be64207
tweak
omukazu Jun 25, 2024
758dd42
tweak
omukazu Jun 26, 2024
4bca62b
fix bug
omukazu Jun 27, 2024
a9f814a
Bump the dependencies group across 1 directory with 3 updates
dependabot[bot] Jul 1, 2024
88a9274
tweak
omukazu Jul 1, 2024
e9a45d0
tweak
omukazu Jul 8, 2024
b7f5474
Merge pull request #211 from ku-nlp/add-whitespace-token
omukazu Jul 8, 2024
758960f
Merge pull request #210 from ku-nlp/dependabot/pip/dev/dependencies-5…
omukazu Jul 16, 2024
5a1d26e
update dependencies
omukazu Jul 16, 2024
95450d4
Revert "update dependencies"
omukazu Jul 16, 2024
4ea0128
update setuptools to >= 70.0.0
omukazu Jul 17, 2024
ed1bacb
update zipp to >= 3.19.1
omukazu Jul 17, 2024
199e81a
update scikit-learn to >= 1.5.0
omukazu Jul 17, 2024
103ac3b
update urllib3 to >= 2.2.2
omukazu Jul 17, 2024
c460a00
update requests to >= 2.32.0
omukazu Jul 17, 2024
a8f5f96
update Jinja2 to >= 3.1.4
omukazu Jul 17, 2024
3f69bda
update certifi to >= 2024.07.04
omukazu Jul 17, 2024
9c0ebbc
update tqdm to >= 4.66.3
omukazu Jul 17, 2024
35dce14
fix bug
omukazu Jul 17, 2024
c55f36a
set default value of num_beams to 1
omukazu Jul 18, 2024
c78cb10
tweak
omukazu Jul 18, 2024
a338e9c
tweak
omukazu Jul 18, 2024
7ded0e0
bump version to 2.4.0
omukazu Jul 18, 2024
2 changes: 1 addition & 1 deletion .github/workflows/build.yml
@@ -29,7 +29,7 @@ jobs:
  POETRY_PYTHON: ${{ steps.setup-python.outputs.python-path }}
  run: |
  poetry env use $POETRY_PYTHON
- poetry install --no-interaction --without dev,test
+ poetry install --no-interaction --without dev,test --no-cache
  - name: Build KWJA
  run: |
  poetry build
2 changes: 1 addition & 1 deletion .github/workflows/publish.yml
@@ -18,7 +18,7 @@ jobs:
  - name: Add path for Python packages
  run: echo "$HOME/.local/bin" >> $GITHUB_PATH
  - name: Install dependencies
- run: poetry install --no-interaction --only main
+ run: poetry install --no-interaction --only main --no-cache
  - name: Build package
  run: poetry build

2 changes: 1 addition & 1 deletion .github/workflows/release.yml
@@ -14,7 +14,7 @@ jobs:
  uses: actions/checkout@v4
  - name: Create Release
  id: create_release
- uses: softprops/action-gh-release@v1
+ uses: softprops/action-gh-release@v2
  if: startsWith(github.ref, 'refs/tags/')
  with:
  body: |
7 changes: 2 additions & 5 deletions .github/workflows/test.yml
@@ -24,15 +24,12 @@ jobs:
  echo "$HOME/.local/bin" >> $GITHUB_PATH
  - name: Install dependencies
  run: |
- poetry config virtualenvs.create false
- poetry install --no-interaction --without dev
+ poetry install --no-interaction --without dev --no-cache
  - name: Run tests
  run: |
  poetry run pytest --cov=./ --cov-report=xml
- env:
- XDG_CACHE_HOME: ${{ github.workspace }}/.cache
  - name: Upload coverage to Codecov
- uses: codecov/codecov-action@v3
+ uses: codecov/codecov-action@v4
  with:
  token: ${{ secrets.CODECOV_TOKEN }}
  files: ./coverage.xml
18 changes: 9 additions & 9 deletions .pre-commit-config.yaml
@@ -2,42 +2,42 @@ default_language_version:
  python: python3.10
  repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
- rev: v4.5.0
+ rev: v4.6.0
  hooks:
  - id: end-of-file-fixer
  - id: trailing-whitespace
  - id: check-yaml
  - id: check-toml
  - repo: https://github.com/psf/black-pre-commit-mirror
- rev: 23.11.0
+ rev: 24.4.0
  hooks:
  - id: black
  - repo: https://github.com/PyCQA/flake8
- rev: 6.1.0
+ rev: 7.0.0
  hooks:
  - id: flake8
  additional_dependencies: [Flake8-pyproject]
  - repo: https://github.com/PyCQA/isort
- rev: 5.12.0
+ rev: 5.13.2
  hooks:
  - id: isort
  - repo: https://github.com/pre-commit/mirrors-mypy
- rev: v1.7.0
+ rev: v1.9.0
  hooks:
  - id: mypy
  additional_dependencies:
  - rhoknp==1.6.0
  - hydra-core==1.3.2
- - torch==2.1.1
- - torchmetrics==1.2.0
- - transformers==4.34.1
+ - torch==2.2.0
+ - torchmetrics==1.3.0
+ - transformers==4.38.2
  - tokenizers
  - wandb
  - typer
  - types-PyYAML
  - cohesion-tools==0.5.7
  - repo: https://github.com/asottile/pyupgrade
- rev: v3.15.0
+ rev: v3.15.2
  hooks:
  - id: pyupgrade
  args:
11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

  ## [Unreleased]

+ ## [v2.4.0] - 2024-07-18
+ ### Added
+ - Introduce a special token to handle whitespaces as they are.
+
+ ### Changed
+ - Set default value of num_beams to 1.
+ - Refactor seq2seq module to make it slightly faster.
+
+ ### Removed
+ - Remove normalization of whitespaces to "␣".
+
  ## [v2.3.0] - 2024-02-01
  ### Added
  - Support Python 3.12.
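Reviewer note on the whitespace entries above: the toy sketch below illustrates why registering whitespace as a token lets text round-trip unchanged, where a vocabulary without such an entry cannot represent it. Everything in it (the `build_vocab`, `encode`, and `decode` helpers, the tiny vocabulary, the `[UNK]` fallback) is hypothetical and is not KWJA's tokenizer; it only assumes that the added `special_tokens` entry is the whitespace character shown in the configs.

```python
# Toy character-level tokenizer. All names and the vocabulary are hypothetical.
WHITESPACE_TOKEN = " "  # assumed to match the entry added to special_tokens


def build_vocab(chars, special_tokens):
    """Assign ids to special tokens first, then to ordinary characters."""
    vocab = {"[UNK]": 0}
    for token in [*special_tokens, *chars]:
        vocab.setdefault(token, len(vocab))
    return vocab


def encode(text, vocab):
    """Map each character to its id, falling back to [UNK]."""
    return [vocab.get(ch, vocab["[UNK]"]) for ch in text]


def decode(ids, vocab):
    """Map ids back to their token strings and join them."""
    inverse = {i: token for token, i in vocab.items()}
    return "".join(inverse[i] for i in ids)


vocab_without = build_vocab("今日は晴れ", special_tokens=[])
vocab_with = build_vocab("今日は晴れ", special_tokens=[WHITESPACE_TOKEN])

text = "今日は 晴れ"
lossy = decode(encode(text, vocab_without), vocab_without)   # whitespace is lost
lossless = decode(encode(text, vocab_with), vocab_with)      # whitespace survives
```

With the whitespace token registered, `lossless` equals the input text; without it, the space falls back to `[UNK]` in this toy setup.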
1 change: 1 addition & 0 deletions configs/char_module.debug.yaml
@@ -41,6 +41,7 @@ compile: ${oc.env:COMPILE,false}
  ignore_hparams_on_save: false

  # constants
+ special_tokens: [" "]
  hparams_to_ignore_on_save:
  - project
  - work_dir
1 change: 1 addition & 0 deletions configs/char_module.yaml
@@ -41,6 +41,7 @@ compile: ${oc.env:COMPILE,false}
  ignore_hparams_on_save: false

  # constants
+ special_tokens: [" "]
  hparams_to_ignore_on_save:
  - project
  - work_dir
1 change: 1 addition & 0 deletions configs/datamodule/base/char.yaml
@@ -4,5 +4,6 @@ denormalize_probability: ${denormalize_probability}
  tokenizer:
  _target_: transformers.AutoTokenizer.from_pretrained
  pretrained_model_name_or_path: ${encoder.pretrained_model_name_or_path}
+ additional_special_tokens: ${special_tokens}
  do_word_tokenize: false
  _convert_: all
4 changes: 3 additions & 1 deletion configs/datamodule/predict/char_inference.yaml
@@ -3,5 +3,7 @@ defaults:
  - _self_

  _target_: kwja.datamodule.datasets.CharInferenceDataset
- texts: []
  doc_id_prefix: null
+ # texts and raw_text_file are mutually exclusive
+ texts: []
+ raw_text_file: null
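Reviewer note: the config comment says `texts` and `raw_text_file` are mutually exclusive. A minimal sketch of what such a check might look like follows. The `load_texts` helper is hypothetical, and reading the file line by line is an assumption; the diff does not show how `CharInferenceDataset` actually consumes `raw_text_file`.

```python
from pathlib import Path
from typing import List, Optional


def load_texts(texts: Optional[List[str]] = None, raw_text_file: Optional[Path] = None) -> List[str]:
    """Return input texts from exactly one of the two sources (hypothetical helper)."""
    texts = texts or []
    if texts and raw_text_file is not None:
        # Both sources given: refuse rather than silently pick one.
        raise ValueError("texts and raw_text_file are mutually exclusive")
    if raw_text_file is not None:
        # Assumption: one input text per line.
        return Path(raw_text_file).read_text(encoding="utf-8").splitlines()
    return list(texts)
```

The defaults in the config (`texts: []`, `raw_text_file: null`) both count as "not provided" in this sketch, so leaving them untouched returns an empty list.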
3 changes: 1 addition & 2 deletions configs/datamodule/predict/seq2seq_inference.yaml
@@ -3,5 +3,4 @@ defaults:
  - _self_

  _target_: kwja.datamodule.datasets.Seq2SeqInferenceDataset
- texts: []
- doc_id_prefix: null
+ juman_file: null
2 changes: 2 additions & 0 deletions configs/datamodule/predict/typo_inference.yaml
@@ -3,4 +3,6 @@ defaults:
  - _self_

  _target_: kwja.datamodule.datasets.TypoInferenceDataset
+ # texts and raw_text_file are mutually exclusive
  texts: []
+ raw_text_file: null
6 changes: 3 additions & 3 deletions configs/eval.yaml
@@ -25,7 +25,7 @@ dependency_topk: 4
  discourse_threshold: 0.0

  # environment dependent settings
- num_workers: -1
+ num_workers: ${oc.env:NUM_WORKERS,0}
  devices: ${oc.env:DEVICES,1}
- max_batches_per_device: 4
- compile: false
+ max_batches_per_device: ${oc.env:MAX_BATCHES_PER_DEVICE,2}
+ compile: ${oc.env:COMPILE,false}
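Reviewer note: the new `${oc.env:NAME,default}` values read an environment variable and fall back to the given default. The sketch below is a rough standard-library analogue of that lookup order, not OmegaConf's implementation. The `env_or_default` and `eval_settings` helpers are hypothetical, and the explicit `int`/`bool` conversions reflect the assumption that the resolver hands back strings.

```python
import os

_MISSING = object()  # sentinel: distinguishes "no default given" from a None default


def env_or_default(name, default=_MISSING, environ=os.environ):
    """Environment variable if set, otherwise the default (as a string)."""
    if name in environ:
        return environ[name]
    if default is _MISSING:
        raise KeyError(f"environment variable {name!r} is not set")
    return None if default is None else str(default)


def eval_settings(environ=os.environ):
    """Resolve the environment-dependent settings changed in configs/eval.yaml."""
    return {
        "num_workers": int(env_or_default("NUM_WORKERS", 0, environ)),
        "max_batches_per_device": int(env_or_default("MAX_BATCHES_PER_DEVICE", 2, environ)),
        "compile": env_or_default("COMPILE", "false", environ).lower() == "true",
    }
```

For example, running evaluation with `NUM_WORKERS=4` in the environment would yield `num_workers == 4` here, while an empty environment falls back to the defaults from the diff (0, 2, false).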
4 changes: 2 additions & 2 deletions configs/seq2seq_module.debug.yaml
@@ -16,10 +16,10 @@ do_predict_after_train: false
  checkpoint_path: ""

  # For decoding settings
- use_forced_decoding: true
+ use_surf_forced_decoding: true
  decoding:
  max_length: ${max_tgt_length}
- num_beams: 3
+ num_beams: 1

  # set monitor and mode for early_stopping and model_checkpoint
  monitor: valid/seq2seq_loss
4 changes: 2 additions & 2 deletions configs/seq2seq_module.yaml
@@ -16,10 +16,10 @@ do_predict_after_train: false
  checkpoint_path: ""

  # For decoding settings
- use_forced_decoding: true
+ use_surf_forced_decoding: true
  decoding:
  max_length: ${max_tgt_length}
- num_beams: 3
+ num_beams: 1

  # set monitor and mode for early_stopping and model_checkpoint
  monitor: valid/seq2seq_loss
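Reviewer note on `num_beams: 3` becoming `num_beams: 1`: with a single beam, beam search keeps only the best continuation at each step, which is greedy decoding. It is cheaper, but it can miss a sequence with a higher overall score. The toy sketch below shows the difference. The `LOG_PROBS` table and the `beam_search` function are hypothetical illustrations, not the seq2seq module's decoder.

```python
import math

# Hypothetical toy model: next-token log-probabilities depend only on the last token.
LOG_PROBS = {
    "<s>": {"a": math.log(0.6), "b": math.log(0.4)},
    "a": {"x": math.log(0.55), "y": math.log(0.45)},
    "b": {"x": math.log(0.9), "y": math.log(0.1)},
}


def beam_search(num_beams, steps=2):
    """Return the best sequence found when keeping `num_beams` hypotheses per step."""
    beams = [(0.0, ["<s>"])]  # (cumulative log-probability, tokens so far)
    for _ in range(steps):
        candidates = [
            (score + log_prob, seq + [token])
            for score, seq in beams
            for token, log_prob in LOG_PROBS[seq[-1]].items()
        ]
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:num_beams]  # prune to the beam width
    return beams[0][1][1:]  # drop the start symbol


greedy = beam_search(num_beams=1)  # commits to "a" first, ends at probability 0.33
wider = beam_search(num_beams=2)   # keeps "b" alive, ends at probability 0.36
```

In this toy table the single-beam search returns `["a", "x"]` and the two-beam search returns `["b", "x"]`, so the new default trades a little search quality for speed.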
2 changes: 1 addition & 1 deletion configs/typo_module.debug.yaml
@@ -35,7 +35,7 @@ compile: ${oc.env:COMPILE,false}
  ignore_hparams_on_save: false

  # constants
- special_tokens: ["<k>", "<d>", "<_>", "<dummy>"]
+ special_tokens: ["<k>", "<d>", "<_>", "<dummy>", " "]
  hparams_to_ignore_on_save:
  - project
  - work_dir
2 changes: 1 addition & 1 deletion configs/typo_module.yaml
@@ -36,7 +36,7 @@ compile: ${oc.env:COMPILE,false}
  ignore_hparams_on_save: false

  # constants
- special_tokens: ["<k>", "<d>", "<_>", "<dummy>"]
+ special_tokens: ["<k>", "<d>", "<_>", "<dummy>", " "]
  hparams_to_ignore_on_save:
  - project
  - work_dir
2 changes: 1 addition & 1 deletion configs/word_module.debug.yaml
@@ -66,7 +66,7 @@ compile: ${oc.env:COMPILE,false}
  ignore_hparams_on_save: false

  # constants
- special_tokens: ["[著者]", "[読者]", "[不特定:人]", "[不特定:物]", "[NULL]", "[NA]", "[ROOT]"]
+ special_tokens: ["[著者]", "[読者]", "[不特定:人]", "[不特定:物]", "[NULL]", "[NA]", "[ROOT]", " "]
  hparams_to_ignore_on_save:
  - project
  - work_dir
2 changes: 1 addition & 1 deletion configs/word_module.yaml
@@ -66,7 +66,7 @@ compile: ${oc.env:COMPILE,false}
  ignore_hparams_on_save: false

  # constants
- special_tokens: ["[著者]", "[読者]", "[不特定:人]", "[不特定:物]", "[NULL]", "[NA]", "[ROOT]"]
+ special_tokens: ["[著者]", "[読者]", "[不特定:人]", "[不特定:物]", "[NULL]", "[NA]", "[ROOT]", " "]
  hparams_to_ignore_on_save:
  - project
  - work_dir