Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes leaked semaphores #2125

Merged
merged 1 commit into from
Jul 12, 2024
Merged

Fixes leaked semaphores #2125

merged 1 commit into from
Jul 12, 2024

Conversation

thewhaleking
Copy link
Contributor

Fixes leaked semaphores in POW solving by correctly handling the shutdown/joining of multiprocessing objects.

@thewhaleking thewhaleking requested a review from a team July 12, 2024 16:04
@thewhaleking thewhaleking merged commit c8fde9d into staging Jul 12, 2024
31 checks passed
@thewhaleking thewhaleking deleted the fix/leaked-semaphores branch July 12, 2024 17:31
@thewhaleking thewhaleking linked an issue Jul 12, 2024 that may be closed by this pull request
1 task
Rapiiidooo pushed a commit to Rapiiidooo/forked-bittensor that referenced this pull request Jul 15, 2024
Fixes leaked semaphores in POW solving by correctly handling the shutdown/joining of multiprocessing objects.
unconst pushed a commit that referenced this pull request Aug 16, 2024
* chore:lint

* Fix LA test

* Skip faucet test

* Skip faucet test

* Prevent E2E tests run on drafts and non_prs.

* Prevent E2E tests run on drafts and non_prs.

* init commit

* Add child singular cli done

* Set Child Hotkey

* Set children multiple command.

* Remove merge conflict artifacts

* Add GetChildrenHotkeysCommand

* Add RevokeChildHotkey

* lint

* Revoke children multiple command

* lint

* CLI revoke multiple

* lint

* Imports

* lint

* lint

* imports in init

* fix e2e ci

* fix e2e ci - update

* update if statement

* update if statement

* Fix Faucet and fastblocks

* Refactor to follow new pattern

* U64 for all proportion values

* lint

* debug

* Clean up a bit.

* Clean up a bit.

* u16 to u64

* un-refactor

* APY + other updates

* lint

* lint

* Extract Method, and params

* Pr comments

* feat: enhances dendrite error messages

* add timeout and change pattern

* feat: enhances dendrite error messages

* Merge pull request #2012 from opentensor/feature/gus/liquid-alpha-params

Liquid alpha

* ci: adds release steps

* ci: replace CircleCi with GH workflow

* Re-adds check_coldkey_swap command (#2126)

* Inital commit: Readds check-coldkey-swap

* Adds check_coldkey_swap command

* ci: github workflow release

* Bumps version to 7.3.0

* Fixes leaked semaphores (#2125)

Fixes leaked semaphores in POW solving by correctly handling the shutdown/joining of multiprocessing objects.

* Adds check_pr_status

* Make check_pr_status.sh executable

* Fixes ruff checks

* Updated e2e yaml file

* ci: add branch regex

* ci: add hotfix regex

* chore: update changelog

* update e2e test

* ruff

* ci: rename release workflow file

* ci: adding release.yml

* chore: add doc string

* Merge master into staging

* fix: coldkeypub usage instead of coldkey for arbitration_stats

* test: fix mocksubtensor query previous blocks

* Removes extra no_prompts in commands

* Adds timeout for e2e tests

* fix: updates test_axon verify body async tests

* Adds e2e for metagraph command + subtensor.metagraph

* Adds E2E for wallet creations

* Added doc strings and temp variable for testing

* Temp: Added capsys before each command

* Changed pattern to splitting output before matching regex

* ci: removes deprecated release scripts and adds docker release workflow

* Adds test for wallet regenerations + fixes input bug for regen hotkey

* Bumps setuptools~=70.0.0

* bump pysub to 1.7.9+

* bump scale-codec to 1.2.10

* bump scale-codec to 1.2.11

* Removed monkey-patch of py-scale-codec which hasn't been necessary since v1.2.9. Removed comments regarding verification of the package upgrades because of this patch.

* ruff

* fix streaming synapse regression

* Fixed type annotations and doc strings

* Adds docstring to test_wallet_regen test

* Fix naming convention of swap hotkey test

* set streaming response headers

* Removes wait_epoch

* Merge pull request #2156 from opentensor/feat/ledger-integration

[Ledger Integration] [Feature] bump pysub to 1.7.9+

* Revert "fix streaming synapse regression"

* Merge pull request #2159 from backend-developers-ltd/fix_streaming_synapse

fix streaming synapse regression

* ci: auto assigns cortex to opened PRs

* no op

* tests/e2e_tests/utils.py: logging and epoch logic fix

- Log commands as executed, so that CI errors can be pinpointed to their
  originating command.
- Dropped wait_epoch() as it is not used.
- Improved wait_interval(), explanation below:

For tempo T, the epoch interval is T+1, and the offset depends on the
netuid. See subtensor/src/block_step.rs, blocks_until_next_epoch(), with
the comment stating:

https://github.com/opentensor/subtensor/blob/1332d077ea73bc7bf40f551c7f1adea3370df2bd/pallets/subtensor/src/block_step.rs#L33
"Networks run their epoch when (block_number + netuid + 1 ) % (tempo + 1) = 0"

This comment from the subtensor code is not correct, the algorithm
actually tests:

(block_number + netuid + 1 ) % (tempo + 1) == tempo

This is because what is tested, is whether

    blocks_until_next_epoch() == 0

and defining:

    A = (block_number + netuid + 1)%(tempo + 1)

we can say that, looking at https://github.com/opentensor/subtensor/blob/1332d077ea73bc7bf40f551c7f1adea3370df2bd/pallets/subtensor/src/block_step.rs#L47:

    blocks_until_next_epoch() = tempo - A

And so it is easy to see that we need A == tempo to run the epoch.

Then, to find the last epoch, calculating mod M = tempo+1:

    (block_number + netuid + 1)%M = tempo%M
    (block_number + netuid + 1)%M = (-1)%M
    (block_number + netuid + 1)%M + 1 = 0

So the last epoch is at:

    last_epoch = block_number - 1 - (block_number + netuid + 1) % interval

It is easily seen that this is in the range:

    [block_number-interval, block_number-1]

And so the current block, if it were epoch, is not seen as the last
epoch.

The next epoch is then:

    last_epoch + interval

Which is in the range:

    [block_number, block_number+interval-1]

And so if the current block is epoch, wait_for_interval() will return
immediately.

It is suspected that CI tests fail because the wait_epoch() waits for
the wrong block. And if this is not an issue at the present time, it may
become an issue later.

Note that difficulty adjustments follow a different schedule.

In the reworked function, blocks passing while waiting are reported
after passing 10 blocks, not when exactly hitting block N*10. This
ensures that logging is seen, even if the N*10 block passes during
time.sleep(1).

TODO: check if time.sleep() should be replaced by asyncio.sleep(),
as time.sleep() halts the entire process.

* e2e_tests/multistep/test_axon.py: replace magic constant 1 by netuid

* e2e_tests/multistep/test_dendrite.py: replace magic constant 1 by netuid

* e2e_tests/multistep/test_emissions.py: replace magic constant 1 by netuid

* e2e_tests/multistep/test_incentive.py: replace magic constant 1 by netuid

* e2e_tests: improvements

- Replace wait_for_interval(360) calls with wait_epoch(), reducing the
  number of magic constants and preparing for optionally changing the
  tempo used in tests.
- Automatically determine tempo from on-chain data.
- Make wait functions async, eliminating time.sleep() which basically
  halts everything, which is not beneficial to other coroutines.

* replaced bittensor.btlogging.error with bittensor.logging.error as bittensor.btlogging is a module, and thus does not contain the `error` function.

* First terminates the worker process before joining.

* Fixes mock of correct object.

* Ruff.

* Fix typo

Co-authored-by: gus-opentensor <[email protected]>

* ci: update reviews

* Init commit

* 2nd commit: enhancing e2e suite

* Reverts conftest env

* Clean up

* Fix ruff

* Skips emissions, fixes senate vote assertion, fixes exec call in swap_hotkey

* Adds check for participation of a neuron

* Adds updated type in timeouts dendrite

* Bump black from 23.7.0 to 24.3.0 in /requirements

Bumps [black](https://github.com/psf/black) from 23.7.0 to 24.3.0.
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](psf/black@23.7.0...24.3.0)

---
updated-dependencies:
- dependency-name: black
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <[email protected]>

* btlogging/loggingmachine.py: Fix bw compat API.

- add stacklevel=2 so that the filename of the originating call is
  logged (requires Python >= 3.8)
- move *args before prefix and suffix kwargs so regular logging calls
  using args, such as in retry, still work as intended

To elaborate on the last point, retry does the following:

    logger.warning('%s, retrying in %s seconds...', e, _delay)

and the args e and _delay would land on kwargs prefix and suffix without
this patch.

Any code relying on prefix and suffix also being positional arguments,
is now broken.

Note that logger.exception() calls through to logger.error() without
compensating for this additional internal call when determining the
calling file, on CPython < 3.11. This code works around this issue.

* btlogging/loggingmachine.py: Improve bw compat API.

In case prefix and suffix are not used, messages are still prefixed and
suffixed with " - ".

This patch addresses that and only injects separators between non-empty
strings.

* Add unit test for logging output

Add a unit test for:
- correct filename in loglines
- prefix= and suffix= kwargs
- format string functionality

* test: subnet list e2e

* Ensures that each element of _concat_msg is a string

* updates to use local_chain fixture

* fixes test

* test: adds wallet list command e2e test

* rm local_chain

* update path

* Fixes tests depending on explicit line numbers

* Fixes ruff

* Adds workflow for multiple bittensor version tests

* Init: changes name

* Changes name

* Bumps ansible and certifi based on security vulnerabilities

* Fixes version number

* fix Synapse performance regression

* Update child hotkeys with new subtensor calls and remove ChildInfo object.

* ruff

* update

* update

* ruff

* mypy error

* Improve child hotkeys to not double prompt and display table in appropriate places.

* lint

* # Add documentation for retrieve_children and render_table methods

Added detailed docstrings to the retrieve_children and render_table methods in stake.py to specify their purpose, parameters, and return values. This enhances code readability and assists other developers in understanding the methods' functionalities better.

* Refactor normalization to use floor function and handle excess.

In staking.py, introduced the floor function for normalized proportions and added logic to handle excess sums. Updated formatting.py to use floor for float-to-u64 conversion, ensuring values stay within bounds.

* fix DelegateInfoLite name

* add ruff to dev.txt

* ruff formatter

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Gus <[email protected]>
Co-authored-by: opendansor <[email protected]>
Co-authored-by: ibraheem-opentensor <[email protected]>
Co-authored-by: Maciej Urbański <[email protected]>
Co-authored-by: Samuel Dare <[email protected]>
Co-authored-by: open-junius <[email protected]>
Co-authored-by: open-junius <[email protected]>
Co-authored-by: gus-opentensor <[email protected]>
Co-authored-by: ibraheem-opentensor <[email protected]>
Co-authored-by: Benjamin Himes <[email protected]>
Co-authored-by: Benjamin Himes <[email protected]>
Co-authored-by: Roman <[email protected]>
Co-authored-by: Rapiiidooo <[email protected]>
Co-authored-by: Tamerlan <[email protected]>
Co-authored-by: Cameron Fairchild <[email protected]>
Co-authored-by: Maciej Urbanski <[email protected]>
Co-authored-by: μ <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix leaked semaphores for POW
3 participants