
[not for merge] DESC/LSST combined feature branch #1993

Closed
wants to merge 75 commits into from

Conversation

@benclifford (Collaborator) commented Mar 17, 2021

This branch is ongoing work with @TomGlanzman on using parsl with the LSST DESC DR2 workflow. This PR exists to get it tested in parsl CI.

This branch is made from a stgit patch stack with regular merges to master. Don't push changes to this branch - rather, get them merged to master and poke @benclifford for an update.

Over time I'm migrating patches from this stack onto master and adding new material on the end - hopefully, eventually, the patch stack, and this branch with it, will evaporate entirely.

There was one at the parsl top level, and one in providers, but
these were not used uniformly:

* the monitoring system used the provider version even though
it has nothing to do with the provider system.

* the top level error was not listed in the API documentation
and then [B] lots of "tried to update NNNN rows, only did MMMM" errors.

hypothesis:

i) we need to retry on db locked? [yes we do] (a minimal retry sketch follows after this list)

At the sqlite3 command line that retry needs to happen, at least, so
maybe also in code? This would only really show up if you're looking in the db logs, and probably
only if you're also running queries against the DB (or some other access) rather than letting the
db manager be the only user.

ii) if those locked errors are causing rows to not be INSERTed (eg for tasks), then that means that
row cannot be UPDATEd later, which might then be causing symptom B. That symptom probably means
that all updates in that batch are discarded. This could happen because of symptom A, or because
other malformed data could not be inserted into the database.
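As an illustration of hypothesis i), here is a minimal retry-on-locked sketch. The retry count and backoff are made up for illustration and this is not the code actually used in the branch:

```python
# Illustrative only: retry a SQLite operation when it reports "database is locked".
import sqlite3
import time

def execute_with_retry(conn: sqlite3.Connection, sql: str, retries: int = 5) -> None:
    """Run a statement, retrying if SQLite reports the database as locked."""
    for attempt in range(retries):
        try:
            conn.execute(sql)
            conn.commit()
            return
        except sqlite3.OperationalError as e:
            if "locked" in str(e) and attempt < retries - 1:
                time.sleep(0.5 * (attempt + 1))  # back off, then try again
            else:
                raise
```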

concerns:
 i) cross-DB portability (probably a lost cause without further abstraction, because
    that isn't really a thing SQL does)

 ii) what happens if I'm looping on some more genuine error that is always present, and that
     causes some internal structures to overflow?

For the purposes of lsst-parsl, this patch is allowed to be a bit of a messy hack, to try to understand
how this problem can be fixed.


Problems with this patch:
  I'm seeing database lock errors when running CREATE TABLE fairly often
  (but awkward enough to reproduce that I can't do so reliably at the moment).
  This looks like sqlalchemy doing those ops somewhere that doesn't have the retry
  logic I added for inserts and updates. So this would manifest, I guess, if
  you were poking at the database before it had been properly initialised by the parsl
  monitoring code. I think that is probably a situation I care about.
__repr__ should be quasi-machine-readable, and __str__ human readable

See PR #1966, commit a423955, for
further context.

Before:

>>> str(e)
"(['mymod'], 'this test needs demonstrating')"
>>> repr(e)
"The functionality requested requires a missing optional module:['mymod'],  Reason:this test needs demonstrating"

After:

>>> str(e)
"The functionality requested requires missing optional modules ['mymod'], because: this test needs demonstrating"
>>> repr(e)
"OptionalModuleMissing(['mymod'], 'this test needs demonstrating')"
This PR adds type annotations, asserts and comments to better
describe the existing formats used for monitoring.

This is intended to make future simplification easier.

This PR should not change any behaviour except error handling when
a format is violated.
I was seeing delays of sometimes 30 seconds when running fresh_config.

I suspect this was automatic address detection, so I've removed that and used localhost,
which is fine for this test config.

This brings test run time to a consistent 11s against master on my laptop.
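A hedged sketch of the idea: pin the test executor's address to localhost instead of relying on automatic address detection. The exact parameters of the real fresh_config differ; only the address change is the point here.

```python
from parsl.config import Config
from parsl.executors import HighThroughputExecutor

def fresh_config():
    return Config(
        executors=[
            HighThroughputExecutor(
                label="htex_local",
                address="127.0.0.1",  # localhost: avoids slow automatic address detection
            )
        ]
    )
```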
…when running --config local tests. exex isn't needed for LSST, so disable testing of it.
…it isn't high enough priority for me to fix at the moment
This makes RESOURCE_INFO messages be sent around in the same format as other messages, which makes upcoming rearrangement of monitoring message delivery easier
This is groundwork for sending monitoring messages in on any queue,
and so via different protocols.
…atus delivery

TODO: needs to be switchable? At the moment, it is hard coded into the definition of an executor - which maybe makes sense when using HTEX, but not so much when a parsl user wants to choose between mechanisms (eg UDP vs FS). That switching also needs to be per executor.

for lsst purposes this needs to scale up to the scale of
1000 nodes running many workers (10s of) on each node.

so eg 50000 workers as a target?

50000 zmq TCP connections is A Lot

I'd like something that is not tied into the
htex codebase (which at the moment, monitoring is a bit,
and putting a messaging concentrator in there would
make it more tied in).

But a messaging concentrator per-node is what I need.
Perhaps I can re-use the existing htex queue, even,
using message types... "here's a message I want you to
forward to the messaging system in the interchange" ?

That would at least be a bit orthogonal? Or perhaps the
messaging engine (eg the UDPRadio bit) can be replaced
by an htex specific one, so you get UDPRadio in UDP
circumstances, or HTEXRadio in the htex case, if you
so configure it? That would be an ok plugin thing
to try out, and would at least allow switching
between UDP and HTEX while allowing very HTEX-specific
stuff to be implemented.

(and specifically, HTEX provides this kind of hierarchical
feature that parsl-core does not)
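A hedged sketch of that pluggable "radio" shape: a common send() interface, with a UDP implementation and a placeholder HTEX one. Class and method names here are illustrative, not an existing parsl API.

```python
import pickle
import socket
from abc import ABC, abstractmethod

class MonitoringRadio(ABC):
    @abstractmethod
    def send(self, message: object) -> None:
        ...

class UDPRadio(MonitoringRadio):
    def __init__(self, address: str, port: int) -> None:
        self.address = address
        self.port = port

    def send(self, message: object) -> None:
        # serialise the message and fire it at the monitoring hub as a datagram
        payload = pickle.dumps(message)
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(payload, (self.address, self.port))

class HTEXRadio(MonitoringRadio):
    def send(self, message: object) -> None:
        # would hand the message to the local htex worker machinery, which
        # forwards it via the manager and interchange to monitoring
        raise NotImplementedError
```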

So additional communication this needs:

  i) monitoring worker wrapper: ability to communicate with the htex worker enough to send monitoring messages
        - needs some "examine my context" mechanism that is probably provided by globals: if you're using "htex" mode monitoring, inspect a global that you expect to exist (a minimal sketch of that global follows after this list)
        - and the htex worker should set up that global.
        - perhaps that also provides some scope for *inside apps* reporting state changes, which is something I was interested in

  ii) interchange<->manager<->htex worker: add in the ability to send monitoring messages (perhaps as a broader "new tag to deliver")

  iii) interchange -> monitoring: deliver those messages to monitoring
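A minimal sketch of the "inspect a global" idea from point i). The names here (_radio, set_radio, send_monitoring_message) are assumptions for illustration, not an existing parsl interface.

```python
_radio = None  # set by the htex worker process at startup, if configured

def set_radio(radio) -> None:
    global _radio
    _radio = radio

def send_monitoring_message(message) -> None:
    # silently do nothing if this process has no radio configured
    if _radio is not None:
        _radio.send(message)
```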
…sages

the file creation technique is similar to Maildir:
  create a file in tmp/ without any need to write atomically
  then move to new/ atomically

This means that the only messages appearing in new/ will be complete, and can be deleted by the receiver without conflicting with the sender.
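A hedged sketch of that Maildir-style handoff; the directory layout and the pickle encoding are assumptions for illustration.

```python
import os
import pickle
import uuid

def deliver(base_dir: str, message: object) -> None:
    os.makedirs(os.path.join(base_dir, "tmp"), exist_ok=True)
    os.makedirs(os.path.join(base_dir, "new"), exist_ok=True)
    name = uuid.uuid4().hex
    tmp_path = os.path.join(base_dir, "tmp", name)
    new_path = os.path.join(base_dir, "new", name)
    with open(tmp_path, "wb") as f:
        pickle.dump(message, f)          # no atomicity needed while writing
    os.rename(tmp_path, new_path)        # atomic: new/ only ever holds complete files
```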

I expect this will present interesting file system loads because of all the directory operations,
especially if everything is happening inside a single directory. The load will perhaps be comparable to each job creating a few stdout/stderr files inside one directory - unless resource monitoring is on, in which case it will be much higher.

so maybe it will be slower but more reliable than UDP.

I expect that there will be more out-of-order message delivery than with UDP - and I expect monitoring will deal badly with that (especially the notion of "first" RESOURCE_INFO messages).

the intention is to use someone else's infrastructure to deliver messages rather than implementing a new htex-style comms network just for monitoring
…e WQ

this is to check for me that wq+fs monitoring is ok

I'm not clear how it should be integrated into the CI test suite.
the ExtremeScale case will never fire in the current extreme scale implementation,
because an extreme scale executor is also a high throughput executor, and so the
earlier htex case will fire.

It is possible that extreme scale scaling was broken because of this case. This
patch should not make it either better or worse, because it only eliminates dead
code.

when an executor is not an htex instance, no cases match, but no error is raised
here, and so tasks_per_node is never assigned. Later on (line 206), the use of
tasks_per_node is an error.

this entire case is removed, and executor.workers_per_node is always used.
…nforced/documented in the executor base classes.

This patch makes it obligatory for StatusHandlingExecutors to have that, on the assumption
that StatusHandlingExecutor will generally become a scaling-capable base class.
Broadly, this involves adding in enough helpers that the scaling strategy knows
what's going on.

I think the status handling class probably also needs to include descriptions
(ideally machine enforceable) of the methods that I've had to add here.

Or the simple strategy should be made more resilient to their absence rather than
crashing.

PERHAPS OPEN AN ISSUE: "different kinds of executors, from a scaling perspective"

the wq monitoring config is modified to start with 0 init blocks, and let the scaling system bring up more blocks. At the time of writing this comment, that doesn't happen: wq launches init blocks only (once, at start) and scaling doesn't touch WQ (because the poll interval is set via the superclass to -1).

wq's default behaviour is one worker per node, but I want to run it in a mode where it figures out that it can run more things on a node at once.

"worker" has a different meaning here wrt htex: htex workers each run one task, so 100 workers = 100 tasks at once. wq is more dynamic and always runs one wq worker per node, but then that worker can run more parsl tasks.

so be careful with terminology here.
…ic ones) to capture final cumulative usage

although maybe that info isn't available any more if, for example, executables have completed? How does `time` do it?
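A hedged sketch of one way to get final cumulative usage of child processes after they have exited - roughly what `time` relies on: the kernel keeps accumulating the usage of reaped children, readable via getrusage().

```python
import resource

def final_child_usage() -> dict:
    ru = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "user_time_s": ru.ru_utime,
        "system_time_s": ru.ru_stime,
        "max_rss_kb": ru.ru_maxrss,  # kilobytes on Linux
    }
```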
…logs will now go to the parsl main log ***

*** so maybe I shouldn't do this patch after all ***

the reason was that I couldn't find where @wrap_with_logs errors were going in the interchange, but perhaps
they're going to the main log?

-----

make the interchange set up logs like the rest of parsl

this removes some code that isn't interchange specific, in favour of other similar but different code

it also means that logs made to parsl.* will go to the interchange log
for tomg, before release:
 i) disable (or option to disable) all resource monitoring

 ii) later, per-app enable/disable of resource monitoring, which will also maybe work around the joinapp vs threads problem
…be 0 tries.

this is probably coming about because there is always a try number (starting at 0)
even if no tries happen.

TODO: make a monitoring test to check this

TODO: fix this bug
…not sure which one causes which) is that

there is thread-related connection sharing, which is not permitted (Erlang-style thread-local connections rather than an
object-shared connection might help push on that, by removing the opportunity to share connections across
threads);
and the prepared state comes from a failed, not-rolled-back transaction earlier on.

Perhaps: an error happened. Then we tried to do another operation, hence the second error.
Then that tried to call rollback, but it was in a different thread (because it was in the sqlalchemy code,
which I guess cannot call rollbacks successfully on broken connections, because we don't know which thread
finalization will happen in).

2020-11-25 12:04:00 parsl.executors.high_throughput.executor:649 [INFO]  Attempting HighThroughputExecutor shutdown
2020-11-25 12:04:00 parsl.executors.high_throughput.executor:651 [INFO]  Finished HighThroughputExecutor shutdown attempt
2020-11-25 12:04:00 parsl.monitoring.monitoring:271 [INFO]  Terminating Monitoring Hub
2020-11-25 12:04:00 parsl.monitoring.monitoring:287 [INFO]  Waiting for Hub to receive all messages and terminate
2020-11-25 12:04:03 parsl.process_loggers:17 [INFO]  exception wrapper: normal ending for thread MainThread in process UNLOGGED
Exception during reset or similar
Traceback (most recent call last):
  File "/global/homes/d/descdm/.conda/envs/parsl-lsst-dm/lib/python3.7/site-packages/parsl/monitoring/db_manager.py", line 476, in _update
    self.db.update(table=table, columns=columns, messages=messages)
  File "/global/homes/d/descdm/.conda/envs/parsl-lsst-dm/lib/python3.7/site-packages/parsl/monitoring/db_manager.py", line 76, in update
    self.session.bulk_update_mappings(mapper, mappings)
  File "/global/homes/d/descdm/.conda/envs/parsl-lsst-dm/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2860, in bulk_update_mappings
    mapper, mappings, True, False, False, False, False
  File "/global/homes/d/descdm/.conda/envs/parsl-lsst-dm/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2876, in _bulk_save_mappings
    transaction = self.begin(subtransactions=True)
  File "/global/homes/d/descdm/.conda/envs/parsl-lsst-dm/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 938, in begin
    self.transaction = self.transaction._begin(nested=nested)
  File "/global/homes/d/descdm/.conda/envs/parsl-lsst-dm/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 316, in _begin
    self._assert_active()
  File "/global/homes/d/descdm/.conda/envs/parsl-lsst-dm/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 282, in _assert_active
    "This session is in 'prepared' state; no further "
sqlalchemy.exc.InvalidRequestError: This session is in 'prepared' state; no further SQL can be emitted within this transaction.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/global/homes/d/descdm/.conda/envs/parsl-lsst-dm/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 680, in _finalize_fairy
    fairy._reset(pool)
  File "/global/homes/d/descdm/.conda/envs/parsl-lsst-dm/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 867, in _reset
    pool._dialect.do_rollback(self)
  File "/global/homes/d/descdm/.conda/envs/parsl-lsst-dm/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 530, in do_rollback
    dbapi_connection.rollback()
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 46912496695104 and this is thread id 46912630470400.
Exception closing connection <sqlite3.Connection object at 0x2aaab2acc1f0>
Traceback (most recent call last):
  File "/global/homes/d/descdm/.conda/envs/parsl-lsst-dm/lib/python3.7/site-packages/parsl/monitoring/db_manager.py", line 476, in _update
    self.db.update(table=table, columns=columns, messages=messages)
  File "/global/homes/d/descdm/.conda/envs/parsl-lsst-dm/lib/python3.7/site-packages/parsl/monitoring/db_manager.py", line 76, in update
    self.session.bulk_update_mappings(mapper, mappings)
  File "/global/homes/d/descdm/.conda/envs/parsl-lsst-dm/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2860, in bulk_update_mappings
    mapper, mappings, True, False, False, False, False
  File "/global/homes/d/descdm/.conda/envs/parsl-lsst-dm/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2876, in _bulk_save_mappings
    transaction = self.begin(subtransactions=True)
  File "/global/homes/d/descdm/.conda/envs/parsl-lsst-dm/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 938, in begin
    self.transaction = self.transaction._begin(nested=nested)
  File "/global/homes/d/descdm/.conda/envs/parsl-lsst-dm/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 316, in _begin
    self._assert_active()
  File "/global/homes/d/descdm/.conda/envs/parsl-lsst-dm/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 282, in _assert_active
    "This session is in 'prepared' state; no further "
sqlalchemy.exc.InvalidRequestError: This session is in 'prepared' state; no further SQL can be emitted within this transaction.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/global/homes/d/descdm/.conda/envs/parsl-lsst-dm/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 680, in _finalize_fairy
    fairy._reset(pool)
  File "/global/homes/d/descdm/.conda/envs/parsl-lsst-dm/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 867, in _reset
    pool._dialect.do_rollback(self)
  File "/global/homes/d/descdm/.conda/envs/parsl-lsst-dm/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 530, in do_rollback
    dbapi_connection.rollback()
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 46912496695104 and this is thread id 46912630470400.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/global/homes/d/descdm/.conda/envs/parsl-lsst-dm/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 270, in _close_connection
    self._dialect.do_close(connection)
  File "/global/homes/d/descdm/.conda/envs/parsl-lsst-dm/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 536, in do_close
    dbapi_connection.close()
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 46912496695104 and this is thread id 46912630470400.
2020-11-25 12:04:03 parsl.process_loggers:17 [INFO]  exception wrapper: normal ending for thread MainThread in process UNLOGGED
2020-11-25 12:04:03 parsl.dataflow.dflow:1055 [INFO]  DFK cleanup complete
This was disabled as part of the large set of tests disabled in the
PR #652 testpocalypse. But on review it *should* work.
…edly (on each poll?) in my

tutorial example.

What's going wrong?

2020-10-19 11:29:24.016 parsl.dataflow.flow_control:114 [ERROR]  Flow control callback threw an exception - logging and proceeding anyway
Traceback (most recent call last):
  File "/home/benc/parsl/src/parsl/parsl/dataflow/flow_control.py", line 112, in make_callback
    self.callback(tasks=self._event_buffer, kind=kind)
  File "/home/benc/parsl/src/parsl/parsl/dataflow/task_status_poller.py", line 64, in poll
    self._update_state()
  File "/home/benc/parsl/src/parsl/parsl/dataflow/task_status_poller.py", line 71, in _update_state
    item.poll(now)
  File "/home/benc/parsl/src/parsl/parsl/dataflow/task_status_poller.py", line 28, in poll
    self._status = self._executor.status()
  File "/home/benc/parsl/src/parsl/parsl/executors/status_handling.py", line 73, in status
    status = self._make_status_dict(job_ids, self._provider.status(job_ids))
  File "/home/benc/parsl/src/parsl/parsl/providers/local/local.py", line 86, in status
    str_ec = self._read_job_file(script_path, '.ec').strip()
  File "/home/benc/parsl/src/parsl/parsl/providers/local/local.py", line 143, in _read_job_file
    with open(path, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/benc/parsl/src/parsl/runinfo/000/submit_scripts/parsl.localprovider.1603103227.0265722.sh.ec
… that

f(3) run multiple times has the same hashsum
and
f(4) has a different hashsum to f(3)

in addition to the existing way of checking that runs are not repeated

because now we have access to the task record in tests, via future.task_def
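A hedged sketch of the checks described above, assuming a config with caching enabled has already been loaded and that the task record exposed via future.task_def carries a 'hashsum' entry (the field name is assumed here for illustration).

```python
from parsl import python_app

@python_app(cache=True)
def f(x):
    return x * 2

def test_hashsum_stability():
    a, b, c = f(3), f(3), f(4)
    for fut in (a, b, c):
        fut.result()
    assert a.task_def['hashsum'] == b.task_def['hashsum']   # same args, same hashsum
    assert a.task_def['hashsum'] != c.task_def['hashsum']   # different args, different hashsum
```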
…n the monitoring DB. I

vaguely remember this happening and fixing it a while ago, but maybe only in this branch, not
lsst-dm-202005?

TODO: write a regression test for this using sql queries and looking at the task record hashsum 
(and checking two different invocations get two different hashsums)
#1848)

> These changes look good to me. I am seeing two separate states in the plots, but it is still showing up as green for both exec_done and memo_done on the workflow page for me. Might be worth checking that.

so I should dig into that visual style
ipp isn't used any more

so remove the hack

and see if the problem appears in other places
…cal hostname

eg workers on htex could set environment variables that give the executor name, manager ID, worker number, and block id

feature req from tomg - he specifically asked for block id
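A hedged sketch of what that could look like from inside an app; the environment variable names are assumptions, not an existing interface.

```python
import os

def where_am_i() -> dict:
    return {
        "executor": os.environ.get("PARSL_EXECUTOR_LABEL"),
        "block_id": os.environ.get("PARSL_BLOCK_ID"),
        "manager_id": os.environ.get("PARSL_MANAGER_ID"),
        "worker_number": os.environ.get("PARSL_WORKER_RANK"),
    }
```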
…repr__ed to the monitoring database (see the recent PR I made that writes out using __repr__ not __str__ - they lack context)

Look at the task failures in a monitoring.db coming from running the test suite to see examples of some which do not look right - i.e. ones that look like a string, not Classname(FooDetail)
this is not intended to be exact, in the sense that a job with a lower priority might run
before a job with a higher priority - but the "bulk" of the work (in the LSST sense)
should be prioritised this way.

priorities can be anything comparable to each other (and to the default priority, which is integer 0)

I'm not going to address the macsafequeue in this prototype
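A hedged sketch of the ordering rule: priorities only need to be comparable with each other and with the default priority 0. The 'priority' key name, and whether a smaller value means more or less urgent, are illustrative choices here.

```python
import heapq

DEFAULT_PRIORITY = 0

def in_priority_order(tasks):
    """Yield tasks smallest priority value first; ties keep submission order."""
    heap = [(t.get("priority", DEFAULT_PRIORITY), i, t) for i, t in enumerate(tasks)]
    heapq.heapify(heap)
    while heap:
        _, _, task = heapq.heappop(heap)
        yield task
```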
…received the ctrl-C and is working towards ending.

This would be fine as an ERROR level log message, I think, because it is an abort of the workflow.

I'm not sure what the ctrl-C reporting of this looks like right now... so I should review that before changing anything else
@benclifford (Collaborator, Author) commented Apr 20, 2021

superseded by #2012
