-
Notifications
You must be signed in to change notification settings - Fork 74
[#722] fix segfault and hung threads on KeyboardIinterrupt during parallel get #728
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
After a bit of manual testing, will attempt to make a proper test for SIGINT and SIGTERM to ensure things are left in an ok state. |
e9d96e2 to
7e9a09f
Compare
|
A GUI for example that maintains background asynch parallel transfers using PRC could trap and guard against Ctrl-C thusly: Update: |
abafff5 to
fb36836
Compare
alanking
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable. Just a couple of things in the test
|
Looks like we have a conflict. Seems this PR is close to completion? |
|
Just checking to see if this PR is still being considered for 3.3.0. |
Will check its currency to see whether the segfault is still a concern. If so, then I think we can consider it for this release. |
fb36836 to
481952c
Compare
Co-authored-by: Alan King <[email protected]>
|
What's the status of this PR? |
I believe it's almost ready. I want to look over it once more. |
alanking
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awaiting signal that this is ready
I've added a test (first draft, will run soon) that interrupts a put . We weren't testing that previously. |
finish put test debug(parallel) debug(put-test) behaves better if we add mgr to list sooner? experimental changes ACTIVE_PATH paths active make return values consisten from io_multipart_*() print debug on abort almost there? move statement where transfer_managers is updated rework abort_transfer fn slightly handle logic for prematurely shutdown executor
|
Now open to comment ... once more. These changes are final in my mind, as far as the significant changes to library functionality. |
|
I can squash the last 6 commits or so, if that helps reviewers. |
|
Yes please. |
| transfer_managers: weakref.WeakKeyDictionary["_Multipart_close_manager", Any] = weakref.WeakKeyDictionary() | ||
|
|
||
| def abort_parallel_transfers(dry_run=False, filter_function=None): | ||
| """'cls' should be tuple to extract the current synchronous transfer.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is/was 'cls'?
| if File is None: | ||
| File = gen_file_handle() | ||
| try: | ||
| f = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is f? It appears to be unused.
| # TODO - examine this experimentally restored code, as | ||
| # library should react to these two exception types(and perhaps others) by quitting all transfer threads |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an issue we can link to with this TODO? To which two exception types is it referring?
| # if queueLength > 0: | ||
| (futures, mgr, chunk_notify_queue) = retval | ||
| # else: | ||
| # futures = retval | ||
| # TODO: investigate: Huh? Why were we zeroing out total_bytes when there is no progress queue? | ||
| #chunk_notify_queue = total_bytes = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to keep these commented-out bits? And if so, can we get an issue number for the TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this TODO can go.
| def wait_till_true(function, timeout=LARGE_TEST_TIMEOUT, msg = ''): | ||
| """Wait for test purposes until a condition becomes true , as determined by the | ||
| return value of the provided test function. | ||
| By default, we wait at most LARGE_TEST_TIMEOUT seconds for the function to return true, and then | ||
| quit or time out. Alternatively, a timeout of None translates as a request never to time out. | ||
| If the msg value passed in is a nonzero-length string, it can be used to raise a timeout exception; | ||
| otherwise timing out causes a normal exit, relaying as the return value the last value returned | ||
| from the test function. | ||
| """ | ||
| start_time = time.clock_gettime_ns(time.CLOCK_BOOTTIME) | ||
| while not (truth_value := function()): | ||
| if ( | ||
| timeout is not None | ||
| and (time.clock_gettime_ns(time.CLOCK_BOOTTIME) - start_time) * 1e-9 | ||
| > timeout | ||
| ): | ||
| if msg: | ||
| raise TimeoutError(msg) | ||
| else: | ||
| break | ||
| time.sleep(_clock_polling_interval) | ||
| return truth_value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a common place where we can put this? It's duplicated here and in test_signal_handling_in_multithread_get.py.
| # Tell the parent process the name of the local file being "get"ted (got) from iRODS | ||
| print(local_path) | ||
| sys.stdout.flush() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This appears to be used as signal in the test on line 70 that the transfer threads have spawned. Is that understanding correct? If so, let's explicitly mention that in the comment as well because that seems really important.
| abort_parallel_transfers() | ||
| exit(128+sig) | ||
|
|
||
| signal.signal(signal.SIGTERM, handler) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So is the KeyboardInterrupt handling below the "signal handler" for SIGINT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
|
Also, take a peek at the 2 failing checks to see if there's anything actionable. |
4b0458e to
14037f9
Compare
|
Sorry about the delay. (I see some reviews are already made.) I have just submitted a squash - but no actual changes of any kind since the last note I posted above. |
Wherein we close down threads in an orderly way, so that things don't leave things to be disposed in the wrong order for the ever persnickety SSL shutdown logic.
Experiments show that SIGTERM actually does induce the Python interpreter to shut down non-daemonic threads, so installing a signal handler for that may not be necessary in the end.