- commit: 003234c9ae10eddfc1f4a4df3aecd8fbb2b1c3c3
- parent: 5f4fb68b95c44288a575e018fbffbbe689b8a70f
- Author: Tobias Bengfort <tobias.bengfort@posteo.de>
- Date: 2023-02-04 17:44
extend async post
Diffstat
M | _content/posts/2023-01-29-python-async-loops/index.md | 293 | +++++++++++++++++++++++++++++++++++++++++++++++++------------ |
M | _content/posts/2023-01-29-python-async-loops/index.yml | 2 | +- |
2 files changed, 239 insertions, 56 deletions
diff --git a/_content/posts/2023-01-29-python-async-loops/index.md b/_content/posts/2023-01-29-python-async-loops/index.md
@@ -1,5 +1,5 @@
 [asyncio](https://peps.python.org/pep-3156/) was first added to the python
-standard library more than 10 years ago. Asynchronous IO had already been
+standard library more than 10 years ago. Asynchronous I/O had already been
 possible before that, by using libraries such as twisted or gevent. But asyncio
 was an attempt to bring the community together and standardize on a common
 solution.
@@ -10,12 +10,12 @@ in JavaScript.
 
 But maybe I just don't understand asyncio properly yet. I learn best by trying
 to recreate the thing I want to learn about. So in this post I will retrace the
-history of asynchronous programming, specifically in python, but I guess much
-of this translates to other languages. Hopefully this will allow me to better
-understand and appreciate what asyncio is doing. And hopefully you will enjoy
-accompanying me on that journey.
+history of asynchronous programming. I will concentrate on python, but I guess
+much of this translates to other languages. Hopefully this will allow me to
+better understand and appreciate what asyncio is doing. And hopefully you will
+enjoy accompanying me on that journey.
 
-If you are interested, all seven implementations are available on
+If you are interested, all eight implementations are available on
 [github](https://github.com/xi/python_async_loops).
 
 # Setup
@@ -170,8 +170,9 @@ finally:
 These are just the parts of the code that changed. I used `fnctl` to set the
 file descriptor to non-blocking mode. In this mode, `os.read()` will raise a
 `BlockingIOError` if there is nothing to read. This is great because we cannot
-get stuck on a blocking read. However, this loop will fully saturate the CPU.
-This is called a busy loop and obviously not what we want.
+get stuck on a blocking read. However, this loop will just keep trying and
+fully saturate the CPU. This is called a busy loop and obviously not what we
+want.
 
 # Implementation 3: Sleepy Loop
 
@@ -195,7 +196,7 @@ By simply adding a `sleep()` we get the benefits of both of the first two
 implementation: We cannot get stuck on a blocking read, but we also do not end
 up in a busy loop. This is still far from perfect though: If data arrives
 quickly we introduce a very noticeable delay of 1 second. And if data arrives
-slowly we wake up much more often that would be needed. We can adjust the sleep
+slowly we wake up much more often than would be needed. We can adjust the sleep
 duration to the specific case, but it will never be perfect.
 
 # Implementation 4: Select Loop
@@ -309,7 +310,7 @@ intervals, similar to what you might know from JavaScript.
 # Aside: Everything is a File
 
 So far our loops can react to files and timeouts, but is that enough? My first
-impression is that in unix, "everything is a file", so this should get us
+hunch is that in unix, "everything is a file", so this should get us
 pretty far. But let's take a closer look.
 
 - I was surprised to learn that processes have *not* been files in unix for
@@ -325,6 +326,8 @@ pretty far. But let's take a closer look.
   into your select loop: The [self-pipe trick](https://cr.yp.to/docs/selfpipe.html):
 
   ```python
+  import signal
+
   def register_signal(sig, callback):
       def on_signal(*args):
           os.write(pipe_w, b'.')
@@ -334,18 +337,19 @@ pretty far. But let's take a closer look.
           callback()
 
       pipe_r, pipe_w = os.pipe()
-      signallib.signal(sig, on_signal)
+      signal.signal(sig, on_signal)
       loop.register_file(pipe_r, wrapper)
   ```
 
-- Any network connections use sockets which can be used with select.
+- Network connections use sockets which can be used with select.
   Unfortunately, most libraries that implement specific network protocols
-  (e.g. HTTP) do not expose the underlying socket in a way that would allow
-  us to integrate them with our select loop. So while it is possible to do
-  network requests in a select loop, you will have to reinvet a lot of
-  wheels.
+  (e.g. HTTP) are not really reusable because they do not expose the
+  underlying socket. Some years ago there was a [push to create more reusable
+  protocol implementations](https://sans-io.readthedocs.io/) which produced
+  the [hyper project](https://github.com/python-hyper). Unfortunately it
+  didn't really gain traction.
 
-  Another issue with reusing existing code is that python likes to buffer a
+- Another issue with reusing existing code is that python likes to buffer a
   lot. This can have [surprising
   effects](https://github.com/python/cpython/issues/101053) when the selector
   tells you that the underlying file descriptor is empty, but there is still
@@ -353,9 +357,8 @@ pretty far. But let's take a closer look.
 
 # Implementation 6: Generator Loop
 
-We are down to the final two, but there is still a lot of conceptual ground to
-cover. Before we get to the final version (async/await), we have to talk about
-generators.
+We are getting closer to asyncio, but there is still a lot of conceptual ground
+to cover. Before we get to async/await, we have to talk about generators.
 
 ## Motivation
 
@@ -543,8 +546,11 @@ There are a few more things you can do with generators:
 
 - `generator.close()` is like `generator.throw(GeneratorExit)`
 
-- `field from foo` is like `for item in foo: yield item`
+- `yield from foo` is like `for item in foo: yield item`
 
+For a more in-depth discussion of generators I can recommend the [introduction
+to async/await by Brett
+Cannon](https://snarky.ca/how-the-heck-does-async-await-work-in-python-3-5/).
 
 ## The Loop
 
@@ -580,23 +586,28 @@ class Task:
     def __init__(self, gen):
         self.gen = gen
         self.files = set()
-        self.times = {0}
-        self.init = False
+        self.times = set()
         self.done = False
         self.result = None
 
-    def step(self, files, now):
+    def set_result(self, result):
+        self.done = True
+        self.result = result
+
+    def init(self):
+        try:
+            self.files, self.times = next(self.gen)
+        except StopIteration as e:
+            self.set_result(e.value)
+
+    def wakeup(self, files, now):
         try:
             if self.done:
                 return
-            elif not self.init:
-                self.files, self.times = next(self.gen)
-                self.init = True
             elif any(t < now for t in self.times) or files & self.files:
                 self.files, self.times = self.gen.send((files, now))
         except StopIteration as e:
-            self.done = True
-            self.result = e.value
+            self.set_result(e.value)
 
     def close(self):
         self.gen.close()
@@ -605,11 +616,12 @@ class Task:
 def run(gen):
     task = Task(gen)
     try:
+        task.init()
         while not task.done:
             now = time.time()
             timeout = min((t - now for t in task.times), default=None)
             files = {key.fileobj for key, mask in selector.select(timeout)}
-            task.step(files, time.time())
+            task.wakeup(files, time.time())
         return task.result
     finally:
         task.close()
@@ -622,6 +634,8 @@ def sleep(t):
 def gather(*generators):
     subtasks = [Task(gen) for gen in generators]
     try:
+        for task in subtasks:
+            task.init()
         while True:
             wait_files = set().union(
                 *[t.files for t in subtasks if not t.done]
@@ -631,7 +645,7 @@ def gather(*generators):
             )
             files, now = yield wait_files, wait_times
             for task in subtasks:
-                task.step(files, now)
+                task.wakeup(files, now)
             if all(task.done for task in subtasks):
                 return [task.result for task in subtasks]
     finally:
@@ -705,7 +719,6 @@ state of generators.
 # Implementation 7: async/await Loop
 
 From here it is a small step to async/await. Generators that are used for
-:qa
 asynchronous execution have already been called "coroutines" in PEP 342. [PEP
 492](https://peps.python.org/pep-0492/) (2015) deprecated that approach in
 favor of "native coroutines" and async/await.
@@ -770,25 +783,30 @@ class AYield:
 
 class Task:
     def __init__(self, coro):
-        self.iter = coro.__await__()
+        self.gen = coro.__await__()
         self.files = set()
-        self.times = {0}
-        self.init = False
+        self.times = set()
         self.done = False
         self.result = None
 
-    def step(self, files, now):
+    def set_result(self, result):
+        self.done = True
+        self.result = result
+
+    def init(self):
+        try:
+            self.files, self.times = next(self.gen)
+        except StopIteration as e:
+            self.set_result(e.value)
+
+    def wakeup(self, files, now):
         try:
             if self.done:
                 return
-            elif not self.init:
-                self.files, self.times = next(self.gen)
-                self.init = True
             elif any(t < now for t in self.times) or files & self.files:
                 self.files, self.times = self.gen.send((files, now))
         except StopIteration as e:
-            self.done = True
-            self.result = e.value
+            self.set_result(e.value)
 
     def close(self):
         self.gen.close()
@@ -797,11 +815,12 @@ class Task:
 def run(coro):
     task = Task(coro)
     try:
+        task.init()
         while not task.done:
             now = time.time()
             timeout = min((t - now for t in task.times), default=None)
             files = {key.fileobj for key, mask in selector.select(timeout)}
-            task.step(files, time.time())
+            task.wakeup(files, time.time())
         return task.result
     finally:
         task.close()
@@ -814,6 +833,8 @@ async def sleep(t):
 async def gather(*coros):
     subtasks = [Task(coro) for coro in coros]
     try:
+        for task in subtasks:
+            task.init()
         while True:
             wait_files = set().union(
                 *[t.files for t in subtasks if not t.done]
@@ -823,7 +844,7 @@ async def gather(*coros):
             )
             files, now = await AYield((wait_files, wait_times))
             for task in subtasks:
-                task.step(files, now)
+                task.wakeup(files, now)
             if all(task.done for task in subtasks):
                 return [task.result for task in subtasks]
     finally:
@@ -871,18 +892,180 @@ async def amain():
 run(amain())
 ```
 
-# Conclusion
+# Implementation 8: asyncio
+
+So which kinds of loop does asyncio use? After reading [PEP
+3156](https://peps.python.org/pep-3156/) I would say: That's complicated.
+
+At the core, asyncio is a simple callback loop. The relevant functions are
+called `add_reader(file, callback)` and `call_later(delay, callback)`.
+
+But then asyncio adds a second layer using async/await. A simplified version
+looks roughly like this:
+
+```python
+import asyncio
+
 
-These were seven different versions of asynchronous loops. I think this is my
-longest post yet, mostly due to the sheer amount of code.
+class Future:
+    def __init__(self):
+        self.callbacks = []
+        self.result = None
+        self.execution = None
+        self.done = False
+
+    def _set_done(self):
+        self.done = True
+        for callback in self.callbacks:
+            callback(self)
+
+    def set_result(self, result):
+        self.result = result
+        self._set_done()
+
+    def set_exception(self, exception):
+        self.exception = exception
+        self._set_done()
+
+    def add_done_callback(self, callback):
+        self.callbacks.append(callback)
+
+    def __await__(self):
+        yield self
+
+
+class Task:
+    def __init__(self, coro):
+        self.gen = coro.__await__()
+
+    def wakeup(self, future=None):
+        try:
+            if future and future.exception:
+                new_future = self.gen.throw(future.exception)
+            else:
+                new_future = next(self.gen)
+            new_future.add_done_callback(self.wakeup)
+        except StopIteration:
+            pass
+
+
+async def sleep(t):
+    future = Future()
+    loop.call_later(t, future.set_result, None)
+    await future
+
+
+async def amain():
+    print('start')
+    try:
+        await sleep(5)
+        loop.stop()
+    finally:
+        print('finish')
+
+
+loop = asyncio.new_event_loop()
+task = Task(amain())
+task.wakeup()
+loop.run_forever()
+```
+
+When we call `task.wakeup()`, the coroutine `amain()` starts executing. It
+prints `'foo'`, creates a future, and tells the loop to resolve that future in
+5 seconds. Then it yields that future back down to `wakeup()`, which registeres
+itself as a callback on the future. Now the loop starts running, waits for 5
+seconds, and then resolves the future. Because `wakeup()` was added as a
+callback, it is now called again and passes control back into `amain()`, which
+prints `'finish'`, stops the loop, and raises `StopIteration`.
+
+In the earlier coroutine examples, I yielded files and timeouts as conditions.
+Since this version is hosted on a callback loop, it instead yields futures that
+wrap loop callbacks.
+
+This approach works reasonably well. But I also see some issues with it.
+
+## Limited support for files
+
+You may have noticed that I did not implement the full subprocess example this
+time. This is because asyncio's coroutine layer doesn't really support files.
+
+Futures represent actions that are completed when the callback is called. File
+callbacks are called every time data is available for reading. This disconnect
+can probably be bridged somehow, but this post is already long enough and I
+didn't want to go down yet another rabbit hole.
+
+## Futures are not a monad
+
+If you know some JavaScript you have probably come across Promises. Promises
+are basically the JavaScript equivalent of Futures. However, they have a much
+nicer API. They are basically a monad, and every Haskell fan can give you an
+impromptu lecture about the awesomeness of monads. Consider the following
+snippets that do virtually the same:
+
+```javascript
+Promise.resolve(1)
+  .then(x => x + 1)
+  .finally(() => console.log('done'));
+```
+
+```python
+import asyncio
+
+def increment(future):
+    try:
+        future2.set_result(future.result() + 1)
+    except Exception as e:
+        future2.set_exception(e)
+
+def print_done(future):
+    print('done')
+
+loop = asyncio.new_event_loop()
+
+future1 = loop.create_future()
+future1.add_done_callback(increment)
+future1.set_result(1)
+
+future2 = loop.create_future()
+future2.add_done_callback(print_done)
+
+loop.run_until_complete(future2)
+```
+
+## Naming Confusion
+
+So far we have "Coroutines", "Futures", and "Tasks". The asyncio documentation
+also uses the term "Awaitables" for anything that implements `__await__()`, so
+both Coroutines and Futures are Awaitables.
+
+What really makes this complicated is that `Task` inherits from `Future`. So in
+some places, Coroutines and Futures can be used interchangably because they are
+both Awaitables -- and in other places, Coroutines and Futures can be used
+interchangably because Coroutines can automatically be wrapped in Tasks which
+makes them Futures.
+
+I wonder whether it would have been better to call Tasks "CoroutineFutures"
+instead. Probably not. That makes them sound like they are a simple wrapper,
+when in fact they are the thing that is actually driving most of the coroutine
+layer.
+
+In any case I believe the asyncio documentation could benefit from a clear
+separation of layers. First should be a description of the high level coroutine
+API including `sleep()` and `gather()`. The second part could be about the
+callback layer, including `call_later()` and `add_reader()`. The third and
+final part could explain the low level plumbing for those people who want to
+dive deep. This is the only part that needs to mention terms like "Awaitable",
+"Task", or "Future".
+
+# Conclusion
 
-I have certainly learned something. A bit about async primitives on linux and a
-lot about generators in python. I am not sure whether I have learned a lot
-about asyncio. There are still so many words I don't understand, e.g. task or
-future. But at the very least, this should post serve as a helpful reference
-for future endeavors.
+These were eight different versions of asynchronous loops. I have certainly
+learned something. A bit about async primitives on linux and a lot about
+generators and coroutines in python. I hope this post serves as a helpful
+reference for future endeavors.
 
-I am also not sure which approach I prefer. The simple cleanup in the generator
-approach is a huge advantage, but it comes at the cost of significant
-complexity compared to callbacks. I am still hopin there is an approach that
-combines the benefits of both.
+The big question remains: Which approach is better? The simple cleanup in the
+coroutine approach is a huge advantage, but it comes at the cost of significant
+complexity compared to callbacks. The thought that we have to limit ourselves
+to one of them is not great. So here's to hoping we will someday find an
+approach that combines the benefits of both.
diff --git a/_content/posts/2023-01-29-python-async-loops/index.yml b/_content/posts/2023-01-29-python-async-loops/index.yml
@@ -1,3 +1,3 @@
-title: Seven different ways to implement an asyncronous loop in python
+title: Eight different ways to implement an asyncronous loop in python
 date: 2023-01-29
 tags: [code, linux]