blog

git clone https://git.ce9e.org/blog.git

commit
003234c9ae10eddfc1f4a4df3aecd8fbb2b1c3c3
parent
5f4fb68b95c44288a575e018fbffbbe689b8a70f
Author
Tobias Bengfort <tobias.bengfort@posteo.de>
Date
2023-02-04 17:44
extend async post

Diffstat

M _content/posts/2023-01-29-python-async-loops/index.md 293 +++++++++++++++++++++++++++++++++++++++++++++++++------------
M _content/posts/2023-01-29-python-async-loops/index.yml 2 +-

2 files changed, 239 insertions, 56 deletions


diff --git a/_content/posts/2023-01-29-python-async-loops/index.md b/_content/posts/2023-01-29-python-async-loops/index.md

@@ -1,5 +1,5 @@
    1     1 [asyncio](https://peps.python.org/pep-3156/) was first added to the python
    2    -1 standard library more than 10 years ago. Asynchronous IO had already been
   -1     2 standard library more than 10 years ago. Asynchronous I/O had already been
    3     3 possible before that, by using libraries such as twisted or gevent. But asyncio
    4     4 was an attempt to bring the community together and standardize on a common
    5     5 solution.
@@ -10,12 +10,12 @@ in JavaScript.
   10    10 
   11    11 But maybe I just don't understand asyncio properly yet. I learn best by trying
   12    12 to recreate the thing I want to learn about. So in this post I will retrace the
   13    -1 history of asynchronous programming, specifically in python, but I guess much
   14    -1 of this translates to other languages. Hopefully this will allow me to better
   15    -1 understand and appreciate what asyncio is doing. And hopefully you will enjoy
   16    -1 accompanying me on that journey.
   -1    13 history of asynchronous programming. I will concentrate on python, but I guess
   -1    14 much of this translates to other languages. Hopefully this will allow me to
   -1    15 better understand and appreciate what asyncio is doing. And hopefully you will
   -1    16 enjoy accompanying me on that journey.
   17    17 
   18    -1 If you are interested, all seven implementations are available on
   -1    18 If you are interested, all eight implementations are available on
   19    19 [github](https://github.com/xi/python_async_loops).
   20    20 
   21    21 # Setup
@@ -170,8 +170,9 @@ finally:
  170   170 These are just the parts of the code that changed. I used `fcntl` to set the
  171   171 file descriptor to non-blocking mode. In this mode, `os.read()` will raise a
  172   172 `BlockingIOError` if there is nothing to read. This is great because we cannot
  173    -1 get stuck on a blocking read. However, this loop will fully saturate the CPU.
  174    -1 This is called a busy loop and obviously not what we want.
   -1   173 get stuck on a blocking read. However, this loop will just keep trying and
   -1   174 fully saturate the CPU. This is called a busy loop and obviously not what we
   -1   175 want.
  175   176 
  176   177 # Implementation 3: Sleepy Loop
  177   178 
@@ -195,7 +196,7 @@ By simply adding a `sleep()` we get the benefits of both of the first two
  195   196 implementations: We cannot get stuck on a blocking read, but we also do not end
  196   197 up in a busy loop. This is still far from perfect though: If data arrives
  197   198 quickly we introduce a very noticeable delay of 1 second. And if data arrives
  198    -1 slowly we wake up much more often that would be needed. We can adjust the sleep
   -1   199 slowly we wake up much more often than would be needed. We can adjust the sleep
  199   200 duration to the specific case, but it will never be perfect.
  200   201 
  201   202 # Implementation 4: Select Loop
@@ -309,7 +310,7 @@ intervals, similar to what you might know from JavaScript.
  309   310 # Aside: Everything is a File
  310   311 
  311   312 So far our loops can react to files and timeouts, but is that enough? My first
  312    -1 impression is that in unix, "everything is a file", so this should get us
   -1   313 hunch is that in unix, "everything is a file", so this should get us
  313   314 pretty far. But let's take a closer look.
  314   315 
  315   316 -   I was surprised to learn that processes have *not* been files in unix for
@@ -325,6 +326,8 @@ pretty far. But let's take a closer look.
  325   326     into your select loop: The [self-pipe trick](https://cr.yp.to/docs/selfpipe.html):
  326   327 
  327   328     ```python
   -1   329     import signal
   -1   330 
  328   331     def register_signal(sig, callback):
  329   332         def on_signal(*args):
  330   333             os.write(pipe_w, b'.')
@@ -334,18 +337,19 @@ pretty far. But let's take a closer look.
  334   337             callback()
  335   338 
  336   339         pipe_r, pipe_w = os.pipe()
  337    -1         signallib.signal(sig, on_signal)
   -1   340         signal.signal(sig, on_signal)
  338   341         loop.register_file(pipe_r, wrapper)
  339   342     ```
  340   343 
  341    -1 -   Any network connections use sockets which can be used with select.
   -1   344 -   Network connections use sockets which can be used with select.
  342   345     Unfortunately, most libraries that implement specific network protocols
  343    -1     (e.g. HTTP) do not expose the underlying socket in a way that would allow
  344    -1     us to integrate them with our select loop. So while it is possible to do
  345    -1     network requests in a select loop, you will have to reinvet a lot of
  346    -1     wheels.
   -1   346     (e.g. HTTP) are not really reusable because they do not expose the
   -1   347     underlying socket. Some years ago there was a [push to create more reusable
   -1   348     protocol implementations](https://sans-io.readthedocs.io/) which produced
   -1   349     the [hyper project](https://github.com/python-hyper). Unfortunately it
   -1   350     didn't really gain traction.
  347   351 
  348    -1     Another issue with reusing existing code is that python likes to buffer a
   -1   352 -   Another issue with reusing existing code is that python likes to buffer a
  349   353     lot. This can have [surprising
  350   354     effects](https://github.com/python/cpython/issues/101053) when the selector
  351   355     tells you that the underlying file descriptor is empty, but there is still
@@ -353,9 +357,8 @@ pretty far. But let's take a closer look.
  353   357 
  354   358 # Implementation 6: Generator Loop
  355   359 
  356    -1 We are down to the final two, but there is still a lot of conceptual ground to
  357    -1 cover. Before we get to the final version (async/await), we have to talk about
  358    -1 generators.
   -1   360 We are getting closer to asyncio, but there is still a lot of conceptual ground
   -1   361 to cover. Before we get to async/await, we have to talk about generators.
  359   362 
  360   363 ## Motivation
  361   364 
@@ -543,8 +546,11 @@ There are a few more things you can do with generators:
  543   546 
  544   547 -   `generator.close()` is like `generator.throw(GeneratorExit)`
  545   548 
  546    -1 -   `field from foo` is like `for item in foo: yield item`
   -1   549 -   `yield from foo` is like `for item in foo: yield item`
  547   550 
   -1   551 For a more in-depth discussion of generators I can recommend the [introduction
   -1   552 to async/await by Brett
   -1   553 Cannon](https://snarky.ca/how-the-heck-does-async-await-work-in-python-3-5/).
  548   554 
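As a small illustration of the last two points, here is a minimal sketch of how `yield from` delegates to another generator and how `close()` runs its cleanup code. The names `inner`, `outer`, and `events` are made up for this example.

```python
events = []


def inner():
    try:
        yield 1
        yield 2
    finally:
        # runs on normal completion and when GeneratorExit is thrown in
        events.append('inner cleanup')


def outer():
    yield 0
    # behaves like `for item in inner(): yield item`,
    # but also forwards send(), throw(), and close()
    yield from inner()
    yield 3


# iterating to the end yields the items of both generators in order
full = list(outer())

# closing a suspended generator raises GeneratorExit at the current yield,
# which here is inside inner(), so its finally block runs
gen = outer()
first = next(gen)
second = next(gen)
gen.close()
```

After this runs, `events` contains two entries: one from the complete iteration and one from the early `close()`.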
  549   555 ## The Loop
  550   556 
@@ -580,23 +586,28 @@ class Task:
  580   586     def __init__(self, gen):
  581   587         self.gen = gen
  582   588         self.files = set()
  583    -1         self.times = {0}
  584    -1         self.init = False
   -1   589         self.times = set()
  585   590         self.done = False
  586   591         self.result = None
  587   592 
  588    -1     def step(self, files, now):
   -1   593     def set_result(self, result):
   -1   594         self.done = True
   -1   595         self.result = result
   -1   596 
   -1   597     def init(self):
   -1   598         try:
   -1   599             self.files, self.times = next(self.gen)
   -1   600         except StopIteration as e:
   -1   601             self.set_result(e.value)
   -1   602 
   -1   603     def wakeup(self, files, now):
  589   604         try:
  590   605             if self.done:
  591   606                 return
  592    -1             elif not self.init:
  593    -1                 self.files, self.times = next(self.gen)
  594    -1                 self.init = True
  595   607             elif any(t < now for t in self.times) or files & self.files:
  596   608                 self.files, self.times = self.gen.send((files, now))
  597   609         except StopIteration as e:
  598    -1             self.done = True
  599    -1             self.result = e.value
   -1   610             self.set_result(e.value)
  600   611 
  601   612     def close(self):
  602   613         self.gen.close()
@@ -605,11 +616,12 @@ class Task:
  605   616 def run(gen):
  606   617     task = Task(gen)
  607   618     try:
   -1   619         task.init()
  608   620         while not task.done:
  609   621             now = time.time()
  610   622             timeout = min((t - now for t in task.times), default=None)
  611   623             files = {key.fileobj for key, mask in selector.select(timeout)}
  612    -1             task.step(files, time.time())
   -1   624             task.wakeup(files, time.time())
  613   625         return task.result
  614   626     finally:
  615   627         task.close()
@@ -622,6 +634,8 @@ def sleep(t):
  622   634 def gather(*generators):
  623   635     subtasks = [Task(gen) for gen in generators]
  624   636     try:
   -1   637         for task in subtasks:
   -1   638             task.init()
  625   639         while True:
  626   640             wait_files = set().union(
  627   641                 *[t.files for t in subtasks if not t.done]
@@ -631,7 +645,7 @@ def gather(*generators):
  631   645             )
  632   646             files, now = yield wait_files, wait_times
  633   647             for task in subtasks:
  634    -1                 task.step(files, now)
   -1   648                 task.wakeup(files, now)
  635   649             if all(task.done for task in subtasks):
  636   650                 return [task.result for task in subtasks]
  637   651     finally:
@@ -705,7 +719,6 @@ state of generators.
  705   719 # Implementation 7: async/await Loop
  706   720 
  707   721 From here it is a small step to async/await. Generators that are used for
  708    -1 :qa
  709   722 asynchronous execution have already been called "coroutines" in PEP 342. [PEP
  710   723 492](https://peps.python.org/pep-0492/) (2015) deprecated that approach in
  711   724 favor of "native coroutines" and async/await.
@@ -770,25 +783,30 @@ class AYield:
  770   783 
  771   784 class Task:
  772   785     def __init__(self, coro):
  773    -1         self.iter = coro.__await__()
   -1   786         self.gen = coro.__await__()
  774   787         self.files = set()
  775    -1         self.times = {0}
  776    -1         self.init = False
   -1   788         self.times = set()
  777   789         self.done = False
  778   790         self.result = None
  779   791 
  780    -1     def step(self, files, now):
   -1   792     def set_result(self, result):
   -1   793         self.done = True
   -1   794         self.result = result
   -1   795 
   -1   796     def init(self):
   -1   797         try:
   -1   798             self.files, self.times = next(self.gen)
   -1   799         except StopIteration as e:
   -1   800             self.set_result(e.value)
   -1   801 
   -1   802     def wakeup(self, files, now):
  781   803         try:
  782   804             if self.done:
  783   805                 return
  784    -1             elif not self.init:
  785    -1                 self.files, self.times = next(self.gen)
  786    -1                 self.init = True
  787   806             elif any(t < now for t in self.times) or files & self.files:
  788   807                 self.files, self.times = self.gen.send((files, now))
  789   808         except StopIteration as e:
  790    -1             self.done = True
  791    -1             self.result = e.value
   -1   809             self.set_result(e.value)
  792   810 
  793   811     def close(self):
  794   812         self.gen.close()
@@ -797,11 +815,12 @@ class Task:
  797   815 def run(coro):
  798   816     task = Task(coro)
  799   817     try:
   -1   818         task.init()
  800   819         while not task.done:
  801   820             now = time.time()
  802   821             timeout = min((t - now for t in task.times), default=None)
  803   822             files = {key.fileobj for key, mask in selector.select(timeout)}
  804    -1             task.step(files, time.time())
   -1   823             task.wakeup(files, time.time())
  805   824         return task.result
  806   825     finally:
  807   826         task.close()
@@ -814,6 +833,8 @@ async def sleep(t):
  814   833 async def gather(*coros):
  815   834     subtasks = [Task(coro) for coro in coros]
  816   835     try:
   -1   836         for task in subtasks:
   -1   837             task.init()
  817   838         while True:
  818   839             wait_files = set().union(
  819   840                 *[t.files for t in subtasks if not t.done]
@@ -823,7 +844,7 @@ async def gather(*coros):
  823   844             )
  824   845             files, now = await AYield((wait_files, wait_times))
  825   846             for task in subtasks:
  826    -1                 task.step(files, now)
   -1   847                 task.wakeup(files, now)
  827   848             if all(task.done for task in subtasks):
  828   849                 return [task.result for task in subtasks]
  829   850     finally:
@@ -871,18 +892,180 @@ async def amain():
  871   892 run(amain())
  872   893 ```
  873   894 
  874    -1 # Conclusion
   -1   895 # Implementation 8: asyncio
   -1   896 
   -1   897 So which kind of loop does asyncio use? After reading [PEP
   -1   898 3156](https://peps.python.org/pep-3156/) I would say: That's complicated.
   -1   899 
   -1   900 At the core, asyncio is a simple callback loop. The relevant functions are
   -1   901 called `add_reader(file, callback)` and `call_later(delay, callback)`.
   -1   902 
   -1   903 But then asyncio adds a second layer using async/await. A simplified version
   -1   904 looks roughly like this:
   -1   905 
   -1   906 ```python
   -1   907 import asyncio
   -1   908 
  875   909 
  876    -1 These were seven different versions of asynchronous loops. I think this is my
  877    -1 longest post yet, mostly due to the sheer amount of code.
   -1   910 class Future:
   -1   911     def __init__(self):
   -1   912         self.callbacks = []
   -1   913         self.result = None
   -1   914         self.exception = None
   -1   915         self.done = False
   -1   916 
   -1   917     def _set_done(self):
   -1   918         self.done = True
   -1   919         for callback in self.callbacks:
   -1   920             callback(self)
   -1   921 
   -1   922     def set_result(self, result):
   -1   923         self.result = result
   -1   924         self._set_done()
   -1   925 
   -1   926     def set_exception(self, exception):
   -1   927         self.exception = exception
   -1   928         self._set_done()
   -1   929 
   -1   930     def add_done_callback(self, callback):
   -1   931         self.callbacks.append(callback)
   -1   932 
   -1   933     def __await__(self):
   -1   934         yield self
   -1   935 
   -1   936 
   -1   937 class Task:
   -1   938     def __init__(self, coro):
   -1   939         self.gen = coro.__await__()
   -1   940 
   -1   941     def wakeup(self, future=None):
   -1   942         try:
   -1   943             if future and future.exception:
   -1   944                 new_future = self.gen.throw(future.exception)
   -1   945             else:
   -1   946                 new_future = next(self.gen)
   -1   947             new_future.add_done_callback(self.wakeup)
   -1   948         except StopIteration:
   -1   949             pass
   -1   950 
   -1   951 
   -1   952 async def sleep(t):
   -1   953     future = Future()
   -1   954     loop.call_later(t, future.set_result, None)
   -1   955     await future
   -1   956 
   -1   957 
   -1   958 async def amain():
   -1   959     print('start')
   -1   960     try:
   -1   961         await sleep(5)
   -1   962         loop.stop()
   -1   963     finally:
   -1   964         print('finish')
   -1   965 
   -1   966 
   -1   967 loop = asyncio.new_event_loop()
   -1   968 task = Task(amain())
   -1   969 task.wakeup()
   -1   970 loop.run_forever()
   -1   971 ```
   -1   972 
   -1   973 When we call `task.wakeup()`, the coroutine `amain()` starts executing. It
   -1   974 prints `'start'`, creates a future, and tells the loop to resolve that future in
   -1   975 5 seconds. Then it yields that future back down to `wakeup()`, which registers
   -1   976 itself as a callback on the future. Now the loop starts running, waits for 5
   -1   977 seconds, and then resolves the future. Because `wakeup()` was added as a
   -1   978 callback, it is now called again and passes control back into `amain()`, which
   -1   979 stops the loop, prints `'finish'`, and raises `StopIteration`.
   -1   980 
   -1   981 In the earlier coroutine examples, I yielded files and timeouts as conditions.
   -1   982 Since this version is hosted on a callback loop, it instead yields futures that
   -1   983 wrap loop callbacks.
   -1   984 
   -1   985 This approach works reasonably well. But I also see some issues with it.
   -1   986 
   -1   987 ## Limited support for files
   -1   988 
   -1   989 You may have noticed that I did not implement the full subprocess example this
   -1   990 time. This is because asyncio's coroutine layer doesn't really support files.
   -1   991 
   -1   992 Futures represent actions that are completed when the callback is called. File
   -1   993 callbacks are called every time data is available for reading. This disconnect
   -1   994 can probably be bridged somehow, but this post is already long enough and I
   -1   995 didn't want to go down yet another rabbit hole.
   -1   996 
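One possible way to bridge that gap is a one-shot future that unregisters the file callback as soon as it fires. This is only a rough sketch under a few assumptions: `read_once` is a hypothetical helper (not part of asyncio), it uses the real `asyncio` loop instead of the simplified classes above, and it relies on `add_reader()`, which is only available on Unix selector loops.

```python
import asyncio
import os


def read_once(loop, fd, size=1024):
    """Hypothetical helper: future that resolves with the next chunk from fd."""
    future = loop.create_future()

    def on_readable():
        # a future can only be resolved once, so stop listening right away
        loop.remove_reader(fd)
        if future.done():
            return
        try:
            future.set_result(os.read(fd, size))
        except OSError as e:
            future.set_exception(e)

    loop.add_reader(fd, on_readable)
    return future


async def amain(loop):
    pipe_r, pipe_w = os.pipe()
    try:
        # simulate data arriving a little later
        loop.call_later(0.1, os.write, pipe_w, b'hello')
        return await read_once(loop, pipe_r)
    finally:
        os.close(pipe_r)
        os.close(pipe_w)


loop = asyncio.new_event_loop()
try:
    data = loop.run_until_complete(amain(loop))
finally:
    loop.close()
```

The downside is that every read has to register and unregister the file again, so this trades efficiency for a simpler mental model.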
   -1   997 ## Futures are not a monad
   -1   998 
   -1   999 If you know some JavaScript you have probably come across Promises. Promises
   -1  1000 are basically the JavaScript equivalent of Futures. However, they have a much
   -1  1001 nicer API. They are basically a monad, and every Haskell fan can give you an
   -1  1002 impromptu lecture about the awesomeness of monads. Consider the following
   -1  1003 snippets that do virtually the same thing:
   -1  1004 
   -1  1005 ```javascript
   -1  1006 Promise.resolve(1)
   -1  1007     .then(x => x + 1)
   -1  1008     .finally(() => console.log('done'));
   -1  1009 ```
   -1  1010 
   -1  1011 ```python
   -1  1012 import asyncio
   -1  1013 
   -1  1014 def increment(future):
   -1  1015     try:
   -1  1016         future2.set_result(future.result() + 1)
   -1  1017     except Exception as e:
   -1  1018         future2.set_exception(e)
   -1  1019 
   -1  1020 def print_done(future):
   -1  1021     print('done')
   -1  1022 
   -1  1023 loop = asyncio.new_event_loop()
   -1  1024 
   -1  1025 future1 = loop.create_future()
   -1  1026 future1.add_done_callback(increment)
   -1  1027 future1.set_result(1)
   -1  1028 
   -1  1029 future2 = loop.create_future()
   -1  1030 future2.add_done_callback(print_done)
   -1  1031 
   -1  1032 loop.run_until_complete(future2)
   -1  1033 ```
   -1  1034 
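To show what is missing, here is a minimal sketch of a `then()` helper for asyncio futures. The helper is hypothetical and not part of asyncio. It only covers the chaining part of what Promises offer and ignores cancellation.

```python
import asyncio


def then(future, fn):
    """Hypothetical helper: new future holding fn(result) of `future`."""
    chained = future.get_loop().create_future()

    def on_done(done_future):
        try:
            chained.set_result(fn(done_future.result()))
        except Exception as e:
            # errors from `future` or from `fn` are passed down the chain
            chained.set_exception(e)

    future.add_done_callback(on_done)
    return chained


loop = asyncio.new_event_loop()
try:
    future1 = loop.create_future()
    future2 = then(then(future1, lambda x: x + 1), lambda x: x * 10)
    future1.set_result(1)
    value = loop.run_until_complete(future2)
finally:
    loop.close()
```

With a helper like this, the intermediate futures no longer need to be named and wired up by hand.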
   -1  1035 ## Naming Confusion
   -1  1036 
   -1  1037 So far we have "Coroutines", "Futures", and "Tasks". The asyncio documentation
   -1  1038 also uses the term "Awaitables" for anything that implements `__await__()`, so
   -1  1039 both Coroutines and Futures are Awaitables.
   -1  1040 
   -1  1041 What really makes this complicated is that `Task` inherits from `Future`. So in
   -1  1042 some places, Coroutines and Futures can be used interchangeably because they are
   -1  1043 both Awaitables -- and in other places, Coroutines and Futures can be used
   -1  1044 interchangeably because Coroutines can automatically be wrapped in Tasks which
   -1  1045 makes them Futures.
   -1  1046 
   -1  1047 I wonder whether it would have been better to call Tasks "CoroutineFutures"
   -1  1048 instead. Probably not. That makes them sound like they are a simple wrapper,
   -1  1049 when in fact they are the thing that is actually driving most of the coroutine
   -1  1050 layer.
   -1  1051 
   -1  1052 In any case I believe the asyncio documentation could benefit from a clear
   -1  1053 separation of layers. First should be a description of the high level coroutine
   -1  1054 API including `sleep()` and `gather()`. The second part could be about the
   -1  1055 callback layer, including `call_later()` and `add_reader()`. The third and
   -1  1056 final part could explain the low level plumbing for those people who want to
   -1  1057 dive deep. This is the only part that needs to mention terms like "Awaitable",
   -1  1058 "Task", or "Future".
   -1  1059 
   -1  1060 # Conclusion
  878  1061 
  879    -1 I have certainly learned something. A bit about async primitives on linux and a
  880    -1 lot about generators in python. I am not sure whether I have learned a lot
  881    -1 about asyncio. There are still so many words I don't understand, e.g. task or
  882    -1 future. But at the very least, this should post serve as a helpful reference
  883    -1 for future endeavors.
   -1  1062 These were eight different versions of asynchronous loops. I have certainly
   -1  1063 learned something. A bit about async primitives on linux and a lot about
   -1  1064 generators and coroutines in python. I hope this post serves as a helpful
   -1  1065 reference for future endeavors.
  884  1066 
  885    -1 I am also not sure which approach I prefer. The simple cleanup in the generator
  886    -1 approach is a huge advantage, but it comes at the cost of significant
  887    -1 complexity compared to callbacks. I am still hopin there is an approach that
  888    -1 combines the benefits of both.
   -1  1067 The big question remains: Which approach is better? The simple cleanup in the
   -1  1068 coroutine approach is a huge advantage, but it comes at the cost of significant
   -1  1069 complexity compared to callbacks. The thought that we have to limit ourselves
   -1  1070 to one of them is not great. So here's to hoping we will someday find an
   -1  1071 approach that combines the benefits of both.

diff --git a/_content/posts/2023-01-29-python-async-loops/index.yml b/_content/posts/2023-01-29-python-async-loops/index.yml

@@ -1,3 +1,3 @@
    1    -1 title: Seven different ways to implement an asyncronous loop in python
   -1     1 title: Eight different ways to implement an asynchronous loop in python
    2     2 date: 2023-01-29
    3     3 tags: [code, linux]