A lot of talk has been happening in the Python community about asyncio. Asynchronous programming is the new hot thing in Python. In my last post I talked about what asynchronous programming is and how to do it, but in this post we will cover whether this whole async thing is even worth bothering with. Let’s talk about some common use cases in Python, and decide whether asyncio will serve you well.
Disclaimer: every point in this article assumes Python as the programming language. If you are using another language, some of these points might not be completely accurate.
A quick note on the GIL: Python’s Global Interpreter Lock allows only one thread to execute Python bytecode at a time. This means that a single asynchronous thread will inherently have the same performance as a multi-threaded app, except that the async version won’t have any thread contention. To get true parallelism in Python you need to use multiple processes. Each process in Python has its own GIL, and processes do not share resources at all. If you want to maximize your CPU cores, the typical rule is to run a number of processes equal to the number of cores you have, plus one. This rule applies to both threads and async programming.
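To make the processes-per-core rule concrete, here is a minimal sketch using the standard library's multiprocessing module; cpu_heavy is just a hypothetical stand-in for whatever your real CPU-bound work is:

```python
import multiprocessing
import os

def cpu_heavy(n):
    # Hypothetical stand-in for real CPU-bound work.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # Rule of thumb from above: one worker process per core, plus one.
    workers = os.cpu_count() + 1
    with multiprocessing.Pool(processes=workers) as pool:
        results = pool.map(cpu_heavy, [10 ** 7] * workers)
    print(results)
```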
There is no silver bullet in programming, and asynchronous programming is no exception. There are certain cases where asynchronous programming might benefit you, several others where it won’t, and even some areas where async could hurt you. In order to know whether async will provide benefits, we have to consider the use cases. The most common use case is a web-application backend (web server). This is one of the most common types of programming I come across in Python, so I will try to cover it in depth. If you don’t deal with web programming, feel free to skip to the Queens Croquet Ground section.
Remember the days before online banking, when we would actually go to a bank, stand in line, and wait for them to accept our money? Since you might be too young to remember such dark days, here is a brief summary of how banks work. It’s a cold day, and you’re in a hurry to cash your check. You walk into the bank and see a line of people standing in a single queue, with several tellers helping customers. You wait in line, getting antsy because you left your car outside with the engine on. You get anxious when you see a slow, troublesome customer over at teller #4. Hoping that everyone in front of you only needs to cash a check, you stay in line. You know that teller #4 won’t be freed up anytime soon, so you start cheering in your head for teller #2 to work quickly. When you finally get up to a teller, you hand them your check and get the cash equivalent. How long you waited in line depended on how long it took the tellers to handle the people in front of you.

The fast-food industry, however, has learned the art of maximizing throughput; after all, a synonym for speed is in the name. When you go to a fast-food place, there is usually only one person taking orders. That person takes your order, you complete your transaction, and then you wait for your food to arrive. The difference here is that while you are waiting for your food, the same clerk who took your order is taking the order of everyone else in line. Because you only wait to order, you get through the line a lot quicker.
The fast-food example is the asynchronous one. There is only one clerk accepting incoming requests as fast as possible. Since you are limited to a single cash register, it makes sense to have only a single person (thread) handling requests; that way you don’t waste time “waiting for the cash register” (CPU scheduling). The fast-food place instead has waiting points: waiting for another person to complete their part of the order. When a wait happens, the employee can help another customer in the meantime. For example, imagine the person assembling your food as the database. While you are waiting for your food, the clerk is accepting new orders, and once your food is ready, the clerk puts it all together and hands it to you.

The astute reader might realize that the overall time waiting “to be finished” is still the same in both scenarios; the only difference is that in the fast-food example, you wait in two stretches instead of all at once. This is true in the naive example, but your average wait time drops as task sizes become less uniform. Imagine every person ordered the same meal at the restaurant. If that were the case, the bank style and the fast-food style of getting your food would yield the same results. The reason fast-food chains prefer an asynchronous flow is that it yields better results with varying sizes of orders. Imagine it’s dinner time and everyone is ordering for their families, but all you want is a shake. In the synchronous example, you have to wait for a family’s meal to be ready before you can even place your order. In the asynchronous example, you get to place your order right away, and because it barely involves the food assemblers (the database), you get your shake right away rather than waiting on others. This increases overall throughput and lowers average response times. It also means that large tasks don’t make small tasks more latent.
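To make that last point concrete, here is a toy sketch with asyncio: the small order finishes almost immediately even while a large order is in flight, because neither blocks the single “clerk” (the event loop). All names and timings here are invented for the example:

```python
import asyncio
import time

start = time.monotonic()

async def order(name, prep_seconds):
    # Placing the order is instant; the wait is the kitchen (i.e. I/O).
    print(f"{name} ordered at t={time.monotonic() - start:.1f}s")
    await asyncio.sleep(prep_seconds)  # waiting point: food being assembled
    print(f"{name} served at t={time.monotonic() - start:.1f}s")

async def main():
    # The family's big order and your shake are handled concurrently.
    await asyncio.gather(order("family meal", 5), order("shake", 0.5))

asyncio.run(main())
# The shake is served at ~0.5s; in the bank-style queue it would have
# had to wait for the family meal first.
```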
One thing I didn’t discuss is the cost of context switching with threads. I won’t spend too much time on context switching here, as I have already done so. Suffice it to say, asynchronous web servers are theoretically faster than threaded web servers even in the naive case, because they have no wasted context switches. A very simple asynchronous web server will be faster than a simple threaded version. However, this does not mean yours will be faster simply by moving to async. It depends greatly on what your web server is doing and on whether you are actually using asynchronous paradigms.
Any improvements made anywhere besides the bottleneck are an illusion. In other words, if you have a large bottleneck, you will only see marginal improvements by tweaking any other part of the system. Most servers have a hard dependency on a database, and that database may or may not be your bottleneck. Async can still provide benefits despite a bottleneck if 1) the bottleneck is a remote service (i.e. database, downstream service, redis, etc.), and 2) you can do other things for that same request while waiting on that database/service.
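A sketch of point 2: while one query is in flight, the same request can kick off other I/O instead of idling. The fetch functions below are hypothetical stand-ins for a real database query and downstream service call:

```python
import asyncio

async def fetch_user(user_id):
    await asyncio.sleep(0.3)  # stand-in for a database query
    return {"id": user_id}

async def fetch_recommendations(user_id):
    await asyncio.sleep(0.3)  # stand-in for a downstream service call
    return ["a", "b"]

async def handle_request(user_id):
    # Both waits overlap, so the request takes ~0.3s instead of ~0.6s.
    user, recs = await asyncio.gather(
        fetch_user(user_id),
        fetch_recommendations(user_id),
    )
    return {"user": user, "recommendations": recs}

print(asyncio.run(handle_request(42)))
```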
The next contingency is whether your web server is CPU bound. If your web server runs only a couple of processes that do CPU-bound tasks, async will slow you down. However, if you have a standard web server (lots of threads) and your app mostly does calculations (not much I/O), then threads will actually slow you down more than async would. Async doesn’t lend itself well to CPU-bound programs, and it would be complicated to write, but done correctly it would be faster than many threads due to the lack of thread contention (see benchmarks at the bottom of this post).
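If you do have the occasional CPU-bound task in an otherwise async server, one common escape hatch is pushing it onto a process pool so the event loop stays responsive. A minimal sketch, with crunch standing in for real work:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def crunch(n):
    # CPU-bound work; running this inline would block the event loop.
    return sum(i * i for i in range(n))

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # The event loop keeps serving other coroutines while this runs.
        result = await loop.run_in_executor(pool, crunch, 10 ** 7)
    print(result)

if __name__ == "__main__":
    asyncio.run(main())
```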
The last contingency is long-running connections. One fairly hot technology these days is websockets. Websockets allow a server to maintain a long-running connection to a client and push information to it without the client needing to poll. Websockets are also often used to improve performance, since the client can reuse a single, already established connection. The problem with websockets, however, is that they must remain open. In a thread-based server this means that every client holds a thread open, and every thread held open is a hit to your throughput, since throughput is directly correlated with the number of available threads. Async, however, can use a single process to handle all these open connections, which means the number of long-running open connections it can handle is essentially unlimited (OS socket limits are typically in the millions). If you are doing anything with long-running open connections, async will benefit you regardless of raw performance, because you won’t need to worry about resource exhaustion. This also means that async servers are far less susceptible to slow-client DoS attacks.
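For a sense of why this scales: in an async server, each open connection is just a paused coroutine rather than a held thread. A minimal echo handler sketched with aiohttp (the route and port are arbitrary choices for the example):

```python
import aiohttp
from aiohttp import web

async def websocket_handler(request):
    ws = web.WebSocketResponse()
    await ws.prepare(request)
    # Each open connection is a paused coroutine, not a held thread.
    async for msg in ws:
        if msg.type == aiohttp.WSMsgType.TEXT:
            await ws.send_str("echo: " + msg.data)
    return ws

app = web.Application()
app.router.add_get("/ws", websocket_handler)
web.run_app(app, port=8080)
```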
I have mostly been comparing async against sync as if the two cannot coexist. They can, however, coexist, though making them coexist may or may not be the right choice. If you have an application that is currently synchronous, or even threaded, and you want to do one or two things asynchronously, you can: all you need to do is start an event loop for that one thing. However, if you are also using threads, you need to be careful, as not everything asyncio does is thread safe.
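A sketch of both directions of mixing: running a single coroutine from synchronous code, and, in a threaded app, submitting work to an event loop living on a dedicated thread. run_coroutine_threadsafe is one of the few explicitly thread-safe entry points into asyncio:

```python
import asyncio
import threading

async def do_one_async_thing():
    await asyncio.sleep(0.1)
    return "done"

# 1) One-off: spin up an event loop just for this call.
print(asyncio.run(do_one_async_thing()))

# 2) Threaded app: run a loop on a dedicated thread and submit work to it.
loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, daemon=True).start()
future = asyncio.run_coroutine_threadsafe(do_one_async_thing(), loop)
print(future.result())  # blocks this thread only, not the event loop
loop.call_soon_threadsafe(loop.stop)
```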
If you have an asynchronous application, you need to be careful. The basic rule of thumb is: once you go full async, everything has to be async. If you accidentally write or use one non-async function that takes too long, or does some waiting, you have just blocked your entire event loop. In the case of a web server, that means you are not accepting any requests while your event loop is blocked. The most common mistake is using the standard time.sleep(10) instead of the asynchronous version await asyncio.sleep(10). The former makes your entire event loop sleep for 10 seconds. That sounds fun, right?
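To see the failure mode in miniature (with shorter sleeps than above, so the demo finishes quickly): with the blocking sleep, the heartbeat below goes silent for the full three seconds, because nothing else can run on the loop:

```python
import asyncio
import time

async def heartbeat():
    while True:
        print("tick")  # should print every half second
        await asyncio.sleep(0.5)

async def bad_handler():
    time.sleep(3)  # blocks the whole event loop: no ticks for 3 seconds

async def good_handler():
    await asyncio.sleep(3)  # yields to the loop: ticks keep coming

async def main():
    ticker = asyncio.ensure_future(heartbeat())
    await bad_handler()  # swap in good_handler() to see the difference
    ticker.cancel()

asyncio.run(main())
```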
Since everything an asynchronous application does needs to be asynchronous, this leads to an interesting problem: 3rd party libraries. If you decide to go asynchronous, say goodbye to all the libraries you have learned to love. Libraries such as requests, sqlalchemy, and boto3 are all off-limits; they use blocking I/O and will therefore halt your event loop. You instead need to find a similar asynchronous version of each library, for example aiohttp, asyncpgsa, and aiobotocore, respectively. These async libraries are typically newer and do not have as many features. This also leads to a division in the Python community: almost everything popular in the Python ecosystem is now being re-done for async support, so rather than a single library supporting both paradigms, you end up with a separate library for each paradigm.
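The flavor of the switch, using the requests/aiohttp pair mentioned above (the httpbin.org URL is just a placeholder endpoint):

```python
# Blocking: fine in a sync app, but it would stall an event loop.
import requests

body = requests.get("https://httpbin.org/get").text

# The async equivalent with aiohttp.
import asyncio
import aiohttp

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get("https://httpbin.org/get") as resp:
            return await resp.text()

body = asyncio.run(fetch())
```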
My recommendation is that you try the asyncio library for a simple task and see what you think. It takes a while to get used to and to understand how everything works. Also, if you haven’t done so yet, you should read my previous article where I explain how async works in Python.
I really think async is fun, and it can help you if you truly switch to an async paradigm, but it might not be worth giving up certain libraries for. I have already experienced plenty of frustration when I can’t find an asynchronous library, or the one I find doesn’t do things the way its synchronous counterpart does, and it leaves me confused. Also, halting your event loop can be disastrous, yet unfortunately all too easy to do. That being said, Canopy runs asyncio in production and it is working very well for us. Our Python apps are getting the same throughput and latency as our Java apps. Asyncio is stable enough for production, and a good number of libraries exist. Try it out and see what you think!
Benchmarks

1. First, an I/O-bound comparison: making the same set of HTTP requests with requests sequentially (normal), with aiohttp on an event loop (asyncio), and with requests on threads (threaded). Timings in seconds:

normal:   18.08
asyncio:  1.734
threaded: 2.55
To be honest, in this example we are splitting hairs. I am sure that if you ran the test enough times, you would see the threaded example beat asyncio a good amount of the time as well. This really boils down to what the application code itself is doing. I use two different libraries, requests and aiohttp, so this test is really just about which of those two is faster. I think asyncio and threaded are close to tied in this example because we have a very low volume of threads.
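The original code isn't reproduced here, so below is a minimal sketch of what a benchmark along these lines could look like; the URL and request count are assumptions:

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

import aiohttp
import requests

URL = "https://example.com/"  # assumption: any HTTP endpoint
N = 20                        # assumption: number of requests

def normal():
    for _ in range(N):
        requests.get(URL)

def threaded():
    with ThreadPoolExecutor(max_workers=N) as pool:
        list(pool.map(lambda _: requests.get(URL), range(N)))

async def fetch(session):
    async with session.get(URL) as resp:
        await resp.read()

async def with_asyncio():
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(fetch(session) for _ in range(N)))

for name, fn in [("normal", normal), ("threaded", threaded)]:
    t0 = time.perf_counter()
    fn()
    print(name, time.perf_counter() - t0)

t0 = time.perf_counter()
asyncio.run(with_asyncio())
print("asyncio", time.perf_counter() - t0)
```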
2. What happens if we do something really stupid and use asyncio/threads for CPU-bound tasks? We see exactly why you shouldn’t do that, that’s what. In this example we will calculate the Fibonacci sequence in two ways. First, the normal way (our control), where we just calculate it flat out. Then we will calculate it in a way where every step is a new thread/coroutine. That means we should be able to calculate it a lot faster since we are using map/reduce, right? Wrong. This isn’t parallelism, it’s concurrency. Example code:
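(The original snippet isn't included here; this is a minimal sketch of what such a benchmark could look like. N is an assumption, kept small on purpose because the threaded version spawns a couple thousand threads:)

```python
import asyncio
import threading
import time

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

async def fib_async(n):
    if n < 2:
        return n
    # Every step becomes two more coroutines: concurrency, not parallelism.
    a, b = await asyncio.gather(fib_async(n - 1), fib_async(n - 2))
    return a + b

def fib_threaded(n, out):
    if n < 2:
        out.append(n)
        return
    # Every step spawns two more threads and waits for them.
    left, right = [], []
    t1 = threading.Thread(target=fib_threaded, args=(n - 1, left))
    t2 = threading.Thread(target=fib_threaded, args=(n - 2, right))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    out.append(left[0] + right[0])

N = 15  # assumption: kept small because of the thread explosion

t0 = time.perf_counter()
fib(N)
print("normal", time.perf_counter() - t0)

t0 = time.perf_counter()
asyncio.run(fib_async(N))
print("asyncio", time.perf_counter() - t0)

out = []
t0 = time.perf_counter()
fib_threaded(N, out)
print("threaded", time.perf_counter() - t0)
```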
Normal:   0.0601
Asyncio:  1.8154
Threaded: 2.0995
As you can see, there is a lot of overhead in creating new threads and dealing with context switching. Going by the numbers above, the async version is about 30 times as slow as the normal version, and the threaded version is about 35 times as slow. So what if we instead calculate the Fibonacci sequence a few times: rather than breaking it up per calculation step, we just calculate the whole thing 10 times, concurrently? See below! (A small variation of the same code snippet.)
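(A sketch of that variation, under the same assumptions as the previous snippet: each of the 10 calculations runs whole, inside one coroutine or one pool thread:)

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

async def fib_coro(n):
    return fib(n)  # the whole calculation inside a single coroutine

async def asyncio_version(n, times):
    await asyncio.gather(*(fib_coro(n) for _ in range(times)))

N, TIMES = 25, 10  # assumptions

t0 = time.perf_counter()
for _ in range(TIMES):
    fib(N)
print("normal", time.perf_counter() - t0)

t0 = time.perf_counter()
asyncio.run(asyncio_version(N, TIMES))
print("asyncio", time.perf_counter() - t0)

t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=TIMES) as pool:
    list(pool.map(fib, [N] * TIMES))
print("threaded", time.perf_counter() - t0)
```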
Normal:   0.0998
Asyncio:  0.9151
Threaded: 0.1822
Going by those numbers, asyncio is roughly nine times slower than normal, and threaded is roughly 1.8 times slower. In this example all the threads already existed, so you are only seeing the overhead of context switching across 10 threads, which shows just how much the thread-per-step version above was paying for thread creation on top of context switching.

Want to talk about Python or asyncio with me? I am often hanging out in the #async channel on the Python developers Slack.