The fascinating story of me diving deep into an arbitrage business on the most active blockchain platform.
This story started one long evening while I was routinely analyzing traders’ performance on the BSC blockchain. I was using our Datamint data analytics engine to look for the most profitable actors. This engine contains historical and live data constantly streamed from BNB Smart Chain (formerly Binance Smart Chain), the most active blockchain by number of daily active users. My hypothesis was that these traders’ behavior could yield trading insights.
Suddenly, something caught my attention. I found a trader who managed to outplay the market by 270 BNB (65,000$ at the time of writing and even more than that at the time of the deal). He used only decentralized exchanges (DEX) to achieve this result.
Well, “not great not terrible,” I thought. Then, I ran a query to understand how much time it took him to achieve this result. The query console thought for a few seconds and left me with a stunning answer: one day. Really? Okay, but how many transactions did he send? Just one? How could it be?
Well, it’s technically possible to make multiple deals in one transaction. On Ethereum-like blockchains (like Polygon, Avalanche C-Chain, many others, and, of course, BSC) this would require you to write and deploy a smart contract. You’d need to code in Solidity, but it isn’t that difficult if you have some coding experience (just be aware that most code examples of “Uniswap Arbitrage Bot” promoted online are scams that just steal your money).
But how exactly can you extract so much value in one atomic step❓
It’s time to check the transaction contents.
Let’s take a closer look at what’s happening here.
We see that this lucky guy takes 1500 BNB and swaps them to BUSD stablecoin. Then, he swaps BUSD to BSC-USD (another dollar-pegged stablecoin). After that, the trader swaps BSC-USD to CAKE (Pancakeswap DeFi protocol’s native token), which is then swapped back to BNB in its wrapped form. In the end, the trader gets 1769 BNB, effectively extracting 269 BNB of profit.
This is called arbitrage. Many assets are traded on multiple decentralized exchanges (DEX) and prices on different DEXes and asset pairs are constantly changing. A smart actor can leverage these price discrepancies and extract value in one single transaction.
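To see where the profit in such a cycle comes from, here’s a minimal Python sketch of chained swaps through constant-product pools, the pricing model used by Uniswap v2 and its forks. The pool reserves are made-up numbers, and the 0.25% fee (`fee_bps=25`) is my assumption for a PancakeSwap-style pool; this is an illustration, not the trader’s actual contract.

```python
def get_amount_out(amount_in, reserve_in, reserve_out, fee_bps=25):
    """Output of a constant-product swap, with the fee given in basis points.

    Integer math mirrors how on-chain pools round down.
    fee_bps=25 (0.25%) is an assumed fee; Uniswap v2 itself charges 0.30%.
    """
    amount_in_after_fee = amount_in * (10_000 - fee_bps)
    numerator = amount_in_after_fee * reserve_out
    denominator = reserve_in * 10_000 + amount_in_after_fee
    return numerator // denominator


def simulate_path(amount_in, pools):
    """Chain swaps through pools given as (reserve_in, reserve_out) tuples."""
    amount = amount_in
    for reserve_in, reserve_out in pools:
        amount = get_amount_out(amount, reserve_in, reserve_out)
    return amount


# Hypothetical reserves: the first pool pays 2.2 Y per X, the second sells X for 2 Y.
pools = [
    (1_000_000, 2_200_000),  # sell X for Y where X is expensive
    (2_000_000, 1_000_000),  # buy X back with Y where X is cheap
]
start = 1_000
end = simulate_path(start, pools)
print("profit in X:", end - start)
```

If both pools quoted the same price, the same cycle would end with less than it started with, because every hop pays the swap fee. The price gap has to be wider than the accumulated fees for the arbitrage to pay off.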
After some research, I’ve found out that this example, while impressive, isn’t that common because of two things. First, in this case, the player used his own capital for trading. Second, he extracted a really big amount.
So here’s another, much more common example of arbitrage:
First of all, you may see that the order of operations is strange. The arbitrageur takes tokens from a liquidity pool before paying for them. This is called a “flash loan”. Exchange protocols like Uniswap v2 (Pancakeswap and most other DEXes on BSC are forks of Uniswap v2) allow you to take any amount of money (up to the total liquidity amount of the pool). But you have to repay all borrowed funds with a fee before the transaction ends. The protocol doesn’t bear any risk here because it’ll forcefully revert the transaction if you don’t fully repay the loan.
So, this arbitrageur leverages flash loans. This way, he can execute arbitrage without a penny of working capital (except for gas costs). But we also have another difference here: the small profit. The trader got only 1.26$ and paid something like 0.54$ for gas. So, his net profit for this transaction was less than 1$. However, arbitrageurs like this one may do dozens of arbitrage attempts per minute. This means he probably still earned a lot. Soon, we’ll know how much exactly.
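The all-or-nothing mechanics of a flash loan can be shown with a toy model. The sketch below is plain Python, not Solidity, and every name in it (`run_atomically`, `borrow`, `trade`, `repay`, the 0.25% fee) is hypothetical; it only illustrates that state changes are kept when the loan is repaid with the fee and discarded otherwise.

```python
import copy


class Reverted(Exception):
    """Raised when the simulated transaction must be rolled back."""


def run_atomically(state, steps):
    """Apply steps to a copy of state; keep the changes only if no step reverts."""
    draft = copy.deepcopy(state)
    try:
        for step in steps:
            step(draft)
    except Reverted:
        return state, False  # the original state is left untouched
    return draft, True


def borrow(amount):
    def step(s):
        if amount > s["pool"]:
            raise Reverted("cannot borrow more than the pool holds")
        s["pool"] -= amount
        s["trader"] += amount
        s["debt"] = amount
    return step


def trade(pnl):
    def step(s):
        s["trader"] += pnl  # result of the swaps done with the borrowed funds
    return step


def repay(fee_bps=25):  # assumed 0.25% fee
    def step(s):
        owed = s["debt"] + s["debt"] * fee_bps // 10_000
        if s["trader"] < owed:
            raise Reverted("loan not repaid in full")
        s["trader"] -= owed
        s["pool"] += owed
        s["debt"] = 0
    return step


state = {"pool": 1_000_000, "trader": 0, "debt": 0}
good_state, good_ok = run_atomically(state, [borrow(100_000), trade(1_000), repay()])
bad_state, bad_ok = run_atomically(state, [borrow(100_000), trade(-500), repay()])
print(good_ok, good_state)
print(bad_ok, bad_state)
```

In the failing run, the pool ends up exactly where it started, which is why the protocol bears no risk. On a real chain, the only thing the trader loses in that case is the gas fee.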
At this point, I was so thrilled by these examples that I decided to dive deeper. I was determined to find out how exactly this market works. And maybe… find out how to earn a Lambo by going into an arbitrage business 😃.
Spoiler: I still don’t own a Lambo, but this was a fascinating journey. I also learned a lot about arbitrage and even tried to put myself into the arbitrageur’s shoes.
Are you ready to follow me on this journey? Let’s start then!
I decided to start with calculating how much arbitrageurs are earning.
Today, we’re talking solely about so-called “flash arbitrageurs”. These are the guys who take a flash loan, execute trades, repay the loan, and keep the profit, all in one transaction.
This part of the arbitrage market is very sweet. In fact, you don’t need working capital to do it, and you don’t risk losing your capital. When you send the transaction, you either win or the transaction is reverted. You only lose a small gas fee, usually less than 10 cents. But how do you calculate the flash arbitrageurs’ total profits? After a few experiments with a manual review, I came up with a query that does the following steps:
1. Filter only transactions that satisfy the following criteria:
   - Value is extracted in one of the most liquid assets on BSC: BNB, BUSD, USDT, USDC, WBTC, or WETH.
   - The tx receiver isn’t a public contract (like the PancakeSwap router, for example). We exclude these contracts to properly calculate the total arbitrage gas fees, including the fees for failed attempts.
2. Find out the contract creator. We need this because one arbitrageur may have multiple contracts and we want to group them together.
3. Calculate the total revenue and convert all incomes to USD value by using the median asset-BUSD DEX price on the day of the deal.
4. Calculate gas costs for both successful and unsuccessful attempts.
Note: When talking about an arbitrageur’s profit, we mean gross profit: revenues minus gas costs. In reality, arbitrageurs most likely have other significant costs, including R&D team payroll and infrastructure payments (server rent, private network links, etc.). The exact cost structure is the arbitrageur’s know-how, and we can’t discover it with on-chain analysis. This tool is powerful but obviously has limitations.
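The aggregation logic described above can be sketched in a few lines of Python. This is not the actual Datamint query. The field names, the `creators` mapping, and the toy rows are all hypothetical, and the real thing runs over the full chain archive rather than an in-memory list.

```python
from collections import defaultdict
from statistics import median

# Hypothetical, pre-filtered rows: one per arbitrage attempt.
txs = [
    {"contract": "0xc1", "day": "2022-05-11", "asset": "BNB", "amount": 2.0,
     "gas_usd": 0.5, "success": True},
    {"contract": "0xc2", "day": "2022-05-11", "asset": None, "amount": 0.0,
     "gas_usd": 0.1, "success": False},  # failed attempt: gas only
    {"contract": "0xc3", "day": "2022-05-11", "asset": "BUSD", "amount": 1.26,
     "gas_usd": 0.54, "success": True},
]

# Contract -> creator, so that several contracts of one arbitrageur are grouped.
creators = {"0xc1": "alice", "0xc2": "alice", "0xc3": "bob"}

# Asset-BUSD DEX prices observed on a given day (toy values).
daily_prices = {
    ("BNB", "2022-05-11"): [298.0, 300.0, 305.0],
    ("BUSD", "2022-05-11"): [1.0],
}


def gross_profit_by_creator(txs, creators, daily_prices):
    """Revenue in USD at the daily median price, minus gas for every attempt."""
    totals = defaultdict(float)
    for tx in txs:
        owner = creators[tx["contract"]]
        if tx["success"]:
            price = median(daily_prices[(tx["asset"], tx["day"])])
            totals[owner] += tx["amount"] * price
        totals[owner] -= tx["gas_usd"]  # gas is paid for failed attempts too
    return dict(totals)


print(gross_profit_by_creator(txs, creators, daily_prices))
```

Counting gas for failed attempts is the reason public contracts are excluded in the first step: a failed call to a shared router can’t be attributed to any particular arbitrageur.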
The resulting query turned out to be really compute-intensive. In normal circumstances, it might have taken days or weeks to be executed (especially given the enormous size of the BNB Smart Chain archive). However, thanks to the very fast analytical database at the core of Datamint’s data analytics engine and some help from my fellow colleagues with query optimizations, I got all the data in a matter of minutes. Then, I connected a visualization tool to the resulting data marts. And here you have it: the Datamint BSC Flash Arbitrage Monitor. You can check it out live on our website (see the link at the end of the article).
But let’s get back to our analysis. We can see that all profitable arbitrageurs earned 138.05m$ since the 1st of January 2021. This covers almost the whole history of BNB Smart Chain, which was launched in September 2020 and gained momentum in Q4 2020.
We also can see a slow but steady decline in arbitrageurs’ profits. (That’s excluding the spikes which we’ll discuss later). I interpret this decline as the fact that DeFi markets are becoming more mature and efficient, leaving less room for value extraction by arbitrage.
I was surprised by how many arbitrageurs suffer direct losses, some of them losing thousands of dollars.
And this isn’t because of gas costs (gas costs are a small fraction of revenue for the vast majority of arbitrageurs). They don’t have to carry these losses at all, because a contract can revert the transaction if it’s about to end with a negative profit. I can only think that their losses are due to bad math. This isn’t so surprising, as this sweet market attracts both professionals and amateurs.
Let’s take a look at the most successful guy. I call him “The BSC Arbitrage King”. He’s been in the game since the beginning and earned almost 8m$ by the end of Q2 2022.
He owns 32 arbitrage contracts, all of them profitable (well, he’s a professional, no doubt). Looking at his profit dynamics, we see that his earnings are generally lower in 2022 than in 2021. This is partly because of the more efficient market and partly because of the lower BNB price (the average BNB price in 2022 is ~300$ vs ~400$ in 2021) and the overall market dip. However, we can see a huge spike in his profits in May. He typically earned around 5–15k$ a day in 2022. But on the 12th of May, he earned a stunning 320k$! (I can imagine what a wild party his team had that night). How?
Well, to get the right answer, I could have just Googled “What happened on the crypto market on 11–12th of May 2022?”. But I decided to dive deeper into on-chain data. I wanted to learn everything the hard way. So, I assembled a second query that allowed me to break down arbitrage profits by the assets used in arbitrage.
Note: many arbitrages involve 3 and even 4 assets in a row, so the figures on the diagram below are non-additive (the total deal profit is credited to all participating assets).
When I ran the query and set the date filter to the 11th of May, I understood everything with one glance.
Yes, it was LUNA and UST. This was the day when insufficient liquidity in the LUNA protocol caused the algorithmic UST stablecoin to de-peg from USD. Then, the massive automatic minting of LUNA tokens sent their price plunging into the abyss.
But how did this affect the arbitrageurs’ profits? It’s easy.
When things like this happen, massive sellouts occur. And big sales on a DEX protocol mean big price changes. This causes big price discrepancies between different DEXes (e.g. Pancakeswap and BiSwap). And this is exactly what arbitrageurs crave. When people start to panic, they tend to sell faster instead of selling at an optimal price. In normal times, they’d split their order, start selling in a pair with higher liquidity, wait for the price to recover, etc. But that day, they didn’t have enough time to do all that. Moreover, a lot of positions in overcollateralized on-chain loan protocols (usually used for shorting) were closed by liquidators. Collaterals were sold automatically to cover the loans. These massive automatic sales also created inefficiencies. All of this brought 320k$ to our proud King of BSC Arbitrage. The total profit for all arbitrageurs on that single day was over 2m$.
To get the idea of what assets (and, specifically, combinations of assets I call “paths”) are the most profitable at “normal” times, I had to assemble another query. This query grouped profits (again, in a non-additive way) by sorted paths rather than single assets. Then, I adjusted the date filter to exclude spikes from observations.
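Here’s a small Python sketch of what “non-additive” grouping means in practice. It’s one possible reading of the approach, with invented deals and profits: each deal’s full profit is credited to every asset it touches, and paths are keyed by their sorted set of assets so that the order of hops doesn’t matter.

```python
from collections import defaultdict

# Hypothetical deals: (assets in the order they were swapped, profit in USD).
deals = [
    (("BNB", "BUSD", "CAKE"), 10),
    (("CAKE", "BNB", "BUSD"), 5),   # same assets, different order
    (("BNB", "USDT"), 2),
]


def profit_by_asset(deals):
    """Credit the full deal profit to every participating asset (non-additive)."""
    totals = defaultdict(float)
    for assets, profit in deals:
        for asset in set(assets):
            totals[asset] += profit
    return dict(totals)


def profit_by_sorted_path(deals):
    """Group deals by the sorted tuple of their assets, ignoring hop order."""
    totals = defaultdict(float)
    for assets, profit in deals:
        totals[tuple(sorted(set(assets)))] += profit
    return dict(totals)


print(profit_by_asset(deals))
print(profit_by_sorted_path(deals))
```

The per-asset buckets add up to far more than the real total profit, which is why the figures can be compared with each other but never summed.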
I expected that paths made up exclusively of the most liquid and popular assets like CAKE, ALPACA, and SHIB would be most profitable. But that wasn’t true.
For some reason, many of the most profitable paths included the FIST token, which I had never heard of before. Some brief Googling showed me several websites claiming to be FistSwap or FstSwap DEX affiliated with the FIST token. At least some of them are definitely scams. I’m sure that more investigation here could bring interesting insights, but I decided that this was out of scope for this research.
An interesting thing is that I couldn’t find any significant correlation between market volatility and arbitrage earnings.
Here’s my initial hypothesis: the more volatile the market is, the more arbitrageurs earn. However, other than several big spikes, I couldn’t identify any strong relationship between arbitrage profits and volatility.
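A simple way to test a hypothesis like this is the Pearson correlation coefficient between a daily volatility measure and daily arbitrage profit. The sketch below is only an illustration with invented numbers; I’m not claiming it’s the exact statistic or volatility measure used in this research.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical daily series: volatility index vs. arbitrage profit (k$).
volatility = [1, 2, 3, 4, 5]
profit = [10, 9, 11, 10, 10]
print(round(pearson(volatility, profit), 3))
```

A value near +1 would support the hypothesis and a value near 0 would not. A handful of extreme days can dominate the coefficient, so it makes sense to compute it both with and without the spikes.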
Okay, now we know two main things. First, the arbitrage market does have some money. Second, the brave ones can still find profit opportunities to chase despite the steady decline in overall USD profits.
However, before we dive deeper, we want to know a very important thing: is this market fair?
Because if the market is unfair, it doesn’t matter how much money it has: you won’t be able to win a share from a player with some secret advantage. And we have a lot of reasons to be suspicious of the market’s fairness. That’s because some actors on the BNB Smart Chain can have a massive advantage. Here’s why.
When the arbitrage opportunity appears, all arbitrageurs are trying to backrun it. This means they put their transaction in exactly the next position after the transaction that creates the opportunity (say, a swap of 2000 BNB for CAKE). I’m calling this transaction “a trigger”. The winner takes it all, others just pay gas for an unsuccessful attempt. To be that fast, arbitrageurs must race to detect the trigger in the mempool (temporary storage for pending transactions). Then, they’ll have to send their arbitrage transaction to the validator first.
Obviously, two types of actors have a massive advantage here — RPC nodes (that get transactions through RPC API from, say, your Metamask wallet) and Validator nodes.
Validator nodes can reorder transactions to their advantage and even privately mine transactions without sending them to the mempool. This advantage is called MEV (Miner Extractable Value). On the Ethereum network, we even find a special initiative called Flashbots MEV. This initiative attempts to provide fair access to private mining. However, BSC doesn’t have anything like this. It only has 21 validators. And at any given moment, we can suspect that these validators are playing the arbitrage game to increase their profits.
So, let’s first take a look at the market share distribution.
Since the 1st of January 2021, I’ve observed 1972 arbitrageurs using 6689 custom contracts. And only 430 of them were profitable (with 2648 profitable contracts). If we look at the current situation (say, Q2 2022), we see 129 profitable arbitrageurs out of 645 in total. The market is also quite consolidated — the top 7 players hold 50+% of the market, and the top 20 hold 80+%. However, many smaller players are still within the tail of profit distribution, and they still earn their penny.
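Concentration figures like these boil down to a “top-N share” calculation. Here’s a minimal sketch with made-up profit numbers. Restricting the total to profitable arbitrageurs is my assumption about how the market is defined here.

```python
def top_n_share(profits, n):
    """Share of total positive profit captured by the n most profitable players."""
    winners = sorted((p for p in profits if p > 0), reverse=True)
    total = sum(winners)
    return sum(winners[:n]) / total if total else 0.0


# Hypothetical gross profits per arbitrageur, in k$ (negative = net loser).
profits = [50, 30, 10, 10, -5]
print(top_n_share(profits, 1), top_n_share(profits, 2))
```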
Well, this sort of market distribution doesn’t reveal how fair the market is. We need some other way to figure it out.
That’s when we want to use the on-chain data crystal ball 🔮 again.
At this moment, we have the complete history of all successful arbitrage transactions. And for each transaction, we know the number of the block. We also know the address (and sometimes the name) of its validator. We may assume that it’s unlikely that any arbitrageur has “special” relationships with all 21 validators at once. So, if an arbitrageur is leveraging MEV, his profit should be skewed to some validator.
Okay, looks like it’s time for another query. This time, I took the time period from the 1st of January 2021 and calculated the following indicators for each successful arbitrageur: total profit, average profit per deal, how many successful deals they made, and, most importantly, the biggest profit share they have with a single validator.
In theory, if an arbitrageur doesn’t have any preference for validators, this “topValidatorProfitShare” indicator should be close to 1/21 = ~5%. Actually, many other factors can skew the success rate. For instance, the arbitrageur’s nodes can share the same datacenter with some of the validators. Or their nodes can have direct peering with a validator. Heuristically, I’d assume that everything below 20% is an indicator of fair play.
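The indicator itself is easy to compute once each successful deal is labeled with the validator of its block. Below is a minimal Python sketch. The function name, the input shape, and the toy numbers are mine, and the 20% threshold is just the heuristic from the paragraph above.

```python
from collections import defaultdict

FAIR_PLAY_THRESHOLD = 0.20  # heuristic cut-off, not a proven boundary


def top_validator_profit_share(deals):
    """Return (validator, share) for the validator with the largest profit share.

    `deals` is an iterable of (validator_name, profit_usd) pairs for one
    arbitrageur's successful deals. Returns None if total profit is not positive.
    """
    per_validator = defaultdict(float)
    for validator, profit in deals:
        per_validator[validator] += profit
    total = sum(per_validator.values())
    if total <= 0:
        return None
    top = max(per_validator, key=per_validator.get)
    return top, per_validator[top] / total


# Evenly spread profits across 21 validators: the share is 1/21, about 4.8%.
even = [(f"validator_{i}", 10.0) for i in range(21)]
# Heavily skewed toy profile.
skewed = [("HashQuark", 89.0), ("NodeReal", 11.0)]

for label, deals in (("even", even), ("skewed", skewed)):
    name, share = top_validator_profit_share(deals)
    print(label, name, f"{share:.1%}", "suspicious" if share > FAIR_PLAY_THRESHOLD else "ok")
```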
So, I’ve run a query and sorted the result by total profit. And…
Well, at least the top 10 arbitrageurs show no signs of special relations with validators. The first suspicious entry (0x92ef7fac0708fc3c49921907361429ec14cd8cb6) is at position 15 with 39% of profits. It gained 1.1m$ in total from blocks validated by NodeReal. But still, this isn’t enough to say that this is unfair play. An even more suspicious entry (0xba5276f63492b351c7227a4f285593cefa250ad3) is at position 45 with 89% of profits. It got 566k$ in total from blocks validated by HashQuark. It might be a strong sign of affiliation, but in this case, the total profit looks too small. It’s approximately 6 times less than 1/21 of the total market.
Then, I tried to verify whether this entry really has a special affiliation with HashQuark. I selected only arbitrageurs “favored” by HashQuark and sorted them by their “topValidatorProfitShare” in descending order.
The suspicious guy is in first place. And by the way, he truly has an amazing average profit per successful deal: 331$. That’s a lot. However, in second place, we see an absolutely unremarkable arbitrageur with a total profit of less than 2k$. At the same time, his topValidatorProfitShare is only two percent less than that of the first one. It doesn’t look like he profits from affiliation. He also has more than 6674 successful deals, so this doesn’t look like an accident.
Well, maybe both just accidentally share the same shelf in the datacenter? We’ll never know for sure.
However, I can say that overall, for a market that competitive, the evidence of affiliation is very weak (or very well disguised). It looks like most players are playing fair, and no whales are dominating the market. 😌
Okay, so we know now that the flash arbitrage market on BNB Smart Chain has a decent size, a seemingly fair structure, and a lot of opportunities. Looks like we’re ready to enter it and fight for the Lambo!
Well, not so fast! One piece is still missing from the puzzle.
How are arbitrage opportunities created?
We know how that works in general: big DEX swaps, liquidations, etc. But to start monitoring the mempool, we need to know exactly what types of trigger transactions arbitrageurs observe.
Let’s get back to the lucky one who extracted 60k$+ in one transaction. Remember? That guy started my fascinating journey. By exploring the block in the BSCScan explorer, we can easily find his trigger transaction. This is the transaction in the same block mined immediately before the arbitrageur’s transaction. Here it is:
This is a liquidation transaction of the Alpaca Finance lending protocol. Someone had a very big position in this protocol and, clearly, it wasn’t exactly their day. Their position was liquidated and their collateral was automatically sold on PancakeSwap DEX.
Obviously, checking triggers for each of the tens of millions of arbitrage transactions by hand isn’t an option. And you already know how I’m going to solve this problem.
Datamint servers and databases, data analysis tools, and a few cups of ☕️.
This query was probably the most complicated and compute-intensive. When I hit “Execute”, I could almost hear the fans of our data servers howling like jet engines. Thankfully, this server torture didn’t last long, and I got my results.
So, the results aren’t surprising at all. More than a third of arbitrage profits involve transactions to the PancakeSwap DEX as a trigger. The most popular function to serve as a trigger is “swapExactTokensForTokens”. It accounts for ~14% of all profits. This function is a part of Uniswap V2, which was forked by most DEXes on BSC (e.g. BiSwap, ApeSwap, etc.). The only surprising thing is again FstSwap (or FistSwap?) in second place right after the PancakeRouter.
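The trigger lookup used here (the transaction mined immediately before the arbitrage in the same block) and the grouping by contract and function can be sketched as follows. This is a toy in-memory version with hypothetical field names and data, not the real query. The function is identified by the first 4 bytes of the call data, the so-called function selector.

```python
from collections import defaultdict


def find_trigger(block_txs, arb_hash):
    """Return the transaction mined right before `arb_hash` in its block, if any."""
    ordered = sorted(block_txs, key=lambda tx: tx["index"])
    for previous, current in zip(ordered, ordered[1:]):
        if current["hash"] == arb_hash:
            return previous
    return None  # arbitrage was first in the block or not found


def profit_by_trigger(blocks, arbitrages):
    """Sum arbitrage profit per (trigger contract, function selector)."""
    totals = defaultdict(float)
    for arb in arbitrages:
        trigger = find_trigger(blocks[arb["block"]], arb["hash"])
        if trigger is None:
            continue
        selector = trigger["input"][:10]  # "0x" + 8 hex chars = 4 bytes
        totals[(trigger["to"], selector)] += arb["profit_usd"]
    return dict(totals)


# Hypothetical data: one block with a router swap followed by an arbitrage.
blocks = {
    100: [
        {"hash": "0xaaa", "index": 0, "to": "0xrouter", "input": "0x38ed1739" + "00" * 4},
        {"hash": "0xarb1", "index": 1, "to": "0xbot", "input": "0x"},
    ],
}
arbitrages = [{"hash": "0xarb1", "block": 100, "profit_usd": 12.5}]
print(profit_by_trigger(blocks, arbitrages))
```

The “previous transaction” rule is a heuristic: when several arbitrageurs backrun the same trigger, only the first one sits directly behind it, which happens to be the winner we care about.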
At this point, we can simply pick any combination of trigger addresses and functions. And now, we’re ready for experiments!
My team and I spent some resources doing more in-depth research on triggers. Mainly, we were analyzing the relationship between competition and the profitability of different triggers. However, this is clearly out of scope for this article. So, please contact us if you’re interested in more details.
At this point, I had all the theory I needed, but the theory is nothing without practice. Which means…
I have to start with a disclaimer — my goal wasn’t to build a profitable flash arbitrage bot. I have no illusions and I totally understand that this is a complex development task.
My actual goal was to cut corners as much as possible. I wanted to understand how big the distance is between an experienced player on the market and a newbie.
So, I decided that I will NOT:
1. Implement simulation or heuristic estimates for potential profit created by every transaction. My guess is that this is where pro arbitrageurs compete in computation speed. For my research, I’ll craft the trigger myself and respond to my own trigger only.
2. Write and deploy an actual arbitrage contract. This isn’t that difficult, but I don’t need it for my research. If I can put my dummy transaction right after my own trigger, outrunning the competition, then writing the contract is just a matter of technique.
3. Make specific optimizations. I won’t tweak blockchain nodes, optimize infrastructure, networking, etc. This is a rabbit hole. And if I can’t win without it, it definitely won’t be an easy walk.
Having all this in mind, I spun up a BSC node and wrote a simple script in Python. This way, I could monitor all new transactions in the mempool by constantly fetching them through the IPC connection. Then, I could look for my crafted trigger and immediately send a responding dummy transaction (transfer of 0 BNB to my own account).
Here’s how it should’ve worked in theory.
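A minimal sketch of such a watcher is shown below. It isn’t my original script: the node access is abstracted behind two callables (`fetch_pending` and `send_response`, both hypothetical names) so the logic can run without a node. With web3.py, `fetch_pending` could be built on a pending-transaction filter over an IPC provider, as the comment suggests, but treat that wiring as an assumption.

```python
import time


def is_trigger(tx, router, selector, my_address):
    """True if `tx` is our own crafted swap sent to the watched router."""
    to = (tx.get("to") or "").lower()  # `to` is None for contract creation
    sender = (tx.get("from") or "").lower()
    return (
        to == router.lower()
        and sender == my_address.lower()
        and (tx.get("input") or "").startswith(selector)
    )


def watch_mempool(fetch_pending, send_response, match, poll_interval=0.0, max_polls=None):
    """Poll pending transactions and respond to the first one that matches.

    fetch_pending() -> list of tx dicts seen since the last call.
    send_response(tx) -> sends the dummy response and returns its result.
    Possible wiring with web3.py (an assumption, not tested here):
        flt = w3.eth.filter("pending")
        fetch_pending = lambda: [w3.eth.get_transaction(h) for h in flt.get_new_entries()]
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        for tx in fetch_pending():
            if match(tx):
                return send_response(tx)
        polls += 1
        time.sleep(poll_interval)
    return None


# Demo with fake mempool batches instead of a live node.
batches = iter([
    [{"to": "0xother", "from": "0xme", "input": "0x38ed1739"}],
    [{"to": "0xROUTER", "from": "0xMe", "input": "0x38ed1739abcd"}],
])
sent = []
result = watch_mempool(
    fetch_pending=lambda: next(batches, []),
    send_response=lambda tx: sent.append(tx) or "sent",
    match=lambda tx: is_trigger(tx, "0xrouter", "0x38ed1739", "0xme"),
    max_polls=5,
)
print(result, len(sent))
```

Polling in a loop is the simplest possible design, and it already hints at the problem: every millisecond spent between the trigger arriving and the response leaving is a millisecond given to competitors.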
But I want to say a few words about crafting a trigger. I needed to send a transaction that would catch the attention of active arbitrageurs. This should be a big swap for at least 10k$. But how can I do this if I don’t want to put real money in the game? Luckily, when competing for reaction speed in the mempool, arbitrageurs don’t have time to check if the sender has funds to make the requested swap.
So, I took my own small swap transaction as an example and edited it.
This way, it appeared as a swap of 1m$ (BUSD) to CAKE at Pancakeswap.
This is what the transaction payload looked like on BSCScan.
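For readers who want to see what sits inside such a payload, here’s a hand-rolled ABI encoding of a `swapExactTokensForTokens(uint256,uint256,address[],address,uint256)` call in pure Python. The selector `0x38ed1739` is the commonly cited one for this signature. The addresses are obvious placeholders rather than the real BUSD, CAKE, or wallet addresses, and the 18 decimals are my assumption about the token. Treat it as a sketch of the encoding rules, not as the exact bytes I sent.

```python
SELECTOR = "38ed1739"  # assumed selector of swapExactTokensForTokens(...)


def word(value):
    """Encode an unsigned integer as one 32-byte ABI word (64 hex chars)."""
    return f"{value:064x}"


def address_word(address):
    """Left-pad a 20-byte hex address to one 32-byte ABI word."""
    raw = address[2:] if address.startswith("0x") else address
    return raw.lower().rjust(64, "0")


def encode_swap(amount_in, amount_out_min, path, to, deadline):
    """Build call data: selector, five head words, then the dynamic `path` array."""
    head = [
        word(amount_in),
        word(amount_out_min),
        word(5 * 32),        # offset of `path` data from the start of the arguments
        address_word(to),
        word(deadline),
    ]
    tail = [word(len(path))] + [address_word(a) for a in path]
    return "0x" + SELECTOR + "".join(head + tail)


# Placeholder addresses (hypothetical), 1m tokens assuming 18 decimals.
TOKEN_IN = "0x" + "11" * 20
TOKEN_OUT = "0x" + "22" * 20
RECIPIENT = "0x" + "33" * 20
payload = encode_swap(10**6 * 10**18, 0, [TOKEN_IN, TOKEN_OUT], RECIPIENT, 1_700_000_000)
print(payload)
```

Changing the first argument is all it takes to turn a small swap into a million-dollar one on paper. Whether the sender can actually afford it only becomes clear when the transaction is executed.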
The first experiment showed that this approach works perfectly. My trigger was mined, reverted (as I don’t have 1m$ in my wallet yet 😆), and backrun by approximately 400 arbitrage attempts. 400… At that point, I was thinking that my attempt to outplay the big guys was doomed.
But I was committed to seeing the thing through. So, I started my Python script, waited for the node connection initialization, and sent my carefully crafted trigger using Metamask’s custom Hex Data field and an online ABI (Application Binary Interface) encoder. Almost immediately, I saw a notification in the node’s console that a trigger was detected in the mempool and the dummy response transaction was sent. After a few seconds, both transactions were mined and I opened BSCScan (which isn’t easy to do when you’re crossing your fingers). I evaluated the results:
My expectations were low, but holy cow, the result was quite disappointing. Even though I managed to put my response transaction in the same block as the trigger, I lost the battle to hundreds of other arbitrageurs.
That means all of them had a massive infrastructure advantage — they had tweaked powerful nodes, special network solutions, better peering with other nodes, and some other optimizations that I don’t even know about.
Should we stop here? It would’ve been wise, but I wanted to put things to the extreme. I was going to present a very hard test to active arbitrageurs.
Here’s a short digression to better understand how the Ethereum P2P protocol works. When a node gets new transactions (either from RPC or from other nodes), it adds them to the mempool. Then, it retransmits them to other connected nodes (peers). This process has 2 special things:
Knowing these two things, I could simulate a massive advantage on my side.
I could send both trigger and response transactions in one single packet.
I could also be sure that I’m sending full transactions right away. So, in theory, arbitrageurs shouldn’t be able to insert their transactions between my trigger and my response. They’re joined in one packet, so they don’t have any latency between them.
After hours of digging into the source code of bsc-geth, the official node software for the BNB Smart Chain, this idea no longer seemed as wonderful as it had in the beginning. However, everything passes, and finally, I was able to send custom packets to the 150 peers of my node.
I carefully crafted trigger and response transactions, packed them into a single packet, and sent them to peers. I had never been so nervous opening BSCScan.
What I saw left me speechless.
The result was almost the same as in the previous run! I still lost to 100+ arbitrageurs. How could it be possible? I see only two options here. Either the validators are in the game (which is unlikely based on the results of my research), or the BNB Smart Chain simply has a huge number of arbitrageur nodes. This means they can intercept transactions on every route to the validator. Arbitrageurs will find triggers, insert their own response transactions, and delay weaker opponents’ transactions (yes, those like me 😢).
Looks like flash arbitrage profits aren’t easy money at all. I most likely won’t get a Lambo from this market … However, moments before I slipped into depression thinking about my broken dreams, I recalled that my business isn’t arbitrage.
I work in on-chain data analysis and value-added data services. And this research is great for my business! It shows how much you can learn using only public blockchain data, the right tools, and commitment.
This was a fascinating journey that once again proved the power of on-chain data. But not just any data; I mean data harnessed with proper analytics and efficient tools like the Datamint data engine. And the story isn’t over, as we have published the analytical application on our website. You can continue the journey yourself and look for more insights with it.
But to summarize this research, I’ll reiterate key findings:
That’s all for today. If you like this story, please drop us a few lines about topics that interest you in the realm of on-chain data. This would help us to prioritize themes for upcoming content including articles, research, online tools, and webinars.
May the data be with you!
About the author: Ivan Vakhmyanin is a data analytics and visualization (BI, Big Data, Data Science) expert with years of experience. He is also a blockchain and Web 3.0 adept making the on-chain data from leading blockchain platforms (Ethereum, BNB Smart Chain, Solana, etc.) available for analysis. Ivan is passionate about sharing experience by developing educational programs in the field of Data-driven Management for specialists and executives.