Duplicate transactions in Bitcoin: a fun bug with minimal risk

The Bitcoin blockchain contains two sets of identical transactions, both created in mid-November 2010, with one set "sandwiching" the other. Duplicate transactions can cause confusion, and Bitcoin developers have been fighting them in various ways for years. The problem is still not fully solved, and the next potential duplicate transaction may appear in 2046. Although the risk associated with duplicate transactions is now small, it is an interesting and weird bug worth thinking about.

Overview

A normal Bitcoin transaction spends at least one output of a previous transaction by referencing that transaction's ID (TXID). Each unspent output can be spent only once; if outputs could be spent twice, bitcoin could be double-spent, which would make it worthless. However, there are exactly two sets of identical transactions in Bitcoin. This is possible because coinbase transactions do not have any transaction inputs and instead create newly generated coins. Therefore, two different coinbase transactions can send the same amount to the same address and be constructed in exactly the same way, making them identical. Since these transactions are identical, their TXIDs also match, because the TXID is a hash digest of the transaction data. The only other way a TXID could be duplicated is through a hash collision, which is considered infeasible for cryptographically secure hash functions. No SHA256 collision has ever been found, in Bitcoin or anywhere else.
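To make the point about matching TXIDs concrete, here is a minimal Python sketch. It relies on the fact that a TXID is the double SHA256 hash of the serialized transaction, conventionally displayed in reversed byte order. The byte strings below are hypothetical placeholders, not real serialized transactions.

```python
import hashlib

def txid(raw_tx: bytes) -> str:
    """Double SHA256 of the serialized transaction, byte-reversed for display."""
    digest = hashlib.sha256(hashlib.sha256(raw_tx).digest()).digest()
    return digest[::-1].hex()

# Hypothetical stand-ins for serialized coinbase transactions.
coinbase_a = b"hypothetical coinbase paying 50 BTC to key X"
coinbase_b = b"hypothetical coinbase paying 50 BTC to key X"
coinbase_c = b"hypothetical coinbase paying 50 BTC to key Y"

print(txid(coinbase_a) == txid(coinbase_b))  # True: identical bytes, identical TXID
print(txid(coinbase_a) == txid(coinbase_c))  # False: any difference changes the TXID
```

Because the hash depends only on the bytes, nothing about the block that contains the transaction enters the TXID.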

Both sets of duplicate transactions occurred in close proximity, between 08:37 UTC on November 14, 2010 and 00:38 UTC on November 15, 2010, a span of about 16 hours. The first set of duplicate transactions was sandwiched between the second set. We classify d5d2….8599 as the first duplicate transaction because it became a duplicate first, although, oddly, it first appeared on the blockchain after another duplicate transaction, e3bf….b468.

Duplicate Transaction Details

In the images below, you can see two screenshots from the mempool.space block explorer showing the first duplicate transaction appearing in two different blocks.

Interestingly, when the relevant URL is entered in a web browser, the mempool.space block explorer defaults to showing the earlier block in the case of d5d2….8599 and the later block in the case of e3bf….b468. Blockstream.info and Btcscan.org have the same behavior as mempool.space. On the other hand, Blockchain.com and Blockchair.com behave differently and always show the latest version of the duplicate transaction when the URL is entered in the browser, according to our basic testing.

Of the four blocks in question, only one (block 91,812) contains any other transaction. This transaction merges a 1 BTC output and a 19 BTC output into a single 20 BTC output.

Can these outputs be spent?

Since there are two sets of the same TXID, this creates a reference problem for subsequent transactions. Each duplicate transaction is worth 50 BTC. So, these duplicate transactions involve a total of 4 x 50 BTC = 200 BTC or, depending on how one looks at it, 2 x 50 BTC = 100 BTC. In a way, there are 100 BTC that do not actually exist. As of today, all 200 BTC are unspent. As far as we know (and we may be wrong here), anyone who holds the private keys associated with these outputs can spend these bitcoins. However, once an output is spent, the UTXO is deleted from the database and the duplicate 50 BTC becomes unspendable and lost, so only 100 BTC can be recovered. Whether the spent coins would come from the earlier or the later block may be undefined or indeterminable.
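The accounting above can be illustrated with a toy model. The sketch below assumes the UTXO database behaves like a dictionary keyed by (TXID, output index), so that a repeated TXID overwrites the existing entry. This is a simplification for illustration, not Bitcoin Core's actual database code, and the function names are hypothetical.

```python
# Toy UTXO database keyed by (txid, output index). All names are hypothetical.
utxo_set = {}
total_created = 0

def add_coinbase(txid, amount_btc=50):
    """Record a coinbase output; a repeated TXID overwrites the old entry."""
    global total_created
    total_created += amount_btc
    utxo_set[(txid, 0)] = amount_btc

def spend(txid, vout=0):
    """Remove and return the output, or None if it is not in the set."""
    return utxo_set.pop((txid, vout), None)

# Order of appearance as described in the article, with truncated TXIDs as labels.
for label in ["e3bf...b468", "d5d2...8599", "d5d2...8599", "e3bf...b468"]:
    add_coinbase(label)

spendable = sum(utxo_set.values())
print(total_created, spendable)  # 200 100
print(spend("d5d2...8599"))      # 50
print(spend("d5d2...8599"))      # None: the duplicate cannot be spent again
```

In this model, 200 BTC were created on paper but only 100 BTC remain reachable, which matches the figures in the paragraph above.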

The miner could instead have spent all the bitcoins before creating the duplicate transaction; the duplicate would then have created a new entry in the database of unspent outputs. This would mean that there are not only duplicate transactions, but also duplicate transactions whose outputs have been spent more than once. In that case, spending the new outputs would make it possible to create further duplicate transactions, forming a kind of duplicate chain. One must be careful about the order of events and always spend before creating a duplicate, otherwise the bitcoins may be lost forever. These new duplicate transactions would not be coinbase transactions, but "normal" transactions. Fortunately, this has never happened.

The problem with duplicate transactions

Duplicate transactions are obviously bad. They confuse wallets and block explorers, and they make it unclear where bitcoins came from. They also open the door to attacks and vulnerabilities. For example, you can pay someone twice with two duplicate transactions. Then, when the recipient tries to spend the funds, they may find that only half of the funds can be recovered. This could be used to attack an exchange, for example, in an attempt to bankrupt it, while the attacker has nothing to lose because they can withdraw the funds immediately after the deposit.

Banning transactions with duplicate TXIDs

To alleviate the problem of duplicate transactions, in February 2012, Bitcoin developer Pieter Wuille proposed the BIP30 soft fork solution, which prohibits transactions with duplicate TXIDs unless the previous TXID has been spent. This soft fork applies to all blocks after March 15, 2012.

In September 2012, Bitcoin developer Greg Maxwell changed this rule so that the BIP30 check applies to all blocks, not just those after March 15, 2012. The exceptions are the two duplicate transactions mentioned earlier in this article. This fixed some DoS vulnerabilities. Technically, this was another soft fork, although the rule change only affected blocks more than 6 months old, so it did not carry the risks associated with normal protocol rule changes.

This BIP30 check is computationally expensive. Nodes need to go through all transaction outputs in the new block and check whether any of those outputs already exist in the UTXO set. This is probably why Wuille's rule only checks unspent outputs, as checking all outputs would be even more computationally expensive and would not allow for pruning.
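As a rough illustration of the rule, the following Python sketch rejects a block if any output it would create is already present, unspent, in the UTXO set. The data layout (a set of (TXID, index) pairs) and the function name are assumptions made for this example and do not mirror Bitcoin Core's implementation.

```python
def violates_bip30(block_txs, utxo_set):
    """Return True if any output the block creates already exists unspent.

    block_txs: list of (txid, number_of_outputs) pairs (hypothetical layout).
    utxo_set:  set of (txid, output_index) pairs that are currently unspent.
    """
    for txid, n_outputs in block_txs:
        for vout in range(n_outputs):
            if (txid, vout) in utxo_set:
                return True
    return False

# Hypothetical TXID labels, for illustration only.
utxos = {("aa", 0), ("bb", 0), ("bb", 1)}

print(violates_bip30([("aa", 1)], utxos))  # True: ("aa", 0) is still unspent
utxos.discard(("aa", 0))                   # the earlier output gets spent
print(violates_bip30([("aa", 1)], utxos))  # False: the old TXID was fully spent
```

The cost comes from the lookup per output: every new block requires one UTXO query for each output it creates, in addition to the usual lookups for the inputs it spends.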

BIP34

In July 2012, Bitcoin developer Gavin Andresen proposed the BIP34 soft fork, which was activated in March 2013. This protocol change requires the coinbase transaction to include the block height, and it also introduced block versioning. The block height is added as the first item in the coinbase transaction scriptSig. The first byte in the coinbase scriptSig is the number of bytes used for the block height number, and the following bytes are the block height number itself. For the first c. 160 years (2^23 / (144 blocks per day * 365 days per year)), the first byte should be 0x03. This is why today's coinbase scriptSig (in hex) always starts with 03. This soft fork seemed to have completely solved the duplicate transaction problem, so that all transactions should now be unique.
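The encoding can be sketched in a few lines of Python. The function below is an illustrative helper (the name is hypothetical) that serializes a height as a length byte followed by a little-endian number, adding a padding byte when the top bit would otherwise be read as a sign bit. It assumes heights above 16, which covers every block since BIP34 activated.

```python
def encode_height(height: int) -> bytes:
    """Length-prefixed, little-endian, sign-safe encoding of a block height."""
    if height <= 16:
        raise ValueError("sketch only covers heights above 16")
    body = bytearray()
    n = height
    while n:
        body.append(n & 0xFF)
        n >>= 8
    if body[-1] & 0x80:
        body.append(0x00)  # keep the sign bit clear so the number stays positive
    return bytes([len(body)]) + bytes(body)

print(encode_height(1_983_702).hex())     # 03d6441e
print(encode_height(2**23 - 1).hex())     # 03ffff7f, the last 3-byte height
print(encode_height(2**23).hex())         # 0400008000, the first 4-byte height
print(round(2**23 / (144 * 365), 1))      # 159.6 years of 3-byte heights
```

The sign bit is why the three-byte range ends at 2^23 rather than 2^24, which gives the figure of roughly 160 years.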

After BIP34 had been adopted, in November 2015, Bitcoin developer Alex Morcos opened a pull request against the Bitcoin Core software repository. With this change, nodes stopped performing BIP30 checks. After all, since BIP34 fixed the problem, this expensive check was no longer necessary. Although it was not known at the time, this was technically a hard fork for some very rare future blocks. It now seems that the potential hard fork is not important, because almost no one is running node software released before November 2015. At forkmonitor.info, we are running Bitcoin Core 0.10.3, which was released in October 2015. This client therefore follows the pre-hard fork rules and still performs the expensive BIP30 check.

The Block 1,983,702 Problem

It turns out that some coinbase transactions in blocks mined before BIP34 was activated used scriptSigs whose first bytes happen to match a valid future block height. So while BIP34 did fix the problem in almost all cases, it was not a complete fix. In 2018, Bitcoin developer John Newbery produced a full list of these potential duplications, which can be seen in the table below.
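The kind of scan that finds such blocks can be sketched as follows. This Python helper (a hypothetical name, not Newbery's actual script) reads the start of a coinbase scriptSig as if it were a BIP34 height push and returns the implied height, or None if the bytes cannot be a valid encoding. The example prefix 03d6441e is simply the encoding of height 1,983,702; the trailing bytes are made up.

```python
def implied_height(script_sig: bytes):
    """Interpret the leading bytes as a BIP34 height push, or return None."""
    if not script_sig:
        return None
    n = script_sig[0]
    if not 1 <= n <= 4 or len(script_sig) < 1 + n:
        return None
    payload = script_sig[1:1 + n]
    if payload[-1] & 0x80:
        return None  # would decode as a negative number
    if payload[-1] == 0 and (n == 1 or not payload[-2] & 0x80):
        return None  # non-minimal encoding can never match a real height push
    height = int.from_bytes(payload, "little")
    if height <= 16:
        return None  # small numbers use single opcodes, out of scope here
    return height

# Prefix derived from height 1,983,702; the last two bytes are hypothetical.
old_script_sig = bytes.fromhex("03d6441e") + b"\x00\x01"
mined_at = 164_384

future = implied_height(old_script_sig)
print(future)             # 1983702
print(future > mined_at)  # True: the coinbase could be replayed at that height
```

A pre-BIP34 coinbase is only a candidate when the implied height lies in the future and no other rule, such as BIP30 still being enforced at that height, rules it out.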

*Note: These blocks already had coinbase transactions in 2012 and 2017 that were not duplicates. Block 209,921 (just 79 blocks before the first halving) could not be duplicated because BIP30 was being enforced at that time.

Number of potential duplicate coinbase transactions by year

Thus, the next block with a possible duplicate transaction is 1,983,702, which will be generated around January 2046. The coinbase transaction in block 164,384, generated in January 2012, sent 170 BTC to seven different output addresses. Therefore, miners who wanted to perform this attack in 2046 would not only need to be lucky enough to find this block, but would also need to burn just under 170 BTC in fees, for a total cost of slightly more than 170 BTC once the opportunity cost of the 0.09765625 BTC block subsidy is included. At the current Bitcoin price of $88,500, this would cost over $15 million. It is unknown who owns the seven addresses from the 2012 coinbase transaction, and the keys are likely lost. All seven output addresses from this coinbase transaction have since been used, three of them in the same transaction. We believe that these funds may be related to the Pirate40 Ponzi scheme, but this is just speculation on our part. Therefore, this attack looks not only costly, but also almost useless to the attacker. It would be a considerable expense merely to hard fork nodes running 31-year-old software from before November 2015 off the network.
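The figures in this paragraph can be checked with a short calculation. The sketch below assumes the standard subsidy schedule (50 BTC, halving every 210,000 blocks) and takes the 170 BTC payout and the $88,500 price from the text as rounded inputs, so the results are approximate.

```python
def block_subsidy_btc(height: int) -> float:
    """Block subsidy in BTC: 50 BTC, halved every 210,000 blocks."""
    halvings = height // 210_000
    satoshis = (50 * 100_000_000) >> halvings
    return satoshis / 100_000_000

target_height = 1_983_702
coinbase_payout_btc = 170   # rounded figure quoted in the article
price_usd = 88_500          # price quoted in the article

subsidy = block_subsidy_btc(target_height)
fees_to_burn = coinbase_payout_btc - subsidy
cost_usd = coinbase_payout_btc * price_usd

print(subsidy)       # 0.09765625
print(fees_to_burn)  # 169.90234375
print(cost_usd)      # 15045000
```

Block 1,983,702 falls after nine halvings, so almost the entire payout would have to come from fees that the miner could otherwise have kept.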

The next vulnerable block that could be copied is 169,985, from March 2012. This coinbase paid out only just over 50 BTC, far less than 170 BTC. Of course, 50 BTC was the subsidy at the time, and when this coinbase transaction becomes duplicable in 2078, the subsidy will be much lower. So to exploit this, miners would need to burn around 50 BTC in fees that they can’t get back, because the funds would have to be sent to the old outputs from 2012. No one knows what the price of Bitcoin will be in 2078, but the cost of such an attack could also be prohibitively high. So this issue is probably not a major risk for Bitcoin, but it is still a concern.

Since the 2017 SegWit upgrade, coinbase transactions can also contain a commitment to all the transactions in a block. The pre-BIP34 blocks did not contain witness commitments. So to produce a duplicate coinbase transaction, miners would need to exclude any transactions that redeem SegWit outputs from the block, further increasing the opportunity cost of the attack, since the block may then not contain many other fee-paying transactions.

Conclusion

Given the difficulty and cost of duplicating transactions, and the very rare opportunities to exploit this, the duplicate transaction vulnerability doesn’t feel like a major security issue for Bitcoin. Still, it’s interesting to think about, given the timescales involved and the novelty of the duplicate transactions. Developers have spent a lot of time on this issue over the years, and 2046 is in some developers' minds as a possible deadline for fixing it. There are many ways to fix this bug, and a fix may require a soft fork. One possible fix is to make the SegWit commitment mandatory.