
Forex pairs spread & volume statistics

Posted: Thu Feb 14, 2008 7:19 pm
by michal.kreslik

Hello, friends,

one of the factors that a trading system engineer has to take into account when developing a trading system is the cost of trading in the market the system is going to be applied to. I guess we have all been through trading systems that would make piles of cash if it were not for the trading costs.

Here I'd like to focus on one of the most important elements of the trading cost: the spread between the bid and ask. This value depends on various factors like:
  • market - obviously, every market has a different set of spread characteristics (EUR/USD will be different than CAD/JPY)
  • market access - you're generally going to get a higher average spread at a broker that charges fixed spreads than at an ECN, for instance. The "tiny" spread markups at crooked brokers often mark the waterline for your trading system
  • time - here we are talking about time of day. During the day, there are distinct peaks and troughs in the spread, as you'll see below in the graphs. We could look at day of week as well, but the relative range of the day-of-week-based spread values is negligible compared to the time-of-day-based range
  • trade size - the bigger your position size, the higher (worse) the spread (that's why the mammoths can't trade :))
  • fundamental effects - apart from the regular news announcements, there is no way to predict that the heavy truck of fundamental news is coming at you to widen your spread temporarily. However, in the statistics that I'm going to present here, these fundamental effects are all factored in and averaged out
The source data that I used for the statistics comes from my data collection servers, which are placed in a backbone server room. These servers host my automated trading strategies and also collect new quotes from EFX and other brokers into an SQL database with millisecond precision for use in testing. I'm only collecting new quotes because there's no new information in repeated quotes. For the purposes of this study, a quote is an object that consists of:
  • bid
  • ask
  • bid size
  • ask size
  • time
A new quote is a quote where either the bid or ask or both have changed from the previous quote. A sole change in liquidity flow (bid size or ask size) does not constitute a new quote, thus such a change is not collected.
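
The "new quote" rule above can be sketched in a few lines of Python. This is only an illustration of the definition, not the actual collection code: the `Quote` class, the `time_ms` field and the `new_quotes` function are hypothetical names, and treating the very first quote of a stream as new is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Quote:
    bid: float
    ask: float
    bid_size: int
    ask_size: int
    time_ms: int  # timestamp in milliseconds (hypothetical representation)


def new_quotes(stream: Iterable[Quote]) -> Iterator[Quote]:
    """Yield only quotes whose bid or ask differs from the previous quote.

    A change in bid_size or ask_size alone does not make a quote new.
    The first quote of the stream is treated as new (an assumption).
    """
    prev = None
    for q in stream:
        if prev is None or q.bid != prev.bid or q.ask != prev.ask:
            yield q
        prev = q
```

Feeding it a stream where only the sizes change between two ticks drops the second tick, while a tick with a changed bid or ask is kept.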

The statistics I'm presenting to you were computed on all data from EFX ranging from Sun Dec/02/2007 (start of Forex session) to Fri Feb/08/2008 (end of Forex session). I removed pairs that had too few quotes or were not traded round the clock. A total of 96,593,341 new quotes were used for the calculations:

[align=center][chart: number of new quotes per pair][/align]

Here you can see that certain pairs demonstrate higher tendency towards generating new quotes than others.

I was thinking about restricting the spread statistics only to quotes with some basic, minimum liquidity (like 100000 units). EFX is an ECN type of broker, so anyone bidding or offering just 1000 units inside the interbank spread might potentially skew the spread results to the downside. But I've done a quick comparison and the difference is almost non-existent. Furthermore, if I restrict the statistics based on the minimum liquidity requirement, the average spread actually gets slightly lower. The difference is in the realm of a statistical error, though.
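
The comparison described above (average spread with and without a minimum liquidity requirement) could be sketched roughly like this. The function name `average_spread` and the tuple layout are hypothetical, and requiring the minimum size on both the bid and the ask side is an assumption; the post does not say which side(s) the threshold would apply to.

```python
from statistics import mean


def average_spread(quotes, min_size=0):
    """Mean of (ask - bid) over quotes that meet the minimum liquidity.

    quotes: iterable of (bid, ask, bid_size, ask_size) tuples.
    A quote is kept only if BOTH sizes are >= min_size (an assumption).
    Returns None if no quote passes the filter.
    """
    spreads = [
        ask - bid
        for bid, ask, bid_size, ask_size in quotes
        if bid_size >= min_size and ask_size >= min_size
    ]
    return mean(spreads) if spreads else None
```

Calling `average_spread(quotes)` and `average_spread(quotes, min_size=100000)` on the same data gives the two numbers to compare. The sketch only shows the mechanics and says nothing about which of the two comes out lower on real data.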

  • The spread graphs:

    every spread graph shows the average spread for the particular pair in pips by trading hour in US Eastern Time (New York). For every hour, all quotes that occurred during that hour were used for calculating the average (e.g., for hour 15: all quotes from 15:00:00.000 to 15:59:59.999). The values for the multipliers (like 10000 for EUR/USD, 100 for USD/JPY) that were used to get the results in pips (like 6.47 pips or 5.12 pips) rather than in the actual true decimal numbers (like 0.000647 or 0.0512) were chosen based on common trading consensus (I hope). However, there is no official rule anywhere in the world that says what the decimal value of a pip is for a particular currency pair. In case of doubt, attached to this post you will find the source data for all the graphs - these files include the spread information in the true decimal values as well, so you can check that out.
  • The new quotes volume graphs:

    every new quotes volume graph shows, for each hour, the number of new quotes that occurred during that hour as a percentage of all new quotes for the particular pair. The time partitioning is the same as explained above. Thus, for instance, a value of 7% at hour 14 on the new quotes volume graph tells you that 7% of all new quotes for that particular pair occurred between 14:00:00.000 and 14:59:59.999 (inclusive).
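
Both per-hour statistics described in the list above can be sketched together in Python. This is a minimal illustration, not the code used for the study: `hourly_stats` is a hypothetical name, and the timestamps are assumed to be naive datetimes already expressed in US Eastern Time. Only the two pip multipliers mentioned in the post are listed.

```python
from collections import defaultdict
from datetime import datetime

# Pip multipliers as given in the post; other pairs are not listed here.
PIP_MULTIPLIER = {"EUR/USD": 10000, "USD/JPY": 100}


def hourly_stats(quotes, pip_multiplier):
    """Per-hour average spread in pips and share of new quotes.

    quotes: iterable of (time, bid, ask) for ONE pair, where time is a
    naive datetime already in US Eastern Time (an assumption).
    Returns {hour: (average spread in pips, percent of all new quotes)}.
    """
    spread_sum = defaultdict(float)
    count = defaultdict(int)
    for t, bid, ask in quotes:
        # t.hour buckets everything from hh:00:00.000 to hh:59:59.999
        spread_sum[t.hour] += (ask - bid) * pip_multiplier
        count[t.hour] += 1
    total = sum(count.values())
    return {
        hour: (spread_sum[hour] / count[hour], 100.0 * count[hour] / total)
        for hour in sorted(count)
    }
```

The percentages over all hours of one pair add up to 100, which is what makes the volume graphs comparable between pairs with very different quote counts.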
Enough said, let's get down to brass tacks. If you look carefully, you'll see that spreads and new quotes volume/liquidity go hand in hand:

[align=center]
[Per-pair graphs (average spread in pips by hour, and new quotes volume by hour) were shown here for: AUD/CAD, AUD/CHF, AUD/JPY, AUD/NZD, AUD/USD, CAD/CHF, CAD/JPY, CHF/JPY, EUR/AUD, EUR/CAD, EUR/CHF, EUR/GBP, EUR/HUF, EUR/JPY, EUR/NOK, EUR/NZD, EUR/PLN, EUR/SEK, EUR/USD, GBP/AUD, GBP/CAD, GBP/CHF, GBP/JPY, GBP/USD, NZD/JPY, NZD/USD, USD/CAD, USD/CHF, USD/CZK, USD/DKK, USD/JPY, USD/MXN, USD/NOK, USD/PLN, USD/SEK]
[/align]

Comparative graphs

The average spread in ppm (parts per million) list shows all forex pairs included in this study, sorted by their relative spread. Ppm is a measure that I commonly use in my trading system design to arrive at comparable results across a wide range of forex pairs, or even across a longer time span of data for the same pair. The relative spread in ppm is simply calculated as:
  • 1000000 * (ask - bid) / bid
The higher the relative spread, the more costly the particular pair is to trade relative to its current market value. No wonder EUR/USD has the cheapest average cost in terms of relative spread:

[align=center][chart: average spread in ppm by pair][/align]
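
The ppm formula above translates directly into code. A minimal sketch; the function name `spread_ppm` and the example quotes in the usage note are hypothetical and not taken from the study's data.

```python
def spread_ppm(bid, ask):
    """Relative spread in parts per million: 1000000 * (ask - bid) / bid."""
    if bid <= 0:
        raise ValueError("bid must be positive")
    return 1_000_000 * (ask - bid) / bid
```

For example, `spread_ppm(1.4570, 1.4572)` and `spread_ppm(107.50, 107.53)` put a made-up 2-pip EUR/USD quote and a made-up 3-pip USD/JPY quote on the same scale, which raw pip values cannot do because the pip size differs between the two pairs.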


The Average spread in ppm by hour (EST) and symbol graph shows the distribution of the relative spread in ppm for all forex pairs included in this study, based on trading hours in EST. This graph is logarithmically scaled because the dispersion of the ppm spread values over all forex pairs and trading hours is rather wide:

[align=center][chart: average spread in ppm by hour (EST) and symbol, logarithmic scale][/align]


The New quotes volume by hour (EST) and symbol graph shows us the comparative liquidity for all pairs based on the trading hours:

[align=center][chart: new quotes volume by hour (EST) and symbol][/align]


And the pièce de résistance of the study is the graph that shows the overall average spread in ppm for all fx pairs in this study, overlaid on the overall liquidity (new quotes volume) for all fx pairs, all based on the trading hours (a proof that London rulez :) ):

[align=center][chart: overall average spread in ppm overlaid on overall new quotes volume, by hour][/align]
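
For anyone who wants to put a number on how closely the hourly spread and the hourly new quotes volume move together, a plain Pearson correlation over the 24 hourly values is one option. This is a sketch under assumptions: the study itself only presents the relationship graphically, and `pearson` is a hypothetical helper, not part of the attached source data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

The inputs would be the per-hour average spread in ppm and the per-hour new quotes volume percentage for the same hours, in the same order. A value near -1 or +1 means the two series track each other closely; the sign tells you whether they move in opposite directions or together.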


I hope this study will help you in your trading systems development. Attached you'll find the source data.

Enjoy,
Michal

Posted: Fri Feb 15, 2008 12:29 am
by olwisepieeye
Michal - Thanks for sharing this info with us. It's appreciated as always.

Did you do a similar study with the quotes you have collected from the other vendors / brokers? If so, did it follow similar patterns?

Posted: Fri Feb 15, 2008 3:50 pm
by 4x=0
Awesome charts Michal. Very interesting and useful, thank you

Posted: Fri Feb 15, 2008 10:04 pm
by michal.kreslik
olwisepieeye wrote:Michal - Thanks for sharing this info with us. It's appreciated as always.

Did you do a similar study with the quotes you have collected from the other vendors / brokers? If so, did it follow similar patterns?


Yes, the general scheme stays the same.

Michal

Posted: Fri Feb 15, 2008 10:47 pm
by Annu
Thanks for sharing.

Posted: Sat Feb 16, 2008 2:32 am
by fx_d2
This is some top-notch presentation. Some people pay big bucks for this type of information. Dittos again, thanks for sharing.

Posted: Sat Mar 08, 2008 5:18 pm
by LW
Michal,

I have a couple of questions, please forgive me if I'm missing the point:

1) Are you saying that a change in new quotes volume has an effect on (and/or precedes) the average pip spread or the opposite?
2) What is the magnitude of the effect?
3) What does the graph look like when you overlay price?
4) Can you run the stats in real time?

Thanks for the info.

Posted: Sat Mar 08, 2008 11:27 pm
by michal.kreslik
LW wrote:Michal,

I have a couple of questions, please forgive me if I'm missing the point:

1) Are you saying that a change in new quotes volume has an effect on (and/or precedes) the average pip spread or the opposite?
2) What is the magnitude of the effect?
3) What does the graph look like when you overlay price?
4) Can you run the stats in real time?

Thanks for the info.


1) yes, of course
2) as you can see from the graphs, the correlation is very high
3) you probably mean price change, not price. it would very likely be random
4) yes

Michal

Posted: Sun Mar 09, 2008 2:04 pm
by LW
Michal,

I tried to run some preliminary stats on three currency pairs (USDJPY, AUDUSD, AUDJPY) comparing closing price to the numbers you generated for the pip spreads and new quotes volume %. Again, I may be totally off base, but it appears that there are some interesting results: 1) Price and pip spread correlate well, and the correlation may be positive or negative depending on the currency pair. 2) Price and new quotes volume % may correlate well also. 3) The USDJPY pair has a strong negative correlation for price and new quotes volume % and a strong positive correlation for price and average pip spread. This may not be news to anyone, I don't know. I also realize that the numbers may be inaccurate due to many factors such as trending markets, time of year, etc. You would have to crunch a lot more numbers before you could come to a definitive conclusion. XL worksheet is attached. Thanks.

Posted: Mon Mar 10, 2008 4:14 am
by LW
Sorry, here is the updated source data for USDJPY, AUDUSD, AUDJPY with mean price, standard deviation of the price and correlation statistics. It was missing a few labels.