by Tajinder Dhillon.
In this ‘Product Insight’, we look at how correlation coefficients can potentially ‘break down’ during periods of extreme volatility.
Refinitiv Datastream gives users access to a library of ‘functions’ that provide cloud-based calculations delivered instantly into either Excel or charting environments. The library of approximately 85 functions spans categories including logical, mathematical, statistical, and technical, to name a few.
For correlation, we need to use the CORR# function: CORR#(Variable1,Variable2,Period/frequency).
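For readers who want to see what a rolling correlation does outside the product, the idea can be sketched in Python with pandas. This is an illustration only and not the CORR# implementation: the function name rolling_corr and the synthetic return series are assumptions made for the example, not Datastream data.

```python
import numpy as np
import pandas as pd


def rolling_corr(x: pd.Series, y: pd.Series, window: int) -> pd.Series:
    """Pearson correlation of x and y over a trailing window of observations."""
    return x.rolling(window).corr(y)


# Synthetic daily returns (illustrative only, not market data).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2016-01-01", periods=1000)
oil = pd.Series(rng.normal(0, 0.02, len(dates)), index=dates)
# Energy returns are built to co-move with oil, plus independent noise.
energy = 0.4 * oil + pd.Series(rng.normal(0, 0.01, len(dates)), index=dates)

# One value per day once 260 observations are available; NaN before that.
corr_260 = rolling_corr(oil, energy, 260)
```

The first 259 values are empty because a 260-observation window is not yet full; every later value uses only the most recent 260 returns.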
While the function is straightforward to use within the product, interpreting the output can be challenging depending on the inputs used. Two considerations stand out: how long the estimation window should be, and whether to use daily or weekly returns.
To explore these questions, we look at the relationship between WTI oil and the U.S. energy sector. The U.S. energy sector is represented by the U.S. Datastream Energy Index (Mnemonic: ENEGYUS), which contains approximately 42 constituents totaling approximately $1.9 trillion in market capitalization.
Intuitively, we would expect a high correlation coefficient even before running the calculation.
In Exhibit 1, using daily returns over rolling 260-day, 520-day, and 780-day windows results in high correlation coefficients ranging from 0.50 to 0.70 over the three-year period from 2016 to 2018.
The grey line in Exhibit 1, which has the longest look-back period of 780 days, appears to be the most stable (lowest degree of variation) compared with the blue line (260-day) and black line (520-day). Assuming two assets are relatively stable and do not experience many outliers or extreme values, a longer look-back period should yield a more precise (and stable) correlation coefficient.
Exhibit 1: Correlation Coefficient for Oil and U.S. Energy Stocks
We now look at Exhibit 2, which is identical to Exhibit 1 except that the chart now extends through March 2022.
Across all three lines, the correlation coefficient breaks down in April 2020, when WTI oil prices briefly turned negative on April 20th (-$37.6 a barrel). This resulted in a record one-day percentage decline of 306.0%. In comparison, the U.S. Datastream Energy Index declined by only 3.0%.
The massive dislocation in price movement between these two assets works its way through the correlation coefficient calculation and plays a large role in causing it to decline to approximately 0.17-0.20.
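To see why a single day can do this, the following sketch computes a correlation over one 260-day window with and without an outlier of the size quoted above. The surrounding returns are synthetic and the variable names are assumptions for illustration; only the -306.0% and -3.0% moves come from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 260  # one 260-day estimation window

# Synthetic 'normal' daily returns that co-move (illustrative only).
oil = rng.normal(0, 0.02, n)
energy = 0.4 * oil + rng.normal(0, 0.01, n)
corr_before = np.corrcoef(oil, energy)[0, 1]

# Replace the last day with the April 20, 2020 moves quoted in the text.
oil_out = oil.copy()
energy_out = energy.copy()
oil_out[-1] = -3.06     # -306.0% for WTI
energy_out[-1] = -0.03  # -3.0% for the energy index
corr_after = np.corrcoef(oil_out, energy_out)[0, 1]

print(f"without outlier: {corr_before:.2f}, with outlier: {corr_after:.2f}")
```

The outlier adds roughly 3.06 squared (about 9.4) to the oil series' sum of squares but only about 3.06 x 0.03 (about 0.09) to the cross-product, so the denominator of the correlation grows far faster than the numerator and the coefficient falls, even though both assets moved in the same direction that day.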
Fast-forwarding to April 2021, we note that the blue line starts to stabilize as the outlier drops out of its 260-day window. However, in March 2022 the black and grey lines still reflect the outlier because of their longer look-back windows.
This highlights the extreme care practitioners must take when using correlation coefficients. For example, correlation matrices are a key part of the research process, and they display only the most recent value; the same applies to users creating a static bar chart. This means a correlation coefficient of 0.17 or 0.20 would appear in either type of chart, which would be misleading (this happened to me, which prompted this article).
Exhibit 2: Correlation Coefficients when ‘Outliers’ exist
Refinitiv Datastream allows users to combine multiple functions into a single nested expression, which provides flexibility and customization. For example, if a user wishes to ‘hide’ the 2020 data in the time-series chart, DIS# can be incorporated into the expression.
DIS# displays the values for a series or expression only over a user-defined period. It is most commonly used for controlling the values displayed as a line within a chart, or for determining the set of values to be used when DIS# is nested within another expression.
Exhibit 3 highlights this as we transform the blue line into a new black line which does not display data during 2020.
Exhibit 3: Hiding ‘Outliers’ in Time-Series Chart
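A rough analogue of hiding a period from a chart can be sketched in pandas by blanking the values for the chosen year. This is not how DIS# is implemented; the helper name hide_year and the placeholder series are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def hide_year(series: pd.Series, year: int) -> pd.Series:
    """Return a copy with values in `year` set to NaN so a line chart shows a gap."""
    return series.mask(series.index.year == year)


# Placeholder values standing in for a rolling correlation line (not real data).
dates = pd.bdate_range("2019-01-01", "2021-12-31")
corr = pd.Series(np.linspace(0.6, 0.2, len(dates)), index=dates)

hidden = hide_year(corr, 2020)
```

Note that this only hides the values from display. Each remaining point is still computed from its full look-back window, so values shown after the hidden period can still reflect the outlier.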
To conclude, this example highlights how extreme values or outliers can distort correlation coefficients, which therefore need to be monitored closely. In answer to our original questions, this example suggests it may be more advantageous to use 1) a shorter estimation window, and 2) weekly returns rather than daily returns.
Of course, the answer will not always be clear-cut as there will inevitably be a trade-off between using a longer estimation window for increased precision vs. using a shorter estimation window to better handle outliers and extreme values.
Furthermore, using weekly returns instead of daily returns may help reduce the noise in day-to-day variations.
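As a sketch of the mechanics, daily simple returns can be compounded into weekly returns before the rolling correlation is computed. The helper name to_weekly, the Friday week-end, the 52-week window (chosen to roughly match 260 trading days), and the synthetic data are all assumptions for illustration.

```python
import numpy as np
import pandas as pd


def to_weekly(daily_returns: pd.Series) -> pd.Series:
    """Compound daily simple returns into weekly (Friday-ending) returns."""
    return (1.0 + daily_returns).resample("W-FRI").prod() - 1.0


# Synthetic daily returns starting on a Monday (illustrative only).
rng = np.random.default_rng(3)
dates = pd.bdate_range("2016-01-04", periods=1040)
oil = pd.Series(rng.normal(0, 0.02, len(dates)), index=dates)
energy = 0.4 * oil + pd.Series(rng.normal(0, 0.01, len(dates)), index=dates)

oil_w = to_weekly(oil)
energy_w = to_weekly(energy)

# 52 weekly observations cover roughly the same calendar span as 260 days.
corr_52w = oil_w.rolling(52).corr(energy_w)
```

Compounding simple returns assumes every daily return is above -100%. The April 20, 2020 WTI move of -306.0% violates that assumption, so such a day would need separate handling before this aggregation is applied to real data.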
Refinitiv Datastream – Financial time series database which allows you to identify and examine trends, generate and test ideas, and develop viewpoints on the market.
Refinitiv offers the world’s most comprehensive historical database for numerical macroeconomic and cross-asset financial data which started in the 1950s and has grown into an indispensable resource for financial professionals.