Hourly vs daily prices #729

tgibson11 · 2022-08-06T18:45:13Z

I think there is a problem with how hourly and daily prices are collected.

If hourly prices are collected after 23:00, an hourly price with the "magic" time of 23:00 appears to be saved. When daily prices are collected, the hourly price for 23:00 already exists, and the daily price does not overwrite it (as far as I can tell).

In addition to treating the hourly price as if it is the daily close, you'll also only have the volume for that 1 hour instead of for the day.

This behavior is affected by the ignore_future_prices property.

As long as ignore_future_prices == True (the default), that is always how it will work, because the daily price will be ignored until after 23:00, at which time hourly data will probably already exist (as described above).

If ignore_future_prices == False and if you run the price update before 23:00 on the same day as the close, then whatever hourly prices are available will be saved first, then the 23:00 daily price will be saved. So daily prices are OK, but there will be a gap in the hourly prices from whatever time it was run until 23:00. Maybe that's not a big deal. But still not great, because the behavior is dependent on the schedule, and if you happen to not run on a given day, then you won't save the daily price or volume for that day - you'll have an hourly price & volume that will be treated as daily.
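The overwrite problem described above can be reproduced in a few lines of pandas. This is only a sketch: `merge_only_new_rows` is a hypothetical stand-in for whatever merge rule the price updater applies (the thread only says the daily price "does not overwrite" the hourly one, as far as the reporter can tell), the column names are assumed, and the numbers are made up for illustration.

```python
import pandas as pd


def merge_only_new_rows(existing: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical merge rule: existing rows win, only unseen timestamps are added."""
    combined = pd.concat([existing, new])
    # Where a timestamp appears twice, keep the row that was stored first.
    return combined[~combined.index.duplicated(keep="first")].sort_index()


cols = ["FINAL", "VOLUME"]  # assumed column names

# Hourly bars collected after 23:00, so an hourly bar stamped 23:00 exists.
hourly = pd.DataFrame(
    [[13290.25, 48.0], [13285.00, 12.0]],
    index=pd.to_datetime(["2022-08-05 21:00", "2022-08-05 23:00"]),
    columns=cols,
)

# Daily bar, stamped with the "magic" 23:00 time and carrying the full-day volume.
daily = pd.DataFrame(
    [[13290.25, 976.0]],
    index=pd.to_datetime(["2022-08-05 23:00"]),
    columns=cols,
)

stored = merge_only_new_rows(hourly, daily)
print(stored)
```

Under that assumed rule the 23:00 row keeps the one-hour volume, and the daily close and full-day volume are silently dropped.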

js190 · 2022-08-09T11:11:14Z

This is consistent with what I see in my data collection, e.g. for NASDAQ_micro

2022-08-05 19:00:00  13271.00  13276.25  13226.75  13226.75    55.0
2022-08-05 20:00:00  13259.00  13321.50  13253.00  13299.00    41.0
2022-08-05 21:00:00  13305.25  13305.25  13281.25  13290.25    48.0
2022-08-05 23:00:00  13218.25  13386.75  13169.00  13290.25   976.0

This is from running the update at 10:30pm GMT.

Ideally we should refactor the data collection so each Arctic symbol is unique across instrument+contract+frequency.
Then we could dependency inject the hourly and daily symbols into a new data class which had a get_daily_data method that did the right thing from the hourly and daily symbols.
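A minimal sketch of that idea, assuming each frequency has already been read from its own store into a plain pandas DataFrame. The class name, field names and column names are hypothetical and not part of pysystemtrade; only `get_daily_data` is named in the comment above, and "the right thing" is taken here to mean simply returning the true daily rows.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass(eq=False)
class ContractPricesByFrequency:
    """Hypothetical container: one DataFrame per frequency, injected by the caller."""

    hourly: pd.DataFrame
    daily: pd.DataFrame

    def get_daily_data(self) -> pd.DataFrame:
        # Daily rows come straight from the daily store, so the volume
        # is always the full-day figure and never a single hourly bar.
        return self.daily

    def get_hourly_data(self) -> pd.DataFrame:
        return self.hourly


hourly = pd.DataFrame(
    {"FINAL": [13299.00, 13290.25], "VOLUME": [41.0, 48.0]},
    index=pd.to_datetime(["2022-08-05 20:00", "2022-08-05 21:00"]),
)
daily = pd.DataFrame(
    {"FINAL": [13290.25], "VOLUME": [976.0]},
    index=pd.to_datetime(["2022-08-05 23:00"]),
)

prices = ContractPricesByFrequency(hourly=hourly, daily=daily)
print(prices.get_daily_data())
```

Because the two frequencies never share a table, there is no timestamp collision to resolve at collection time.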

robcarver17 · 2022-08-17T13:51:38Z

23:00 is the magic marker for daily price data, i.e. it's the daily closing price. The volume on that row will be the total volume for the day.

The code knows about this, for example when calculating volumes it does this:

https://github.com/robcarver17/pysystemtrade/blob/master/sysobjects/futures_per_contract_prices.py#L73

Of course this does lead to the type of incorrect behaviour you describe.

I probably regret the decision to merge the storage of daily and hourly price data more than anything, but it would be a nightmare to unpick. I've thought about this before, and I think discussed it here as well. But let me have another stab at it.

Initially I assumed that we would store individual contracts, multiple, and adjusted prices in both daily and hourly flavours. Rolling would be a nightmare since you'd have to do it for daily and hourly data separately.

Or do we just store individual contract prices in daily and hourly flavours? Then we would amalgamate those to get a series we'd use for multiple prices. That would probably be easier, although it does mean that in a backtest (when you can only see adjusted prices), if you want hourly or daily you'd still have to resample it from the existing mixed index of adjusted prices.

So we'd have:

1- hourly prices per contract
2- daily prices per contract
3- mixed daily and hourly prices per contract (basically the existing data)
4- mixed multiple prices (as now)
5- mixed adjusted prices (as now)

We'd probably use (2) whenever we wanted volumes, rather than extracting from (3)

Some questions: Do we store (3) in a separate table or generate it dynamically from (1) and (2) when required? How do we deal with legacy data? Do we need to generate 1 and 2 from 3?

Storing the data for 3 rather than dynamically creating it violates DRY but will improve system speed, and could make the handling of legacy data easier.

Dynamically creating 3 when needed would mean we'd need to generate 1 and 2 from 3 - and store them - the first time the new code was run. And we'd have to delete 3 at some point.

Thoughts?

tgibson11 · 2022-08-17T14:24:00Z

I assume you want the hourly prices for your yet-to-be-implemented ST strategy.

Not knowing exactly what you intend to do with hourly data, I would ask: Do you really need hourly multiple or adjusted prices? Do you need so much history that you couldn't just look at hourly data for the current contract? If your ST strategy also needs a longer-term view, would the daily data suffice?

If you could get away without the hourly multiple and adjusted, then I'd suggest 1 + 2 + daily-only multiple & adjusted would be ideal.

Given the existing data & code, a compromise might be having them mixed (3), but using a frequency indicator rather than magic timestamps to distinguish. But would Arctic balk at that?

Another note on using timestamps. You have at least some code that implies other frequencies could be supported. Can you imagine how complicated that would get, say if you needed to distinguish minute, hourly, and daily data?
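For illustration, a frequency indicator could look like the following in plain pandas. This is a sketch only: the "H"/"D" labels, the level names and the column names are all assumptions, the 23:00 hourly bar is made up, and nothing here says whether Arctic could store such an index.

```python
import pandas as pd

hourly = pd.DataFrame(
    {"FINAL": [13290.25, 13285.00], "VOLUME": [48.0, 12.0]},
    index=pd.to_datetime(["2022-08-05 21:00", "2022-08-05 23:00"]),
)
daily = pd.DataFrame(
    {"FINAL": [13290.25], "VOLUME": [976.0]},
    index=pd.to_datetime(["2022-08-05 23:00"]),
)

# Tag each row with its frequency instead of relying on a magic timestamp.
mixed = pd.concat({"H": hourly, "D": daily}, names=["frequency", "timestamp"])
mixed = mixed.swaplevel().sort_index()  # index is now (timestamp, frequency)

# Selecting a frequency is an explicit lookup on the indicator level.
daily_only = mixed.xs("D", level="frequency")
hourly_only = mixed.xs("H", level="frequency")
print(mixed)
```

With the indicator as part of the key, a genuine hourly bar at 23:00 and the daily bar can coexist, and adding a minute frequency is just another label.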

robcarver17 · 2022-08-17T14:47:57Z

> I assume you want the hourly prices for your yet-to-be-implemented ST strategy.

Yes

> Not knowing exactly what you intend to do with hourly data, I would ask: Do you really need hourly multiple or adjusted prices? Do you need so much history that you couldn't just look at hourly data for the current contract? If your ST strategy also needs a longer-term view, would the daily data suffice?

I do occasionally need adjusted prices at the hourly frequency (if there is a roll in the middle of the day), but in any case to change the backtest so it also pulled in hourly data for the current contract but used daily adjusted elsewhere would be a significant and complicated change.

> If you could get away without the hourly multiple and adjusted, then I'd suggest 1 + 2 + daily-only multiple & adjusted would be ideal.

> Given the existing data & code, a compromise might be having them mixed (3), but using a frequency indicator rather than magic timestamps to distinguish. But would Arctic balk at that?

Probably, yes.

> Another note on using timestamps. You have at least some code that implies other frequencies could be supported. Can you imagine how complicated that would get, say if you needed to distinguish minute, hourly, and daily data?

Not that complicated I think.

I think the basic idea is you pull in per contract price data at N different frequencies and store these separately. Most of these would have the correct time stamps, except daily where a magic timestamp would have to be used. Strictly speaking this should be the closing time for the relevant market, but it's easier to use 23:00 as for the existing data. Then you create an aggregated mixed price per contract data which is then used further down. To do this I'd probably stack them in columns then do a fill to the right so that the daily took precedence over hourly, which took precedence over 5 minute and so on.

There is an issue if you really did have an hourly price at 23:00 which will get overwritten by the special daily price, but I can't think of an elegant way of dealing with that, and it's a problem with the existing data anyway. At least this way the behaviour is explicit. Also you'd need to configure the price collector to pull in more than two price frequencies, but that's trivial.
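The stack-and-fill step could be sketched roughly as below for a single price column. This is an assumption about what is meant, not code from the repository: the function name is hypothetical, each frequency is taken to be a pandas Series of closing prices, and the 23:00 hourly value is made up.

```python
import pandas as pd


def merge_frequencies(*lowest_to_highest_priority: pd.Series) -> pd.Series:
    """Stack series as columns, fill to the right, keep the right-most column."""
    series = list(lowest_to_highest_priority)
    stacked = pd.concat(series, axis=1, keys=range(len(series))).sort_index()
    # Filling along the columns carries a lower-priority price to the right
    # only where every higher-priority frequency has no row at that timestamp.
    return stacked.ffill(axis=1).iloc[:, -1]


hourly = pd.Series(
    [13299.00, 13290.25, 13285.00],
    index=pd.to_datetime(
        ["2022-08-05 20:00", "2022-08-05 21:00", "2022-08-05 23:00"]
    ),
)
daily = pd.Series([13290.25], index=pd.to_datetime(["2022-08-05 23:00"]))

# Order matters: the last argument wins wherever timestamps collide.
merged = merge_frequencies(hourly, daily)
print(merged)
```

At 23:00 the daily close replaces the hourly bar, while the earlier hourly rows pass through unchanged; a 5 minute series would simply be passed as the first argument.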

This deals with the problem of collecting data and having issues with future prices as the original issue report says, but it still means that if you looked at the merged data series you're going to see a magic time stamp for daily prices, and this would then apply to both multiple and adjusted prices.

But if this is done explicitly, then it also means to get daily prices (or volumes) you could only use the magic timestamps to pull out daily prices, and to get intraday prices you could ignore the magic timestamps. So it's easy to reverse the aggregation process with multiple or adjusted prices in the backtest or elsewhere.
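Reversing the aggregation might look like this, assuming the merged series has a DatetimeIndex and daily rows are stamped exactly 23:00:00. The function names are hypothetical; the prices are the closing values from the NASDAQ_micro rows quoted earlier in the thread.

```python
import pandas as pd


def is_daily_marker(index: pd.DatetimeIndex):
    """True where the timestamp is the assumed magic 23:00:00 daily marker."""
    return (index.hour == 23) & (index.minute == 0) & (index.second == 0)


def split_mixed(mixed: pd.Series):
    """Return (daily, intraday) views of a mixed-frequency price series."""
    marker = is_daily_marker(mixed.index)
    return mixed[marker], mixed[~marker]


mixed = pd.Series(
    [13226.75, 13299.00, 13290.25, 13290.25],
    index=pd.to_datetime(
        [
            "2022-08-05 19:00",
            "2022-08-05 20:00",
            "2022-08-05 21:00",
            "2022-08-05 23:00",
        ]
    ),
)

daily, intraday = split_mixed(mixed)
print(daily)
print(intraday)
```

The same split would apply to a volume column, which is where taking only the marker rows matters most.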

I can't think of a use case where you'd need to distinguish minute by minute back adjusted from hourly data, and that wouldn't be possible here since there is no magic timestamp for hourly data. But again, that's no loss of functionality from what we already have.

tgibson11 · 2022-08-17T15:19:03Z

That sounds fine to me.

Other than the issue of hourly/daily conflicts, I think the rest is just a question of what is most practical to implement and use.

js190 · 2022-08-18T11:34:54Z

My proposal would be to leave as much of the machinery intact as possible for these changes, whilst allowing flexibility to change in the future. Specifically:

- Store prices for hourly and daily (and whatever freq.) in separate Arctic symbols/tables; this would simplify update_historical_prices_for_instrument_and_contract()
- Have something like:

class mixedFreqfuturesContractPrices(futuresContractPrices):
    def __init__(self, hourly: hourlyfuturesContractPrices, daily: dailyfuturesContractPrices):
        self._daily = daily
        self._hourly = hourly

    def return_final_prices(self):
        # Blend the frequencies here. First step: replicate hourly mixed in with
        # daily at 23:00, i.e. maintain current behaviour.
        raise NotImplementedError

- Modify adjustment code to use mixedFreqfuturesContractPrices rather than futuresContractPrices

That seems fairly simple to me and would not touch that many points in the code. But I'm surely missing some traps.

I think this is similar to Rob's proposal.

js190 · 2022-08-22T16:41:42Z

Another effect of separating hourly and daily would be more robust spike detection; at the moment I often get spikes due to the magic 23:00 print. @robcarver17 have you changed your spike threshold?

robcarver17 · 2022-08-22T18:23:52Z

No

robcarver17 closed this as completed in 3bdbd6c Aug 31, 2022
