Hourly vs daily prices #729
Comments
This is consistent with what I see in my data collection, e.g. for NASDAQ_micro
Where I'm running this at 10:30pm GMT.
Ideally we should refactor the data collection so each Arctic symbol is unique across instrument+contract+frequency. |
23:00 is the magic marker for daily price data, i.e. it's the daily closing price. The volume on that row will be the total volume for the day. The code knows about this, for example when calculating volumes it does this:
Of course this does lead to the type of incorrect behaviour you describe. I probably regret the decision to merge the storage of daily and hourly price data more than anything, but it would be a nightmare to unpick.
I've thought about this before, and I think discussed it here as well. But let me have another stab at it.
Initially I assumed that we would store individual contracts, multiple, and adjusted prices in both daily and hourly flavours. Rolling would be a nightmare since you'd have to do it for daily and hourly data separately.
Or do we just store individual contract prices in daily and hourly flavours? Then we would amalgamate those to get a series we'd use for multiple prices. That would probably be easier, although it does mean that in a backtest (when you can only see adjusted prices), if you want hourly or daily you'd still have to resample it from the existing mixed index of adjusted prices.
So we'd have:
1- hourly price per contract
We'd probably use (2) whenever we wanted volumes, rather than extracting from (3)
Some questions:
Do we store (3) in a separate table or generate it dynamically from (1) and (2) when required?
How do we deal with legacy data? Do we need to generate 1 and 2 from 3?
Storing the data for 3 rather than dynamically creating it violates DRY but will improve system speed, and could make the handling of legacy data easier. Dynamically creating 3 when needed would mean we'd need to generate 1 and 2 from 3 - and store - the first time the new code was run. And we'd have to delete 3 at some point.
Thoughts? |
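For the legacy-data question (generating 1 and 2 from 3), here is a minimal sketch of how a mixed series might be split using the 23:00 marker. It is not code from the project: `split_mixed_prices`, `DAILY_MARKER_HOUR`, and the sample frame are made up for illustration, and the `FINAL`/`VOLUME` column names are assumed.

```python
import pandas as pd

DAILY_MARKER_HOUR = 23  # the "magic" 23:00 timestamp discussed above


def split_mixed_prices(mixed: pd.DataFrame):
    """Split a mixed-frequency frame into (hourly, daily) using the 23:00 marker.

    Hypothetical helper: any row stamped exactly 23:00:00 is treated as daily.
    """
    idx = mixed.index
    is_daily = (idx.hour == DAILY_MARKER_HOUR) & (idx.minute == 0) & (idx.second == 0)
    return mixed[~is_daily], mixed[is_daily]


# Made-up sample data: three intraday rows and one 23:00 row.
mixed = pd.DataFrame(
    {"FINAL": [100.0, 101.0, 102.0, 103.0], "VOLUME": [10, 12, 500, 8]},
    index=pd.to_datetime(
        ["2022-07-01 14:00", "2022-07-01 15:00", "2022-07-01 23:00", "2022-07-04 14:00"]
    ),
)
hourly, daily = split_mixed_prices(mixed)
```

A split like this cannot tell a genuine hourly bar stamped 23:00 from a daily close, so any legacy rows affected by the problem in the original report would end up in the daily series.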
I assume you want the hourly prices for your yet-to-be-implemented ST strategy. Not knowing exactly what you intend to do with hourly data, I would ask: Do you really need hourly multiple or adjusted prices? Do you need so much history that you couldn't just look at hourly data for the current contract? If your ST strategy also needs a longer-term view, would the daily data suffice?
If you could get away without the hourly multiple and adjusted, then I'd suggest 1 + 2 + daily-only multiple & adjusted would be ideal. Given the existing data & code, a compromise might be to keep them mixed (3), but use a frequency indicator rather than magic timestamps to distinguish them. But would Arctic balk at that?
Another note on using timestamps. You have at least some code that implies other frequencies could be supported. Can you imagine how complicated that would get, say if you needed to distinguish minute, hourly, and daily data? |
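To make the frequency-indicator idea concrete, here is a rough sketch in plain Python. Everything in it (`Frequency`, `PriceRow`, `rows_at_frequency`, the sample rows) is hypothetical and only illustrates tagging each row with its frequency instead of relying on a magic timestamp. Whether Arctic would store such a layout comfortably is not addressed here.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Frequency(Enum):
    # Hypothetical indicator values, defined here only for the example.
    HOUR = "H"
    DAY = "D"


@dataclass(frozen=True)
class PriceRow:
    timestamp: datetime
    frequency: Frequency
    final: float
    volume: int


def rows_at_frequency(rows, frequency):
    """Return only the rows tagged with the requested frequency."""
    return [row for row in rows if row.frequency is frequency]


# Made-up data: an hourly bar and a daily close can share the 23:00 stamp
# without colliding, because the frequency tag tells them apart.
rows = [
    PriceRow(datetime(2022, 7, 1, 22, 0), Frequency.HOUR, 101.0, 12),
    PriceRow(datetime(2022, 7, 1, 23, 0), Frequency.HOUR, 101.5, 9),
    PriceRow(datetime(2022, 7, 1, 23, 0), Frequency.DAY, 101.5, 500),
]
daily_rows = rows_at_frequency(rows, Frequency.DAY)
```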
Yes
I do occasionally need adjusted prices at the hourly frequency (if there is a roll in the middle of the day), but in any case to change the backtest so it also pulled in hourly data for the current contract but used daily adjusted elsewhere would be a significant and complicated change.
Probably, yes.
Not that complicated I think. I think the basic idea is you pull in per contract price data at N different frequencies and store these separately. Most of these would have the correct timestamps, except daily where a magic timestamp would have to be used. Strictly speaking this should be the closing time for the relevant market, but it's easier to use 23:00 as for the existing data.
Then you create an aggregated mixed price per contract series, which is then used further down. To do this I'd probably stack them in columns then do a fill to the right so that the daily took precedence over hourly, which took precedence over 5 minute and so on. There is an issue if you really did have an hourly price at 23:00, which will get overwritten by the special daily price, but I can't think of an elegant way of dealing with that, and it's a problem with the existing data anyway. At least this way the behaviour is explicit. Also you'd need to configure the price collector to pull in more than two price frequencies, but that's trivial.
This deals with the problem of collecting data and having issues with future prices as the original issue report says, but it still means that if you looked at the merged data series you're going to see a magic timestamp for daily prices, and this would then apply to both multiple and adjusted prices. But if this is done explicitly, then to get daily prices (or volumes) you would pull out only the rows with the magic timestamp, and to get intraday prices you would ignore those rows. So it's easy to reverse the aggregation process with multiple or adjusted prices in the backtest or elsewhere.
I can't think of a use case where you'd need to distinguish minute by minute back adjusted from hourly data, and that wouldn't be possible here since there is no magic timestamp for hourly data. But again, that's no loss of functionality from what we already have. |
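The stack-and-fill step described above could look roughly like this in pandas. This is a sketch under assumptions, not the project's implementation: `blend_frequencies` and the sample prices are invented for the example, and the series are passed lowest priority first so that a fill to the right leaves the highest-priority value in the last column.

```python
import pandas as pd


def blend_frequencies(lowest_priority_first):
    """Merge price series onto one index; later entries take precedence.

    Hypothetical helper: stack the series as columns, forward-fill across
    the columns (a fill to the right), then keep the last column.
    """
    stacked = pd.concat(lowest_priority_first, axis=1).sort_index()
    return stacked.ffill(axis=1).iloc[:, -1]


# Made-up prices: a genuine hourly bar at 23:00 plus the daily close at 23:00.
hourly = pd.Series(
    [100.0, 101.0, 101.5],
    index=pd.to_datetime(["2022-07-01 21:00", "2022-07-01 22:00", "2022-07-01 23:00"]),
    name="hourly",
)
daily = pd.Series(
    [101.25], index=pd.to_datetime(["2022-07-01 23:00"]), name="daily"
)

blended = blend_frequencies([hourly, daily])

# Reversing the aggregation: the magic timestamp picks the daily rows back out.
daily_again = blended[blended.index.hour == 23]
```

In this example the hourly 23:00 value (101.5) is overwritten by the daily close (101.25), which is the conflict mentioned above; the sketch makes that precedence explicit but does not resolve it.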
That sounds fine to me. Other than the issue of hourly/daily conflicts, I think the rest is just a question of what is most practical to implement and use. |
My proposal would be to leave as much of the machinery intact as possible for these changes, whilst allowing flexibility to change in the future. Specifically:
class mixedFreqfuturesContractPrices(futuresContractPrices):
    def __init__(self, hourly: hourlyfuturesContractPrices, daily: dailyfuturesContractPrices):
        self._daily = daily
        self._hourly = hourly

    def return_final_prices(self):
        # blend the frequencies here, first step to replicate hourly mixed in with daily at 23:00
        # i.e. maintain current behaviour
        raise NotImplementedError  # placeholder so the sketch parses; blending not written yet
That seems fairly simple to me and would not touch that many points in the code. But I'm surely missing some traps. I think this is similar to Rob's proposal. |
Another effect of separating hourly and daily would be more robust spike detection; at the moment I often get spikes due to the magic 23:00 print. @robcarver17 have you changed your spike threshold? |
No |
I think there is a problem with how hourly and daily prices are collected.
If hourly prices are collected after 23:00, an hourly price with the "magic" time of 23:00 appears to be saved. When daily prices are collected, the hourly price for 23:00 already exists, and the daily price does not overwrite it (as far as I can tell).
In addition to treating the hourly price as if it is the daily close, you'll also only have the volume for that 1 hour instead of for the day.
This behavior is affected by the ignore_future_prices property.
As long as ignore_future_prices == True (the default), that is always how it will work, because the daily price will be ignored until after 23:00, at which time hourly data will probably already exist (as described above).
If ignore_future_prices == False and if you run the price update before 23:00 on the same day as the close, then whatever hourly prices are available will be saved first, then the 23:00 daily price will be saved. So daily prices are OK, but there will be a gap in the hourly prices from whatever time it was run until 23:00. Maybe that's not a big deal. But still not great, because the behavior is dependent on the schedule, and if you happen to not run on a given day, then you won't save the daily price or volume for that day - you'll have an hourly price & volume that will be treated as daily.
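The two cases above can be reproduced with a toy model in plain Python. This is not the project's code: it assumes an append-only store where only rows newer than the latest stored timestamp are added, and `update_store`, `collect`, and all the volumes are invented for illustration.

```python
from datetime import datetime


def update_store(store, new_rows, now, ignore_future_prices=True):
    """Append rows to `store` (dict of timestamp -> volume) under the toy rules."""
    for timestamp in sorted(new_rows):
        latest = max(store) if store else None
        if ignore_future_prices and timestamp > now:
            continue  # row is stamped in the future, so skip it
        if latest is not None and timestamp <= latest:
            continue  # append-only: never overwrite or back-fill
        store[timestamp] = new_rows[timestamp]
    return store


def collect(now, ignore_future_prices):
    """Run one hourly update followed by one daily update, as of `now`."""
    day = datetime(2022, 7, 1)
    # Hourly bars that exist by `now`, each with a made-up volume of 10.
    hourly = {
        day.replace(hour=h): 10 for h in range(9, 24) if day.replace(hour=h) <= now
    }
    daily = {day.replace(hour=23): 500}  # daily close at 23:00, full-day volume
    store = {}
    update_store(store, hourly, now, ignore_future_prices)
    update_store(store, daily, now, ignore_future_prices)
    return store


# Case 1: run after 23:00 with the default setting; the hourly bar wins.
late_run = collect(datetime(2022, 7, 1, 23, 30), ignore_future_prices=True)

# Case 2: run at 15:30 with future prices allowed; the daily row is saved.
early_run = collect(datetime(2022, 7, 1, 15, 30), ignore_future_prices=False)

# A later hourly update cannot back-fill 16:00-22:00 behind the 23:00 row.
missed = {datetime(2022, 7, 1, h): 10 for h in range(16, 23)}
update_store(early_run, missed, datetime(2022, 7, 2, 9, 0), ignore_future_prices=False)
```

In the first case the 23:00 row carries the one-hour volume (10) rather than the daily total (500); in the second the daily row is correct but the hourly series stops at 15:00.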