
boltdb-shipper + retention_deletes: "object not found in storage" when the query goes over the retention-border #3058

Closed
pcbl opened this issue Dec 8, 2020 · 11 comments

Comments


pcbl commented Dec 8, 2020

Describe the bug
I am using Loki 2.0 with a configuration that uses boltdb-shipper, the filesystem object store, and a retention period of one week (168h). Here is the complete configuration:

auth_enabled: false

server:
  log_level: 'warn'

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s

schema_config:
  configs:
  - from: 2018-04-15
    store: boltdb-shipper
    object_store: filesystem
    schema: v11
    index:
      prefix: index_
      period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: C:\ProgramData\POC\loki\index
    cache_location: C:\ProgramData\POC\loki\index\cache
    cache_ttl: 24h
    shared_store: filesystem
  filesystem:
    directory: C:\ProgramData\POC\loki\chunks
    
compactor:
  working_directory: C:\ProgramData\POC\loki\compactor
  shared_store: filesystem    

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h

chunk_store_config:
  max_look_back_period: 0

table_manager:
  chunk_tables_provisioning:
    inactive_read_throughput: 0
    inactive_write_throughput: 0
    provisioned_read_throughput: 0
    provisioned_write_throughput: 0
  index_tables_provisioning:
    inactive_read_throughput: 0
    inactive_write_throughput: 0
    provisioned_read_throughput: 0
    provisioned_write_throughput: 0
  retention_deletes_enabled: true
  retention_period: 168h

I am facing a situation where, once a week has passed and the retention period is due, the table manager starts removing old data as expected. However, the removal process seems to leave Loki in an inconsistent state: as soon as I perform searches one day beyond the "retention border", I get an object not found in storage error. Scenarios:

  • Search within the past 7 days (now-7d), i.e. within the retention border: OK!
  • Search on an old period entirely before the retention border, say from the 11th to the 10th past day: OK!
  • Search on the day directly outside the retention border (the 8th past day): object not found in storage
  • Search on a period that spans the retention border, e.g. from the 1st past day to the 11th past day (crossing the 8th day): object not found in storage

Could this be connected with this issue?
#2816

I am also wondering whether pre-release builds of Loki are available anywhere, so I could try a build that contains this PR: #2855


pcbl commented Dec 8, 2020

One additional piece of information: my chunks index folder (C:\ProgramData\POC\loki\chunks\index) has the following structure:

├───index_18591
│       compactor-1606353752.gz
│       EC2AMAZ-79T5VRA-1606231180783341000-1606353300.gz
│
├───index_18592
│       compactor-1606440150.gz
│       EC2AMAZ-79T5VRA-1606231180783341000-1606439700.gz
│
├───index_18593
│       compactor-1606526549.gz
│       EC2AMAZ-79T5VRA-1606231180783341000-1606526100.gz
│
├───index_18594
│       compactor-1606612947.gz
│       EC2AMAZ-79T5VRA-1606231180783341000-1606612500.gz
│
├───index_18595
│       compactor-1606699345.gz
│       EC2AMAZ-79T5VRA-1606231180783341000-1606698900.gz
│
├───index_18596
│       compactor-1606784374.gz
│       EC2AMAZ-79T5VRA-1606231180783341000-1606784400.gz
│
├───index_18597
│       compactor-1606870774.gz
│       EC2AMAZ-79T5VRA-1606231180783341000-1606870800.gz
│
└───index_18598
        compactor-1607418299.gz

I have noticed that the index_18591 folder is the one that seems "corrupted": as soon as I delete it, I no longer get any object not found in storage errors. It looks to me like the retention process, while removing old logs, leaves references in the index to chunks that no longer exist, which then triggers the error...

I am at a bit of a dead end, as I am not sure whether it is possible to work around this issue.


stale bot commented Jan 9, 2021

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.


pcbl commented Jan 11, 2021

The issue still happens, so please do not close it.


pcbl commented Jan 20, 2021

Just as an update: I have now tested with Loki 2.1 and the issue still happens.


ajs124 commented Feb 5, 2021

We're observing what seems to be the same issue.


dbluxo commented Mar 25, 2021

We use S3 as the storage backend and have configured different S3 bucket lifecycle rules per sub-folder for different tenants, i.e. for some tenants we delete the chunks after 7 days, for others only after 31 days. As soon as a tenant makes a request whose time range goes beyond its lifecycle window, it also gets an object not found in storage error back in Grafana.
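
For reference, the per-tenant expiry described above can be expressed with prefix-scoped S3 lifecycle rules, roughly along the lines of this CloudFormation-style sketch (the bucket name, tenant prefixes, and rule IDs are placeholders, not our actual values):

Resources:
  LokiChunksBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: loki-chunks              # placeholder bucket name
      LifecycleConfiguration:
        Rules:
          # Short-retention tenant: expire chunk objects after 7 days
          - Id: ExpireTenantAAfter7Days
            Prefix: tenant-a/              # placeholder tenant sub-folder
            Status: Enabled
            ExpirationInDays: 7
          # Long-retention tenant: expire chunk objects after 31 days
          - Id: ExpireTenantBAfter31Days
            Prefix: tenant-b/              # placeholder tenant sub-folder
            Status: Enabled
            ExpirationInDays: 31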


owen-d commented May 6, 2021

Hey, you should set your max_look_back_period to be no longer than your retention period (which is set to retention_period: 168h in your example). This will ensure you don't try to look up chunks which have been deleted.

chunk_store_config:
  max_look_back_period: 0    # <---- change this to:
  max_look_back_period: 168h

Please see https://grafana.com/docs/loki/latest/configuration/#chunk_store_config for more details.
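
For the single-tenant filesystem setup in the original report, the relevant sections would then look roughly like this (a sketch only; the key point is that max_look_back_period never exceeds retention_period):

chunk_store_config:
  # Never look back further than the data the table manager actually retains.
  max_look_back_period: 168h

table_manager:
  retention_deletes_enabled: true
  retention_period: 168h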

owen-d closed this as completed May 6, 2021

dbluxo commented May 6, 2021

@owen-d Unfortunately, it is not quite that simple. As I tried to describe, we have set different retention times for different tenants in S3, so a single max_look_back_period does not work for us. In my opinion, there should simply be no error; Grafana/Loki should display whatever logs are still retrievable.


xeor commented Jan 2, 2022

Any news here? I get this error after a cluster reinstall. I expect some data to be missing, but I still want what is available. Is there any way to retrieve what is there and ignore this error?

craftyc0der commented

I restarted my Loki pod on a new server and lost access to all historical data stored on S3. Seems strange.
