file stuck in cache #1195

Closed
ggrrll opened this issue Jul 6, 2023 · 20 comments

Comments

@ggrrll

ggrrll commented Jul 6, 2023

Which version of blobfuse was used?

2.0.4

Which OS distribution and version are you using?

Ubuntu 20.04.6 LTS

What was the issue encountered?

The file does not sync downstream: the new version fetched from the remote server gets stuck in the cache (even beyond the timeout).

@vibhansa-msft
Member

Kindly share a little more detail on your workflow and the issue you have observed.
Files, once downloaded, stay in the local disk cache until the timeout. After the timeout, if any handle to a file is still open, Blobfuse will not evict it. Kindly ensure your application closes the file handle once it is done processing.

vibhansa-msft self-assigned this Jul 7, 2023
vibhansa-msft added this to the V2-2.0.5 milestone Jul 7, 2023
@ggrrll
Author

ggrrll commented Jul 10, 2023

hey,

thanks for your answer
I am not using any special application (just terminal)

A new file is pulled locally immediately, but changes are stuck in the cache folder

is there a specific flag controlling that?

thanks

@vibhansa-msft
Member

What do you mean by "changes are stuck in cache folder"? I assume you mean that the changes are present in the local cache but have not been updated on the container. This means the file handle was not closed on your end. If you enable log_debug you can see whether blobfuse received a close call and, if it did, whether there was any failure.

@ggrrll
Author

ggrrll commented Jul 10, 2023

hey,

no, I meant the opposite (as I mentioned at the beginning, it's about downstream sync)

@ggrrll
Author

ggrrll commented Jul 10, 2023

also, I have another (maybe related) issue (should I open a new GH issue?)

the blobfuse2 command keeps running at 100% CPU
#1196

@ggrrll
Author

ggrrll commented Jul 10, 2023

here is my config

allow-other: true

logging:
  type: syslog
  level: log_debug

components:
  - libfuse
  - file_cache
  - attr_cache
  - azstorage

libfuse:
  attribute-expiration-sec: 120
  entry-expiration-sec: 120
  negative-entry-expiration-sec: 240

file_cache:
  path: /tmp/blob_cache
  timeout-sec: 120
  max-size-mb: 4096

attr_cache:
  timeout-sec: 20

@vibhansa-msft
Member

vibhansa-msft commented Jul 10, 2023

OK, so what you mean is that you can see the changes in the container, but on the blobfuse mount path the contents are still stale. This could be due to one of these reasons:

  1. Your file-cache timeout is set to 120, so for 120 seconds after a download blobfuse will not look for changes. If you do not want this, set it to 0.
  2. You have the attribute cache enabled, which means file metadata such as size and LMT (last-modified time) is cached by blobfuse for 7200 seconds by default. During this time it will not look for any attribute change on the container. Remove it from your pipeline (the components section of the config file).
  3. libfuse caching is also set; set it to 0 so that the kernel does not cache the file's metadata.
  4. Finally, the kernel might cache the file contents in its page cache. This is tricky to clear: either manually clear the kernel page cache so that it is forced to re-read the data from blobfuse, or use the "-o direct_io" CLI option to start blobfuse in a mode that disables the kernel cache.
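
The first three checks in the list above can be sketched as a small script. This is only an illustration and not part of blobfuse2: the function name `find_stale_read_risks` is hypothetical, the config is assumed to be already parsed into a dictionary, and the key names are the ones used in the config posted in this thread. A key that is absent is also flagged, since the sketch makes no assumption about default values.

```python
def find_stale_read_risks(config):
    """List cache settings in a parsed config that can delay remote
    changes from showing up on the mount path (hypothetical helper)."""
    risks = []

    # 1. file_cache: a non-zero timeout keeps serving the downloaded copy.
    timeout = config.get("file_cache", {}).get("timeout-sec")
    if timeout != 0:
        risks.append(f"file_cache.timeout-sec is {timeout!r}, expected 0")

    # 2. attr_cache only matters when it is listed in the pipeline.
    if "attr_cache" in config.get("components", []):
        risks.append("attr_cache is enabled in components")

    # 3. libfuse expirations control kernel-side metadata caching.
    libfuse = config.get("libfuse", {})
    for key in ("attribute-expiration-sec", "entry-expiration-sec",
                "negative-entry-expiration-sec"):
        if libfuse.get(key) != 0:
            risks.append(f"libfuse.{key} is {libfuse.get(key)!r}, expected 0")

    # 4. The kernel page cache cannot be detected from the config file,
    #    so it is deliberately not checked here.
    return risks


# The first config posted in this thread, written out as a dictionary.
first_config = {
    "components": ["libfuse", "file_cache", "attr_cache", "azstorage"],
    "libfuse": {
        "attribute-expiration-sec": 120,
        "entry-expiration-sec": 120,
        "negative-entry-expiration-sec": 240,
    },
    "file_cache": {"path": "/tmp/blob_cache", "timeout-sec": 120,
                   "max-size-mb": 4096},
    "attr_cache": {"timeout-sec": 20},
}

for risk in find_stale_read_risks(first_config):
    print(risk)
```

An empty result only means the config-level caches are off; the kernel page cache from point 4 still has to be checked on the running system.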

@ggrrll
Author

ggrrll commented Jul 10, 2023

allow-other: true

logging:
  type: syslog
  level: log_debug

components:
  - libfuse
  - file_cache
  - azstorage

libfuse:
  attribute-expiration-sec: 0
  entry-expiration-sec: 0
  negative-entry-expiration-sec: 0

file_cache:
  path: /tmp/blob_cache
  timeout-sec: 0
  max-size-mb: 4096

attr_cache:
  timeout-sec: 20

still no downstream sync (automatic fetch)

@ggrrll
Author

ggrrll commented Jul 10, 2023

same problem if running with

sudo blobfuse2 -o direct_io mount poc_pipeline_blob/ --config-file=config.yaml

@vibhansa-msft
Member

Can you try manually clearing the kernel cache to confirm that it's a caching issue? Before you read a file (where you expect it to fetch updated contents), just clear the kernel page cache using the "sysctl -w vm.drop_caches=3" command.
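
One way to make this check repeatable is to hash the file as read through the mount and compare it with a reference copy obtained outside blobfuse (for example, one downloaded directly from the container). A minimal sketch, assuming both copies are reachable as local paths; `sha256_of` and `is_stale` are hypothetical helper names, not blobfuse2 functionality.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_stale(mounted_path, reference_path):
    """True when the copy seen through the mount differs from the reference."""
    return sha256_of(mounted_path) != sha256_of(reference_path)
```

If `is_stale` returns True before the page cache is dropped and False afterwards, that points to the kernel page cache serving the old contents.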

@ggrrll
Author

ggrrll commented Jul 13, 2023

so yes, after cleaning the cache (sysctl -w vm.drop_caches=3) the file is synced

this happens with both config files shown above

so, what should I change in the config?
thanks

@vibhansa-msft
Member

This means the kernel is not honoring the direct_io flag and is still caching the contents. Are you mounting inside a container or an AKS environment?

@ggrrll
Author

ggrrll commented Jul 17, 2023

no, just on the normal Linux filesystem of a VM (/home)

@vibhansa-msft
Member

We are not able to reproduce this locally. If we provide a potential fix, will you be able to try out a private build in your environment?

@ggrrll
Author

ggrrll commented Jul 17, 2023

yes, I think so
in case, I have also opened an Azure support ticket on this

@ggrrll
Author

ggrrll commented Jul 26, 2023

this seems fixed -- the behavior also changes depending on how the file is opened (I guess this is a Microsoft issue):
less, cat and nano sometimes give different results 🤯

ggrrll closed this as completed Jul 26, 2023

@stegal-bh

Any news about this issue ?

@ggrrll
Author

ggrrll commented Sep 18, 2023

what do you mean? -- the issue is closed

@stegal-bh

I'm facing the same issue as you with blobfuse2 version 2.1.0 on an Azure Linux VM. I'd like to know what the solution could be. Thanks

@stegal-bh

stegal-bh commented Sep 18, 2023

Sorry, I found the solution. In the mount config file for my blob storage I set disable-writeback-cache to true.
Now it's working fine.

libfuse:
  disable-writeback-cache: true

thanks
