
common : fix duplicated file name with hf_repo and hf_file #10550

Merged: 1 commit into ggerganov:master on Nov 27, 2024

Conversation

@ngxson (Collaborator) commented Nov 27, 2024

Currently, when using hf_repo and hf_file, the local file path is automatically set to {cache_dir}/{file_name}, where {file_name} is the basename of hf_file.

This fails in two cases:

  • The same file name exists in different hf_repos
  • The same file name exists in the same hf_repo, but in different subdirectories

For example:

Same file name, same repo, different subdir:

hf_repo          hf_file
ggml-org/models  bert-bge-small/ggml-model-f16.gguf
ggml-org/models  jina-reranker-v1-tiny-en/ggml-model-f16.gguf

Same file name (Llama-3.2-1B-Instruct-Q4_K_M.gguf), different repo:


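In both cases the downloads collapse to the same local path, so the two models clobber each other in the cache. A minimal sketch of the collision, using an illustrative cache_dir and a hypothetical helper old_local_path:

import os

cache_dir = '/tmp/llama.cpp-cache'  # illustrative

def old_local_path(hf_file):
  # Old behaviour: {cache_dir}/{file_name}, where file_name is the
  # basename of hf_file (any subdirectory prefix is dropped).
  return os.path.join(cache_dir, os.path.basename(hf_file))

# Same repo, different subdirs: both resolve to the same local path.
assert old_local_path('bert-bge-small/ggml-model-f16.gguf') \
    == old_local_path('jina-reranker-v1-tiny-en/ggml-model-f16.gguf')

# Different repos with the same file name collide the same way, since
# hf_repo does not appear in the path at all.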
@ngxson ngxson requested a review from ggerganov November 27, 2024 18:34
@github-actions bot added the examples, python, and server labels Nov 27, 2024
@ngxson ngxson merged commit 9f91251 into ggerganov:master Nov 27, 2024
52 checks passed
@Blondy4life2005 left a comment
Very interesting

@ochafik (Collaborator) commented Dec 6, 2024

Quick script in case anyone else has a large cache directory to migrate to the new naming and doesn't want to redownload:

import json
from pathlib import Path
import platform
import os
import re

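# Resolve the cache directory: LLAMA_CACHE overrides the platform-specific default.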
cache_dir = os.environ.get('LLAMA_CACHE')
if not cache_dir:
  if platform.system() == 'Darwin':
    cache_dir = os.environ['HOME'] + '/Library/Caches/llama.cpp'
  elif platform.system() == 'Windows':
    cache_dir = os.environ['LOCALAPPDATA'] + '/llama.cpp'
  else:
    cache_dir = os.environ['HOME'] + '/.cache/llama.cpp'
cache_dir = Path(cache_dir)

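# Each cached model has a metadata file '<model>.gguf.json' recording,
# among other things, its source URL.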
for json_file in cache_dir.glob('*.gguf.json'):
  data = json.loads(json_file.read_text())
  # The model file sits next to its metadata file, minus the trailing '.json'
  # (note: str.rstrip strips a character set, not a suffix, so use with_suffix).
  gguf_file = json_file.with_suffix('')
  if not gguf_file.exists():
    print(f'WARNING: {gguf_file} file does not exist. Deleting {json_file}')
    json_file.unlink()
    continue
  
  url = data['url']
  filename = os.path.basename(url)
  
  if (match := re.match(r'^https://huggingface.co/([^/]+)/([^/]+)/resolve/main/.*', url)):
    org = match.group(1)
    model = match.group(2)
    if gguf_file.name == filename:
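      # Prefix with '{org}_{model}_' so identical file names from different
      # repos no longer collide.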
      new_gguf_file = gguf_file.parent / f'{org}_{model}_{filename}'
      print(f'Renaming {gguf_file} to {new_gguf_file}')
      gguf_file.rename(new_gguf_file)
      # Append '.json' to the full new name (with_suffix would replace '.gguf',
      # breaking the '<model>.gguf.json' pairing).
      json_file.rename(new_gguf_file.parent / (new_gguf_file.name + '.json'))

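Note that the rename only fires while the file on disk still carries the bare URL basename (the gguf_file.name == filename check), so re-running the script is harmless: already-migrated files and entries that don't match the huggingface.co URL pattern are left untouched.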
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024