
Git LFS

Marinna Martini edited this page Sep 19, 2019 · 8 revisions

Some reminders on how to use git lfs

Installs easily following these instructions: https://git-lfs.github.com/

It will handle files up to 2GB in size. It stores the files somewhere other than the standard github repository and leaves pointers on github to those files.
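
To make the pointer idea concrete, here is a minimal sketch of what git lfs leaves in the repository in place of a large file: a small text file holding a spec version, a SHA-256 object id, and the size in bytes. The names demo_data.bin and demo_pointer.txt are made up for illustration, and the sketch assumes the sha256sum tool from GNU coreutils is available. git lfs writes the real pointer for you, so this is only for seeing the format.

```shell
# Stand-in for a large data file (hypothetical name, tiny on purpose).
printf 'example bytes\n' > demo_data.bin

# The pointer records the SHA-256 of the content and its size in bytes.
oid=$(sha256sum demo_data.bin | cut -d ' ' -f 1)
size=$(wc -c < demo_data.bin | tr -d ' ')

# Write the three-line pointer in the layout the Git LFS pointer spec uses.
printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' \
    "$oid" "$size" > demo_pointer.txt

cat demo_pointer.txt
```

When a page on github shows "stored with git lfs", it is a small pointer like this one that sits in the repository rather than the file content.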

I want my test data and demos in a separate directory at the top level of the project, not integrated with the standard project structure, in case it needs to be moved. The one exception: I have test code that uses the data files.

My first mistake was to have git lfs track the entire /data directory. It did so dutifully, and when I went to my project page on github, each file showed only "stored with git lfs". My README.md file for the demo wasn't rendered - instead the readme area said "stored with git lfs".

So I had to undo what I did. This took some figuring. One can use the uninstall command as shown here: https://help.github.com/en/articles/removing-files-from-git-large-file-storage

That did not remove the hooks from git, however, so no matter what I did, those files were still over in git lfs. What finally worked: I removed the entire data directory from my project (delete from git, keep the local copy), pushed to github, reinstated git lfs for my project, went down into the directory that held the specific files I wanted to track, and tracked those files individually (git lfs track "9991wh.cdf" for example). To tell git about the change, I needed to execute

git lfs update

With git lfs updating the hooks in git, I could finally add the data directory and all its subdirectories back to git. And this worked. My code can be seen online at github and my README.md displays as it should.
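
The recovery described above can be sketched as a command sequence. This is a reconstruction, not a record of what was actually typed. It assumes the directory is named data, that "delete, keep local" corresponds to git rm -r --cached, and that "reinstate" means git lfs install. The commit messages and the function name retrack_single_file are placeholders. If an older .gitattributes entry still matches the whole directory it would need to be removed as well; the notes above do not say where that entry lived. The commands are wrapped in a function so nothing runs until it is called from the top of the repository.

```shell
# Hypothetical helper: retrack_single_file <file-inside-data> <branch>
# Assumes it is run from the repository root and that the directory
# in question is named "data" (both are assumptions for this sketch).
retrack_single_file() {
    file="$1"      # path relative to data/, e.g. 9991wh.cdf
    branch="$2"    # branch to push to

    # 1. Take data/ out of git but keep the files on disk, then publish that.
    git rm -r --cached data &&
    git commit -m "remove data directory from tracking" &&
    git push origin "$branch" &&

    # 2. Reinstate git lfs for this repository and track only the one file,
    #    from inside the directory that holds it.
    git lfs install &&
    ( cd "data/$(dirname "$file")" && git lfs track "$(basename "$file")" ) &&

    # 3. Refresh the git hooks that git lfs relies on.
    git lfs update &&

    # 4. Add the directory back; only the tracked file goes through git lfs.
    git add data &&
    git commit -m "add data back, with one file in git lfs" &&
    git push origin "$branch"
}
```

A call might look like retrack_single_file 9991wh.cdf the_branch_im_in, with the branch name replaced by your own.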

Adding files to git lfs

Git knows which files belong to git lfs through a file called .gitattributes. Each line in it names a file (or pattern) handled by git lfs; the pointers to the stored content are committed in place of the files themselves. The sequence of commands looks like this:

git lfs track "the_new_big_data_file"
git add .gitattributes
git add the_new_big_data_file
git commit -a -m "adding a new data file to a new directory"
git push origin the_branch_im_in

The .gitattributes file itself must be kept up to date and committed for git lfs add, delete, and push operations to work properly.
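
For reference, each git lfs track call appends one line to the .gitattributes file in the current directory, naming the path followed by the lfs attributes. The sketch below writes two such lines by hand into a scratch file (demo.gitattributes is a made-up name, so a real .gitattributes is not touched) and lists which paths would go through git lfs. It is only meant to show what to look for when checking the file.

```shell
# Scratch copy of what .gitattributes looks like after tracking two files.
# (The file name demo.gitattributes is hypothetical.)
cat > demo.gitattributes <<'EOF'
9991wh.cdf filter=lfs diff=lfs merge=lfs -text
the_new_big_data_file filter=lfs diff=lfs merge=lfs -text
EOF

# List the paths routed through git lfs: the first field of each lfs line.
tracked=$(grep 'filter=lfs' demo.gitattributes | cut -d ' ' -f 1)
echo "$tracked"

# README.md has no line here, so it stays an ordinary git file.
grep -q '^README\.md ' demo.gitattributes || echo "README.md is not in git lfs"
```

Tracking individual files this way, rather than a whole directory, is what keeps files such as README.md readable on the project page.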

Removing just one file

Here is the sequence that finally worked (the prune command was ineffective):

  • delete the line containing the file from the .gitattributes file in the same directory as the target file
  • use TortoiseGit to delete the file fully, i.e. from git tracking as well. You could also do this from the command line or PyCharm, just make sure you delete it both from git and from your machine.

Then this sequence worked:

(base) c:\projects\python\ADCPy\tests>git lfs status
On branch add-tests

Git LFS objects to be committed:

        ../../../../../C:/projects/python/ADCPy/tests/11121whV23857profiles.pd0 (LFS: 7b2f7c4 -> File: deleted)

Git LFS objects not staged for commit:

        ../../../../../C:/projects/python/ADCPy/tests/.gitattributes (Git: ac242b7 -> File: 7a1b594)

(base) c:\projects\python\ADCPy\tests>git commit -a -m "removing large files from tests, take 2"
[add-tests 6b090be] removing large files from tests, take 2
 2 files changed, 4 deletions(-)
 delete mode 100644 tests/11121whV23857profiles.pd0

(base) c:\projects\python\ADCPy\tests>git push origin add-tests
Enumerating objects: 7, done.
Counting objects: 100% (7/7), done.
Delta compression using up to 4 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 361 bytes | 361.00 KiB/s, done.
Total 4 (delta 3), reused 0 (delta 0)
remote: Resolving deltas: 100% (3/3), completed with 3 local objects.
To https://github.com/mmartini-usgs/ADCPy
   d6dbd4e..6b090be  add-tests -> add-tests
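
A command-line version of the two manual steps plus the commit and push might look like the following. It is a sketch under assumptions: git lfs untrack is used in place of editing .gitattributes by hand, git rm stands in for the TortoiseGit delete, and the function name remove_lfs_file, the commit message, and the arguments are placeholders. It is meant to be run from the directory that holds both the file and its .gitattributes.

```shell
# Hypothetical helper: remove_lfs_file <file> <branch>
remove_lfs_file() {
    file="$1"
    branch="$2"

    # Drop the file's line from .gitattributes in the current directory.
    git lfs untrack "$file" &&

    # Delete the file from git tracking and from the working directory.
    git rm "$file" &&

    # Show what is staged, comparable to the git lfs status output above.
    git lfs status &&

    # Record both changes and publish them.
    git add .gitattributes &&
    git commit -m "remove $file from git lfs" &&
    git push origin "$branch"
}
```

For the transcript above, the call would be remove_lfs_file 11121whV23857profiles.pd0 add-tests, run from the tests directory.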

All the fiddling above I did with a file that was ~700MB. Moving that up to github and around caused me to cross the "free" threshold. I got a notice that I had to buy a "datapack", which, for $60, gets you 600 GB/year of data transfer and 50 GB of storage. Data usage is tracked under settings->billing. So I made a smaller (10MB) data file for testing and will see, going forward, how much data usage that generates as I work with the package.
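
As a rough back-of-the-envelope check on why the smaller test file helps, the sketch below estimates how many times a file of each size could be transferred within the 600 GB/year quoted for a data pack. It assumes 1 GB = 1000 MB and that every push or download of the file counts once against the quota; the actual accounting by github may differ.

```shell
# Yearly transfer allowance of one data pack, from the figures above, in MB.
# Assumption: 1 GB = 1000 MB.
quota_mb=$((600 * 1000))

big_file_mb=700    # the original test file (approximate size)
small_file_mb=10   # the replacement test file

# Integer division: whole transfers that fit in a year.
big_transfers=$((quota_mb / big_file_mb))
small_transfers=$((quota_mb / small_file_mb))

echo "700MB file: about $big_transfers transfers per year"
echo "10MB file: about $small_transfers transfers per year"
```

Under these assumptions the 700MB file fits about 857 times and the 10MB file about 60000 times.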

Some handy commands to remember:

  • git lfs ls-files to list the files being tracked