How to get a list of all blobs in a repository in Git

Accepted answer
Score: 14

This is how I get a list of SHAs and filenames for all the blobs in a repository:

$ git rev-list --objects --all | git cat-file --batch-check='%(objectname) %(objecttype) %(rest)' | grep '^[^ ]* blob' | cut -d" " -f1,3-


  1. The %(rest) atom in the format string appends the rest of the input line after the object's SHA to the output. In this case, the rest happens to be the path name (for tree and blob objects).

  2. The grep pattern is intended to match only actual blobs, not tree objects that just happen to have the string blob somewhere in their path name.
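
The two notes above can be sketched as a small shell function. This is a variant, not the exact command from the answer: it puts %(objecttype) first and lets awk compare that field exactly, so a path that contains the word blob cannot cause a false match. The function name list_blobs is made up for illustration.

```shell
# Print "<sha> <path>" for every blob reachable from any ref.
# Must be run inside a Git repository.
list_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(rest)' |
    # Keep only lines whose first field is exactly "blob", then drop
    # that field. Editing $0 in place preserves spaces in the path.
    awk '$1 == "blob" { sub(/^blob /, ""); print }'
}
```

Calling list_blobs from the top of a working tree should print one line per reachable blob. Commits and trees are filtered out because their type field is not blob.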

Score: 3

First of all, there's very little chance you want to do this by listing blobs. A blob is just raw data; it doesn't know what file it's part of. The true answer depends a little bit on what exactly you're trying to accomplish. For example, do you need to search blobs that are part of commits which aren't even accessible from the commit history? If you don't, here are a couple of thoughts.

Perhaps the pickaxe search of git-log would do what you want:

-S<string> Look for differences that introduce or remove an instance of <string>. Note that this is different than the string simply appearing in diff output; see the pickaxe entry in gitdiffcore(7) for more details.

Depending on your end goal, this might be way better than what you suggested: you'll actually see how the string was added or removed. You can of course use the information you get to cat the entire file, if you so desire.

Or maybe you want to list revisions with git-log and use git-grep on the trees (commits) it provides?
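
Both suggestions can be sketched as shell functions. The names pickaxe and grep_all_revisions are hypothetical, and the second function uses git rev-list rather than git-log to produce the revision list, which is a substitution on my part. It also assumes the repository has at least one commit.

```shell
# Commits (on any ref) whose diff changes how many times the string
# occurs, i.e. commits that add or remove an instance of it.
pickaxe() {
  git log --all --oneline -S"$1"
}

# Search the tree of every commit reachable from any ref.
# Matches are printed as "<commit>:<path>:<line number>:<text>".
grep_all_revisions() {
  git rev-list --all | xargs git grep -n -e "$1"
}
```

For example, pickaxe 'needle' lists the commits that introduced or removed needle, while grep_all_revisions 'needle' shows every revision in which the string is present. The second can be slow on a large history because it searches each commit's full tree.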

Score: 2

As I understand it from the manual, the following lists all objects and their info:

git cat-file --batch-all-objects --batch-check
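
Since that command prints every object type, a filter is needed to get blobs only. A minimal sketch follows, with the hypothetical function name list_all_blob_ids. Unlike the rev-list approach, this also includes unreachable objects, and no path names are available because the objects are read straight from the object database.

```shell
# Print "<sha> <size in bytes>" for every blob in the object database,
# whether or not any commit references it.
list_all_blob_ids() {
  git cat-file --batch-all-objects \
      --batch-check='%(objecttype) %(objectname) %(objectsize)' |
    awk '$1 == "blob" { print $2, $3 }'
}
```
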
Score: 1

If you are using git cat-file --batch-all-objects --batch-check, as suggested in J. Doe's answer and presented here, make sure to use Git 2.34 (Q4 2021) or later.

"git cat-file --batch" with the --batch-all-objects option is supposed to iterate over all the objects found in a repository, but it used to translate these object names using the replace mechanism, which defeats the point of enumerating all objects in the repository.

This has been corrected with Git 2.34 (Q4 2021).

See commit bf97289, commit 818e393, commit 5c5b29b, commit c3660cf, commit e879295 (05 Oct 2021) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 092228e, 18 Oct 2021)

cat-file: disable refs/replace with --batch-all-objects

Signed-off-by: Jeff King

When we're enumerating all objects in the object database, it doesn't make sense to respect refs/replace.
The point of this option is to enumerate all of the objects in the database at a low level.
By definition we'd already show the replacement object's contents (under its real oid), and showing those contents under another oid is almost certainly working against what the user is trying to do.


cat-file: use packed_object_info() for --batch-all-objects

Signed-off-by: Jeff King

When "cat-file --batch-all-objects" iterates over each object, it knows where to find each one.
But when we look up details of the object, we don't use that information at all.

This patch teaches it to use the pack/offset pair when we're iterating over objects in a pack.
This yields a measurable speed improvement (timings on a fully packed clone of linux.git)
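
Because the refs/replace fix described above only landed in Git 2.34, a script that relies on --batch-all-objects might check the installed version first. The helper below is a sketch under the assumption that git version prints the version number as its third word (for example "git version 2.34.1"). The name git_at_least is made up.

```shell
# Succeed if the installed Git is at least <major>.<minor>.
# Usage: git_at_least 2 34
git_at_least() {
  gal_ver=$(git version | awk '{ print $3 }')   # e.g. 2.34.1
  gal_major=${gal_ver%%.*}                      # text before first dot
  gal_rest=${gal_ver#*.}                        # text after first dot
  gal_minor=${gal_rest%%.*}                     # second component
  [ "$gal_major" -gt "$1" ] ||
    { [ "$gal_major" -eq "$1" ] && [ "$gal_minor" -ge "$2" ]; }
}
```

A caller could then write something like: git_at_least 2 34 || echo "cat-file --batch-all-objects may honor refs/replace" >&2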
