Removing and purging files from git history




Occasionally (as in, many times), something needs to be removed from a git repository permanently, including from its history. Hint: PRIVATE SSH keys...

Step 1: Create a clone of the repository

Replace MY_GIT_REPOSITORY with the URL of your git repository. The loop creates a local tracking branch for every remote branch, so that all branches can be cleaned as well.

cd /tmp
git clone MY_GIT_REPOSITORY.git workingrepo
cd workingrepo
# Create a local tracking branch for every remote branch except master
for branch in $(git branch -a | grep remotes | grep -v HEAD | grep -v master); do
    git branch --track "${branch##*/}" "$branch"
done
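
The `${branch##*/}` expansion in the loop is what turns a remote ref name into a local branch name. A minimal sketch of that step in isolation, with `local_name` as a hypothetical helper and made-up branch names:

```shell
# local_name is a hypothetical helper, not part of git.
# ${1##*/} strips the longest prefix ending in "/", leaving the last path component.
local_name() {
    printf '%s\n' "${1##*/}"
}

local_name remotes/origin/develop        # prints: develop
local_name remotes/origin/feature/login  # prints: login
```

Note the second call: a remote branch named `feature/login` would be tracked locally as `login`, so branch names that contain slashes may need separate handling.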

Step 2: Find the files that you want to remove

Case A: Large deleted files
Large deleted files are still stored in the repository and are still transferred with every clone. Here is a command that will find the 20 largest files in your git repository:

git rev-list master | while read rev; do git ls-tree -lr $rev | cut -c54- | sed -r 's/^ +//g;'; done | sort -u | perl -e 'while (<>) { chomp; @stuff=split("\t");$sums{$stuff[1]} += $stuff[0];} print "$sums{$_} $_\n" for (keys %sums);' | sort -rn | head -n 20
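
The perl stage of that pipeline adds up the sizes of every distinct version of each path. As a rough illustration of just that step, here is an awk sketch. `sum_sizes_by_path` is a hypothetical name, the sample sizes and paths are made up, and the input is assumed to be lines of the form size, tab, path, which is what the `cut` and `sed` stages produce:

```shell
# sum_sizes_by_path is a hypothetical helper: reads "SIZE<TAB>PATH" lines on
# stdin and prints "TOTAL PATH" for each path, largest total first.
sum_sizes_by_path() {
    awk -F '\t' '{ sums[$2] += $1 } END { for (p in sums) print sums[p], p }' |
        sort -rn
}

# Made-up input: two versions of a.bin and one version of b.txt
printf '100\ta.bin\n250\ta.bin\n40\tb.txt\n' | sum_sizes_by_path
# prints:
# 350 a.bin
# 40 b.txt
```

The totals are therefore the combined size of all stored versions of a path, not the size of any single version.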

Case B: Deleting a file that contains a password
You can grep the history for the password and find the file that contains it:

git grep -i 'mypassword' $(git rev-list --all)
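
On a repository with many commits, passing every revision on the command line can exceed the shell's argument length limit. One alternative, offered as a sketch rather than a drop-in replacement, is git's pickaxe search (`git log -S`), which reports the files in which the number of occurrences of a string changed. `files_touching_string` is a hypothetical helper name, and unlike the `-i` search above it is case-sensitive:

```shell
# files_touching_string REPO STRING
# Hypothetical helper: prints each path in which some commit, on any ref,
# added or removed an occurrence of STRING. Matching is case-sensitive.
files_touching_string() {
    git -C "$1" log --all -S"$2" --name-only --pretty=format: |
        grep -v '^$' |
        sort -u
}
```

From inside the working clone it could be called as `files_touching_string . 'mypassword'`.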

Case C: Deleting entire deleted directories
To get a list of entire directories that have been removed from the repository:

git log --all --pretty=format: --name-only --diff-filter=D | sed -r 's|[^/]+$||g' | sort -u
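
The `sed` expression does the real work here: it deletes the final path component, leaving the directory part of each deleted file's path. A small sketch of that step on made-up paths, with `parent_dirs` as a hypothetical helper. Unlike the command above, it also drops the empty line produced by files in the repository root:

```shell
# parent_dirs is a hypothetical helper: reads file paths on stdin and prints
# the unique directory part of each, skipping paths that have no directory.
parent_dirs() {
    sed -E -e 's|[^/]+$||' -e '/^$/d' | sort -u
}

# Made-up paths standing in for the output of `git log --name-only`
printf '%s\n' docs/old/a.md docs/old/b.md README lib/x.c | parent_dirs
# prints:
# docs/old/
# lib/
```

A directory in this list has had files deleted from it, but it is not necessarily gone entirely, so compare the list against the current working tree before purging anything.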

Step 3: Rewrite history and remove the old files

Replace FILE_LIST with the files or directories that you are removing.

git filter-branch --tag-name-filter cat --index-filter 'git rm -r --cached --ignore-unmatch FILE_LIST' --prune-empty -f -- --all

Step 4: Prune all references with garbage collection and reclaim space

rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --aggressive --prune=now

Step 5: Verify they have been removed

Run the same command that you used in step 2 to verify that the removed files are no longer in the history.
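
For a single known path, a quicker check is to ask git whether any commit on any ref still touches it. A minimal sketch, where `path_in_history` is a hypothetical helper and the file name in the usage line is made up:

```shell
# path_in_history REPO PATH
# Hypothetical helper: succeeds (exit 0) if any commit reachable from any ref
# still touches PATH, and fails (exit 1) if the path is gone from history.
path_in_history() {
    [ -n "$(git -C "$1" log --all --oneline -- "$2")" ]
}
```

For example, `path_in_history . id_rsa && echo "still present"`. Run it only after step 4, because `--all` also covers the backup refs under `refs/original/` that `git filter-branch` leaves behind.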

Step 6: Push the history changes

git push origin --force --all
git push origin --force --tags

Step 7: Garbage collect the server

If you are running your own server, garbage collect there as well. If you are not running your own, the server is usually garbage collected periodically.

cd MY_SERVER_GIT_REPO
git reflog expire --expire=now --all
git gc --aggressive --prune=now

