Learning Objectives

By the end of this module, you will be able to:

  • Explain how Git's object storage, packing, and garbage collection work under the hood
  • Configure git gc and git maintenance for automated repository optimization
  • Enable the commit-graph file for faster log traversal and ancestor queries
  • Use the filesystem monitor (fsmonitor) to speed up git status in large repos
  • Apply sparse checkout and partial clone strategies for monorepo development
  • Enable git rerere to automatically reuse recorded conflict resolutions
  • Choose appropriate scaling strategies for large teams and large repositories
  • Diagnose and resolve common performance bottlenecks in Git

1. How Git Stores Data (And Why It Slows Down)

Understanding Git's storage model is essential before optimizing it.

Objects and Packfiles

Git stores every piece of content as an object — blobs (file contents), trees (directory listings), commits, and tags. Initially, each object is stored as a separate compressed file in .git/objects/:

.git/objects/
├── 4a/
│   └── 8e3f...   ← a blob (file content)
├── 7c/
│   └── 2b91...   ← a tree (directory listing)
├── a1/
│   └── b2c3...   ← a commit
└── ...            ← thousands of individual files (loose objects)

As these accumulate, Git periodically packs them into a single binary file called a packfile. Packfiles use delta compression — storing only the differences between similar objects:

.git/objects/pack/
├── pack-abc123.idx    ← index: maps object hash → offset in packfile
└── pack-abc123.pack   ← packfile: all objects, delta-compressed

Packing a repository with 100,000 loose objects might shrink it 10-50x, because consecutive versions of the same file share most of their content and delta-compress extremely well.
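The loose-to-packed lifecycle is easy to observe directly. A minimal sketch in a throwaway repository (assumes only that git is on PATH; the file names are illustrative):

```shell
# Observe loose objects being created and then packed by gc.
set -e
cd "$(mktemp -d)"
git init -q
echo "hello" > a.txt
git add a.txt
git -c user.name=demo -c user.email=demo@example.com commit -qm "add a.txt"

# The commit created three loose objects: a blob, a tree, and a commit
git count-objects -v | grep '^count:'    # count: 3

# gc packs them; the loose count drops to 0 and a packfile appears
git gc --quiet
git count-objects -v | grep '^count:'    # count: 0
ls .git/objects/pack/
```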

Where Performance Degrades

Operation              | Bottleneck                     | Affected By
git status             | Scanning working directory     | Number of files, filesystem speed
git log                | Walking commit graph           | Number of commits, graph complexity
git blame              | Traversing history per line    | File history length
git add                | Hashing file contents          | File sizes
git clone              | Downloading objects            | Repository size (history + files)
git merge / git rebase | Conflict detection             | Number of changed files
git diff               | Computing content differences  | File sizes, number of files
git push / git fetch   | Transferring objects           | Pack size, network speed

2. Garbage Collection (git gc)

What git gc Does

Garbage collection is Git's housekeeping process. It performs several optimizations:

  1. Packs loose objects into packfiles with delta compression
  2. Removes unreachable objects (orphaned commits, abandoned blobs) that are past the reflog's protection window
  3. Packs references — consolidates individual ref files into a single packed-refs file
  4. Prunes reflogs — removes expired reflog entries
  5. Re-indexes packfiles for faster lookups
# Run garbage collection manually
git gc
 
# Aggressive GC — slower but more thorough compression
git gc --aggressive
 
# Preview which unreachable objects would be pruned (gc has no dry-run mode)
git prune --dry-run
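
Step 3 in the list above — ref packing — can also be watched directly. A sketch in a scratch repository (branch names are illustrative; assumes the default files-based ref backend):

```shell
# Watch loose ref files collapse into a single packed-refs file.
set -e
cd "$(mktemp -d)"
git init -q -b main
echo x > f.txt
git add f.txt
git -c user.name=demo -c user.email=demo@example.com commit -qm "init"
git branch topic

ls .git/refs/heads/      # main  topic  (one loose file per ref)
git pack-refs --all
cat .git/packed-refs     # both refs now live in this one file
ls .git/refs/heads/      # empty — the loose files are gone
```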

Automatic GC

Git triggers garbage collection automatically when certain thresholds are exceeded:

# Check current thresholds
git config gc.auto          # default: 6700 (loose objects before auto-gc)
git config gc.autoPackLimit # default: 50 (packfiles before auto-repack)
 
# Disable auto-gc (if using git maintenance instead)
git config gc.auto 0
 
# Customize thresholds
git config --global gc.auto 10000

When auto-gc runs, you might see:

Auto packing the repository in background for optimum performance.

GC Timing and Safety

# Control how long unreachable objects survive
git config gc.reflogExpire            # default: 90 days
git config gc.reflogExpireUnreachable  # default: 30 days
git config gc.pruneExpire              # default: 2 weeks
 
# For safety on shared servers, extend these:
git config gc.pruneExpire "1 month"

When to Run git gc --aggressive

--aggressive uses much more CPU and memory for better compression. Use it:

  • After importing a repository (e.g., migrating from SVN)
  • After deleting a large number of branches
  • After git filter-repo or other history-rewriting operations
  • On a schedule (monthly) for very active repositories
# Aggressive GC, also pruning unreachable objects immediately
git gc --aggressive --prune=now
 
# Check the size before and after
git count-objects -v
# count: 0           ← loose objects
# size: 0            ← loose object size (KB)
# in-pack: 52341     ← packed objects
# packs: 1           ← number of packfiles
# size-pack: 18230   ← packfile size (KB)

3. git maintenance: Modern Automated Optimization

git maintenance (introduced in Git 2.29) replaces manual git gc with a scheduled, incremental maintenance system. It's smarter — it runs smaller tasks frequently instead of one big GC pass.

Enabling Maintenance

# Register the current repo for scheduled maintenance
git maintenance register
 
# Start the background scheduler
git maintenance start

This configures your system's task scheduler (launchd on macOS, systemd/cron on Linux, Task Scheduler on Windows) to run maintenance periodically.

Maintenance Tasks

Task               | What It Does                                       | Default Schedule
commit-graph       | Updates the commit-graph file                      | Hourly
prefetch           | Fetches latest objects from remotes in background  | Hourly
loose-objects      | Packs loose objects into packfiles                 | Daily
incremental-repack | Gradually repacks packfiles for better compression | Daily
pack-refs          | Consolidates loose refs into packed-refs file      | Never (on demand)
gc                 | Full garbage collection                            | Never (replaced by above tasks)
# Run a specific task manually
git maintenance run --task=commit-graph
git maintenance run --task=loose-objects
git maintenance run --task=incremental-repack
 
# Run all scheduled tasks at once
git maintenance run
 
# See what's registered
git config --global --get-regexp maintenance

Maintenance Configuration

# Customize the schedule for specific tasks
git config maintenance.commit-graph.schedule hourly
git config maintenance.loose-objects.schedule daily
git config maintenance.incremental-repack.schedule weekly
 
# Unregister a repo from maintenance
git maintenance unregister
 
# Stop the background scheduler entirely
git maintenance stop

git maintenance vs. git gc

git gc:            "Stop everything and do a full cleanup"
git maintenance:   "Do small cleanups continuously in the background"

For active development, git maintenance is almost always the better choice — it avoids the periodic pauses caused by auto-gc on large repos.


4. The Commit-Graph File

The Problem

Every git log, git merge-base, git branch --contains, and reachability query must walk the commit graph by reading individual commit objects from packfiles. For repositories with millions of commits, this is slow.

The Solution

The commit-graph file (.git/objects/info/commit-graph) is a precomputed, binary-format index of the commit DAG. It stores:

  • Each commit's parents
  • Each commit's root tree hash
  • The commit timestamp (for generation number computation)
  • Generation numbers — a topological metric that lets Git skip entire branches of the graph during traversal
# Generate the commit-graph file
git commit-graph write --reachable
 
# Generate with changed-paths Bloom filters (speeds up path-limited log)
git commit-graph write --reachable --changed-paths
 
# Verify the commit-graph is consistent
git commit-graph verify

Performance Impact

Without commit-graph:
  git log --oneline | wc -l     →  3.2 seconds  (100k commits)
  git merge-base A B            →  0.8 seconds

With commit-graph:
  git log --oneline | wc -l     →  0.4 seconds  (8x faster)
  git merge-base A B            →  0.01 seconds  (80x faster)

Enabling Commit-Graph Globally

# Write commit-graph on every gc/fetch/repack
git config --global fetch.writeCommitGraph true
git config --global gc.writeCommitGraph true
 
# Or just use git maintenance, which handles this automatically
git maintenance start

Changed-Path Bloom Filters

When you run git log -- path/to/file, Git normally must inspect every commit to check if it touched that path. Bloom filters precompute a probabilistic answer ("this commit definitely didn't touch this path"), dramatically speeding up path-limited log queries:

# Write commit-graph with Bloom filters
git commit-graph write --reachable --changed-paths
 
# Now this is much faster:
git log --oneline -- src/components/Header.tsx

5. Filesystem Monitor (fsmonitor)

The Problem

git status compares every file in the working directory against the index. For repositories with 100,000+ files, this stat-call-per-file approach takes several seconds, even when nothing has changed.

The Solution

The filesystem monitor (fsmonitor) integrates with your OS's file-change notification system (FSEvents on macOS, inotify on Linux, ReadDirectoryChangesW on Windows) to tell Git which files actually changed since the last check. Git then only stats those files.

Built-in FSMonitor Daemon (Git 2.37+)

# Enable the built-in fsmonitor daemon
git config core.fsmonitor true
 
# Verify it's running
git fsmonitor--daemon status
 
# Check the improvement
time git status   # With fsmonitor: ~0.1s vs ~2s without

Using Watchman (Alternative)

Facebook's Watchman is a mature filesystem watcher that predates Git's built-in fsmonitor:

# Install Watchman
brew install watchman    # macOS
# See https://facebook.github.io/watchman/docs/install for other platforms
 
# Configure Git to use Watchman via the sample hook that ships with Git
cp .git/hooks/fsmonitor-watchman.sample .git/hooks/query-watchman
git config core.fsmonitor .git/hooks/query-watchman

Performance Impact

Repository Size | git status Without FSMonitor | With FSMonitor
10,000 files    | 0.3s                         | 0.05s
100,000 files   | 2.5s                         | 0.08s
500,000 files   | 12s                          | 0.1s

When to Enable FSMonitor

  • Working directories with > 10,000 files
  • Monorepos
  • Projects where git status noticeably pauses
  • Any developer who runs git status frequently (shell prompts, IDE integrations)

Untracked Cache

A complementary optimization that caches the state of untracked file directories:

# Enable untracked cache
git config core.untrackedCache true
 
# Update the untracked cache
git update-index --untracked-cache
 
# Check if it's active
git config core.untrackedCache

6. Sparse Checkout for Monorepos

Module 19 introduced sparse checkout briefly. Here we explore it in depth for monorepo workflows.

The Monorepo Problem

monorepo/
├── packages/
│   ├── frontend/          ← 50,000 files, you work here
│   ├── backend/           ← 30,000 files, you never touch
│   ├── mobile/            ← 40,000 files, you never touch
│   └── shared/            ← 5,000 files, you need this
├── tools/                 ← 10,000 files
└── docs/                  ← 2,000 files

With 137,000 files, git status is slow, your editor indexes everything, and disk usage is high — even though you only work on 2 of the 6 directories.

Cone Mode Sparse Checkout

Cone mode (recommended) operates on entire directories, which is much faster than pattern-based matching:

# Clone with sparse checkout
git clone --filter=blob:none --sparse https://github.com/org/monorepo.git
cd monorepo
 
# Only root files are checked out initially
ls
# README.md  package.json  ...
 
# Add the directories you work on
git sparse-checkout set --cone packages/frontend packages/shared
 
# Verify
git sparse-checkout list
# packages/frontend
# packages/shared
 
# Your working tree now only has:
# monorepo/
# ├── packages/
# │   ├── frontend/    ← fully checked out
# │   └── shared/      ← fully checked out
# ├── README.md
# └── package.json
 
# Add more directories later
git sparse-checkout add tools/linting
 
# Temporarily get everything (e.g., for a full build)
git sparse-checkout disable
 
# Re-enable
git sparse-checkout set --cone packages/frontend packages/shared

Sparse Checkout with Partial Clone

The optimal monorepo setup combines both:

# --filter=blob:none: don't download file contents until needed
# --sparse: only check out root files initially
git clone --filter=blob:none --sparse https://github.com/org/monorepo.git
cd monorepo
 
# Set your working directories
git sparse-checkout set --cone packages/frontend packages/shared
 
# Result:
# - Full commit history is available (git log, git blame work)
# - Only frontend + shared file contents are downloaded
# - Other packages' blobs are fetched on demand if needed

Sparse Index (Git 2.32+)

Even with sparse checkout, the index (.git/index) normally contains entries for every file in the repository. The sparse index collapses non-checked-out directories into a single tree entry:

# Enable sparse index
git sparse-checkout init --cone --sparse-index
 
# Or on an existing sparse checkout
git config index.sparse true
 
# Verify
GIT_TRACE2_PERF=1 git status 2>&1 | grep "sparse"

With sparse index, operations on the index (git status, git add, git commit) only process the checked-out files, not the entire repo.


7. Partial Clone (--filter)

Filter Types

# Blobless clone: skip all file contents (download on demand)
git clone --filter=blob:none <url>
 
# Treeless clone: skip tree objects too (even smaller initial clone)
git clone --filter=tree:0 <url>
 
# Size-filtered: skip blobs larger than a threshold
git clone --filter=blob:limit=1m <url>
 
# Combined filters (Git 2.27+)
git clone --filter=combine:blob:none+tree:0 <url>

How Partial Clone Works

Full clone:
┌───────────────────────────────┐
│  All commits ✓                │
│  All trees ✓                  │
│  All blobs ✓                  │
│  Total: ~500 MB               │
└───────────────────────────────┘

Blobless clone (--filter=blob:none):
┌───────────────────────────────┐
│  All commits ✓                │
│  All trees ✓                  │
│  Blobs: only checked-out      │   ← fetched lazily on checkout
│  Total: ~50 MB + on-demand    │
└───────────────────────────────┘

Treeless clone (--filter=tree:0):
┌───────────────────────────────┐
│  All commits ✓                │
│  Trees: only current HEAD     │   ← fetched lazily on checkout
│  Blobs: only checked-out      │   ← fetched lazily on checkout
│  Total: ~20 MB + on-demand    │
└───────────────────────────────┘

Promisor Remotes

When Git needs an object that wasn't downloaded, it fetches it from the promisor remote (the server that promised to supply objects on demand):

# See which remote is the promisor
git config remote.origin.promisor    # true
git config remote.origin.partialCloneFilter    # blob:none
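
The lazy-fetch mechanism can be reproduced entirely on local disk, since file:// URLs support partial clone when the source repository opts in. A sketch with throwaway paths — the two uploadpack settings are the server-side switches that hosting providers normally enable for you:

```shell
# Demonstrate lazy blob fetching from a promisor remote, all on local disk.
set -e
cd "$(mktemp -d)"
git init -q src
git -C src config uploadpack.allowFilter true          # let clients use --filter
git -C src config uploadpack.allowAnySHA1InWant true   # let clients fetch blobs by hash
cd src
echo "v1" > f.txt; git add f.txt
git -c user.name=demo -c user.email=demo@example.com commit -qm "one"
echo "v2" > f.txt; git add f.txt
git -c user.name=demo -c user.email=demo@example.com commit -qm "two"
cd ..

# Blobless clone over file:// — origin becomes a promisor remote
git clone -q --filter=blob:none "file://$PWD/src" partial
cd partial
git config remote.origin.promisor             # true
git config remote.origin.partialCloneFilter   # blob:none

# The old blob for f.txt was never downloaded; reading it triggers
# an on-demand fetch from the promisor remote:
git show HEAD~1:f.txt                         # v1
```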

When to Use Each Filter

Filter        | Best For                          | Trade-off
blob:none     | Daily development on large repos  | Lazy blob fetches on checkout, blame, diff
tree:0        | CI/CD builds that only need HEAD  | Lazy tree + blob fetches; git log -- path slower
blob:limit=1m | Repos with a few large binaries   | Only large files deferred
Full clone    | Small repos, offline work         | No trade-offs, most disk/bandwidth

8. git rerere: Reuse Recorded Resolution

The Problem

When you're rebasing a long-lived branch or frequently merging the same branches, you encounter the same conflicts repeatedly. Resolving the identical conflict for the 5th time is tedious and error-prone.

How git rerere Works

rerere stands for "reuse recorded resolution." When enabled, Git:

  1. Records the conflicted state and your resolution when you resolve a merge conflict
  2. Recognizes the same conflict in future merges/rebases
  3. Automatically applies the recorded resolution
# Enable rerere
git config --global rerere.enabled true

How It Looks in Practice

# First encounter of a conflict
git merge feature
# CONFLICT (content): Merge conflict in app.js
# Recorded preimage for 'app.js'     ← rerere notes the conflict
 
# You resolve it manually
vim app.js    # fix the conflict markers
git add app.js
git commit
 
# Recorded resolution for 'app.js'   ← rerere saves your resolution
 
# Later, the same conflict appears (e.g., during a rebase)
git rebase main
# CONFLICT (content): Merge conflict in app.js
# Resolved 'app.js' using previous resolution.  ← automatic!
 
# Verify the auto-resolution looks correct
git diff app.js
git add app.js
git rebase --continue

Managing Recorded Resolutions

# See which conflicts have recorded resolutions
git rerere status
 
# See the diff of a recorded resolution
git rerere diff
 
# Forget a specific resolution (if it was wrong)
git rerere forget app.js
 
# Prune old recorded resolutions (by default, resolved entries are
# kept 60 days and unresolved ones 15 days)
git rerere gc

Where Rerere Stores Resolutions

Resolutions are stored in .git/rr-cache/:

.git/rr-cache/
├── abc123def456.../
│   ├── preimage    ← the conflicted state
│   └── postimage   ← your resolution
└── ...

These are local only — they don't transfer with push/pull. Each developer builds their own rerere cache.

When Rerere Shines

  • Long-lived feature branches that get rebased onto main repeatedly
  • Release branches where bug fixes are cherry-picked from main
  • Integration testing where you repeatedly merge and reset experimental branches
  • Stacked or dependent branches where successive rebases replay the same conflicting changes

9. Scaling Strategies for Large Teams

The Repository Size Problem

Dimension        | Small    | Medium       | Large    | Massive
Files            | < 10K    | 10K–100K     | 100K–1M  | > 1M
Commits          | < 10K    | 10K–100K     | 100K–1M  | > 1M
Contributors     | < 10     | 10–50        | 50–200   | > 200
Repo size (.git) | < 100 MB | 100 MB–1 GB  | 1–10 GB  | > 10 GB

Strategy Matrix

Problem: Too many files
├── Sparse checkout          (check out only what you need)
├── FSMonitor                (speed up git status)
└── Sparse index             (speed up index operations)

Problem: Too much history
├── Partial clone            (download objects on demand)
├── Shallow clone            (CI/CD only needs HEAD)
└── Commit-graph             (speed up log and ancestor queries)

Problem: Large binary files
├── Git LFS                  (store binaries on separate server)
└── blob:limit filter        (defer download of large blobs)

Problem: Too many contributors
├── Branch protection rules  (prevent chaos on shared branches)
├── CODEOWNERS file          (route reviews to right people)
├── Merge queues             (serialize merges, prevent conflicts)
└── git rerere               (automate repeated conflict resolution)

Problem: Slow CI/CD
├── Shallow clone --depth 1  (minimize clone time)
├── Caching .git directory   (avoid re-cloning)
└── Changed-file detection   (only test what changed)

Monorepo Performance Stack

For teams with large monorepos, the recommended configuration combines multiple optimizations:

# 1. Partial clone + sparse checkout
git clone --filter=blob:none --sparse <url>
git sparse-checkout set --cone <your-directories>
 
# 2. Enable sparse index
git config index.sparse true
 
# 3. Enable fsmonitor
git config core.fsmonitor true
 
# 4. Enable untracked cache
git config core.untrackedCache true
 
# 5. Write commit-graph with Bloom filters
git commit-graph write --reachable --changed-paths
 
# 6. Enable rerere for repeated merges
git config rerere.enabled true
 
# 7. Start background maintenance
git maintenance start

CODEOWNERS for Routing Reviews

Large teams use a CODEOWNERS file to automatically assign reviewers based on which files are changed:

# .github/CODEOWNERS (GitHub) or CODEOWNERS (GitLab)
# Syntax: <pattern> <owners>
 
# Default owners for everything
*                       @org/core-team
 
# Frontend team owns frontend code
/packages/frontend/     @org/frontend-team
*.tsx                   @org/frontend-team
 
# Backend team owns backend code
/packages/backend/      @org/backend-team
 
# DevOps owns CI/CD and infrastructure
/.github/               @org/devops
/terraform/             @org/devops
Dockerfile              @org/devops
 
# Specific individuals for critical files
/packages/auth/         @security-lead @auth-team-lead

Merge Queues

For high-velocity teams where multiple PRs merge simultaneously, merge queues serialize merges to prevent integration failures:

Without merge queue:
PR-1 passes CI ✓  → merge
PR-2 passes CI ✓  → merge   ← But PR-1 changed something PR-2 depends on!
                              → main broken

With merge queue:
PR-1 passes CI ✓  → enters queue → CI runs with PR-1 → merge ✓
PR-2 passes CI ✓  → enters queue → CI runs with PR-1 + PR-2 → merge ✓

GitHub, GitLab, and Bors all provide merge queue functionality.


10. Diagnosing Performance Issues

Measuring Git Operations

# Time any Git command
time git status
time git log --oneline | wc -l
 
# Enable trace output for detailed timing
GIT_TRACE=1 git status
GIT_TRACE_PERFORMANCE=1 git status
GIT_TRACE2_PERF=1 git status 2>&1 | head -30
 
# Check repository statistics
git count-objects -v
# count: 234          ← loose objects
# size: 1024          ← loose object size (KB)
# in-pack: 524130     ← packed objects
# packs: 3            ← number of packfiles
# size-pack: 182300   ← total packfile size (KB)
# prune-packable: 0
# garbage: 0
# size-garbage: 0

Common Bottlenecks and Fixes

# Symptom: git status is slow
# Diagnose:
GIT_TRACE2_PERF=1 git status 2>&1 | grep "data\|region_leave"
# Fix:
git config core.fsmonitor true
git config core.untrackedCache true
 
# Symptom: git log is slow
# Diagnose:
time git log --oneline | wc -l
# Fix:
git commit-graph write --reachable --changed-paths
 
# Symptom: git clone is slow
# Diagnose:
du -sh .git/
# Fix:
# Use --filter=blob:none for development
# Use --depth 1 for CI/CD
 
# Symptom: Too many packfiles
# Diagnose:
git count-objects -v | grep packs
# Fix:
git repack -a -d --depth=250 --window=250
 
# Symptom: Large .git directory
# Diagnose:
git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | sort -rnk3 | head -20
# Fix:
# Migrate large files to LFS, filter-repo to remove from history

Command Reference

Command                                            | Description
git gc                                             | Run garbage collection (pack objects, prune, pack refs)
git gc --aggressive                                | Thorough GC with maximum compression
git count-objects -v                               | Show object storage statistics
git maintenance start                              | Enable scheduled background maintenance
git maintenance run                                | Run all maintenance tasks now
git maintenance register                           | Register repo for scheduled maintenance
git maintenance stop                               | Disable background maintenance
git commit-graph write --reachable                 | Generate the commit-graph file
git commit-graph write --reachable --changed-paths | Commit-graph with Bloom filters
git commit-graph verify                            | Verify commit-graph integrity
git config core.fsmonitor true                     | Enable built-in filesystem monitor
git config core.untrackedCache true                | Enable untracked file cache
git config index.sparse true                       | Enable sparse index
git sparse-checkout set --cone <dirs>              | Set sparse checkout directories
git sparse-checkout add <dir>                      | Add directory to sparse checkout
git sparse-checkout list                           | List checked-out directories
git sparse-checkout disable                        | Disable sparse checkout (get all files)
git clone --filter=blob:none                       | Blobless partial clone
git clone --filter=tree:0                          | Treeless partial clone
git clone --filter=blob:limit=<size>               | Size-limited partial clone
git config rerere.enabled true                     | Enable reuse of recorded resolutions
git rerere status                                  | Show conflicts with recorded resolutions
git rerere diff                                    | Show resolution diffs
git rerere forget <file>                           | Forget a recorded resolution
git repack -a -d                                   | Repack all objects into one packfile
GIT_TRACE2_PERF=1 git <cmd>                        | Trace performance of a Git command

Hands-On Lab: Performance Optimization

Setup

mkdir perf-lab && cd perf-lab
git init
 
# Create a repository with meaningful history for testing
for i in $(seq 1 100); do
    echo "content for file $i" > "file-$i.txt"
done
git add . && git commit -m "initial: add 100 files"
 
for i in $(seq 1 50); do
    echo "change $i" >> "file-$((RANDOM % 100 + 1)).txt"
    git add . && git commit -m "update $i: modify a random file"
done

Part 1: Garbage Collection and Object Storage

Goal: Understand loose objects, packfiles, and the impact of git gc.

# 1. Check current object storage
git count-objects -v
# Note the 'count' (loose objects) and 'in-pack' values
 
# 2. Create loose objects by making many small commits
for i in $(seq 1 30); do
    echo "extra $i" >> extra.txt
    git add . && git commit -m "extra commit $i"
done
 
# 3. Check again — more loose objects
git count-objects -v
# 'count' should have increased
 
# 4. Run garbage collection
git gc
git count-objects -v
# 'count' should be 0 (all packed)
# 'packs' should be 1
 
# 5. Check the packfile
ls -lh .git/objects/pack/

Checkpoint: After git gc, loose object count is 0, and all objects are in a single packfile.

Part 2: Commit-Graph

Goal: Measure the performance impact of the commit-graph file.

# 1. Create a larger history for measurable impact
for i in $(seq 1 200); do
    echo "more content $i" >> "file-$((RANDOM % 100 + 1)).txt"
    git add . && git commit -m "batch commit $i"
done
 
# 2. Time log traversal WITHOUT commit-graph
rm -f .git/objects/info/commit-graph
time git log --oneline | wc -l
 
# 3. Generate the commit-graph
git commit-graph write --reachable --changed-paths
 
# 4. Time log traversal WITH commit-graph
time git log --oneline | wc -l
 
# 5. Verify the file exists
ls -lh .git/objects/info/commit-graph
 
# 6. Test path-limited log (Bloom filters help here)
time git log --oneline -- file-42.txt

Checkpoint: Log traversal should be noticeably faster with the commit-graph (the improvement scales with commit count — more visible on larger repos).

Part 3: FSMonitor

Goal: Enable fsmonitor and measure git status improvement.

# 1. Create many files for a measurable difference
mkdir -p src
for i in $(seq 1 5000); do
    echo "module $i" > "src/module-$i.js"
done
git add . && git commit -m "add 5000 source files"
 
# 2. Time git status WITHOUT fsmonitor
git config core.fsmonitor false
time git status
 
# 3. Enable fsmonitor
git config core.fsmonitor true
 
# 4. Run status once to "warm up" the monitor
git status
 
# 5. Time git status WITH fsmonitor
time git status
 
# 6. Also enable untracked cache
git config core.untrackedCache true
git update-index --untracked-cache
time git status

Checkpoint: With fsmonitor and untracked cache enabled, git status should be faster, especially on subsequent runs after the initial warm-up.

Part 4: git rerere — Automated Conflict Resolution

Goal: Record a conflict resolution and watch Git replay it automatically.

# 1. Enable rerere
git config rerere.enabled true
 
# 2. Create conflicting branches
git checkout main 2>/dev/null || git checkout -b main
echo "main version of the config" > config.txt
git add config.txt && git commit -m "main: add config"
 
git checkout -b feature/change-config
echo "feature version of the config" > config.txt
git add config.txt && git commit -m "feature: change config"
 
git checkout main
echo "main updated config" > config.txt
git add config.txt && git commit -m "main: update config"
 
# 3. Merge — this will conflict
git merge feature/change-config
# CONFLICT (content): Merge conflict in config.txt
# Recorded preimage for 'config.txt'    ← rerere recording!
 
# 4. Resolve the conflict
echo "resolved: combined config" > config.txt
git add config.txt
git commit -m "merge: resolve config conflict"
# Recorded resolution for 'config.txt'  ← resolution saved!
 
# 5. Now simulate encountering the same conflict again
# Reset back to before the merge
git reset --hard HEAD~1
 
# 6. Merge again — rerere auto-resolves!
git merge feature/change-config
# CONFLICT (content): Merge conflict in config.txt
# Resolved 'config.txt' using previous resolution.   ← automatic!
 
# 7. Verify the resolution was applied correctly
cat config.txt
# Should show "resolved: combined config"
 
git add config.txt
git commit -m "merge: resolve config conflict (rerere)"

Checkpoint: The second merge conflict was automatically resolved by rerere. cat config.txt shows the same resolution you applied manually the first time.

Part 5: Sparse Checkout Workflow

Goal: Set up a sparse checkout and verify only selected directories are materialized.

cd ~/perf-lab
 
# 1. Create a "monorepo" structure
mkdir -p packages/{frontend,backend,mobile,shared}
echo "import React from 'react';" > packages/frontend/App.tsx
echo "const express = require('express');" > packages/backend/server.js
echo "import SwiftUI" > packages/mobile/ContentView.swift
echo "export const utils = {};" > packages/shared/utils.ts
echo "# Monorepo" > README.md
git add . && git commit -m "monorepo structure"
 
# 2. Create a clone with sparse checkout
cd ..
git clone --sparse perf-lab perf-lab-sparse
cd perf-lab-sparse
 
# 3. Check what's available
ls
# Only root files (README.md, etc.)
ls packages/ 2>/dev/null || echo "packages/ not checked out"
 
# 4. Add only the directories you need
git sparse-checkout set --cone packages/frontend packages/shared
 
# 5. Verify
ls packages/
# frontend/  shared/  (no backend/ or mobile/)
 
ls packages/frontend/
# App.tsx
 
ls packages/backend/ 2>/dev/null || echo "backend/ not checked out (expected)"
 
# 6. Git log still has full history
git log --oneline

Checkpoint: Only packages/frontend/ and packages/shared/ are checked out. packages/backend/ and packages/mobile/ don't exist in the working tree. Full commit history is available.

Part 6: Diagnosing Performance

Goal: Use Git's tracing tools to identify bottlenecks.

cd ~/perf-lab
 
# 1. Basic timing
time git status
time git log --oneline | wc -l
 
# 2. Detailed performance trace
GIT_TRACE2_PERF=1 git status 2>/tmp/git-perf.log
cat /tmp/git-perf.log | grep "region_leave" | sort -t'|' -k4 -rn | head -10
# Shows which internal operations took the longest
 
# 3. Object storage analysis
git count-objects -v
 
# 4. Find the largest objects in the repository
git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
  | grep blob \
  | sort -rnk3 \
  | head -10
# Shows the 10 largest blobs — candidates for LFS
 
# 5. Check if commit-graph exists and is valid
git commit-graph verify 2>&1
ls -lh .git/objects/info/commit-graph
 
# 6. Full diagnostic summary
echo "=== Repo Stats ==="
echo "Files: $(find . -not -path './.git/*' -type f | wc -l)"
echo "Commits: $(git rev-list --count HEAD)"
echo "Branches: $(git branch -a | wc -l)"
git count-objects -vH

Checkpoint: You can identify the largest objects, the slowest operations, and the current optimization state of the repository.

Challenge: Optimize a Sluggish Repository

Create a deliberately unoptimized repository:

  1. Generate 1,000 commits across 500 files
  2. Include some large binary files (use dd if=/dev/urandom of=big.bin bs=1M count=5)
  3. Measure baseline performance (git status, git log, git blame)
  4. Apply every optimization from this module: git gc, commit-graph, fsmonitor, untracked cache, rerere
  5. Measure performance again and document the improvements

Common Pitfalls

Pitfall                                       | Why It Happens                                    | How to Avoid It
Running git gc --aggressive too often         | Thinking more GC = better performance             | Only use on import, migration, or monthly schedule
Not enabling commit-graph                     | Unaware of the feature                            | git config --global fetch.writeCommitGraph true
FSMonitor not working after enable            | Daemon not started or system doesn't support it   | Check git fsmonitor--daemon status; verify OS support
Sparse checkout conflicts with IDE            | IDE tries to open/index non-existent files        | Configure IDE to respect .git/info/sparse-checkout
Partial clone fetching too many blobs         | Running git log -p or git diff on large ranges    | Use --stat or path-limited queries to minimize blob fetches
Rerere applying an incorrect resolution       | Bad resolution was recorded                       | git rerere forget <file> to clear and re-resolve
Forgetting to write commit-graph after repack | Manual GC doesn't always auto-write commit-graph  | Use git maintenance instead of manual GC
Sparse checkout losing files on branch switch | Switching to a branch with different sparse paths | Update sparse-checkout patterns before switching if needed
Too many packfiles degrading performance      | Auto-GC disabled, manual repack never done        | Use git maintenance for automatic incremental repack
Shallow clone breaking git bisect             | History truncated before the bug was introduced   | Use git fetch --deepen=N or --unshallow before bisecting
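
The shallow-clone pitfall in the last row is easy to reproduce locally, since file:// URLs honor --depth without needing a server. A sketch with throwaway paths:

```shell
# Show history truncation in a depth-1 clone, then restore the full history.
set -e
cd "$(mktemp -d)"
git init -q src
cd src
for i in 1 2 3; do
    echo "$i" > f.txt
    git add f.txt
    git -c user.name=demo -c user.email=demo@example.com commit -qm "commit $i"
done
cd ..

git clone -q --depth 1 "file://$PWD/src" shallow
cd shallow
git rev-list --count HEAD     # 1 — only the tip; bisect has nothing to walk
git fetch -q --unshallow
git rev-list --count HEAD     # 3 — full history restored
```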

Pro Tips

  1. Start with git maintenance start on every repo you work on regularly. It handles commit-graph, repacking, prefetching, and loose object cleanup automatically. It's the single highest-impact optimization command.

  2. Enable core.fsmonitor where supported. On macOS and Windows, the built-in fsmonitor makes git status near-instant with essentially no downside. Set it once: git config --global core.fsmonitor true. (Linux lacks the built-in daemon; point core.fsmonitor at the Watchman hook instead.)

  3. Use --changed-paths when writing commit-graph. The Bloom filters for changed paths make git log -- <path> dramatically faster. This is especially valuable in large repos where path-limited log queries are common.

  4. Enable rerere before your next rebase. If you're maintaining a feature branch that's rebased onto main regularly, rerere will save you from resolving the same conflicts repeatedly. Its only real cost is the risk of replaying a badly recorded resolution, which git rerere forget <file> undoes.

  5. Profile before optimizing. Use GIT_TRACE2_PERF=1 and time to measure which operations are actually slow before applying optimizations. Different repos have different bottlenecks.

  6. For CI/CD, use --depth 1 --single-branch. CI jobs rarely need history or other branches. This minimizes clone time. If you need git blame or git bisect in CI, use --filter=blob:none instead.
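
The tips above condense into a short setup script. This sketch creates a throwaway demo repo so it is self-contained; in practice you would run the same commands inside your own repository:

```shell
set -e
repo=$(mktemp -d); cd "$repo"; git init -q -b main
git config user.name demo; git config user.email demo@example.com
echo hi > README.md; git add README.md; git commit -qm init

# Tip 5 first: profile before changing anything
GIT_TRACE2_PERF=1 git status >/dev/null 2>perf.log

# Tip 1: background maintenance (touches your global config and system
# scheduler, so shown commented; run it in your real repo):
#   git maintenance start

# Tip 2: built-in filesystem monitor (macOS/Windows)
git config core.fsmonitor true

# Tip 3: commit-graph with changed-path Bloom filters
git commit-graph write --reachable --changed-paths

# Tip 4: reuse recorded conflict resolutions
git config rerere.enabled true
```

Inspect perf.log afterwards to see where git status spent its time before any tuning.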


Quiz / Self-Assessment

1. What does git gc do, and when does Git run it automatically?

Show Answer

git gc (garbage collection) performs several housekeeping tasks:

  • Packs loose objects into packfiles with delta compression
  • Removes unreachable objects that are past the reflog protection window
  • Packs loose references into a packed-refs file
  • Prunes expired reflog entries

Git runs it automatically when the number of loose objects exceeds gc.auto (default: 6700) or the number of packfiles exceeds gc.autoPackLimit (default: 50). The auto-GC check runs after commands that create objects, such as git commit, git merge, git rebase, and git fetch.
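
You can watch the packing happen in a throwaway repo (the exact object counts vary):

```shell
set -e
repo=$(mktemp -d); cd "$repo"; git init -q -b main
git config user.name demo; git config user.email demo@example.com
for i in 1 2 3; do echo "$i" > f.txt; git add f.txt; git commit -qm "c$i"; done

git count-objects     # several loose objects
git gc --quiet        # pack, prune, pack-refs
git count-objects     # loose count drops to 0 once everything is packed
```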

2. How does git maintenance differ from git gc?

Show Answer

git gc is an all-or-nothing operation — it runs every cleanup task in a single pass, which can cause noticeable pauses on large repos.

git maintenance breaks the work into smaller, incremental tasks (commit-graph updates, loose object packing, incremental repacking, prefetching) that run on separate schedules (hourly, daily, weekly). It runs in the background via the system task scheduler, so it never blocks your workflow. It's the modern replacement for manual/auto git gc.

3. What is the commit-graph file and why does it speed up git log?

Show Answer

The commit-graph file (.git/objects/info/commit-graph) is a precomputed binary index of the commit DAG. It stores each commit's parents, root tree hash, timestamp, and generation numbers.

Without it, Git must read individual commit objects from packfiles and decompress them during graph traversal. With the commit-graph, Git reads from a compact, random-access binary file — making operations like git log, git merge-base, and git branch --contains up to 10-80x faster on large repos. Adding --changed-paths includes Bloom filters that further accelerate path-limited queries like git log -- <path>.
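
Writing and checking the file by hand looks like this (git maintenance normally keeps it fresh; a throwaway repo stands in for yours):

```shell
set -e
repo=$(mktemp -d); cd "$repo"; git init -q -b main
git config user.name demo; git config user.email demo@example.com
echo hi > f; git add f; git commit -qm init

git commit-graph write --reachable --changed-paths  # build graph + Bloom filters
git commit-graph verify                             # sanity-check the file
git log --oneline >/dev/null                        # traversal now uses the graph
```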

4. How does the filesystem monitor (fsmonitor) speed up git status?

Show Answer

Normally, git status must stat every file in the working directory to check for modifications — for a repo with 100,000 files, that's 100,000 system calls. The filesystem monitor hooks into the OS's file-change notification system (FSEvents on macOS, ReadDirectoryChangesW on Windows; on Linux, core.fsmonitor can point at a Watchman hook instead) to track which files actually changed since the last check. Git then only needs to stat those specific files, reducing git status from seconds to milliseconds.

Enable it with git config core.fsmonitor true (Git 2.37+).

5. What's the difference between sparse checkout and partial clone, and when would you use both together?

Show Answer

Partial clone (--filter=blob:none) affects what's downloaded — Git gets all commits and trees but skips file content (blobs) until they're actually needed.

Sparse checkout affects what's checked out to disk — Git only materializes specific directories in the working tree.

Use both together for monorepos:

git clone --filter=blob:none --sparse <url>
git sparse-checkout set --cone packages/my-service

This minimizes both network transfer (only download blobs for your directories) and disk usage (only materialize your directories). Full commit history remains available for git log, git blame, etc.

6. What does git rerere do, and how do you enable it?

Show Answer

git rerere (reuse recorded resolution) records how you resolve merge conflicts and automatically replays those resolutions when the same conflict appears again. Enable it with:

git config --global rerere.enabled true

When you resolve a conflict, rerere saves the "before" (preimage) and "after" (postimage) in .git/rr-cache/. If the same conflict pattern appears in a future merge or rebase, Git applies your previous resolution automatically. You can clear a bad resolution with git rerere forget <file>.

7. A repository's .git directory is 5 GB. How would you diagnose what's consuming the space?

Show Answer
  1. Check overall statistics: git count-objects -vH
  2. Find the largest blobs in history:
    git rev-list --objects --all \
      | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
      | grep blob | sort -rnk3 | head -20
  3. This reveals the largest files ever committed — often accidentally committed binaries, build artifacts, or data files.
  4. Solutions: migrate large files to Git LFS, or remove them from history with git filter-repo.

8. What is the sparse index, and how does it improve performance beyond regular sparse checkout?

Show Answer

Regular sparse checkout only affects the working tree — the .git/index still contains entries for every file in the repository. Operations that touch the index (git status, git add, git commit) must process all entries.

The sparse index (enabled with git config index.sparse true) collapses non-checked-out directories into a single tree entry in the index. This means index operations only process entries for your checked-out files, not the entire repository. It's especially impactful in monorepos with hundreds of thousands of files.

9. For a CI/CD pipeline that only needs to build the latest code, what's the optimal clone strategy?

Show Answer
git clone --depth 1 --single-branch --branch main <url>
  • --depth 1: Only download the latest commit (no history)
  • --single-branch: Only download the specified branch (no other branches/tags)
  • --branch main: Specify which branch to clone

This is the fastest and smallest possible clone. If you also need to run git blame or git bisect in CI, use --filter=blob:none instead of --depth 1 to retain full commit history.

For even faster CI runs, consider caching the .git directory between builds and using git fetch to update incrementally.
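
The cached-clone pattern might look like this in a CI script. The cache path and branch are stand-ins, and a local throwaway "origin" replaces your real remote so the sketch is runnable:

```shell
set -e
# Stand-in for the real remote:
origin=$(mktemp -d); git init -q -b main "$origin"
git -C "$origin" -c user.name=demo -c user.email=demo@example.com \
  commit -qm init --allow-empty
url="file://$origin"

cache=$(mktemp -d)/repo            # persisted between CI runs in real life
if [ -d "$cache/.git" ]; then      # warm cache: fetch just the new tip
  git -C "$cache" fetch -q --depth 1 origin main
  git -C "$cache" checkout -qf FETCH_HEAD
else                               # cold cache: minimal clone
  git clone -q --depth 1 --single-branch --branch main "$url" "$cache"
fi
```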

10. You enabled rerere and it auto-resolved a conflict, but the resolution is wrong. What do you do?

Show Answer
  1. Clear the bad recorded resolution: git rerere forget <file>
  2. Reset the file to its conflicted state: git checkout -m <file>
  3. Resolve the conflict correctly this time
  4. Stage and commit — rerere will record the new, correct resolution

You can also check git rerere diff before committing to preview what rerere applied, catching bad resolutions before they're committed.
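
The recovery steps can be exercised end-to-end in a throwaway repo. This sketch records a deliberately wrong resolution, lets rerere replay it, then discards it and restores the conflict markers:

```shell
set -e
repo=$(mktemp -d); cd "$repo"; git init -q -b main
git config user.name demo; git config user.email demo@example.com
git config rerere.enabled true

echo base > f;  git add f; git commit -qm base
git checkout -qb topic; echo topic > f; git commit -qam topic
git checkout -q main;   echo main  > f; git commit -qam main

git merge topic >/dev/null 2>&1 || true     # conflict; rerere records preimage
echo wrong > f; git add f; git commit -qm merge  # bad fix recorded as postimage

git reset -q --hard HEAD~1                  # recreate the same situation
git merge topic >/dev/null 2>&1 || true     # rerere replays the bad fix
git rerere forget f                         # step 1: drop the recording
git checkout -m f                           # step 2: restore the conflict
grep '<<<<<<<' f                            # markers are back; resolve properly
```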