Learning Objectives
By the end of this module, you will be able to:
- Explain how Git's object storage, packing, and garbage collection work under the hood
- Configure `git gc` and `git maintenance` for automated repository optimization
- Enable the commit-graph file for faster log traversal and ancestor queries
- Use the filesystem monitor (`fsmonitor`) to speed up `git status` in large repos
- Apply sparse checkout and partial clone strategies for monorepo development
- Enable `git rerere` to automatically reuse recorded conflict resolutions
- Choose appropriate scaling strategies for large teams and large repositories
- Diagnose and resolve common performance bottlenecks in Git
1. How Git Stores Data (And Why It Slows Down)
Understanding Git's storage model is essential before optimizing it.
Objects and Packfiles
Git stores every piece of content as an object — blobs (file contents), trees (directory listings), commits, and tags. Initially, each object is stored as a separate compressed file in .git/objects/:
.git/objects/
├── 4a/
│ └── 8e3f... ← a blob (file content)
├── 7c/
│ └── 2b91... ← a tree (directory listing)
├── a1/
│ └── b2c3... ← a commit
└── ... ← thousands of individual files (loose objects)
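These objects can be inspected directly with `git cat-file`, which prints an object's type or content. A quick sanity check in any repo with at least one commit:

```shell
# Ask Git what kind of object HEAD points at, then pretty-print it
git cat-file -t HEAD            # prints: commit
git cat-file -p HEAD            # shows tree hash, parents, author, message

# The commit's tree is itself an object: a directory listing
git cat-file -t 'HEAD^{tree}'   # prints: tree
git cat-file -p 'HEAD^{tree}'   # one "mode type hash  name" line per entry
```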
As these accumulate, Git periodically packs them into a single binary file called a packfile. Packfiles use delta compression — storing only the differences between similar objects:
.git/objects/pack/
├── pack-abc123.idx ← index: maps object hash → offset in packfile
└── pack-abc123.pack ← packfile: all objects, delta-compressed
A repository with 100,000 loose objects might compress to a packfile 10-50x smaller, because consecutive versions of the same file share most of their content.
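You can see the delta chains yourself with `git verify-pack`, which lists every object in a pack along with its size in the pack and its delta depth (assuming the repo has been packed at least once):

```shell
# Pack everything, then inspect the resulting packfile's contents
git repack -ad
git verify-pack -v .git/objects/pack/pack-*.idx | head -n 5
# Columns: sha1 type size size-in-pack offset-in-pack [depth base-sha1]
# Rows that include a depth are stored as deltas against the base object
```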
Where Performance Degrades
| Operation | Bottleneck | Affected By |
|---|---|---|
| `git status` | Scanning working directory | Number of files, filesystem speed |
| `git log` | Walking commit graph | Number of commits, graph complexity |
| `git blame` | Traversing history per line | File history length |
| `git add` | Hashing file contents | File sizes |
| `git clone` | Downloading objects | Repository size (history + files) |
| `git merge` / `git rebase` | Conflict detection | Number of changed files |
| `git diff` | Computing content differences | File sizes, number of files |
| `git push` / `git fetch` | Transferring objects | Pack size, network speed |
2. Garbage Collection (git gc)
What git gc Does
Garbage collection is Git's housekeeping process. It performs several optimizations:
- Packs loose objects into packfiles with delta compression
- Removes unreachable objects (orphaned commits, abandoned blobs) that are past the reflog's protection window
- Packs references — consolidates individual ref files into a single `packed-refs` file
- Prunes reflogs — removes expired reflog entries
- Re-indexes packfiles for faster lookups
# Run garbage collection manually
git gc
# Aggressive GC — slower but more thorough compression
git gc --aggressive
Automatic GC
Git triggers garbage collection automatically when certain thresholds are exceeded:
# Check current thresholds
git config gc.auto # default: 6700 (loose objects before auto-gc)
git config gc.autoPackLimit # default: 50 (packfiles before auto-repack)
# Disable auto-gc (if using git maintenance instead)
git config gc.auto 0
# Customize thresholds
git config --global gc.auto 10000
When auto-gc runs, you might see:
Auto packing the repository in background for optimum performance.
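You can invoke this threshold check directly: `git gc --auto` exits immediately when no limit is exceeded, which makes it a cheap way to ask whether the repo currently needs packing:

```shell
# Show the loose object count, then let Git decide whether GC is warranted
git count-objects -v | grep '^count'
# No-op unless the gc.auto or gc.autoPackLimit thresholds are exceeded
git gc --auto
```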
GC Timing and Safety
# Control how long unreachable objects survive
git config gc.reflogExpire # default: 90 days
git config gc.reflogExpireUnreachable # default: 30 days
git config gc.pruneExpire # default: 2 weeks
# For safety on shared servers, extend these:
git config gc.pruneExpire "1 month"
When to Run git gc --aggressive
--aggressive uses much more CPU and memory for better compression. Use it:
- After importing a repository (e.g., migrating from SVN)
- After deleting a large number of branches
- After
git filter-repoor other history-rewriting operations - On a schedule (monthly) for very active repositories
# Aggressive GC, pruning unreachable objects immediately
git gc --aggressive --prune=now
# Check the size before and after
git count-objects -v
# count: 0 ← loose objects
# size: 0 ← loose object size (KB)
# in-pack: 52341 ← packed objects
# packs: 1 ← number of packfiles
# size-pack: 18230 ← packfile size (KB)
3. git maintenance: Modern Automated Optimization
git maintenance (introduced in Git 2.29) replaces manual git gc with a scheduled, incremental maintenance system. It's smarter — it runs smaller tasks frequently instead of one big GC pass.
Enabling Maintenance
# Register the current repo for scheduled maintenance
git maintenance register
# Start the background scheduler
git maintenance start
This configures your system's task scheduler (launchd on macOS, systemd/cron on Linux, Task Scheduler on Windows) to run maintenance periodically.
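You can confirm what registration did: `git maintenance register` records the repo path in your global config under the multi-valued `maintenance.repo` key. The scheduler entries live wherever your platform keeps them — the crontab check below assumes a Linux cron backend:

```shell
# Which repos are registered for background maintenance?
git config --global --get-all maintenance.repo
# With the cron backend, Git writes its entries into your crontab
crontab -l 2>/dev/null | grep -i git || true
```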
Maintenance Tasks
| Task | What It Does | Default Schedule |
|---|---|---|
| `commit-graph` | Updates the commit-graph file | Hourly |
| `prefetch` | Fetches latest objects from remotes in background | Hourly |
| `loose-objects` | Packs loose objects into packfiles | Daily |
| `incremental-repack` | Gradually repacks packfiles for better compression | Daily |
| `pack-refs` | Consolidates loose refs into packed-refs file | Never (on demand) |
| `gc` | Full garbage collection | Never (replaced by above tasks) |
# Run a specific task manually
git maintenance run --task=commit-graph
git maintenance run --task=loose-objects
git maintenance run --task=incremental-repack
# Run all scheduled tasks at once
git maintenance run
# See what's registered
git config --global --get-regexp maintenance
Maintenance Configuration
# Customize the schedule for specific tasks
git config maintenance.commit-graph.schedule hourly
git config maintenance.loose-objects.schedule daily
git config maintenance.incremental-repack.schedule weekly
# Unregister a repo from maintenance
git maintenance unregister
# Stop the background scheduler entirely
git maintenance stop
git maintenance vs. git gc
git gc: "Stop everything and do a full cleanup"
git maintenance: "Do small cleanups continuously in the background"
For active development, git maintenance is strictly better — it avoids the periodic pauses caused by auto-gc on large repos.
4. The Commit-Graph File
The Problem
Every git log, git merge-base, git branch --contains, and reachability query must walk the commit graph by reading individual commit objects from packfiles. For repositories with millions of commits, this is slow.
The Solution
The commit-graph file (.git/objects/info/commit-graph) is a precomputed, binary-format index of the commit DAG. It stores:
- Each commit's parents
- Each commit's root tree hash
- The commit timestamp (for generation number computation)
- Generation numbers — a topological metric that lets Git skip entire branches of the graph during traversal
# Generate the commit-graph file
git commit-graph write --reachable
# Generate with changed-paths Bloom filters (speeds up path-limited log)
git commit-graph write --reachable --changed-paths
# Verify the commit-graph is consistent
git commit-graph verify
Performance Impact
Without commit-graph:
git log --oneline | wc -l → 3.2 seconds (100k commits)
git merge-base A B → 0.8 seconds
With commit-graph:
git log --oneline | wc -l → 0.4 seconds (8x faster)
git merge-base A B → 0.01 seconds (80x faster)
Enabling Commit-Graph Globally
# Write commit-graph on every gc/fetch/repack
git config --global fetch.writeCommitGraph true
git config --global gc.writeCommitGraph true
# Or just use git maintenance, which handles this automatically
git maintenance start
Changed-Path Bloom Filters
When you run git log -- path/to/file, Git normally must inspect every commit to check if it touched that path. Bloom filters precompute a probabilistic answer ("this commit definitely didn't touch this path"), dramatically speeding up path-limited log queries:
# Write commit-graph with Bloom filters
git commit-graph write --reachable --changed-paths
# Now this is much faster:
git log --oneline -- src/components/Header.tsx
5. Filesystem Monitor (fsmonitor)
The Problem
git status compares every file in the working directory against the index. For repositories with 100,000+ files, this stat-call-per-file approach takes several seconds, even when nothing has changed.
The Solution
The filesystem monitor (fsmonitor) integrates with your OS's file-change notification system (FSEvents on macOS, inotify on Linux, ReadDirectoryChangesW on Windows) to tell Git which files actually changed since the last check. Git then only stats those files.
Built-in FSMonitor Daemon (Git 2.37+)
# Enable the built-in fsmonitor daemon
git config core.fsmonitor true
# Verify it's running
git fsmonitor--daemon status
# Check the improvement
time git status # With fsmonitor: ~0.1s vs ~2s without
Using Watchman (Alternative)
Facebook's Watchman is a mature filesystem watcher that predates Git's built-in fsmonitor:
# Install Watchman
brew install watchman # macOS
# See https://facebook.github.io/watchman/docs/install for other platforms
# Configure Git to use Watchman via the sample hook Git ships in new repos
cp .git/hooks/fsmonitor-watchman.sample .git/hooks/query-watchman
git config core.fsmonitor .git/hooks/query-watchman
Performance Impact
| Repository Size | git status Without FSMonitor | With FSMonitor |
|---|---|---|
| 10,000 files | 0.3s | 0.05s |
| 100,000 files | 2.5s | 0.08s |
| 500,000 files | 12s | 0.1s |
When to Enable FSMonitor
- Working directories with > 10,000 files
- Monorepos
- Projects where `git status` noticeably pauses
- Any developer who runs `git status` frequently (shell prompts, IDE integrations)
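To adopt fsmonitor everywhere, enable it globally; you can also query or start the daemon per repo (it otherwise starts lazily on the first command that needs it):

```shell
# Enable the built-in filesystem monitor for every repository
git config --global core.fsmonitor true
# Inspect the daemon for the current repo; start it explicitly if you prefer
git fsmonitor--daemon status || git fsmonitor--daemon start
```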
Untracked Cache
A complementary optimization that caches the state of untracked file directories:
# Enable untracked cache
git config core.untrackedCache true
# Update the untracked cache
git update-index --untracked-cache
# Check if it's active
git config core.untrackedCache
6. Sparse Checkout for Monorepos
Module 19 introduced sparse checkout briefly. Here we explore it in depth for monorepo workflows.
The Monorepo Problem
monorepo/
├── packages/
│ ├── frontend/ ← 50,000 files, you work here
│ ├── backend/ ← 30,000 files, you never touch
│ ├── mobile/ ← 40,000 files, you never touch
│ └── shared/ ← 5,000 files, you need this
├── tools/ ← 10,000 files
└── docs/ ← 2,000 files
With 137,000 files, git status is slow, your editor indexes everything, and disk usage is high — even though you only work on 2 of the 6 directories.
Cone Mode Sparse Checkout
Cone mode (recommended) operates on entire directories, which is much faster than pattern-based matching:
# Clone with sparse checkout
git clone --filter=blob:none --sparse https://github.com/org/monorepo.git
cd monorepo
# Only root files are checked out initially
ls
# README.md package.json ...
# Add the directories you work on
git sparse-checkout set --cone packages/frontend packages/shared
# Verify
git sparse-checkout list
# packages/frontend
# packages/shared
# Your working tree now only has:
# monorepo/
# ├── packages/
# │ ├── frontend/ ← fully checked out
# │ └── shared/ ← fully checked out
# ├── README.md
# └── package.json
# Add more directories later
git sparse-checkout add tools/linting
# Temporarily get everything (e.g., for a full build)
git sparse-checkout disable
# Re-enable
git sparse-checkout set --cone packages/frontend packages/shared
Sparse Checkout with Partial Clone
The optimal monorepo setup combines both:
# --filter=blob:none: don't download file contents until needed
# --sparse: only check out root files initially
git clone --filter=blob:none --sparse https://github.com/org/monorepo.git
cd monorepo
# Set your working directories
git sparse-checkout set --cone packages/frontend packages/shared
# Result:
# - Full commit history is available (git log, git blame work)
# - Only frontend + shared file contents are downloaded
# - Other packages' blobs are fetched on demand if needed
Sparse Index (Git 2.32+)
Even with sparse checkout, the index (.git/index) normally contains entries for every file in the repository. The sparse index collapses non-checked-out directories into a single tree entry:
# Enable sparse index
git sparse-checkout init --cone --sparse-index
# Or on an existing sparse checkout
git config index.sparse true
# Verify
GIT_TRACE2_PERF=1 git status 2>&1 | grep "sparse"
With sparse index, operations on the index (git status, git add, git commit) only process the checked-out files, not the entire repo.
7. Partial Clone (--filter)
Filter Types
# Blobless clone: skip all file contents (download on demand)
git clone --filter=blob:none <url>
# Treeless clone: skip tree objects too (even smaller initial clone)
git clone --filter=tree:0 <url>
# Size-filtered: skip blobs larger than a threshold
git clone --filter=blob:limit=1m <url>
# Combined filters (Git 2.27+)
git clone --filter=combine:blob:none+tree:0 <url>
How Partial Clone Works
Full clone:
┌───────────────────────────────┐
│ All commits ✓ │
│ All trees ✓ │
│ All blobs ✓ │
│ Total: ~500 MB │
└───────────────────────────────┘
Blobless clone (--filter=blob:none):
┌───────────────────────────────┐
│ All commits ✓ │
│ All trees ✓ │
│ Blobs: only checked-out │ ← fetched lazily on checkout
│ Total: ~50 MB + on-demand │
└───────────────────────────────┘
Treeless clone (--filter=tree:0):
┌───────────────────────────────┐
│ All commits ✓ │
│ Trees: only current HEAD │ ← fetched lazily on checkout
│ Blobs: only checked-out │ ← fetched lazily on checkout
│ Total: ~20 MB + on-demand │
└───────────────────────────────┘
Promisor Remotes
When Git needs an object that wasn't downloaded, it fetches it from the promisor remote (the server that promised to supply objects on demand):
# See which remote is the promisor
git config remote.origin.promisor # true
git config remote.origin.partialCloneFilter # blob:none
When to Use Each Filter
| Filter | Best For | Trade-off |
|---|---|---|
| `blob:none` | Daily development on large repos | Lazy blob fetches on checkout, blame, diff |
| `tree:0` | CI/CD builds that only need HEAD | Lazy tree + blob fetches; `git log -- path` slower |
| `blob:limit=1m` | Repos with a few large binaries | Only large files deferred |
| Full clone | Small repos, offline work | None, but uses the most disk and bandwidth |
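One way to see lazy fetching at work: `git rev-list --missing=print` (available in modern Git) prefixes objects that are promised but not yet downloaded with `?`. In a full clone the count is zero; in a fresh blobless clone it roughly equals the number of historical blobs:

```shell
# How many objects does the promisor remote still owe this clone?
git rev-list --objects --all --missing=print | grep -c '^?' || true
```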
8. git rerere: Reuse Recorded Resolution
The Problem
When you're rebasing a long-lived branch or frequently merging the same branches, you encounter the same conflicts repeatedly. Resolving the identical conflict for the 5th time is tedious and error-prone.
How git rerere Works
rerere stands for "reuse recorded resolution." When enabled, Git:
- Records the conflicted state and your resolution when you resolve a merge conflict
- Recognizes the same conflict in future merges/rebases
- Automatically applies the recorded resolution
# Enable rerere
git config --global rerere.enabled true
How It Looks in Practice
# First encounter of a conflict
git merge feature
# CONFLICT (content): Merge conflict in app.js
# Recorded preimage for 'app.js' ← rerere notes the conflict
# You resolve it manually
vim app.js # fix the conflict markers
git add app.js
git commit
# Recorded resolution for 'app.js' ← rerere saves your resolution
# Later, the same conflict appears (e.g., during a rebase)
git rebase main
# CONFLICT (content): Merge conflict in app.js
# Resolved 'app.js' using previous resolution. ← automatic!
# Verify the auto-resolution looks correct
git diff app.js
git add app.js
git rebase --continueManaging Recorded Resolutions
# See which conflicts have recorded resolutions
git rerere status
# See the diff of a recorded resolution
git rerere diff
# Forget a specific resolution (if it was wrong)
git rerere forget app.js
# Prune old, unused recorded resolutions
git rerere gc
Where Rerere Stores Resolutions
Resolutions are stored in .git/rr-cache/:
.git/rr-cache/
├── abc123def456.../
│ ├── preimage ← the conflicted state
│ └── postimage ← your resolution
└── ...
These are local only — they don't transfer with push/pull. Each developer builds their own rerere cache.
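Because the cache is nothing more than files under `.git/rr-cache/`, a common workaround for the "local only" limitation is to copy the directory between clones. A minimal sketch — the `repo-a`/`repo-b` paths are placeholders, and this is a convenience hack rather than a supported Git interface:

```shell
# Seed one clone's rerere cache from another's recorded resolutions
mkdir -p repo-b/.git/rr-cache
cp -R repo-a/.git/rr-cache/. repo-b/.git/rr-cache/
```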
When Rerere Shines
- Long-lived feature branches that get rebased onto `main` repeatedly
- Release branches where bug fixes are cherry-picked from `main`
- Integration testing where you repeatedly merge and reset experimental branches
- Git bisect where you skip the same conflicts at each test point
9. Scaling Strategies for Large Teams
The Repository Size Problem
| Dimension | Small | Medium | Large | Massive |
|---|---|---|---|---|
| Files | < 10K | 10K–100K | 100K–1M | > 1M |
| Commits | < 10K | 10K–100K | 100K–1M | > 1M |
| Contributors | < 10 | 10–50 | 50–200 | > 200 |
| Repo size (.git) | < 100 MB | 100 MB–1 GB | 1–10 GB | > 10 GB |
Strategy Matrix
Problem: Too many files
├── Sparse checkout (check out only what you need)
├── FSMonitor (speed up git status)
└── Sparse index (speed up index operations)
Problem: Too much history
├── Partial clone (download objects on demand)
├── Shallow clone (CI/CD only needs HEAD)
└── Commit-graph (speed up log and ancestor queries)
Problem: Large binary files
├── Git LFS (store binaries on separate server)
└── blob:limit filter (defer download of large blobs)
Problem: Too many contributors
├── Branch protection rules (prevent chaos on shared branches)
├── CODEOWNERS file (route reviews to right people)
├── Merge queues (serialize merges, prevent conflicts)
└── git rerere (automate repeated conflict resolution)
Problem: Slow CI/CD
├── Shallow clone --depth 1 (minimize clone time)
├── Caching .git directory (avoid re-cloning)
└── Changed-file detection (only test what changed)
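The changed-file detection step is usually a one-liner on top of `git diff`. A sketch, assuming `origin/main` is the integration branch (the three-dot form diffs against the merge base, so you only see this branch's changes):

```shell
# Files this branch touched relative to where it diverged from main
changed=$(git diff --name-only origin/main...HEAD)
# Gate expensive test suites on the directories that actually changed
if echo "$changed" | grep -q '^packages/frontend/'; then
  echo "frontend changed: run frontend tests"
fi
```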
Monorepo Performance Stack
For teams with large monorepos, the recommended configuration combines multiple optimizations:
# 1. Partial clone + sparse checkout
git clone --filter=blob:none --sparse <url>
git sparse-checkout set --cone <your-directories>
# 2. Enable sparse index
git config index.sparse true
# 3. Enable fsmonitor
git config core.fsmonitor true
# 4. Enable untracked cache
git config core.untrackedCache true
# 5. Write commit-graph with Bloom filters
git commit-graph write --reachable --changed-paths
# 6. Enable rerere for repeated merges
git config rerere.enabled true
# 7. Start background maintenance
git maintenance start
CODEOWNERS for Routing Reviews
Large teams use a CODEOWNERS file to automatically assign reviewers based on which files are changed:
# .github/CODEOWNERS (GitHub) or CODEOWNERS (GitLab)
# Syntax: <pattern> <owners>
# Default owners for everything
* @org/core-team
# Frontend team owns frontend code
/packages/frontend/ @org/frontend-team
*.tsx @org/frontend-team
# Backend team owns backend code
/packages/backend/ @org/backend-team
# DevOps owns CI/CD and infrastructure
/.github/ @org/devops
/terraform/ @org/devops
Dockerfile @org/devops
# Specific individuals for critical files
/packages/auth/ @security-lead @auth-team-lead
Merge Queues
For high-velocity teams where multiple PRs merge simultaneously, merge queues serialize merges to prevent integration failures:
Without merge queue:
PR-1 passes CI ✓ → merge
PR-2 passes CI ✓ → merge ← But PR-1 changed something PR-2 depends on!
→ main broken
With merge queue:
PR-1 passes CI ✓ → enters queue → CI runs with PR-1 → merge ✓
PR-2 passes CI ✓ → enters queue → CI runs with PR-1 + PR-2 → merge ✓
GitHub, GitLab, and Bors all provide merge queue functionality.
10. Diagnosing Performance Issues
Measuring Git Operations
# Time any Git command
time git status
time git log --oneline | wc -l
# Enable trace output for detailed timing
GIT_TRACE=1 git status
GIT_TRACE_PERFORMANCE=1 git status
GIT_TRACE2_PERF=1 git status 2>&1 | head -30
# Check repository statistics
git count-objects -v
# count: 234 ← loose objects
# size: 1024 ← loose object size (KB)
# in-pack: 524130 ← packed objects
# packs: 3 ← number of packfiles
# size-pack: 182300 ← total packfile size (KB)
# prune-packable: 0
# garbage: 0
# size-garbage: 0
Common Bottlenecks and Fixes
# Symptom: git status is slow
# Diagnose:
GIT_TRACE2_PERF=1 git status 2>&1 | grep "data\|region_leave"
# Fix:
git config core.fsmonitor true
git config core.untrackedCache true
# Symptom: git log is slow
# Diagnose:
time git log --oneline | wc -l
# Fix:
git commit-graph write --reachable --changed-paths
# Symptom: git clone is slow
# Diagnose:
du -sh .git/
# Fix:
# Use --filter=blob:none for development
# Use --depth 1 for CI/CD
# Symptom: Too many packfiles
# Diagnose:
git count-objects -v | grep packs
# Fix:
git repack -a -d --depth=250 --window=250
# Symptom: Large .git directory
# Diagnose:
git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | sort -rnk3 | head -20
# Fix:
# Migrate large files to LFS, filter-repo to remove from history
Command Reference
| Command | Description |
|---|---|
| `git gc` | Run garbage collection (pack objects, prune, pack refs) |
| `git gc --aggressive` | Thorough GC with maximum compression |
| `git count-objects -v` | Show object storage statistics |
| `git maintenance start` | Enable scheduled background maintenance |
| `git maintenance run` | Run all maintenance tasks now |
| `git maintenance register` | Register repo for scheduled maintenance |
| `git maintenance stop` | Disable background maintenance |
| `git commit-graph write --reachable` | Generate the commit-graph file |
| `git commit-graph write --reachable --changed-paths` | Commit-graph with Bloom filters |
| `git commit-graph verify` | Verify commit-graph integrity |
| `git config core.fsmonitor true` | Enable built-in filesystem monitor |
| `git config core.untrackedCache true` | Enable untracked file cache |
| `git config index.sparse true` | Enable sparse index |
| `git sparse-checkout set --cone <dirs>` | Set sparse checkout directories |
| `git sparse-checkout add <dir>` | Add directory to sparse checkout |
| `git sparse-checkout list` | List checked-out directories |
| `git sparse-checkout disable` | Disable sparse checkout (get all files) |
| `git clone --filter=blob:none` | Blobless partial clone |
| `git clone --filter=tree:0` | Treeless partial clone |
| `git clone --filter=blob:limit=<size>` | Size-limited partial clone |
| `git config rerere.enabled true` | Enable reuse of recorded resolutions |
| `git rerere status` | Show conflicts with recorded resolutions |
| `git rerere diff` | Show resolution diffs |
| `git rerere forget <file>` | Forget a recorded resolution |
| `git repack -a -d` | Repack all objects into one packfile |
| `GIT_TRACE2_PERF=1 git <cmd>` | Trace performance of a Git command |
Hands-On Lab: Performance Optimization
Setup
mkdir ~/perf-lab && cd ~/perf-lab
git init
# Create a repository with meaningful history for testing
for i in $(seq 1 100); do
echo "content for file $i" > "file-$i.txt"
done
git add . && git commit -m "initial: add 100 files"
for i in $(seq 1 50); do
echo "change $i" >> "file-$((RANDOM % 100 + 1)).txt"
git add . && git commit -m "update $i: modify a random file"
done
Part 1: Garbage Collection and Object Storage
Goal: Understand loose objects, packfiles, and the impact of git gc.
# 1. Check current object storage
git count-objects -v
# Note the 'count' (loose objects) and 'in-pack' values
# 2. Create loose objects by making many small commits
for i in $(seq 1 30); do
echo "extra $i" >> extra.txt
git add . && git commit -m "extra commit $i"
done
# 3. Check again — more loose objects
git count-objects -v
# 'count' should have increased
# 4. Run garbage collection
git gc
git count-objects -v
# 'count' should be 0 (all packed)
# 'packs' should be 1
# 5. Check the packfile
ls -lh .git/objects/pack/
Checkpoint: After git gc, loose object count is 0, and all objects are in a single packfile.
Part 2: Commit-Graph
Goal: Measure the performance impact of the commit-graph file.
# 1. Create a larger history for measurable impact
for i in $(seq 1 200); do
echo "more content $i" >> "file-$((RANDOM % 100 + 1)).txt"
git add . && git commit -m "batch commit $i"
done
# 2. Time log traversal WITHOUT commit-graph
rm -f .git/objects/info/commit-graph
time git log --oneline | wc -l
# 3. Generate the commit-graph
git commit-graph write --reachable --changed-paths
# 4. Time log traversal WITH commit-graph
time git log --oneline | wc -l
# 5. Verify the file exists
ls -lh .git/objects/info/commit-graph
# 6. Test path-limited log (Bloom filters help here)
time git log --oneline -- file-42.txt
Checkpoint: Log traversal should be noticeably faster with the commit-graph (the improvement scales with commit count — more visible on larger repos).
Part 3: FSMonitor
Goal: Enable fsmonitor and measure git status improvement.
# 1. Create many files for a measurable difference
mkdir -p src
for i in $(seq 1 5000); do
echo "module $i" > "src/module-$i.js"
done
git add . && git commit -m "add 5000 source files"
# 2. Time git status WITHOUT fsmonitor
git config core.fsmonitor false
time git status
# 3. Enable fsmonitor
git config core.fsmonitor true
# 4. Run status once to "warm up" the monitor
git status
# 5. Time git status WITH fsmonitor
time git status
# 6. Also enable untracked cache
git config core.untrackedCache true
git update-index --untracked-cache
time git status
Checkpoint: With fsmonitor and untracked cache enabled, git status should be faster, especially on subsequent runs after the initial warm-up.
Part 4: git rerere — Automated Conflict Resolution
Goal: Record a conflict resolution and watch Git replay it automatically.
# 1. Enable rerere
git config rerere.enabled true
# 2. Create conflicting branches
git checkout main 2>/dev/null || git checkout -b main
echo "main version of the config" > config.txt
git add config.txt && git commit -m "main: add config"
git checkout -b feature/change-config
echo "feature version of the config" > config.txt
git add config.txt && git commit -m "feature: change config"
git checkout main
echo "main updated config" > config.txt
git add config.txt && git commit -m "main: update config"
# 3. Merge — this will conflict
git merge feature/change-config
# CONFLICT (content): Merge conflict in config.txt
# Recorded preimage for 'config.txt' ← rerere recording!
# 4. Resolve the conflict
echo "resolved: combined config" > config.txt
git add config.txt
git commit -m "merge: resolve config conflict"
# Recorded resolution for 'config.txt' ← resolution saved!
# 5. Now simulate encountering the same conflict again
# Reset back to before the merge
git reset --hard HEAD~1
# 6. Merge again — rerere auto-resolves!
git merge feature/change-config
# CONFLICT (content): Merge conflict in config.txt
# Resolved 'config.txt' using previous resolution. ← automatic!
# 7. Verify the resolution was applied correctly
cat config.txt
# Should show "resolved: combined config"
git add config.txt
git commit -m "merge: resolve config conflict (rerere)"
Checkpoint: The second merge conflict was automatically resolved by rerere. cat config.txt shows the same resolution you applied manually the first time.
Part 5: Sparse Checkout Workflow
Goal: Set up a sparse checkout and verify only selected directories are materialized.
cd ~/perf-lab
# 1. Create a "monorepo" structure
mkdir -p packages/{frontend,backend,mobile,shared}
echo "import React from 'react';" > packages/frontend/App.tsx
echo "const express = require('express');" > packages/backend/server.js
echo "import SwiftUI" > packages/mobile/ContentView.swift
echo "export const utils = {};" > packages/shared/utils.ts
echo "# Monorepo" > README.md
git add . && git commit -m "monorepo structure"
# 2. Create a clone with sparse checkout
cd ..
git clone --sparse perf-lab perf-lab-sparse
cd perf-lab-sparse
# 3. Check what's available
ls
# Only root files (README.md, etc.)
ls packages/ 2>/dev/null || echo "packages/ not checked out"
# 4. Add only the directories you need
git sparse-checkout set --cone packages/frontend packages/shared
# 5. Verify
ls packages/
# frontend/ shared/ (no backend/ or mobile/)
ls packages/frontend/
# App.tsx
ls packages/backend/ 2>/dev/null || echo "backend/ not checked out (expected)"
# 6. Git log still has full history
git log --oneline
Checkpoint: Only packages/frontend/ and packages/shared/ are checked out. packages/backend/ and packages/mobile/ don't exist in the working tree. Full commit history is available.
Part 6: Diagnosing Performance
Goal: Use Git's tracing tools to identify bottlenecks.
cd ~/perf-lab
# 1. Basic timing
time git status
time git log --oneline | wc -l
# 2. Detailed performance trace
GIT_TRACE2_PERF=1 git status 2>/tmp/git-perf.log
cat /tmp/git-perf.log | grep "region_leave" | sort -t'|' -k4 -rn | head -10
# Shows which internal operations took the longest
# 3. Object storage analysis
git count-objects -v
# 4. Find the largest objects in the repository
git rev-list --objects --all \
| git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
| grep blob \
| sort -rnk3 \
| head -10
# Shows the 10 largest blobs — candidates for LFS
# 5. Check if commit-graph exists and is valid
git commit-graph verify 2>&1
ls -lh .git/objects/info/commit-graph
# 6. Full diagnostic summary
echo "=== Repo Stats ==="
echo "Files: $(find . -not -path './.git/*' -type f | wc -l)"
echo "Commits: $(git rev-list --count HEAD)"
echo "Branches: $(git branch -a | wc -l)"
git count-objects -v
Checkpoint: You can identify the largest objects, the slowest operations, and the current optimization state of the repository.
Challenge: Optimize a Sluggish Repository
Create a deliberately unoptimized repository:
- Generate 1,000 commits across 500 files
- Include some large binary files (use `dd if=/dev/urandom of=big.bin bs=1M count=5`)
- Measure baseline performance (`git status`, `git log`, `git blame`)
- Apply every optimization from this module: `git gc`, commit-graph, fsmonitor, untracked cache, rerere
- Measure performance again and document the improvements
Common Pitfalls
| Pitfall | Why It Happens | How to Avoid It |
|---|---|---|
| Running `git gc --aggressive` too often | Thinking more GC = better performance | Only use on import, migration, or monthly schedule |
| Not enabling commit-graph | Unaware of the feature | git config --global fetch.writeCommitGraph true |
| FSMonitor not working after enable | Daemon not started or system doesn't support it | Check git fsmonitor--daemon status; verify OS support |
| Sparse checkout conflicts with IDE | IDE tries to open/index non-existent files | Configure IDE to respect .git/info/sparse-checkout |
| Partial clone fetching too many blobs | Running git log -p or git diff on large ranges | Use --stat or path-limited queries to minimize blob fetches |
| Rerere applying an incorrect resolution | Bad resolution was recorded | git rerere forget <file> to clear and re-resolve |
| Forgetting to write commit-graph after repack | Manual GC doesn't always auto-write commit-graph | Use git maintenance instead of manual GC |
| Sparse checkout losing files on branch switch | Switching to a branch with different sparse paths | Update sparse-checkout patterns before switching if needed |
| Too many packfiles degrading performance | Auto-GC disabled, manual repack never done | Use git maintenance for automatic incremental repack |
| Shallow clone breaking `git bisect` | History truncated before the bug was introduced | Use `git fetch --deepen=N` or `--unshallow` before bisecting |
Pro Tips
- **Start with `git maintenance start` on every repo you work on regularly.** It handles commit-graph, repacking, prefetching, and loose object cleanup automatically. It's the single highest-impact optimization command.
- **Enable `core.fsmonitor` globally.** For most developers, the built-in fsmonitor has no downside and makes `git status` near-instant. Set it once: `git config --global core.fsmonitor true`.
- **Use `--changed-paths` when writing the commit-graph.** The Bloom filters for changed paths make `git log -- <path>` dramatically faster. This is especially valuable in large repos where path-limited log queries are common.
- **Enable `rerere` before your next rebase.** If you're maintaining a feature branch that's rebased onto `main` regularly, `rerere` will save you from resolving the same conflicts repeatedly. It's a pure quality-of-life improvement with no downside.
- **Profile before optimizing.** Use `GIT_TRACE2_PERF=1` and `time` to measure which operations are actually slow before applying optimizations. Different repos have different bottlenecks.
- **For CI/CD, use `--depth 1 --single-branch`.** CI jobs rarely need history or other branches, so this minimizes clone time. If you need `git blame` or `git bisect` in CI, use `--filter=blob:none` instead.
Quiz / Self-Assessment
1. What does git gc do, and when does Git run it automatically?
`git gc` (garbage collection) performs several housekeeping tasks:

- Packs loose objects into packfiles with delta compression
- Removes unreachable objects that are past the reflog protection window
- Packs loose references into a `packed-refs` file
- Prunes expired reflog entries

Git runs it automatically when the number of loose objects exceeds `gc.auto` (default: 6700) or the number of packfiles exceeds `gc.autoPackLimit` (default: 50). The check runs after commands that can create many loose objects, such as `git fetch`, `git merge`, `git am`, and `git rebase` (and on the server side after a `git push`).
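A quick way to see where a repository stands relative to those thresholds:

```shell
# How many loose objects and packfiles exist right now?
git count-objects -v
# Current thresholds; no output means the built-in defaults (6700 / 50) apply
git config gc.auto || true
git config gc.autoPackLimit || true
# Perform the same check auto-gc does, packing only if a threshold is exceeded
git gc --auto
```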
2. How does git maintenance differ from git gc?
git gc is an all-or-nothing operation — it runs every cleanup task in a single pass, which can cause noticeable pauses on large repos.
git maintenance breaks the work into smaller, incremental tasks (commit-graph updates, loose object packing, incremental repacking, prefetching) that run on separate schedules (hourly, daily, weekly). It runs in the background via the system task scheduler, so it never blocks your workflow. It's the modern replacement for manual/auto git gc.
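A sketch of both modes; `start` touches your global config and the OS scheduler, so the one-off `run` form is shown as well:

```shell
# Install background maintenance for this repo (hourly/daily/weekly schedules)
git maintenance start
# Or run individual tasks once, in the foreground
git maintenance run --task=commit-graph --task=incremental-repack
# Undo the registration if you change your mind
git maintenance unregister
```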
3. What is the commit-graph file and why does it speed up git log?
The commit-graph file (.git/objects/info/commit-graph) is a precomputed binary index of the commit DAG. It stores each commit's parents, root tree hash, timestamp, and generation numbers.
Without it, Git must read individual commit objects from packfiles and decompress them during graph traversal. With the commit-graph, Git reads from a compact, random-access binary file — making operations like git log, git merge-base, and git branch --contains up to 10-80x faster on large repos. Adding --changed-paths includes Bloom filters that further accelerate path-limited queries like git log -- <path>.
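Writing and checking the file by hand looks like this (`git maintenance` does the same on a schedule):

```shell
# Precompute the commit-graph with changed-path Bloom filters
git commit-graph write --reachable --changed-paths
# Confirm the file parses and matches the object database
git commit-graph verify
```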
4. How does the filesystem monitor (fsmonitor) speed up git status?
Normally, git status must stat every file in the working directory to check for modifications — for a repo with 100,000 files, that's 100,000 system calls. The filesystem monitor hooks into the OS's file-change notification system (FSEvents on macOS, inotify on Linux) to track which files actually changed since the last check. Git then only needs to stat those specific files, reducing git status from seconds to milliseconds.
Enable it with git config core.fsmonitor true (Git 2.37+).
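Enabling it per-repo and checking the daemon, as a sketch (daemon availability depends on your OS and filesystem):

```shell
# Opt this repository in to the built-in filesystem monitor (Git 2.37+)
git config core.fsmonitor true
# The daemon starts implicitly on the next status; then query it directly
git status >/dev/null
git fsmonitor--daemon status
```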
5. What's the difference between sparse checkout and partial clone, and when would you use both together?
Partial clone (--filter=blob:none) affects what's downloaded — Git gets all commits and trees but skips file content (blobs) until they're actually needed.
Sparse checkout affects what's checked out to disk — Git only materializes specific directories in the working tree.
Use both together for monorepos:
```
git clone --filter=blob:none --sparse <url>
git sparse-checkout set --cone packages/my-service
```

This minimizes both network transfer (only download blobs for your directories) and disk usage (only materialize your directories). Full commit history remains available for git log, git blame, etc.
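The cone can be adjusted later without re-cloning; the directory name below is a placeholder:

```shell
# Add another directory to the existing cone
git sparse-checkout add packages/another-service
# Inspect the current cone directories
git sparse-checkout list
# Return to a full checkout when done
git sparse-checkout disable
```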
6. What does git rerere do, and how do you enable it?
git rerere (reuse recorded resolution) records how you resolve merge conflicts and automatically replays those resolutions when the same conflict appears again. Enable it with:
```
git config --global rerere.enabled true
```

When you resolve a conflict, rerere saves the "before" (preimage) and "after" (postimage) in .git/rr-cache/. If the same conflict pattern appears in a future merge or rebase, Git applies your previous resolution automatically. You can clear a bad resolution with git rerere forget <file>.
7. A repository's .git directory is 5 GB. How would you diagnose what's consuming the space?
- Check overall statistics: `git count-objects -vH`
- Find the largest blobs in history:

  ```
  git rev-list --objects --all \
    | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
    | grep blob | sort -rnk3 | head -20
  ```

  This reveals the largest files ever committed — often accidentally committed binaries, build artifacts, or data files.
- Solutions: migrate large files to Git LFS, or remove them from history with `git filter-repo`.
8. What is the sparse index, and how does it improve performance beyond regular sparse checkout?
Regular sparse checkout only affects the working tree — the .git/index still contains entries for every file in the repository. Operations that touch the index (git status, git add, git commit) must process all entries.
The sparse index (enabled with git config index.sparse true) collapses non-checked-out directories into a single tree entry in the index. This means index operations only process entries for your checked-out files, not the entire repository. It's especially impactful in monorepos with hundreds of thousands of files.
9. For a CI/CD pipeline that only needs to build the latest code, what's the optimal clone strategy?
```
git clone --depth 1 --single-branch --branch main <url>
```

- `--depth 1`: Only download the latest commit (no history)
- `--single-branch`: Only download the specified branch (no other branches/tags)
- `--branch main`: Specify which branch to clone
This is the fastest and smallest possible clone. If you also need to run git blame or git bisect in CI, use --filter=blob:none instead of --depth 1 to retain full commit history.
For even faster CI runs, consider caching the .git directory between builds and using git fetch to update incrementally.
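The caching idea can be sketched like this; `/cache/repo` stands in for whatever path your CI system persists between builds:

```shell
# Reuse a .git directory persisted between CI runs instead of re-cloning
git -C /cache/repo fetch origin main            # incremental: only new objects
git -C /cache/repo checkout --force FETCH_HEAD  # build from the fetched tip
```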
10. You enabled rerere and it auto-resolved a conflict, but the resolution is wrong. What do you do?
- Clear the bad recorded resolution: `git rerere forget <file>`
- Reset the file to its conflicted state: `git checkout -m <file>`
- Resolve the conflict correctly this time
- Stage and commit — rerere will record the new, correct resolution
You can also check git rerere diff before committing to preview what rerere applied, catching bad resolutions before they're committed.