Learning Objectives
By the end of this module, you will be able to:
- Explain what version control is and why every software project needs it
- Distinguish between centralized and distributed version control systems
- Describe Git's origin story and its core design philosophy
- Compare Git with other version control systems (Mercurial, SVN, Fossil)
- Understand repository hosting options and organizational structures on GitHub
1. What Is Version Control and Why It Matters
The Problem
Imagine you're writing a program. You have a working version. You decide to add a new feature. Halfway through, you realize the feature breaks everything. You want to go back — but you've already overwritten your files.
Now imagine this with a team of five developers, all editing the same codebase, all at the same time.
Without version control, teams resort to desperate measures:
project/
├── main.py
├── main_v2.py
├── main_v2_fixed.py
├── main_v2_fixed_FINAL.py
├── main_v2_fixed_FINAL_actually_final.py
├── main_backup_sarah.py
└── main_DO_NOT_DELETE.py
This is chaos. Files get lost. Work gets overwritten. Nobody knows which version is "the real one." Merging two people's changes means sitting down with two printouts and a highlighter.
The Solution
A Version Control System (VCS) — also called a Source Control Management (SCM) tool — solves this by:
- Recording snapshots of your project at meaningful points in time
- Tracking who changed what, when, and why
- Allowing parallel work through branching — developers work independently without stepping on each other
- Enabling rollback — you can always return to any previous state
- Facilitating collaboration — changes from multiple developers can be merged together systematically
A VCS is one of the cornerstones of modern software development. It's not optional — it's as fundamental as a compiler or a text editor. And yet many developers never take the time to truly learn how theirs works.
Key insight: A VCS doesn't just record what changed. It records snapshots of your project at moments you consider meaningful. You make changes, you make more changes, and when you consider them relevant enough, you ask your VCS to record that historic event.
2. A Brief History of Version Control
Understanding where version control came from helps you appreciate why Git works the way it does.
Generation 1: Local-Only (1970s–1980s)
SCCS (1972) and RCS (1982) were the earliest tools. They tracked changes to individual files on a single machine. No networking, no collaboration. If you wanted to share changes, you mailed a patch file.
- One file at a time
- Single developer, single machine
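- Stored deltas (the differences between versions) instead of full copies; RCS, for example, keeps the newest version whole and stores reverse deltas back to older ones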
Generation 2: Centralized VCS (1990s–2000s)
CVS (1990) and Subversion (SVN) (2000) introduced a client-server model. A single central server held the "true" repository. Developers checked out files, made changes, and committed back to the server.
                ┌──────────────┐
                │   Central    │
       ┌────────│    Server    │────────┐
       │        │    (SVN)     │        │
       │        └──────────────┘        │
       ▼                                ▼
┌─────────────┐                  ┌─────────────┐
│ Developer A │                  │ Developer B │
│ (checkout)  │                  │ (checkout)  │
└─────────────┘                  └─────────────┘
Advantages over local VCS:
- Multiple developers could work on the same project
- Administrators could control who had access
Critical problems:
- Single point of failure — if the server died, nobody could work
- Must be online — you couldn't commit on an airplane or from a location without network access
- Branching and merging were painful: CVS branches were slow to create, and although SVN made branch creation a cheap server-side directory copy, merging branches back remained tedious and error-prone
- Slow operations — every commit, diff, and log query required a network round-trip
Generation 3: Distributed VCS (2005–present)
In 2005, something pivotal happened. The Linux kernel development team had been using BitKeeper, a proprietary distributed VCS. When BitKeeper revoked its free license, Linus Torvalds — the creator of Linux — decided to build his own tool. He had specific requirements:
- Must handle a massive project (the Linux kernel had millions of lines of code and thousands of contributors)
- Must be fast — branching and merging needed to be nearly instantaneous
- Must be distributed — every developer gets a complete copy of the entire history
- Must guarantee data integrity — corruption should be detectable
He built Git in roughly two weeks. It was functional enough to manage the Linux kernel within a month.
Just 12 days after Git's initial release, Matt Mackall released Mercurial — solving many of the same problems with a different philosophy. Both tools are still alive and widely used, though Git has become the dominant choice in the industry.
3. Centralized vs. Distributed: What's the Difference?
This is the single most important architectural concept to understand before using Git.
Centralized Model (SVN, CVS, Perforce)
           ┌──────────────────┐
           │  Central Server  │
           │                  │
           │   Full History   │
           │   All Branches   │
           │  Single Source   │
           │    of Truth      │
           └────────┬─────────┘
                    │
     ┌──────────────┼──────────────┐
     │              │              │
     ▼              ▼              ▼
┌──────────┐   ┌──────────┐   ┌──────────┐
│  Dev A   │   │  Dev B   │   │  Dev C   │
│          │   │          │   │          │
│ Working  │   │ Working  │   │ Working  │
│  Copy    │   │  Copy    │   │  Copy    │
│  ONLY    │   │  ONLY    │   │  ONLY    │
└──────────┘   └──────────┘   └──────────┘
- Developers have only a working copy — the latest files, no history
- Nearly every operation that touches history (commit, log, branch, diff against older revisions) requires a connection to the server
- The server is the single source of truth — and a single point of failure
Distributed Model (Git, Mercurial)
            ┌──────────────────┐
            │ Server (GitHub)  │
            │                  │
            │   Full History   │
            │   All Branches   │
            │                  │
            └────────┬─────────┘
                     │
      ┌──────────────┼──────────────┐
      │              │              │
      ▼              ▼              ▼
┌────────────┐ ┌────────────┐ ┌────────────┐
│   Dev A    │ │   Dev B    │ │   Dev C    │
│            │ │            │ │            │
│   Full     │ │   Full     │ │   Full     │
│  History   │ │  History   │ │  History   │
│   All      │ │   All      │ │   All      │
│  Branches  │ │  Branches  │ │  Branches  │
│            │ │            │ │            │
│ (Complete  │ │ (Complete  │ │ (Complete  │
│   Clone)   │ │   Clone)   │ │   Clone)   │
└────────────┘ └────────────┘ └────────────┘
- Every developer has a complete clone of the entire repository, including full history
- All copies are equally powerful — there is no central authority baked into the system
- A server is entirely optional (but most teams choose to use one for synchronization)
- Most operations (commit, log, diff, branch, blame) happen locally — they are instant and require no network
- The only operations that require a network are push (send changes to a remote) and fetch/pull (get changes from a remote)
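To make the local/network split concrete, here is a minimal Python sketch. It is a toy model, not how Git stores or transfers data: each repository is assumed to be a plain dictionary of object id to content, and the ids `c1` and `c2` are made up for illustration.

```python
# Toy model: a repository is a dict mapping object id -> content.
# Illustration only; real Git negotiates and packs objects very differently.

def fetch(local: dict, remote: dict) -> int:
    """Copy objects the local repo is missing from the remote; return how many were copied."""
    missing = {oid: data for oid, data in remote.items() if oid not in local}
    local.update(missing)
    return len(missing)


def push(local: dict, remote: dict) -> int:
    """Pushing is the same transfer in the other direction."""
    return fetch(remote, local)


server = {"c1": "initial commit"}
dev_a = dict(server)  # clone: a full copy of everything the server has
dev_b = dict(server)

dev_a["c2"] = "add farewell()"    # local commit: no network involved
sent = push(dev_a, server)        # network step: send the new object
received = fetch(dev_b, server)   # network step: Dev B catches up
print(sent, received)
```

Everything before the `push` call happens on Dev A's machine alone. The network only matters at the two synchronization steps.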
Why Distributed Wins
| Capability | Centralized (SVN) | Distributed (Git) |
|---|---|---|
| Work offline | No | Yes — full history is local |
| Commit speed | Network round-trip | Instant (local operation) |
| Branch creation | Server-side directory copy (needs the network) | Near-instantaneous (creates a local pointer) |
| Backup resilience | Server dies = potential data loss | Every clone is a full backup |
| History browsing | Requires server connection | Instant (local data) |
| Concurrent workflows | Limited | Excellent — each developer has their own repo |
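The "pointer" idea behind cheap branching can be sketched in a few lines of Python. This is a simplified model under stated assumptions: commits are dictionary entries, a branch is only a name that maps to a commit id, and all ids and helper names here are hypothetical.

```python
# Toy model: commits are records; a branch is only a name -> commit id entry.
commits = {
    "c1": {"parent": None, "message": "initial"},
    "c2": {"parent": "c1", "message": "add greeting option"},
}
branches = {"main": "c2"}


def create_branch(name: str, start: str) -> None:
    """Creating a branch records one pointer; no files or commits are copied."""
    branches[name] = branches[start]


def commit(branch: str, commit_id: str, message: str) -> None:
    """A new commit points at the old tip, then the branch pointer moves forward."""
    commits[commit_id] = {"parent": branches[branch], "message": message}
    branches[branch] = commit_id


create_branch("feature/farewell", "main")
commit("feature/farewell", "c3", "add farewell()")
print(branches)
```

Note that `main` still points at `c2` after the new commit. Only the feature branch's pointer moved.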
4. Git's Design Philosophy
Git isn't just "another VCS." It has a distinctive philosophy that influences everything about how it works:
Content-Addressable Filesystem
At its core, Git is a content-addressable filesystem. Every piece of data — every file, every directory listing, every commit — is identified by a SHA-1 hash of its contents. This means:
- Two files with identical contents always have the same hash, regardless of their name or location
- Any change to content, no matter how small, produces a completely different hash
- Corruption is detectable — if the data doesn't match its hash, you know something is wrong
We'll explore this in depth in Module 3.
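As a small preview, the sketch below computes the id Git assigns to a file's contents (a "blob") using only Python's standard library. One detail the summary above glosses over: Git hashes a short header (`blob`, the content length, and a NUL byte) followed by the content, not the raw bytes alone. The sketch assumes the default SHA-1 object format; the function name `git_blob_id` is made up for this example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object id Git assigns to a file ("blob") with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


a = git_blob_id(b"hello world\n")
b = git_blob_id(b"hello world\n")   # same bytes, whatever the file name: same id
c = git_blob_id(b"hello world!\n")  # one character added: unrelated id
print(a)  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(a == b, a == c)
```

If Git is installed, `echo 'hello world' | git hash-object --stdin` should print the same id as the first line of output.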
Snapshots, Not Deltas
Most earlier VCS tools stored deltas — the differences between successive versions of each file. Git takes a fundamentally different approach: it stores snapshots. Every time you commit, Git records the complete state of every file in your project. If a file hasn't changed, Git doesn't store a copy — it stores a pointer to the previous identical version.
Delta-based (SVN):              Snapshot-based (Git):

File A:                         Commit 1: [A1] [B1] [C1]
  v1 → Δ1 → Δ2 → Δ3             Commit 2: [A1] [B2] [C1]   ← A1 and C1 are pointers
File B:                         Commit 3: [A2] [B2] [C2]
  v1 → Δ1 → Δ2
File C:
  v1 → Δ1
This makes certain operations (like switching branches or comparing distant commits) much faster than in delta-based systems.
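The three commits in the diagram can be modeled with a content-addressed store. The following Python sketch is a simplification: file contents are the placeholder strings `A1`, `B2`, and so on, and the helper names `store` and `commit` are assumptions for illustration, not Git's API.

```python
import hashlib

objects = {}   # content-addressed store: id -> file content
commits = []   # each commit is a full snapshot: {path: object id}


def store(content: str) -> str:
    """Store content once; identical content always maps to the same id."""
    oid = hashlib.sha1(content.encode("utf-8")).hexdigest()
    objects[oid] = content
    return oid


def commit(files: dict) -> dict:
    """Record the complete state of every file as a path -> id mapping."""
    snapshot = {path: store(content) for path, content in files.items()}
    commits.append(snapshot)
    return snapshot


c1 = commit({"A": "A1", "B": "B1", "C": "C1"})
c2 = commit({"A": "A1", "B": "B2", "C": "C1"})  # only B changed
c3 = commit({"A": "A2", "B": "B2", "C": "C2"})
print(len(commits), "commits,", len(objects), "stored objects")
```

Three snapshots of three files describe nine file states, yet only six distinct objects are stored, because unchanged files resolve to ids that already exist.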
Immutability
Once data is written to Git's object database, it is never modified. New commits don't overwrite old ones — they create new objects that point back to the old ones. This immutable, append-only design is what makes Git so reliable: history, once recorded, doesn't change.
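One way to see why recorded history resists tampering: a commit's id is derived from its content, and that content includes the parent's id. The Python sketch below uses a simplified, made-up payload format (not Git's real commit format) to show that changing an old commit changes the id of every commit after it.

```python
import hashlib


def commit_id(parent_id: str, snapshot: str, message: str) -> str:
    """Derive a commit's id from its content, which includes the parent's id."""
    payload = f"parent {parent_id}\nsnapshot {snapshot}\n\n{message}\n"
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def build_history(messages: list) -> list:
    """Return the chain of commit ids for a linear history."""
    ids, parent = [], ""
    for number, message in enumerate(messages):
        parent = commit_id(parent, f"tree-{number}", message)
        ids.append(parent)
    return ids


original = build_history(["initial", "add greeting", "add farewell"])
rewritten = build_history(["initial", "add greeting (edited)", "add farewell"])
for old, new in zip(original, rewritten):
    print(old[:8], new[:8], "same" if old == new else "different")
```

The first commit keeps its id, while the edited commit and the untouched commit after it both get new ids. "Changing" history therefore really means creating new objects alongside the old ones.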
Nearly Every Operation Is Local
Because every clone contains the full history, operations like log, diff, blame, branch, and commit are entirely local. This is why Git feels fast even on large projects — you're not waiting for a server.
5. Git vs. Other Modern VCS
Git vs. Mercurial (Hg)
Mercurial was created just 12 days after Git, solving many of the same problems. From a user's perspective, they are remarkably similar.
| Aspect | Git | Mercurial |
|---|---|---|
| Created | 2005 by Linus Torvalds | 2005 by Matt Mackall |
| Philosophy | Power and flexibility; the "sharp tool" | Simplicity and safety; the "friendly tool" |
| Branching model | Lightweight branch pointers | Named branches are permanent; bookmarks ≈ Git branches |
| Learning curve | Steeper — many commands, many flags | Gentler — more consistent CLI |
| Extensibility | Shell scripts, aliases | Python plugin system |
| Market share | Dominant (roughly 90%+ of developers in recent surveys) | Historically used at Facebook and Mozilla; declining |
| Hosting | GitHub, GitLab, Bitbucket | Bitbucket (dropped Hg in 2020), self-hosted |
Git vs. Subversion (SVN)
SVN is a centralized VCS still found in some enterprises and legacy projects.
| Aspect | Git | SVN |
|---|---|---|
| Architecture | Distributed | Centralized |
| Offline work | Full capability | Requires server for most operations |
| Branching cost | Near-zero (a new local pointer) | Cheap server-side directory copy; merging is the costly part |
| History storage | Snapshots | Deltas |
| Atomic commits | Yes (entire repo) | Yes (entire repo, unlike CVS) |
| Partial checkout | Possible (sparse checkout) | Native (checkout subdirectories) |
| Large binary files | Needs Git LFS | Handles natively |
Git vs. Fossil
Fossil is a lesser-known distributed VCS created by D. Richard Hipp (creator of SQLite). It bundles version control, bug tracking, wiki, and a web interface into a single binary. It's an interesting alternative for small projects that want an all-in-one tool, but it lacks Git's ecosystem and community.
Git vs. Pijul
Pijul is an experimental VCS based on a mathematical model of patches (category theory). It handles certain merge scenarios more elegantly than Git but is still in early development and not widely used. Worth watching for the future.
6. Choosing a Topology
Even though Git is distributed and technically allows any developer to synchronize directly with any other developer, in practice most teams use a centralized server as a hub.
Common Topologies
Hub-and-spoke (most common):
              ┌─────────────────┐
              │ Central Server  │
              │  (e.g. GitHub)  │
              └────────┬────────┘
                       │
          ┌────────────┼────────────┐
          │            │            │
        Dev A        Dev B        Dev C

All syncs go through the server.
Devs never sync directly with each other.
This is the standard in the industry. It's simple, transparent, and works well with code review tools like pull requests.
Peer-to-peer (rare):
Dev A ◄────► Dev B
  ▲            ▲
  │            │
  └─────►◄─────┘
       Dev C
Technically possible but rarely used. Hard to coordinate, no single source of truth.
Integration manager (open source):
Contributor forks → pushes to their fork → opens PR to upstream
┌──────────────┐     ┌──────────────┐
│   Upstream   │◄────│  Fork (Dev)  │
│  (blessed)   │     │              │
└──────────────┘     └──────────────┘
Common in open source. The "blessed" repository is owned by a maintainer. Contributors fork it, make changes, and submit pull requests.
7. Choosing a Host
Once you've decided on a topology, you need somewhere to host your central server. The two major players:
GitHub
- Founded in 2008; now owned by Microsoft
- ~100 million developers (as of 2023)
- The de facto home of open source software
- Features: pull requests, issues, Actions (CI/CD), Packages, Codespaces, Copilot
- Free for public and private repositories
GitLab
- Founded in 2011; publicly traded company
- Can be self-hosted (open-source Community Edition) or used as SaaS
- Built-in CI/CD pipeline (arguably more mature than GitHub Actions historically)
- Features: merge requests, issues, CI/CD, container registry, security scanning
- Popular with enterprises that need to self-host
Other Options
- Bitbucket (Atlassian) — integrates tightly with Jira; popular in Atlassian shops
- Gitea/Forgejo — lightweight, self-hosted; good for small teams or home labs
- Azure DevOps — Microsoft's enterprise offering; good if you're in the Azure ecosystem
- Self-hosted — you can run a bare Git server on any machine with SSH access (even a Raspberry Pi)
For this course, we'll use GitHub — it's the most popular and its interface is what you'll most likely encounter professionally.
8. Repository Organization: Mono-repo vs. Multi-repo
When an organization has multiple projects, a CTO-level decision must be made: how do you map projects to repositories?
Multi-repo (one project per repository)
Organization/
├── repo-frontend/ ← its own Git repository
├── repo-backend-api/ ← its own Git repository
├── repo-mobile-app/ ← its own Git repository
└── repo-shared-libs/ ← its own Git repository
Pros: Clear boundaries, independent release cycles, smaller clone sizes, simpler CI/CD per project
Cons: Cross-project changes require coordinating across repos, dependency management is your responsibility
Mono-repo (all projects in one repository)
Organization/
└── monorepo/              ← single Git repository
    ├── frontend/
    ├── backend-api/
    ├── mobile-app/
    └── shared-libs/
Pros: Atomic cross-project changes, single source of truth, easier dependency management, shared tooling
Cons: Repository grows large fast (every developer clones everything), CI/CD must detect which projects actually changed, requires specialized tooling (Nx, Turborepo, Bazel) at scale
Google, Meta, and Microsoft famously use monorepos (with custom tooling). Most smaller teams use multi-repo.
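The "CI/CD must detect which projects actually changed" point can be illustrated with a short Python sketch. It assumes the monorepo layout shown above and a list of changed paths relative to the repository root; the function name and sample paths are hypothetical.

```python
from pathlib import PurePosixPath

# Top-level project directories, taken from the example layout above.
PROJECTS = {"frontend", "backend-api", "mobile-app", "shared-libs"}


def affected_projects(changed_paths: list) -> set:
    """Map changed file paths (relative to the repo root) to the projects they belong to."""
    affected = set()
    for path in changed_paths:
        parts = PurePosixPath(path).parts
        if parts and parts[0] in PROJECTS:
            affected.add(parts[0])
    return affected


changed = ["frontend/src/app.ts", "frontend/package.json", "README.md"]
to_build = affected_projects(changed)
print(to_build)
```

This only looks at directory names. Dedicated monorepo tools typically go further and follow the dependency graph, so that a change in shared-libs also rebuilds the projects that depend on it.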
Hybrid Approaches
- Git subtrees — a monorepo where subdirectories can also be their own independent repositories
- Git submodules — each project is its own repo, but a parent repo contains links (references) to specific commits of each sub-repo
We'll revisit submodules and subtrees in Module 19. For this course, we'll use the simple multi-repo approach: one project per repository.
9. GitHub Organizations
GitHub allows you to create organizations — shared accounts where teams collaborate across multiple repositories.
- An organization can have many members with different roles (owner, member, outside collaborator)
- Repositories belong to the organization, not to individual users
- Teams within the organization can have granular access to specific repositories
- URL pattern: github.com/<org-name>/<repo-name>
For a real company this is essential. For a solo tutorial it's overkill — but it mirrors professional practice.
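Because the URL pattern is so regular, scripts often split it back into its owner and repository parts. A minimal Python sketch follows, assuming URLs of the form shown above; `split_repo_url`, `example-org`, and `example-repo` are placeholder names, not real accounts or an official API.

```python
from urllib.parse import urlparse


def split_repo_url(url: str) -> tuple:
    """Return (owner, repo) from a github.com/<owner>/<repo> style URL."""
    if "://" not in url:
        url = "https://" + url  # allow the scheme-less form used in this module
    parts = [part for part in urlparse(url).path.split("/") if part]
    if len(parts) < 2:
        raise ValueError(f"expected <owner>/<repo> in {url!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]  # clone URLs often carry a .git suffix
    return owner, repo


print(split_repo_url("github.com/example-org/example-repo"))
```

The owner part may be an organization or an individual user; the URL alone does not tell you which.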
Command Reference
| Command | Description |
|---|---|
| git --version | Check your installed Git version |
| git help <command> | Open the manual page for a command |
| git help -a | List all available Git commands |
These are the only commands for this module. We'll start using Git properly in Module 2.
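If you want to check for Git from a script rather than by hand, the first command in the table can be wrapped in a few lines of Python. This is a sketch under the assumption that Git, if installed, is on your PATH; the helper name `git_version` is made up for this example.

```python
import shutil
import subprocess


def git_version():
    """Return the output of `git --version`, or None if Git is not installed."""
    if shutil.which("git") is None:
        return None
    result = subprocess.run(
        ["git", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


version = git_version()
print(version or "Git is not installed yet.")
```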
Hands-On Lab: The Problem Git Solves
This lab deliberately does not use Git — it demonstrates the problem that Git exists to solve.
Setup
Open a terminal and create a project directory:
mkdir ~/git-course-lab1
cd ~/git-course-lab1
Step 1: Create a "project"
echo "def greet(name):" > app.py
echo "    return f'Hello, {name}!'" >> app.py
echo "" >> app.py
echo "print(greet('World'))" >> app.py
Step 2: Manual versioning (the bad old days)
Your app works. Let's "save" a version:
cp app.py app_v1.py
Now make a change:
cat > app.py << 'EOF'
def greet(name, greeting="Hello"):
    return f'{greeting}, {name}!'

print(greet('World'))
print(greet('World', 'Howdy'))
EOF
Save another version:
cp app.py app_v2.py
Step 3: Simulate a teammate
Your teammate (you, in another terminal) makes a conflicting change. Create their version:
cat > app_teammate.py << 'EOF'
def greet(name):
    return f'Hello, {name}!'

def farewell(name):
    return f'Goodbye, {name}!'

print(greet('World'))
print(farewell('World'))
EOF
Step 4: Try to merge manually
Now combine your v2 changes (custom greeting) with your teammate's changes (farewell function). You need both features in app.py.
Try it. Open app.py in an editor and combine them.
Checkpoint: Your merged app.py should look something like:
def greet(name, greeting="Hello"):
    return f'{greeting}, {name}!'

def farewell(name):
    return f'Goodbye, {name}!'

print(greet('World'))
print(greet('World', 'Howdy'))
print(farewell('World'))
Step 5: Feel the pain
ls ~/git-course-lab1/
You should see:
app.py app_teammate.py app_v1.py app_v2.py
Now imagine this with 50 files, 10 developers, and 6 months of history.
Questions to consider:
- Which version was deployed to production?
- When did the farewell function get added? By whom?
- What did app.py look like three weeks ago?
- Can you undo just the "custom greeting" change without losing the "farewell" feature?
With manual file copying, the answers range from "difficult" to "impossible." This is exactly what Git solves.
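Even the simplest of these questions, "what changed between v1 and v2?", takes deliberate effort without a VCS. The Python sketch below answers it with the standard library's difflib, with the two versions from the lab embedded as strings so the example is self-contained. A version control system produces this kind of comparison for any two points in history on request.

```python
import difflib

v1 = """def greet(name):
    return f'Hello, {name}!'

print(greet('World'))
"""

v2 = """def greet(name, greeting="Hello"):
    return f'{greeting}, {name}!'

print(greet('World'))
print(greet('World', 'Howdy'))
"""

# Compare the two versions line by line and produce a unified diff.
diff = list(difflib.unified_diff(
    v1.splitlines(), v2.splitlines(),
    fromfile="app_v1.py", tofile="app_v2.py", lineterm="",
))
print("\n".join(diff))
```

Lines starting with `-` exist only in the old version and lines starting with `+` only in the new one. What the diff cannot tell you is who made the change, when, or why, which is the part a VCS records for you.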
Step 6: Observe Git's answer (preview)
Install Git if you haven't already (we'll cover installation fully in Module 2), and try:
cd ~/git-course-lab1
git init
git add app.py
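git commit -m "Initial version with greet and farewell"
If the commit stops with "Please tell me who you are", run the two git config commands that the message prints, then commit again.
One commit records the snapshot. One clean directory. Full history preserved internally. No _v1, _v2, or _backup files.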
Challenge
Create a more complex scenario: a project with 5 files, 3 "versions," and 2 "teammates" making conflicting changes. Try to manage it with file copying. Write down every question that comes up that you can't answer — those are exactly the problems Git was designed to solve.
Cleanup
rm -rf ~/git-course-lab1
Common Pitfalls & Troubleshooting
| Pitfall | Explanation |
|---|---|
| "I don't need version control for solo projects" | Even solo, you'll want to undo mistakes, experiment on branches, and keep history. The habit of using Git for everything (scripts, config files, notes) pays off. |
| Confusing Git with GitHub | Git is the tool. GitHub is a hosting service. You can use Git without GitHub, and other hosts (GitLab, Bitbucket, a self-hosted server) serve Git repositories just as well. |
| Thinking the server is "the real" repository | In a distributed VCS, every clone is a complete repository. The server is just an agreed-upon synchronization point — not more authoritative than your local copy. |
| "I'll learn Git later when I need it" | You need it now. Every team uses it. Learning it under pressure during your first job (or first open-source contribution) is stressful. Learn it before you need it. |
Pro Tips
- Git isn't just for code. Writers track books in Git. Lawyers track contract revisions. Scientists track datasets and papers. If it's text and it changes over time, Git can track it.
- Think in snapshots, not files. The mental shift from "tracking file changes" to "recording project snapshots" is fundamental to understanding Git. Every commit is a complete picture of your entire project at one point in time.
- The server is optional. You can use Git entirely locally, with no GitHub, no internet, no server. This is useful for personal projects, experimentation, or working in air-gapped environments.
- Learn the internals. Most developers treat Git as a black box and then panic when something goes wrong. Modules 3 and 4 will show you what's actually happening under the hood. Once you understand the object model and the commit graph, Git stops being scary.
- Branch naming convention matters early. Even before you learn branching (Module 6), know that teams typically use prefixes like feature/, bugfix/, hotfix/, and release/ to keep branches organized. Start this habit from day one.
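A naming convention is easiest to keep when a script can check it. Here is a small Python sketch that accepts the four prefixes named above; the rule for what may follow the slash (lowercase letters, digits, dots, underscores, hyphens) is an assumption for illustration, since teams define this differently.

```python
import re

# Prefixes come from the tip above; the allowed characters after the slash are an assumption.
BRANCH_PATTERN = re.compile(r"^(feature|bugfix|hotfix|release)/[a-z0-9][a-z0-9._-]*$")


def is_valid_branch_name(name: str) -> bool:
    """Return True if the name follows the <prefix>/<short-description> convention."""
    return BRANCH_PATTERN.match(name) is not None


for candidate in ["feature/custom-greeting", "release/1.2.0", "my-branch", "feature/"]:
    print(candidate, is_valid_branch_name(candidate))
```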
Quiz / Self-Assessment
1. What is the difference between an SCM and a VCS?
2. Name two critical problems with centralized version control systems.
3. In a distributed VCS like Git, what does every developer have?
4. Who created Git and why?
5. What VCS was created just 12 days after Git?
6. Does Git store deltas (differences between versions) or snapshots?
7. Is a server required to use Git?
8. What is a mono-repo?
9. What is the most common topology for teams using Git?
10. What does "content-addressable filesystem" mean in the context of Git?