Learning Objectives

By the end of this module, you will be able to:

  1. Explain what version control is and why every software project needs it
  2. Distinguish between centralized and distributed version control systems
  3. Describe Git's origin story and its core design philosophy
  4. Compare Git with other version control systems (Mercurial, SVN, Fossil)
  5. Understand repository hosting options and organizational structures on GitHub

1. What Is Version Control and Why It Matters

The Problem

Imagine you're writing a program. You have a working version. You decide to add a new feature. Halfway through, you realize the feature breaks everything. You want to go back — but you've already overwritten your files.

Now imagine this with a team of five developers, all editing the same codebase, all at the same time.

Without version control, teams resort to desperate measures:

project/
├── main.py
├── main_v2.py
├── main_v2_fixed.py
├── main_v2_fixed_FINAL.py
├── main_v2_fixed_FINAL_actually_final.py
├── main_backup_sarah.py
└── main_DO_NOT_DELETE.py

This is chaos. Files get lost. Work gets overwritten. Nobody knows which version is "the real one." Merging two people's changes means sitting down with two printouts and a highlighter.

The Solution

A Version Control System (VCS) — also called a Source Control Management (SCM) tool — solves this by:

  • Recording snapshots of your project at meaningful points in time
  • Tracking who changed what, when, and why
  • Allowing parallel work through branching — developers work independently without stepping on each other
  • Enabling rollback — you can always return to any previous state
  • Facilitating collaboration — changes from multiple developers can be merged together systematically

A VCS is one of the cornerstones of modern software development. It's not optional — it's as fundamental as a compiler or a text editor. And yet many developers never take the time to truly learn how theirs works.

Key insight: A VCS doesn't just record what changed. It records snapshots of your project at moments you consider meaningful. You make changes, you make more changes, and when you consider them relevant enough, you ask your VCS to record that historic event.
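
To make the idea concrete, here is a minimal in-memory sketch of "record a snapshot, then roll back." The names ToyHistory, Snapshot, record, and restore are made up for illustration; this is not how Git is implemented, only the shape of the idea.

```python
import copy
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Snapshot:
    author: str          # who
    message: str         # why
    timestamp: datetime  # when
    files: dict          # what: file name -> contents at that moment


@dataclass
class ToyHistory:
    """Hypothetical in-memory history; real tools persist this on disk."""
    snapshots: list = field(default_factory=list)

    def record(self, files, author, message):
        # Copy the files so later edits cannot alter the recorded snapshot.
        snap = Snapshot(author, message, datetime.now(timezone.utc), copy.deepcopy(files))
        self.snapshots.append(snap)
        return len(self.snapshots) - 1  # the list index acts as a simple version id

    def restore(self, version):
        # Rollback: hand back a copy of the project as it was at that version.
        return copy.deepcopy(self.snapshots[version].files)


history = ToyHistory()
project = {"main.py": "print('v1')"}
v0 = history.record(project, "sarah", "Working version")

project["main.py"] = "print('broken feature')"
v1 = history.record(project, "sarah", "Add experimental feature")

project = history.restore(v0)  # the feature broke everything, so go back
print(project["main.py"])
```

Both snapshots stay in the history, so the "broken" version is still available if you later want to see what went wrong.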


2. A Brief History of Version Control

Understanding where version control came from helps you appreciate why Git works the way it does.

Generation 1: Local-Only (1970s–1980s)

SCCS (1972) and RCS (1982) were the earliest tools. They tracked changes to individual files on a single machine. No networking, no collaboration. If you wanted to share changes, you mailed a patch file.

  • One file at a time
  • Single developer, single machine
  • Stored deltas (the differences between versions) rather than a full copy of every version
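
As a rough illustration of what a delta is, the sketch below uses Python's standard difflib module to compute the difference between two versions of one file. The file name greet.py and its contents are invented for the example; SCCS and RCS used their own on-disk formats, not this one.

```python
import difflib

# Two versions of the same (hypothetical) file, as lists of lines.
v1 = ["def greet(name):\n", "    return 'Hello, ' + name\n"]
v2 = ["def greet(name):\n", "    return 'Hi, ' + name\n"]

# The delta describes how to get from v1 to v2.
delta = list(difflib.unified_diff(v1, v2, fromfile="greet.py@v1", tofile="greet.py@v2"))
print("".join(delta))

# Keep only the lines that were removed (-) or added (+), skipping the file headers.
changed = [
    line for line in delta
    if line[0] in "+-" and not line.startswith(("+++", "---"))
]
```

Storing only the changed lines keeps each version small, at the cost of having to replay deltas to reconstruct an older version.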

Generation 2: Centralized VCS (1990s–2000s)

CVS (1990) and Subversion (SVN) (2000) introduced a client-server model. A single central server held the "true" repository. Developers checked out files, made changes, and committed back to the server.

                    ┌──────────────┐
                    │  Central     │
          ┌────────│  Server      │────────┐
          │        │  (SVN)       │        │
          │        └──────────────┘        │
          ▼                                ▼
   ┌─────────────┐                  ┌─────────────┐
   │ Developer A │                  │ Developer B │
   │ (checkout)  │                  │ (checkout)  │
   └─────────────┘                  └─────────────┘

Advantages over local VCS:

  • Multiple developers could work on the same project
  • Administrators could control who had access

Critical problems:

  • Single point of failure — if the server died, nobody could work
  • Must be online — you couldn't commit on an airplane or from a location without network access
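  • Branching and merging were painful: in CVS, branching was slow and error-prone; in SVN, a branch is a directory copy made on the server (cheap to create, but it needs a network round-trip), and merging branches back was historically tedious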
  • Slow operations — every commit, diff, and log query required a network round-trip

Generation 3: Distributed VCS (2005–present)

In 2005, something pivotal happened. The Linux kernel development team had been using BitKeeper, a proprietary distributed VCS. When BitKeeper revoked its free license, Linus Torvalds — the creator of Linux — decided to build his own tool. He had specific requirements:

  • Must handle a massive project (the Linux kernel had millions of lines of code and thousands of contributors)
  • Must be fast — branching and merging needed to be nearly instantaneous
  • Must be distributed — every developer gets a complete copy of the entire history
  • Must guarantee data integrity — corruption should be detectable

He built Git in roughly two weeks. It was functional enough to manage the Linux kernel within a month.

Just 12 days after Git's initial release, Matt Mackall released Mercurial, which solves many of the same problems with a different philosophy. Both tools are still maintained today, though Git has become the dominant choice in the industry.


3. Centralized vs. Distributed: What's the Difference?

This is the single most important architectural concept to understand before using Git.

Centralized Model (SVN, CVS, Perforce)

                    ┌──────────────────┐
                    │  Central Server  │
                    │                  │
                    │  Full History    │
                    │  All Branches    │
                    │  Single Source   │
                    │  of Truth        │
                    └────────┬─────────┘
                             │
              ┌──────────────┼──────────────┐
              │              │              │
              ▼              ▼              ▼
        ┌──────────┐  ┌──────────┐  ┌──────────┐
        │ Dev A    │  │ Dev B    │  │ Dev C    │
        │          │  │          │  │          │
        │ Working  │  │ Working  │  │ Working  │
        │ Copy     │  │ Copy     │  │ Copy     │
        │ ONLY     │  │ ONLY     │  │ ONLY     │
        └──────────┘  └──────────┘  └──────────┘
  • Developers have only a working copy — the latest files, no history
  • Most operations (commit, log, branch, diff against older revisions) require a server connection
  • The server is the single source of truth — and a single point of failure

Distributed Model (Git, Mercurial)

        ┌───────────────────┐
        │  Server (GitHub)  │
        │                   │
        │  Full History     │
        │  All Branches     │
        │                   │
        └────────┬──────────┘
                 │
  ┌──────────────┼──────────────┐
  │              │              │
  ▼              ▼              ▼
┌────────────┐ ┌────────────┐ ┌────────────┐
│ Dev A      │ │ Dev B      │ │ Dev C      │
│            │ │            │ │            │
│ Full       │ │ Full       │ │ Full       │
│ History    │ │ History    │ │ History    │
│ All        │ │ All        │ │ All        │
│ Branches   │ │ Branches   │ │ Branches   │
│            │ │            │ │            │
│ (Complete  │ │ (Complete  │ │ (Complete  │
│  Clone)    │ │  Clone)    │ │  Clone)    │
└────────────┘ └────────────┘ └────────────┘
  • Every developer has a complete clone of the entire repository, including full history
  • All copies are equally powerful — there is no central authority baked into the system
  • A server is entirely optional (but most teams choose to use one for synchronization)
  • Most operations (commit, log, diff, branch, blame) happen locally — they are instant and require no network
  • The only operations that require a network are push (send changes to a remote) and fetch/pull (get changes from a remote)
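
The points above can be sketched with a toy model in which a repository is nothing more than a list of commit messages. ToyRepo and its methods are hypothetical names for this illustration, and the "fast-forward only" rule is a simplifying assumption; real Git can also merge diverged histories.

```python
class ToyRepo:
    """Hypothetical stand-in for a repository: an ordered list of commit messages."""

    def __init__(self, commits=None):
        self.commits = list(commits or [])

    def clone(self):
        # A clone copies the entire history, not just the latest files.
        return ToyRepo(self.commits)

    def commit(self, message):
        # Purely local: no other repository is contacted.
        self.commits.append(message)

    def push(self, remote):
        self._fast_forward(source=self, target=remote)

    def pull(self, remote):
        self._fast_forward(source=remote, target=self)

    @staticmethod
    def _fast_forward(source, target):
        # Simplification: only allow updates where target's history is a prefix of source's.
        if source.commits[:len(target.commits)] != target.commits:
            raise ValueError("histories have diverged; a merge would be needed")
        target.commits = list(source.commits)


server = ToyRepo(["Initial commit"])
dev_a = server.clone()
dev_b = server.clone()

dev_a.commit("Add login")  # works offline
dev_a.commit("Fix typo")   # still offline
dev_a.push(server)         # network needed only here
dev_b.pull(server)         # ...and here

# If the server were lost, any clone could seed a replacement.
replacement_server = dev_b.clone()
print(replacement_server.commits)
```

Only push and pull touch another repository; clone, commit, and reading the history all work on local data.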

Why Distributed Wins

| Capability | Centralized (SVN) | Distributed (Git) |
|---|---|---|
| Work offline | No | Yes (full history is local) |
| Commit speed | Network round-trip | Instant (local operation) |
| Branch creation | Server-side copy (needs a network round-trip) | Near-instantaneous (creates a pointer) |
| Backup resilience | Server dies = potential data loss | Every clone is a full backup |
| History browsing | Requires server connection | Instant (local data) |
| Concurrent workflows | Limited | Excellent (each developer has their own repo) |
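
To see why creating a branch in Git is nearly free, it helps to picture a branch as a name that points at a commit. The sketch below models that with two dictionaries; the commit ids c1 to c4 and the branch name feature/login are made up, and this is a simplified picture rather than Git's actual storage.

```python
# Hypothetical commit graph: commit id -> parent id (None for the first commit).
parents = {"c1": None, "c2": "c1", "c3": "c2"}

# Branches are just names pointing at a commit id.
branches = {"main": "c3"}

# Creating a branch adds one entry; no files are copied.
branches["feature/login"] = branches["main"]

# Committing on the new branch adds a commit and moves only that branch's pointer.
parents["c4"] = branches["feature/login"]
branches["feature/login"] = "c4"


def history(branch):
    """Walk parent links from the branch tip back to the first commit."""
    commit, seen = branches[branch], []
    while commit is not None:
        seen.append(commit)
        commit = parents[commit]
    return seen


print(history("main"))           # ['c3', 'c2', 'c1']
print(history("feature/login"))  # ['c4', 'c3', 'c2', 'c1']
```

The main branch is untouched by work on feature/login, and both branches share the commits they have in common.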

4. Git's Design Philosophy

Git isn't just "another VCS." It has a distinctive philosophy that influences everything about how it works:

Content-Addressable Filesystem

At its core, Git is a content-addressable filesystem. Every piece of data (every file, every directory listing, every commit) is identified by a hash of its contents. SHA-1 is the default; newer Git versions can also create SHA-256 repositories. This means:

  • Two files with identical contents always have the same hash, regardless of their name or location
  • Any change to content, no matter how small, produces a completely different hash
  • Corruption is detectable — if the data doesn't match its hash, you know something is wrong

We'll explore this in depth in Module 3.
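
As a small preview, the sketch below computes the id Git assigns to a file's contents in a default (SHA-1) repository. Git hashes a short header ("blob", the content length, and a NUL byte) followed by the contents; the function name git_blob_id is our own. You can compare the result with the output of git hash-object on a file with the same contents.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """SHA-1 object id for file contents (a "blob") in a default Git repository."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


a = git_blob_id(b"hello\n")
b = git_blob_id(b"hello\n")   # same contents, imagine a different file name
c = git_blob_id(b"hello!\n")  # one character added

print(a)       # ce013625030ba8dba906f756967f9e9ca394464a
print(a == b)  # True: identical contents, identical id
print(a == c)  # False: a tiny change gives a completely different id
```

Note that the file name never enters the hash, which is why two identical files share one id regardless of name or location.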

Snapshots, Not Deltas

Most earlier VCS tools stored deltas: the differences between successive versions of each file. Git takes a fundamentally different approach and stores snapshots. Every time you commit, Git records the complete state of every file in your project. If a file hasn't changed, Git doesn't store a second copy; it stores a pointer to the identical version it already has. (Git may later compress stored objects into packfiles using deltas, but that is a storage optimization. The model you work with is snapshots.)

Delta-based (SVN):        Snapshot-based (Git):

File A:                   Commit 1:  [A1] [B1] [C1]
  v1 → Δ1 → Δ2 → Δ3     Commit 2:  [A1] [B2] [C1]  ← A1 and C1 are pointers
File B:                   Commit 3:  [A2] [B2] [C2]
  v1 → Δ1 → Δ2
File C:
  v1 → Δ1

This makes certain operations (like switching branches or comparing distant commits) much faster than in delta-based systems.
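
The diagram can be approximated in a few lines. In this sketch a commit is a mapping from file name to content hash, and each distinct content is stored once. The helper names store and commit are illustrative, and the hashing is simplified (plain SHA-1 of the text, not Git's real object format).

```python
import hashlib

objects = {}  # content hash -> file contents (each distinct content stored once)


def store(content: str) -> str:
    key = hashlib.sha1(content.encode()).hexdigest()
    objects.setdefault(key, content)  # already stored? then nothing new is written
    return key


def commit(files: dict) -> dict:
    """A snapshot maps every file name to the hash of its contents."""
    return {name: store(content) for name, content in files.items()}


commit1 = commit({"A": "a1", "B": "b1", "C": "c1"})
commit2 = commit({"A": "a1", "B": "b2", "C": "c1"})  # only B changed

# Comparing two snapshots is a hash comparison per file, however far apart they are.
changed = [name for name in commit1 if commit1[name] != commit2[name]]

print(changed)       # ['B']
print(len(objects))  # 4 distinct contents stored (a1, b1, c1, b2), not 6
```

Each commit is still a complete picture of the project, yet unchanged files cost only a reference.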

Immutability

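Once an object is written to Git's object database, it is never modified. New commits don't overwrite old ones; they create new objects that point back to the old ones. Even commands that appear to rewrite history (such as amending a commit) create new objects rather than editing existing ones. This immutable, append-only design is a large part of what makes Git reliable: a recorded commit doesn't change underneath you.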
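
One way to picture this: if a commit's id is derived from its content and its parent's id, then "changing" a commit necessarily produces a different id, and the original stays as it was. The sketch below assumes a simplified id made of parent id plus message; real Git commits hash more fields (tree, author, dates), and commit_id is a made-up helper.

```python
import hashlib


def commit_id(parent, message):
    """Hypothetical commit id: a hash over the parent id and the message."""
    return hashlib.sha1(f"{parent}\n{message}".encode()).hexdigest()


c1 = commit_id(None, "Initial version")
c2 = commit_id(c1, "Add farewell")
c3 = commit_id(c2, "Add tests")

# "Editing" the middle commit really creates a new commit with a new id...
c2_amended = commit_id(c1, "Add farewell function")
# ...and any descendant must be recreated too, because its parent id changed.
c3_rebuilt = commit_id(c2_amended, "Add tests")

print(c2 != c2_amended)  # True
print(c3 != c3_rebuilt)  # True
```

The original c2 and c3 are not altered by any of this; they simply stop being referenced by the rewritten history.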

Nearly Every Operation Is Local

Because every clone contains the full history, operations like log, diff, blame, branch, and commit are entirely local. This is why Git feels fast even on large projects — you're not waiting for a server.


5. Git vs. Other Modern VCS

Git vs. Mercurial (Hg)

Mercurial was created just 12 days after Git, solving many of the same problems. From a user's perspective, they are remarkably similar.

| Aspect | Git | Mercurial |
|---|---|---|
| Created | 2005 by Linus Torvalds | 2005 by Matt Mackall |
| Philosophy | Power and flexibility; the "sharp tool" | Simplicity and safety; the "friendly tool" |
| Branching model | Lightweight branch pointers | Named branches are permanent; bookmarks ≈ Git branches |
| Learning curve | Steeper: many commands, many flags | Gentler: more consistent CLI |
| Extensibility | Shell scripts, aliases | Python plugin system |
| Market share | Dominant (over 90% of developers in recent surveys) | Historically used at Facebook and Mozilla; declining elsewhere |
| Hosting | GitHub, GitLab, Bitbucket | Self-hosted (Bitbucket dropped Hg support in 2020) |

Git vs. Subversion (SVN)

SVN is a centralized VCS still found in some enterprises and legacy projects.

| Aspect | Git | SVN |
|---|---|---|
| Architecture | Distributed | Centralized |
| Offline work | Full capability | Requires server for most operations |
| Branching cost | Near-zero (creates a pointer, locally) | Server-side directory copy (needs a network round-trip) |
| History storage | Snapshots | Deltas |
| Atomic commits | Yes (entire repo) | Yes (entire repo, unlike CVS) |
| Partial checkout | Possible (sparse checkout) | Native (checkout subdirectories) |
| Large binary files | Needs Git LFS | Handles natively |

Git vs. Fossil

Fossil is a lesser-known distributed VCS created by D. Richard Hipp (creator of SQLite). It bundles version control, bug tracking, wiki, and a web interface into a single binary. It's an interesting alternative for small projects that want an all-in-one tool, but it lacks Git's ecosystem and community.

Git vs. Pijul

Pijul is an experimental VCS based on a mathematical model of patches (category theory). It handles certain merge scenarios more elegantly than Git but is still in early development and not widely used. Worth watching for the future.


6. Choosing a Topology

Even though Git is distributed and technically allows any developer to synchronize directly with any other developer, in practice most teams use a centralized server as a hub.

Common Topologies

Hub-and-spoke (most common):

              ┌─────────────────┐
              │  Central Server │
              │  (e.g. GitHub)  │
              └────────┬────────┘
                       │
          ┌────────────┼────────────┐
          │            │            │
       Dev A        Dev B        Dev C

  All syncs go through the server.
  Devs never sync directly with each other.

This is the standard in the industry. It's simple, transparent, and works well with code review tools like pull requests.

Peer-to-peer (rare):

       Dev A ◄────► Dev B
         ▲            ▲
         │            │
         └─────►◄─────┘
              Dev C

Technically possible but rarely used. Hard to coordinate, no single source of truth.

Integration manager (open source):

  Contributor forks → pushes to their fork → opens PR to upstream

  ┌──────────────┐     ┌──────────────┐
  │  Upstream     │◄────│  Fork (Dev)  │
  │  (blessed)    │     │              │
  └──────────────┘     └──────────────┘

Common in open source. The "blessed" repository is owned by a maintainer. Contributors fork it, make changes, and submit pull requests.


7. Choosing a Host

Once you've decided on a topology, you need somewhere to host your central server. The two major players:

GitHub

  • Founded in 2008; now owned by Microsoft
  • ~100 million developers (as of 2023)
  • The de facto home of open source software
  • Features: pull requests, issues, Actions (CI/CD), Packages, Codespaces, Copilot
  • Free plan covers both public and private repositories (some features have usage limits)

GitLab

  • Founded in 2011; publicly traded company
  • Can be self-hosted (open-source Community Edition) or used as SaaS
  • Built-in CI/CD pipelines (these predate GitHub Actions and were long regarded as the more mature option)
  • Features: merge requests, issues, CI/CD, container registry, security scanning
  • Popular with enterprises that need to self-host

Other Options

  • Bitbucket (Atlassian) — integrates tightly with Jira; popular in Atlassian shops
  • Gitea/Forgejo — lightweight, self-hosted; good for small teams or home labs
  • Azure DevOps — Microsoft's enterprise offering; good if you're in the Azure ecosystem
  • Self-hosted — you can run a bare Git server on any machine with SSH access (even a Raspberry Pi)

For this course, we'll use GitHub — it's the most popular and its interface is what you'll most likely encounter professionally.


8. Repository Organization: Mono-repo vs. Multi-repo

When an organization has multiple projects, a CTO-level decision must be made: how do you map projects to repositories?

Multi-repo (one project per repository)

Organization/
├── repo-frontend/       ← its own Git repository
├── repo-backend-api/    ← its own Git repository
├── repo-mobile-app/     ← its own Git repository
└── repo-shared-libs/    ← its own Git repository

Pros: Clear boundaries, independent release cycles, smaller clone sizes, simpler CI/CD per project

Cons: Cross-project changes require coordinating across repos, dependency management is your responsibility

Mono-repo (all projects in one repository)

Organization/
└── monorepo/            ← single Git repository
    ├── frontend/
    ├── backend-api/
    ├── mobile-app/
    └── shared-libs/

Pros: Atomic cross-project changes, single source of truth, easier dependency management, shared tooling

Cons: Repository grows large fast (every developer clones everything), CI/CD must detect which projects actually changed, requires specialized tooling (Nx, Turborepo, Bazel) at scale

Google, Meta, and Microsoft famously use monorepos (with custom tooling). Most smaller teams use multi-repo.
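
The "CI/CD must detect which projects actually changed" point can be sketched as a small function that maps changed file paths to top-level project directories. The project names come from the layout above; the function name and the example paths are hypothetical, and real tools such as Nx or Bazel also follow dependencies between projects, which this sketch ignores.

```python
from pathlib import PurePosixPath

# Top-level project directories from the example monorepo layout.
PROJECTS = {"frontend", "backend-api", "mobile-app", "shared-libs"}


def affected_projects(changed_paths):
    """Return the top-level project directories touched by a list of changed files."""
    affected = set()
    for path in changed_paths:
        top = PurePosixPath(path).parts[0]
        if top in PROJECTS:
            affected.add(top)
    return sorted(affected)


# Hypothetical list of files changed in one commit.
changed = ["frontend/src/App.tsx", "shared-libs/utils/date.py", "README.md"]
print(affected_projects(changed))  # ['frontend', 'shared-libs']
```

A CI pipeline could use a result like this to build and test only the affected projects instead of the whole repository.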

Hybrid Approaches

  • Git subtrees — a monorepo where subdirectories can also be their own independent repositories
  • Git submodules — each project is its own repo, but a parent repo contains links (references) to specific commits of each sub-repo

We'll revisit submodules and subtrees in Module 19. For this course, we'll use the simple multi-repo approach: one project per repository.


9. GitHub Organizations

GitHub allows you to create organizations — shared accounts where teams collaborate across multiple repositories.

  • An organization can have many members with different roles (owner, member, outside collaborator)
  • Repositories belong to the organization, not to individual users
  • Teams within the organization can have granular access to specific repositories
  • URL pattern: github.com/<org-name>/<repo-name>
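
As a small illustration of the URL pattern, the sketch below splits a repository URL into its owner (a user or organization) and repository name using Python's standard urllib. The function name split_repo_url and the example-org/example-repo names are placeholders, not real repositories.

```python
from urllib.parse import urlparse


def split_repo_url(url):
    """Split https://github.com/<owner>/<repo> into (owner, repo)."""
    parsed = urlparse(url)
    parts = [part for part in parsed.path.split("/") if part]
    if parsed.netloc != "github.com" or len(parts) < 2:
        raise ValueError(f"not a github.com/<owner>/<repo> URL: {url}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):  # clone URLs often carry a .git suffix
        repo = repo[:-4]
    return owner, repo


print(split_repo_url("https://github.com/example-org/example-repo"))
print(split_repo_url("https://github.com/example-org/example-repo.git"))
```

The same pattern applies to personal accounts; the first path segment is simply a user name instead of an organization name.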

For a real company this is essential. For a solo tutorial it's overkill — but it mirrors professional practice.


Command Reference

| Command | Description |
|---|---|
| git --version | Check your installed Git version |
| git help <command> | Open the manual page for a command |
| git help -a | List all available Git commands |

These are the only commands for this module. We'll start using Git properly in Module 2.
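
If you ever need to check the installed version from a script, something like the following sketch would do. It runs git --version only when Git is found on your PATH and parses the usual "git version X.Y.Z" output; parse_git_version is our own helper name, and the exact output format can vary slightly between platforms.

```python
import re
import shutil
import subprocess


def parse_git_version(output):
    """Extract (major, minor, patch) from text like 'git version 2.43.0'."""
    match = re.search(r"git version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError(f"unrecognized version output: {output!r}")
    # A missing patch number is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


if shutil.which("git"):  # only call Git when it is actually installed
    result = subprocess.run(["git", "--version"], capture_output=True, text=True)
    if result.returncode == 0:
        print(parse_git_version(result.stdout))
else:
    print("git is not on PATH")
```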


Hands-On Lab: The Problem Git Solves

This lab deliberately does not use Git — it demonstrates the problem that Git exists to solve.

Setup

Open a terminal and create a project directory:

mkdir ~/git-course-lab1
cd ~/git-course-lab1

Step 1: Create a "project"

# The quoted 'EOF' stops the shell from interpreting ! or $ inside the file
cat > app.py << 'EOF'
def greet(name):
    return f'Hello, {name}!'

print(greet('World'))
EOF

Step 2: Manual versioning (the bad old days)

Your app works. Let's "save" a version:

cp app.py app_v1.py

Now make a change:

cat > app.py << 'EOF'
def greet(name, greeting="Hello"):
    return f'{greeting}, {name}!'
 
print(greet('World'))
print(greet('World', 'Howdy'))
EOF

Save another version:

cp app.py app_v2.py

Step 3: Simulate a teammate

Your teammate (you, in another terminal) makes a conflicting change. Create their version:

cat > app_teammate.py << 'EOF'
def greet(name):
    return f'Hello, {name}!'
 
def farewell(name):
    return f'Goodbye, {name}!'
 
print(greet('World'))
print(farewell('World'))
EOF

Step 4: Try to merge manually

Now combine your v2 changes (custom greeting) with your teammate's changes (farewell function). You need both features in app.py.

Try it. Open app.py in an editor and combine them.

Checkpoint: Your merged app.py should look something like:

def greet(name, greeting="Hello"):
    return f'{greeting}, {name}!'
 
def farewell(name):
    return f'Goodbye, {name}!'
 
print(greet('World'))
print(greet('World', 'Howdy'))
print(farewell('World'))
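
What you just did by hand relies on knowing the common starting point (app_v1.py) and working out what each person changed relative to it. The sketch below does a crude version of that comparison in Python. It only compares whole lines against the base and is not a real merge algorithm; the helper names added and removed are our own.

```python
base = """def greet(name):
    return f'Hello, {name}!'

print(greet('World'))
""".splitlines()

mine = """def greet(name, greeting="Hello"):
    return f'{greeting}, {name}!'

print(greet('World'))
print(greet('World', 'Howdy'))
""".splitlines()

theirs = """def greet(name):
    return f'Hello, {name}!'

def farewell(name):
    return f'Goodbye, {name}!'

print(greet('World'))
print(farewell('World'))
""".splitlines()


def added(side):
    """Lines present in one side but not in the common base."""
    return [line for line in side if line not in base]


def removed(side):
    """Base lines that one side no longer contains."""
    return [line for line in base if line not in side]


print("I changed:   ", removed(mine), "->", added(mine))
print("They changed:", removed(theirs), "->", added(theirs))

# If both sides had replaced the same base lines, a human would have to decide.
overlap = set(removed(mine)) & set(removed(theirs))
print("Conflicting base lines:", sorted(overlap))
```

Here the two sets of changes touch different base lines, which is why combining them by hand was possible. Without the base version, you could not tell who changed what.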

Step 5: Feel the pain

ls ~/git-course-lab1/

You should see:

app.py  app_teammate.py  app_v1.py  app_v2.py

Now imagine this with 50 files, 10 developers, and 6 months of history.

Questions to consider:

  • Which version was deployed to production?
  • When did the farewell function get added? By whom?
  • What did app.py look like three weeks ago?
  • Can you undo just the "custom greeting" change without losing the "farewell" feature?

With manual file copying, the answers range from "difficult" to "impossible." This is exactly what Git solves.

Step 6: Observe Git's answer (preview)

Install Git if you haven't already (we'll cover installation fully in Module 2), and try:

cd ~/git-course-lab1
git init
git add app.py
git commit -m "Initial version with greet and farewell"

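Two commands (git add, then git commit) record a snapshot. The full history is preserved internally, so the _v1, _v2, and _teammate copies are no longer needed and can be deleted. If git commit stops with "Please tell me who you are", set your identity first with git config --global user.name "Your Name" and git config --global user.email "you@example.com", then run the commit again.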

Challenge

Create a more complex scenario: a project with 5 files, 3 "versions," and 2 "teammates" making conflicting changes. Try to manage it with file copying. Write down every question that comes up that you can't answer — those are exactly the problems Git was designed to solve.

Cleanup

rm -rf ~/git-course-lab1

Common Pitfalls & Troubleshooting

| Pitfall | Explanation |
|---|---|
| "I don't need version control for solo projects" | Even solo, you'll want to undo mistakes, experiment on branches, and keep history. The habit of using Git for everything (scripts, config files, notes) pays off. |
| Confusing Git with GitHub | Git is the tool. GitHub is a hosting service. You can use Git without GitHub, and Git repositories can be hosted elsewhere (GitLab, Bitbucket, or your own server). |
| Thinking the server is "the real" repository | In a distributed VCS, every clone is a complete repository. The server is just an agreed-upon synchronization point; technically it is no more authoritative than your local copy. |
| "I'll learn Git later when I need it" | You need it now. Nearly every team uses it. Learning it under pressure during your first job (or first open-source contribution) is stressful. Learn it before you need it. |

Pro Tips

  1. Git isn't just for code. Writers track books in Git. Lawyers track contract revisions. Scientists track datasets and papers. If it's text and it changes over time, Git can track it.

  2. Think in snapshots, not files. The mental shift from "tracking file changes" to "recording project snapshots" is fundamental to understanding Git. Every commit is a complete picture of your entire project at one point in time.

  3. The server is optional. You can use Git entirely locally, with no GitHub, no internet, no server. This is useful for personal projects, experimentation, or working in air-gapped environments.

  4. Learn the internals. Most developers treat Git as a black box and then panic when something goes wrong. Modules 3 and 4 will show you what's actually happening under the hood. Once you understand the object model and the commit graph, Git stops being scary.

  5. Branch naming convention matters early. Even before you learn branching (Module 6), know that teams typically use prefixes like feature/, bugfix/, hotfix/, and release/ to keep branches organized. Start this habit from day one.


Quiz / Self-Assessment

1. What is the difference between an SCM and a VCS?

Answer
They're the same thing — different names for the same category of tools. SCM = Source Control Management, VCS = Version Control System. Git's website is git-scm.com.

2. Name two critical problems with centralized version control systems.

Answer
(1) Single point of failure — if the server goes down, nobody can commit or view history. (2) Requires network connectivity for most operations — you can't work offline.

3. In a distributed VCS like Git, what does every developer have?

Answer
A complete clone of the entire repository, including the full history of all branches. Every copy is equally powerful.

4. Who created Git and why?

Answer
Linus Torvalds created Git in 2005 after BitKeeper (a proprietary VCS used for Linux kernel development) revoked its free license. He needed a fast, distributed, reliable tool to manage the Linux kernel source code.

5. What VCS was created just 12 days after Git?

Answer
Mercurial (Hg), created by Matt Mackall. From a user perspective, Git and Mercurial are very similar, though they differ in implementation philosophy.

6. Does Git store deltas (differences between versions) or snapshots?

Answer
Snapshots. Every commit records the complete state of every file. Unchanged files are stored as pointers to previously stored identical content, not as deltas.

7. Is a server required to use Git?

Answer
No. A server is entirely optional. Git works perfectly on a single machine with no network. In practice, most teams use a server (like GitHub) as a synchronization hub, but it's a choice, not a requirement.

8. What is a mono-repo?

Answer
A repository structure where multiple projects (frontend, backend, libraries, etc.) all live in a single Git repository. The alternative is multi-repo, where each project has its own repository. Both approaches have trade-offs.

9. What is the most common topology for teams using Git?

Answer
Hub-and-spoke: every developer has a complete clone, and a central server (e.g., GitHub) acts as the synchronization point. All pushes and pulls go through the server. Developers don't sync directly with each other.

10. What does "content-addressable filesystem" mean in the context of Git?

Answer
Every piece of data in Git is identified by a cryptographic hash (SHA-1) of its contents. The hash IS the address. This means identical content always has the same hash, and any corruption is detectable because the data won't match its hash.