Introduction to Git: History of Version Control
Welcome to our Git tutorial! In this series, we'll go from basic Git commands to more advanced topics. Before we dive into Git, let's talk a bit about history. If you already have Git installed and are ready to learn, you can jump directly to Git Fundamentals.
Why Version Control?
When working on any project that involves revisions, it's imperative that you keep track of the changes you make. As a programmer, you'll need to pinpoint where a bug was introduced by sifting through earlier versions of your code. As a data scientist, you'll need to keep track of each dataset and analysis script used to obtain your results.
But how can we handle version control? Here are just some of the naive ways we could do it:
- Keeping a separately named copy of a file, such as textfile-version1.txt, after each major edit. This is neither the cleanest nor the most efficient way to organize the files in your directory.
- Compressing and backing up our files frequently. Storing these backup files can eat up a ton of hard drive space.
- Copying files into specific time-stamped folders. This can be error-prone and time-consuming.
To share files, we could use a file-sharing platform such as Dropbox, but this too has its drawbacks. Files can go missing and edit conflicts may occur. As great as cloud-sharing services are, they are not designed specifically for collaboration and version control.
Developing the Linux Kernel
The Linux kernel project was started by Linus Torvalds in 1991 as an open-source, collaborative project. The aim was to develop a working operating system kernel released under the GNU General Public License. Since the project included contributors from all over the world, it required a system that could track all edits and resolve any conflicts.
Linus and his team had been using a proprietary Distributed Version Control System (DVCS) known as BitKeeper. Although BitKeeper had been free of charge for the kernel developers for years, BitMover, the company run by Larry McVoy, withdrew the free version in 2005. Linus first looked at other free version control systems but found none good enough for his project, so in 2005 he decided to create his own. This new version control system came to be known as Git.
Git is British slang for "an unpleasant person." Torvalds said, "I'm an egotistical bastard, and I name all my projects after myself. First 'Linux', now 'git'."
Git was designed with the following goals in mind:
- Support a distributed workflow. Each user has their own copy of the entire code base, making it possible to make changes without an Internet connection. We'll see later why this is so advantageous.
- Safeguard against malicious attacks or accidents. Everything that is committed is checksummed before it is stored, which makes it nearly impossible to change contents without Git knowing about it. Furthermore, since Git operations almost always add data rather than remove it, it's difficult to lose data or do something that cannot be undone.
- Support for non-linear development. Contributors can create a branch to keep their changes separate from the main (production) branch. When all edits on the branch are complete, they can merge them into the main branch.
- Speed. Running Git subcommands should be quick and efficient.
- Local. Git's core operations do not require an Internet connection, making development fast and convenient. Even browsing the entire history of a project can be done locally.
- Free. Git should be free software distributed under the GNU General Public License.
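To make the checksumming goal concrete, here is a small sketch, assuming Git and the sha1sum tool from GNU coreutils are installed (on macOS, shasum can stand in for sha1sum). With Git's default SHA-1 object format, a file's contents are named by the SHA-1 hash of a short header (the word blob, the content size in bytes, and a NUL byte) followed by the content itself, so changing even one byte changes the name.

```shell
#!/bin/sh
# Ask Git for the object ID it would give a file containing "hello\n".
# Without -w, hash-object only computes the checksum; nothing is stored.
printf 'hello\n' | git hash-object --stdin

# Recompute the same ID by hand: SHA-1 over "blob <size>\0<content>".
# "hello\n" is 6 bytes long. This assumes Git's default SHA-1 object format.
printf 'blob 6\0hello\n' | sha1sum | cut -d' ' -f1
```

Both commands print the same 40-character hexadecimal ID. Change the text by a single character and both IDs change, which is how Git notices when committed content has been altered.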
How Version Control Systems Work
To fully appreciate Git's features, let's go over the three main types of version control systems.
1) Local Data Models
In the earliest form of version control, all developers worked on the same filesystem, so a multi-user operating system such as UNIX was needed to let several users share the same directory tree.
In one of these models, the Revision Control System (RCS), the most recent version of each file is stored in full, together with a set of patches, one per change. By applying these patches in sequence, users can recreate a file as it was at any earlier point in time.
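The patch-based storage idea can be illustrated with the standard diff and patch tools. This is a sketch of the concept only, not RCS itself: it assumes diff and patch are installed, and the file names are hypothetical. It keeps the latest revision in full plus one delta, then rebuilds the older revision from them.

```shell
#!/bin/sh
# Illustration of the reverse-delta idea only; this uses diff and patch, not RCS.
work=$(mktemp -d)
cd "$work" || exit 1

# Two revisions of a hypothetical file: an older one and the latest one.
printf 'line one\nline two\n' > v1.txt
printf 'line one\nline two, edited\nline three\n' > v2.txt

# Store the latest revision in full, plus a delta that turns it back into v1.
# diff exits with status 1 when the files differ, so "|| true" keeps that
# expected status from being treated as an error.
diff v2.txt v1.txt > v2-to-v1.diff || true
rm v1.txt

# Recreate the older revision by applying the stored delta to the latest copy.
patch -s -o v1-restored.txt v2.txt v2-to-v1.diff
cat v1-restored.txt
```

With one delta per change, going further back in history simply means applying more deltas in turn.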
2) Centralized Version Control Systems
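Although a local data model could work, it became difficult to manage once collaborators from around the world joined projects. Thus, Centralized Version Control Systems (CVCS, also known as client-server models) were developed. Here, the entire repository lived on a server, and developers checked out the files they wanted to edit. Once they were done with their edits, they would commit them back to the main server. The drawback is that the server is a single point of failure: developers need a connection to it to commit or view history, and if it fails without a backup, the project's history can be lost.
3) Distributed Version Control Systems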
To address these drawbacks, developers came up with Distributed Version Control Systems (DVCS), in which every contributor keeps a full copy of the repository, including its entire history, on their own system. This makes version control operations fast and convenient, since no Internet connection is needed to edit files, commit changes, or browse history. Furthermore, if the main repository goes down or is deleted, it can be restored from any contributor's local copy.
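The claim that any local copy can restore a lost main repository is easy to try out. In the sketch below, a bare repository in a temporary directory stands in for the central server; the names server.git and alice and the commit identity are hypothetical, and everything happens inside that temporary directory.

```shell
#!/bin/sh
# Everything happens inside a fresh temporary directory.
demo=$(mktemp -d)
cd "$demo" || exit 1

# A bare repository plays the role of the central server.
git init --quiet --bare server.git

# A contributor clones it. Git warns that the repository is empty,
# so that message is discarded here.
git clone --quiet server.git alice 2>/dev/null
cd alice || exit 1
echo "first line" > notes.txt
git add notes.txt
git -c user.name="Alice" -c user.email="alice@example.com" \
    commit --quiet -m "Add notes"
git push --quiet origin HEAD
cd ..

# Simulate losing the server entirely.
rm -rf server.git

# The clone still holds the full history, so it can be browsed offline...
git -C alice log --oneline

# ...and the server can be rebuilt from it.
git clone --quiet --bare alice server.git
git -C server.git log --oneline
```

Both log commands show the same commit, because the clone carried the complete history rather than just the latest files.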
You'll see that Git falls into the category of DVCS. Now let's install Git and get started!
In this tutorial series, we'll be using Git via the command line. If you're not familiar with the command line, or are intimidated by the thought of using a non-GUI interface, check out our Beginner's Guide to the Linux Command Line. Be sure to understand at least the Basic and Intermediate commands sections before moving on.
Installing Git is simple. You can build it from source, run an installer downloaded from the web, or use a package manager.
Windows Installation
On Windows, download the installer from the official Git website (git-scm.com) and follow the instructions.
Mac OS X Installation
If you use Homebrew (a package manager for macOS), run:
$ brew install git
Linux Installation
On Linux, install Git through your distribution's package manager. On Debian-based distributions such as Ubuntu, run:
$ sudo apt-get install git
If your distribution uses yum as its package manager, run:
$ sudo yum install git
Configuring Git's Global Settings
To check that Git is successfully installed, run the following command:
$ git --version
git version 2.2.1
The git command
The git command works with a variety of subcommands. Simply pass the subcommand after git, as in git config or git help.
Let's start by setting global configuration parameters. Running the config subcommand with no arguments prints a summary of its options (the exact output varies between Git versions):
$ git config
usage: git config [options]

Config file location
    --global              use global config file
    --system              use system config file
    --local               use repository config file
    -f, --file <file>     use given config file
    --blob <blob-id>      read config from given blob object

Action
    --get                 get value: name [value-regex]
    --get-all             get all values: key [value-regex]
    --get-regexp          get values for regexp: name-regex [value-regex]
    --get-urlmatch        get value specific for the URL: section[.var] URL
    --replace-all         replace all matching variables: name value [value_regex]
    --add                 add a new variable: name value
    --unset               remove a variable: name [value-regex]
    --unset-all           remove all matches: name [value-regex]
    --rename-section      rename section: old-name new-name
    --remove-section      remove a section: name
    -l, --list            list all
    -e, --edit            open an editor
    --get-color <slot>    find the color configured: [default]
    --get-colorbool <slot>
                          find the color setting: [stdout-is-tty]

Type
    --bool                value is "true" or "false"
    --int                 value is decimal number
    --bool-or-int         value is --bool or --int
    --path                value is a path (file or directory name)

Other
    -z, --null            terminate values with NUL byte
    --includes            respect include directives on lookup
System-wide Git settings live in /etc/gitconfig, while settings specific to your user live in ~/.gitconfig. As we'll see shortly, we can edit the settings in ~/.gitconfig by passing the --global option to git config. For example:
$ cat ~/.gitconfig
[user]
	email = email@example.com
	name = John Doe
Setting your username and email
Set your name and email address. Git records them in every commit you make, so others can see who made each change.
$ git config --global user.name "John Doe"
$ git config --global user.email email@example.com
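If you'd like to see what these commands do without touching your real settings, you can point git config at a scratch file with the --file option listed in the usage summary above. This is a sketch: the temporary file merely stands in for ~/.gitconfig.

```shell
#!/bin/sh
# A scratch file stands in for ~/.gitconfig, so your real settings are untouched.
cfg=$(mktemp)

git config --file "$cfg" user.name "John Doe"
git config --file "$cfg" user.email email@example.com

# Read one value back; this prints John Doe.
git config --file "$cfg" user.name

# List every setting in the file as name=value pairs.
git config --file "$cfg" --list

# The file itself uses the same [user] section format shown earlier.
cat "$cfg"
```

Once you're happy with the result, the same commands with --global in place of --file "$cfg" write to ~/.gitconfig.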
Setting your default editor
When using Git, you'll sometimes need to enter a message describing the changes you are making to the repository. Rather than typing these messages on the command line, you can have Git open a text editor for you.
By default, Git uses the editor named in your VISUAL or EDITOR environment variable, falling back to vi if neither is set. If you prefer Emacs or another editor, you can set it with:
$ git config --global core.editor emacs
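To check which editor Git will actually launch, git var GIT_EDITOR prints it after applying Git's precedence rules: the GIT_EDITOR environment variable first, then core.editor, then VISUAL and EDITOR. In this sketch, -c sets core.editor for a single command only, so your saved configuration is not changed; emacs and nano are just example values.

```shell
#!/bin/sh
# core.editor is set for this one command only (-c), so nothing is saved.
# Prints emacs, unless GIT_EDITOR is already set in your environment.
git -c core.editor=emacs var GIT_EDITOR

# The GIT_EDITOR environment variable outranks core.editor: this prints nano.
GIT_EDITOR=nano git -c core.editor=emacs var GIT_EDITOR
```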
Setting a merge tool
Git can launch an external tool, through the git mergetool command, to help you resolve merge conflicts. If you use Vim, you can choose vimdiff:
$ git config --global merge.tool vimdiff
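To see the kind of situation a merge tool is for, the sketch below creates a throwaway repository, changes the same line on two branches, and merges them, which produces a conflict. The file name, branch name, and identity are hypothetical, and the conflict markers assume Git's default merge.conflictStyle.

```shell
#!/bin/sh
# A throwaway repository in a temporary directory.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init --quiet
# Identity for this repository only (no --global), so your settings are untouched.
git config user.name "John Doe"
git config user.email email@example.com

echo "color = blue" > settings.txt
git add settings.txt
git commit --quiet -m "Initial settings"
main=$(git symbolic-ref --short HEAD)   # master or main, depending on your setup

# Change the same line differently on two branches.
git checkout --quiet -b feature
echo "color = green" > settings.txt
git commit --quiet -am "Use green"
git checkout --quiet "$main"
echo "color = red" > settings.txt
git commit --quiet -am "Use red"

# The merge stops with a conflict (exit status 1), which is expected here.
git merge feature || true

# settings.txt now holds both versions between <<<<<<< and >>>>>>> markers.
cat settings.txt
# List the files that still need resolving.
git diff --name-only --diff-filter=U
```

At this point, running git mergetool would open settings.txt in vimdiff so you can choose between the two versions; after saving, you would finish the merge with git commit.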
Turning on Color
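To have Git color its output in the terminal, use the following command. (Git 1.8.4 and later already do this by default, so on a modern installation this setting simply makes it explicit.)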
$ git config --global color.ui true
In git diff output, for example, deleted lines then appear in red and added lines in green.
Check your configurations
To list all of the configuration settings Git is currently using, run:
$ git config --list
Git has many subcommands. For example, the add subcommand is used to track files in your repository (we'll cover this in more detail later). If you get lost along the way, run git help followed by the name of the subcommand:
$ git help add
NAME
       git-add - Add file contents to the index

SYNOPSIS
       git add [-n] [-v] [--force | -f] [--interactive | -i] [--patch | -p]
                 [--edit | -e] [--[no-]all | --[no-]ignore-removal | [--update | -u]]
                 [--intent-to-add | -N] [--refresh] [--ignore-errors] [--ignore-missing]
                 [--] [<pathspec>...]

DESCRIPTION
       This command updates the index using the current content found in the working tree, to prepare the content staged for the next commit. It typically adds the current content of existing paths as a whole, but with some options it can also be used to add content with only part of the changes made to the working tree files applied, or remove paths that do not exist in the working tree anymore.
And don't forget Git's own man page:
$ man git