##Summary
Git is a distributed version control system that runs locally on your machine. GitHub is one of many services that host your git repositories remotely.
###Tools
Tig - curses interface for git available at: https://github.com/jonas/tig/
####How Git works
Git treats its data as a set of snapshots of a mini filesystem. Every time you commit and save your project, it basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot.
If a file has not changed, git does not store the file again (it just links to the previous identical file it has already stored). This differs from some other systems (like mercurial) that store the deltas between saves (which saves space).
A good video is here: https://www.youtube.com/watch?v=1ffBJ4sVUb4
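As a rough illustration of the snapshot idea, the sketch below makes two commits in a throwaway repository and compares the blob IDs git recorded for each file. The file names and the identity values are placeholders chosen for this example; an unchanged file should keep the same blob ID in both snapshots, while a changed file gets a new one.

```shell
# Scratch repository in a temporary directory (nothing here touches your real repos)
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "stable content" > unchanged.txt
echo "version 1" > changing.txt
git add .
git commit -q -m "First snapshot"

echo "version 2" > changing.txt
git commit -q -a -m "Second snapshot"

# Ask each snapshot which blob it stores for each file
unchanged_before=$(git rev-parse HEAD~1:unchanged.txt)
unchanged_after=$(git rev-parse HEAD:unchanged.txt)
changing_before=$(git rev-parse HEAD~1:changing.txt)
changing_after=$(git rev-parse HEAD:changing.txt)

echo "unchanged.txt: $unchanged_before -> $unchanged_after"
echo "changing.txt:  $changing_before -> $changing_after"
```

The two IDs printed for `unchanged.txt` match because both snapshots point at the same stored object.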
####Stages of Git
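To see your current status and branch, run `git status`. This tells you the state (tracked or untracked) of each file.
The three main stages are:
- Modified: you have changed the file but have not committed it yet
- Staged: you have marked a modified file to go into your next commit snapshot
- Committed: the data is safely stored in your local database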
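One way to watch a file move between these states is to read the short codes from `git status --porcelain` after each step. This is a small sketch in a scratch repository; the file name `notes.txt` and the identity values are made up for the example.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "hello" > notes.txt
untracked=$(git status --porcelain)   # new file git does not know about yet

git add notes.txt
staged=$(git status --porcelain)      # marked to go into the next commit

git commit -q -m "Add notes"
committed=$(git status --porcelain)   # nothing to report once committed

echo "more" >> notes.txt
modified=$(git status --porcelain)    # tracked file changed, not staged

printf '%s\n' "untracked: [$untracked]" "staged: [$staged]" \
  "committed: [$committed]" "modified: [$modified]"
```

In the porcelain format the first column describes the staging area and the second the working directory, which is why a staged file and a modified file differ only in where the letter sits.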
####Setup
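- Download git and optionally sign up for GitHub.
- Check your current configuration with `git config --list`
- Set your name with `git config --global user.name "William Liu"`
- Set your email with `git config --global user.email "william.q.liu@gmail.com"`
- Generate an SSH key with `ssh-keygen -t rsa -C "william.q.liu@gmail.com"` (the comment flag is an uppercase `-C`)
- By default the key is saved in your user's `.ssh` folder, e.g. `C:\Users\wliu\.ssh` on Windows or under Mac HD > Users > williamliu on a Mac
- With the default name of `id_rsa`, you get two files:
  - `id_rsa` is the private half of the key (keep this secret)
  - `id_rsa.pub` is the public half of the key (free to give away)

####Basic Workflow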
So how does Git work in the real world?
To get started, you can either create an empty project or copy an existing repository from another server (like GitHub).
If you want to create a new project, you can do the following:
- `git init` initializes a new project directory
- This creates a `.git` subdirectory that has all your necessary repository files
- Add your starting files (e.g. your `.gitignore` file, a file that says what files to ignore) and a License

####Clone an existing repository
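You can clone an existing repository from another server (like GitHub) using the command: `git clone https://github.com/WilliamQLiu/myrepo.git`.
- Switch from the `https` URL to the `git@github.com:WilliamQLiu/myrepo.git` form if you want to use SSH to transfer
- To clone into a differently named directory, add the name (e.g. `mynewrepo`) after `myrepo.git`

####Add and Commit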
git add . # Adds all files or specify the specific files
git commit -m "This is a git message for the commit" # Commit with quick message
git commit -a -m "Made a change" # Stages every modified or deleted tracked file, then commits (new files still need git add)
git commit # Pulls up your editor and lets you make your commit message and description
git commit --amend # Modify the most recent commit instead of creating an additional one; be careful, it replaces that commit with an entirely new commit
Note that `git commit --amend` is commonly used to add a few forgotten files to the last commit or to modify its commit message. Don’t amend if you’re on a published branch that others are working off of, because you’ll rewrite history!
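To see why amending counts as rewriting history, the sketch below commits one file, then amends the commit to include a forgotten second file. The file names and identity are placeholders. The commit count stays at one, but the commit ID changes, which is what causes trouble for anyone who already pulled the old commit.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "print('hi')" > app.py
git add app.py
git commit -q -m "Add app"
before=$(git rev-parse HEAD)

# Forgot the README: stage it and fold it into the previous commit
echo "usage notes" > README.md
git add README.md
git commit -q --amend -m "Add app and README"
after=$(git rev-parse HEAD)

count=$(git rev-list --count HEAD)
files=$(git ls-tree --name-only HEAD)

echo "commit id before amend: $before"
echo "commit id after amend:  $after"
echo "commits on branch: $count"
```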
####Commit Message
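Try to follow these rules for making a good git commit message: https://chris.beams.io/posts/git-commit/ The key things are:
- Separate the subject from the body with a blank line
- Limit the subject line to 50 characters
- Capitalize the subject line
- Do not end the subject line with a period
- Use the imperative mood in the subject line
- Wrap the body at 72 characters
- Use the body to explain what and why vs. how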
####Create Branch, Checkout Branch
git branch features # Create a branch called features
git checkout features # Check out a branch named features
git checkout master # Check out the master branch
- `git fetch` is used to update your remote-tracking branches; it downloads new data from a remote repository, but it doesn’t integrate any of the new data into your working files
- Run it as `git fetch origin` or, for any other remote, `git fetch <remote>`
- `git pull` does a `git fetch` followed by a `git merge`
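The difference between fetching and pulling can be tried out locally without a network. In this sketch a directory named `server` stands in for a remote host such as GitHub and `mine` is our clone; both names, the file name, and the identity values are assumptions for the example.

```shell
work=$(mktemp -d) && cd "$work"
# Identity for the throwaway commits below (placeholder values)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

# "server" plays the role of the remote repository
git init -q server
(cd server && echo "v1" > file.txt && git add file.txt && git commit -q -m "v1")

# Our local copy; the clone names its source "origin"
git clone -q server mine

# Someone else adds a commit on the remote
(cd server && echo "v2" > file.txt && git commit -q -a -m "v2")

cd mine
branch=$(git rev-parse --abbrev-ref HEAD)

git fetch -q origin
after_fetch=$(cat file.txt)                        # working file is still the old one
remote_copy=$(git show "origin/$branch:file.txt")  # new data sits on the remote-tracking branch

git pull -q                                        # fetch + merge
after_pull=$(cat file.txt)

echo "after fetch: $after_fetch | origin/$branch: $remote_copy | after pull: $after_pull"
```

After the fetch, only the remote-tracking branch knows about the new commit; the working file changes once the pull merges it.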
####Merging changes back to master branch
git pull # Fetch and merge changes on the remote server to your working dir
git merge features # Merge a different branch into your active branch
git diff # View all the merge conflicts
git reset --hard # Undo a bad merge that has not been committed yet (this also discards any other uncommitted changes)
git push # push changes back to a remote repository (e.g. on GitHub)
git push --force # use your copy, don't care about everything else
git push --force-with-lease # better than --force, checks remote branch hasn't been updated
Delete a local branch with: `git branch -d mybranch`
Force delete a local branch (e.g. unmerged to master): `git branch -D mybranch`
Delete a remote branch with: `git push origin --delete mybranch`
`HEAD` is the name Git uses to refer to “where your file system is pointing right now”. Usually `HEAD` points to a named branch, but it doesn’t always have to.
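A quick way to check whether `HEAD` currently points at a branch is `git symbolic-ref`, which fails when `HEAD` is detached. The sketch below uses a scratch repository with two commits; the file name and identity are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "a" > file.txt && git add file.txt && git commit -q -m "A"
echo "b" > file.txt && git commit -q -a -m "B"
git branch -M master                        # make sure the branch is called master

on_branch=$(git symbolic-ref --short HEAD)  # HEAD points at a named branch

# Checking out a commit directly (not a branch) detaches HEAD
git checkout -q HEAD~1
if git symbolic-ref -q HEAD > /dev/null; then
  detached=no
else
  detached=yes
fi

# Checking out the branch again re-attaches HEAD
git checkout -q master
back_on=$(git symbolic-ref --short HEAD)

echo "started on: $on_branch, detached after checkout of a commit: $detached, ended on: $back_on"
```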
Scenario:
- `HEAD` is pointing to `master`.
- If we look back at commit `B` and need to change the history, we could normally just make a new commit with the changes.
- However, since we’re practicing, we’ll `git checkout B`. `HEAD` will now be at `B`, in a detached `HEAD` state.
- Make the fix and commit it on a temporary branch (called `temporarybranch` here), then `git checkout master`.
- `git rebase temporarybranch` to replay commits C and D on top of our new B.

####Stashing
git stash # hide your current changes on branch
git stash pop # get your hidden changes on branch
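As a small sketch of the round trip, the commands below stash an uncommitted edit, confirm the working directory is clean, and then bring the edit back. The file name and identity are placeholders for the example.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

echo "work in progress" >> notes.txt       # uncommitted change

git stash -q
while_stashed=$(git status --porcelain)    # empty: the change is hidden away

git stash pop -q
after_pop=$(git status --porcelain)        # the change is back in the working directory

echo "while stashed: [$while_stashed]"
echo "after pop:     [$after_pop]"
```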
##Undo with Reset, Checkout, and Revert
To undo changes to your repository, you first need to think about the scope of what you want to change and then what command to use.
####Undo Scope
In a Git repository, we have the following components:
- the working directory
- the staged snapshot
- the commit history
####Undo Commands
With these git commands, you can pass in the above component (e.g. the working directory) as a parameter (e.g. `--soft`, `--mixed`, `--hard`) and that determines the scope of the undo.
- `git reset` - moves the tip of a branch to a different commit; this is used to remove commits from the current branch (e.g. to go back two commits on `hotfix`, run `git checkout hotfix` and then `git reset HEAD~2`). We end up throwing away these last two commits.
- `git checkout` - moves `HEAD` to a different branch and updates the working directory to match. If there are any differences, you have to commit or stash any changes in the working directory first.
- `git revert` - undoes a commit by creating a new commit. This is safe since it does not re-write the commit history.

####Undo Parameters
Again, when you do an undo, you can pass in an optional parameter with your command to specify the scope of the change. To be safe, only use `HEAD` as the parameter.
For example, with a `git reset`, we have:
- `--soft` means we reset only the commit history (code in the working directory and staged snapshot is untouched)
- `--mixed` means we reset the staged snapshot and the commit history (code in the working directory is untouched)
- `--hard` means we reset everything

`git revert` is the only one that does not have a file-level counterpart.
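The three reset modes can be compared side by side with a sketch like the one below. It builds the same two-commit scratch repository three times (the helper `make_repo`, the file name, and the identity are inventions for this example), resets back one commit with each flag, and records what `git status --porcelain` reports afterwards.

```shell
# Hypothetical helper: fresh scratch repo with two commits ("one", then "two")
make_repo() {
  dir=$(mktemp -d) && cd "$dir"
  git init -q
  git config user.name "Example User"      # placeholder identity
  git config user.email "user@example.com"
  echo "one" > file.txt && git add file.txt && git commit -q -m "one"
  echo "two" > file.txt && git commit -q -a -m "two"
}

make_repo
git reset -q --soft HEAD~1
soft=$(git status --porcelain)     # change from "two" is still staged

make_repo
git reset -q --mixed HEAD~1
mixed=$(git status --porcelain)    # change is unstaged but still in the file

make_repo
git reset -q --hard HEAD~1
hard=$(git status --porcelain)     # nothing left to report
hard_content=$(cat file.txt)       # file is back to the first commit

echo "soft:  [$soft]"
echo "mixed: [$mixed]"
echo "hard:  [$hard] (file contains: $hard_content)"
```

Only `--hard` changes the file on disk, which is why it is the one that can lose work.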
####Undo Example
Say we have a previous commit that deleted a lot of files you needed. You can go back and grab those files.
git checkout -b RESTOREFILES
git log | grep -B 10 -A 10 whateveryourelookingfor
git checkout dfdjkf343jkdfsakljflsafjllkds3~1 mydir/myfile1.py mydir/myfile2.py
git commit -m "Restored files"
The `~1` means to take the previous commit.
The `-B 10` and `-A 10` in grep mean to show the 10 lines before and after each match.
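The restore steps above can be rehearsed in a scratch repository. In this sketch the deleting commit is located with `git log --diff-filter=D` rather than by grepping the log, which is a substitution made for this example; the paths and identity are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

mkdir mydir
echo "print(1)" > mydir/myfile1.py
git add mydir
git commit -q -m "Add file"

git rm -q mydir/myfile1.py
git commit -q -m "Delete file we actually needed"

# Find the commit that deleted the file
bad_commit=$(git log --diff-filter=D -1 --format=%H -- mydir/myfile1.py)

# Restore the file as it was one commit before the deletion
git checkout -q -b RESTOREFILES
git checkout "$bad_commit~1" -- mydir/myfile1.py
git commit -q -m "Restored files"

restored=$(cat mydir/myfile1.py)
echo "restored content: $restored"
```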
BFG Repo-Cleaner is a tool to remove large files (e.g. blobs bigger than 1M) or bad data (e.g. passwords, credentials, private data); it is faster and easier to use than the `git-filter-branch` command.
####Remove passwords and secret information
1. Clone a fresh copy of your repo using the `--mirror` flag: e.g. `git clone --mirror https://github.com/WilliamQLiu/reponame.git`
2. Download BFG Repo-Cleaner. You will get a file that looks like this: `bfg-1.12.3.jar`. If you are on a mac, just set it up with homebrew using `brew install bfg` and then you can use the `bfg` command.
3. Create a `passwords.txt` file and add in all the data you want to remove (e.g. mypassword, ‘mypassword’)
4. `java -jar bfg-1.12.3.jar --replace-text passwords.txt`
5. `git reflog expire --expire=now --all && git gc --prune=now --aggressive`
6. `git push`
Funny story: When I first ran this, I accidentally put the passwords.txt file in Git. I had to rerun BFG to remove the passwords file. Oops.
####Remove a file
- `bfg --delete-files myfile.txt` (files from earlier step)
- `git reflog expire --expire=now --all && git gc --prune=now --aggressive` when you’re complete with all the files from Option 3

Once you start working with other people, you may want to make suggestions to their code or they might have suggestions for your code. This is called a pull request and involves these steps:
##Rebase
Rebasing (`git rebase`) is an alternative to merging (`git merge`), but does this in a destructive manner (as opposed to merging’s non-destructive operation).
What this means is that if you work on a feature branch, merge ties together the histories of both branches. The advantage is that this is non-destructive, but the issue is that we can have a polluted feature branch history if there were a lot of commits in the master branch (which makes it hard for other developers to understand the history of the project). If this happens, look into `git log` options.
With rebase, you can rebase the feature branch to begin on the tip of the master branch. This basically moves the entire feature branch to begin at the end of the master branch. The issue with rebase is that you re-write the project history by creating brand new commits for each commit in the original branch. You get a cleaner project history (linear project history), but this is dangerous. If you have to, consider doing an interactive rebase to alter commits as they are moved to the new branch so you can get complete control over the branch’s commit history. Helpful commands are `pick` and `fixup`. Do NOT use rebase on a public branch (i.e. if someone else might be looking at the branch).
git checkout feature
git rebase -i master
git rebase -i HEAD~n # n is the number of commits you need to access; change ‘pick’ to ‘squash’, leave the top commit as ‘pick’
git fetch origin pull/<pull_id>/head:<branch_name>
git push <wherever> --force-with-lease # don't overwrite others work, e.g. git push origin MYBRANCH --force-with-lease
git rebase --abort # if something messes up
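To make the "brand new commits" point concrete, here is a non-interactive sketch in a scratch repository: `feature` and `master` each gain one commit, then `feature` is rebased onto `master`. The file names and identity are placeholders. The feature commit ends up with a new ID and sits directly on top of the tip of `master`, giving a linear history.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "base" > base.txt && git add base.txt && git commit -q -m "base"
git branch -M master

git checkout -q -b feature
echo "feature" > feature.txt && git add feature.txt && git commit -q -m "feature work"

git checkout -q master
echo "hotfix" > hotfix.txt && git add hotfix.txt && git commit -q -m "hotfix on master"

git checkout -q feature
old_tip=$(git rev-parse HEAD)
git rebase -q master                        # replay feature's commit on top of master
new_tip=$(git rev-parse HEAD)
new_parent=$(git rev-parse HEAD~1)
master_tip=$(git rev-parse master)

echo "feature tip before: $old_tip"
echo "feature tip after:  $new_tip"
echo "parent of feature tip: $new_parent (master is $master_tip)"
```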
Besides using git as a simple save-in-time workflow on a single branch, we have other options depending on project and team size.
The Feature Branch Workflow is a git workflow where all feature development takes place in a dedicated branch instead of the master branch. This means that the master branch will only contain valid code and that work on a particular feature does not disturb the main code.
####GitFlow Workflow
GitFlow is a specific type of workflow for larger projects and is built off of the Feature Branch Workflow. The branch structure is slightly more complicated: it assigns more specific roles to different branches and adds tags around each project release.
Open Source projects normally have a different workflow than your own projects, where you are a regular contributor. For an open source project, you probably need to fork your own copy instead of creating branches and pushing commits to the repository itself. After you make your changes to your fork, you then create a pull request back to the original.
- Original project: https://github.com/apache/incubator-airflow
- Your fork: https://github.com/WilliamQLiu/incubator-airflow
- Clone your fork: `git clone git@github.com:WilliamQLiu/incubator-airflow.git`
- Add the original project as a remote: `git remote add upstream https://github.com/apache/incubator-airflow`
- Check your remotes: `git remote -v`
- `git fetch upstream` to sync up with the original project
- `git merge upstream/master`
####Tilde
The tilde `~` operator is used in git to point to a parent of a commit. For example, `HEAD~` indicates the revision before the last one committed. To move further back, just indicate `HEAD~N` (e.g. `HEAD~3`) to go N (e.g. 3) levels back.
This works great until you run into merges, since merge commits have two parents. The `~` just selects the first one.
####Caret
The caret `^` operator moves to a specific parent of the specified version. You use a number to indicate which parent. For example, `HEAD^2` tells git to select the second parent of the last one committed, not the ‘grandparent’.
You can repeat this multiple times to move back further. `HEAD^2^^` takes you back three levels, selecting the second parent on the first step.
If you don’t give a number, Git assumes 1.
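The two operators are easiest to tell apart on a merge commit. The sketch below builds a small history in a scratch repository (commit A, then B on `features` and C on `master`, then a merge) and resolves a few expressions with `git rev-parse`. Branch names, file names, and the identity are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "a" > a.txt && git add a.txt && git commit -q -m "A"
git branch -M master

git checkout -q -b features
echo "b" > b.txt && git add b.txt && git commit -q -m "B on features"

git checkout -q master
echo "c" > c.txt && git add c.txt && git commit -q -m "C on master"
commit_c=$(git rev-parse HEAD)

# master has moved on, so this creates a merge commit with two parents
git merge -q -m "Merge features" features

tilde_one=$(git rev-parse HEAD~1)       # first parent
caret_default=$(git rev-parse HEAD^)    # no number means ^1, also the first parent
caret_two=$(git rev-parse HEAD^2)       # second parent: the merged-in branch tip
via_second=$(git rev-parse HEAD^2^)     # parent of the second parent (commit A)
via_first=$(git rev-parse HEAD~2)       # grandparent along first parents (also commit A)

echo "HEAD~1=$tilde_one HEAD^=$caret_default HEAD^2=$caret_two"
```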
###Git Commands
Some good commands to know are:
git checkout myotherbranch fileonmyotherbranch.py
git log --all --decorate --oneline --graph
git log --oneline -5 --before "Sat Aug 20 2018"
git blame
git revert
git shortlog
git reflog
git fetch
git merge
git branch
git checkout -d somebranch
The double dot notation is for specifying ranges, e.g.
git log fb6a21154cc5fd2a09bc905ff4745a2b3b4fd4ec..2c3090f55c042c7c23c1f63fc5d764ff1670f4d6
This says show me commits after `fb6`… up to and including `2c30`…
This also says “show me all commits that are included in the second commit that are not included in the first commit”
The triple dot notation is for showing all commits that are reachable from either revision but not from both.
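Both notations can be compared on two branches that have diverged. In this sketch (scratch repository, placeholder branch names, file names, and identity), `master` and `features` share commit A and then each add one commit of their own.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "a" > a.txt && git add a.txt && git commit -q -m "A"
git branch -M master

git checkout -q -b features
echo "b" > b.txt && git add b.txt && git commit -q -m "B"

git checkout -q master
echo "c" > c.txt && git add c.txt && git commit -q -m "C"

# Double dot: commits reachable from features but not from master
two_dot=$(git log --format=%s master..features)

# Triple dot: commits reachable from either branch, but not from both
three_dot=$(git log --format=%s master...features | sort)

echo "master..features:"
echo "$two_dot"
echo "master...features:"
echo "$three_dot"
```

The shared commit A appears in neither listing, since it is reachable from both branches.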