cdnjs git repositories visualization using gource

Gource is a well-known version control visualization tool that supports git, svn, hg and bzr, plus CVS through cvs2cl. I used gource to visualize the CDNJS development history, and here are the videos I uploaded to YouTube.

CDNJS main repository:
(https://github.com/cdnjs/cdnjs)

https://www.youtube.com/watch?v=ehwK-KM4uYQ

CDNJS new-website repository:
(https://github.com/cdnjs/new-website)

https://www.youtube.com/watch?v=GLH7Ovzi5z8


How to sync/update forked git repository with upstream?

Chinese version: https://www.peterdavehello.org/2014/02/update_forked_repository/

I found that the original post in Chinese is more popular than my other posts, so I'm rewriting it in English now :)

Once you fork a repository on GitHub, your fork's commit history stays frozen at the moment you forked it. Sometimes we need to update it so that the parent of our commits isn't so old. That makes the history look better and easier to trace, and prevents some potential or known conflicts; you may have other reasons as well. This method can also sync a repository between different git servers.

NOTE:
If your forked repository is more than 1k commits behind the upstream repository and the repository is large, consider deleting your fork and re-forking the original one; it may be faster and more efficient.

Let’s get started.

First, if you don’t have the repository locally, clone your forked repository. You can set the clone depth to save bandwidth and disk space:
$ git clone --depth 1 https://github.com/user/repo.git

(Replace the URL with that of your forked repository.)

Once it’s cloned, check its ‘remote’ list. Usually you’ll have only one remote after cloning, like this:
$ git remote -v
origin https://github.com/user/repo.git (fetch)
origin https://github.com/user/repo.git (push)

Now we should add another ‘remote’, the original upstream, so that we can pull updates from it. In this case the read-only git protocol is faster and more efficient where the server offers it (but note that some firewalls block that protocol and GitHub has since stopped serving it, so you’ll need to use https in those cases):
$ git remote add upstream git://github.com/otheruser/upstreamRepo.git

PS: ‘upstream’ is just the name I gave it; you can use any other name you like.

To verify the newly added remote, check the list again; you should have two remotes now:
$ git remote -v
origin https://github.com/user/repo.git (fetch)
origin https://github.com/user/repo.git (push)
upstream git://github.com/otheruser/upstreamRepo.git (fetch)
upstream git://github.com/otheruser/upstreamRepo.git (push)

Now we can start the actual “update”. I assume the branch you’re going to update is the master branch; if it’s a different branch, just check out the branch you want, and don’t forget to change the branch name in the examples below!

If your branch is only behind the upstream, with no “ahead” commits (which means you haven’t committed anything new on top of the branch that came from upstream), you can pull the updates from upstream directly:
$ git pull upstream master

If your branch also contains your own commits, you’d better pull with the “--rebase” option:
$ git pull --rebase upstream master
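
To tell which of the two cases applies, you can fetch first and count the commits on each side. The sketch below is only an illustration: it builds a throwaway "upstream" and a clone under a temporary directory (all paths, file names and commit messages are made up), so it runs offline and touches no real repository.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox: every path and name below is made up for the demo.
work=$(mktemp -d)
cd "$work"

# Helper: record one commit that appends a line to a file named after the message.
commit() {
    echo "$1" >> "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

# Stand-in for the upstream repository, with one commit on master.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
commit first

# Stand-in for your local clone; its remote is named "upstream" right away.
cd "$work"
git clone -q --origin upstream upstream fork
cd fork
git config user.name demo
git config user.email demo@example.com
commit mine                      # one commit of our own

# Meanwhile upstream gains two commits.
cd "$work/upstream"
commit second
commit third

# Fetch, then count the commits unique to each side of master...upstream/master.
cd "$work/fork"
git fetch -q upstream
set -- $(git rev-list --left-right --count master...upstream/master)
ahead=$1
behind=$2
echo "ahead=$ahead behind=$behind"

if [ "$ahead" -eq 0 ]; then
    echo "only behind: git pull upstream master"
else
    echo "own commits present: git pull --rebase upstream master"
fi
```

In a real clone only the last block matters: `git fetch upstream` followed by the `git rev-list --left-right --count` line.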

Now we’re almost done. If there are no errors or conflicts (we don’t discuss conflicts here), push your master to the origin remote, and you’ll find that your forked repo is fresh again:
$ git push origin master
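
The whole sequence can be rehearsed offline before trying it on a real fork. This is a minimal sketch under stated assumptions: `upstream`, `fork.git` and `local` are hypothetical local directories that stand in for the upstream repository, your fork on the server, and your working clone.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; all directory names here are hypothetical stand-ins.
work=$(mktemp -d)
cd "$work"

# "Upstream" project with a single commit on master.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
echo one > file.txt
git add file.txt
git commit -q -m "first"

# "Fork" it (a bare copy), then clone the fork like a normal user would.
cd "$work"
git clone -q --bare upstream fork.git
git clone -q fork.git local
cd local
git config user.name demo
git config user.email demo@example.com

# Add the second remote that points at the original project.
git remote add upstream "$work/upstream"

# Our own, not yet pushed, commit on the fork's master.
echo mine > mine.txt
git add mine.txt
git commit -q -m "my change"

# Upstream moves on in the meantime.
(cd "$work/upstream" && echo two >> file.txt && git commit -q -am "second")

# Replay our commit on top of the new upstream history, then publish it.
git pull -q --rebase upstream master
git push -q origin master

fork_tip=$(git --git-dir="$work/fork.git" rev-parse master)
local_tip=$(git rev-parse master)
history=$(git log --format=%s master | tr '\n' ' ')
echo "fork history (newest first): $history"
```

One case the sketch does not cover: if your own commits were already on origin before the rebase, the rebased copies get new hashes and a plain push is rejected. `git push --force-with-lease origin master` is the usual way out, assuming nobody else works on your fork.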

Make good use of Git's sparse checkout and shallow clone/pull to work more efficiently

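I first ran into these features while working on some rather bloated projects. It seems most people rarely need them day to day, so many don't know they exist. I'm writing this down as a note, so that when someone asks I can just point them to this post …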

Let's start with git shallow clone/pull:

The description in man git-clone:

--depth <depth>
Create a shallow clone with a history truncated to the specified number of revisions.

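Simply put, it throws away old history you don't need: commits older than the given number are ignored, which saves bandwidth, disk space and time when cloning. The effect is very noticeable in repositories with thousands to tens of thousands of commits or more. Travis CI, for example, uses a default clone depth of 50 for CI builds (long ago it was 100). The drawback, besides git log showing only a limited number of commits, is that git blame, git bisect and other features that need to trace earlier history become unreliable or unusable.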
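
The effect of --depth is easy to observe on a small scale. The sketch below is an illustration only: it creates a hypothetical 20-commit repository under a temporary directory and clones it through a file:// URL, because git ignores --depth when cloning from a plain local path. It then uses `git fetch --unshallow` to bring the full history back, which is one way to make blame and bisect reliable again.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; the repository names "big" and "shallow" are made up.
work=$(mktemp -d)
cd "$work"

# Build a source repository with 20 commits on master.
git init -q big
cd big
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
i=1
while [ "$i" -le 20 ]; do
    echo "$i" > n.txt
    git add n.txt
    git commit -q -m "commit $i"
    i=$((i + 1))
done
cd "$work"

# Shallow clone: keep only the 5 most recent commits.
git clone -q --depth 5 "file://$work/big" shallow

full=$(git -C big rev-list --count HEAD)
kept=$(git -C shallow rev-list --count HEAD)
echo "full history: $full commits, shallow clone: $kept commits"

# Fetch the rest of the history when it turns out to be needed after all.
git -C shallow fetch -q --unshallow
restored=$(git -C shallow rev-list --count HEAD)
echo "after --unshallow: $restored commits"
```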

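The other feature is sparse-checkout, which checks out only the files we want. Take cdnjs as an example: the .git directory is only a little over 600 MB, yet the whole project directory is around 13 GB. Most of the files are source code (text files) that compresses very well, so the .git directory takes up little space while the checked-out project is enormous. A project this large is also likely to suffer from slow filesystem operations (especially during operations such as rebase). When we already know we only need certain directories or files of a project, there's no need to check out everything, and that's where sparse-checkout comes in. It's very handy when sending a pull request to a project you don't contribute to regularly!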

The usage goes roughly like this (steps 2-4 can be done in any order):

  1. Create an empty git project:
    $ git init new.project && cd new.project
  2. Enable sparse-checkout in the project:
    $ git config core.sparseCheckout true
  3. Specify which files to check out (written directly to .git/info/sparse-checkout, one rule per line). For example, I only want everything under /ajax/libs/jquery/:
    $ echo '/ajax/libs/jquery/*' >> .git/info/sparse-checkout
  4. Set the remote (where to clone/pull from):
    $ git remote add origin git://github.com/cdnjs/cdnjs.git
  5. Then you can start pulling (you can combine this with the shallow pull described earlier by adding --depth=n):
    $ git pull origin master
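
The five steps above can be tried offline against a tiny stand-in repository before pointing them at a large project. This is a sketch under stated assumptions: the `source` repository and its three files are made up, and the remote is a local file:// URL instead of the real cdnjs repository.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; "source" and its files are hypothetical stand-ins.
work=$(mktemp -d)
cd "$work"

# A small repository with one wanted directory and some unwanted files.
git init -q source
cd source
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
mkdir -p ajax/libs/jquery ajax/libs/other
echo jq > ajax/libs/jquery/jquery.js
echo other > ajax/libs/other/other.js
echo readme > README.md
git add .
git commit -q -m "initial"
cd "$work"

# 1. Create an empty project.
git init -q new.project && cd new.project
# 2. Enable sparse checkout.
git config core.sparseCheckout true
# 3. Choose what to check out (create .git/info in case the template lacks it).
mkdir -p .git/info
echo '/ajax/libs/jquery/*' >> .git/info/sparse-checkout
# 4. Set the remote.
git remote add origin "file://$work/source"
# 5. Pull, combined with a shallow depth.
git pull -q --depth=1 origin master

# List what actually landed in the working tree (ignoring .git itself).
files=$(find . -path ./.git -prune -o -type f -print | sort)
echo "$files"
```

Only the jquery file should appear in the working tree; README.md and the other library remain in the repository data but are not checked out.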

That's it. The project should now take up far less space. Here is the disk usage for cdnjs combined with a shallow clone of depth=10:

$ du -d 1 -h
18M ./ajax
587M ./.git
605M .

Only 605 MB in total, whereas the original looks like this:

$ du -d 1 -h
682M ./.git
43M ./scratch
16M ./node_modules
12G ./ajax
24K ./test
32K ./build
13G .


A whopping 13 GB … skipping 12 GB of checkout makes things a lot faster …

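What if you want to change which files are checked out later?

Just edit the .git/info/sparse-checkout file in the project, then run git reset --hard once (make sure you have no uncommitted changes before you do, since they would be discarded).

Example: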

/ajax/libs/jquery/*
/build
/CONTRIBUTING.md
/MIT-LICENSE
/README.md
/sparseCheckout.md
/auto-update.js
/circle.yml
/CONTRIBUTING-WIP.md
/package.json
/update-script.sh

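One thing to note: don't omit the leading slash that stands for the project root. Unless you really want to check out every file with the same name, write out the full path. For example, if /package.json is written as package.json, every package.json in the project will be checked out.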
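
The difference the leading slash makes can be checked in a scratch repository. The sketch below is only an illustration with made-up files: one package.json at the root and one nested under a hypothetical ajax/libs/foo/ directory. It rewrites .git/info/sparse-checkout twice and re-applies it with git reset --hard each time.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; the repository "demo" and its files are made up.
work=$(mktemp -d)
cd "$work"

git init -q demo
cd demo
git config user.name demo
git config user.email demo@example.com
mkdir -p ajax/libs/foo
echo root > package.json
echo nested > ajax/libs/foo/package.json
echo readme > README.md
git add .
git commit -q -m "initial"

git config core.sparseCheckout true
mkdir -p .git/info

# Count regular files in the working tree, ignoring .git itself.
count_files() {
    find . -path ./.git -prune -o -type f -print | wc -l | tr -d ' '
}

# Without the leading slash the rule matches package.json at any depth.
echo 'package.json' > .git/info/sparse-checkout
git reset -q --hard
unanchored=$(count_files)

# With the leading slash only the one in the project root matches.
echo '/package.json' > .git/info/sparse-checkout
git reset -q --hard
anchored=$(count_files)

echo "package.json -> $unanchored file(s), /package.json -> $anchored file(s)"
```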