A while back we compared the issue tracking features of four major SCM tools. Since people were quite interested in the comparison, we decided to continue the series with a similar post about Git-powered wiki features. After all, effective and reliable documentation is a must-have for any software development team.
If you do not want to read the detailed breakdown of the wiki features and their characteristics, you can jump straight to the summary table.
Microsoft wanted to move to Git because of Git’s features, like its easy branching and its popularity among developers. But the transition faced three problems. Git wasn’t designed for such vast numbers of developers—more than 3,000 actively working on the codebase. Also, Git wasn’t designed for a codebase that was so large, either in terms of the number of files and version history for each file, or in terms of sheer size, coming in at more than 300GB. When using standard Git, working with the source repository was unacceptably slow. Common operations (such as checking which files have been modified) would take multiple minutes.
The company’s solution was to develop Git Virtual File System (GVFS). With GVFS, a local replica of a Git repository is virtualized such that it contains metadata and only the source code files that have been explicitly retrieved. By eliminating the need to replicate every file (and, hence, check every file for modifications), both the disk footprint of the repository and the speed of working with it were greatly improved. Microsoft modified Git to handle this virtual file system. The client was altered so that it didn’t needlessly try to access files that weren’t available locally and a new transfer protocol was added for selectively retrieving individual files from a remote repository.
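The idea of a virtualized replica that holds metadata plus only the files that have been explicitly retrieved can be illustrated with a toy model. This is a minimal sketch, not GVFS's actual design: the `VirtualWorkingTree` class, the `fetch` callback standing in for the per-file transfer protocol, and the in-memory dictionaries are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Set


class VirtualWorkingTree:
    """Toy model of a lazily populated checkout (illustrative only)."""

    def __init__(self, manifest: Set[str], fetch: Callable[[str], bytes]) -> None:
        self.manifest = set(manifest)           # metadata: every path in the repo
        self._fetch = fetch                     # hypothetical per-file remote fetch
        self._local: Dict[str, bytes] = {}      # contents that exist locally
        self._baseline: Dict[str, bytes] = {}   # contents as originally fetched

    def read(self, path: str) -> bytes:
        """Return file contents, retrieving them on first access only."""
        if path not in self.manifest:
            raise FileNotFoundError(path)
        if path not in self._local:
            data = self._fetch(path)
            self._local[path] = data
            self._baseline[path] = data
        return self._local[path]

    def write(self, path: str, data: bytes) -> None:
        """Change a file; it is retrieved first so a baseline exists."""
        self.read(path)
        self._local[path] = data

    def modified(self) -> List[str]:
        """Scan only retrieved files; unretrieved ones cannot have changed."""
        return sorted(
            path for path, data in self._local.items()
            if data != self._baseline[path]
        )
```

In this model, the cost of checking for modifications grows with the number of files a developer has touched rather than with the size of the whole repository, which is the effect the paragraph above describes.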
.. The biggest complexity is that Git has a very conservative approach to compatibility, requiring that repositories remain compatible across versions.
.. GitHub’s interest and involvement is motivated by the company’s desire to address the needs of enterprise customers.
.. Certain industries have large repositories that pose problems with Git; for example, game repositories are often physically large not because they have millions of files and decades of history, but because of their large number of graphics and other assets. The scaling improvements that Microsoft has made to Git are useful for this kind of large repository, too. As such, having the same family of improvements available in GitHub will enable the company to better serve these communities.
>The “why” is not technical.
This is one of the few essays about decentralization that understands that the problem isn’t technical.
However, I’d go further than his explanations of incentives and say that the fundamental problem is that decentralized technical protocols do not solve the centralization of how money is spent.
Examples of that misunderstanding:
– SMTP the protocol is decentralized (technical) and yet we have giant email providers (Gmail/Hotmail/Yahoo), which are centralized (money). The big providers spent $$$ on 1 gigabyte mail storage + backups + convenience. SMTP specifies how fields are laid out but it doesn’t put money in everyone’s bank account so they can run residential SMTP servers and keep the email ecosystem decentralized.
– Git the protocol is decentralized (technical) but GitHub the service is centralized (money). Why? Because Git the technical protocol is not a bank fund that gives every programmer a free $10 VPS account to host their own Git repo. The centralization of money spent (GitHub invests in a datacenter but individual programmers do not) results in centralization.
– The Bitcoin protocol is decentralized (technical) and yet giant “mining pools” in China have emerged, which is centralization (money). The ability to spend money on liquid-cooled ASIC chips in a datacenter located near the Arctic Circle is “centralized” to the entities that can spend that vast amount of money. That exceeds the ability of home enthusiasts to compute hashes on a spare computer in their bedroom.
The common theme: technical protocols can be decentralized but the real-world implementation of those protocols ends up centralized because physical things like CPUs, hard drives, network bandwidth, etc. cost money.
This pattern of decentralized technical protocols vs centralized economic behavior is ignored by virtually all decentralization enthusiasts.
So the real puzzle of decentralization is, “How do we _decentralize economic behavior_ when not everybody has the same amount of money to spend?” Nobody I’ve read about so far has figured that out. That includes Sandstorm/IPFS/Filecoin/Mastodon/Diaspora/Ethereum etc.