IPFS stands for InterPlanetary File System. It is an open-source, peer-to-peer distributed hypermedia protocol that aims to function as a ubiquitous file system for all computing devices. It is a complex and highly ambitious project with serious and profound implications for the future development and structure of the Internet as we know it.
Why IPFS and How Did It Start
The current iteration of the Internet is not nearly as decentralized as it was initially envisioned to be. It is also predicated on some outdated protocols that have led to a myriad of issues. The problems IPFS addresses are those associated with HTTP, the protocol that currently dominates the web.
If you are unfamiliar with the role HTTP plays on the Internet, it underpins data communication across the entire web. HTTP was invented in 1991 and widely adopted by web browsers by 1996, and it fundamentally establishes how messages are transmitted across the Internet, how browsers should respond to commands, and how servers deal with requests. In short, it is the underlying protocol of how we browse the web and the backbone of the client-server paradigm.
HTTP vs IPFS, Image from MaxCDN
While HTTP has provided us with the Internet as we know it today, it has become outdated, and after more than 20 years its prevailing issues are becoming more and more readily apparent. The key problems with HTTP today stem from the massive increase in Internet traffic and the stress points that increase has amplified. With the current implementation of HTTP, problems such as the following have emerged.
- Inefficient content delivery stemming from downloading files from a single server at a time.
- Expensive bandwidth costs and file duplication leading to bloated storage.
- Increasing centralization of servers and providers leading to increased Internet censorship.
- Fragile history of information stored on the Internet and short lifespans of webpages.
- Intermittent connections that lead to an offline developing world and slow connection speeds.
The list of problems goes on, and it is no surprise that a technology more than 20 years old is showing its age in an era of rapid technological innovation. IPFS provides the distributed storage and file system that the Internet needs to achieve its true potential.
With IPFS, instead of downloading a file from a single central server, you ask the peers in the network to deliver it to you. This enables high-volume data distribution with high efficiency, historic versioning, resilient networks, and persistent availability of content, secured and verified through cryptographic hashing and distributed across a network of peers. All of this sounds promising, but how does it work?
How Does IPFS Work?
Conceptually, IPFS is similar to the World Wide Web as we know it today, but it more closely resembles a single BitTorrent swarm exchanging objects within a single Git repository, with files distributed through a BitTorrent-inspired exchange protocol. Importantly, IPFS acts as a sort of combination of Kademlia, BitTorrent, and Git to create a distributed subsystem of the Internet.
The design of the protocol provides historic versioning of the Internet like with Git. Each file and all blocks within it are given a unique identifier, which is a cryptographic hash. Duplicates are removed across the network and version history is tracked for every file. This leads to persistently available content where web pages do not disappear because of a failed server or bankrupted web host. Further, the authenticity of content is guaranteed through this mechanism and when looking up files, you are essentially asking the network to find nodes storing the content behind the unique identifying hash associated with that content.
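The core idea above, identifying content by a hash of its bytes rather than by its location, can be sketched in a few lines of Python. This is a deliberately simplified model: real IPFS wraps the digest in a multihash/CID format and distributes storage across peers, whereas this toy version uses a bare SHA-256 hex digest and a local dictionary as the "network."

```python
import hashlib

def content_id(data: bytes) -> str:
    # Simplified content address: the ID is derived from the bytes
    # themselves, not from where the file lives.
    return hashlib.sha256(data).hexdigest()

store = {}  # toy stand-in for the network: content ID -> content

def add(data: bytes) -> str:
    cid = content_id(data)
    store[cid] = data  # adding identical bytes twice is a no-op: deduplication
    return cid

def get(cid: str) -> bytes:
    data = store[cid]
    # Retrieval is self-verifying: re-hash and compare to the requested ID.
    assert content_id(data) == cid, "content failed verification"
    return data

cid = add(b"hello ipfs")
assert add(b"hello ipfs") == cid   # identical content, identical ID
assert get(cid) == b"hello ipfs"
```

Note how verification falls out for free: whoever serves the bytes, the client can re-hash them and confirm they match the identifier it asked for.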
The links between the nodes in IPFS take the form of cryptographic hashes, and this is possible because of its Merkle DAG (Directed Acyclic Graphs) data architecture. The benefits of Merkle DAGs to IPFS include the following:
- Content Addressing – Content has a unique identifier that is the cryptographic hash of the file.
- No Duplication – Files with the same content are stored only once across the network.
- Tamper Proof – Data is verified against its checksum, so if the hash changes, IPFS knows the data has been tampered with.
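The properties in the list above follow directly from how a Merkle DAG is built: each node's identifier is a hash over its own data plus the identifiers of its children, so any change anywhere below a node changes that node's identifier too. The toy Python below illustrates this; real IPFS uses the IPLD data model and a binary encoding, so the JSON structure and names here are purely illustrative.

```python
import hashlib
import json

nodes = {}  # node hash -> (data, links)

def put_node(data: bytes, links=()):
    # A node's hash covers its data AND the hashes of its children,
    # which is what makes the whole graph tamper-evident.
    payload = json.dumps({"data": data.hex(), "links": list(links)}).encode()
    h = hashlib.sha256(payload).hexdigest()
    nodes[h] = (data, tuple(links))
    return h

leaf_a = put_node(b"block A")
leaf_b = put_node(b"block B")
root = put_node(b"file manifest", links=[leaf_a, leaf_b])

# Deduplication: identical blocks hash identically and are stored once,
# no matter how many files link to them.
assert put_node(b"block A") == leaf_a

# Tamper evidence: modified content gets a different hash, so the old
# root still refers, verifiably, to the original blocks.
assert put_node(b"block A (modified)") != leaf_a
```

Because the root hash commits to every block beneath it, verifying one short identifier is enough to verify an arbitrarily large file.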
IPFS links file structures to each other using Merkle links, and every file can be found by a human-readable name using a decentralized naming system called IPNS. The implementation of Merkle Directed Acyclic Graphs (DAGs) is important to the underlying functionality of the protocol but is more technical than the scope of this article. If you are interested in learning more about this aspect of IPFS, you can find much more detailed information on the IPFS GitHub page and more about how Merkle trees work here.
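IPNS adds a layer of mutability on top of immutable content: a stable name (derived from a public key) can be repointed at new content hashes over time. The sketch below captures only that pointer mechanic; real IPNS authenticates each update with a signature from the corresponding private key and propagates records over the network, both of which this toy version omits.

```python
import hashlib

ipns = {}  # name (hash of a public key) -> current content hash

def publish(pubkey: bytes, content_hash: str) -> str:
    # The name is stable because it is derived from the key, not the content.
    # (Real IPNS would verify a signature here before accepting the update.)
    name = hashlib.sha256(pubkey).hexdigest()
    ipns[name] = content_hash
    return name

def resolve(name: str) -> str:
    return ipns[name]

name = publish(b"alice-public-key", "hash-of-site-v1")
assert resolve(name) == "hash-of-site-v1"

# Publishing with the same key keeps the same name but updates the target.
publish(b"alice-public-key", "hash-of-site-v2")
assert resolve(name) == "hash-of-site-v2"
```

This is how a website on IPFS can have a permanent address even though every new version of its content has a different hash.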
Each node stores only the content it is interested in, along with index information that allows it to figure out who is storing what. The framework of IPFS fundamentally removes the need for centralized servers to deliver website content to users. Eventually, this concept may push the HTTP protocol into irrelevance entirely and allow users to access content locally, offline. Instead of searching for servers, as with the current infrastructure of the Internet, users will be searching for unique IDs (cryptographic hashes), enabling millions of computers to deliver the file instead of just one server.
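That "who is storing what" index can be pictured as a mapping from content hashes to the peers announcing them, which is loosely what IPFS's Kademlia-based DHT provides. The Python below is a centralized toy of that idea; in the real system the index itself is sharded across the peers, with no single dictionary anywhere.

```python
from collections import defaultdict

providers = defaultdict(set)  # content hash -> peers that can serve it

def provide(peer: str, content_hash: str) -> None:
    # A peer announces that it holds a copy of this content.
    providers[content_hash].add(peer)

def find_providers(content_hash: str) -> set:
    # A client asks "who has this?" rather than "where is this server?"
    return providers[content_hash]

provide("peer-1", "example-content-hash")
provide("peer-2", "example-content-hash")

# Any listed peer can serve the file: no single point of failure.
assert find_providers("example-content-hash") == {"peer-1", "peer-2"}
```

The client can then fetch blocks from whichever providers are fastest or nearest, and verify each block against its hash on arrival.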
Use Cases and Implications
There are already some important use cases for IPFS and more are sure to arise as the protocol continues to develop. Offering the new, distributed P2P architecture for the Internet comes with its complexities, but the benefits can be seen in everything from massive financial savings in storage and bandwidth to integration with distributed blockchain networks.
Obvious advantages of the distributed storage model of IPFS include vastly more efficient data storage, along with immutability and permanence. No longer will websites be relegated to 404 error messages due to downed servers or broken chains of HTTP links. Further, significant efficiency advantages are available to researchers, especially those needing to parse and analyze very large data sets. With the prevalence of Big Data in modern science, the fast performance and distributed archiving of data afforded by IPFS will become pertinent to accelerating advancements.
Service providers and content creators can also substantially reduce the costs associated with delivering large amounts of data to customers. Current iterations of this paradigm are hindered by increasing bandwidth costs and data providers being charged for peering agreements. The costs of delivering content through centralized infrastructures of interconnected networks are only increasing, creating an environment of critical inefficiency and further centralization in an attempt to overcome these burdens.
IPFS use cases, Image from Blockchain Mind
Additionally, centralization of servers leads to government snooping, increasing DDoS attack prevalence, ISP censorship, and private sale of data.
As Juan Benet, the creator of IPFS, stated: “Content on IPFS can move through any untrusted middlemen without giving up control of the data or putting it at risk.”
Finally, the integration of IPFS with blockchain technology seems to be a perfect fit. By placing immutable, permanent IPFS links inside blockchain transactions, you can timestamp and secure data without having to store it on-chain. This reduces blockchain bloat and provides a convenient foundation for secure off-chain solutions that help blockchains scale.
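The anchoring pattern just described can be sketched as follows: the chain stores only a small, immutable content hash, while the bulk data lives in IPFS. The `ipfs_add` function and the transaction structure below are illustrative stand-ins, not any specific platform's API.

```python
import hashlib
import time

def ipfs_add(data: bytes) -> str:
    # Stand-in for adding data to IPFS and getting back its content hash.
    return hashlib.sha256(data).hexdigest()

document = b"large off-chain document, megabytes of data ..."
cid = ipfs_add(document)

# The on-chain record is tiny: just a timestamp and the content hash.
transaction = {"timestamp": time.time(), "ipfs_link": cid}

# Later, anyone who retrieves the document from IPFS can verify it
# against the immutable hash recorded in the transaction.
assert ipfs_add(document) == transaction["ipfs_link"]
```

Because the hash is immutable once recorded, the transaction proves the document existed in exactly this form at that timestamp, without the chain ever holding the document itself.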
IPFS is being incorporated into a number of cryptocurrency platforms and has the potential to symbiotically help the industry scale by providing the peer-to-peer, distributed file system architecture needed as a foundation for the growth of cryptocurrency platforms.
As you can see, IPFS is a technically and conceptually complex protocol with lofty ambitions to revolutionize the exchange of data across the Internet. HTTP was successful in its own right and helped the Internet reach the grand stage it occupies today, but new technologies are emerging, and the need for a reformed, distributed infrastructure has made itself apparent.