InterPlanetary File System
What is IPFS?
Let’s start with a one-line definition of IPFS:
IPFS is a distributed system for storing and accessing files, websites, applications, and data.
What does that mean, exactly? Let’s say you’re doing some research on aardvarks. (Roll with it; aardvarks are cool! Did you know they can tunnel 3 feet in only 5 minutes?) You might start by visiting the Wikipedia page on aardvarks at:
When you put that URL in your browser’s address bar, your computer asks one of Wikipedia’s computers, which might be somewhere on the other side of the country (or even the planet), for the aardvark page.
However, that’s not the only option for meeting your aardvark needs! There’s a mirror of Wikipedia stored on IPFS, and you could use that instead. If you use IPFS, your computer asks to get the aardvark page like this:
IPFS knows how to find that sweet, sweet aardvark information by its contents, not its location (more on that, called content addressing, below). The IPFS-ified version of the aardvark info is represented by that string of letters and numbers in the middle of the URL (QmXo…), and instead of asking one of Wikipedia’s computers for the page, your computer uses IPFS to ask lots of computers around the world to share the page with you. It can get your aardvark info from anyone who has it, not just Wikipedia.
And, when you use IPFS, you don’t just download files from someone else — your computer also helps distribute them. When your friend a few blocks away needs the same Wikipedia page, they might be as likely to get it from you as they would from your neighbor or anyone else using IPFS.
IPFS makes this possible for web pages and any file a computer might store, whether it’s a document, an email, or even a database record.
Making it possible to download a file from many locations that aren’t managed by one organization…
- Supports a resilient internet. If someone attacks Wikipedia’s web servers, or an engineer at Wikipedia makes a big mistake that causes their servers to catch fire, you can still get the same webpages from somewhere else.
- Makes it harder to censor content. Because files on IPFS can come from many places, it’s harder for anyone (whether they’re states, corporations, or someone else) to block things. We hope IPFS can help provide ways to circumvent actions like these when they happen.
- Can speed up the web when you’re far away or disconnected. If you can retrieve a file from someone nearby instead of hundreds or thousands of miles away, you can often get it faster. This is especially valuable if your community is networked locally but doesn’t have a good connection to the wider internet. (Well-funded organizations with technical expertise do this today by using multiple data centers or CDNs — content distribution networks. IPFS hopes to make this possible for everyone.)
That last point is actually where IPFS gets its full name: the InterPlanetary File System. We’re striving to build a system that works across places as disconnected or as far apart as planets. While that’s an idealistic goal, it keeps us working and thinking hard, and almost everything we create in pursuit of that goal is also useful here at home.
What about that link to the aardvark page above? It looked a little unusual:
That jumble of letters after /ipfs/ is called a content identifier, and it’s how IPFS can get content from multiple places.
Traditional URLs and file paths such as…
…identify a file by where it’s located — what computer it’s on and where on that computer’s hard drive it is. That doesn’t work if the file is in many places, though, like your neighbor’s computer and your friend’s across town.
Instead of being location-based, IPFS addresses a file by _what’s in it_, or by its content. The content identifier above is a cryptographic hash of the content at that address. The hash is unique to the content that it came from, even though it may look short compared to the original content. It also allows you to verify that you got what you asked for — bad actors can’t just hand you content that doesn’t match. (If hashes are new to you, check out the concept guide on hashes for an introduction.)
Why do we say “content” instead of “files” or “web pages” here? Because a content identifier can point to many different types of data, such as a single small file, a piece of a larger file, or metadata. (In case you don’t know, metadata is “data about the data.” You use metadata when you access the date, location, or file size of your digital pictures, for example.) So, an individual IPFS address can refer to the metadata of just a single piece of a file, a whole file, a directory, a whole website, or any other kind of content. For more on this, check out our guide to how IPFS works.
Because the address of a file in IPFS is created from the content itself, links in IPFS can’t be changed. For example …
- If the text on a web page is changed, the new version gets a new, different address.
- Content can’t be moved to a different address. On today’s internet, a company could reorganize content on their website and move a page such as http://mycompany.com/services to a new URL, breaking old links. In IPFS, the old link you have would still point to the same old content.
Of course, people want to update and change content all the time and don’t want to send new links every time they do it. This is entirely possible in an IPFS world, but explaining it requires a little more info than is within the scope of this IPFS introduction. Check out the concept guides on IPNS, the Mutable File System (MFS), and DNSLink to learn more about how changing content can work in a content-addressed, distributed system.
It’s important to remember in all of these situations, using IPFS is participatory and collaborative. If nobody using IPFS has the content identified by a given address available for others to access, you won’t be able to get it. On the other hand, content can’t be removed from IPFS as long as someone is interested enough to make it available, whether that person is the original author or not. Note that this is similar to the current web, where it is also impossible to remove content that’s been copied across an unknowable number of websites; the difference with IPFS is that you are always able to find those copies.
While there’s lots of complex technology in IPFS, the fundamental ideas are about changing how networks of people and computers communicate. Today’s World Wide Web is structured on ownership and access, meaning that you get files from whoever owns them — if they choose to grant you access. IPFS is based on the ideas of possession and participation, where many people possess each others’ files and participate in making them available.
That means IPFS only works well when people are actively participating. If you use your computer to share files using IPFS, but then you turn your computer off, other people won’t get those files from you anymore. But if you or others make sure that copies of those files are stored on more than one computer that’s powered on and running IPFS, those files will be more reliably available to other IPFS users who want them. This happens to some extent automatically: by default, your computer shares a file with others for a limited time after you’ve downloaded it using IPFS. You can also make content available more permanently by pinning it, which saves it to your computer and makes it available on the IPFS network until you decide to unpin it. (You can learn more about this in our guide to persistence and pinning.)
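As a rough sketch of the cache-then-garbage-collect behavior described above, here is a toy model in Python. It is illustrative only: real IPFS nodes manage this through the `ipfs pin` commands and configurable garbage-collection policies, and real CIDs are multihashes, not bare SHA-256 hex digests.

```python
import hashlib
import time

class BlockStore:
    """Toy sketch of an IPFS-style cache with pinning and garbage collection."""

    def __init__(self):
        self.blocks = {}   # toy CID -> (content, last_access_time)
        self.pinned = set()

    def put(self, content: bytes) -> str:
        cid = hashlib.sha256(content).hexdigest()  # toy content address
        self.blocks[cid] = (content, time.time())
        return cid

    def pin(self, cid: str):
        """Pinned content is never garbage-collected until unpinned."""
        self.pinned.add(cid)

    def garbage_collect(self, max_age_seconds: float):
        """Drop unpinned blocks that haven't been accessed recently."""
        now = time.time()
        for cid in list(self.blocks):
            _, last_access = self.blocks[cid]
            if cid not in self.pinned and now - last_access > max_age_seconds:
                del self.blocks[cid]
```

After garbage collection, pinned blocks survive while stale unpinned blocks are dropped, mirroring how a node stops providing cached content it no longer holds.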
If you want to make sure one of your own files is permanently shared on the internet today, you might use a for-pay file-sharing service like Dropbox. Some people have begun offering similar services based on IPFS called pinning services. But since IPFS makes this sharing a built-in feature, you can also collaborate with friends or partner with institutions (for example, museums and libraries might work together) to share each others’ files. We hope IPFS can be the low-level tool that allows a rich fabric of communities, businesses, and cooperative organizations to form a distributed web that is much more reliable, robust, and equitable than the one we have today.
How IPFS works
Want to see a video recap of how IPFS works with files in general? Check out this content from IPFS Camp 2019! Core Course: How IPFS Deals With Files
IPFS is a peer-to-peer (p2p) storage network. Content is accessible through peers located anywhere globally that might relay information, store it, or do both. IPFS knows how to find what you ask for using its content address rather than its location.
There are three fundamental principles to understanding IPFS:
- Unique identification via content addressing
- Content linking via directed acyclic graphs (DAGs)
- Content discovery via distributed hash tables (DHTs)
These three principles build upon each other to enable the IPFS ecosystem. Let’s start with content addressing and the unique identification of content.
IPFS uses content addressing to identify content by what’s in it rather than by its location. Looking for an item by content is something you already do all the time. For example, when you look for a book in the library, you often ask for it by the title; that’s content addressing because you’re asking for what it is. If you were using location addressing to find that book, you’d ask for it by where it is: “I want the book that’s on the second floor, first stack, third shelf from the bottom, four books from the left.” If someone moved that book, you’d be out of luck!
That problem exists for the internet and on your computer! Right now, content is found by location, such as:
By contrast, every piece of content that uses the IPFS protocol has a content identifier, or CID, which is a hash. The hash is unique to the content that it came from, even though it may look short compared to the original content. If hashes are new to you, check out our guide to cryptographic hashing for an introduction.
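To make the “identifier derived from content” idea concrete, here is a minimal sketch in Python. It uses a bare SHA-256 hex digest as a stand-in; a real CID is a multihash with version and codec prefixes, so the values below are not actual IPFS CIDs.

```python
import hashlib

def toy_cid(content: bytes) -> str:
    """Toy content address: just a SHA-256 digest of the bytes."""
    return hashlib.sha256(content).hexdigest()

page_v1 = b"Aardvarks can tunnel 3 feet in only 5 minutes."
page_v2 = b"Aardvarks can tunnel 3 feet in only 5 minutes!"  # one character changed

# The same content always produces the same address...
assert toy_cid(page_v1) == toy_cid(page_v1)
# ...and any change, however small, produces a different one.
assert toy_cid(page_v1) != toy_cid(page_v2)
```

This is also why you can verify what you receive: re-hashing the bytes and comparing against the address you asked for proves the content wasn’t swapped.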
Many distributed systems use content addressing through hashes as a means for identifying content and linking it together — everything from the commits that back your code to the blockchains that run cryptocurrencies leverage this strategy. However, the underlying data structures in these systems are not necessarily interoperable.
This is where the Interplanetary Linked Data (IPLD) project comes in. IPLD translates between hash-linked data structures, allowing data to be unified across distributed systems. IPLD provides libraries for combining pluggable modules (parsers for each possible type of IPLD node) to resolve a path, selector, or query across many linked nodes, allowing you to explore data regardless of the underlying protocol. IPLD provides a way to translate between content-addressable data structures: “Oh, you use Git-style, no worries, I can follow those links. Oh, you use Ethereum, I got you. I can follow those links too!”
IPFS follows particular data-structure preferences and conventions. The IPFS protocol uses those conventions and IPLD to get from raw content to an IPFS address that uniquely identifies content on the IPFS network. The next section explores how links between content are embedded within that content address through a DAG data structure.
Directed acyclic graphs (DAGs)
IPFS and many other distributed systems take advantage of a data structure called directed acyclic graphs, or DAGs. Specifically, they use Merkle DAGs, which are DAGs where each node has a unique identifier that is a hash of the node’s contents. Sound familiar? This refers back to the CID concept that we covered in the previous section. Put another way: identifying a data object (like a Merkle DAG node) by its hash value is content addressing. Check out our guide to Merkle DAGs for a more in-depth treatment of this topic.
IPFS uses a Merkle DAG optimized for representing directories and files, but you can structure a Merkle DAG in many different ways. For example, Git uses a Merkle DAG that has many versions of your repo inside of it.
To build a Merkle DAG representation of your content, IPFS often first splits it into blocks. Splitting the content into blocks means that different parts of the file can come from different sources and be authenticated quickly. (If you’ve ever used BitTorrent, you may have noticed that when you download a file, BitTorrent can fetch it from multiple peers at once; this is the same idea.)
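The split-then-link idea can be sketched in a few lines of Python. This is a toy model: the 4-byte block size and bare SHA-256 digests are chosen for readability, whereas real IPFS chunkers default to much larger blocks (for example, 256 KiB) and real DAG nodes carry more structure than a flat list of child hashes.

```python
import hashlib

def chunk(content: bytes, block_size: int = 4):
    """Split content into fixed-size blocks (toy size for demonstration)."""
    return [content[i:i + block_size] for i in range(0, len(content), block_size)]

def block_cid(block: bytes) -> str:
    """Toy content address for a single block."""
    return hashlib.sha256(block).hexdigest()

def root_cid(block_cids) -> str:
    # The root node hashes the ordered list of child CIDs, so the
    # root address authenticates every block beneath it.
    return hashlib.sha256("".join(block_cids).encode()).hexdigest()

file = b"hello ipfs world"
cids = [block_cid(b) for b in chunk(file)]
root = root_cid(cids)
```

Because the root is a hash over the child CIDs, fetching any block from any peer and re-hashing it lets you check it belongs to the file you asked for.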
It’s easy to see a Merkle DAG representation of a file of your choice using the DAG Builder visualizer.
Merkle DAGs are a bit of a “turtles all the way down” scenario; that is, everything has a CID. Let’s say you have a file, and its CID identifies it. What if that file is in a folder with several other files? Those files will have CIDs too. What about that folder’s CID? It would be a hash of the CIDs from the files underneath (i.e., the folder’s content). In turn, those files are made up of blocks, and each of those blocks has a CID. You can see how a file system on your computer could be represented as a DAG. You can also see, hopefully, how Merkle DAG graphs start to form. For a visual exploration of this concept, look at the IPLD Explorer.
Another useful feature of Merkle DAGs and breaking content into blocks is that if you have two similar files, they can share parts of the Merkle DAG, i.e., parts of different Merkle DAGs can reference the same subset of data. For example, if you update a website, only updated files receive new content addresses. Your old version and your new version can refer to the same blocks for everything else. This can make transferring versions of large datasets (such as genomics research or weather data) more efficient because you only need to transfer the parts that are new or have changed instead of creating entirely new files each time.
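Continuing the toy block model from above (hypothetical 4-byte blocks and bare SHA-256 digests, not real IPFS chunking), two versions of a file that differ only at the end share most of their block CIDs, so only the changed block is new:

```python
import hashlib

def blocks_of(content: bytes, size: int = 4):
    """Toy chunker: return the set of block CIDs for some content."""
    return {hashlib.sha256(content[i:i + size]).hexdigest()
            for i in range(0, len(content), size)}

v1 = b"AAAABBBBCCCC"
v2 = b"AAAABBBBDDDD"  # only the last 4-byte block changed

shared = blocks_of(v1) & blocks_of(v2)  # blocks both versions can reuse
new = blocks_of(v2) - blocks_of(v1)     # the only block v2 must transfer
```

Here two of the three blocks are shared, so a node that already has v1 only needs the single new block to reconstruct v2, which is the deduplication win described above.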
So, to recap, IPFS lets you give CIDs to content and link that content together in a Merkle DAG. Now let’s move on to the last piece: how you find and move content.
Distributed hash tables (DHTs)
To find which peers are hosting the content you’re after (discovery), IPFS uses a distributed hash table or DHT. A hash table is a database of keys to values. A distributed hash table is one where the table is split across all the peers in a distributed network. To find content, you ask these peers.
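A minimal sketch of the Kademlia-style idea behind the IPFS DHT: peers and keys live in the same hash-derived ID space, and a key is the responsibility of the peers “closest” to it by XOR distance. This is a simplification under stated assumptions; real lookups are iterative queries over routing tables, not a sort over a global peer list, and the peer names below are hypothetical.

```python
import hashlib

def key_id(s: str) -> int:
    """Map any string (peer name or content key) into one shared ID space."""
    return int.from_bytes(hashlib.sha256(s.encode()).digest(), "big")

def closest_peers(content_key: str, peer_ids, k: int = 2):
    """Return the k peers closest to the key by XOR distance."""
    target = key_id(content_key)
    return sorted(peer_ids, key=lambda p: key_id(p) ^ target)[:k]

peers = ["peerA", "peerB", "peerC", "peerD"]
responsible = closest_peers("some-cid", peers)
```

Because closeness is deterministic, every node that hashes the same key agrees on which peers to ask, without any central index.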
The libp2p project is the part of the IPFS ecosystem that provides the DHT and handles peers connecting and talking to each other. (Note that, as with IPLD, libp2p can also be used as a tool for other distributed systems, not just IPFS.)
Once you know where your content is (or, more precisely, which peers are storing each of the blocks that make up the content you’re after), you use the DHT again to find the current location of those peers (routing). So, to get to content, you use libp2p to query the DHT twice.
You’ve discovered your content, and you’ve found the current location(s) of that content. Now, you need to connect to that content and get it (exchange). To request blocks from and send blocks to other peers, IPFS currently uses a module called Bitswap. Bitswap allows you to connect to the peer or peers with the content you want, send them your want list (a list of all the blocks you’re interested in), and have them send you the blocks you requested. Once those blocks arrive, you can verify them by hashing their content to get CIDs and compare them to the requested CIDs. These CIDs also allow you to deduplicate blocks if needed.
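The verification step can be sketched in a few lines: re-hash each received block and compare against the CID on your want list (toy SHA-256 CIDs again; Bitswap itself handles sessions, peers, and want-list bookkeeping).

```python
import hashlib

def verify_block(requested_cid: str, received: bytes) -> bool:
    """Re-hash a received block and compare with the CID we asked for.
    A bad actor can't substitute different content without changing the hash."""
    return hashlib.sha256(received).hexdigest() == requested_cid

block = b"some block data"
cid = hashlib.sha256(block).hexdigest()

assert verify_block(cid, block)            # genuine block passes
assert not verify_block(cid, b"tampered")  # substituted content fails
```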
There are other content replication protocols under discussion as well, the most developed of which is Graphsync. There’s also a proposal under discussion to extend the Bitswap protocol to add functionality around requests and responses.
What makes libp2p especially useful for peer-to-peer connections is connection multiplexing. Traditionally, every service in a system opens a different connection to communicate with other services of the same kind remotely. Using IPFS, you open just one connection, and you multiplex everything on it. For everything your peers need to talk to each other about, you send a little bit of each thing, and the other end knows how to sort those chunks where they belong.
This is useful because establishing connections is usually hard to set up and expensive to maintain. With multiplexing, once you have that connection, you can do whatever you need on it.
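The framing idea behind multiplexing can be sketched as tagging each chunk with a stream ID so the receiver can sort chunks back into per-stream buffers. This is only the concept; libp2p negotiates real stream multiplexers with flow control and backpressure, and the stream names below are hypothetical.

```python
from collections import defaultdict

def mux(streams: dict) -> list:
    """Interleave (stream_id, chunk) frames from many streams onto one 'wire'."""
    wire = []
    iters = {sid: iter(chunks) for sid, chunks in streams.items()}
    while iters:
        for sid in list(iters):
            try:
                wire.append((sid, next(iters[sid])))
            except StopIteration:
                del iters[sid]  # this stream is finished
    return wire

def demux(wire: list) -> dict:
    """Sort tagged frames back into per-stream buffers on the receiving end."""
    out = defaultdict(list)
    for sid, chunk in wire:
        out[sid].append(chunk)
    return dict(out)

streams = {"bitswap": [b"want", b"block"], "dht": [b"find"]}
assert demux(mux(streams)) == streams  # round-trip preserves every stream
```

The single expensive connection carries every protocol’s traffic, and the tags are all the receiver needs to reassemble each conversation.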
A modular paradigm
As you may have noticed from this discussion, the IPFS ecosystem comprises many modular libraries that support specific parts of any distributed system. You can certainly use any part of the stack independently or combine them in novel ways.
The IPFS ecosystem gives CIDs to content and links that content together by generating IPLD Merkle DAGs. You can discover content using a DHT that’s provided by libp2p, open a connection to any provider of that content, and download it using a multiplexed connection. All of this is held together by the middle of the stack: linked, unique identifiers. That’s the essential part that IPFS is built on.
IPFS and privacy
As a protocol for peer-to-peer data storage and delivery, IPFS is by design a public network: Nodes participating in the network store data affiliated with globally consistent content addresses (CIDs) and advertise that they have those CIDs available for other nodes to use through publicly viewable distributed hash tables (DHTs). This paradigm is one of IPFS’s core strengths — at its most basic, it’s essentially a globally distributed “server” of the network’s total available data, referenceable both by the content itself (those CIDs) and by the participants (the nodes) who have or want the content.
What this does mean, however, is that IPFS itself isn’t explicitly protecting knowledge about CIDs and the nodes that provide or retrieve them. This isn’t something unique to the distributed web; on both the d-web and the legacy web, traffic and other metadata can be monitored in ways that can infer a lot about a network and its users. Some key details on this are outlined below, but in short: While IPFS traffic between nodes is encrypted, the metadata those nodes publish to the DHT is public. Nodes announce a variety of information essential to the DHT’s function — including their unique node identifiers (PeerIDs) and the CIDs of data that they’re providing — and because of this, information about which nodes are retrieving and/or reproviding which CIDs is publicly available.
So why doesn’t the IPFS protocol itself explicitly have a “privacy layer” built in? This is in line with key principles of the protocol’s highly modular design — after all, different uses of IPFS over its lifetime may call for different approaches to privacy. Explicitly implementing an approach to privacy within the IPFS core could “box in” future builders due to a lack of modularity, flexibility, and future-proofing. On the other hand, freeing those building on IPFS to use the best privacy approach for the situation at hand ensures IPFS is useful to as many as possible.
If you’re worried about the implications of this for your own personal use case, it’s worth taking additional measures such as disabling reproviding, encrypting sensitive content, or even running a private IPFS network if that’s appropriate for you. More details on these are below.
What’s public on IPFS
All traffic on IPFS is public, including the contents of files themselves (unless they’re encrypted; more about this below). For purposes of understanding IPFS privacy, this may be easiest to think about in two halves: content identifiers (CIDs) and IPFS nodes themselves.
Because IPFS uses content addressing rather than the legacy web’s method of location addressing, each piece of data stored in the IPFS network gets its own unique content identifier (CID). Copies of the data associated with that CID can be stored in any number of locations worldwide on any number of participating IPFS nodes. To make retrieving the data associated with a particular CID efficient and robust, IPFS uses a distributed hash table (DHT) to keep track of what’s stored where. When you use IPFS to retrieve a particular CID, your node queries the DHT to find the closest nodes to you with that item — and by default also agrees to reprovide that CID to other nodes for a limited time, until periodic “garbage collection” clears your cache of content you haven’t used in a while. You can also “pin” CIDs that you want to make sure are never garbage-collected — either explicitly using IPFS’s low-level pin API, or implicitly using the Mutable File System (MFS) — which also means you’re acting as a permanent reprovider of that data.
This is one of the advantages of IPFS over traditional legacy-web hosting. It means retrieving files — especially popular ones that exist on lots of nodes in the network — can be faster and more bandwidth-efficient. However, it’s important to note that those DHT queries happen in public. Because of this, it’s possible that third parties could be monitoring this traffic to determine what CIDs are being requested, when, and by whom. As IPFS continues to grow in popularity, it’s more likely that such monitoring will exist.
The other half of the equation when considering the prospect of IPFS traffic monitoring is that nodes’ unique identifiers are themselves public. Just like with CIDs, every individual IPFS node has its own public identifier, known as a PeerID.
While a long string of letters and numbers may not be a “Johnny Appleseed” level of human-readable specificity, your PeerID is still a long-lived, unique identifier for your node. Keep in mind that it’s possible to do a DHT lookup on your PeerID and, particularly if your node is regularly running from the same location (like your home), find your IP address. (It’s possible to reset your PeerID if necessary, but, similar to changing your user ID on legacy web apps and services, doing so is likely to involve extra effort.) Additionally, longer-term monitoring of the public IPFS network could yield information about what CIDs your node is requesting and/or reproviding and when.
Enhancing your IPFS privacy
If there are situations in which you know you’ll need to remain private but still want to use IPFS, one of the approaches outlined below may help. And don’t forget, you can always discuss privacy and get others’ input or ideas in the official IPFS forums.
Controlling what you share
By default, an IPFS node announces to the rest of the network that it is willing to share every CID in its cache (in other words, reproviding content that it’s retrieved from other nodes), as well as CIDs that you’ve explicitly pinned or added to MFS in order to make them consistently available. If you’d like to disable this behavior, you can do so in the reprovider settings of your node’s config file.
Changing your reprovider settings to “pinned” or “roots” will keep your node from announcing itself as a provider of non-pinned CIDs that are in your cache — so you can still use pinning to provide other nodes with content that you care about and want to make sure continues to be available over IPFS.
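For example, in Kubo (the main Go implementation of IPFS), the reprovider strategy lives in the node’s JSON config file. A sketch of the relevant fragment is below; field names may vary between versions, so check your implementation’s config reference:

```json
{
  "Reprovider": {
    "Interval": "12h",
    "Strategy": "pinned"
  }
}
```

With `"Strategy": "pinned"`, the node keeps announcing content you’ve deliberately pinned while staying quiet about everything that merely passed through its cache.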
Using a public gateway
Using a public IPFS gateway is one way to request IPFS-hosted content without revealing any information about your local node — because you aren’t using a local node! However, this method does keep you from enjoying all the benefits of being a full participant in the IPFS network.
Public IPFS gateways are primarily intended as a “bridge” between the legacy web and the distributed web; they allow ordinary web clients to request IPFS-hosted content via HTTP. That’s great for back-compatibility, but if you only request content through public gateways rather than directly over IPFS, you’re not actually part of the IPFS network; that gateway is the network participant, acting on your behalf. It’s also important to remember that gateway operators could be collecting their own private metrics, which could include tracking the IP addresses that use a gateway and correlating those with what CIDs are requested. Additionally, content requested through a gateway is visible on the public DHT, although it’s not possible to know who requested it.
If you’re familiar with Tor and comfortable with the command line, you may wish to try running IPFS over Tor transport by configuring your node’s settings.
If you’re a developer building on IPFS, it’s worth noting that the global IPFS community continues to experiment with using Tor transport — see this example from e-commerce organization OpenBazaar — and there may already be an open-source codebase to help your own project achieve this.
Encrypting content transported via IPFS
If your privacy concerns are less about the potential for monitoring and more about the visibility of the IPFS-provided content itself, this can be mitigated simply by encrypting the content before adding it to the IPFS network. While traffic involving the encrypted content could still be tracked, the data represented by encrypted content’s CIDs remains unreadable by anyone without the ability to decrypt it.
There’s one caveat to keep in mind here: While today’s encryption might seem bulletproof, it’s not guaranteed that it won’t be broken at some point in the future. Future breakthroughs in computing might allow going back and decrypting older content that’s been put on a public network such as IPFS. If you want to guard against this potential attack vector, using IPFS hybrid-private networks — in which nodes sit behind connection gates that check request ACLs before giving a node a request — is a potential design direction. (For more details, this article from Pinata may be helpful.)
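One way to picture the encrypt-before-adding workflow: encrypt locally, add only the ciphertext to IPFS, and share the key out of band. The sketch below uses a toy hash-counter keystream purely for illustration; for real data, use a vetted AEAD cipher (for example, AES-GCM via a maintained crypto library) before running `ipfs add`.

```python
import hashlib
import secrets

def keystream(key: bytes, length: int) -> bytes:
    """TOY keystream (SHA-256 of key + counter). Not for production use."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor(data: bytes, ks: bytes) -> bytes:
    """XOR data with a keystream; applying it twice recovers the plaintext."""
    return bytes(a ^ b for a, b in zip(data, ks))

key = secrets.token_bytes(32)
plaintext = b"sensitive research notes"
ciphertext = xor(plaintext, keystream(key, len(plaintext)))

# The CID you'd publish is derived from the ciphertext, so the network
# stores and addresses bytes that only key-holders can read.
cid_of_ciphertext = hashlib.sha256(ciphertext).hexdigest()
assert xor(ciphertext, keystream(key, len(ciphertext))) == plaintext
```

The network still sees traffic for the ciphertext’s CID, which is the monitoring caveat discussed above; encryption hides the content, not the fact that it moved.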
If you’re curious about implementing encryption with IPFS on a large scale, you may enjoy reading this case study on Fleek, a fast-growing IPFS file hosting and delivery service.
Creating a private network
Private IPFS networks provide full protection from public monitoring but can lack the scale benefits provided by the public IPFS network. A private network operates identically to the public one, but with one critical difference: it can only be accessed by nodes that have been given access, and it will only ever scale to those nodes. This means that the benefits of the public IPFS network’s massive scale, such as geographic resiliency and speedy retrieval of high-demand content, won’t be realized unless the private network is explicitly designed and scaled with this in mind.
Running a private network can be a great option for corporate implementations of IPFS — for one example, see this case study on Morpheus.Network — because the network’s topology can be specified and built exactly as desired.
Announcing is a function of the IPFS networking layer in libp2p, wherein a peer can tell other peers that it has data blocks available.
Bitswap is IPFS’s central block exchange protocol. Its purpose is to request blocks from and send blocks to other peers in the network. More about Bitswap(opens new window)
BitTorrent is a communication protocol for peer-to-peer file sharing, which is used to distribute data and electronic files over the Internet. Also, the first file-sharing application to use the protocol. More about BitTorrent protocol (opens new window)and BitTorrent app(opens new window)
A Blockchain is a growing list of records, known as blocks, that are linked using cryptography. Each block contains a cryptographic hash of the previous block, a timestamp, and transaction data (generally represented as a Merkle tree). More about Blockchain(opens new window)
A Block is a binary blob of data, identified by a CID.
A Bootstrap Node is a trusted peer on the IPFS network through which an IPFS node learns about other peers on the network. More about Bootstrapping(opens new window)
Version 0 (v0) of the IPFS content identifier. This CID is 46 characters in length, starting with “Qm”. Uses a base 58-encoded multihash, very simple but much less flexible than newer CIDs. More about CID v0(opens new window)
Version 1 (v1) of the IPFS content identifier. This CID version contains some leading identifiers which provide for forward-compatibility. Able to support different formats for future versions of CID. More about CID v1(opens new window)
A Conflict-Free Replicated Data Type (CRDT) is a type of specially-designed data structure used to achieve strong eventual consistency (SEC) and monotonicity (absence of rollbacks). More about CRDT(opens new window)
A Daemon is a computer program that typically runs in the background. The IPFS daemon is how you take your node online to the IPFS network. More about IPFS Daemon(opens new window)
A Directed Acyclic Graph (DAG) is a computer science data structure adapted for use with versioned file systems, blockchains, and for modeling many different kinds of information. More about DAG(opens new window)
The Datastore is the on-disk storage system used by an IPFS node. Configuration parameters control the location, size, construction, and operation of the datastore. More about Datastore(opens new window)
A Distributed Hash Table (DHT) is a distributed key-value store where keys are cryptographic hashes. In IPFS, each peer is responsible for a subset of the IPFS DHT. More about DHT(opens new window)
DNSLink is a protocol to link content and services directly from DNS. A DNSLink address looks like an IPNS address, but it uses a domain name in place of a hashed public key, like /ipns/mydomain.org. More about DNSLink(opens new window)
The Decentralized Web (DWeb) looks like today’s World Wide Web, but it is built with new underlying technologies that support decentralization. It is much harder for any single entity (like a government or terrorist group) to take down any single webpage, website, or service, either by accident or on purpose.
The Filestore is a data store that stores the UnixFS data components of blocks as files on the file system instead of as blocks. This allows adding content to IPFS without duplicating the content in the IPFS datastore.
An IPFS Gateway acts as a bridge between traditional web browsers and IPFS. Through the gateway, users can browse files and websites stored in IPFS as if they were stored on a traditional web server. More about Gateway(opens new window)
Garbage Collection (GC) is the process within each IPFS node of clearing out cached files and blocks. Nodes need to clear out previously cached resources to make room for new resources. Pinned resources are never deleted.
In computer science, a Graph is an abstract data type from the field of graph theory within mathematics. The Merkle-DAG used in IPFS is a specialized graph.
Graphsync is an alternative content replication protocol under discussion, similar to Bitswap. Like Bitswap, the primary job is to synchronize data blocks across peers. More about Graphsync(opens new window)
A Cryptographic Hash is a function that takes some arbitrary input (content) and returns a fixed-length value. The exact same input data will always generate the same hash as output. There are numerous hash algorithms. More about Hash(opens new window)
Information Space is the set of concepts, and relations among them, held by an information system. This can be thought of as a conceptual framework or tool for studying how knowledge and information are codified, abstracted, and diffused through a social system. More about Information Space(opens new window)
The InterPlanetary Linked Data (IPLD) model is a set of specifications in support of decentralized data structures for the content-addressable web. Key features include interoperable protocols that are easily upgradeable and backward compatible, with a single namespace for all hash-based protocols. More about IPLD(opens new window)
The InterPlanetary Name System (IPNS) is a system for creating and updating mutable links to IPFS content. IPNS allows for publishing the latest version of any IPFS content, even though the underlying IPFS hash has changed. More about IPNS(opens new window)
The libp2p project is a modular system of protocols, specifications, and libraries that enable the development of peer-to-peer network applications. It is an essential component of IPFS. More about libp2p(opens new window)
The Merkle-DAG is a computer science data structure used at the core of IPFS files/block storage. Each node in a Merkle-DAG is addressed by a hash of its content, known as a Content Identifier (CID). More about Merkle-DAG(opens new window)
Merkle Forest is a phrase coined to describe the distributed, authenticated, hash-linked data structures (Merkle trees) running technologies like Bitcoin, Ethereum, git, and BitTorrent. In this way, IPFS is a forest of linked Merkle trees. More about Merkle Forest(opens new window)
A Merkle Tree is a specific type of hash tree used in cryptography and computer science, allowing efficient and secure verification of the contents of large data structures. Named after Ralph Merkle, who patented it in 1979. More about Merkle Tree(opens new window)
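The "efficient and secure verification" property comes from building the tree bottom-up: leaves are hashed, then pairs of hashes are hashed together until a single root remains, so any change to any leaf changes the root. A simplified sketch in Python (real Merkle trees, including those in IPFS, add details like domain separation and chunking that are omitted here):

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hash arbitrary bytes to a 64-character hex digest."""
    return hashlib.sha256(data).hexdigest()

def merkle_root(leaves: list[bytes]) -> str:
    """Compute a Merkle root by pairwise-hashing levels up to a single hash."""
    level = [sha256_hex(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:           # odd count: duplicate the last hash
            level.append(level[-1])
        level = [
            sha256_hex((level[i] + level[i + 1]).encode())
            for i in range(0, len(level), 2)
        ]
    return level[0]

root = merkle_root([b"block-1", b"block-2", b"block-3"])
print(root)
```

Because the root commits to every leaf, verifying one block only requires the handful of sibling hashes on its path to the root, not the whole data set.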
The Mutable File System (MFS) is a tool built into IPFS that lets you treat files like a normal name-based filesystem. You may add, edit, and remove MFS files while all link updates and hashes are taken care of for you. More about MFS(opens new window)
The Multiformats project is a collection of protocols that aim to future-proof systems today. A key element is enhancing format values with self-description. This allows for interoperability, protocol agility, and promotes extensibility. More about Multiformats(opens new window) and Multihash(opens new window)
A Path/Address is the method within IPFS of referencing content on the web. Addresses for content are path-like; they are components separated by slashes. More about Path/Address(opens new window)
In system architecture, a Peer is an equal player in the peer-to-peer model of decentralization, as opposed to the client-server model of centralization. See also Peer as Node
A Peer ID is how each unique IPFS node is identified on the network. The Peer ID is created when the IPFS node is initialized and is essentially a cryptographic hash of the node’s public key. More about Peer ID(opens new window)
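The idea of deriving an identifier from a public key can be sketched in a few lines. This is a simplified illustration only, not the real libp2p encoding (actual Peer IDs are base58-encoded multihashes, and the key bytes here are a made-up placeholder): the digest is prefixed with a multihash header (0x12 = sha2-256, 0x20 = 32-byte length) so the identifier self-describes which hash produced it.

```python
import base64
import hashlib

def illustrative_peer_id(public_key: bytes) -> str:
    """Hypothetical sketch: hash a public key and wrap it in a multihash prefix."""
    digest = hashlib.sha256(public_key).digest()
    multihash = bytes([0x12, 0x20]) + digest      # <algorithm, length> prefix
    return base64.b32encode(multihash).decode().lower().rstrip("=")

# The same key always yields the same ID, so peers can be recognized across sessions.
print(illustrative_peer_id(b"example-public-key-bytes"))
```

The key point is that the ID is derived from the key, so any peer can verify that a node really owns the Peer ID it claims by checking its public key against it.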
Pinning is the method of telling an IPFS node that particular data is important, so it will never be removed from that node's cache. More about Pinning(opens new window)
Publish-subscribe (Pubsub) is an experimental feature in IPFS. Publishers send messages classified by topic or content, and subscribers receive only the messages they are interested in. More about Pubsub(opens new window)
The Relay is a means to establish connectivity between libp2p nodes (e.g., IPFS nodes) that wouldn’t otherwise be able to establish a direct connection to each other. This may be due to nodes that are behind NAT, reverse proxies, firewalls, etc. More about Relay(opens new window)
The Repository (Repo) is a directory where IPFS stores all its settings and internal data. It is created with the ipfs init command. More about Repo(opens new window)
A Self-certifying File System (SFS) is a distributed file system that doesn’t require special permissions for data exchange. It is self-certifying because data served to a client is authenticated by the file name (which is signed by the server). More about SFS(opens new window)
The signing of data cryptographically allows data from untrusted sources to be trusted. Cryptographically signed values can be passed through an untrusted channel, and any tampering with the data can be detected. More about Digital signature(opens new window)
The Swarm is a term for the network of IPFS peers with which your local node has connections. Swarm addresses are addresses that your local node will listen on for connections from other IPFS peers. More about Swarm addresses(opens new window)
In libp2p, transport refers to the technology that lets us move data from one machine to another. This may be a TCP network, a WebSocket connection in a browser, or anything else capable of implementing the transport interface.
The Unix File System (UnixFS) is the data format used to represent files and all their links and metadata in IPFS and is loosely based on how files work in Unix. Adding a file to IPFS creates a block (or a tree of blocks) in the UnixFS format. More about UnixFS(opens new window)