documentation/blog/2024-07-02-pnpm-post.md
This article was last updated on July 02, 2024, to add new advantages and hard links sections for pnpm
A package manager is responsible for installing, updating, and removing software packages and dependencies. NPM has been widely used as the standard package manager for Javascript; however, companies are quickly adopting the pnpm package manager due to its immense benefits.
pnpm is a drop-in replacement for npm. It is built on top of npm and is much faster and more efficient than its predecessor. It is highly disk efficient and solves inherent issues in npm. In this article, we will discuss pnpm in detail. We will explain how it works and will go through why pnpm is a perfect replacement for npm or yarn.
Let’s start with discussing the issues in existing package managers.
Npm uses flattened node_modules. The flattened dependency results in many issues, such as:
node_modules folderLet’s explain this with a basic example. One of your projects needs an express module, and express has a dependency package named “debug”. This is how the structure of node modules will look like:
<div className="centered-image" ><em> Figure 1 - Example project dependencies</em>
</div>Your code can require('debug'), even if you do not depend on it explicitly in the package.json file. Let’s discuss two scenarios that can cause issues:
In both cases, your code will fail because it has an implicit dependency to debug. That is a big problem. Similarly, if you work with a large monorepo, then it will be much more difficult to track particular dependencies a project uses. Duplicate packages are another issue here. Now yarn is slightly better in terms of optimizing disk space as it makes use of hoistings; however that approach fails in some cases.
pnpm uses hard links and symlinks to maintain a semistrict node_modules structure. So what’s the difference between a hard link and a soft link? Well, a hard link is a different reference to the same file. In soft link, you create a new file, and the contents of the file point to another path. Symlinks are symbolic links that pnpm uses to create a nested structure of dependencies.
Let’s see the pnpm version of the express and node modules discussed in the previous section.
<div><em> Figure 2 - project structure with pnpm</em></div>
The above picture shows the directory structure of your project files. The first thing you will notice is that your code cannot access the “debug” package because it is not directly under the root level node_modules directory. Pnpm has created a special folder named “.pnpm” that contains hard links to all the modules.
If you look at express/4.0.0, the express module is a hard link to the global pnpm-store and a debug symbolic link to the debug hard link that also links to the global pnpm-store.
Below is the image of the actual pnpm-store, which contains the packages. This is where all the downloaded dependencies are maintained. When you download a dependency, pnpm first checks whether the dependency is present in this store or not. If it finds the dependency in the store, pnpm fetches it by creating a hard link.
<div className="centered-image" ><em> Figure 3 - Global pnpm store</em>
</div>Because of this approach, pnpm re-uses the same packages if they are already installed for another project. This brings us to the many benefits which pnpm provides. Let’s discuss some of those.
## Understand how pnpm deals with hard links and symlinks
I will try to describe pnpm's ways of using hard links and symlinks to give us a basic understanding of how our node_modules directory structure will be.
Hard Links vs. Soft Links (Symlinks)
pnpm's Approach:
node_modules a .pnpm directory that includes hard links to all of the module files, held in a global store. What this means is that even if a given module should be utilized by many projects, it will only ever take up one space on disk with pnpm.node_modules directory, it makes symlinks to those hard links. This way, each project gets its own node_modules structure, but there is no redundancy in copies of the same packages.Directory Structure
node_modules/
.pnpm/
[email protected]/node_modules/express/[email protected]/node_modules/debug/express/ -> symlink to ../.pnpm/[email protected]/node_modules/expressdebug/ -> symlink to ../.pnpm/[email protected]/node_modules/debugEfficiency:
Dependency Isolation
package.json. This way, the number of conflicts and implicit access issues is brought down to a minimum compared with npm and Yarn.Hard links and symlinks are being used for leveraging pnpm in optimal disk space and management of dependencies, making it an efficient and effective package manager. If there are any questions or further details needed from this, please do not hesitate to reach out.
pnpm provides many advantages over npm and yarn. Some of the core features include:
As shown in the previous section, pnpm uses a content-addressable file system to store the packages and dependencies on the disk. That means the same package will not be duplicated. Even with different versions of the same package, pnpm is intelligent enough to reuse the maximum code. If a package version 1 has 500 files and version 2 has just one more file, then pnpm will not write 501 files for version 2; instead, it will create a hard link to the original 500 files and write just the new file. If you compare it with npm, then version 2 will also be loaded with duplicating the original 500 files. For large monorepo projects, it can make a big difference. Image a scenario where a package is needed by hundreds of other packages that will be killing your disk space unless you use pnpm.
The speed of package installation with pnpm is significantly better than npm and yarn. If you look at the below benchmark tests, you can see that pnpm performs better in most cases thano npm and yarn.
<div className="centered-image" ><em> source - https://pnpm.io/benchmarks</em>
</div>Pnpm, like yarn, has a special file with the checksum of all the installed packages. This ensures the integrity of all the installed packages before their code is executed.
In terms of unprivileged access, pnpm also outperforms npm and yarn. In the case of npm and yarn, If package A depends on package B, and B depends on C, then A implicitly gets access to C even though A has not declared C as its dependency. This problem is intensified in a large monorepo setup. Pnpm, on the other hand, uses a different dependency resolution algorithm and different folder structure of node_modules that prevents illegal access to packages.
Note that pnpm has excellent support for monorepo and offline mode.
Pnpm has a cool CLI. Here are some of the basic commands:
pnpm init: Create a package.json filepnpm install: Download all the packages listed as dependencies in package.jsonpnpm add [module_name]@[version]: Download a particular version of a package and add to the list of dependencies in package.jsonpnpm remove [module_name]: Uninstall and remove a package from the list of dependencies in package.jsonpnpm list: Print a tree of locally installed modulesIf your projects use npm or yarn, then migrating to pnpm will not be very difficult. Here is a comparison of commands between npm, yarn, and pnpm.
| npm command | Yarn command | pnpm equivalent |
|---|---|---|
| npm install | yarn | pnpm install |
| npm install [pkg] | yarn add [pkg] | pnpm add [pkg] |
| npm uninstall [pkg] | yarn remove [pkg] | pnpm remove [pkg] |
| npm update | yarn upgrade | pnpm update |
| npm list | yarn list | pnpm list |
| npm run [scriptName] | yarn [scriptName] | pnpm [scriptName] |
| npx [command] | yarn dlx [command] | pnpm dlx [command] |
| npm exec | yarn exec [commandName] | pnpm exec [commandName] |
| npm init [initializer] | yarn create [initializer] | pnpm create [initializer] |
I would like to share some advanced features of pnpm with you in the hope that we can make our development workflow brilliant. The following are some of the most essential features that demonstrate why pnpm is powerful and efficient:
pnpm Workspaces Introduction Workspaces: pnpm workspaces help one manage multiple packages under a single repository (monorepo). This becomes very useful when working on projects with interdependent packages or microservices.
pnpm-workspace.yaml file a packages field, listing monorepo paths to the packages.packages:
- "packages/*"
- "tools/*"
Workspace Benefits:
Use pnpm Offline: Offline Mode: pnpm can work entirely in offline mode with the help of its global store. pnpm caches every downloaded package, which makes it possible to install packages without having an Internet connection after the first download.
CI/CD Pipelines Benefits:
NodeLinker Mode:
Overview: pnpm provides node-linker mode, which can be set to one of two modes—hoisted or isolated. This is to say, developers will be given control over how pnpm is supposed to link packages.
node-linker in .npmrc.node-linker=hoisted
Advantages:
Hooks:
package.json under the scripts section.{
"scripts": {
"preinstall": "echo 'Before install'",
"postinstall": "echo 'After install'"
}
}
Advantages:
These advanced features will allow pnpm to increase workflow efficiency, decrease build times, and manage dependencies more reliably. Do not hesitate to contact us with questions or more detailed instructions on implementing these features.
Pnpm is “performant” version of npm, hence the name pnpm. This article showed you the issues with existing package managers like npm and yarn. pnpm has brought many improvements, built on top of existing npm features. pnpm has adopted all the good things of npm while removing its weaknesses, making pnpm the best of both worlds.
As shown in the above benchmark results, pnpm has overall performed much better than npm and yarn. No wonder giant tech companies like Vue3, Prism, and Microsoft are quickly adopting pnpm.