Document module loading
isaacs authored and ry committed Feb 10, 2011
1 parent 35e3222 commit ff39d41
Showing 1 changed file with 260 additions and 53 deletions.
313 changes: 260 additions & 53 deletions doc/api/modules.markdown
@@ -21,13 +21,13 @@ one-to-one correspondence. As an example, `foo.js` loads the module

The contents of `foo.js`:

var circle = require('./circle');
var circle = require('./circle.js');
console.log( 'The area of a circle of radius 4 is '
+ circle.area(4));

The contents of `circle.js`:

var PI = 3.14;
var PI = Math.PI;

exports.area = function (r) {
return PI * r * r;
@@ -39,78 +39,285 @@ The contents of `circle.js`:

The module `circle.js` has exported the functions `area()` and
`circumference()`. To export an object, add to the special `exports`
object. (Alternatively, one can use `this` instead of `exports`.) Variables
object.

Variables
local to the module will be private. In this example the variable `PI` is
private to `circle.js`. The function `puts()` comes from the module `'util'`,
which is a built-in module. Modules which are not prefixed by `'./'` are
built-in modules--more about this later.
private to `circle.js`.

### Core Modules

Node has several modules compiled into the binary. These modules are
described in greater detail elsewhere in this documentation.

The core modules are defined in node's source in the `lib/` folder.

Core modules are always preferentially loaded if their identifier is
passed to `require()`. For instance, `require('http')` will always
return the built in HTTP module, even if there is a file by that name.

### Module Resolving
### File Modules

If the exact filename is not found, then node will attempt to load the
required filename with the added extension of `.js`, and then `.node`.

`.js` files are interpreted as JavaScript text files, and `.node` files
are interpreted as compiled addon modules loaded with `dlopen`.

A module prefixed with `'/'` is an absolute path to the file. For
example, `require('/home/marco/foo.js')` will load the file at
`/home/marco/foo.js`.

A module prefixed with `'./'` is relative to the file calling `require()`.
That is, `circle.js` must be in the same directory as `foo.js` for
`require('./circle')` to find it.

Without the leading `'./'`, like `require('assert')` the module is searched
for in the `require.paths` array. `require.paths` on my system looks like
this:
Without a leading '/' or './' to indicate a file, the module is either a
"core module" or is loaded from a `node_modules` folder.

### Loading from `node_modules` Folders

If the module identifier passed to `require()` is not a native module,
and does not begin with `'/'`, `'../'`, or `'./'`, then node starts at the
parent directory of the current module, and adds `/node_modules`, and
attempts to load the module from that location.

`[ '/home/ryan/.node_modules' ]`
If it is not found there, then it moves to the parent directory, and so
on, until either the module is found, or the root of the tree is
reached.

That is, when `require('foo')` is called Node looks for:
For example, if the file at `'/home/ry/projects/foo.js'` called
`require('bar.js')`, then node would look in the following locations, in
this order:

* 1: `/home/ryan/.node_modules/foo`
* 2: `/home/ryan/.node_modules/foo.js`
* 3: `/home/ryan/.node_modules/foo.node`
* 4: `/home/ryan/.node_modules/foo/index.js`
* 5: `/home/ryan/.node_modules/foo/index.node`
* `/home/ry/projects/node_modules/bar.js`
* `/home/ry/node_modules/bar.js`
* `/home/node_modules/bar.js`
* `/node_modules/bar.js`

interrupting once a file is found. Files ending in `'.node'` are binary Addon
Modules; see 'Addons' below. `'index.js'` allows one to package a module as
a directory.
This allows programs to localize their dependencies, so that they do not
clash.

Additionally, a `package.json` file may be used to treat a folder as a
module, if it specifies a `'main'` field. For example, if the file at
`./foo/bar/package.json` contained this data:
#### Optimizations to the `node_modules` Lookup Process

{ "name" : "bar",
"version" : "1.2.3",
"main" : "./lib/bar.js" }
When there are many levels of nested dependencies, it is possible for
these file trees to get fairly long. The following optimizations are thus
made to the process.

then `require('./foo/bar')` would load the file at
`'./foo/bar/lib/bar.js'`. This allows package authors to specify an
entry point to their module, while structuring their package how it
suits them.
First, `/node_modules` is never appended to a folder already ending in
`/node_modules`.

Any folders named `"node_modules"` that exist in the current module path
will also be appended to the effective require path. This allows for
bundling libraries and other dependencies in a 'node_modules' folder at
the root of a program.
Second, if the file calling `require()` is already inside a `node_modules`
hierarchy, then the top-most `node_modules` folder is treated as the
root of the search tree.

To avoid overly long lookup paths in the case of nested packages,
the following 2 optimizations are made:
For example, if the file at
`'/home/ry/projects/foo/node_modules/bar/node_modules/baz/quux.js'`
called `require('asdf.js')`, then node would search the following
locations:

1. If the module calling `require()` is already within a `node_modules`
folder, then the lookup will not go above the top-most `node_modules`
directory.
2. Node will not append `node_modules` to a path already ending in
`node_modules`.
* `/home/ry/projects/foo/node_modules/bar/node_modules/baz/node_modules/asdf.js`
* `/home/ry/projects/foo/node_modules/bar/node_modules/asdf.js`
* `/home/ry/projects/foo/node_modules/asdf.js`

So, for example, if the file at
`/usr/lib/node_modules/foo/node_modules/bar.js` were to do
`require('baz')`, then the following places would be searched for a
`baz` module, in this order:
### Folders as Modules

* 1: `/usr/lib/node_modules/foo/node_modules`
* 2: `/usr/lib/node_modules`
It is convenient to organize programs and libraries into self-contained
directories, and then provide a single entry point to that library.
There are three ways in which a folder may be passed to `require()` as
an argument.

`require.paths` can be modified at runtime by simply unshifting new
paths onto it, or at startup with the `NODE_PATH` environmental
variable (which should be a list of paths, colon separated).
The first is to create a `package.json` file in the root of the folder,
which specifies a `main` module. An example package.json file might
look like this:

The second time `require('foo')` is called, it is not loaded again from
disk. It looks in the `require.cache` object to see if it has been loaded
before.
{ "name" : "some-library",
"main" : "./lib/some-library.js" }

If this was in a folder at `./some-library`, then
`require('./some-library')` would attempt to load
`./some-library/lib/some-library.js`.

This is the extent of Node's awareness of package.json files.

If there is no package.json file present in the directory, then node
will attempt to load an `index.js` or `index.node` file out of that
directory. For example, if there was no package.json file in the above
example, then `require('./some-library')` would attempt to load:

* `./some-library/index.js`
* `./some-library/index.node`
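
As a rough illustration of the two cases above, the following sketch builds both kinds of folder in a temporary directory and loads each one. The temporary directory prefix and the `other-library` folder are hypothetical names chosen for this example, and the code assumes a Node version that provides `fs.mkdtempSync` and the `recursive` option of `fs.mkdirSync`.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Scratch directory for the demonstration (hypothetical prefix).
var root = fs.realpathSync(
  fs.mkdtempSync(path.join(os.tmpdir(), 'folder-demo-')));

// Case 1: a folder whose package.json names a "main" module.
var withMain = path.join(root, 'some-library');
fs.mkdirSync(path.join(withMain, 'lib'), { recursive: true });
fs.writeFileSync(
  path.join(withMain, 'package.json'),
  JSON.stringify({ name: 'some-library', main: './lib/some-library.js' }));
fs.writeFileSync(
  path.join(withMain, 'lib', 'some-library.js'),
  'exports.entry = "lib/some-library.js";');

// Case 2: a folder with no package.json, so index.js is used instead.
var withIndex = path.join(root, 'other-library');
fs.mkdirSync(withIndex);
fs.writeFileSync(
  path.join(withIndex, 'index.js'),
  'exports.entry = "index.js";');

// Passing the folder itself to require() loads its entry point.
var viaMain = require(withMain).entry;
var viaIndex = require(withIndex).entry;
console.log(viaMain);
console.log(viaIndex);
```

Here each folder is passed to `require()` by absolute path only to keep the sketch independent of the current working directory; a relative identifier such as `'./some-library'` follows the same rules.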

### Caching

Modules are cached after the first time they are loaded. This means
(among other things) that every call to `require('foo')` will get
exactly the same object returned, if it would resolve to the same file.
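
A small sketch of this behavior, assuming a throwaway module written to a temporary directory (the `counter.js` file name and the directory prefix are made up for the example):

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Write a throwaway module to a temporary directory.
var dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cache-demo-'));
var file = path.join(dir, 'counter.js');
fs.writeFileSync(file, 'exports.count = 0;');

// The first call loads and runs the file.
var first = require(file);
first.count += 1;

// The second call resolves to the same file, so the cached exports
// object is returned and the file is not run again.
var second = require(file);
var sameObject = first === second;
console.log(sameObject, second.count);
```

Because both names refer to one object, a change made through `first` is visible through `second`.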

### All Together...

To get the exact filename that will be loaded when `require()` is called, use
the `require.resolve()` function.

Putting together all of the above, here is the high-level algorithm
in pseudocode of what require.resolve does:

require(X) from module at path Y
1. If X is a core module,
a. return the core module
b. STOP
2. If X begins with `./`, `../`, or `/`,
a. LOAD_AS_FILE(Y + X)
b. LOAD_AS_DIRECTORY(Y + X)
3. LOAD_NODE_MODULES(X, dirname(Y))
4. THROW "not found"

LOAD_AS_FILE(X)
1. If X is a file, load X as JavaScript text. STOP
2. If X.js is a file, load X.js as JavaScript text. STOP
3. If X.node is a file, load X.node as binary addon. STOP

LOAD_AS_DIRECTORY(X)
1. If X/package.json is a file,
a. Parse X/package.json, and look for "main" field.
b. let M = X + (json main field)
c. LOAD_AS_FILE(M)
2. LOAD_AS_FILE(X/index)

LOAD_NODE_MODULES(X, START)
1. let DIRS=NODE_MODULES_PATHS(START)
2. for each DIR in DIRS:
a. LOAD_AS_FILE(DIR/X)
b. LOAD_AS_DIRECTORY(DIR/X)

NODE_MODULES_PATHS(START)
1. let PARTS = path split(START)
2. let ROOT = index of first instance of "node_modules" in PARTS, or 0
3. let I = count of PARTS - 1
4. let DIRS = []
5. while I > ROOT,
a. if PARTS[I] = "node_modules", let I = I - 1 and CONTINUE
b. DIR = path join(PARTS[0 .. I] + "node_modules")
c. DIRS = DIRS + DIR
d. let I = I - 1
6. return DIRS
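
The `node_modules` path computation can be sketched in JavaScript as follows. This is an illustrative sketch rather than node's actual implementation: the function name `nodeModulesPaths` is hypothetical, paths are assumed to be POSIX-style and absolute, and the loop bounds are chosen so that the output matches the example locations listed in the `node_modules` sections above (including the folder beside the top-most `node_modules` parent).

```javascript
var path = require('path');

// Hypothetical helper: list the node_modules folders that would be searched
// for a module required from a file living in the directory `start`.
function nodeModulesPaths(start) {
  var parts = start.split('/');
  var first = parts.indexOf('node_modules');
  // Stop at the folder containing the top-most node_modules folder,
  // or at the filesystem root when there is none.
  var root = first === -1 ? 0 : first - 1;
  var dirs = [];
  for (var i = parts.length - 1; i >= root; i--) {
    // Never append node_modules to a folder already named node_modules.
    if (parts[i] === 'node_modules') continue;
    dirs.push(parts.slice(0, i + 1).concat('node_modules').join('/'));
  }
  return dirs;
}

// The search starts from the directory of the file calling require().
var start = path.posix.dirname('/home/ry/projects/foo.js');
console.log(nodeModulesPaths(start));
```

Appending the requested identifier (for example `bar.js`) to each returned folder gives the candidate locations in search order.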

### Loading from the `require.paths` Folders

In node, `require.paths` is an array of strings that represent paths to
be searched for modules when they are not prefixed with `'/'`, `'./'`, or
`'../'`. For example, if require.paths were set to:

[ '/home/micheil/.node_modules',
'/usr/local/lib/node_modules' ]

Then calling `require('bar/baz.js')` would search the following
locations:

* 1: `'/home/micheil/.node_modules/bar/baz.js'`
* 2: `'/usr/local/lib/node_modules/bar/baz.js'`

The `require.paths` array can be mutated at run time to alter this
behavior.

It is set initially from the `NODE_PATH` environment variable, which is
a colon-delimited list of absolute paths. In the previous example,
the `NODE_PATH` environment variable might have been set to:

/home/micheil/.node_modules:/usr/local/lib/node_modules

#### **Note:** Please Avoid Modifying `require.paths`

For compatibility reasons, `require.paths` is still given first priority
in the module lookup process. However, it may disappear in a future
release.

While it seemed like a good idea at the time, and enabled a lot of
useful experimentation, in practice a mutable `require.paths` list is
often a troublesome source of confusion and headaches.

##### Setting `require.paths` to some other value does nothing.

This does not do what one might expect:

require.paths = [ '/usr/lib/node' ];

All that does is lose the reference to the *actual* node module lookup
paths, and create a new reference to some other thing that isn't used
for anything.

##### Putting relative paths in `require.paths` is... weird.

If you do this:

require.paths.push('./lib');

then it does *not* add the full resolved path to where `./lib`
is on the filesystem. Instead, it literally adds `'./lib'`,
meaning that if you do `require('y.js')` in `/a/b/x.js`, then it'll look
in `/a/b/lib/y.js`. If you then did `require('y.js')` in
`/l/m/n/o/p.js`, then it'd look in `/l/m/n/o/lib/y.js`.

In practice, people have used this as an ad hoc way to bundle
dependencies, but this technique is brittle.

##### Zero Isolation

There is (by regrettable design) only one `require.paths` array used by
all modules.

As a result, if one node program comes to rely on this behavior, it may
permanently and subtly alter the behavior of all other node programs in
the same process. As the application stack grows, we tend to assemble
functionality, and it is a problem when those parts interact in ways
that are difficult to predict.

## Addenda: Package Manager Tips

If you were to build a package manager, the tools above provide you with
all you need to very elegantly set up modules in a folder structure such
that they get the required dependencies and do not conflict with one
another.

Let's say that we wanted to have the folder at
`/usr/lib/<some-program>/<some-version>` hold the contents of a specific
version of a package.

Packages can depend on one another. So, in order to install
package `foo`, you may have to install a specific version of package `bar`.
The `bar` package may itself have dependencies, and in some cases, these
dependencies may even collide or form cycles.

Since Node looks up the `realpath` of any modules it loads, and then
looks for their dependencies in the `node_modules` folders as described
above, this situation is very simple to resolve with the following
architecture:

* `/usr/lib/foo/1.2.3/` - Contents of the `foo` package, version 1.2.3.
* `/usr/lib/bar/4.3.2/` - Contents of the `bar` package that `foo`
depends on.
* `/usr/lib/foo/1.2.3/node_modules/bar` - Symbolic link to
`/usr/lib/bar/4.3.2/`.
* `/usr/lib/bar/4.3.2/node_modules/*` - Symbolic links to the packages
that `bar` depends on.

Thus, even if a cycle is encountered, or if there are dependency
conflicts, every module will be able to get a version of its dependency
that it can use.

When the code in the `foo` package does `require('bar')`, it will get
the version that is symlinked into
`/usr/lib/foo/1.2.3/node_modules/bar`. Then, when the code in the `bar`
package calls `require('quux')`, it'll get the version that is symlinked
into `/usr/lib/bar/4.3.2/node_modules/quux`.
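
The layout above can be tried out with a sketch like the following, which recreates it under a temporary directory instead of `/usr/lib`. The helper names `install` and `link`, the `quux` version number, and the directory prefix are hypothetical. The sketch assumes a system where the process may create symbolic links, and a node that resolves modules by their real path, as described above.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Stand-in for /usr/lib (hypothetical temporary location).
var root = fs.realpathSync(
  fs.mkdtempSync(path.join(os.tmpdir(), 'pkg-demo-')));

// Create <root>/<name>/<version>/index.js containing the given source,
// plus an empty node_modules folder for that package's dependencies.
function install(name, version, source) {
  var dir = path.join(root, name, version);
  fs.mkdirSync(path.join(dir, 'node_modules'), { recursive: true });
  fs.writeFileSync(path.join(dir, 'index.js'), source);
  return dir;
}

// Symlink a dependency into the dependent package's node_modules folder.
function link(fromDir, name, toDir) {
  fs.symlinkSync(toDir, path.join(fromDir, 'node_modules', name), 'dir');
}

var quux = install('quux', '0.0.1', 'exports.name = "quux@0.0.1";');
var bar = install('bar', '4.3.2',
  'exports.name = "bar@4.3.2"; exports.dep = require("quux").name;');
var foo = install('foo', '1.2.3',
  'exports.name = "foo@1.2.3"; exports.dep = require("bar");');

link(bar, 'quux', quux);
link(foo, 'bar', bar);

// foo finds bar through its own node_modules symlink, and bar finds quux
// through the node_modules folder next to bar's real location.
var loaded = require(foo);
console.log(loaded.name, loaded.dep.name, loaded.dep.dep);
```

Each package sees only the versions linked into its own `node_modules` folder, so two packages could link different versions of the same dependency without conflict.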

Furthermore, to make the module lookup process even more optimal, rather
than putting packages directly in `/usr/lib`, we could put them in
`/usr/lib/node_modules/<name>/<version>`. Then node will not bother
looking for missing dependencies in `/usr/node_modules` or
`/node_modules`.

In order to make modules available to the node repl, it might be useful
to also add the `/usr/lib/node_modules` folder to the `NODE_PATH`
environment variable. Since the module lookups using `node_modules`
folders are all relative, and based on the real path of the files
making the calls to `require()`, the packages themselves can be anywhere.
