Pitfalls in Large-Scale Application Development on Node.js

Node.js has clearly become a major trend among web developers. In this post, we discuss the pitfalls that await developers in Node.js and whether switching to it is worthwhile.

Developers mostly associate Node.js with small services such as chats. (For more details, read "Development of high-performance services for Node.js".) They have good reason to. Node.js is a perfect fit for services that keep a persistent connection to the client or may wait a long time for a response. Because I/O is non-blocking, you can read large files while your service stays ready to process new requests.

That is why the community is so fond of Node.js:

• Node can maintain thousands of open client connections while continuously saving data to the database.
• The event-based model works well for cascaded asynchronous requests to other parts of the application.
• A large, responsive community and rapid development of the technology.
• Dozens of supported modules.
• JSON is the native object description format. Moreover, if you use MongoDB, you can virtually forget about serialization.

By the way, Node.js also lets you quickly implement high-load REST services. Modules such as express make this much simpler.

However, there is another side of the coin.

A Thread That Cannot Be Blocked?

There is a mistaken belief that Node.js cannot be blocked. Architecturally, Node runs all of your JavaScript in a single thread. This means that heavy math can easily hang the application for a while.

So, how do you deal with it? There are several approaches to this problem. All of them aim to keep heavy computations from monopolizing the main control flow. Here they are:

  1. Break the computation into iterations and schedule them with setTimeout and a delay of 0 ms. Keep in mind that 0 does not mean no delay at all; it means that Node will run the code as soon as possible. You will have to store intermediate results somewhere, and the computation as a whole will take longer.
  2. Delegate execution to workers. Essentially, this approach spawns additional system processes to run your code.
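
The first approach can be sketched roughly as follows. This is only an illustration under our own assumptions: the function name sumInChunks, the chunkSize parameter, and the summing task itself are made up for the example and stand in for any heavy computation.

```javascript
// Sum a large array in slices so the event loop can handle other events
// between slices. `sumInChunks` and `chunkSize` are illustrative names.
function sumInChunks(numbers, chunkSize, onDone) {
  let index = 0; // intermediate state kept between iterations
  let total = 0;

  function processNextChunk() {
    const end = Math.min(index + chunkSize, numbers.length);
    for (; index < end; index++) {
      total += numbers[index];
    }
    if (index < numbers.length) {
      // A 0 ms delay means "as soon as possible", not "immediately":
      // pending I/O callbacks and timers get a chance to run first.
      setTimeout(processNextChunk, 0);
    } else {
      onDone(total);
    }
  }

  processNextChunk();
}

const data = Array.from({ length: 100000 }, (_, i) => i + 1);
sumInChunks(data, 10000, function onSumReady(total) {
  console.log('sum:', total);
});
```

The smaller the chunk, the more responsive the service stays, but the longer the whole computation takes, which is exactly the trade-off mentioned above.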

File read/write and collisions

If workers exchange the contents of large files, you should be aware of how those files are accessed. When Node opens a file, it obtains a file descriptor from the system. The descriptor is then passed to workers.

You also often need to notify workers or other Node instances about file manipulations. Usually you need to lock a file against other processes to prevent parallel writes. For this purpose, a separate controller node is written that grants read/write permissions to other nodes and maintains the queue of pending reads and writes.

Please note that this approach is applicable not just to file manipulations, but to any task involving collision control.
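
To make the idea of a waiting queue more concrete, here is a minimal sketch. It is a simplification: instead of a separate controller node it uses a small in-process queue, and the names AccessQueue and fakeWrite are hypothetical. A real controller would exchange messages with other processes, but the queuing logic would look similar.

```javascript
// A tiny in-process "controller": tasks that touch the same resource are
// queued and run strictly one at a time. All names here are hypothetical.
class AccessQueue {
  constructor() {
    this.tail = Promise.resolve(); // end of the waiting queue
  }

  // Schedule `task` (a function returning a value or a promise) to run
  // after every previously scheduled task has finished.
  run(task) {
    const result = this.tail.then(() => task());
    // Keep the queue moving even if a task fails.
    this.tail = result.catch(() => {});
    return result;
  }
}

const queue = new AccessQueue();
const log = [];

// Simulates a write that takes `delayMs` milliseconds.
function fakeWrite(name, delayMs) {
  return new Promise((resolve) => {
    log.push(name + ' start');
    setTimeout(() => {
      log.push(name + ' end');
      resolve(name);
    }, delayMs);
  });
}

queue.run(() => fakeWrite('A', 20));
queue.run(() => fakeWrite('B', 5));
queue.run(() => fakeWrite('C', 1)).then(() => console.log(log.join(', ')));
```

Each write starts only after the previous one has ended, even though the later writes are shorter, so they never interleave.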

Garbage Collector Execution Lock

As you know, Node.js is built around the V8 engine. One of its peculiarities is that when garbage collection (GC) starts, Node stops until GC finishes. The length of the pause depends on the number of objects in the heap. The performance of a running application may degrade somewhat because of the garbage collector. To display GC activity, run Node as follows:

node --trace-gc <script>

One approach is to split the application into several parts to reduce the number of objects in any single module. Garbage collection in each module then takes less time. The idea is good, but remember that the asynchronous modules must be coordinated with one another, which leads to complex constructions.

So, careful and deliberate object creation should be your guiding principle when writing services in Node.js.
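
One simple way to see how object creation affects the heap is to watch process.memoryUsage(). The sketch below is only an illustration: the helper name heapUsedMb and the object count are our own choices, and the exact numbers will differ from run to run.

```javascript
// Report how much of the V8 heap is in use, in megabytes.
// `heapUsedMb` is an illustrative helper, not part of Node.js.
function heapUsedMb() {
  return process.memoryUsage().heapUsed / (1024 * 1024);
}

const before = heapUsedMb();

// Allocate many small objects; each one is more work for the collector.
let points = [];
for (let i = 0; i < 200000; i++) {
  points.push({ x: i, y: i * 2 });
}

const after = heapUsedMb();
console.log('heap before: ' + before.toFixed(1) + ' MB');
console.log('heap after:  ' + after.toFixed(1) + ' MB');

points = null; // drop the reference so the objects become collectable
```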

Pitfalls in Debugging

Debugging Node is not an easy task. Of course, you can just run Node in debug mode, but the debugging experience there is quite poor. It is much easier to use a third-party tool, e.g.:

1) node-inspector

This product lets you debug Node remotely in the WebInspector interface, edit code at runtime, and even profile your application.

2) PhpStorm/WebStorm Node.js Debugger

These editors have built-in features for local and remote node debugging in their native environment.

The event model built into Node.js can easily confuse the developer. A stack trace that consists of a sequence of callback invocations is usually not informative. However, you can make it more readable if you make it a rule to name all your callbacks. For example:

setTimeout(function doWork() { /* Some code here */ });

Now the stack trace will no longer be filled with entries for anonymous function calls.
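
The difference is easy to check. In the sketch below, captureStack and loadUserProfile are hypothetical names chosen for the demonstration; the only thing that matters is that one callback is named and the other is not.

```javascript
// Invoke a callback and return whatever it returns.
// `captureStack` is a hypothetical helper used only for this demonstration.
function captureStack(callback) {
  return callback();
}

// An anonymous callback: its stack frame carries no function name.
const anonymousStack = captureStack(function () {
  return new Error('trace').stack;
});

// A named callback: its name appears in the stack frame.
const namedStack = captureStack(function loadUserProfile() {
  return new Error('trace').stack;
});

// The second line of a stack string is the innermost frame.
console.log(namedStack.split('\n')[1]);
console.log(anonymousStack.split('\n')[1]);
```

The frame of the named callback mentions loadUserProfile, while the frame of the anonymous one shows only a file location.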

Callback function cascading

Complex Node.js applications involve sequential calls to many callbacks. Quite often, you can see structures like this in the code:

stepOne(function () { stepTwo(function () { stepThree(function () { /* ... */ }); }); });

Such code is extremely difficult to read and maintain. To structure your code in such situations, you can use special patterns. For more details, read the post "Node.js Control flow".
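
As a rough idea of what such a pattern can look like, here is a minimal sketch of running steps in series. It is not necessarily one of the patterns from that post: the helper runInSeries and the step functions are our own illustrative names, and the sketch assumes the usual Node.js convention of callbacks that take an error as the first argument.

```javascript
// Run callback-style steps one after another without nesting them.
// Each step receives the previous result and a `next(err, result)` callback.
// `runInSeries` is an illustrative helper, not a Node.js API.
function runInSeries(steps, done) {
  let position = 0;

  function next(err, result) {
    // Stop on the first error or when every step has run.
    if (err || position === steps.length) {
      return done(err || null, result);
    }
    const step = steps[position++];
    step(result, next);
  }

  next(null, undefined);
}

runInSeries([
  function readConfig(_, next) { next(null, { retries: 3 }); },
  function connect(config, next) { next(null, 'connected, retries=' + config.retries); },
  function report(status, next) { next(null, status.toUpperCase()); },
], function onAllDone(err, result) {
  if (err) throw err;
  console.log(result);
});
```

The steps are listed flat, in the order they run, and every callback has a name, which also helps with the stack trace problem described earlier.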

API variations

Node and its modules change rapidly, so there is a risk that newer versions of Node.js or its modules will break the whole application. This means that when migrating to a newer API, you must be prepared to reimplement entire blocks of your service.

For this reason, we recommend that you avoid installing modules globally. Two different Node applications on the same machine may well need two incompatible versions of the same module. Install modules locally instead:

npm install <module>

Examples of successful use of Node.js

On GitHub, there is a list of companies that use Node.js in their services.

Let’s take a look at some of them:

  • Yahoo has built the Mojito framework on Node.js; it lets you write both client-side and server-side applications, depending on the device they run on.
  • Yammer uses a Node.js-based service as a cross-domain proxy for requests sent to the API. The main benefit noted by Yammer’s developers is Node’s ability to serve many simultaneous requests.
  • Bocoup has written an IRC bot on Node. They chose this technology for its server-side JavaScript paradigm (Bocoup has many JS developers).

Summary

Whether to use Node.js is for you to decide, based on the nature of your particular task. Either way, you should understand that Node.js has its own strengths and weaknesses.
