As a fintech company processing sensitive payment data, Adyen is risk averse by nature. We tend to wait until technology has an optimal combination of maturity and functionality rather than rush into using it. Security takes precedence over early adoption.
NodeJS (Node), though steeped in controversy, has proven itself as an essential foundation block of the modern web. In this post we will look at why we adopted Node, some of the security issues around it, and how we built an automated dependency verification tool to secure Node at Adyen.
Node has been around for about ten years, and is core to modern front-end development. Node’s package manager (NPM) has more than 350,000 libraries available, making it twice as big as the next largest package repository.
NPM helps programmers share their Node libraries, simplifies dependency management through an API, and lets developers compile and bundle front-end assets independently of their back-end stack. Beyond these reasons, using Node provided us with two further advantages:
2. Our current goal in front-end is to separate the front-end and back-end. Decoupling avoids rendering a page with every request, which reduces server load. It also allows us to develop interfaces and services in parallel, which enables quick prototyping of new features.
Node is critical for our move to a modern, mainstream, front-end framework like VueJS.
However, for all its advantages, few developer tools attract as much controversy as Node. Developers debate, and even fume at each other over, the benefits and drawbacks of this powerful technology. What is it in Node that creates such vitriol, and why does it seem that we can’t discuss it without sparking a flame war?
Complaints about Node vary from criticism of how it is written, such as error handling, to the way code is executed, such as CPU inefficiency. While these topics are debatable, most developers will agree that the Node Package Manager (NPM) has high potential to be a security risk.
“Unfortunately in Node we just bound to everything, and there’s zero security. You run a node program and you have access to all kinds of system calls.”
On a minor level, there is no enforced uniformity in the package.json. There are placeholders for license, a link to source code, and so on, but lack of enforcement makes it very difficult to automate validation. Another issue is that the source code is not visible to the user from the NPM website. Again, there is a placeholder in package.json to link to the source code, but that code does not have to match the code hosted by NPM.
NPM also has a feature called “lifecycle scripts”, which include preinstall and postinstall. These are scripts that run before and after package installation. They handle installation of prerequisites and cleaning up any mess left behind. These scripts have the power to invoke the shell, and could potentially install anything.
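As a rough illustration of how such scripts can be spotted, the sketch below inspects a parsed package.json object for install-time hooks. The function name `findInstallScripts` and the sample manifest are hypothetical and not taken from Skantek; `preinstall`, `install` and `postinstall` are real NPM script names.

```javascript
// Hypothetical helper (not Skantek's actual code): list the install-time
// lifecycle scripts declared in a parsed package.json object.
const INSTALL_HOOKS = ['preinstall', 'install', 'postinstall'];

function findInstallScripts(manifest) {
  const scripts = (manifest && manifest.scripts) || {};
  return INSTALL_HOOKS
    .filter((hook) => typeof scripts[hook] === 'string')
    .map((hook) => ({ hook, command: scripts[hook] }));
}

// Made-up manifest for illustration only.
const exampleManifest = {
  name: 'some-package',
  version: '1.0.0',
  scripts: { test: 'mocha', postinstall: 'node ./setup.js' },
};

console.log(findInstallScripts(exampleManifest));
```

A package that declares any of these hooks is not necessarily malicious, but it is a reasonable candidate for a closer manual look.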
To complicate things further, these concerns are multiplied by the number of the package’s dependencies. The top-level package might be fine, but what about the dependencies of that package or the dependencies of those dependencies? Nested dependencies containing nested dependencies are like sinister Matryoshka dolls. Any of these might contain a nasty surprise.
NPM: Nested dependencies like sinister Matryoshka dolls 👻
The original left-pad package is a great example of how a package that seems safe can still cause harm. Left-pad is a utility package that pads out the left-hand side of strings; an unremarkable dependency used by thousands of projects. When its author unpublished it, thousands of projects relying on it broke. As a fix, NPM updated their policy on unpublishing packages, and took measures to prevent squatting on package names. Still, this edge case highlights the problems with NPM. No one really knows what happens deep in the dependency tree.
Our security team approves every piece of software before allowing its use. When any of the approved software updates, they must approve it again. Vetting every dependency in the tree is impossible, but we had to take security measures to protect Adyen. So we looked for a tool that could report on a library’s security weaknesses.
Automated dependency verification became a prerequisite for using Node at Adyen. Our initial research turned up several tools on the market that made bold claims about being able to “run all code safely.” Their methods were not very clear, and they were unknown entities with smaller repositories. Most importantly, they were not open-source solutions, and we wanted a tool where we have control over the code.
It became clear that securing Node at Adyen would be a DIY project. We assembled a small team and began building Skantek, a private Node package and supporting infrastructure.
Skantek is a Node program to scan for suspect packages, using metrics such as:
Based on these factors, Skantek determines the risk level of a package, and whether a manual review is necessary. If a package is approved, Skantek adds it to a private NPM registry.
Skantek fetches the package metadata from NPM and uses a library to resolve all dependencies in the tree. It traverses through the tree and scans each package, assigning a risk score. If these all pass, it scans the parent package. It then retrieves all associated packages and publishes these to the private registry.
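The flow described above can be sketched as a depth-first walk in which dependencies are scanned before the package that needs them. This is a minimal sketch under stated assumptions, not Skantek's implementation: the tree shape (`name`, `version`, `dependencies`), the `scanTree` function and the `scanPackage` callback are all hypothetical.

```javascript
// Assumed tree shape: { name, version, dependencies: [ ...same shape ] }.
// `scanPackage` is a stand-in for the real checks and returns true on a pass.
function scanTree(root, scanPackage) {
  const approved = [];

  function visit(node) {
    // Scan every dependency first; stop at the first failure.
    for (const dep of node.dependencies || []) {
      if (!visit(dep)) return false;
    }
    // Only scan the parent once all of its dependencies have passed.
    if (!scanPackage(node)) return false;
    approved.push(`${node.name}@${node.version}`);
    return true;
  }

  const ok = visit(root);
  // Nothing is published unless the whole tree passes.
  return { ok, toPublish: ok ? approved : [] };
}

// Made-up tree for illustration only.
const exampleTree = {
  name: 'app', version: '0.1.0',
  dependencies: [
    { name: 'a', version: '2.0.0', dependencies: [{ name: 'c', version: '1.0.0' }] },
    { name: 'b', version: '1.0.0' },
  ],
};

console.log(scanTree(exampleTree, () => true));
// -> { ok: true, toPublish: [ 'c@1.0.0', 'a@2.0.0', 'b@1.0.0', 'app@0.1.0' ] }
```

Collecting the approved packages in scan order means dependencies appear in the list before the packages that rely on them.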
How Skantek works 🛠️
The risk score increases whenever Skantek finds an irregularity in a package’s metadata. This works similarly to how we perform risk checks on payments. Rather than one piece of information, we use a combination of data points to determine fraud. A threshold is defined, and when enough attributes combine to push the risk score over that threshold, that package is flagged as risky.
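To make the threshold idea concrete, here is a minimal sketch of additive scoring. The signal names, weights and threshold are invented for illustration; Skantek's actual metrics and values are not described in this post.

```javascript
// Hypothetical signals and weights; the real metrics and values may differ.
const RISK_WEIGHTS = {
  hasInstallScripts: 40,
  missingLicense: 20,
  missingRepository: 15,
  knownVulnerability: 50,
};
const REVIEW_THRESHOLD = 50; // assumed value

function assessRisk(signals) {
  // Keep only the signals that fired, then add up their weights.
  const triggered = Object.keys(RISK_WEIGHTS).filter((key) => signals[key] === true);
  const score = triggered.reduce((sum, key) => sum + RISK_WEIGHTS[key], 0);
  // A package is flagged only when the combined score exceeds the threshold.
  return { score, triggered, flagged: score > REVIEW_THRESHOLD };
}

console.log(assessRisk({ missingLicense: true }));
// -> { score: 20, triggered: [ 'missingLicense' ], flagged: false }
console.log(assessRisk({ missingLicense: true, hasInstallScripts: true }));
// -> { score: 60, triggered: [ 'hasInstallScripts', 'missingLicense' ], flagged: true }
```

No single minor irregularity flags a package here; it takes a combination, which mirrors the payment risk checks mentioned above.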
Skantek rescans packages on a regular basis and deliberately delays adding new package versions to our internal NPM registry. This makes it much less likely that we are exposed to a zero-day exploit.
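One way such a delay could work is to consider only versions that have been public for a minimum number of days. The sketch below assumes the `time` object from NPM registry package metadata, which maps each version to an ISO publish timestamp alongside `created` and `modified` keys. The function name and the 14-day figure are assumptions; the post does not state the actual delay.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper: return the versions published at least `minAgeDays` ago.
// `time` is assumed to look like the `time` field of registry metadata.
function versionsPastQuarantine(time, now, minAgeDays) {
  return Object.keys(time)
    .filter((key) => key !== 'created' && key !== 'modified')
    .filter((version) => now - Date.parse(time[version]) >= minAgeDays * DAY_MS);
}

// Made-up timestamps for illustration only.
const exampleTime = {
  created: '2019-01-01T00:00:00.000Z',
  modified: '2019-03-10T00:00:00.000Z',
  '1.0.0': '2019-01-01T00:00:00.000Z',
  '1.1.0': '2019-03-10T00:00:00.000Z',
};

const exampleNow = Date.parse('2019-03-15T00:00:00.000Z');
console.log(versionsPastQuarantine(exampleTime, exampleNow, 14));
// -> [ '1.0.0' ]
```

The waiting period gives the wider community time to notice and report a compromised release before it reaches the internal registry.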
If a developer finds a new package that they think is useful, we recommend that they research the package. They can invoke Skantek to get a full report of the package in question. Once they understand the level of risk, we encourage them to speak with other developers. This helps determine if we have already published similar packages, or if other members of the team need the package. They then submit their findings to our NPM approval team (currently being formed). If the team approves the package, Skantek publishes the library to Adyen’s private repo.
By conducting this research, we gained a clearer picture of the risk involved in using NPM. We wanted to be proactive and realistic about the dangers of Node and take steps to secure it, instead of waiting until something goes wrong. Skantek alerts developers to the problem, and solves it at the same time.
However, analyzing every dependency is a dirty job. Some dependency trees are massive, and we currently traverse them three times, which can be taxing on performance. There are plans to refactor Skantek so that it touches each dependency only once, performing all checks in one go. It would be much more efficient, and would make Skantek easier to maintain and support.
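A single-pass scan of the kind planned here could look roughly like the sketch below, which visits each distinct package once and runs every check during that visit. The `scanOnce` function, the tree shape and the shape of a check (`name` plus a `test` function) are hypothetical.

```javascript
// Assumed tree shape: { name, version, dependencies: [ ...same shape ] }.
// Each check is assumed to be { name, test(pkg) -> boolean }.
function scanOnce(root, checks) {
  const seen = new Set();
  const report = {};
  const stack = [root];

  while (stack.length > 0) {
    const node = stack.pop();
    const id = `${node.name}@${node.version}`;
    if (seen.has(id)) continue; // shared dependencies are scanned only once
    seen.add(id);
    // Run all checks in one visit and record the names of those that failed.
    report[id] = checks.filter((check) => !check.test(node)).map((check) => check.name);
    for (const dep of node.dependencies || []) stack.push(dep);
  }
  return report;
}

// Made-up data: `shared` is required by both `a` and `b`.
const sharedDep = { name: 'shared', version: '1.0.0', license: 'MIT' };
const sampleTree = {
  name: 'app', version: '0.1.0', license: 'MIT',
  dependencies: [
    { name: 'a', version: '1.0.0', license: 'MIT', dependencies: [sharedDep] },
    { name: 'b', version: '1.0.0', dependencies: [sharedDep] },
  ],
};

const sampleChecks = [{ name: 'has-license', test: (pkg) => Boolean(pkg.license) }];
console.log(scanOnce(sampleTree, sampleChecks));
```

Tracking visited packages in a set keeps the work proportional to the number of distinct packages rather than the number of paths through the tree.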
Many developers do not list their license. Being unable to find the license programmatically is a more common issue than rejecting a requested package. Skantek simplifies checking licenses, but the lack of uniformity in the NPM structure means that we perform manual checks. For example, there are a lot of authors who have put their license in the readme file, but not in the package.json license property.
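Part of the difficulty is that a license can be declared in several shapes. The sketch below handles the current `license` string plus two older forms that NPM documents as deprecated (a `license` object with a `type`, and a `licenses` array). The function name is hypothetical, and an empty result simply means the package needs a manual look, for example at its readme.

```javascript
// Hypothetical helper: collect whatever license identifiers a parsed
// package.json declares. An empty array means nothing could be found.
function declaredLicenses(manifest) {
  const { license, licenses } = manifest || {};

  // Current form: "license": "MIT"
  if (typeof license === 'string' && license.trim() !== '') return [license.trim()];

  // Deprecated form: "license": { "type": "MIT", "url": "..." }
  if (license && typeof license.type === 'string') return [license.type];

  // Deprecated form: "licenses": [ { "type": "MIT", "url": "..." }, ... ]
  if (Array.isArray(licenses)) {
    return licenses
      .map((entry) => (typeof entry === 'string' ? entry : entry && entry.type))
      .filter((type) => typeof type === 'string' && type !== '');
  }
  return [];
}

console.log(declaredLicenses({ license: 'MIT' }));                     // -> [ 'MIT' ]
console.log(declaredLicenses({ licenses: [{ type: 'Apache-2.0' }] })); // -> [ 'Apache-2.0' ]
console.log(declaredLicenses({ name: 'no-license-field' }));           // -> []
```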
Dependency trees are really deep. The easiest way to mitigate the risk of NPM dependencies is to use fewer libraries. When selecting a new library to use in a project, we encourage our developers to choose the package with the fewest dependencies. Fewer dependencies make it easier to review the code and ensure there’s nothing malicious in there.
Stepping into modern front-end architecture is already changing how we develop software at Adyen. It helps us stay flexible and up to date, and attract top talent. After we decouple the front and back end, we can rapidly prototype new interfaces and revolutionize our users’ experience.
At this stage there is a main project functioning as the pilot for Node. Our security team approved the first version of Skantek and we published the initial version of our internal registry. Developers can now work with a local instance of a Node runtime, and we are ready to deploy our first NodeJS-powered interfaces.
The roadmap for securing Node at Adyen includes running Skantek in a cron job on a regular basis. This will allow us to scan and rescan, checking for differences between results, and acting when necessary. After we put a logging utility in place, Security will have access to visualizations that allow speedy decision making. At a glance, it will be clear which licenses have changed and if the tool discovered any new vulnerabilities.
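Comparing one scan with the next could be sketched as follows. The report shape (a map from package id to its `license` and a list of `vulnerabilities` identifiers) and the `diffScans` function are assumptions made for illustration; the post does not describe Skantek's output format.

```javascript
// Assumed report shape: { 'name@version': { license, vulnerabilities: [ids] } }.
function diffScans(previous, current) {
  const changes = [];
  for (const id of Object.keys(current)) {
    const before = previous[id];
    const after = current[id];
    if (!before) {
      changes.push({ id, change: 'new package' });
      continue;
    }
    if (before.license !== after.license) {
      changes.push({ id, change: 'license', from: before.license, to: after.license });
    }
    // Report only vulnerabilities that were not present in the previous scan.
    const known = new Set(before.vulnerabilities);
    const fresh = after.vulnerabilities.filter((vuln) => !known.has(vuln));
    if (fresh.length > 0) {
      changes.push({ id, change: 'new vulnerabilities', ids: fresh });
    }
  }
  return changes;
}

// Made-up scan results for illustration only.
const lastWeek = { 'a@1.0.0': { license: 'MIT', vulnerabilities: [] } };
const today = {
  'a@1.0.0': { license: 'GPL-3.0', vulnerabilities: ['VULN-1'] },
  'b@2.0.0': { license: 'MIT', vulnerabilities: [] },
};

console.log(diffScans(lastWeek, today));
```

A list of changes like this is the kind of input a dashboard or alert could be built on.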
As a part of this, we are growing our NPM Approval Team. This team will administer and support Skantek. They will approve and publish packages to our internal registry and develop protocols to deal with vulnerabilities. It is not always an option to remove a problematic library, so the NPM Approval Team will work with security to deal with it. They will decide whether to fork and patch, do a pull request, and so on. All of this will heighten our developers’ awareness of package security and raise our development standards.
In the immediate future, we will optimize Skantek’s dependency tree recursion and add robust testing and logging. The tool should also be able to scan more than one package at a time (a goal that is more realistic once recursion optimization is live). Skantek should also be able to find dangerous code itself. Currently, it scans the general structure of the package for irregularities. It then executes a Snyk test to check against their database for known vulnerabilities. Future versions of Skantek should scan based on common patterns by reading the code and identifying weaknesses.
After Skantek has matured enough that it would be useful to share, we would like to make it available to the open source community. Security is an arms race. Every day, Adyen’s security experts work to detect different types of injection, CSRF, and phishing, to name a few. Skantek should continue to evolve in the same way, adding new protections and risk detection. If we make Skantek open source and make this available to the greater community, everyone benefits from our work.
As a final note, sometimes in tech, people rush into adopting new technologies to “get things done” without considering the dangers. It’s the equivalent of a driver saying “We’re going really fast in this car, no roof, no brakes, no seatbelt, but we’re really getting places!” Our aim is to help our developers to go far, go fast, and arrive at their destinations safely.