Adyen for Java developers
At Adyen, data is ubiquitous. Our production systems deal with thousands upon thousands of events per second, our analytics systems are scaled to store and process petabytes of data and our machine learning models need to combine those capabilities. In a highly competitive industry, it is important that we can rely on the correctness, timeliness and usefulness of our data.
This is where data engineers come in: data engineers at Adyen are responsible for creating high-quality, scalable, reusable and insightful datasets out of large volumes of raw data. In this blog, we will take a look at what that actually means, how we go about it, and how we fit within the wider organization.
Over the last few years, we’ve seen an explosion in data related roles. As companies became aware of the impact data can have, they have shifted resources into building their data teams. Of course, as these teams grew, it became more important to clarify the different roles.
We highlighted the way we distinguish data roles in our blog post about defining data roles when scaling up data culture. If you just want a quick overview of the main roles under our data umbrella, have a look at the image below.
An overview of the data roles under our umbrella
At Adyen, we strive to give autonomy to our engineers. This has two advantages for our engineers: they can quickly make decisions within their area of expertise and can enjoy a safe environment for creativity and experimentation. Therefore, our data engineers are encouraged to choose projects based on skill, experience and interest within their team and even expand their work across teams.
At other companies, the data engineering role is sometimes called analytics engineer. However, because the job scope and focus of our data engineers is broad and can vary per team, we think data engineer is more suitable.
But what does it mean to be a data engineer at Adyen?
Unsurprisingly, data engineers spend most of their time… on engineering problems. We use Python and PySpark to define how our data needs to be processed. We validate our input and output data by using both automated testing, data quality and validation tooling.
We orchestrate pipelines using Airflow. We analyze and explore data using JupyterHub notebooks, tied to our Apache Hadoop filesystem and Hive metastore.
In a regular week, we use about 70% of our time for code development, and 5-10% to review our peers' code. The remainder is split between other responsibilities, such as working with stakeholders, exploring foreign key constraints, launching new initiatives, squashing bugs, sharing knowledge and learning from our colleagues across various roles and teams, or just connecting with fellow Adyeners over a cup of our barista coffee!
What you might also find interesting is that we do not write ‘raw’ SQL in our day to day job. All data transformations are performed with PySpark to keep things scalable.
Of course, a good foundation of SQL is still crucial for our day to day work since PySpark allows us to apply SQL-like analysis using a Python interface.
Moreover, we are also not really responsible for the functioning of the platform: this means we do not have to ensure there are enough airflow workers, or that we need to ingest raw data from event streams. This is done by our data platform engineers, and we trust them to do a good job 👍
Still not clear how data engineers fit within our data landscape? Have a look at the flowchart below.
The differences between data roles - now in flowchart format!
Data engineers are the connection between the raw data and resulting products and insights.
Our motto is: engineered for high quality data. That means that we are responsible for ensuring our downstream users can rely on the datasets we build for them.
Our data has to be validated, tested, and monitored from the time we ingest, to the point at which our analysts are creating insights from the data.
As you can imagine, a close collaboration with all stakeholders is key to achieving reliable datasets that can confidently be used to make meaningful business decisions. Therefore, our data teams are generally oriented by domains and are cross-functional.
A team might consist of data analysts, data engineers and data scientists, headed by a technical team lead and product manager, who all work together to develop a product.
In terms of time spent working together, the closest collaboration happens within the team, followed by other stakeholders, suppliers and domain experts of the data in related teams.
While engineers always have the possibility to work directly with stakeholders, the product vision is generally set by the product manager. Data engineers are encouraged to understand this vision well enough to make their own decisions where possible.
Within the team, we could collaborate on the following themes:
We extend our collaboration beyond our team in the following cases:
As you can see from the examples above, we collaborate with a wide variety of roles. Our position between suppliers of the raw data and stakeholders of the transformed data is unique and requires a good understanding of both worlds. If you would like to read more about how we create industry-leading services together, have a look at the Adyen Formula.
In this blogpost, we highlighted important aspects of data engineering at Adyen. We gave you insights on how our role fits into the data landscape, what our job consists of and how we collaborate together. We hope that you have a much clearer understanding of our work after reading this post.
If you are interested in applying to the Data Engineering role, have a look at our careers page to see all open roles!
By submitting this form, you acknowledge that you have reviewed the terms of our Privacy Statement and consent to the use of data in accordance therewith.