Airflow at Adyen: Adoption as ETL/ML Orchestrator
Many businesses incorporate or are built around subscription products, such as Spotify or Apple Music. A common challenge for these businesses is ensuring a successful charge every month. On a small scale, some amount of failed repeating payments is unavoidable, whether due to fraud, an expired card, or even insufficient funds. However, if enough payments suffer this same fate, revenues can falter. This is obviously undesirable, and has clear implications for customer churn and, in extreme cases, the long term viability of a merchant’s business model.
About a year ago we asked ourselves: “What more can we do to help our merchants who face this problem? Our aim was to investigate whether we could build a product that could, using intelligent logic based on the huge amount of payment data in our system, give companies the highest chance of recovering failed recurring payments. In a previous blog you might have heard of RevenueAccelerate; the product described here has now gone on to become one of RevenueAccelerate’s main features.
The most common approach to ensuring a successful charge for monthly payments is to retry the payment a certain number of times within a fixed amount of time. Let’s consider an example. Mary’s Wine Club charges a monthly fee in exchange for a selection of premium wines that their customers can try each month. On the 25th of the month, one of their customers has a failed subscription payment. Rather than having to pick up the phone or send an email to the customer, Mary’s Wine Club instead decides to try rescheduling the transaction, and decides to retry 7 times over the next 28 days. If they fail after that period, then they reach out to the customer directly and pause that customer's subscription.
We can break down this approach with two core concepts:
Now, given a failed subscription payment, what is the optimal way to schedule the remaining retry attempts within the retry window in order to maximize the probability that the failed payment is converted?
Let’s first consider a simple strategy. Putting ourselves in the merchant’s shoes, we could just spread the remaining attempts evenly across the next 28 days. This would mean retrying every four days in the hope that at some point this customer might have enough money in their account, or that whatever other issue facing the payment originally has now been resolved.
In principle, this seems like a good idea and could go a long way towards helping the payment to convert; however, it is not without its issues. Let’s have a look at what happens when we try to apply this strategy in country X.
Below, we see the expected probability of a successful retry conditional on the day of the month. In some cases, this approach strikes lucky, whereas in other instances this choice is clearly suboptimal.
Probability of a successful retry by day of month.
There are many other factors that might influence the outcome of a retry:
Unfortunately, there are likely to hundreds, if not thousands, of these types of rules that could apply to this particular customer and payment. With this in mind, we need a smarter decision-making system, one that is able to learn as many of these rules as possible to give the best chance of payment conversion.
Along with incorporating more factors to help in this decision making, we also need to ensure that the new decision-making system doesn’t just repeat what seemed optimal at the time; it should also be able to effectively respond to new and changing situations. This system needs to effectively distribute all of the available retries across the retry window, and not put all its eggs in one basket, so to speak.
Similar to our previous blog post about optimizing payment conversion rates, we decided to cast this as a contextual multi-armed bandit problem and solve it by implementing a variation on the classic random forest algorithm. Recall that a contextual bandit can be thought of as a repeated game with each stage consisting of the following steps:
In the case of the AutoRescue model, the action space is considered to be all possible future times at which to retry the failed payment, within the retry window. We accept a trade-off between exploration and exploitation because of:
Initial experimentation showed that tree-based classifiers gave the best results and would already generate a significant uplift using an exploitation-only approach. Nevertheless, we wanted to include an element of exploration in our model for the above mentioned reasons and considered a number of approaches in doing so. Ultimately, we decided on an epsilon-greedy approach, but taking into account the degree of confidence of the model predictions during the random action selection stage.
Since random forests showed favorable results during experimentation, we decided to use this algorithm since it relies on bootstrapping, and can therefore provide information about the full conditional distribution of the target variable. Instead of averaging the results of the individual trees in the forest, we can use these predictions to form an empirical distribution from which to sample, thereby emulating the process of Thompson Sampling from a posterior distribution.
Before we continue let’s clarify a few of the terms we just used:
Two beta distributions with different shape parameters.
Let’s look at an actual example from the AutoRescue model. The below example shows the probability of a successful retry by day (the true action space is more granular than this) for a particular set of payment characteristics as returned by our random forest model. Suppose we had a single retry left;in this case, we would opt to retry on penultimate days of the retry window during the epsilon-greedy exploitation stage.
Probabilities returned by random forest model across the retry window.
During the random action selection stage of the epsilon-greedy algorithm, instead of looking at the conditional (weighted) mean usually returned by the random forest, we return the probabilities estimated by the individual trees (pictured below) and randomly select one for each action. We then select the action with the highest sampled probability.
Probabilities returned by the individual trees in the random forest.
As we can see from the above example, our model is more confident about the probability of a successful retry for some actions than others. By allowing the model to still explore areas for which it exhibits a greater degree of uncertainty, we might discover actions that end up returning higher rewards.
The final problem to solve is how to optimally spread our attempts across the retry window. Looking at the graph above, one might be tempted to schedule a few attempts during the first week and the majority of attempts towards the end of the retry window. However, probabilities for the remaining available actions are recalculated after each failed attempt, leading to a different proposed distribution of attempts across the retry window.
Problems arise when the predicted probability of a retry being successful is high for actions towards the end of the rescue window, but are less so when recalculated after an unsuccessful retry. We noticed that this is particularly the case when certain model input features become dominant, and have a comparable degree of contribution in determining the models output across adjoining actions (we can use SHAP values to quantify this contribution).
We ended up using a heuristic approach in which the model prioritizes(smoothed) local probability peaks. This generally caused the attempts to be spread more across the retry window, and was accompanied by an uplift in successful orders.
An important question still remains: how did this model perform in practice? Over the summer, we conducted an experiment to compare its performance with the baseline strategy we introduced above - namely, retrying each failed payment every four days. The first experiment we tried, which didn’t include logic of distribution attempts throughout the retry window described above, saw our machine learning model convert about 6% more orders than the baseline.
Using these results as a starting point, we spent some time developing the model further, adding the local peaks logic we outlined in the preceding paragraph as well as some additional functionality to improve how the model learns success probabilities from the data it receives. Initial results for this model are even better than those of its previous iteration, and we are confident that we have built a product that can drive real value for our clients.
By submitting this form, you acknowledge that you have reviewed the terms of our Privacy Statement and consent to the use of data in accordance therewith.