### The Adyen way of engineering: we design for 20x

- For subscription payments, requesting an account (card) update from the card scheme and subsequently applying it.
- Optimizing the payment message formatting (ISO message), e.g. banks might prefer to receive fields in a certain format (cardholder name, address fields, etc.).
- Instantly retrying a payment, e.g. when a bank’s system is temporarily overloaded.

This resulted in a few problems:

- Small sample problem. Splitting on more payment features (e.g. amount or transaction type) could result in small and thus statistically insignificant A/B test groups.
- Exploration versus exploitation. A concluded experiment yields a fixed result, i.e. if the underlying world changes, the experiment has to be re-run.
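To make the small sample problem concrete, here is a rough back-of-the-envelope sketch. The function name, payment volume, and feature cardinalities are all made-up assumptions for illustration, and it assumes payments are spread evenly across segments, which real traffic is not.

```python
from math import prod

def avg_samples_per_group(n_payments, feature_cardinalities, n_variants=2):
    """Average number of payments per A/B group when every combination
    of feature values becomes its own segment (even spread assumed)."""
    n_segments = prod(feature_cardinalities)
    return n_payments / (n_segments * n_variants)

# Hypothetical numbers: 1,000,000 payments, segmented by 10 amount buckets,
# 5 transaction types and 50 banks, each segment split into 2 variants.
per_group = avg_samples_per_group(1_000_000, [10, 5, 50])
print(per_group)  # 200.0
```

Every extra feature multiplies the number of segments, so the per-group sample shrinks quickly even at a large total volume.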

- Conceptually, it caters to both problems that were faced.
- The context (payment features) is very important to take into account, hence the contextual addition.
- While the world might constantly change, a full-fledged reinforcement learning set-up was out of reach from a technology perspective.

- The environment (i.e. the real world) reveals a context (i.e. payment features).
- The learner chooses an action (i.e. an optimization).
- The environment reveals a reward (i.e. 0 for a non-successful and 1 for successful payment).
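The three steps above can be sketched as a simple interaction loop. This is a toy simulation, not the production set-up: the contexts, action names, and success probabilities are invented for illustration, and the learner here picks uniformly at random rather than following a learned policy.

```python
import random

# Hypothetical actions and success probabilities, for illustration only.
ACTIONS = ["no_optimization", "instant_retry"]
SUCCESS_PROB = {
    ("credit", "no_optimization"): 0.80,
    ("credit", "instant_retry"): 0.85,
    ("debit", "no_optimization"): 0.70,
    ("debit", "instant_retry"): 0.65,
}

def run_bandit(n_rounds, seed=0):
    rng = random.Random(seed)
    history = []
    for _ in range(n_rounds):
        # 1. The environment reveals a context (here a single payment feature).
        context = rng.choice(["credit", "debit"])
        # 2. The learner chooses an action (uniformly at random in this sketch).
        action = rng.choice(ACTIONS)
        # 3. The environment reveals a reward: 1 for a successful payment, else 0.
        reward = 1 if rng.random() < SUCCESS_PROB[(context, action)] else 0
        history.append((context, action, reward))
    return history

history = run_bandit(1000)
```

The logged `(context, action, reward)` triples are the kind of data a model could later be trained on to predict the probability of success per action.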

An example of our setting with contexts, actions and rewards.

- A range of numerical features, e.g. amount converted to EUR, card expiry date delta in days (between the payment date and expiry date).
- A wide range of categorical features (e.g. card type, bank, company, etc.) for which we used Target Encoding (i.e. what was the conversion rate for a specific category value?).
- Each optimization decision gets its own dummy feature, i.e. each optimization has a feature of zero or one indicating whether it was applied to a payment. Note that every combination of optimizations is considered as one action.
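As a minimal sketch of how such a feature vector might be assembled: the optimization names, function names, and the fallback rate for unseen categories below are assumptions for illustration, and only a single categorical feature (card type) is encoded to keep the example short.

```python
from collections import defaultdict
from datetime import date

# Hypothetical optimization names, one dummy feature each.
OPTIMIZATIONS = ["account_update", "iso_formatting", "instant_retry"]

def fit_target_encoding(categories, rewards):
    """Map each category value to its observed conversion rate."""
    totals = defaultdict(lambda: [0, 0])  # value -> [successes, count]
    for value, reward in zip(categories, rewards):
        totals[value][0] += reward
        totals[value][1] += 1
    return {value: s / n for value, (s, n) in totals.items()}

def build_features(amount_eur, payment_date, expiry_date,
                   card_type, card_type_encoding, applied, default_rate):
    # Numerical feature: days between the payment date and the card expiry date.
    expiry_delta_days = (expiry_date - payment_date).days
    # Categorical feature: target-encoded, with a fallback for unseen values.
    encoded_card_type = card_type_encoding.get(card_type, default_rate)
    # One 0/1 dummy per optimization; the full combination is one action.
    dummies = [1 if opt in applied else 0 for opt in OPTIMIZATIONS]
    return [amount_eur, expiry_delta_days, encoded_card_type] + dummies

encoding = fit_target_encoding(["visa", "visa", "mc", "mc"], [1, 0, 1, 1])
features = build_features(25.0, date(2024, 1, 1), date(2024, 1, 31),
                          "visa", encoding, {"instant_retry"}, default_rate=0.75)
print(features)  # [25.0, 30, 0.5, 0, 0, 1]
```

In practice the encoding would be fitted on historical data only, so that the reward of the payment being scored does not leak into its own features.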

Our policy is constructed in the following manner:

- The best action is selected in an epsilon-greedy manner, i.e. the share of payments that receive the “best optimizations” (exploitation) is fixed to a static percentage α. Over time, this percentage can be increased or decreased based on the performance observed.
- For the remaining percentage of 1-α, we pick an action from the remaining set of possible optimizations (exploration). To scale the remaining probabilities, we use a softmax function to convert the probabilities of success into scaled/normalized probabilities from which to pick actions.
- In the end, this results in probabilities per action that sum up to 1. The ultimate action is then randomly chosen using these scaled/normalized probabilities as weights.
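The policy described above could look roughly like the following sketch. The function names and example values are hypothetical, and the softmax is applied to the raw predicted success probabilities without a temperature parameter, which is an assumption since the text does not specify one.

```python
import math
import random

def action_probabilities(success_probs, alpha):
    """Give the best action probability alpha and spread 1 - alpha over the
    remaining actions using a softmax of their predicted success rates."""
    best = max(success_probs, key=success_probs.get)
    others = [a for a in success_probs if a != best]
    if not others:
        return {best: 1.0}
    exps = {a: math.exp(success_probs[a]) for a in others}
    total = sum(exps.values())
    probs = {a: (1 - alpha) * exps[a] / total for a in others}
    probs[best] = alpha
    return probs

def choose_action(success_probs, alpha, rng=random):
    """Sample the final action using the normalized probabilities as weights."""
    probs = action_probabilities(success_probs, alpha)
    actions = list(probs)
    weights = [probs[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

# Hypothetical predicted success probabilities per action.
predicted = {"no_optimization": 0.80, "instant_retry": 0.85, "iso_formatting": 0.78}
probs = action_probabilities(predicted, alpha=0.9)
action = choose_action(predicted, alpha=0.9, rng=random.Random(42))
```

Because the softmax inputs all lie between 0 and 1, the exploration probabilities end up close to uniform; a temperature below 1 would favor the more promising alternatives more strongly.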

Conversion rates
