In recent years, machine learning techniques have found their way into the insurance world. While these methods generally improve model accuracy, both explainability and manual interventions continue to play a key role in risk and tariff modelling. This is why practitioners in many lines of business still apply Generalised Linear Models (GLMs) today for non-life pricing.
Conventional GLM modelling, however, comes with downsides. It is a largely manual, step-by-step process, which may result in overfitting or in main and interaction effects going unrecognised.
However, GLMs do offer variants with a machine-learning flavour that adapt automatically to patterns in the data. These techniques are known as regularised GLMs, and their most prominent representatives are the Lasso, Ridge regression and the elastic net. Not only can these methods proactively prevent overfitting, they can also adaptively learn non-linear patterns in the data, with pre-processing and variable selection implicitly built in.
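For reference, the three methods mentioned above can be written as one penalised GLM objective in the standard glmnet-style notation (this formulation is general background, not notation taken from the session itself):

```latex
\hat{\beta} \;=\; \arg\min_{\beta_0,\,\beta}\;
  -\tfrac{1}{n}\,\ell(\beta_0,\beta)
  \;+\; \lambda\!\left( \alpha\,\|\beta\|_1 \;+\; \tfrac{1-\alpha}{2}\,\|\beta\|_2^2 \right)
```

Here \(\ell\) is the GLM log-likelihood, \(\lambda \ge 0\) is the global penalty strength, and \(\alpha \in [0,1]\) interpolates between Ridge regression (\(\alpha = 0\)) and the Lasso (\(\alpha = 1\)); intermediate values give the elastic net.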
In this web session, we will dive into a specific algorithm that uses GLM regularisation in an easy yet powerful way. We first postulate a complex model structure that represents all potential linear and non-linear patterns of the main effects (and possibly interaction effects) in the data. We then introduce a global penalty term that shrinks the model down to the statistically significant effects, stopping at the point where predictive accuracy on unseen data is best.
From a theoretical point of view, we first substantially widen the GLM modelling space to reduce (or even eliminate) model bias. This implicitly increases model variance, which we counteract by gradually increasing the penalty term. Since the reduction in variance again comes at the cost of allowing some bias, the algorithm effectively lets us control the bias-variance trade-off within the GLM modelling space, and we aim for the GLM that minimises the prediction error.
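The session's code is written in R; purely as an illustrative stand-in, the widen-then-penalise idea can be sketched in Python with scikit-learn. We first enlarge the modelling space (here with polynomial terms, reducing bias), then let cross-validation choose the penalty strength that minimises the error on held-out data. All data, names and parameters below are made up for illustration and are not the session's actual implementation:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LassoCV

# Simulated data: one linear effect, one quadratic effect, one noise variable
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 3))
y = 1.5 * X[:, 0] - X[:, 1] ** 2 + rng.normal(0.0, 0.5, size=500)

# Widen the modelling space with polynomial terms (reduces bias), then let
# 5-fold cross-validation pick the Lasso penalty that controls the variance
model = make_pipeline(
    PolynomialFeatures(degree=4, include_bias=False),
    StandardScaler(),
    LassoCV(cv=5, random_state=0),
)
model.fit(X, y)

# The chosen penalty shrinks most of the postulated terms to exactly zero,
# keeping only the effects that improve out-of-sample accuracy
lasso = model.named_steps["lassocv"]
active = int(np.sum(lasso.coef_ != 0))
print(f"chosen penalty: {lasso.alpha_:.4f}, active terms: {active}/{lasso.coef_.size}")
```

The single cross-validated penalty plays the role of the global control parameter: increasing it trades a little bias for a larger reduction in variance.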
Applying the algorithm results in a simple but generally more accurate model whose relevant effects have been learned in a data-driven, simultaneous and automated way. A key feature is that all common types of explanatory variables (continuous, ordinal, nominal) are handled at the same time and in the same way. The desired balance between model simplicity and predictive accuracy can be set by means of a single control parameter. The final model has a proven GLM structure that remains explainable and integrates seamlessly into existing pricing workflows.
During the web session, we will first explore the theoretical foundations of the bias-variance trade-off in predictive modelling and of GLM regularisation in general. We will then study the explicit design of the algorithm. The remainder will be hands-on: we provide extensive code that implements the algorithm in the statistical programming language R, and we will discuss and run it using a realistic case study in actuarial claims frequency modelling. You will learn how to use the code and apply the algorithm to non-life claims data for pricing. A further focus will be the visualisation of the results, in particular the insights gained from the algorithm's meta-results, e.g., the implicit way in which variables were selected, prioritised and pre-processed.
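The hands-on part of the session uses R; to give a flavour of what a regularised claims frequency GLM looks like, here is a minimal Python sketch using scikit-learn's `PoissonRegressor` (which applies an L2-type penalty via its `alpha` parameter). The simulated portfolio, variable names and penalty value are all illustrative assumptions, not the session's case study:

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

# Simulated portfolio: claim counts driven by policyholder age, with exposure
rng = np.random.default_rng(1)
n = 2000
age = rng.uniform(18, 80, size=n)
exposure = rng.uniform(0.1, 1.0, size=n)
true_rate = np.exp(-2.0 + 0.10 * (age - 40) / 10)  # frequency per unit exposure
claims = rng.poisson(exposure * true_rate)

# Regularised Poisson GLM on claims frequency, weighted by exposure
X = ((age - 40) / 10).reshape(-1, 1)
freq = claims / exposure
glm = PoissonRegressor(alpha=1e-3)  # alpha is the (ridge-type) penalty strength
glm.fit(X, freq, sample_weight=exposure)

pred_freq = glm.predict(X)  # predicted claims frequency per unit exposure
print(f"age coefficient: {glm.coef_[0]:.3f}")
```

The log link guarantees strictly positive predicted frequencies, and the exposure weighting mirrors the standard actuarial treatment of unequal policy durations.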