14/15 Nov 2024
Hands-on Adaptive Learning of GLMs for Risk Modelling in R
In recent years, machine learning techniques have found their way into the insurance world. While these methods generally improve model accuracy, both explainability and manual interventions continue to play a key role in risk and tariff modelling. This is why practitioners in many lines of business still apply Generalised Linear Models (GLMs) today for non-life pricing.
Conventional modelling with GLMs, however, comes with downsides. It is a largely manual, step-by-step process, which can lead to overfitting or to main and interaction effects going unrecognised.
However, GLMs do offer machine-learning-flavoured variants that automatically adapt to patterns in the data. These techniques are known as regularised GLMs, and their most prominent versions are the Lasso, Ridge regression and elastic nets. Not only can these methods proactively prevent overfitting, but they can also adaptively learn non-linear patterns in the data, with pre-processing and variable selection integrated implicitly.
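As a rough illustration of what a regularised GLM looks like in R, the sketch below uses the widely used glmnet package on simulated data (the session's own programme may use different tooling). The alpha parameter interpolates between Ridge regression (alpha = 0) and the Lasso (alpha = 1), with elastic nets in between:

```r
library(glmnet)  # install.packages("glmnet") if not yet available

set.seed(1)
n <- 500
x <- matrix(rnorm(n * 10), n, 10)                 # 10 candidate rating factors
y <- rpois(n, exp(0.5 * x[, 1] - 0.3 * x[, 2]))   # only two factors carry a true effect

# alpha = 1 is the Lasso, alpha = 0 is Ridge, values in between give elastic nets
fit <- glmnet(x, y, family = "poisson", alpha = 0.5)

# Coefficients at a moderate penalty: irrelevant factors are typically
# shrunk all the way to zero, i.e. variable selection happens implicitly
coef(fit, s = 0.1)
```

The zero coefficients are the "implicitly integrated selection of variables" mentioned above: no separate stepwise selection routine is needed.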
In this web session, we will dive into a specific algorithm that uses GLM regularisation in an easy yet powerful way. In this algorithm, we first postulate a complex model structure that represents all potential linear and non-linear patterns for the main effects (and possibly interaction effects) in the data. We then introduce a global penalty term and increase it to reduce the model to only the statistically significant effects, stopping at the point where predictive accuracy on unseen data is best.
From a theoretical point of view, we first substantially widen the GLM modelling space to reduce (or even eliminate) any model bias. This implicitly introduces a higher model variance, which we counteract by continuously increasing the penalty term. Since the reduction in variance comes again at the cost of allowing some bias, the algorithm effectively lets us control the bias-variance trade-off in the GLM modelling space, and we aim for the GLM that minimises the prediction error.
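Traversing the penalty path and picking the penalty with the best out-of-sample performance can be sketched with glmnet's built-in cross-validation (again an illustrative stand-in, not necessarily the session's exact implementation):

```r
library(glmnet)

set.seed(1)
n <- 500
x <- matrix(rnorm(n * 10), n, 10)
y <- rpois(n, exp(0.4 * x[, 1]))

# 10-fold cross-validation over the entire penalty path
cvfit <- cv.glmnet(x, y, family = "poisson", alpha = 1, nfolds = 10)

cvfit$lambda.min  # penalty minimising the cross-validated deviance
cvfit$lambda.1se  # sparser model within one standard error of the minimum
plot(cvfit)       # prediction error as a function of the penalty strength
```

The choice between lambda.min and lambda.1se is exactly the kind of single control parameter for trading model simplicity against forecast accuracy described below.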
Applying the algorithm results in a simple but generally more accurate model in which the relevant effects have been learned adaptively in a data-driven, simultaneous and automated way. A key feature is that we can account for all common types of explanatory variables (continuous, ordinal, nominal) both at the same time and in the same way. The desired balance between model simplicity and forecast accuracy can be set by means of a single control parameter. The final model has a proven GLM structure that remains explainable and allows seamless integration into existing pricing workflows.
During the web session, we will first explore the theoretical foundations of both the bias-variance trade-off in predictive modelling and general GLM regularisation. We will then study the explicit design of the algorithm. The remainder will be hands-on as we provide extensive code that implements the algorithm in the statistical programming language R. We will discuss and run the code using a realistic case study in actuarial claims frequency modelling. You will learn how to use the programme and apply the algorithm to non-life claims data for pricing. Further focus will be on the visualisation of the results, especially on the insights gained from the learned meta-results of the algorithm, e.g., the implicit way in which variables were selected, prioritised and pre-processed.
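To give a flavour of the case-study setting, the hypothetical sketch below fits a regularised Poisson claims frequency model with an exposure offset, widening the modelling space by binning a continuous rating factor into dummies so that non-linear effects can be learned (all variable names and the data are invented for illustration; the actual case study may differ):

```r
library(glmnet)

set.seed(1)
n <- 2000
dat <- data.frame(
  age    = sample(18:80, n, replace = TRUE),
  region = factor(sample(c("urban", "suburban", "rural"), n, replace = TRUE))
)
exposure <- runif(n, 0.5, 1)                              # policy years
freq     <- exp(-2 + 0.8 * (dat$age < 25))                # true young-driver effect
claims   <- rpois(n, exposure * freq)

# Widen the modelling space: bin age into 5-year bands so the Lasso can
# learn a non-linear age effect; region is one-hot encoded the same way
dat$age_band <- cut(dat$age, breaks = seq(15, 80, by = 5))
X <- model.matrix(~ age_band + region, dat)[, -1]

cvfit <- cv.glmnet(X, claims, family = "poisson",
                   offset = log(exposure), alpha = 1)
coef(cvfit, s = "lambda.1se")  # the surviving bands are the learned effects
```

Inspecting which dummy coefficients survive the penalty is one way to read off the "meta-results" mentioned above: which variables were selected and where the non-linearities sit.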
This web session is suited for actuaries in (but not restricted to) non-life pricing using Generalised Linear Models (GLMs) for risk modelling.
We assume knowledge of the theoretical foundations of GLMs (definition, estimation theory, predictive modelling and statistical inference) as well as practical experience in modelling with GLMs (e.g., conventional risk modelling with GLMs in P&C), although the latter is not strictly required to follow the web session.
As we will be diving into the underlying R code for this specific use case, you should bring general knowledge of the R programming language (https://www.r-project.org/) and experience with RStudio (https://www.rstudio.com/). Installation details are given under Technical Requirements below.
Technical Requirements
Please check with your IT department whether your firewall and computer settings support web session participation (the programme Zoom will be used for this online training). Please also make sure that you join the web session with a stable internet connection.
Both R (https://www.r-project.org/) and RStudio (https://www.rstudio.com/) as well as a pre-defined set of R packages need to be installed prior to the web session in order to participate in the case study. We will distribute the specific technical requirements, e.g., the list of required R packages, in advance. Please install and test the required software well before the web session to avoid technical difficulties during the session, e.g., due to your IT department's firewall or insufficient admin rights.
Purpose and Nature
The purpose of this web session is twofold:
First, you will gain an in-depth understanding of the algorithm including its underlying theoretical motivation and its statistical properties.
Second, you will receive a comprehensive executable R programme that implements the algorithm. During the web session we will discuss and apply the code hands-on by means of a case study. You will learn how to run the algorithm for different settings and data. After the web session, you will have an R programme to add to your actuarial toolbox, which you can easily apply to your own data and extend or modify according to your needs.
Dr Lukas Hahn
Lukas is a certified actuary and works as a Lead Data Scientist at SV SparkassenVersicherung in Stuttgart, Germany. His work focuses on both the development and the productive deployment of statistical and machine learning models in SV's big data ecosystems, covering use cases ranging from actuarial non-life pricing to customer lead management. As a key component of his work, he maintains a self-service tool with a dynamic web-based user interface that builds upon the algorithm we will use in this web session. The tool is now deployed company-wide as software as a service for data-driven, explainable data analyses.
Before joining SV in 2019, Lukas worked at the Institute of Finance and Actuarial Science (ifa) in Ulm as a senior consultant on data analytics in insurance with projects in all lines of business.
Lukas holds degrees in business mathematics (M.Sc.) from Ulm University and statistics (M.Math.) from the University of Waterloo in Waterloo, Canada. In 2019, he received his doctorate from Ulm University.
Lukas is a member of the German Actuarial Association (DAV) and a Certified Actuarial Data Scientist (CADS). He is a lecturer in the German certification programme on actuarial data science at the German Actuarial Academy (DAA).