Chapter 11 Credibility

11.1 Introduction

Within a heterogeneous insurance portfolio, policyholders are not all equal in the face of risk: some present a riskier profile than others. Charging the same premium to everyone may appear unfair, because it necessarily overcharges some policyholders and uses these surcharges to pay for claims caused by riskier individuals. Of course, we can reduce the portfolio’s heterogeneity by partitioning it into risk classes that are as homogeneous as possible (based on observable characteristics of policyholders such as gender, age, or place of residence), as discussed in the previous chapter. However, some heterogeneity usually remains within each class, since observable factors are far from explaining the full riskiness of policyholders. It is therefore quite natural to use an individual’s claim experience to reassess the amount of their premium: this experience should at least partially reflect their actual risk level. This practice falls under the theory of credibility.

For example, as early as 1910, Allstate provided insurance coverage for work-related accidents to General Motors and to a number of small businesses. By computing an average premium rate from its own experience, General Motors realized that its premium should be lower than the rate applied to insured businesses as a whole. Arguing that its number of insured workers was large enough, General Motors demanded that its insurer rely on its own claim history rather than on that of all policyholders. At the same time, Tucker, a small independent manufacturer, made the same request. This raised a clear question for Allstate’s actuaries: from what size can a business be considered large enough for its rates to be based on its own experience? Mowbray (1914) was the first to give a clear answer to this question, thereby laying the foundation for what is known as stability (or limited fluctuation) credibility, briefly introduced in Section 11.4. However, while he proposed a size threshold beyond which a business’s own experience suffices, what should be done for smaller businesses? It took a few more years until Whitney (1918) pointed out “the necessity, out of fairness to the policyholder, to weigh collective experience on one side and individual experience on the other.” The entire theory of credibility aims precisely to compute this weighting as accurately as possible. It was not until the contributions of Bühlmann (1967) and Bühlmann (1969) that the problem of incorporating experience into premium calculation found a satisfactory solution.

The fundamental idea of credibility theory can be summarized as follows. Suppose we have observed a policy for $n$ years and recorded the annual claim amounts $x_1,x_2,\ldots,x_n$, where $x_i$ is the claim amount generated by this policy during the $i$th year of observation. The “observed” pure premium is thus \[ \overline{p}_n=\frac{x_1+x_2+\ldots+x_n}{n}. \] The insurer might consider charging this policyholder a future premium of $\overline{p}_n$, but this would disregard the very principle of insurance by forgoing any risk sharing (the insurer would then resemble a lender, smoothing claims over time without sharing the risk). Moreover, what should be done for policyholders who have never reported a claim (i.e., those for whom $x_1=x_2=\ldots=x_n=0$)? Should they be exempted from paying any premium while still being covered? The insurer thus faces a dilemma: either keep charging a uniform amount $p_{coll}$ to all policyholders, which could dissatisfy the “good” policyholders who, feeling aggrieved, may switch to a competitor, or adopt $\overline{p}_n$ and thereby deny the very principle of insurance. American actuaries therefore considered charging a premium that is a compromise between these two extreme positions. Thus, the premium $p_n$ charged by the company for coverage in year $n+1$ is given by \[ p_n=\alpha\overline{p}_n+(1-\alpha)p_{coll},\hspace{2mm}0\leq\alpha\leq 1, \] where $\alpha$ is the credibility factor (it measures the “credibility” assigned to the “observed” premium $\overline{p}_n$).

The usual choices for $\alpha$ are \[ \alpha=\frac{n}{n_0 +n}, \] where $n$ is the number of years of available observations and $n_0$ is a fixed parameter (this factor tends to 1 as $n$ tends to $+\infty$), and \[ \alpha=\min\left\{\frac{n}{n^*},1\right\}, \] where $n^*$ is a threshold value beyond which full credibility is granted to the policy.
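The compromise premium and the two usual credibility factors can be written as a few small Python functions. This is a minimal sketch: the function names, the claim history, the collective premium of 100 and the parameter values $n_0=10$ and $n^*=20$ are all hypothetical choices made for illustration.

```python
def credibility_premium(p_obs: float, p_coll: float, alpha: float) -> float:
    """p_n = alpha * p_obs + (1 - alpha) * p_coll, with 0 <= alpha <= 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * p_obs + (1.0 - alpha) * p_coll


def alpha_ratio(n: int, n0: float) -> float:
    """alpha = n / (n0 + n); tends to 1 as the number of years n grows."""
    return n / (n0 + n)


def alpha_threshold(n: int, n_star: float) -> float:
    """alpha = min(n / n*, 1); full credibility once n >= n*."""
    return min(n / n_star, 1.0)


claims = [0.0, 250.0, 0.0, 0.0, 120.0]  # hypothetical annual claim amounts x_1..x_n
n = len(claims)
p_obs = sum(claims) / n                  # observed pure premium (74.0 here)
p_coll = 100.0                           # hypothetical collective premium

print(credibility_premium(p_obs, p_coll, alpha_ratio(n, n0=10)))         # about 91.33
print(credibility_premium(p_obs, p_coll, alpha_threshold(n, n_star=20)))  # 93.5
```

With only five years of data, both factors keep the premium close to the collective amount; the threshold rule gives full weight to $\overline{p}_n$ only once $n\geq n^*$.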

11.2 Bayesian Credibility

11.2.1 Introductory Example

11.2.1.1 Posterior Credibility

Let’s start with a simple example to understand the essence of the method. Let $N_{it}$ be the number of claims reported in year $t$ by policyholder $i$. Two types of risks coexist within the portfolio: good drivers (denoted B) and bad drivers (denoted M). We do not know to which category policyholder $i$ belongs. The two profiles differ significantly and are given by \[\begin{eqnarray*} \Pr[N_{it}=k|B]&=&\exp(-\lambda_B)\frac{\lambda_B^k}{k!}\text{ with }\lambda_B=0.05\\ \Pr[N_{it}=k|M]&=&\exp(-\lambda_M)\frac{\lambda_M^k}{k!}\text{ with }\lambda_M=0.15. \end{eqnarray*}\] Furthermore, let’s assume that claim amounts are independent and identically distributed random variables with mean 1, so that the pure premium equals the claim frequency, i.e., $\lambda_B$ or $\lambda_M$ depending on the driver’s quality. To price a new risk, about which we have no information yet, the insurer charges $p_{coll}$, given by \[ p_{coll} = \mathbb{E}[N_{i1}]=\mathbb{E}[N_{i1}|B]\Pr[B] + \mathbb{E}[N_{i1}|M]\Pr[M]. \] Assuming that the proportion of good drivers in the portfolio is 50%, we obtain \[ p_{coll} = 0.05\times 0.5 + 0.15\times 0.5 = 0.1. \] Note that we determine the probability that the new policyholder is a good driver from the current portfolio composition, which implicitly assumes that the new policyholder is representative of the current policyholders.

At the end of the first year, the policyholder has reported $N_{i1}$ claims. The principle of credibility theory is to use this information to reassess the policyholder’s risk level and, consequently, the premium they must pay to be covered by the insurer. This is done using Bayes’ theorem (presented in Section 2.2.8), which gives the probability that the policyholder is a good driver when they have reported $k$ claims in the first year: \[ \Pr[B|N_{i1}=k]=\frac{\exp(-\lambda_B)\lambda_B^k}{\exp(-\lambda_B)\lambda_B^k+\exp(-\lambda_M)\lambda_M^k}, \] where the factor $1/k!$ and the equal prior probabilities $\Pr[B]=\Pr[M]=0.5$ have cancelled out. The premium for the second year then becomes \[\begin{equation} \mathbb{E}[N_{i2}|N_{i1}=k]=\lambda_B\Pr[B|N_{i1}=k]+\lambda_M\Pr[M|N_{i1}=k]. \tag{11.1} \end{equation}\] After one year, the probability that the policyholder is a good or bad driver given the number $k$ of claims reported in the first year is shown in the following table:

[Table: $\Pr[B|N_{i1}=k]$ and $\Pr[M|N_{i1}=k]$ as functions of the number $k$ of claims reported in the first year]
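The posterior probabilities and the second-year premium (11.1) can be recomputed directly. The sketch below is our own illustration rather than the book’s code; the function names are hypothetical and the range $k=0,\ldots,3$ is an arbitrary choice.

```python
import math

LAMBDA = {"B": 0.05, "M": 0.15}  # Poisson claim frequencies of good (B) and bad (M) drivers
PRIOR = {"B": 0.5, "M": 0.5}     # portfolio composition: 50% good drivers


def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam**k / math.factorial(k)


def posterior(k: int) -> dict:
    """Pr[class | N_i1 = k] by Bayes' theorem."""
    joint = {c: PRIOR[c] * poisson_pmf(k, LAMBDA[c]) for c in LAMBDA}
    total = sum(joint.values())
    return {c: p / total for c, p in joint.items()}


def second_year_premium(k: int) -> float:
    """E[N_i2 | N_i1 = k] as in (11.1); equals the pure premium since claims have mean 1."""
    post = posterior(k)
    return sum(LAMBDA[c] * post[c] for c in LAMBDA)


p_coll = sum(LAMBDA[c] * PRIOR[c] for c in LAMBDA)  # collective premium, 0.1
print("k  Pr[B|k]  Pr[M|k]  premium")
for k in range(4):
    post = posterior(k)
    print(k, round(post["B"], 4), round(post["M"], 4), round(second_year_premium(k), 4))
```

With these parameters, a claim-free year lowers the premium from 0.1 to about 0.0975, whereas a single claim raises it to about 0.123.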

Formula (11.1) rests on the following assumption: conditionally on the quality of a risk in the portfolio, the annual claim counts generated by that risk are independent random variables. Indeed, in full generality, $\mathbb{E}[N_{i2}|N_{i1}=k]$ can be written as \[\begin{eqnarray*} \mathbb{E}[N_{i2}|N_{i1}=k]&=&\mathbb{E}[N_{i2}|N_{i1}=k,B]\Pr[B|N_{i1}=k]\\ &&+\mathbb{E}[N_{i2}|N_{i1}=k,M]\Pr[M|N_{i1}=k], \end{eqnarray*}\] which reduces to (11.1) when $N_{i2}$ is independent of $N_{i1}$ conditionally on driver quality, since then $\mathbb{E}[N_{i2}|N_{i1}=k,B]=\lambda_B$ and $\mathbb{E}[N_{i2}|N_{i1}=k,M]=\lambda_M$.
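A quick Monte Carlo check illustrates the role of conditional independence: if the two annual counts are drawn independently once the driver’s class is fixed, the average of $N_{i2}$ among simulated policyholders who reported $k$ claims in year 1 should approach (11.1). This is a sketch under our own choices: the hand-written Poisson sampler (Knuth’s multiplication method), the seed and the number of simulated policies are arbitrary.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; fine for the small frequencies used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def mean_second_year(k: int, n_policies: int = 200_000, seed: int = 2024) -> float:
    """Average N_i2 among simulated policyholders with N_i1 = k."""
    rng = random.Random(seed)
    total = count = 0
    for _ in range(n_policies):
        lam = 0.05 if rng.random() < 0.5 else 0.15  # driver class drawn from the 50/50 mix
        n1 = sample_poisson(lam, rng)               # given the class, years 1 and 2
        n2 = sample_poisson(lam, rng)               # are simulated independently
        if n1 == k:
            total += n2
            count += 1
    return total / count


print(round(mean_second_year(0), 4), round(mean_second_year(1), 4))
```

The two printed averages should be close to the exact values of (11.1), about 0.0975 and 0.123, up to simulation noise.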

11.2.2 Bayesian Posterior Pricing Model

11.2.2.1 Form of the Posterior Premium

Consider policyholder $i$, who has generated claim amounts $X_{i1},X_{i2},\ldots,X_{in}$ over the first $n$ years of coverage. The premium charged for coverage in year $n+1$ will be a function of past claim amounts, say $g(X_{i1},\ldots,X_{in})$. This function $g$ is chosen to minimize the mean squared error between the premium and the claims, i.e., to minimize $\mathbb{E}[(X_{i,n+1}-g(X_{i1},\ldots,X_{in}))^2]$. Property 3.8.9 tells us that the solution to this optimization problem is \[ g^*(X_{i1},\ldots,X_{in})=\mathbb{E}[X_{i,n+1}|X_{i1},\ldots,X_{in}]. \]
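In the two-class example of Section 11.2.1, the mean squared error of any premium rule $g$ can be computed exactly, which lets us check numerically that the conditional expectation beats simpler rules. This sketch truncates the Poisson support at 40 (the neglected mass is negligible), and the “ad hoc linear rule” is a hypothetical competitor chosen only for comparison.

```python
import math

LAMBDA = {"B": 0.05, "M": 0.15}
PRIOR = {"B": 0.5, "M": 0.5}
K_MAX = 40  # truncation of the Poisson support; the neglected mass is negligible here


def pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam**k / math.factorial(k)


def conditional_mean(k: int) -> float:
    """g*(k) = E[N_i2 | N_i1 = k]."""
    w = {c: PRIOR[c] * pmf(k, LAMBDA[c]) for c in LAMBDA}
    return sum(LAMBDA[c] * w[c] for c in w) / sum(w.values())


def mse(g) -> float:
    """E[(N_i2 - g(N_i1))^2]. Given the class, N_i2 is Poisson(lam) and independent of N_i1,
    so E[(N_i2 - g(k))^2 | class, N_i1 = k] = lam + (lam - g(k))^2."""
    return sum(PRIOR[c] * pmf(k, lam) * (lam + (lam - g(k)) ** 2)
               for c, lam in LAMBDA.items() for k in range(K_MAX))


candidates = {
    "conditional mean g*": conditional_mean,
    "collective premium 0.1": lambda k: 0.1,
    "ad hoc linear rule": lambda k: 0.1 + 0.02 * k,  # hypothetical competitor
}
for name, g in candidates.items():
    print(f"{name:25s} {mse(g):.8f}")
```

The gain over the collective premium is small here because a single year of experience carries little information, but the ordering is exactly what Property 3.8.9 predicts.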

11.2.2.2 Assumptions

Now let’s specify the model. We make the following assumptions:

Before proceeding, it is important to understand the scope of the conditional independence assumption stated in H2. Because we do not know the intrinsic quality of the covered risk, represented by $\Lambda_i$, the random variables $X_{i1},\ldots,X_{in}$ influence $X_{i,n+1}$ insofar as they provide information about $\Lambda_i$, which in turn influences $X_{i,n+1}$. The dependence is thus only apparent: strictly speaking, there is no dependence between the $X_{it}$ (once the quality $\Lambda_i$ of the risk is known, the annual claim amounts become independent); the apparent dependence is generated by our uncertainty about the quality of the covered risk.

11.2.2.3 A Priori Structure Function

Let $U$ be the common cumulative distribution function of the $\Lambda_i$, i.e., \[ U(\lambda)=\Pr[\Lambda_i\leq\lambda],\hspace{2mm}\lambda\in {\mathbb{R}}^+, \] with $U(0)=0$. The support of $\Lambda_i$ is denoted by $L\subseteq{\mathbb{R}}^+$. We denote by $u$ the probability density associated with $U$; $u$ is called the structure function of the portfolio (it describes how the quality of risks is distributed within the portfolio). For a new risk, the risk parameter is a priori a random variable taking values in $L\subseteq {\mathbb{R}}^+$.

11.2.2.4 Conditional Law of Claim Costs

As in the introductory example, we assume that annual claim amounts are conditionally independent: if the risk parameter $\Lambda_i$ is equal to $\lambda$, then the annual claim amounts are independent and identically distributed random variables with cumulative distribution function $G_{\lambda}$, i.e., \[ G_{\lambda}(x)=\Pr[X_{it}\leq x\hspace{2mm}|\Lambda_i=\lambda], \] where $X_{it}$ is the claim amount for the $t$th year, $t\in {\mathbb{N}}$. We further assume that $G_{\lambda}$ has a probability density $g_{\lambda}$. We denote by $\mu(\lambda)$ and $\sigma^2(\lambda)$ the mean and variance associated with $G_{\lambda}$, i.e., \[ \mu(\lambda)=\mathbb{E}[X_{it}|\Lambda_i=\lambda] =\int_{x=0}^{+\infty}x\,dG_{\lambda}(x) \] and \[ \sigma^2(\lambda)={\mathbb{V}}[X_{it}|\Lambda_i=\lambda] =\int_{x=0-}^{\infty}\big(x-\mu(\lambda)\big)^2\,dG_{\lambda}(x). \]

11.2.2.5 A Priori Law of Claim Costs

We now compute the a priori (unconditional) cumulative distribution function $G$ of $X_{it}$. For $x\in {\mathbb{R}}^+$, we have \[\begin{eqnarray*} G(x) & = & \Pr[X_{it}\leq x] = \mathbb{E}\Big[\Pr[X_{it}\leq x|\Lambda_i]\Big] \\ & = & \int_{\lambda\in L} G_{\lambda}(x) u(\lambda)\,d\lambda. \end{eqnarray*}\] Therefore, the mean $\mu$ of $X_{it}$ is given by \[\begin{eqnarray*} \mu=\mathbb{E}[X_{it}]&=&\int_{x=0}^{+\infty}x\,dG(x)\\ & = & \int_{\lambda\in L}\int_{x=0}^{+\infty} x \,dG_{\lambda}(x) u(\lambda)\,d\lambda \\ & = & \int_{\lambda\in L} \mu(\lambda)u(\lambda)\, d\lambda=\mathbb{E}[\mu(\Lambda_i)]. \end{eqnarray*}\]

11.2.2.6 Posterior Structure Function

Consider a risk with unknown parameter $\Lambda_i$, and suppose we have $n$ observations $x_1,x_2,\ldots,x_n$ of $X_{i1},X_{i2},\ldots,X_{in}$. What is the distribution of the risk parameter $\Lambda_i$ given these observations? We denote by $u(\lambda|x_1,x_2,\ldots,x_n)$ the corresponding probability density function. By Bayes’ theorem, \[\begin{equation} u(\lambda|x_1,x_2,\ldots,x_n)=\frac{g_{\lambda}(x_1,x_2,\ldots,x_n)u(\lambda)}{g(x_1,x_2,\ldots,x_n)}, \hspace{2mm}\lambda\in L, \tag{11.2} \end{equation}\] where $g(x_1,x_2,\ldots,x_n)$ is the density function associated with the cumulative distribution function \[ G(x_1,x_2,\ldots,x_n) = \Pr[X_{i1}\leq x_1,X_{i2}\leq x_2,\ldots,X_{in}\leq x_n] \] and where $g_{\lambda}(x_1,x_2,\ldots,x_n)$ is the probability density corresponding to the conditional cumulative distribution function \[ G_{\lambda}(x_1,x_2,\ldots,x_n) = \Pr[X_{i1}\leq x_1,X_{i2}\leq x_2,\ldots,X_{in}\leq x_n | \Lambda_i=\lambda]. \]

Furthermore, given $\Lambda_i=\lambda$, the annual claim amounts are independent, so (11.2) can be rewritten as \[ u(\lambda|x_1,x_2,\ldots,x_n) = \frac{u(\lambda)\prod_{i=1}^n g_{\lambda}(x_i)}{g(x_1,x_2,\ldots,x_n)}, \hspace{2mm}\lambda\in L. \]
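When no closed form is available, the posterior structure function can be approximated on a grid of $\lambda$ values: evaluate the numerator $u(\lambda)\prod_i g_\lambda(x_i)$ and replace the denominator by a Riemann sum. The sketch below assumes, for illustration only, exponential claim densities $g_\lambda(x)=\lambda\exp(-\lambda x)$ and a Gamma structure function with shape 3 and rate 2; the claim amounts are hypothetical.

```python
import math


def log_prior(lam: float, shape: float = 3.0, rate: float = 2.0) -> float:
    """Log-density of a hypothetical Gamma(shape, rate) structure function u."""
    return shape * math.log(rate) - math.lgamma(shape) + (shape - 1) * math.log(lam) - rate * lam


def log_claim_density(x: float, lam: float) -> float:
    """Log of the hypothetical claim density g_lambda(x) = lam * exp(-lam * x)."""
    return math.log(lam) - lam * x


def posterior_on_grid(xs, lam_max: float = 20.0, m: int = 20_000):
    """Midpoint-grid approximation of u(lam | x_1, ..., x_n) from (11.2)."""
    h = lam_max / m
    grid = [(j + 0.5) * h for j in range(m)]
    # numerator u(lam) * prod_t g_lam(x_t), on the log scale to avoid underflow
    logs = [log_prior(lam) + sum(log_claim_density(x, lam) for x in xs) for lam in grid]
    top = max(logs)
    unnorm = [math.exp(v - top) for v in logs]
    norm = sum(unnorm) * h  # Riemann sum standing in for the denominator g(x_1, ..., x_n)
    return grid, [w / norm for w in unnorm], h


def mean_on_grid(grid, dens, h):
    return sum(lam * d for lam, d in zip(grid, dens)) * h


grid, prior_dens, h = posterior_on_grid([])            # no data: the prior itself
_, post_dens, _ = posterior_on_grid([0.5, 1.5, 4.0])   # hypothetical claim amounts
print(round(mean_on_grid(grid, prior_dens, h), 4))     # prior mean of Lambda_i
print(round(mean_on_grid(grid, post_dens, h), 4))      # posterior mean of Lambda_i
```

The printed means should be close to 1.5 and 0.75: large observed claims pull the rate parameter down, i.e., they raise the expected claim amount $1/\lambda$.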

11.2.2.7 Posterior Law of Claim Costs

The a posteriori probability density function of the claim amount $X_{i,n+1}$ for the $(n+1)$st year is given by \[ g(x|x_1,x_2,\ldots,x_n)=\int_{\lambda\in L}^{} g_{\lambda}(x) u(\lambda|x_1,x_2,\ldots,x_n)\,d\lambda \] since $g(x|x_1,x_2,\ldots,x_n,\lambda)= g_{\lambda}(x)$ due to the conditional independence of annual claim costs.

11.2.2.8 Posterior Premium

The a posteriori pure premium is given by \[\begin{eqnarray} p(x_1,x_2,\ldots,x_n) & = & \mathbb{E}[X_{i,n+1}|X_{i1}=x_1,X_{i2}=x_2,\ldots,X_{in}=x_n] \nonumber\\ & = & \int_{x=0}^{\infty} x g(x|x_1,x_2,\ldots,x_n)\,dx, \tag{11.3} \end{eqnarray}\] whereas the a priori pure premium is \[ \mu=\int_{x=0}^{+\infty} x g(x)\,dx=\int_{\lambda\in L}\mu(\lambda)u(\lambda)\,d\lambda. \] The a posteriori distribution of the risk parameter is used to determine the a posteriori pure premium, which can also be written as \[\begin{eqnarray*} p(x_1,x_2,\ldots,x_n) & = & \mathbb{E}[\mu(\Lambda_i)|X_{i1}=x_1,X_{i2}=x_2,\ldots,X_{in}=x_n]\\ & = & \int_{\lambda\in L}\mu(\lambda) u(\lambda|x_1,x_2,\ldots,x_n)\,d\lambda. \end{eqnarray*}\]

11.2.2.9 Exponential Family and Exact Credibility

Let’s assume that the conditional density $g_\lambda$ of claim amounts is of the form \[ g_{\lambda}(x)=\frac{a(x)\exp\big(-\lambda t(x)\big)}{c(\lambda)},\hspace{2mm}x\in {\mathbb{R}}^+, \] where $c$ is a normalization factor given by \[ c(\lambda)=\int_{x=0}^{+\infty} a(x)\exp\big(-\lambda t(x)\big)\,dx \] and where $a(x)\geq 0$ for all $x$. In this case, the a posteriori distribution of the risk parameter takes the form \[\begin{eqnarray*} u(\lambda|x_1,x_2,\ldots,x_n) & = & \frac{u(\lambda)\prod_{i=1}^n g_{\lambda}(x_i)}{g(x_1,x_2,\ldots,x_n)} \\ & = & \frac{u(\lambda)\prod_{i=1}^n g_{\lambda}(x_i)} {\int_{\zeta\in L}u(\zeta)\prod_{i=1}^n g_{\zeta}(x_i)d\zeta} \\ & = & \frac{(\frac{1}{c(\lambda)})^n \exp\left(-\lambda \sum_{i=1}^nt(x_i)\right) u(\lambda)}{\int_{\zeta\in L} (\frac{1}{c(\zeta)})^n \exp\left(-\zeta \sum_{i=1}^nt(x_i)\right) u(\zeta)\,d\zeta} \end{eqnarray*}\] (the factors $a(x_i)$ cancel between numerator and denominator). We see that the a posteriori density $u(\cdot|x_1,x_2,\ldots,x_n)$ depends on the observations only through $n$, the number of observations, and $T=\sum_{i=1}^nt(x_i)$, i.e., \[\begin{eqnarray*} u(\lambda|x_1,x_2,\ldots,x_n) & = & \frac{ (\frac{1}{c(\lambda)})^n \exp(-\lambda T) u(\lambda)} {\int_{\zeta\in L} (\frac{1}{c(\zeta)})^n \exp(-\zeta T)u(\zeta)\,d\zeta}. \end{eqnarray*}\]
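The fact that the posterior depends on the data only through $n$ and $T$ (and not on the factors $a(x_i)$) is easy to check numerically. This sketch assumes, for illustration, a Gamma claim density with shape 2, $g_\lambda(x)=\lambda^2 x\exp(-\lambda x)$, which belongs to the family above with $a(x)=x$, $t(x)=x$ and $c(\lambda)=\lambda^{-2}$, together with a hypothetical structure function $u(\lambda)=\exp(-\lambda)$ evaluated on a grid.

```python
import math


def log_claim_density(x: float, lam: float) -> float:
    # Gamma(2) claim density lam^2 * x * exp(-lam * x): a(x) = x, t(x) = x, c(lam) = lam**-2
    return 2.0 * math.log(lam) + math.log(x) - lam * x


def posterior_weights(xs, grid):
    """Normalized posterior weights on a grid, with a hypothetical prior u(lam) = exp(-lam)."""
    logs = [-lam + sum(log_claim_density(x, lam) for x in xs) for lam in grid]
    top = max(logs)
    w = [math.exp(v - top) for v in logs]
    s = sum(w)
    return [v / s for v in w]


def mean(grid, weights):
    return sum(lam * w for lam, w in zip(grid, weights))


grid = [0.01 * j for j in range(1, 1001)]          # lam in (0, 10]
p_a = posterior_weights([1.0, 3.0], grid)           # n = 2, T = 4
p_b = posterior_weights([2.0, 2.0], grid)           # n = 2, T = 4, different a(x_i)
p_c = posterior_weights([1.0, 4.0], grid)           # n = 2, T = 5
print(max(abs(u - v) for u, v in zip(p_a, p_b)))    # rounding-level difference only
print(round(mean(grid, p_a), 3), round(mean(grid, p_c), 3))
```

The first two samples share $n$ and $T$ and give the same posterior; changing $T$ shifts it.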

One might wonder whether there exist structure functions $u$ such that the a posteriori density $u(\cdot|x_1,\ldots,x_n)$ of the risk parameter and the a priori density $u(\cdot)$ have the same analytical form. This is the case if \[ u(\lambda)=\left(\frac{1}{c(\lambda)}\right)^{n_0} \exp(-\lambda t_0) \frac{1}{d(n_0,t_0)}\equiv u(\lambda ;n_0,t_0), \] where $d(n_0,t_0)$ is a normalization factor depending on the two parameters $n_0$ and $t_0$, i.e., \[ d(n_0,t_0)= \int_{\zeta\in L} \left(\frac{1}{c(\zeta)}\right)^{n_0} \exp(-\zeta t_0)\,d\zeta. \] The family $u(\cdot ;n_0,t_0)$ is called the family of structure functions “conjugate” to the exponential family. If $u(\cdot)$ is of the form above, then \[\begin{eqnarray*} u(\lambda|x_1,x_2,\ldots,x_n) & = & \frac{(\frac{1}{c(\lambda)})^{n+n_0} \exp\big(-\lambda(T+t_0)\big)} {\int_{\zeta\in L} (\frac{1}{c(\zeta)})^{n+n_0}\exp\big(-\zeta(T+t_0)\big)\,d\zeta}\\ & = & u(\lambda ;n_0+n,t_0+T) ; \end{eqnarray*}\] we thus obtain a law from the family of conjugate structure functions, with updated parameters $n_0+n$ and $t_0+T$. The a priori probability density of the annual claim amount is \[\begin{eqnarray*} g(x) & = & \int_{\lambda\in L} g_{\lambda}(x) u(\lambda ;n_0,t_0)\,d\lambda \\ & = & \int_{\lambda\in L} \frac{a(x) \exp\big(-\lambda t(x)\big)}{c(\lambda)} \frac{\exp(-\lambda t_0)}{(c(\lambda))^{n_0}} \frac{1}{d(n_0,t_0)}\,d\lambda \\ & = & \frac{a(x)}{d(n_0,t_0)} \int_{\lambda\in L} \frac{\exp\big(-\lambda(t_0+t(x))\big)} {(c(\lambda))^{n_0+1}}\,d\lambda \\ & = & \frac{a(x)}{d(n_0,t_0)} \hspace{2mm} d\big(n_0+1,t_0+t(x)\big).
\end{eqnarray*}\] As for the a posteriori probability density of the annual claim amount, it is given by \[\begin{eqnarray*} g(x|x_1,\ldots,x_n) & = & \int_{\lambda\in L} g_{\lambda}(x) u(\lambda|x_1,\ldots,x_n)\,d\lambda\\ & = & \int_{\lambda\in L} \frac{a(x) \exp\big(-\lambda t(x)\big)\exp\big(-\lambda(t_0+T)\big)}{(c(\lambda))^{n_0+n+1}d(n_0+n,t_0+T)}\,d\lambda \\ & = & \frac{a(x)}{d(n_0+n,t_0+T)} \int_{\lambda\in L}\frac{\exp\big(-\lambda(t_0+T+t(x))\big)} {(c(\lambda))^{n_0+n+1}}\,d\lambda \\ & = & \frac{a(x)}{d(n_0+n,t_0+T)}\, d\big(n_0+n+1,t_0+T+t(x)\big). \end{eqnarray*}\]

Now, let’s consider the particular case of the exponential family where $t(x)=x$. The density then becomes \[ g_{\lambda}(x) = \frac{a(x) \exp(-\lambda x)}{c(\lambda)} \] where \[ c(\lambda)=\int_{x=0}^{\infty} a(x) \exp(-\lambda x)\,dx. \] Writing $x_0$ instead of $t_0$, the conjugate structure functions $u(\cdot ;n_0,x_0)$ are then given by \[ u(\lambda ;n_0,x_0) = \frac{\exp(-\lambda x_0)}{(c(\lambda))^{n_0}} \frac{1}{d(n_0,x_0)}. \] For a policy with risk parameter $\lambda$, the average claim amount is \[ \mu(\lambda) = \int_0^{\infty} x\frac{a(x) \exp(-\lambda x)}{c(\lambda)}\,dx. \] Now, note that \[ c'(\lambda)=\frac{d}{d\lambda}c(\lambda) =- \int_{x=0}^{\infty} xa(x) \exp(-\lambda x)\,dx, \] so \[ \mu(\lambda) = -\frac{c'(\lambda)}{c(\lambda)} = -\frac{d\ln(c(\lambda))}{d\lambda}. \] The unconditional mean $\mu$ can then be written as \[\begin{eqnarray*} \mu & = & \int_{x=0}^{\infty} x g(x) \,dx \\ & = & \int_{\lambda\in L} \mu(\lambda) u(\lambda;n_0,x_0)\,d\lambda \\ & = & -\int_{\lambda\in L} \frac{c'(\lambda)}{c(\lambda)} \frac{\exp(-\lambda x_0)}{(c(\lambda))^{n_0}} \frac{1}{d(n_0,x_0)}\,d\lambda. \end{eqnarray*}\] Integrating this last expression by parts, using $\frac{d}{d\lambda}(c(\lambda))^{-n_0}=-n_0c'(\lambda)(c(\lambda))^{-n_0-1}$, we obtain \[\begin{eqnarray*} \mu & = & \frac{1}{d(n_0,x_0)}\left\{ \left[\frac{\exp(-\lambda x_0)}{n_0(c(\lambda))^{n_0}}\right]_L +\int_{\lambda\in L} \frac{x_0 \exp(-\lambda x_0)}{n_0(c(\lambda))^{n_0}}\,d\lambda\right\} \\ & = & \frac{1}{n_0} [u(\lambda ;n_0,x_0)]_L + \frac{x_0}{n_0}. \end{eqnarray*}\] Assuming that $u(\cdot ;n_0,x_0)$ vanishes at the boundaries of $L$, the first term is zero, and we are left with \[ \mu = \frac{x_0}{n_0}, \] which is the expected claim amount when no information is available.

The a posteriori pure premium is given by \[\begin{eqnarray*} p(x_1,x_2,\ldots,x_n) & = & \mathbb{E}[X_{i,n+1}|X_{i1}=x_1,X_{i2}=x_2,\ldots,X_{in}=x_n] \\ & = & \int_{\lambda\in L} \mu(\lambda) u(\lambda|x_1,\ldots,x_n)\,d\lambda \\ & = & \int_{\lambda\in L} \mu(\lambda) u\left(\lambda ;n_0+n,x_0+\sum_{i=1}^n x_i\right)\,d\lambda \\ & = & \frac{x_0 + \sum_{i=1}^n x_i}{n_0+n}, \end{eqnarray*}\] where the last step applies the result $\mu=x_0/n_0$ to the updated parameters $n_0+n$ and $x_0+\sum_{i=1}^n x_i$. This a posteriori premium is indeed a credibility premium: it is a convex combination of the collective pure premium $x_0/n_0$ and the observed premium $\overline{x}_n=\frac{1}{n}\sum_{i=1}^n x_i$, since \[ \frac{x_0+\sum_{i=1}^n x_i}{n_0+n}=\alpha\,\overline{x}_n+(1-\alpha)\frac{x_0}{n_0}\mbox{ with }\alpha = \frac{n}{n_0+n},\hspace{2mm}0\leq\alpha\leq 1. \] Thus, when the claim density belongs to the exponential family with $t(x)=x$ and the structure function is taken from the conjugate family, the Bayesian premium is a credibility premium.
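The closed form can be checked against a brute-force evaluation of $\int_L\mu(\lambda)u(\lambda|x_1,\ldots,x_n)\,d\lambda$. This sketch assumes exponential claims (so $c(\lambda)=1/\lambda$ and $\mu(\lambda)=1/\lambda$, as in Example 11.1) and hypothetical values $n_0=4$, $x_0=2$ and claim amounts $1.2, 0.3, 2.0$.

```python
import math

N0, X0 = 4.0, 2.0  # hypothetical conjugate prior parameters: prior premium X0 / N0 = 0.5


def bayes_premium_grid(xs, lam_max: float = 30.0, m: int = 30_000) -> float:
    """Numerical E[mu(Lambda_i) | x_1..x_n] with mu(lam) = 1/lam (exponential claims)."""
    h = lam_max / m
    grid = [(j + 0.5) * h for j in range(m)]
    # log of u(lam; N0, X0) * prod_t lam * exp(-lam * x_t), up to an additive constant
    logs = [(N0 + len(xs)) * math.log(lam) - lam * (X0 + sum(xs)) for lam in grid]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    return sum(w / lam for w, lam in zip(weights, grid)) / sum(weights)  # h cancels


def credibility_premium(xs) -> float:
    """(x0 + sum x_i) / (n0 + n), written as alpha * mean + (1 - alpha) * x0 / n0."""
    n = len(xs)
    if n == 0:
        return X0 / N0
    alpha = n / (N0 + n)
    return alpha * sum(xs) / n + (1 - alpha) * X0 / N0


claims = [1.2, 0.3, 2.0]  # hypothetical claim amounts
print(round(bayes_premium_grid(claims), 4), round(credibility_premium(claims), 4))
```

Both values print as about 0.7857, i.e., $(2+3.5)/(4+3)$; with no observations the grid premium falls back to $x_0/n_0=0.5$.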

Example 11.1 (Exponential-Gamma Pair) This time, we are dealing with a continuous case, where \[ g_{\lambda}(x) = \lambda \exp(-\lambda x), \hspace{2mm} x\geq 0,\ \lambda>0. \] This is indeed a member of the exponential family, with \[ a(x)=1,\hspace{2mm}t(x)=x\mbox{ and }c(\lambda)=\frac{1}{\lambda}. \] The family of conjugate structure functions is of the form \[\begin{eqnarray*} u(\lambda ;n_0,x_0) & = & \frac{\exp(-\lambda x_0)}{(c(\lambda))^{n_0}}\frac{1}{d(n_0,x_0)}\\ & = & \frac{x_0^{n_0+1}}{\Gamma(n_0+1)} \lambda^{n_0} \exp(-\lambda x_0); \end{eqnarray*}\] we thus recognize the probability density of a Gamma distribution with parameters $n_0+1$ and $x_0$. The a priori density of the claim amount is given by \[\begin{eqnarray*} g(x) & = & \int_{\lambda=0}^{\infty} g_{\lambda}(x) u(\lambda ;n_0,x_0) \,d\lambda \\ & = & \frac{x_0^{n_0+1}}{\Gamma(n_0+1)} \int_{\lambda=0}^{\infty} \exp\big(-\lambda (x_0+x)\big) \lambda^{n_0+1} \,d\lambda \\ & = & \frac{x_0^{n_0+1}}{\Gamma(n_0+1)} \frac{\Gamma(n_0+2)}{(x_0+x)^{n_0+2}} \\ & = & \frac{(n_0+1) x_0^{n_0+1}}{(x_0+x)^{n_0+2}} \\ & = & \frac{n_0+1}{x_0}\left(1+\frac{x}{x_0}\right)^{-n_0-2}, \end{eqnarray*}\] where we recognize the probability density of the Pareto distribution with parameters $n_0+1$ and $x_0$.
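The Pareto form of the a priori claim density can be confirmed by integrating the exponential density against the Gamma structure function numerically. This is a sketch with hypothetical parameter values $n_0=3$ and $x_0=2$, using a plain midpoint rule.

```python
import math


def gamma_pdf(lam: float, shape: float, rate: float) -> float:
    return math.exp(shape * math.log(rate) - math.lgamma(shape)
                    + (shape - 1) * math.log(lam) - rate * lam)


def mixed_density(x: float, n0: float, x0: float, lam_max: float = 60.0, m: int = 60_000) -> float:
    """g(x) = int lam*exp(-lam*x) u(lam; n0, x0) dlam by the midpoint rule,
    where u(.; n0, x0) is the Gamma(n0 + 1, x0) density."""
    h = lam_max / m
    total = 0.0
    for j in range(m):
        lam = (j + 0.5) * h
        total += lam * math.exp(-lam * x) * gamma_pdf(lam, n0 + 1, x0)
    return total * h


def pareto_density(x: float, n0: float, x0: float) -> float:
    """Closed form (n0 + 1) / x0 * (1 + x / x0) ** (-(n0 + 2))."""
    return (n0 + 1) / x0 * (1 + x / x0) ** (-n0 - 2)


for x in (0.0, 0.5, 1.5, 5.0):
    print(x, round(mixed_density(x, 3, 2.0), 6), round(pareto_density(x, 3, 2.0), 6))
```

The two columns agree to the displayed precision, which is what the derivation above predicts.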

11.2.2.10 Credibility for Claim Frequencies

The discrete case, where the $X_{it}$ take values in $\mathbb{N}$, is relevant when we are interested not in claim amounts but in claim counts. Suppose that \[ \Pr[X_{it}=k|\Lambda_i=\lambda] = g_{\lambda}(k)= \frac{a_k \lambda^k}{c(\lambda)},\hspace{2mm}k\in \mathbb{N}, \] where $c(\cdot)$ is a normalization constant equal to \[ c(\lambda) = \sum_{k=0}^{\infty} a_k \lambda^k \] and where $a_k\geq0$ for all $k$. In this case, the conjugate structure function is given by \[ u(\lambda ;n_0,k_0) = \frac{\lambda^{k_0}}{(c(\lambda))^{n_0}}\frac{1}{d(n_0,k_0)}, \] where $d(n_0,k_0)$ is a normalization factor given by \[ d(n_0,k_0)=\int_{\lambda\in L}\frac{\lambda^{k_0}}{(c(\lambda))^{n_0}}d\lambda. \] The property established in the continuous case carries over to the discrete case: if we choose a discrete distribution from the exponential family for the number of claims and the conjugate family for the a priori structure function, the a posteriori structure function is of the form \[ u(\lambda|k_1,k_2,\ldots,k_n) = u\left(\lambda ;n_0+n,k_0+\sum_{i=1}^n k_i\right) ; \] the analytical form is therefore the same a priori and a posteriori. In this case as well, the Bayesian premium is a credibility premium.

::: {.example}[Bernoulli-Beta Pair]

This concerns a portfolio where contracts can lead to 0 or 1 claim per year, i.e.,

\[ \left\{ \begin{array}{lll} g_\lambda(0) & = & 1-\lambda,\\ g_\lambda(1) & = & \lambda,\\ g_\lambda(k) & = & 0, \hspace{2mm} k=2,3,\ldots, \end{array} \right. \]

where $\lambda\in[0,1]$ is the probability of a claim. This is indeed a discrete exponential model for which

\[ g_\lambda(k)=a_k(1-\lambda)\left(\frac{\lambda}{1-\lambda}\right)^k \]

where $a_0=a_1 = 1$ and $a_k=0$ for all $k\geq 2$. It is thus an exponential family with

\[ c(\lambda)=\frac{1}{1-\lambda}. \]

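The conjugate family is given by

\[ u(\lambda;n_0,k_0)=\frac{\lambda^{k_0}(1-\lambda)^{n_0-k_0}}{d(n_0,k_0)},\hspace{2mm}0\leq\lambda\leq 1. \]

We need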

\[ d(n_0,k_0)= \int_0^1\lambda^{k_0}(1-\lambda)^{n_0-k_0}\,d\lambda \]

to converge. Now,

\[ d(n_0,k_0)<\infty \Leftrightarrow \left\{ \begin{array}{l} k_0>-1\\ n_0-k_0>-1 \end{array} \right. \]

and this is recognized as a Beta function, i.e.,

\[ d(n_0,k_0) = B(k_0+1,n_0-k_0+1). \]

A priori, the probability that the policy leads to a claim in a year is given by

\[\begin{eqnarray*} g(1) & = & \int_{\lambda=0}^1 \lambda u(\lambda ;n_0,k_0)\,d\lambda \\ & = & \frac{1}{B(k_0+1,n_0-k_0+1)} \int_{\lambda=0}^1\lambda^{k_0+1}(1-\lambda)^{n_0-k_0}\,d\lambda \\ & = & \frac{ B(k_0+2,n_0-k_0+1)}{ B(k_0+1,n_0-k_0+1)}= \frac{k_0+1}{n_0+2}, \end{eqnarray*}\]

from which we deduce $g(0)=1-g(1)$.

Thus, the annual number of claims a priori follows a Bernoulli distribution with parameter

\[ \frac{k_0+1}{n_0+2}, \]

which is the a priori mean $\mu$. Now let’s examine the posterior distribution:

\[\begin{eqnarray*} g(1|k_1,k_2,\ldots,k_n) & = & \int_{\lambda=0}^1 \lambda u\left(\lambda ;n_0+n,k_0+\sum_{i=1}^nk_i\right)\,d\lambda \\ & = & \frac{k_0+\sum_{i=1}^nk_i+1}{n_0+n+2} \\ & = & \frac{n_0+2}{n_0+n+2}\times\frac{k_0+1}{n_0+2}+ \frac{n}{n_0+n+2}\times\frac{1}{n}\sum_{i=1}^nk_i. \end{eqnarray*}\]

If we regard $g(1|k_1,k_2,\ldots,k_n)$ as the a posteriori pure premium, we indeed find the structure of a credibility premium, with credibility factor $\alpha=\frac{n}{n_0+n+2}$.

:::
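The Bernoulli-Beta formulas lend themselves to exact arithmetic. The sketch below evaluates the a priori claim probability through log-Beta functions (valid for real $k_0>-1$ and $n_0-k_0>-1$) and checks, with Python’s `fractions` module, that the a posteriori probability coincides with its credibility decomposition; the values $n_0=8$, $k_0=1$ and the claim record are hypothetical.

```python
import math
from fractions import Fraction


def log_beta(p: float, q: float) -> float:
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def prior_claim_prob(n0: float, k0: float) -> float:
    """g(1) = B(k0+2, n0-k0+1) / B(k0+1, n0-k0+1); requires k0 > -1 and n0 - k0 > -1."""
    return math.exp(log_beta(k0 + 2, n0 - k0 + 1) - log_beta(k0 + 1, n0 - k0 + 1))


def posterior_claim_prob(n0: int, k0: int, claims) -> Fraction:
    """g(1 | k_1..k_n) = (k0 + sum k_i + 1) / (n0 + n + 2), in exact arithmetic."""
    return Fraction(k0 + sum(claims) + 1, n0 + len(claims) + 2)


def credibility_decomposition(n0: int, k0: int, claims) -> Fraction:
    """alpha * observed claim rate + (1 - alpha) * prior mean, alpha = n / (n0 + n + 2)."""
    n = len(claims)
    alpha = Fraction(n, n0 + n + 2)
    return alpha * Fraction(sum(claims), n) + (1 - alpha) * Fraction(k0 + 1, n0 + 2)


claims = [0, 1, 0, 0, 1]  # hypothetical claim indicators over n = 5 years
print(prior_claim_prob(8, 1))                   # (k0 + 1) / (n0 + 2), i.e. about 0.2
print(posterior_claim_prob(8, 1, claims))       # 4/15
print(credibility_decomposition(8, 1, claims))  # 4/15
```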

11.2.2.11 Type of Dependence Induced by the Bayesian Credibility Model

For any function $h$ and any $s\neq t$, we have:

\[\begin{eqnarray*} \mathbb{C}[h(X_{is}),h(X_{it})] & = & \mathbb{E}\Big[\mathbb{C}[h(X_{is}),h(X_{it})|\Lambda_i]\Big] \nonumber\\ & & +{\mathbb{C}}\Big[\mathbb{E}[h(X_{is})|\Lambda_i],\mathbb{E}[h(X_{it})|\Lambda_i]\Big] \nonumber\\ & = & {\mathbb{V}}\Big[\mathbb{E}[h(X_{i1})|\Lambda_i]\Big]\geq 0 \end{eqnarray*}\]

since, for $s\neq t$, the $X_{it}$ are conditionally independent given $\Lambda_i$ (so the first term vanishes) and identically distributed. Therefore, the annual claim costs are positively correlated, whatever the function $h$ applied to them. The following results make this intuition precise. Recall that the concepts of positive dependence by quadrant and conditional increasingness were discussed in Sections 8.5.2 and 8.5.4.
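The identity $\mathbb{C}[h(X_{is}),h(X_{it})]={\mathbb{V}}\big[\mathbb{E}[h(X_{i1})|\Lambda_i]\big]$ can be verified exactly in the two-class Poisson portfolio of Section 11.2.1, by building the joint law of two conditionally independent years. This sketch truncates the Poisson support at 30 and tries three arbitrary choices of $h$.

```python
import math

LAMBDAS = [0.05, 0.15]  # class frequencies from the introductory example
WEIGHTS = [0.5, 0.5]
K = 30                  # truncation of the Poisson support


def pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)


def cov_h(h):
    """Cov[h(X_is), h(X_it)], s != t, from the joint law of two conditionally independent years."""
    e1 = sum(w * pmf(k, lam) * h(k) for w, lam in zip(WEIGHTS, LAMBDAS) for k in range(K))
    e12 = sum(w * pmf(k1, lam) * pmf(k2, lam) * h(k1) * h(k2)
              for w, lam in zip(WEIGHTS, LAMBDAS)
              for k1 in range(K) for k2 in range(K))
    return e12 - e1**2


def var_cond_mean(h):
    """V[E[h(X_i1) | Lambda_i]]."""
    means = [sum(pmf(k, lam) * h(k) for k in range(K)) for lam in LAMBDAS]
    m = sum(w * c for w, c in zip(WEIGHTS, means))
    return sum(w * (c - m) ** 2 for w, c in zip(WEIGHTS, means))


for name, h in [("identity", lambda k: k),
                ("at least one claim", lambda k: min(k, 1)),
                ("square", lambda k: k * k)]:
    print(name, cov_h(h), var_cond_mean(h))
```

For $h(x)=x$ both columns equal the variance of the class frequency, $0.25\times 0.1^2=0.0025$.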

Proposition 11.1

(1) If $\overline{G}_\lambda(x)\leq \overline{G}_{\lambda '}(x)$ for all $x$ when $\lambda\leq\lambda '$, then for any disjoint subsets $\mathcal{I}$ and $\mathcal{J}$ of $\{1,2,\ldots,n\}$, $\sum_{t\in\mathcal{I}}X_{it}$ and $\sum_{s\in\mathcal{J}}X_{is}$ are positively dependent by quadrant.

(2) If $x\mapsto\frac{g_\lambda (x)}{g_{\lambda '}(x)}$ decreases over the support of $X_{it}$ for $\lambda\leq\lambda '$, then $(X_{i1},X_{i2},\ldots,X_{in})$ is conditionally increasing.

Proof. We only prove (1).

Clearly, for any increasing functions $h_1$ and $h_2$,

\[\begin{eqnarray*} &&\mathbb{C}\left[h_1\left(\sum_{t\in\mathcal{I}}X_{it}\right),h_2\left(\sum_{s\in\mathcal{J}}X_{is}\right)\right] \\ &=& \mathbb{E}\left[\mathbb{C}\left[h_1\left(\sum_{t\in\mathcal{I}}X_{it}\right),h_2\left(\sum_{s\in\mathcal{J}}X_{is}\right)\Big|\Lambda_i\right]\right] \\ & & +\mathbb{C}\left[\mathbb{E}\left[h_1\left(\sum_{t\in\mathcal{I}}X_{it}\right)\Big|\Lambda_i\right], \mathbb{E}\left[h_2\left(\sum_{s\in\mathcal{J}}X_{is}\right)\Big|\Lambda_i\right]\right]. \end{eqnarray*}\]

The first term on the right-hand side is zero since, given $\Lambda_i$, the sums over the disjoint index sets $\mathcal{I}$ and $\mathcal{J}$ are independent. Since the inequality

\[ \Pr\left( \sum_{t\in\mathcal{I}}X_{it}>x|\Lambda_i=\lambda\right)\leq\Pr\left( \sum_{t\in\mathcal{I}}X_{it}>x|\Lambda_i=\lambda '\right) \]

is satisfied for any $\lambda\leq \lambda '$ (see Corollary 6.2.5), the function

\[ \lambda\mapsto h_1^*(\lambda)= \mathbb{E}\left[h_1\left( \sum_{t\in\mathcal{I}}X_{it}\right)\Big|\Lambda_i=\lambda\right] \]

is non-decreasing, because the expectation of a non-decreasing function is monotone with respect to stochastic dominance. The same argument shows that $\lambda\mapsto h_2^*(\lambda)=\mathbb{E}\left[h_2\left(\sum_{s\in\mathcal{J}}X_{is}\right)\Big|\Lambda_i=\lambda\right]$ is non-decreasing. Therefore, \[ \mathbb{C}\left[\mathbb{E}\left[h_1\left( \sum_{t\in\mathcal{I}}X_{it}\right)\Big|\Lambda_i\right], \mathbb{E}\left[h_2\left( \sum_{s\in\mathcal{J}}X_{is}\right)\Big|\Lambda_i\right]\right] \] \[= \mathbb{C}[h_1^*(\Lambda_i),h_2^*(\Lambda_i)]\geq 0, \] since two non-decreasing functions of the same random variable are non-negatively correlated. This completes the proof.

11.2.2.12 Financial Balance of a Posteriori Pricing System

Initially, the company does not use an a posteriori pricing method and charges all policyholders the same premium, set so that total revenue is, on average, sufficient to pay for incurred claims. Once an a posteriori pricing system is introduced, the premiums charged to individual policyholders vary; since the expected total claim amount is unchanged, it is important that the total amount collected by the insurer remain the same on average. An a posteriori pricing system is said to have the property of financial balance when its revenue remains stable over time in this sense. Bayesian credibility satisfies this property since, by the tower property of conditional expectation, the a posteriori pure premium satisfies

\[ \mathbb{E}[p(X_{i1},\ldots,X_{in})]=\mu. \]
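Financial balance can be checked in the two-class example of Section 11.2.1: averaging the a posteriori premium $\mathbb{E}[N_{i2}|N_{i1}=k]$ over the marginal distribution of $N_{i1}$ should give back the collective premium 0.1. This is a sketch with the Poisson support truncated at 40.

```python
import math

LAMBDA = {"B": 0.05, "M": 0.15}
PRIOR = {"B": 0.5, "M": 0.5}
K_MAX = 40  # truncation of the claim-count support (neglected mass is negligible)


def pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)


def marginal(k):
    """Pr[N_i1 = k], mixing over the driver classes."""
    return sum(PRIOR[c] * pmf(k, LAMBDA[c]) for c in LAMBDA)


def premium(k):
    """A posteriori premium E[N_i2 | N_i1 = k]."""
    return sum(PRIOR[c] * pmf(k, LAMBDA[c]) * LAMBDA[c] for c in LAMBDA) / marginal(k)


p_coll = sum(PRIOR[c] * LAMBDA[c] for c in LAMBDA)
expected_revenue = sum(marginal(k) * premium(k) for k in range(K_MAX))
print(p_coll, expected_revenue)
```

Claim-free policyholders pay less than 0.1 and the others pay more, but the expected revenue is unchanged.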

11.2.3 Bayesian Credibility for Claim Frequencies without A Priori Pricing

11.2.3.1 Model Description

Let $N_{it}$ be the number of claims reported by policyholder $i$ in year $t$. Given the annual claim frequency $\vartheta$ of this policyholder, $N_{it}\sim\mathcal{P}oi(\vartheta)$. In practice, $\vartheta$ is unknown and is written as $\lambda\Theta_i$, where $\lambda$ is the average annual claim frequency at the portfolio level and $\Theta_i$ represents the policyholder’s unknown risk level. We assume $\mathbb{E}[\Theta_i]=1$, so that $\Theta_i<1$ characterizes a policyholder with better driving behavior than the portfolio’s average, while $\Theta_i>1$ indicates a policyholder with worse driving behavior than the portfolio’s average.

Given the risk level $\Theta_i$ of policyholder $i$, the annual numbers of claims they generate are independent, i.e.,

\[\begin{eqnarray*} \Pr[N_{i1}=k_1,\ldots,N_{it}=k_t|\Theta_i=\theta] &=& \prod_{j=1}^t\Pr[N_{ij}=k_j|\Theta_i=\theta] \\ &=& \prod_{j=1}^t\exp(-\lambda\theta)\frac{(\lambda\theta)^{k_j}}{k_j!} \\ &=& \exp(-t\lambda\theta)\frac{(\lambda\theta)^{k_\bullet}}{\prod_{j=1}^tk_j!} \end{eqnarray*}\]

where $k_\bullet=\sum_{j=1}^tk_j$ is the total number of claims reported by the policyholder during the first $t$ years. The dependence between the annual numbers of claims $N_{i1},N_{i2},\ldots$ is thus induced by our ignorance of the policyholder’s intrinsic quality $\Theta_i$:

\[ \Pr[N_{i1}=k_1,\ldots,N_{it}=k_t]=\int_{\theta>0}\exp(-t\lambda\theta)\frac{(\lambda\theta)^{k_\bullet}}{\prod_{j=1}^tk_j!} u(\theta)d\theta \]

where $u(\cdot)$ is the probability density function of $\Theta_i$, also called the structure function, which describes how policyholders are distributed according to their quality.

11.2.3.2 Ignoring the Age of Claims

In this model, if we focus on the number of claims caused in year $t+1$ by a policyholder in the portfolio with $t$ years of history and who has generated $k_1,k_2,\ldots,k_t$ claims during this period, it follows that

\[ \Pr[N_{i,t+1}=k|N_{i1}=k_1,\ldots,N_{it}=k_t] = \Pr[N_{i,t+1}=k|N_{i\bullet}=k_\bullet] \]

We observe that the distribution of $N_{i,t+1}$ conditional on $N_{i1},\ldots,N_{it}$ depends on the past numbers of claims only through their sum $N_{i\bullet}=\sum_{j=1}^tN_{ij}$: as a function of $\theta$, the joint probability above depends on $k_1,\ldots,k_t$ only through $k_\bullet$. This is an important feature of the model, which does not take the age of claims into account. The predictive power of claims is the same regardless of their age: a claim that occurred 20 years ago provides the same information about the future as a claim that occurred last year. This is obviously a strong assumption, often contradicted by the data.
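The insensitivity to the timing of claims holds for any structure function, not only for the Gamma case treated below. The sketch uses a hypothetical two-point structure function ($\Theta_i\in\{0.5,1.5\}$ with equal probabilities, so $\mathbb{E}[\Theta_i]=1$) and a hypothetical portfolio frequency $\lambda=0.1$, and computes the predictive distribution from the full claim history.

```python
import math

LAM = 0.1               # hypothetical portfolio frequency lambda
THETAS = [0.5, 1.5]     # hypothetical two-point structure function for Theta_i ...
WEIGHTS = [0.5, 0.5]    # ... with E[Theta_i] = 1


def pois(k: int, mean: float) -> float:
    return math.exp(-mean) * mean**k / math.factorial(k)


def predictive(k: int, history) -> float:
    """Pr[N_{i,t+1} = k | N_i1 = k_1, ..., N_it = k_t], using the full history."""
    post = [w * math.prod(pois(kj, LAM * th) for kj in history)
            for th, w in zip(THETAS, WEIGHTS)]
    total = sum(post)
    return sum(p / total * pois(k, LAM * th) for p, th in zip(post, THETAS))


for history in ([2, 0, 0], [0, 0, 2], [1, 0, 1]):  # same k_bullet = 2, different timing
    print(history, [round(predictive(k, history), 6) for k in range(3)])
```

The three histories share $k_\bullet=2$ and therefore produce the same predictive probabilities.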

Technically, this is a consequence of the time-invariance of portfolio heterogeneity: if random effects were allowed to vary over time, claims would no longer have the same predictive power.

11.2.3.3 Type of Dependence

We have the following result.

It can even be shown that the vector $(\Theta_i,N_{i1},\ldots,N_{it})$ is conditionally increasing, indicating strong positive dependence between the annual numbers of claims.

Thus, as $N_{i\bullet}$ increases, $\Theta_i$ tends to become larger. This makes $N_{i,t+1}$ larger, as shown in the following result.

Therefore, an increase in $N_{i\bullet}$ makes $N_{i,t+1}$ larger through $\Theta_i$, resulting in higher premiums paid by the policyholder.
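The monotonicity described above can be illustrated with the same hypothetical two-point structure function ($\Theta_i\in\{0.5,1.5\}$ with equal probabilities, $\lambda=0.1$): after $t=5$ years, both $\mathbb{E}[\Theta_i|N_{i\bullet}=k]$ and the expected number of claims $\lambda\,\mathbb{E}[\Theta_i|N_{i\bullet}=k]$ in year $t+1$ increase with $k$. This is a numerical illustration, not a proof.

```python
import math

LAM = 0.1             # hypothetical portfolio frequency
THETAS = [0.5, 1.5]   # hypothetical two-point structure function, E[Theta_i] = 1
WEIGHTS = [0.5, 0.5]


def posterior_mean_theta(k_total: int, t: int) -> float:
    """E[Theta_i | N_i. = k_total] after t years (the factor 1/prod k_j! cancels)."""
    post = [w * math.exp(-t * LAM * th) * (LAM * th) ** k_total
            for th, w in zip(THETAS, WEIGHTS)]
    return sum(p * th for p, th in zip(post, THETAS)) / sum(post)


t = 5
for k in range(6):
    e_theta = posterior_mean_theta(k, t)
    # E[N_{i,t+1} | N_i. = k] = lambda * E[Theta_i | N_i. = k]
    print(k, round(e_theta, 4), round(LAM * e_theta, 4))
```

A claim-free record pushes the posterior mean of $\Theta_i$ below 1, while a single claim in five years already pushes it above 1.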

11.2.3.4 Negative Binomial Model

Let’s assume that, conditional on $\Lambda_i=\lambda$, the $N_{it}$ are independent and identically distributed as $\mathcal{P}oi(\lambda)$. Furthermore, suppose that the probability density function of $\Lambda_i$ has the form:

\[ u(\lambda) = \frac{\tau^a}{\Gamma(a)} \exp(-\tau \lambda) \lambda^{a-1} \]

i.e., $\Lambda_i\sim\mathcal{G}am(a,\tau)$. Let us show that the a posteriori density of $\Lambda_i$ is:

\[ u(\lambda |k_1,k_2,\ldots ,k_t) = \frac{(\tau +t)^{a+k_\bullet}} {\Gamma (a+k_\bullet)} \exp\big(-\lambda (\tau +t)\big) \lambda^{a+k_\bullet-1}, \]

where $k_\bullet=\sum_{i=1}^t k_i$. Starting from

\[\begin{eqnarray*} &&\Pr[N_{i1}=k_1,N_{i2}=k_2,\ldots ,N_{it}=k_t|\Lambda_i=\lambda]\\ & = & \prod_{j=1}^t\Pr[N_{ij}=k_j|\Lambda_i=\lambda] \\ & = & \exp(-\lambda t) \frac{\lambda ^{k_\bullet}}{k_1 !k_2 !\ldots k_t !}, \end{eqnarray*}\]

we derive

\[\begin{eqnarray*} &&\Pr[N_{i1}=k_1,N_{i2}=k_2,\ldots ,N_{it}=k_t] \\ & = & \int_{\lambda=0}^{+\infty}\Pr[N_{i1}=k_1,N_{i2}=k_2,\ldots ,N_{it}=k_t|\Lambda_i=\lambda]u(\lambda)d\lambda\\ & = & \frac{\tau ^a} {k_1 !k_2 !\ldots k_t !\Gamma (a)}\int_{\lambda =0}^{+\infty } \exp\big(-\lambda (\tau +t)\big) \lambda ^{k_\bullet+a-1} \, d\lambda \\ & = & \frac{\tau ^a}{(t+\tau )^{a+k_\bullet}} \frac{\Gamma (a+k_\bullet)}{\Gamma (a)} \frac{1}{k_1 !k_2 !\ldots k_t !}. \end{eqnarray*}\]
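The closed-form joint probability can be compared with a direct numerical evaluation of the mixing integral. This sketch assumes hypothetical parameters $a=1.5$ and $\tau=15$ (a collective frequency of 0.1) and a hypothetical claim record; it evaluates the integral by the midpoint rule on $(0,5]$, where essentially all the mass lies for these values.

```python
import math

A, TAU = 1.5, 15.0  # hypothetical Gamma(a, tau) structure function (mean a / tau = 0.1)


def closed_form(history) -> float:
    """tau^a / (t + tau)^(a + k) * Gamma(a + k) / Gamma(a) / (k_1! ... k_t!)."""
    t, k = len(history), sum(history)
    log_p = (A * math.log(TAU) - (A + k) * math.log(TAU + t)
             + math.lgamma(A + k) - math.lgamma(A)
             - sum(math.lgamma(kj + 1) for kj in history))
    return math.exp(log_p)


def numerical(history, lam_max: float = 5.0, m: int = 50_000) -> float:
    """Midpoint-rule value of the mixing integral over lambda."""
    t, k = len(history), sum(history)
    denom = math.prod(math.factorial(kj) for kj in history)
    h = lam_max / m
    total = 0.0
    for j in range(m):
        lam = (j + 0.5) * h
        conditional = math.exp(-lam * t) * lam**k / denom  # Pr[history | Lambda_i = lam]
        prior = math.exp(A * math.log(TAU) - math.lgamma(A)
                         + (A - 1) * math.log(lam) - TAU * lam)  # u(lam)
        total += conditional * prior
    return total * h


history = [0, 1, 0, 2]  # hypothetical claim record over t = 4 years
print(closed_form(history), numerical(history))
```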

The density is then given by

\[\begin{eqnarray*} u(\lambda|k_1,k_2,\ldots ,k_t) & = & \frac{\Pr[N_{i1}=k_1,N_{i2}=k_2,\ldots ,N_{it}=k_t|\Lambda_i=\lambda]u(\lambda)} {\Pr[N_{i1}=k_1,N_{i2}=k_2,\ldots ,N_{it}=k_t]}\\ & = & \frac{\frac{\exp\{-\lambda t\}\lambda ^{k_\bullet}} {k_1 !k_2 !\ldots k_t !} \frac{\tau^a \exp\{-\lambda \tau\}\lambda^{a-1}} {\Gamma (a)}}{ \frac{\tau ^a}{(t+\tau )^{a+k_\bullet}} \frac{\Gamma (a+k_\bullet)}{\Gamma (a)} \frac{1}{k_1 !k_2 !\ldots k_t !}}, \end{eqnarray*}\]

which simplifies to the announced expression. Consequently, the posterior mean

\[ \mathbb{E}[\Lambda_i|N_{i1}=k_1,N_{i2}=k_2,\ldots,N_{it}=k_t]= \lambda _{t+1}(k_1,k_2, \ldots, k_t) \]

appears as the mean associated with the Gamma distribution with parameters $a+k_\bullet$ and $\tau +t$, which is

\[ \lambda _{t+1}(k_1,k_2, \ldots, k_t) =\frac{a+k_\bullet}{\tau +t}. \]

Note that $\lambda _{t+1}$ has the structure of a credibility premium (assuming the average claim amount is 1), since

\[ \lambda _{t+1} (k_1,k_2,\ldots ,k_t) = \frac{\tau}{\tau +t}\times \frac{a}{\tau } + \frac{t}{\tau +t}\times \frac{k_\bullet}{t} \]

where $\frac{a}{\tau }$ is the collective claim frequency and $\frac{k_\bullet}{t}$ is the observed frequency for the policy in question; the credibility factor is therefore

\[ \alpha = \frac{t}{\tau +t}. \]

Note that, theoretically, the premium depends only on the number $k_\bullet$ of claims caused by the policyholder in the past, and not on their distribution $k_1,k_2,\ldots,k_t$ over time.

Figure 1 shows $u(\lambda|k_1)$ for different values of $k_1$, as well as $u(\lambda)$ for plausible values of the parameters $a$ and $\tau$. Figure 2 shows $u(\lambda|k_1,\ldots,k_{10})$ for different values of $k_1+\ldots+k_{10}$ (0, 1, and 5), as well as $u(\lambda)$. Figure 3 shows $u(\lambda|k_1,\ldots,k_{20})$ for different values of $k_1+\ldots+k_{20}$ (0, 1, and 5), as well as $u(\lambda)$. It is clear that the structural function is deformed, shifting to the right as the number of claims reported in the past increases.
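The posterior update above is easy to reproduce numerically. The Python sketch below is a minimal illustration with hypothetical parameter values (`a = 1.5`, `tau = 10`, i.e. a collective frequency of 0.15; these are not the values behind the figures). It returns the posterior parameters $(a+k_\bullet,\tau+t)$, evaluates the posterior density $u(\lambda|k_1,\ldots,k_t)$, and checks that the posterior mean coincides with the credibility form $\alpha\,k_\bullet/t+(1-\alpha)\,a/\tau$.

```python
import math


def posterior_gamma(a, tau, claims):
    """Posterior (shape, rate) of Lambda_i given the yearly counts k_1, ..., k_t.

    Prior: Lambda_i ~ Gamma(shape=a, rate=tau); given Lambda_i, the counts are
    iid Poisson(Lambda_i).
    """
    k_total = sum(claims)  # only k_bullet matters, not how claims are spread over time
    t = len(claims)
    return a + k_total, tau + t


def posterior_density(lam, shape, rate):
    """Gamma(shape, rate) density u(lambda | k_1, ..., k_t) evaluated at lam > 0."""
    return math.exp(shape * math.log(rate) - math.lgamma(shape)
                    - rate * lam + (shape - 1) * math.log(lam))


def credibility_form(a, tau, claims):
    """Posterior mean rewritten as alpha * observed + (1 - alpha) * collective."""
    t = len(claims)
    alpha = t / (tau + t)        # credibility factor t / (tau + t)
    collective = a / tau         # collective claim frequency a / tau
    observed = sum(claims) / t   # observed frequency k_bullet / t
    return alpha * observed + (1 - alpha) * collective


# Hypothetical parameters and three ten-year histories (0, 1 and 5 claims in total).
a, tau = 1.5, 10.0
for history in ([0] * 10, [1] + [0] * 9, [2, 0, 1, 0, 0, 1, 0, 1, 0, 0]):
    shape, rate = posterior_gamma(a, tau, history)
    print(sum(history), shape / rate, credibility_form(a, tau, history))
```

Because `posterior_gamma` only uses `sum(claims)`, two histories with the same total $k_\bullet$ lead to the same posterior and hence the same premium.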

11.2.4 Bayesian-Frequentist Credibility with Prior Rate Making

11.2.4.1 Model Description

Let us now combine the a posteriori rating system with a priori rate making, so that each policyholder has an individual a priori claim frequency. In this case, the model becomes \[ [N_{it}|\Theta_i=\theta]\sim\mathcal{P}oi(\lambda_{it}\theta),\hspace{2mm}t=1,2,\ldots,T_i, \] where $\lambda_{it}$ is the a priori claim frequency of policyholder $i$ in period $t$ (a function of the observable characteristics $\boldsymbol{x}_{it}$), $T_i$ is the number of observation periods for policyholder $i$, and $\Theta_i$ is a positive random variable with mean 1. At the portfolio level, the random effects $\Theta_i$, $i=1,2,\ldots,n$, are independent and identically distributed. The random effect $\Theta_i$ accounts for the overdispersion in empirical data while generating serial dependence among the $N_{it}$ for a fixed $i$. Since $\Theta_i$ is introduced in the model to compensate for missing explanatory variables, we regard the dependence between the annual claim counts of a policyholder as apparent: it arises from our lack of information about the policyholder.

11.2.4.2 Nature of the Model-Induced Dependence

As before, given $N_{i1},N_{i2},\ldots,N_{iT_i}$, the conditional distribution of $N_{iT_i+1}$ depends only on the sum $N_{i\bullet}=\sum_{t=1}^{T_i}N_{it}$ and not on the distribution of claims over the $T_i$ observation periods. This is a direct consequence of the time-invariance of $\Theta_i$. The number of claims $N_{iT_i+1}$ strongly depends on the past claims summarized in $N_{i\bullet}$, as evidenced by the following result.

Proposition 11.2

For any $\theta>0$, $\Pr[\Theta_i>\theta|N_{i\bullet}=k]\leq \Pr[\Theta_i>\theta|N_{i\bullet}=k']$ for all $k\leq k'$.
For any $j\in\mathbb{N}$, $\Pr[N_{iT_i+1}>j|N_{i\bullet}=k]\leq \Pr[N_{iT_i+1}>j|N_{i\bullet}=k']$ for all $k\leq k'$.
$(N_{iT_i+1},N_{i\bullet})$ is conditionally increasing.

11.2.4.3 Prior Distribution of $\Theta_i$ and $N_{it}$

The classical model assumes that the random effects $\Theta_i$, $i=1,2,\ldots,n$, are independent and identically distributed Gamma variables with mean 1 and variance $1/a$; thus, the density of $\Theta_i$ is given by \[ f_\Theta(\theta)=\frac{1}{\Gamma(a)} a^a\theta^{a-1}\exp(-a\theta),\hspace{2mm}\theta\in{\mathbb{R}}^+. \] This choice of density $f_\Theta$ is justified by purely analytical considerations (the Gamma distribution is the conjugate prior of the Poisson distribution). In this case, $N_{it}$ follows a Negative Binomial distribution, and the probabilities are \[\begin{eqnarray*} &&{\Pr}[N_{it}=n_{it}|\boldsymbol{x}_{it}]\\ &=& \int_{\theta\in{\mathbb{R}}^+}{\Pr}[N_{it}=n_{it}|\boldsymbol{x}_{it},\Theta_i=\theta]f_\Theta(\theta)d\theta\\ &=& \int_{\theta\in{\mathbb{R}}^+}\exp(-\theta\lambda_{it})\frac{(\theta\lambda_{it})^{n_{it}}}{n_{it}!} \frac{1}{\Gamma(a)} a^a\theta^{a-1}\exp(-a\theta)d\theta\\ &=&\binom{a+n_{it}-1}{n_{it}}\left(\frac{\lambda_{it}}{a+\lambda_{it}}\right)^{n_{it}} \left(\frac{a}{a+\lambda_{it}}\right)^a, \end{eqnarray*}\] for $n_{it}\in{\mathbb{N}}$, where, since $a$ need not be an integer, the binomial coefficient stands for $\frac{\Gamma(a+n_{it})}{n_{it}!\,\Gamma(a)}$.
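The mixture computation can be checked numerically. The following Python sketch, with hypothetical values of $\lambda_{it}$ and $a$, evaluates these Negative Binomial probabilities through `math.lgamma` (so that a non-integer $a$ is allowed) and verifies that they sum to one, with mean $\lambda_{it}$ and variance $\lambda_{it}+\lambda_{it}^2/a$. The variance follows from $\mathbb{V}[N_{it}]=\mathbb{E}\big[\mathbb{V}[N_{it}|\Theta_i]\big]+\mathbb{V}\big[\mathbb{E}[N_{it}|\Theta_i]\big]=\lambda_{it}+\lambda_{it}^2\,\mathbb{V}[\Theta_i]$ and exhibits the overdispersion induced by $\Theta_i$.

```python
import math


def nb_pmf(n, lam, a):
    """Pr[N_it = n] for the Poisson-Gamma mixture with E[Theta] = 1 and V[Theta] = 1/a."""
    log_p = (math.lgamma(a + n) - math.lgamma(a) - math.lgamma(n + 1)
             + n * math.log(lam / (a + lam)) + a * math.log(a / (a + lam)))
    return math.exp(log_p)


def moments(lam, a, n_max=500):
    """Total mass, mean and variance of the pmf, truncated at n_max."""
    probs = [nb_pmf(n, lam, a) for n in range(n_max + 1)]
    mass = sum(probs)
    mean = sum(n * p for n, p in enumerate(probs))
    var = sum(n * n * p for n, p in enumerate(probs)) - mean ** 2
    return mass, mean, var


lam, a = 0.12, 1.3   # hypothetical a priori frequency and Gamma parameter
mass, mean, var = moments(lam, a)
print(mass, mean, var, lam + lam ** 2 / a)
```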

11.2.4.4 Posterior Distribution of $\Theta_i$ and $N_{it}$

The probability distribution of the random vector $\boldsymbol{N}_i=(N_{i1},N_{i2},\ldots,N_{iT_i})$ is described by \[\begin{eqnarray} & & {\Pr}[N_{i1}=n_{i1},N_{i2}=n_{i2},\ldots,N_{iT_i}=n_{iT_i}]\nonumber\\ &=& \int_{\theta\in{\mathbb{R}}^+}{\Pr}[N_{i1}=n_{i1},N_{i2}=n_{i2},\ldots,N_{iT_i}=n_{iT_i}|\Theta_i=\theta]f_\Theta(\theta)d\theta\nonumber\\ &=& \int_{\theta\in{\mathbb{R}}^+}\left\{\prod_{t=1}^{T_i}{\Pr}[N_{it}=n_{it}|\Theta_i=\theta]\right\}f_\Theta(\theta)d\theta\nonumber\\ &=& \int_{\theta\in{\mathbb{R}}^+}\left\{\prod_{t=1}^{T_i}\exp(-\theta\lambda_{it})\frac{(\theta\lambda_{it})^{n_{it}}}{n_{it}!}\right\} f_\Theta(\theta)d\theta\nonumber\\ &=& \left\{\prod_{t=1}^{T_i}\frac{\lambda_{it}^{n_{it}}}{n_{it}!}\right\}\left(\frac{a}{a+\sum_{t=1}^{T_i}\lambda_{it}}\right)^a \left(a+\sum_{t=1}^{T_i}\lambda_{it}\right)^{-\sum_{t=1}^{T_i}n_{it}} \tag{11.4}\\ &&\hspace{20mm}\frac{\Gamma\left(a+\sum_{t=1}^{T_i}n_{it}\right)}{\Gamma(a)}\nonumber. \end{eqnarray}\]

The joint density of the variables $N_{it}$, $t=1,2,\ldots,T_i$, and $\Theta_i$ is given by \[\begin{eqnarray} & & \prod_{t=1}^{T_i}\exp\Big(-\theta_i\lambda_{it}\Big) \frac{\big\{\theta_i\lambda_{it}\big\}^{n_{it}}}{n_{it}!}\frac{1}{\Gamma(a)} a^a\theta_i^{a-1}\exp(-a\theta_i)\nonumber\\ & \propto & \exp\left\{-\theta_i\sum_{t=1}^{T_i}\lambda_{it}\right\} \theta_i^{\sum_{t=1}^{T_i}n_{it}+a-1}\exp(-a\theta_i). \tag{11.5} \end{eqnarray}\] The conditional distribution of $\Theta_i$ given the past claims of policyholder $i$, i.e., given $N_{it}=n_{it}$, $t=1,2,\ldots,T_i$, is obtained by dividing the joint density (11.5) by the probability (11.4), which amounts to normalizing (11.5) so that it integrates to one in $\theta_i$. This gives \[\begin{eqnarray*} & & \frac{\exp\left\{-\theta_i\left(a+\sum_{t=1}^{T_i}\lambda_{it}\right)\right\} \theta_i^{a+\sum_{t=1}^{T_i}n_{it}-1}} {\int_{\xi\in{\mathbb{R}}^+}\exp\left\{-\xi\left(a+\sum_{t=1}^{T_i}\lambda_{it}\right)\right\} \xi^{a+\sum_{t=1}^{T_i}n_{it}-1}d\xi}\\ &= &\exp\left\{-\theta_i\left(a+\sum_{t=1}^{T_i}\lambda_{it}\right)\right\} \theta_i^{a+\sum_{t=1}^{T_i}n_{it}-1}\frac{\left(a+\sum_{t=1}^{T_i}\lambda_{it}\right)^{a+ \sum_{t=1}^{T_i}n_{it}}}{\Gamma\left(a+\sum_{t=1}^{T_i}n_{it}\right)}. \end{eqnarray*}\] Conditionally on the claims incurred during the past $T_i$ periods by policyholder $i$, $\Theta_i$ therefore follows a Gamma distribution with parameters $a+\sum_{t=1}^{T_i}n_{it}$ and $a+\sum_{t=1}^{T_i}\lambda_{it}$, whose first two moments are \[\begin{eqnarray*} \mathbb{E}[\Theta_i|N_{it}=n_{it},t=1,2,\ldots,T_i]&=&\frac{a+\sum_{t=1}^{T_i}n_{it}} {a+\sum_{t=1}^{T_i}\lambda_{it}}\\ \mathbb{V}[\Theta_i|N_{it}=n_{it},t=1,2,\ldots,T_i]&=&\frac{a+\sum_{t=1}^{T_i}n_{it}} {\left(a+\sum_{t=1}^{T_i}\lambda_{it}\right)^2}. 
\end{eqnarray*}\] The first of these quantities, multiplied by $\lambda_{i,T_i+1}$, gives the expected annual claim frequency for period $T_i+1$, i.e., \[\begin{eqnarray*} &&\mathbb{E}[N_{i,T_i+1}|N_{it}=n_{it},t=1,2,\ldots,T_i]\\ &=&\lambda_{i,T_i+1}\mathbb{E}[\Theta_i|N_{it}=n_{it},t=1,2,\ldots,T_i]\\ &=&\lambda_{i,T_i+1}\frac{a+\sum_{t=1}^{T_i}n_{it}} {a+\sum_{t=1}^{T_i}\lambda_{it}}. \end{eqnarray*}\] Therefore, policyholders for whom $\sum_{t=1}^{T_i}n_{it}>\sum_{t=1}^{T_i}\lambda_{it}$ will experience a premium increase since \[ \mathbb{E}[N_{i,T_i+1}|N_{it}=n_{it},t=1,2,\ldots,T_i]>\mathbb{E}[N_{i,T_i+1}]=\lambda_{i,T_i+1}. \] These individuals have reported more claims than the insurer expected. Conversely, policyholders for whom $\sum_{t=1}^{T_i}n_{it}<\sum_{t=1}^{T_i}\lambda_{it}$, i.e., those who have reported fewer claims than the insurer expected, will see their premiums decrease.
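This update can be sketched in Python as follows. It is an illustration only: the value of `a` and the a priori frequencies $\lambda_{it}$ are hypothetical (in practice they come from the a priori rating model). The functions return the posterior Gamma parameters of $\Theta_i$ and the expected claim frequency for period $T_i+1$.

```python
def theta_posterior(a, lambdas, counts):
    """Posterior Gamma (shape, rate) of Theta_i given the past counts n_it.

    lambdas: a priori frequencies lambda_i1, ..., lambda_iT
    counts:  observed claim numbers n_i1, ..., n_iT
    """
    if len(lambdas) != len(counts):
        raise ValueError("one a priori frequency per observed period is needed")
    shape = a + sum(counts)
    rate = a + sum(lambdas)
    return shape, rate


def next_period_frequency(a, lambdas, counts, lambda_next):
    """E[N_{i,T_i+1} | past] = lambda_{i,T_i+1} * E[Theta_i | past]."""
    shape, rate = theta_posterior(a, lambdas, counts)
    return lambda_next * shape / rate


a = 1.3                                # hypothetical Gamma parameter
lambdas = [0.10, 0.10, 0.12, 0.12]     # hypothetical a priori frequencies
lam_next = 0.12
print(next_period_frequency(a, lambdas, [0, 0, 0, 0], lam_next))  # fewer claims than expected
print(next_period_frequency(a, lambdas, [0, 1, 0, 1], lam_next))  # more claims than expected
```

Here $\sum_t\lambda_{it}=0.44$: the claim-free history lowers the expected frequency below $\lambda_{i,T_i+1}=0.12$, while two claims raise it above.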

11.3 Linear Credibility

11.3.1 Bühlmann Model

11.3.1.1 Model Description

The Bayesian premium for year $n+1$, \[ p(x_1, x_2, \ldots, x_n) = \mathbb{E}[X_{i,n+1} | X_{i1} = x_1, \ldots, X_{in} = x_n], \] is optimal in the least squares sense. However, in practice, the conditional expectation rarely has an explicit expression, and determining this premium often requires heavy computations. For these reasons, in 1967, Hans Bühlmann proposed restricting attention to premiums that depend linearly on the observations. Specifically, for year $n+1$, we require a premium of the form \[ c_0 + c_1X_{i1} + c_2X_{i2} + \ldots + c_nX_{in} \] where the coefficients $c_0,c_1,\ldots,c_n$ are chosen to minimize the mean squared deviation \[ \mathbb{E}[\mu(\Lambda_i) - c_0 - c_1X_{i1} - c_2X_{i2} - \ldots - c_nX_{in}]^2. \] The Bühlmann model is based on the following assumptions:

11.3.1.2 Structural Parameters

The a priori mean is given by \[ \mu = \mathbb{E}[X_{it}] = \mathbb{E}[\mu(\Lambda_i)], \] while the variance $\sigma^2$ decomposes as \[ \sigma^2 = {\mathbb{V}}[X_{it}] = \Sigma^2 + M^2 \] where \[ \Sigma^2 = \mathbb{E}[\sigma^2(\Lambda_i)] = \int_{\lambda\in L}\sigma^2(\lambda)\,dU(\lambda) \] and \[ M^2 = {\mathbb{V}}[\mu(\Lambda_i)] = \int_{\lambda\in L}\big(\mu(\lambda)-\mu\big)^2\,dU(\lambda). \]

The meaning of the parameters $\Sigma^2$ and $M^2$ is as follows:

$\Sigma^2$ measures the random component in the variance.
$M^2$ measures the portion of variance due to portfolio heterogeneity.

For a homogeneous portfolio, $\mu(\lambda) = \mu$ and $M^2 = 0$. A positive value for $M^2$ indicates a heterogeneous portfolio.

11.3.1.3 Obtaining the Credibility Premium

Before determining the coefficients $c_0, c_1, \ldots, c_n$, we need to perform some preliminary calculations. First, let’s calculate the covariance between annual claim amounts. For $s$ and $t\in {\mathbb{N}}$, we have \[\begin{eqnarray*} {\mathbb{C}}[X_{it},X_{is}] & = & \mathbb{E}\Big[{\mathbb{C}}[X_{it},X_{is}|\Lambda_i]\Big]+{\mathbb{C}}\Big[\mathbb{E}[X_{it}|\Lambda_i],\mathbb{E}[X_{is}|\Lambda_i]\Big] \\ & = & \delta_{s,t} \mathbb{E}[\sigma^2(\Lambda_i)] + {\mathbb{C}}[\mu(\Lambda_i),\mu(\Lambda_i)] \\ & = & \delta_{s,t} \Sigma^2 + M^2, \end{eqnarray*}\] where $\delta_{s,t}=1$ if $s=t$ and 0 otherwise. Next, let’s calculate the covariance between annual claim amounts and the mean. For $t\in {\mathbb{N}}$, we have \[\begin{eqnarray*} {\mathbb{C}}[X_{it},\mu(\Lambda_i)] & = & \mathbb{E}\Big[{\mathbb{C}}[X_{it},\mu(\Lambda_i)|\Lambda_i]\Big] + {\mathbb{C}}\Big[\mathbb{E}[X_{it}|\Lambda_i], \mathbb{E}[\mu(\Lambda_i)|\Lambda_i]\Big] \\ & = & {\mathbb{V}}[\mu(\Lambda_i)] = M^2. \end{eqnarray*}\]

Now, let’s determine the credibility estimators, i.e., the values of $c_0, c_1, \ldots, c_n$ that minimize \[ \mathbb{E}[\mu(\Lambda_i)-c_0-c_1X_{i1}-\ldots-c_nX_{in}]^2. \] To do this, we simply need to set the partial derivatives of the objective function with respect to the parameters $c_0, c_1, \ldots, c_n$ to zero. This leads to \[\begin{eqnarray*} & & \frac{\partial}{\partial c_0} \mathbb{E}[\mu(\Lambda_i)-c_0-c_1X_{i1}-\ldots -c_nX_{in}]^2 =0 \\ & \Leftrightarrow & -2\mathbb{E}[\mu(\Lambda_i)-c_0-c_1X_{i1}- \ldots -c_nX_{in}]=0, \end{eqnarray*}\] and \[\begin{eqnarray*} & & \frac{\partial}{\partial c_k} \mathbb{E}[\mu(\Lambda_i)-c_0-c_1X_{i1}-\ldots -c_nX_{in}]^2=0 \hspace{2mm} k=1,\ldots,n \\ & \Leftrightarrow & -2\mathbb{E}[X_{ik}(\mu(\Lambda_i)- c_0-c_1X_{i1}-\ldots -c_nX_{in})]=0 \end{eqnarray*}\] for $k=1,\ldots ,n$. Therefore, we have a system of $n+1$ equations with $n+1$ unknowns: \[ \left\{ \begin{array}{l} \mu\left(1-\sum_{k=1}^n c_k\right)=c_0 \\ 0={\mathbb{C}}[X_{ik},\mu(\Lambda_i)]+\mu^2-c_0\mu- \sum_{t=1}^n c_t\{{\mathbb{C}}[X_{ik},X_{it}]+\mu^2\}\\ \hspace{20mm}\text{ for }k=1,2,\ldots,n, \end{array} \right. \] since \[ \mathbb{E}[X_{ik}\mu(\Lambda_i)] = {\mathbb{C}}[X_{ik},\mu(\Lambda_i)]+\mu^2\text{ and } \mathbb{E}[X_{ik}X_{it}]={\mathbb{C}}[X_{ik},X_{it}]+\mu^2. \] We can simplify the last $n$ equations of the system using the first equation. Specifically, \[ \mu^2-c_0\mu-\sum_{t=1}^n c_t \mu^2 =\mu \left\{\mu\left(1-\sum_{t=1}^n c_t\right)-c_0\right\}=0. \] Therefore, we are left with \[ \left\{ \begin{array}{l} \mu\left(1-\sum_{k=1}^n c_k\right)=c_0 \\ {\mathbb{C}}[X_{ik},\mu(\Lambda_i)]- \sum_{t=1}^n c_t {\mathbb{C}}[X_{ik},X_{it}]=0 \\ \hspace{20mm}\text{ for }k=1,2,\ldots,n, \end{array} \right. \] which further reduces to \[ \left\{ \begin{array}{l} \mu\left(1-\sum_{k=1}^n c_k\right)=c_0 \\ M^2-\sum_{t=1}^n c_t\left(\delta_{k,t}\Sigma^2 + M^2\right) = 0\text{ for } k=1,2,\ldots,n. \end{array} \right. \]

One can easily deduce from this last relationship that the $c_k$ must all be equal, i.e. $c_k=c$ for $k=1,2,\ldots,n$: indeed, the $k$th equation reads $c_k\Sigma^2=M^2\left(1-\sum_{t=1}^n c_t\right)$, whose right-hand side does not depend on $k$. This leads to \[ c(\Sigma^2 + nM^2)=M^2 \Rightarrow c=\frac{M^2}{\Sigma^2 + nM^2}. \] Finally, the first equation gives the value of $c_0$, namely \[\begin{eqnarray*} c_0 & = & \mu (1-nc) \\ & = & \mu\left(1-n\frac{M^2}{\Sigma^2+nM^2}\right) \\ & = & \frac{\Sigma^2}{\Sigma^2+nM^2}\mu \end{eqnarray*}\] and \[ c_k=\frac{M^2}{\Sigma^2+nM^2}\mbox{ for } k=1,2,\ldots,n. \] The linear credibility premium for year $n+1$ is therefore \[\begin{eqnarray*} p_\ell(X_{i1},X_{i2},\ldots,X_{in}) & = & \frac{\Sigma^2}{\Sigma^2+nM^2} \mu + \sum_{k=1}^n \frac{M^2}{\Sigma^2+nM^2}X_{ik} \\ & = & \frac{\Sigma^2}{\Sigma^2+nM^2} \mu + \frac{nM^2}{\Sigma^2+nM^2}\times \frac{1}{n}\sum_{k=1}^n X_{ik}. \end{eqnarray*}\] This is a credibility premium because it appears as a weighted average of the collective mean $\mu$ and the observed mean $\frac{1}{n}\sum_{k=1}^nX_{ik}$. The credibility factor is \[ \alpha=\frac{nM^2}{\Sigma^2+nM^2} \] which is of the form \[ \frac{n}{n_0+n}\mbox{ where }n_0=\frac{\Sigma^2}{M^2}. \] The credibility factor increases with the number of observations $n$ and tends to 1 as $n$ tends to $+\infty$.
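The fact that all $c_k$ coincide can be checked numerically. The Python sketch below uses hypothetical structural parameters and a plain Gaussian elimination to stay dependency-free. It solves the $n$ equations $\sum_{t=1}^n c_t(\delta_{k,t}\Sigma^2+M^2)=M^2$, recovers $c_0$ from the first normal equation, and compares the resulting linear predictor with the closed-form premium $p_\ell$.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[r][j] -= factor * aug[col][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                     # back substitution
        s = sum(aug[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def normal_equation_coefficients(mu, Sigma2, M2, n):
    """Numerical solution (c_0, [c_1, ..., c_n]) of the normal equations."""
    # k-th equation: sum_t c_t (delta_{k,t} Sigma^2 + M^2) = M^2
    A = [[(Sigma2 if k == t else 0.0) + M2 for t in range(n)] for k in range(n)]
    c = solve_linear(A, [M2] * n)
    c0 = mu * (1 - sum(c))        # first normal equation
    return c0, c


def buhlmann_premium(mu, Sigma2, M2, xs):
    """Closed form alpha * mean(xs) + (1 - alpha) * mu, alpha = n M^2 / (Sigma^2 + n M^2)."""
    n = len(xs)
    alpha = n * M2 / (Sigma2 + n * M2)
    return alpha * sum(xs) / n + (1 - alpha) * mu


mu, Sigma2, M2 = 0.15, 0.12, 0.02   # hypothetical structural parameters
xs = [0, 1, 0, 0, 2, 0]             # hypothetical six years of claim counts
c0, c = normal_equation_coefficients(mu, Sigma2, M2, len(xs))
print(c)  # all entries equal M2 / (Sigma2 + n * M2)
print(c0 + sum(ck * x for ck, x in zip(c, xs)), buhlmann_premium(mu, Sigma2, M2, xs))
```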

We also see that the credibility factor $\alpha$ increases with $M^2$, the parameter measuring the heterogeneity of the portfolio. In the limit, if $M^2$ tends to $+\infty$, $\alpha$ tends to 1. The more heterogeneous the portfolio, the greater the weight given to the individual experience. Conversely, $\alpha$ decreases when $\Sigma^2$, which measures the fluctuations due to chance, increases: a large value of $\Sigma^2$ means that the observed differences between policies are largely due to chance rather than to portfolio heterogeneity, so that individual experience is less informative.

11.3.1.4 Estimators of Structural Parameters

The parameters $M^2$, $\Sigma^2$, and $\mu$ need to be estimated based on observations. Suppose we have $n$ years of observations for each policy; for policy $i$, we have $X_{i1},X_{i2},\ldots,X_{in}$, $i=1,2,\ldots,K$. In practice, we often have different numbers of observations per policy, but the method below can be adapted to this situation, in the same spirit.

For policy $i$, we define \[ {\hat\mu_i} = X_{i\bullet}=\frac{1}{n}\sum_{t=1}^nX_{it},\hspace{2mm}i=1,2,\ldots,K. \] Then we have \[ \mathbb{E}[{\hat\mu_i}|\Lambda_i]= \frac{1}{n}\sum_{t=1}^n \mathbb{E}[X_{it}|\Lambda_i] = \mu(\Lambda_i). \] If we define \[\begin{eqnarray*} {\hat\mu} & = & \frac{1}{K}\sum_{i=1}^K {\hat\mu}_i = \frac{1}{nK}\sum_{i=1}^K \sum_{t=1}^n X_{it}\equiv X_{\bullet\bullet}, \end{eqnarray*}\] then ${\hat\mu}$ is an unbiased estimator of $\mu$. Indeed, \[\begin{eqnarray*} \mathbb{E}[{\hat\mu}] & = & \frac{1}{K}\sum_{i=1}^K\mathbb{E}[{\hat\mu}_i] \\ & = & \frac{1}{K}\sum_{i=1}^K\mathbb{E}\Big[\mathbb{E}[{\hat\mu}_i|\Lambda_i]\Big] \\ & = & \frac{1}{K}\sum_{i=1}^K\mathbb{E}[\mu(\Lambda_i)]= \mu. \end{eqnarray*}\]

To estimate the variance related to policy $i$, we use \[ {\hat\sigma}_i^2 = \frac{1}{n-1} \sum_{t=1}^n(X_{it}-X_{i\bullet})^2. \] Now, let’s verify that $\mathbb{E}[{\hat\sigma}_i^2|\Lambda_i] = \sigma^2(\Lambda_i)$. As $\mathbb{E}[X_{it}|\Lambda_i]=\mathbb{E}[X_{i\bullet}|\Lambda_i]=\mu(\Lambda_i)$, we start from \[\begin{eqnarray*} \mathbb{E}[(X_{it}-X_{i\bullet})^2|\Lambda_i]& = & \mathbb{V}[X_{it}-X_{i\bullet}|\Lambda_i] \\ & = & \mathbb{V}[X_{it}|\Lambda_i] + \mathbb{V}[X_{i\bullet}|\Lambda_i] - 2 \mathbb{C}[X_{it},X_{i\bullet}|\Lambda_i]. \end{eqnarray*}\] Since $\mathbb{C}[X_{it},X_{is}|\Lambda_i] = \delta_{s,t} \sigma^2(\Lambda_i)$, we obtain \[ \mathbb{C}[X_{it},X_{i\bullet}|\Lambda_i] = \frac{1}{n} \sum_{s=1}^n \mathbb{C}[X_{it},X_{is}|\Lambda_i] = \frac{1}{n} \sigma^2(\Lambda_i). \]

In addition, \[\begin{eqnarray*} \mathbb{V}[X_{i\bullet}|\Lambda_i] & = & \mathbb{C}[X_{i\bullet},X_{i\bullet}|\Lambda_i] \\ & = & \frac{1}{n} \sum_{t=1}^n \mathbb{C}[X_{it},X_{i\bullet}|\Lambda_i] = \frac{1}{n} \sigma^2(\Lambda_i). \end{eqnarray*}\] We finally obtain \[\begin{eqnarray*} \mathbb{E}[(X_{it}-X_{i\bullet})^2|\Lambda_i] & = & \sigma^2(\Lambda_i) \left(1+\frac{1}{n} -\frac{2}{n}\right) \\ & = & \frac{n-1}{n} \sigma^2(\Lambda_i). \end{eqnarray*}\] Thus \[ \mathbb{E}[{\hat\sigma_i^2}|\Lambda_i] =\frac{1}{n-1}\sum_{t=1}^n\frac{n-1}{n} \sigma^2(\Lambda_i)=\sigma^2(\Lambda_i). \] To estimate $\Sigma^2$ without bias, we use \[ {\widehat\Sigma}^2 = \frac{1}{K}\sum_{i=1}^K {\hat\sigma}_i^2 = \frac{1}{K(n-1)} \sum_{i=1}^K \sum_{t=1}^n (X_{it}-X_{i\bullet})^2. \] This is indeed an unbiased estimator. In fact, \[ \mathbb{E}[{\hat\sigma}_i^2]=\mathbb{E}\Big[\mathbb{E}[{\hat\sigma}_i^2|\Lambda_i]\Big] = \mathbb{E}[\sigma^2(\Lambda_i)]=\Sigma^2, \] and therefore $\mathbb{E}[{\hat\Sigma}^2] = \Sigma^2$.

To estimate $M^2$, we rely on $(X_{i\bullet}-X_{\bullet\bullet})^2$, which is a sample analogue of $(\mu(\Lambda_i)-\mu)^2$. We propose the following estimator: \[\begin{equation} \frac{1}{K}\sum_{i=1}^K (X_{i\bullet}-X_{\bullet\bullet})^2. \tag{11.6} \end{equation}\] Let’s check whether this estimator is unbiased. Since $X_{i\bullet}$ and $X_{\bullet\bullet}$ both have mean $\mu$, \[ \mathbb{E}[X_{i\bullet}-X_{\bullet\bullet}]^2 = \mathbb{V}[X_{i\bullet}] + \mathbb{V}[X_{\bullet\bullet}] - 2 \mathbb{C}[X_{i\bullet},X_{\bullet\bullet}]. \] Now, due to the independence of policies, \[\begin{eqnarray*} \mathbb{C}[X_{it},X_{js}] & = & \delta_{i,j} \mathbb{C}[X_{it},X_{is}] \\ & = & \delta_{i,j}\left(\delta_{s,t}\Sigma^2 + M^2\right), \end{eqnarray*}\] which implies \[\begin{eqnarray*} \mathbb{C}[X_{i\bullet},X_{js}] & = & \frac{1}{n}\sum_{t=1}^n \mathbb{C}[X_{it},X_{js}] \\ & = & \delta_{i,j}\frac{1}{n}(\Sigma^2 + nM^2), \end{eqnarray*}\] and consequently \[\begin{eqnarray*} \mathbb{C}[X_{i\bullet},X_{j\bullet}] & = & \frac{1}{n}\sum_{s=1}^n \mathbb{C}[X_{i\bullet},X_{js}] \\ & = & \frac{1}{n^2}\delta_{i,j}\sum_{s=1}^n (\Sigma^2 + nM^2) \\ & = & \delta_{i,j}\frac{1}{n}(\Sigma^2 + nM^2). \end{eqnarray*}\] In particular, \[ \mathbb{V}[X_{i\bullet}]=\mathbb{C}[X_{i\bullet},X_{i\bullet}] = \frac{\Sigma^2}{n} + M^2. \] Starting from \[\begin{eqnarray*} \mathbb{C}[X_{\bullet\bullet},X_{j\bullet}] & = & \frac{1}{K} \sum_{i=1}^K \mathbb{C}[X_{i\bullet},X_{j\bullet}] = \frac{1}{K} \left(\frac{\Sigma^2}{n} + M^2\right), \end{eqnarray*}\] we obtain \[\begin{eqnarray*} \mathbb{V}[X_{\bullet\bullet}] & = & \frac{1}{K} \sum_{j=1}^K \mathbb{C}[X_{\bullet\bullet},X_{j\bullet}] = \frac{1}{K} \left(\frac{\Sigma^2}{n} + M^2\right). 
\end{eqnarray*}\] In conclusion, since $\mathbb{C}[X_{i\bullet},X_{\bullet\bullet}]=\frac{1}{K}\left(\frac{\Sigma^2}{n} + M^2\right)$, we have \[\begin{eqnarray*} \mathbb{E}[X_{i\bullet}-X_{\bullet\bullet}]^2 & = & \mathbb{V}[X_{i\bullet}] + \mathbb{V}[X_{\bullet\bullet}] - 2\mathbb{C} [X_{i\bullet},X_{\bullet\bullet}] \\ & = & \left(\frac{\Sigma^2}{n} + M^2\right) \left(1-\frac{1}{K}\right) \\ & = & \frac{K-1}{K}\left(\frac{\Sigma^2}{n} + M^2\right). \end{eqnarray*}\] The mean of our candidate estimator (11.6) is therefore \[ \mathbb{E}\left[\frac{1}{K} \sum_{i=1}^K (X_{i\bullet}-X_{\bullet\bullet})^2\right] =\frac{K-1}{K}\left(\frac{\Sigma^2}{n} + M^2\right), \] so (11.6) is biased and must be corrected. Multiplying it by $K/(K-1)$ yields an unbiased estimator of $\Sigma^2/n+M^2$; subtracting $\Sigma^2/n$ then isolates $M^2$, and replacing the unknown $\Sigma^2$ by ${\hat\Sigma}^2$ preserves unbiasedness (since ${\hat\Sigma}^2$ is an unbiased estimator of $\Sigma^2$). Finally, the estimator of $M^2$ is \[ {\widehat M}^2 = \frac{1}{K-1}\sum_{i=1}^K (X_{i\bullet}-X_{\bullet\bullet})^2 -\frac{{\hat\Sigma}^2}{n}. \]
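The three estimators translate directly into code. The following Python sketch (function names are illustrative) computes $\hat\mu=X_{\bullet\bullet}$, $\widehat\Sigma^2$ and $\widehat M^2$ from a table with $K$ rows (policies) and $n$ columns (years), then the credibility factor and the premiums for year $n+1$. It assumes the same number of years for every policy and refuses to compute premiums when $\widehat M^2\leq 0$.

```python
from statistics import mean, variance


def buhlmann_estimates(data):
    """Unbiased estimators (mu_hat, Sigma2_hat, M2_hat) for a K x n table.

    data[i][t] is the claim amount (or number) of policy i in year t.
    """
    K, n = len(data), len(data[0])
    if any(len(row) != n for row in data):
        raise ValueError("this sketch assumes n observations per policy")
    means = [mean(row) for row in data]                # X_{i.}
    mu_hat = mean(means)                               # X_{..}
    Sigma2_hat = mean(variance(row) for row in data)   # average sigma_i^2 (divisor n - 1)
    between = sum((m - mu_hat) ** 2 for m in means) / (K - 1)
    M2_hat = between - Sigma2_hat / n
    return mu_hat, Sigma2_hat, M2_hat


def buhlmann_premiums(data):
    """Estimated credibility factor and premiums for year n + 1."""
    mu_hat, Sigma2_hat, M2_hat = buhlmann_estimates(data)
    n = len(data[0])
    if M2_hat <= 0:
        raise ValueError("M2_hat <= 0: inadmissible estimate, no credibility premium")
    alpha = n * M2_hat / (Sigma2_hat + n * M2_hat)
    return alpha, [alpha * mean(row) + (1 - alpha) * mu_hat for row in data]


# Hypothetical claim counts: 3 policies observed over 4 years.
data = [[0, 1, 0, 0], [2, 1, 3, 1], [0, 0, 1, 0]]
print(buhlmann_estimates(data))
print(buhlmann_premiums(data))
```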

Example 11.2 Consider the data in Table ??. This table shows the annual numbers of claims caused by 20 policies over 10 years (we are only interested in the number of claims, which is equivalent to using the average claim amount as the monetary unit). We suspect that the portfolio is heterogeneous: some policies have caused no claims over 10 years, while others (like policy number 9) have caused up to 6. The last three columns of the table show the values of ${\hat\mu}_i$, ${\hat\sigma}_i^2$, and the credibility premium $p_\ell$ for the eleventh year. For the data in Table ??, we have ${\hat\mu}=X_{\bullet\bullet}=0.145$, ${\hat\Sigma}^2=0.1038889$, and ${\hat M}^2=0.0216901$.

The credibility factor is given by \[ {\hat\alpha}=\frac{n{\hat M}^2}{{\hat\Sigma}^2+n{\hat M}^2}=0.7323057, \] which is approximately 73%. This means that we give a weight of 73% to the observations and the complementary weight (about 27%) to the collective mean ${\hat\mu}$. Without implementing a post hoc rating technique, we would charge a premium of ${\hat\mu}=p_{coll}=0.145$ to each insured. Thanks to the credibility model, the best risks (those who have not caused any claims in 10 years) will be charged a premium of 0.0442 (about 1/3 of the collective premium), while the worst risks will have to pay up to 0.63 (policy number 9), which is nearly three times the collective premium.


Example 11.3

An insurance company has covered the employees of two companies (group contracts) for the past three years. From the annual claim amounts available to the actuary, the observed mean is 8 for Company 1 and 12 for Company 2, and the within-company variances are $\widehat{\sigma}_1^2=9$ and $\widehat{\sigma}_2^2=1$, respectively.

We have $K=2$, $n=3$, $\widehat{\mu}=\frac{1}{2}(8+12)=10$, $\widehat{\Sigma}^2=\frac{1}{2}(9+1)=5$, and $\widehat{M}^2=\frac{1}{K-1}\left[(8-10)^2+(12-10)^2\right]-\frac{5}{3}=\frac{19}{3}$. The credibility factor is estimated as \[ \widehat{\alpha}=\frac{n\widehat{M}^2}{\widehat{\Sigma}^2+n\widehat{M}^2}=\frac{19}{24}\approx 0.7917. \]

The posterior premiums for the 4th year are \[ \widehat{\alpha}\overline{x}_{1\bullet}+(1-\widehat{\alpha})\widehat{\mu} =0.7917\times 8+0.2083\times 10=8.42 \] for Company 1 and \[ \widehat{\alpha}\overline{x}_{2\bullet}+(1-\widehat{\alpha})\widehat{\mu} =0.7917\times 12+0.2083\times 10=11.58 \] for Company 2.
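Since the estimators only involve the policy means and the within-policy variances $\hat\sigma_i^2$, the figures of this example can be reproduced from those summaries alone (means 8 and 12, variances 9 and 1). The helper below is a hypothetical Python sketch, not part of the original text; it returns no credibility factor when $\widehat M^2\leq 0$.

```python
def buhlmann_from_summaries(means, variances, n):
    """Buhlmann estimates and year n + 1 premiums from per-policy summaries.

    means[i]:     observed mean of policy i over n years
    variances[i]: unbiased within-policy variance sigma_i^2 (divisor n - 1)
    Returns (mu_hat, Sigma2_hat, M2_hat, alpha_hat, premiums); the last two are
    None when M2_hat <= 0 (inadmissible estimate).
    """
    K = len(means)
    mu_hat = sum(means) / K
    Sigma2_hat = sum(variances) / K
    M2_hat = sum((m - mu_hat) ** 2 for m in means) / (K - 1) - Sigma2_hat / n
    if M2_hat <= 0:
        return mu_hat, Sigma2_hat, M2_hat, None, None
    alpha = n * M2_hat / (Sigma2_hat + n * M2_hat)
    premiums = [alpha * m + (1 - alpha) * mu_hat for m in means]
    return mu_hat, Sigma2_hat, M2_hat, alpha, premiums


# K = 2 companies observed over n = 3 years.
print(buhlmann_from_summaries([8, 12], [9, 1], n=3))
```

Called with `means=[8, 8]` and `variances=[9, 36]`, the same function returns $\widehat M^2=-7.5$ and no credibility factor.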

Example 11.4

Let’s revisit the previous scenario with new observations for policy 2 (Company 2): its observed mean is now 8 and its within-company variance is now $\widehat{\sigma}_2^2=36$.

We easily calculate $\widehat{\mu}=\frac{1}{2}(8+8)=8$, $\widehat{\Sigma}^2=\frac{1}{2}(9+36)=22.5$, and \[ \widehat{M}^2=(8-8)^2+(8-8)^2-\frac{22.5}{3}=-7.5<0. \]

The estimation procedure of the linear credibility model can thus produce inadmissible estimates (here, a negative estimate of the variance $M^2$); in such cases, the Bühlmann approach should be abandoned.

Example 11.5

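A group of 340 theft insurance policyholders in a “high-risk area” produced 210 claims in one year. The details of the statistics are as follows: 200 policyholders reported no claim, 80 reported one claim, 50 reported two claims, and 10 reported three claims.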

We assume that the number $N$ of claims reported by a policyholder follows a Poisson distribution, but the mean of this distribution can vary among policyholders, i.e., $[N_i|\Lambda_i=\lambda]\sim\mathcal{P}oi(\lambda)$ for policyholder $i$. The average number of claims is estimated as $\widehat{\mathbb{E}[N_i]}=\frac{210}{340}=0.618$.

Furthermore, \[\begin{eqnarray*} &&\sigma^2(\Lambda_i)=\mathbb{V}[N_i|\Lambda_i]=\mathbb{E}[N_i|\Lambda_i]=\Lambda_i\\ &\Rightarrow&\widehat{\Sigma}^2=\widehat{\mathbb{E}[\sigma^2(\Lambda_i)]}= \widehat{\mathbb{E}[\Lambda_i]}=\widehat{\mathbb{E}[N_i]}=0.618. \end{eqnarray*}\] As \[ \widehat{\sigma}^2=\frac{80+50\times 4+10\times 9}{340}-\left(\frac{210}{340}\right)^2=0.706 \] we obtain \[ \widehat{M}^2=\widehat{\sigma}^2-\widehat{\Sigma}^2=0.706-0.618=0.088. \] Thus, the credibility factor for a policyholder in this portfolio is \[ \widehat{\alpha}=\frac{1\times \widehat{M}^2}{\widehat{\Sigma}^2+1\times\widehat{M}^2}= 0.125. \]

A policyholder who reported two claims would see their premium increase to \[ 0.125\times 2+0.875\times 0.618=0.791 \] compared to 0.618 beforehand.
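These computations can be scripted as below. The claim-count distribution is the one implied by the computation of $\widehat{\sigma}^2$ above (80, 50 and 10 policyholders with one, two and three claims, hence 200 of the 340 with none); the function name is hypothetical. Because intermediate results are not rounded, the sketch gives $\widehat{\sigma}^2\approx 0.707$, $\widehat{M}^2\approx 0.089$, $\widehat{\alpha}\approx 0.126$ and a premium of about $0.792$, slightly different from the rounded figures in the text.

```python
def poisson_buhlmann_one_year(count_table):
    """Credibility estimates from one year of Poisson claim counts.

    count_table maps a number of claims k to the number of policyholders with k claims.
    Uses sigma^2(Lambda) = Lambda, hence Sigma2 = E[N], and M2 = V[N] - Sigma2.
    """
    m = sum(count_table.values())
    mean_n = sum(k * c for k, c in count_table.items()) / m
    var_n = sum(k * k * c for k, c in count_table.items()) / m - mean_n ** 2
    Sigma2 = mean_n
    M2 = var_n - Sigma2
    alpha = M2 / (Sigma2 + M2)          # n = 1 year of observation
    return mean_n, var_n, M2, alpha


counts = {0: 200, 1: 80, 2: 50, 3: 10}
mean_n, var_n, M2, alpha = poisson_buhlmann_one_year(counts)
premium_two_claims = alpha * 2 + (1 - alpha) * mean_n
print(round(mean_n, 3), round(var_n, 3), round(M2, 3), round(alpha, 3),
      round(premium_two_claims, 3))
```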

11.3.2 Bühlmann-Straub Model

11.3.2.1 Model Description

In comparison to the Bühlmann model, the Bühlmann-Straub model introduces a weighting for each observation. Specifically, we consider a portfolio of $K$ policies observed over $n$ years. The model is based on the following assumptions:

Furthermore, we use the following notations: \[ \mu = \mathbb{E}[\mu(\Lambda_i)],\hspace{2mm} \Sigma^2 = \mathbb{E}[\sigma^2(\Lambda_i)]\text{, and } M^2 = \mathbb{V}[\mu(\Lambda_i)] \] As before, $M^2$ measures the heterogeneity of the portfolio.

11.3.2.2 Some Preliminary Calculations…

We will need the following lemma later in this section.

Lemma 11.1 In the Bühlmann-Straub model,

(1) $\mathbb{C}[X_{it}, X_{is}] = \delta_{s,t} \frac{1}{w_{it}} \Sigma^2 + M^2$
(2) $\mathbb{C}[X_{it}, \mu(\Lambda_i)] = M^2$

Proof. Let’s prove (1). To do this, we start with \[\begin{eqnarray*} \mathbb{C}[X_{it}, X_{is}] & = & \mathbb{E}\left[\mathbb{C}[X_{it}, X_{is} | \Lambda_i]\right] + \mathbb{C}\left[\mathbb{E}[X_{it} | \Lambda_i], \mathbb{E}[X_{is} | \Lambda_i]\right] \\ & = & \delta_{s,t} \frac{1}{w_{it}} \mathbb{E}[\sigma^2(\Lambda_i)] + \mathbb{V}[\mu(\Lambda_i)], \end{eqnarray*}\] since $X_{it}$ and $X_{is}$ are independent conditional on $\Lambda_i$.

Now, let’s prove (2). It’s sufficient to notice that \[ \mathbb{C}[X_{it}, \mu(\Lambda_i)] = \mathbb{E}\left[\mathbb{C}[X_{it}, \mu(\Lambda_i) | \Lambda_i]\right] + \mathbb{C}\left[\mathbb{E}[X_{it} | \Lambda_i], \mathbb{E}[\mu(\Lambda_i) | \Lambda_i]\right] = M^2, \] since, conditional on $\Lambda_i$, $\mu(\Lambda_i)$ is a constant.

11.3.2.3 Obtaining the Credibility Premium

We now wish to determine the constants $c_{i0}, c_{i1}, \ldots, c_{in}$ such that the mean squared error \[ \mathbb{E}\left(\mu(\Lambda_i) - c_{i0} - \sum_{t=1}^n c_{it}X_{it}\right)^2 \] is minimized. By setting the partial derivatives with respect to $c_{i0}$ and $c_{ik}$, $k = 1,2,\ldots,n$, to zero, we obtain the system of equations: \[ \left\{ \begin{array}{l} \mathbb{E}\left(\mu(\Lambda_i) - c_{i0} - \sum_{t=1}^n c_{it}X_{it}\right) = 0 \\ \mathbb{E}\left(X_{ik} \left(\mu(\Lambda_i) - c_{i0} - \sum_{t=1}^n c_{it}X_{it}\right)\right) = 0, \quad k = 1,2,\ldots,n, \end{array} \right. \] which is known as the system of normal equations. The first equation gives us \[ \mu\left(1 - \sum_{t=1}^n c_{it}\right) - c_{i0} = 0, \] and the subsequent equations can be rewritten as \[\begin{eqnarray*} & & \mathbb{C}[X_{ik}, \mu(\Lambda_i)] + \mu^2 - c_{i0}\mu - \sum_{t=1}^n c_{it}\left(\mathbb{C}[X_{ik}, X_{it}] + \mu^2\right) = 0 \\ & \Leftrightarrow & \mathbb{C}[X_{ik}, \mu(\Lambda_i)] - \sum_{t=1}^n c_{it}\mathbb{C}[X_{ik}, X_{it}] = 0 \end{eqnarray*}\] where the last equivalence is obtained using the first normal equation.

Thus, thanks to Lemma 11.1, we have: \[\begin{eqnarray*} & & M^2 - \sum_{t=1}^n c_{it}\left(\delta_{tk} \frac{1}{w_{ik}}\Sigma^2 + M^2\right) = 0 \\ & \Leftrightarrow & w_{ik}M^2\left(1 - \sum_{t=1}^n c_{it}\right) = c_{ik}\Sigma^2, \end{eqnarray*}\] for $k = 1,2,\ldots,n$. Summing these $n$ equations over $k$ yields \[ w_{i\bullet}M^2\left(1 - \sum_{t=1}^n c_{it}\right) = \sum_{k=1}^n c_{ik}\Sigma^2 \] where $w_{i\bullet} = \sum_{k=1}^n w_{ik}$. We can then obtain \[ \sum_{t=1}^n c_{it} = \frac{M^2w_{i\bullet}}{\Sigma^2 + M^2w_{i\bullet}} \] and consequently \[ c_{ik} = \frac{M^2w_{ik}}{\Sigma^2 + M^2w_{i\bullet}}, \] for $k = 1,2,\ldots,n$. Finally, \[ c_{i0} = \mu\frac{\Sigma^2}{\Sigma^2 + M^2w_{i\bullet}} = \left(1 - \frac{M^2w_{i\bullet}}{\Sigma^2 + M^2w_{i\bullet}}\right)\mu \]

The credibility estimator for $\mu(\Lambda_i)$ is given by \[ \frac{\Sigma^2}{\Sigma^2 + M^2w_{i\bullet}}\mu + \frac{M^2w_{i\bullet}}{\Sigma^2 + M^2w_{i\bullet}}\frac{1}{w_{i\bullet}}\sum_{k=1}^n w_{ik}X_{ik}. \] This latter premium is a credibility premium: the credibility factor for policy $i$ is \[ \alpha_i = \frac{M^2w_{i\bullet}}{\Sigma^2 + M^2w_{i\bullet}} \] and the arithmetic mean of $X_{it}$ is replaced by a weighted average \[ X_{i\bullet} = \frac{1}{w_{i\bullet}}\sum_{k=1}^n w_{ik}X_{ik}. \]

Remark.

In the particular case where all weights are equal to $1$ (i.e., $w_{it} = 1$ for all $i$ and $t$), we recover exactly the credibility estimator of the Bühlmann model.
In the Bühlmann-Straub model, the credibility factors vary from policy to policy, depending on the weight assigned to each of them.
Let’s examine the influence of the model parameters on $\alpha_i$: firstly, $\alpha_i \rightarrow 1$ if $w_{i\bullet} \to \infty$, secondly, $\alpha_i$ increases if $M^2$ increases and decreases if $\Sigma^2$ increases.
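A minimal Python sketch of the Bühlmann-Straub credibility estimator for a single policy, assuming the structural parameters are known (the values below, and the reading of $w_{it}$ as a number of insured vehicles, are hypothetical). With all weights equal to 1, it returns the Bühlmann premium, in line with the first remark.

```python
def buhlmann_straub_premium(mu, Sigma2, M2, weights, xs):
    """Credibility factor alpha_i and estimator of mu(Lambda_i) with weights w_it."""
    if len(weights) != len(xs):
        raise ValueError("one weight per observation is required")
    w_dot = sum(weights)                                     # w_{i.}
    x_bar = sum(w * x for w, x in zip(weights, xs)) / w_dot  # weighted mean X_{i.}
    alpha = M2 * w_dot / (Sigma2 + M2 * w_dot)               # alpha_i
    return alpha, alpha * x_bar + (1 - alpha) * mu


mu, Sigma2, M2 = 0.15, 0.12, 0.02   # hypothetical structural parameters
# Hypothetical reading: X_it = claim frequency per vehicle, w_it = vehicles insured in year t.
print(buhlmann_straub_premium(mu, Sigma2, M2, [10, 12, 15], [0.10, 0.25, 0.20]))
```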

11.3.2.4 Estimation of Structural Parameters

Recall that $w_{i\bullet} = \sum_{t=1}^n w_{it}$ and let $w_{\bullet\bullet} = \sum_{i=1}^K \sum_{t=1}^n w_{it}$. Similarly, since we set $X_{i\bullet}= \frac{1}{w_{i\bullet}} \sum_{t=1}^n w_{it} X_{it}$, we define \[ X_{\bullet\bullet} = \frac{1}{w_{\bullet\bullet}} \sum_{i=1}^K w_{i\bullet} X_{i\bullet} = \frac{1}{w_{\bullet\bullet}} \sum_{i=1}^K \sum_{t=1}^n w_{it} X_{it}. \]

For policy $i$, we estimate $\mu(\Lambda_i)$ by $\hat\mu_i = X_{i\bullet}$. This estimator satisfies $\mathbb{E}[\hat\mu_i | \Lambda_i] = \mu(\Lambda_i)$. Next, we use \[ \hat\mu = \frac{1}{w_{\bullet\bullet}} \sum_{i=1}^K w_{i\bullet} \hat\mu_i = X_{\bullet\bullet} \] to estimate the collective mean $\mu$ without bias; indeed, $\mathbb{E}[\hat\mu] = \mu$.

The variance $\sigma^2(\Lambda_i)$ for the $i$th policy is estimated by \[ \hat\sigma_i^2 = \frac{1}{n-1} \sum_{t=1}^n w_{it} (X_{it} - X_{i\bullet})^2. \] We have $\mathbb{E}[\hat\sigma_i^2 | \Lambda_i] = \sigma^2(\Lambda_i)$ and $\mathbb{E}[\hat\sigma_i^2] = \Sigma^2$. To see the second property, note that \[ \mathbb{E}[(X_{it} - X_{i\bullet})^2] = \mathbb{V}[X_{it} - X_{i\bullet}] \] since $\mathbb{E}[X_{it}] = \mathbb{E}[X_{i\bullet}] = \mu$, so that $\mathbb{E}[X_{it} - X_{i\bullet}] = 0$. Therefore, \[ \mathbb{E}\Big[(X_{it} - X_{i\bullet})^2\Big] = \mathbb{V}[X_{it}] - 2\,\mathbb{C}[X_{it}, X_{i\bullet}] + \mathbb{V}[X_{i\bullet}]. \] Using Lemma 11.1, we can write \[\begin{eqnarray*} \mathbb{C}[X_{it}, X_{i\bullet}] & = & \frac{1}{w_{i\bullet}} \sum_{s=1}^n w_{is}\, \mathbb{C}[X_{it}, X_{is}] \\ & = & \frac{1}{w_{i\bullet}} \sum_{s=1}^n w_{is} \left( \delta_{s,t} \frac{1}{w_{it}} \Sigma^2 + M^2 \right) = \frac{\Sigma^2}{w_{i\bullet}} + M^2, \end{eqnarray*}\] and similarly $\mathbb{V}[X_{i\bullet}] = \frac{1}{w_{i\bullet}}\sum_{t=1}^n w_{it}\,\mathbb{C}[X_{it}, X_{i\bullet}] = \frac{\Sigma^2}{w_{i\bullet}} + M^2$, while $\mathbb{V}[X_{it}] = \frac{\Sigma^2}{w_{it}} + M^2$. Thus \[ \mathbb{E}\Big[(X_{it} - X_{i\bullet})^2\Big] = \Sigma^2 \left( \frac{1}{w_{it}} - \frac{1}{w_{i\bullet}} \right) \] and \[\begin{eqnarray*} \mathbb{E}[\hat\sigma_i^2] & = & \frac{1}{n - 1} \Sigma^2 \sum_{t=1}^n w_{it} \left( \frac{1}{w_{it}} - \frac{1}{w_{i\bullet}}\right)\\ & = & \frac{n}{n - 1} \Sigma^2 - \frac{1}{n - 1} \Sigma^2 = \Sigma^2. \end{eqnarray*}\] To show that $\mathbb{E}[\hat\sigma_i^2 | \Lambda_i] = \sigma^2(\Lambda_i)$, let’s start from \[\begin{eqnarray*} \mathbb{E}[(X_{it} - X_{i\bullet})^2 | \Lambda_i] & = & \mathbb{V}[X_{it} | \Lambda_i] - 2\,\mathbb{C}[X_{it}, X_{i\bullet} | \Lambda_i] + \mathbb{V}[X_{i\bullet} | \Lambda_i] \\ & = & \frac{1}{w_{it}} \sigma^2(\Lambda_i) - \frac{2}{w_{i\bullet}} \sum_{s=1}^n w_{is}\, \mathbb{C}[X_{it}, X_{is} | \Lambda_i] \\ && + \frac{1}{w_{i\bullet}} \sum_{s=1}^n w_{is}\, \mathbb{C}[X_{is}, X_{i\bullet} | \Lambda_i] \\ & = & \frac{1}{w_{it}} \sigma^2(\Lambda_i) - \frac{2}{w_{i\bullet}} \sigma^2(\Lambda_i) + \frac{\sigma^2(\Lambda_i)}{w_{i\bullet}} \\ & = & \frac{\sigma^2(\Lambda_i)}{w_{it}} - \frac{\sigma^2(\Lambda_i)}{w_{i\bullet}}. \end{eqnarray*}\] Hence, \[\begin{eqnarray*} \mathbb{E}[\hat\sigma_i^2 | \Lambda_i] & = & \frac{1}{n - 1} \sum_{t=1}^n w_{it}\, \mathbb{E}[(X_{it} - X_{i\bullet})^2 | \Lambda_i] \\ & = & \frac{1}{n-1} \sum_{t=1}^n w_{it} \left( \frac{1}{w_{it}} - \frac{1}{w_{i\bullet}} \right) \sigma^2(\Lambda_i) = \sigma^2(\Lambda_i). \end{eqnarray*}\] At the portfolio level, we use the unbiased estimator \[ \hat\Sigma^2 = \frac{1}{K} \sum_{i=1}^K \hat\sigma_i^2 = \frac{1}{K(n-1)} \sum_{i=1}^K \sum_{t=1}^n w_{it} (X_{it} - X_{i\bullet})^2. \]

Let’s calculate \[\begin{eqnarray*} \mathbb{E} [ ( X _ {i \bullet} - X _{\bullet \bullet}) ^2 ] & = & \mathbb{V}[X _ {i \bullet} - X _ {\bullet \bullet} ] \\ & = & \mathbb{V}[X _ { i \bullet } ] - 2 \mathbb{C}[ X _ {i \bullet}, X _ { \bullet \bullet}] + \mathbb{V}[ X _ { \bullet \bullet }]. \end{eqnarray*}\] Now, \[\begin{eqnarray*} \mathbb{C}[ X _ {it} , X _ {i \bullet }]& = &\frac{1}{w _ {i \bullet}} \sum \limits ^n _ {s = 1} \mathbb{C}[ X _ {it}, X _ {is}] w _ {is} = \frac{ \Sigma^2}{w _ {i \bullet}} + M^2 = \mathbb{V}[X _ {i \bullet}], \\ \mathbb{C}[ X _ {i \bullet} , X _ {j \bullet}] & = & \delta _ {ij} \left( \frac{ \Sigma^2}{w _ {i \bullet}} + M ^2 \right) \\ \mathbb{C}[ X _ {i \bullet}, X _ { \bullet \bullet}] & = & \frac{1}{w _ {\bullet \bullet}} \sum \limits^K _ {j = 1 } w _ {j \bullet} \mathbb{C}[ X _ {i \bullet}, X _{j \bullet} ] = \frac{\Sigma^2}{w _ { \bullet \bullet}} + M ^2 \frac{ w _ {i \bullet}}{w _ { \bullet \bullet}} \end{eqnarray*}\] and \[\begin{eqnarray*} \mathbb{V}[ X _ { \bullet \bullet}] & = & \mathbb{C}[ X _ { \bullet \bullet}, X _ { \bullet \bullet} ] = \frac{1}{w _ { \bullet \bullet}} \sum \limits^K _ {i = 1 } w _ {i \bullet} \mathbb{C}[X _ {i \bullet}, X _ { \bullet \bullet}]\\ & = & \frac{ \Sigma^2 }{w _ { \bullet \bullet}} + M ^2 \sum \limits ^K _ {i = 1 } \frac{w ^2_ {i \bullet}}{w ^2 _ { \bullet \bullet}} \end{eqnarray*}\] from which we finally obtain \[\begin{eqnarray*} \mathbb{E} \Big[( X _ {i \bullet} - X _ { \bullet \bullet}) ^2\Big] & = & \frac{\Sigma^2}{w _ { i \bullet}} + M ^2 - 2 \left( \frac{\Sigma^2}{w _ { \bullet \bullet}} + M ^2 \frac{w _ {i \bullet}}{w _ { \bullet \bullet}}\right) + \frac{\Sigma^2}{w _ {\bullet \bullet}} \\ &&+ M ^2 \sum ^K _ {j = 1} \frac{w^2 _ {j \bullet}}{w ^2 _ {\bullet \bullet}}\\ & = & \Sigma^2 \left( \frac{1}{w _ {i \bullet}} - \frac{1}{w _{\bullet \bullet}}\right) + M ^2 \left(1-2 \frac{w _ {i \bullet}}{w _ { \bullet \bullet}} + \sum \limits ^K _ {j = 1 } \frac {w ^2 _ {j \bullet }}{ w ^2 _ { \bullet \bullet}}\right) \end{eqnarray*}\] so that \[ \mathbb{E} \left[ \sum ^K _ {i =1} w _ {i \bullet} ( X _ {i \bullet} - X _ {\bullet \bullet}) ^2 \right] = \Sigma^2(K - 1)+ M ^2 \left( w _ { \bullet \bullet} - \frac{1}{w _ { \bullet \bullet}} \sum ^K _ { i = 1 } w ^2 _ { i \bullet} \right). \]

To estimate $M^2$ without bias, we use \[ \hat M ^2 = \frac{ w _ {\bullet \bullet}}{w ^2 _ { \bullet \bullet} - \sum \limits ^K _ {i = 1 } w ^2 _ {i \bullet}} \left\{ \sum ^K _ {i = 1 } w _ {i \bullet } ( X _ {i \bullet} - X _ { \bullet \bullet} ) ^2 - ( K - 1 ) \hat {\Sigma} ^2 \right\}. \]
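The unbiasedness of $\hat\Sigma^2$ and $\hat M^2$ can be checked by simulation. The following minimal Python sketch generates balanced Bühlmann-Straub portfolios under assumptions of our own (normal $\mu(\Lambda_i)$ with variance $M^2=100$, $\sigma^2(\Lambda_i)$ uniform on $[200,600]$ so that $\Sigma^2=400$, normal conditional distributions, random weights); the function names `bs_estimates` and `simulate` are hypothetical. Averaged over many simulated portfolios, the two estimators should land close to the true values.

```python
import random
import statistics


def bs_estimates(x, w):
    """Return (mu_hat, Sigma2_hat, M2_hat) for a balanced panel.

    x[i][t] is X_it and w[i][t] its weight w_it (K policies, n periods each);
    mu_hat is the weighted grand mean X_.. .
    """
    K, n = len(x), len(x[0])
    w_i = [sum(row) for row in w]                                             # w_i.
    x_i = [sum(a * b for a, b in zip(w[i], x[i])) / w_i[i] for i in range(K)]  # X_i.
    w_tot = sum(w_i)                                                          # w_..
    x_tot = sum(a * b for a, b in zip(w_i, x_i)) / w_tot                     # X_..
    # Within-policy estimator: average of the sigma2_hat_i
    sigma2 = sum(w[i][t] * (x[i][t] - x_i[i]) ** 2
                 for i in range(K) for t in range(n)) / (K * (n - 1))
    # Between-policy estimator of M^2
    between = sum(a * (b - x_tot) ** 2 for a, b in zip(w_i, x_i))
    m2 = w_tot / (w_tot ** 2 - sum(a * a for a in w_i)) * (between - (K - 1) * sigma2)
    return x_tot, sigma2, m2


def simulate(rng, K=50, n=5, mu=100.0, m2=100.0):
    """Hypothetical portfolio with Sigma^2 = E[sigma^2(Lambda)] = 400 and M^2 = m2."""
    x, w = [], []
    for _ in range(K):
        mu_i = rng.gauss(mu, m2 ** 0.5)            # mu(Lambda_i)
        s2_i = rng.uniform(200.0, 600.0)           # sigma^2(Lambda_i)
        w_row = [rng.uniform(0.5, 2.0) for _ in range(n)]
        # Given Lambda_i, X_it has mean mu_i and variance s2_i / w_it
        x.append([rng.gauss(mu_i, (s2_i / wt) ** 0.5) for wt in w_row])
        w.append(w_row)
    return x, w


rng = random.Random(2024)
runs = [bs_estimates(*simulate(rng)) for _ in range(400)]
mean_sigma2 = statistics.fmean(r[1] for r in runs)
mean_m2 = statistics.fmean(r[2] for r in runs)
print(round(mean_sigma2, 1), round(mean_m2, 1))  # should be close to 400 and 100
```

$\hat M^2$ is much noisier than $\hat\Sigma^2$ for a single portfolio (it may even be negative), which is why the check averages over many replications.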

Example 11.6 Consider a portfolio of 5 policies observed over 12 years described in Table ??. The weights of $X _ {it}$ are provided as claim costs. We obtain $\hat {\mu} = 1865.404$, $\hat {\Sigma}^2 = 1.3912 \times 10 ^8$, and $\hat {M} ^2 = 89638.71$.

If we had neglected the weights (i.e., setting $w _ {it} \equiv 1$), we would have obtained the results in Table ??, with $\hat {\mu} = 1671.017$, $\hat {\Sigma} ^2 = 46040.47$, and $\hat {M} ^2 = 72310.02$.

Comparing Tables ?? and ??, we can see that the premiums have changed little, except for policy 4: the sum of weights for this policy is lower, which makes the credibility factor lower as well.

::: {.example}

We are asked to calculate the premiums for the 4th year for two “group” contracts, whose experience (total claims and volumes $w_{it}$) is described in the following table:

| Contract | | Year 1 | Year 2 | Year 3 | Year 4 |
|---|---|---|---|---|---|
| 1 | Claims | 8,000 | 11,000 | 15,000 | |
| 1 | Volume $w_{1t}$ | 40 | 50 | 70 | 75 |
| 2 | Claims | 20,000 | 24,000 | 19,000 | |
| 2 | Volume $w_{2t}$ | 100 | 120 | 115 | 95 |

We have \[\begin{eqnarray*} \overline{x}_{1\bullet}&=&\frac{8000+11000+15000}{160}=212.5\\ \overline{x}_{2\bullet}&=&\frac{20000+24000+19000}{335}=188.06\\ \widehat{\mu}&=&\frac{1}{495}\big(160\times 212.5+335 \times 188.06\big)=195.96\\ \widehat{\Sigma}^2&=&\frac{\begin{array}{l} 40\times(200-212.5)^2+50\times(220-212.5)^2\\ +70\times(214.29-212.5)^2+100\times(200-188.06)^2\\ +120\times(200-188.06)^2+115\times(165.22-188.06)^2 \end{array} }{2\times 2}\\ &=&25160.58\\ \widehat{M}^2&=&\frac{\begin{array}{l}160\times(212.5-195.96)^2+335\times(188.06-195.96)^2\\ -25160.58\end{array}} {495-\frac{1}{495}(160^2+335^2)}\\ &=&182.48. \end{eqnarray*}\] The credibility coefficients are $\widehat{\alpha}_1=0.537$ and $\widehat{\alpha}_2=0.708$.

The Bühlmann-Straub model provides an average claim amount of \[\begin{eqnarray*} &&0.537\times 212.5+0.463\times 195.96\text{ for company 1}\\ &\text{and}&0.708\times 188.06+0.292\times 195.96\text{ for company 2.} \end{eqnarray*}\] The total expected claim amount generated by these companies during year 4 is \[\begin{eqnarray*} &&75\times 204.84=15,363\text{ for company 1}\\ &\text{and}&95\times 190.37=18,085.15\text{ for company 2.} \end{eqnarray*}\]

:::
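The computations of this example can be reproduced with a short Python sketch using the claim totals and volumes that appear in the computations above. The function `buhlmann_straub` is ours, not a library routine; it takes $X_{\bullet\bullet}$ as $\widehat{\mu}$ and assumes the usual Bühlmann-Straub factor $\widehat{\alpha}_i=w_{i\bullet}/(w_{i\bullet}+\widehat{\Sigma}^2/\widehat{M}^2)$, which reproduces the values 0.537 and 0.708. Because it does not round intermediate results, $\widehat{\Sigma}^2$ comes out at about 25,164 rather than 25,160.58, with a negligible effect on the premiums.

```python
def buhlmann_straub(claims, volumes):
    """Empirical Buhlmann-Straub premiums per unit of volume.

    claims[i][t]: total claims of contract i in year t; volumes[i][t]: weight w_it.
    Returns (mu_hat, Sigma2_hat, M2_hat, alphas, premiums).
    """
    K, n = len(claims), len(claims[0])
    x = [[c / v for c, v in zip(cr, vr)] for cr, vr in zip(claims, volumes)]  # X_it
    w_i = [sum(vr) for vr in volumes]                        # w_i.
    x_i = [sum(cr) / wi for cr, wi in zip(claims, w_i)]      # X_i. = total claims / total volume
    w_tot = sum(w_i)                                         # w_..
    mu = sum(wi * xi for wi, xi in zip(w_i, x_i)) / w_tot    # X_.. used as mu_hat
    sigma2 = sum(v * (xt - x_i[i]) ** 2
                 for i in range(K) for v, xt in zip(volumes[i], x[i])) / (K * (n - 1))
    between = sum(wi * (xi - mu) ** 2 for wi, xi in zip(w_i, x_i))
    m2 = (between - (K - 1) * sigma2) / (w_tot - sum(wi ** 2 for wi in w_i) / w_tot)
    alphas = [wi / (wi + sigma2 / m2) for wi in w_i]         # credibility factors
    premiums = [a * xi + (1 - a) * mu for a, xi in zip(alphas, x_i)]
    return mu, sigma2, m2, alphas, premiums


claims = [[8000, 11000, 15000], [20000, 24000, 19000]]
volumes = [[40, 50, 70], [100, 120, 115]]
mu, sigma2, m2, alphas, premiums = buhlmann_straub(claims, volumes)
print(round(mu, 2), round(sigma2, 2), round(m2, 2), [round(a, 3) for a in alphas])
# Expected total claims in year 4, for volumes 75 and 95
print([round(w4 * p, 2) for w4, p in zip((75, 95), premiums)])
```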

11.3.2.5 Frequency-based Linear Credibility: Poisson Model with A Priori Segmentation

If we define $X_{it} = \frac{N_{it}}{\lambda_{it}}$ with weights $w_{it}=\lambda_{it}$, it is easy to verify that the assumptions underlying the Bühlmann-Straub model are satisfied: since, given $\Theta_i$, $N_{it}$ is Poisson distributed with mean $\lambda_{it}\Theta_i$, we have $\mathbb{E}[X_{it}|\Theta_i]=\Theta_i$ and $\mathbb{V}[X_{it}|\Theta_i]=\Theta_i/\lambda_{it}$. Writing $\sigma^2=\mathbb{V}[\Theta_i]$, the expected claim frequency for the next insurance period is therefore \[\begin{equation} \lambda_{i,T_i+1}\frac{1+\sigma^2N_{i\bullet}}{1+\sigma^2\lambda_{i\bullet}}, \tag{11.7} \end{equation}\] where \[ N_{i\bullet}=\sum_{t=1}^{T_i}N_{it}\text{ and }\lambda_{i\bullet}=\sum_{t=1}^{T_i}\lambda_{it}. \] It is worth mentioning that the same expression for the posterior premium is obtained when the random effect $\Theta_i$ is assumed to follow a Gamma distribution.

Remark. Formula (11.7) clearly shows that a priori and a posteriori pricing interact: the a posteriori adjustments depend on the a priori risk level. An insured judged a better driver a priori gets a smaller relative reduction for a claim-free record than a driver judged less favorably a priori. However, even though the insured judged worse a priori enjoys the greater reduction when reporting no claim, the revised frequency of the insured judged better a priori always remains lower.

Penalties for reporting a claim are also heavier for insureds judged better drivers a priori. If this interaction is ignored, insureds who benefited from a priori reductions receive too large a bonus when they report no claim and too small a malus when they report one. Meanwhile, insureds subjected to a priori surcharges are penalized a second time a posteriori, for the same reasons. A uniform credibility system superimposed on a priori segmentation thus undermines the fairness of the tariff.
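Formula (11.7) and the interaction described in the remark can be illustrated with a minimal Python sketch. The function name and the two insureds (a priori frequencies 0.1 and 0.2, each observed for five years, $\sigma^2=1$) are hypothetical.

```python
def posterior_frequency(lam_next, lam_past, n_past, sigma2):
    """A posteriori expected claim frequency, formula (11.7).

    lam_next: a priori frequency lambda_{i,T_i+1} for the coming period
    lam_past: a priori frequencies lambda_{i1}, ..., lambda_{iT_i}
    n_past:   observed claim counts N_{i1}, ..., N_{iT_i}
    sigma2:   variance of the random effect Theta_i
    """
    lam_dot = sum(lam_past)   # lambda_{i.}
    n_dot = sum(n_past)       # N_{i.}
    return lam_next * (1 + sigma2 * n_dot) / (1 + sigma2 * lam_dot)


T, sigma2 = 5, 1.0
for label, lam in (("better a priori", 0.1), ("worse a priori", 0.2)):
    no_claim = posterior_frequency(lam, [lam] * T, [0] * T, sigma2)
    one_claim = posterior_frequency(lam, [lam] * T, [1] + [0] * (T - 1), sigma2)
    # Relative bonus, revised frequency without claims, relative malus after one claim
    print(label, round(no_claim / lam, 3), round(no_claim, 4), round(one_claim / lam, 3))
```

The "better" insured keeps about two thirds of its premium after a claim-free record, against one half for the "worse" insured, yet its revised frequency stays lower; after one claim, it is the only one whose premium increases.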

11.4 Total Credibility

Large companies with relatively low claims may demand premiums based on their own experience; if so, this is referred to as total credibility. In this regard, the actuary must consider whether the size of the company is sufficient to grant total credibility.

Let’s assume that the amount of claims generated by the company is $S=\sum_{k=1}^NX_k$, where $N\sim\mathcal{P}oi(\lambda)$, and where $X_k$ are independent and identically distributed (with a mean of $\mu$ and a variance of $\sigma^2$), and independent of $N$.

Total credibility will be granted to the company only if the probability that $S$ deviates from the pure premium $\mathbb{E}[S]$ by more than a fraction $c$ of it is sufficiently low, i.e., if \[\begin{eqnarray*} &&\Pr\Big[\big|S-\mathbb{E}[S]\big|>c\mathbb{E}[S]\Big]\leq\epsilon\\ &\Leftrightarrow&\Pr\Big[(1-c)\mathbb{E}[S]<S<(1+c)\mathbb{E}[S]\Big]\geq 1-\epsilon. \end{eqnarray*}\] Since $\mathbb{E}[S]=\lambda\mu$ and $\mathbb{V}[S]=\lambda(\sigma^2+\mu^2)$ for this compound Poisson sum, the central limit theorem yields the approximation \[ \frac{c\mathbb{E}[S]}{\sqrt{\mathbb{V}[S]}}=\frac{c\lambda\mu}{\sqrt{\lambda(\sigma^2+\mu^2)}}\approx z_{\epsilon/2}\Rightarrow \lambda_F=\frac{z_{\epsilon/2}^2}{c^2}\left(1+\left(\frac{\sigma}{\mu}\right)^2\right), \] where $z_{\epsilon/2}$ is the quantile of the standard normal distribution leaving probability $\epsilon/2$ in the upper tail; the condition holds as soon as $\lambda\geq\lambda_F$. The value of $\lambda_F$ should be interpreted as a measure of the volume (expected number of claims) that the company must represent in order to be granted total credibility.

In the expression for $\lambda_F$, claim amounts enter only through their coefficient of variation $\sigma/\mu$: the larger this ratio, the larger $\lambda_F$. Taking $c=3\%$ and $\epsilon=5\%$, we have \[ \lambda_F=\left(\frac{1.96}{3\%}\right)^2\left(1+\left(\frac{\sigma}{\mu}\right)^2\right) =4268\left(1+\left(\frac{\sigma}{\mu}\right)^2\right). \] If claim amounts are constant (fixed repair costs), then $\sigma=0$ and $\lambda_F$ reaches its minimum value (4,268 in our case). Since the company must then generate at least 4,268 claims per period on average, very few insureds can qualify for total credibility.
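The standard $\lambda_F$ is easy to tabulate. This sketch uses `statistics.NormalDist` from the Python standard library to obtain $z_{\epsilon/2}$; the function name is ours. With the exact quantile $z_{0.025}\approx 1.95996$ instead of 1.96, the result still rounds to 4,268.

```python
from statistics import NormalDist


def total_credibility_standard(c, eps, cv=0.0):
    """Expected claim count lambda_F required for total credibility.

    c:   tolerated relative deviation of S from E[S] (e.g. 0.03)
    eps: probability allowed outside the band (e.g. 0.05)
    cv:  coefficient of variation sigma/mu of individual claim amounts
    """
    z = NormalDist().inv_cdf(1 - eps / 2)   # upper eps/2 quantile of N(0, 1)
    return (z / c) ** 2 * (1 + cv ** 2)


print(round(total_credibility_standard(0.03, 0.05)))          # 4268 (constant claim amounts)
print(round(total_credibility_standard(0.03, 0.05, cv=1.0)))  # 8537 (e.g. exponential amounts)
```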

11.5 Multivariate Credibility

11.6 Modeling

In this section, based on (Denuit and Lambert 2001), we distinguish between material damages and bodily injury claims caused by policyholders in the portfolio in order to account for their respective costs in a priori and a posteriori pricing. More precisely, let $N_{it}^{\text{mat}}$ be the number of claims consisting exclusively of material damages caused by policyholder $i$ during period $t$, and $N_{it}^{\text{corp}}$ be the number of claims involving bodily injuries caused by the same policyholder; $N_{it}^{\text{tot}}=N_{it}^{\text{mat}}+N_{it}^{\text{corp}}$ represents the total number of claims over the period.

We introduce two random effects reflecting our ignorance of certain essential characteristics of the policyholders, $\Theta_i^{\text{mat}}$ and $\Theta_i^{\text{corp}}$, possibly correlated and such that $\mathbb{E}[\Theta_i^{\text{mat}}]=\mathbb{E}[\Theta_i^{\text{corp}}]=1$. This introduces dependence within the pairs $(N_{it}^{\text{mat}},N_{it}^{\text{corp}})$ as well as between the time series obtained by varying $t$ (with $i$ fixed). More precisely, we assume that conditional on $\Theta_i^{\text{mat}}=\theta_i^{\text{mat}}$, \[ N_{it}^{\text{mat}}\sim\mathcal{P}oi\big(\lambda_{it}^{\text{mat}}\theta_i^{\text{mat}}\big) \] and conditional on $\Theta_i^{\text{corp}}=\theta_i^{\text{corp}}$ \[ N_{it}^{\text{corp}}\sim\mathcal{P}oi\big(\lambda_{it}^{\text{corp}}\theta_i^{\text{corp}}\big). \] The specification of the model is as follows:

- Conditionally on $\Theta_i^{\text{corp}}$, the random variables $N_{it}^{\text{corp}}$, $t=1,2,\ldots,T_i$, are independent;
- Conditionally on $\Theta_i^{\text{mat}}$, the random variables $N_{it}^{\text{mat}}$, $t=1,2,\ldots,T_i$, are independent;
- Conditionally on $(\Theta_i^{\text{mat}},\Theta_i^{\text{corp}})$, the sequences of random variables $\{N_{it}^{\text{corp}},t=1,2,\ldots,T_i\}$ and $\{N_{it}^{\text{mat}},t=1,2,\ldots,T_i\}$ are independent;
- The variance-covariance matrix of the pair $(\Theta_i^{\text{corp}},\Theta_i^{\text{mat}})$ of random effects is \[ \left( \begin{array}{cc} \sigma_{\text{corp}}^2& \sigma_{\text{cm}}\\ \sigma_{\text{cm}}& \sigma_{\text{mat}}^2 \end{array} \right). \]

Note that the elements of the variance-covariance matrix of the random effects must satisfy the conditions \[ \sigma_{\text{corp}}^2\geq 0, \hspace{2mm}\sigma_{\text{mat}}^2\geq 0, \mbox{ and } |\sigma_{\text{cm}}|\leq\sigma_{\text{corp}}\sigma_{\text{mat}}. \] If the estimates of the variances and covariance do not satisfy these conditions, the model is not applicable.

Let $N_{i\bullet}^{\text{corp}}=\sum_{t=1}^{T_i}N_{it}^{\text{corp}}$ and $N_{i\bullet}^{\text{mat}}=\sum_{t=1}^{T_i}N_{it}^{\text{mat}}$ denote the total numbers of bodily injury and material damage claims, respectively, reported by policyholder $i$ since entering the portfolio. Owing to the conditional independence of annual claim counts in each category, conditional on $\Theta_i^{\text{mat}}=\theta_i^{\text{mat}}$, \[ N_{i\bullet}^{\text{mat}}\sim\mathcal{P}oi\big(\lambda_{i\bullet}^{\text{mat}}\theta_i^{\text{mat}}\big)\text{ where }\lambda_{i\bullet}^{\text{mat}}=\mathbb{E}[N_{i\bullet}^{\text{mat}}]=\sum_{t=1}^{T_i}\lambda_{it}^{\text{mat}} \] and conditional on $\Theta_i^{\text{corp}}=\theta_i^{\text{corp}}$, \[ N_{i\bullet}^{\text{corp}}\sim\mathcal{P}oi\big(\lambda_{i\bullet}^{\text{corp}}\theta_i^{\text{corp}}\big)\text{ where }\lambda_{i\bullet}^{\text{corp}}=\mathbb{E}[N_{i\bullet}^{\text{corp}}]=\sum_{t=1}^{T_i}\lambda_{it}^{\text{corp}}. \]

We can then write: \[\begin{eqnarray*} \mathbb{V}[N_{i\bullet}^{\text{mat}}]&=&\mathbb{E}\big[\mathbb{V}[N_{i\bullet}^{\text{mat}}|\Theta_i^{\text{mat}}]\big]+\mathbb{V}\big[\mathbb{E}[N_{i\bullet}^{\text{mat}}|\Theta_i^{\text{mat}}]\big]\\ &=&\lambda_{i\bullet}^{\text{mat}}+\{\lambda_{i\bullet}^{\text{mat}}\}^2\sigma_{\text{mat}}^2; \end{eqnarray*}\]

Similarly, \[ \mathbb{V}[N_{i\bullet}^{\text{corp}}]=\lambda_{i\bullet}^{\text{corp}}+\{\lambda_{i\bullet}^{\text{corp}}\}^2\sigma_{\text{corp}}^2; \] and finally, since the conditional covariance vanishes by conditional independence, \[\begin{eqnarray*} \mathbb{C}[N_{i\bullet}^{\text{mat}},N_{i\bullet}^{\text{corp}}]&=&\mathbb{E}\big[\mathbb{C}[N_{i\bullet}^{\text{mat}},N_{i\bullet}^{\text{corp}}|\Theta_i^{\text{mat}},\Theta_i^{\text{corp}}]\big]\\ & & +\mathbb{C}\big[\mathbb{E}[N_{i\bullet}^{\text{mat}}|\Theta_i^{\text{mat}},\Theta_i^{\text{corp}}],\mathbb{E}[N_{i\bullet}^{\text{corp}}|\Theta_i^{\text{mat}},\Theta_i^{\text{corp}}]\big]\\ &=&\lambda_{i\bullet}^{\text{mat}}\lambda_{i\bullet}^{\text{corp}}\sigma_{\text{cm}}. \end{eqnarray*}\] The above equations naturally suggest the following estimators for the parameters $\sigma_{\text{mat}}^2$, $\sigma_{\text{corp}}^2$, and $\sigma_{\text{cm}}$:

\[\begin{eqnarray*} \widehat{\sigma_{\text{mat}}^2}&=&\frac{\sum_{i=1}^n\Big\{\big(n_{i\bullet}^{\text{mat}}-\widehat{\lambda_{i\bullet}^{\text{mat}}}\big)^2 -n_{i\bullet}^{\text{mat}}\Big\}} {\sum_{i=1}^n\{\widehat{\lambda_{i\bullet}^{\text{mat}}}\}^2}\\ \widehat{\sigma_{\text{corp}}^2}&=&\frac{\sum_{i=1}^n\Big\{\big(n_{i\bullet}^{\text{corp}}-\widehat{\lambda_{i\bullet}^{\text{corp}}}\big)^2 -n_{i\bullet}^{\text{corp}}\Big\}} {\sum_{i=1}^n\{\widehat{\lambda_{i\bullet}^{\text{corp}}}\}^2}\\ \widehat{\sigma_{\text{cm}}}&=&\frac{\sum_{i=1}^n\Big\{n_{i\bullet}^{\text{mat}}-\widehat{\lambda_{i\bullet}^{\text{mat}}}\Big\} \Big\{n_{i\bullet}^{\text{corp}}-\widehat{\lambda_{i\bullet}^{\text{corp}}}\Big\}} {\sum_{i=1}^n\widehat{\lambda_{i\bullet}^{\text{mat}}}\widehat{\lambda_{i\bullet}^{\text{corp}}}} \end{eqnarray*}\]

These estimators can be shown to be consistent in the random-effects model.
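A direct transcription of these estimators, together with the admissibility conditions stated above, might look as follows. The function names and the four-policyholder data set are hypothetical; with so few policyholders the estimates are erratic and, here, violate the condition $|\widehat{\sigma_{\text{cm}}}|\leq(\widehat{\sigma_{\text{mat}}^2}\,\widehat{\sigma_{\text{corp}}^2})^{1/2}$, so the model would not be applicable.

```python
import math


def random_effect_moments(n_mat, lam_mat, n_corp, lam_corp):
    """Moment estimators of sigma_mat^2, sigma_corp^2 and sigma_cm.

    Inputs are per-policyholder totals: observed counts n_{i.} and fitted
    a priori expectations lambda_{i.} for each claim type.
    """
    r_mat = [n - l for n, l in zip(n_mat, lam_mat)]      # n_{i.}^mat - lambda_{i.}^mat
    r_corp = [n - l for n, l in zip(n_corp, lam_corp)]   # n_{i.}^corp - lambda_{i.}^corp
    s2_mat = sum(r * r - n for r, n in zip(r_mat, n_mat)) / sum(l * l for l in lam_mat)
    s2_corp = sum(r * r - n for r, n in zip(r_corp, n_corp)) / sum(l * l for l in lam_corp)
    s_cm = (sum(a * b for a, b in zip(r_mat, r_corp))
            / sum(a * b for a, b in zip(lam_mat, lam_corp)))
    return s2_mat, s2_corp, s_cm


def is_admissible(s2_mat, s2_corp, s_cm):
    """True if the estimates form a valid variance-covariance matrix."""
    return s2_mat >= 0 and s2_corp >= 0 and abs(s_cm) <= math.sqrt(s2_mat * s2_corp)


# Tiny hypothetical portfolio of four policyholders
est = random_effect_moments([0, 3, 1, 4], [1.0] * 4, [0, 1, 0, 3], [0.5] * 4)
print(est, is_admissible(*est))
```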

11.6.1 Linear Credibility Premium

In the linear credibility approach, we seek to determine the best linear predictor \[ c_{i0}^{\text{mat}}+c_i^{\text{mat/mat}}N_{i\bullet}^{\text{mat}}+c_i^{\text{corp/mat}}N_{i\bullet}^{\text{corp}} \] of the unknown premium $\lambda_{i,T_i+1}^{\text{mat}}\Theta_i^{\text{mat}}$ in the least squares sense, as well as the best linear predictor \[ c_{i0}^{\text{corp}}+c_i^{\text{mat/corp}}N_{i\bullet}^{\text{mat}}+c_i^{\text{corp/corp}}N_{i\bullet}^{\text{corp}} \] of $\lambda_{i,T_i+1}^{\text{corp}}\Theta_i^{\text{corp}}$. Therefore, we must simultaneously minimize the mean squared errors $\mathcal{Q}^{\text{mat}}$ and $\mathcal{Q}^{\text{corp}}$ given by \[ \mathcal{Q}^{\text{mat}}=\mathbb{E}\big[\big\{\lambda_{i,T_i+1}^{\text{mat}}\Theta_i^{\text{mat}}-c_{i0}^{\text{mat}}-c_i^{\text{mat/mat}}N_{i\bullet}^{\text{mat}}-c_i^{\text{corp/mat}}N_{i\bullet}^{\text{corp}}\big\}^2\big] \] and \[ \mathcal{Q}^{\text{corp}}=\mathbb{E}\big[\big\{\lambda_{i,T_i+1}^{\text{corp}}\Theta_i^{\text{corp}}-c_{i0}^{\text{corp}}-c_i^{\text{mat/corp}}N_{i\bullet}^{\text{mat}}-c_i^{\text{corp/corp}}N_{i\bullet}^{\text{corp}}\big\}^2\big]. \] The coefficients can be interpreted as follows:

- $c_i^{\text{mat/mat}}$ evaluates the predictive power of material claims on the future occurrence of material claims;
- $c_i^{\text{corp/mat}}$ evaluates the predictive power of bodily injury claims on the future occurrence of material claims;
- $c_i^{\text{mat/corp}}$ evaluates the predictive power of material claims on the future occurrence of bodily injury claims;
- $c_i^{\text{corp/corp}}$ evaluates the predictive power of bodily injury claims on the future occurrence of bodily injury claims.

Setting to zero the first derivatives of the mean squared errors $\mathcal{Q}^{\text{mat}}$ and $\mathcal{Q}^{\text{corp}}$ with respect to the six coefficients, we obtain the following system: \[\begin{eqnarray} c_{i0}^{\text{mat}}&=&\lambda_{i,T_i+1}^{\text{mat}}-c_{i}^{\text{mat/mat}}\lambda_{i\bullet}^{\text{mat}}-c_{i}^{\text{corp/mat}}\lambda_{i\bullet}^{\text{corp}} \tag{11.8}\\ c_{i0}^{\text{corp}}&=&\lambda_{i,T_i+1}^{\text{corp}}-c_{i}^{\text{mat/corp}}\lambda_{i\bullet}^{\text{mat}}-c_{i}^{\text{corp/corp}}\lambda_{i\bullet}^{\text{corp}} \tag{11.9}\\ 0&=&\lambda_{i,T_i+1}^{\text{mat}}\mathbb{E}[\Theta_i^{\text{mat}}N_{i\bullet}^{\text{mat}}]-c_{i0}^{\text{mat}}\lambda_{i\bullet}^{\text{mat}}-c_{i}^{\text{mat/mat}}\mathbb{E}[\{N_{i\bullet}^{\text{mat}}\}^2]\nonumber\\ &&-c_{i}^{\text{corp/mat}}\mathbb{E}[N_{i\bullet}^{\text{mat}}N_{i\bullet}^{\text{corp}}] \tag{11.10}\\ 0&=&\lambda_{i,T_i+1}^{\text{mat}}\mathbb{E}[\Theta_i^{\text{mat}}N_{i\bullet}^{\text{corp}}]-c_{i0}^{\text{mat}}\lambda_{i\bullet}^{\text{corp}}-c_{i}^{\text{mat/mat}}\mathbb{E}[N_{i\bullet}^{\text{mat}}N_{i\bullet}^{\text{corp}}]\nonumber\\ &&-c_{i}^{\text{corp/mat}}\mathbb{E}[\{N_{i\bullet}^{\text{corp}}\}^2] \tag{11.11}\\ 0&=&\lambda_{i,T_i+1}^{\text{corp}}\mathbb{E}[\Theta_i^{\text{corp}}N_{i\bullet}^{\text{mat}}]-c_{i0}^{\text{corp}}\lambda_{i\bullet}^{\text{mat}}-c_{i}^{\text{mat/corp}}\mathbb{E}[\{N_{i\bullet}^{\text{mat}}\}^2]\nonumber\\ &&-c_{i}^{\text{corp/corp}}\mathbb{E}[N_{i\bullet}^{\text{mat}}N_{i\bullet}^{\text{corp}}] \tag{11.12}\\ 0&=&\lambda_{i,T_i+1}^{\text{corp}}\mathbb{E}[\Theta_i^{\text{corp}}N_{i\bullet}^{\text{corp}}]-c_{i0}^{\text{corp}}\lambda_{i\bullet}^{\text{corp}}-c_{i}^{\text{mat/corp}}\mathbb{E}[N_{i\bullet}^{\text{mat}}N_{i\bullet}^{\text{corp}}]\nonumber\\ &&-c_{i}^{\text{corp/corp}}\mathbb{E}[\{N_{i\bullet}^{\text{corp}}\}^2]. \tag{11.13} \end{eqnarray}\] The expectations that appear in this system are as follows: \[\begin{eqnarray*} \mathbb{E}[\Theta_i^{\text{mat}}N_{i\bullet}^{\text{mat}}]&=&\lambda_{i\bullet}^{\text{mat}}\sigma_{\text{mat}}^2+\lambda_{i\bullet}^{\text{mat}}\\ \mathbb{E}[\Theta_i^{\text{corp}}N_{i\bullet}^{\text{corp}}]&=&\lambda_{i\bullet}^{\text{corp}}\sigma_{\text{corp}}^2+\lambda_{i\bullet}^{\text{corp}}\\ \mathbb{E}[\Theta_i^{\text{mat}}N_{i\bullet}^{\text{corp}}]&=&\lambda_{i\bullet}^{\text{corp}}\sigma_{\text{cm}}+\lambda_{i\bullet}^{\text{corp}}\\ \mathbb{E}[\Theta_i^{\text{corp}}N_{i\bullet}^{\text{mat}}]&=&\lambda_{i\bullet}^{\text{mat}}\sigma_{\text{cm}}+\lambda_{i\bullet}^{\text{mat}}\\ \mathbb{E}[\{N_{i\bullet}^{\text{mat}}\}^2]&=&\lambda_{i\bullet}^{\text{mat}}+\{\lambda_{i\bullet}^{\text{mat}}\}^2(\sigma_{\text{mat}}^2+1)\\ \mathbb{E}[\{N_{i\bullet}^{\text{corp}}\}^2]&=&\lambda_{i\bullet}^{\text{corp}}+\{\lambda_{i\bullet}^{\text{corp}}\}^2(\sigma_{\text{corp}}^2+1)\\ \mathbb{E}[N_{i\bullet}^{\text{mat}}N_{i\bullet}^{\text{corp}}]&=&\lambda_{i\bullet}^{\text{mat}}\lambda_{i\bullet}^{\text{corp}}(\sigma_{\text{cm}}+1). \end{eqnarray*}\]

Substituting these expressions, together with (11.8)-(11.9), into (11.10)-(11.13), we can verify that the coefficients $c_{i}^{\text{corp/corp}}$, $c_{i}^{\text{corp/mat}}$, $c_{i}^{\text{mat/corp}}$, and $c_{i}^{\text{mat/mat}}$ are solutions of the following linear system: \[\begin{eqnarray*} \lambda_{i,T_i+1}^{\text{mat}}\sigma_{\text{mat}}^2&=&c_{i}^{\text{mat/mat}}(1+\lambda_{i\bullet}^{\text{mat}}\sigma_{\text{mat}}^2)+c_{i}^{\text{corp/mat}}\lambda_{i\bullet}^{\text{corp}}\sigma_{\text{cm}}\\ \lambda_{i,T_i+1}^{\text{mat}}\sigma_{\text{cm}}&=&c_{i}^{\text{mat/mat}}\lambda_{i\bullet}^{\text{mat}}\sigma_{\text{cm}}+c_{i}^{\text{corp/mat}}(1+\lambda_{i\bullet}^{\text{corp}}\sigma_{\text{corp}}^2)\\ \lambda_{i,T_i+1}^{\text{corp}}\sigma_{\text{cm}}&=&c_{i}^{\text{mat/corp}}(1+\lambda_{i\bullet}^{\text{mat}}\sigma_{\text{mat}}^2)+c_{i}^{\text{corp/corp}}\lambda_{i\bullet}^{\text{corp}}\sigma_{\text{cm}}\\ \lambda_{i,T_i+1}^{\text{corp}}\sigma_{\text{corp}}^2&=&c_{i}^{\text{mat/corp}}\lambda_{i\bullet}^{\text{mat}}\sigma_{\text{cm}}+c_{i}^{\text{corp/corp}}(1+\lambda_{i\bullet}^{\text{corp}}\sigma_{\text{corp}}^2). \end{eqnarray*}\]

This finally yields:

\[\begin{eqnarray*} c_{i}^{\text{corp/mat}}&=&\frac{\lambda_{i,T_i+1}^{\text{mat}}\sigma_{\text{cm}}}{(1+\lambda_{i\bullet}^{\text{mat}}\sigma_{\text{mat}}^2)(1+\lambda_{i\bullet}^{\text{corp}}\sigma_{\text{corp}}^2)-\lambda_{i\bullet}^{\text{mat}}\lambda_{i\bullet}^{\text{corp}}\sigma_{\text{cm}}^2}\\ c_{i}^{\text{mat/corp}}&=&\frac{\lambda_{i,T_i+1}^{\text{corp}}\sigma_{\text{cm}}}{(1+\lambda_{i\bullet}^{\text{mat}}\sigma_{\text{mat}}^2)(1+\lambda_{i\bullet}^{\text{corp}}\sigma_{\text{corp}}^2)-\lambda_{i\bullet}^{\text{mat}}\lambda_{i\bullet}^{\text{corp}}\sigma_{\text{cm}}^2}\\ c_{i}^{\text{mat/mat}}&=&\lambda_{i,T_i+1}^{\text{mat}}\frac{\sigma_{\text{mat}}^2+\lambda_{i\bullet}^{\text{corp}}(\sigma_{\text{mat}}^2\sigma_{\text{corp}}^2-\sigma_{\text{cm}}^2)}{(1+\lambda_{i\bullet}^{\text{mat}}\sigma_{\text{mat}}^2)(1+\lambda_{i\bullet}^{\text{corp}}\sigma_{\text{corp}}^2)-\lambda_{i\bullet}^{\text{mat}}\lambda_{i\bullet}^{\text{corp}}\sigma_{\text{cm}}^2}\\ c_{i}^{\text{corp/corp}}&=&\lambda_{i,T_i+1}^{\text{corp}}\frac{\sigma_{\text{corp}}^2+\lambda_{i\bullet}^{\text{mat}}(\sigma_{\text{mat}}^2\sigma_{\text{corp}}^2-\sigma_{\text{cm}}^2)}{(1+\lambda_{i\bullet}^{\text{mat}}\sigma_{\text{mat}}^2)(1+\lambda_{i\bullet}^{\text{corp}}\sigma_{\text{corp}}^2)-\lambda_{i\bullet}^{\text{mat}}\lambda_{i\bullet}^{\text{corp}}\sigma_{\text{cm}}^2}. \end{eqnarray*}\]

It is easy to see that, for $\sigma_{\text{cm}}>0$, $c_{i}^{\text{corp/mat}}$ and $c_{i}^{\text{mat/corp}}$ increase with $\sigma_{\text{cm}}$ and decrease with $\sigma_{\text{mat}}^2$ and $\sigma_{\text{corp}}^2$.

Substituting these solutions into equations (11.8)-(11.9) of the original system gives $c_{i0}^{\text{mat}}$ and $c_{i0}^{\text{corp}}$:

\[\begin{eqnarray*} c_{i0}^{\text{mat}}&=&\lambda_{i,T_i+1}^{\text{mat}}\frac{1+\lambda_{i\bullet}^{\text{corp}}\sigma_{\text{corp}}^2-\lambda_{i\bullet}^{\text{corp}}\sigma_{\text{cm}}}{(1+\lambda_{i\bullet}^{\text{mat}}\sigma_{\text{mat}}^2)(1+\lambda_{i\bullet}^{\text{corp}}\sigma_{\text{corp}}^2)-\lambda_{i\bullet}^{\text{mat}}\lambda_{i\bullet}^{\text{corp}}\sigma_{\text{cm}}^2}\\ c_{i0}^{\text{corp}}&=&\lambda_{i,T_i+1}^{\text{corp}}\frac{1+\lambda_{i\bullet}^{\text{mat}}\sigma_{\text{mat}}^2-\lambda_{i\bullet}^{\text{mat}}\sigma_{\text{cm}}}{(1+\lambda_{i\bullet}^{\text{mat}}\sigma_{\text{mat}}^2)(1+\lambda_{i\bullet}^{\text{corp}}\sigma_{\text{corp}}^2)-\lambda_{i\bullet}^{\text{mat}}\lambda_{i\bullet}^{\text{corp}}\sigma_{\text{cm}}^2}. \end{eqnarray*}\]
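The closed-form coefficients can be coded directly. In the following sketch, the function names and parameter values are hypothetical. The checks confirm two properties that follow from the formulas: each predictor returns its a priori premium when the observed counts equal their a priori expectations, and, when $\sigma_{\text{cm}}=0$, the cross coefficient vanishes and the material-damage predictor reduces to the univariate formula (11.7).

```python
def bivariate_credibility(lam_next_mat, lam_next_corp, lam_mat, lam_corp,
                          s2_mat, s2_corp, s_cm):
    """Coefficients of the two linear credibility predictors (aggregated data).

    lam_mat, lam_corp: lambda_{i.}^mat and lambda_{i.}^corp (a priori totals)
    Returns two dicts: key "mat" multiplies N_{i.}^mat, key "corp" multiplies N_{i.}^corp.
    """
    d = (1 + lam_mat * s2_mat) * (1 + lam_corp * s2_corp) - lam_mat * lam_corp * s_cm ** 2
    mat = {
        "c0": lam_next_mat * (1 + lam_corp * s2_corp - lam_corp * s_cm) / d,
        "mat": lam_next_mat * (s2_mat + lam_corp * (s2_mat * s2_corp - s_cm ** 2)) / d,
        "corp": lam_next_mat * s_cm / d,          # c_i^{corp/mat}
    }
    corp = {
        "c0": lam_next_corp * (1 + lam_mat * s2_mat - lam_mat * s_cm) / d,
        "mat": lam_next_corp * s_cm / d,          # c_i^{mat/corp}
        "corp": lam_next_corp * (s2_corp + lam_mat * (s2_mat * s2_corp - s_cm ** 2)) / d,
    }
    return mat, corp


def predict(coef, n_mat, n_corp):
    """Credibility premium c0 + coef_mat * N^mat + coef_corp * N^corp."""
    return coef["c0"] + coef["mat"] * n_mat + coef["corp"] * n_corp


mat, corp = bivariate_credibility(0.1, 0.02, 0.5, 0.1, 0.5, 0.8, 0.3)
print(predict(mat, 0, 0), predict(mat, 0, 1), predict(corp, 1, 0))
```

With $\sigma_{\text{cm}}>0$, a past bodily injury claim raises the predicted material-damage frequency, and conversely.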

11.6.2 An Approach on Disaggregated Data

The model described in the previous section works with aggregated data for each policyholder (i.e., with $N_{i\bullet}^{\text{mat}}$ and $N_{i\bullet}^{\text{corp}}$ rather than the annual counts $N_{it}^{\text{mat}}$ and $N_{it}^{\text{corp}}$, $t=1,2,\ldots,T_i$). This choice is not without consequences for the estimates of the variances $\sigma_{\text{mat}}^2$ and $\sigma_{\text{corp}}^2$ and of the covariance $\sigma_{\text{cm}}$, as we will see below.

Let’s start by justifying working with aggregated data for the calculation of linear premium predictors. To do this, let’s determine the best linear predictor $c_{i0}^{\text{mat}}+\sum_{t=1}^{T_i}c_{it}^{\text{mat/mat}}N_{it}^{\text{mat}}+ \sum_{t=1}^{T_i}c_{it}^{\text{corp/mat}}N_{it}^{\text{corp}}$ of the unknown premium $\lambda_{i,T_i+1}^{\text{mat}}\Theta_i^{\text{mat}}$ in the least squares sense, as well as the best linear predictor $c_{i0}^{\text{corp}}+\sum_{t=1}^{T_i}c_{it}^{\text{mat/corp}}N_{it}^{\text{mat}}+ \sum_{t=1}^{T_i}c_{it}^{\text{corp/corp}}N_{it}^{\text{corp}}$ of $\lambda_{i,T_i+1}^{\text{corp}}\Theta_i^{\text{corp}}$. Therefore, we must simultaneously minimize \[\begin{eqnarray*} \widetilde{\mathcal{Q}}^{\text{mat}}&=&\mathbb{E}\Big[\Big\{\lambda_{i,T_i+1}^{\text{mat}}\Theta_i^{\text{mat}}-c_{i0}^{\text{mat}}- \sum_{t=1}^{T_i}c_{it}^{\text{mat/mat}}N_{it}^{\text{mat}}\\ &&-\sum_{t=1}^{T_i}c_{it}^{\text{corp/mat}}N_{it}^{\text{corp}}\Big\}^2\Big] \end{eqnarray*}\] and \[\begin{eqnarray*} \widetilde{\mathcal{Q}}^{\text{corp}}&=&\mathbb{E}\Big[\Big\{\lambda_{i,T_i+1}^{\text{corp}}\Theta_i^{\text{corp}}-c_{i0}^{\text{corp}}- \sum_{t=1}^{T_i}c_{it}^{\text{mat/corp}}N_{it}^{\text{mat}}\\ &&-\sum_{t=1}^{T_i}c_{it}^{\text{corp/corp}}N_{it}^{\text{corp}}\Big\}^2\Big]. \end{eqnarray*}\]

Following reasoning similar to the above, we obtain, by differentiating $\widetilde{\mathcal{Q}}^{\text{mat}}$ with respect to $c_{i0}^{\text{mat}}$ and $c_{is}^{\text{mat/mat}}$ and setting these derivatives to zero: \[\begin{eqnarray*} c_{i0}^{\text{mat}}&=&\lambda_{i,T_i+1}^{\text{mat}}-\sum_{t=1}^{T_i}c_{it}^{\text{mat/mat}}\lambda_{it}^{\text{mat}}-\sum_{t=1}^{T_i}c_{it}^{\text{corp/mat}}\lambda_{it}^{\text{corp}}\\ c_{is}^{\text{mat/mat}}&=&\lambda_{i,T_i+1}^{\text{mat}}\mathbb{V}[\Theta_i^{\text{mat}}]-\sigma_{\text{mat}}^2\sum_{t=1}^{T_i}c_{it}^{\text{mat/mat}}\lambda_{it}^{\text{mat}}\\ &&-\sigma_{\text{cm}}\sum_{t=1}^{T_i}c_{it}^{\text{corp/mat}}\lambda_{it}^{\text{corp}}. \end{eqnarray*}\] Since the right-hand side of this last relation does not involve $s$, $c_{is}^{\text{mat/mat}}$ does not depend on $s$. Similar reasoning shows that $c_{is}^{\text{mat/corp}}$, $c_{is}^{\text{corp/mat}}$, and $c_{is}^{\text{corp/corp}}$ do not depend on $s$ either, so that the predictors depend on the annual counts only through $N_{i\bullet}^{\text{mat}}$ and $N_{i\bullet}^{\text{corp}}$. This justifies the aggregated approach chosen earlier.
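The claim that the annual coefficients do not depend on $s$ can be checked numerically by solving the full set of normal equations of $\widetilde{\mathcal{Q}}^{\text{mat}}$ after eliminating $c_{i0}^{\text{mat}}$. For $s=1,\ldots,T_i$, these read $c_{is}^{\text{mat/mat}}+\sigma_{\text{mat}}^2\sum_t c_{it}^{\text{mat/mat}}\lambda_{it}^{\text{mat}}+\sigma_{\text{cm}}\sum_t c_{it}^{\text{corp/mat}}\lambda_{it}^{\text{corp}}=\lambda_{i,T_i+1}^{\text{mat}}\sigma_{\text{mat}}^2$ and $c_{is}^{\text{corp/mat}}+\sigma_{\text{cm}}\sum_t c_{it}^{\text{mat/mat}}\lambda_{it}^{\text{mat}}+\sigma_{\text{corp}}^2\sum_t c_{it}^{\text{corp/mat}}\lambda_{it}^{\text{corp}}=\lambda_{i,T_i+1}^{\text{mat}}\sigma_{\text{cm}}$. The sketch below solves them by plain Gaussian elimination; the function names, the three-year frequencies and the variance parameters are hypothetical.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def annual_coefficients(lam_next, lam_mat, lam_corp, s2_mat, s2_corp, s_cm):
    """Solve for c_{it}^{mat/mat} and c_{it}^{corp/mat}, t = 1..T (disaggregated data)."""
    T = len(lam_mat)
    a = [[0.0] * (2 * T) for _ in range(2 * T)]
    b = [0.0] * (2 * T)
    for s in range(T):
        a[s][s] += 1.0           # row: derivative w.r.t. c_{is}^{mat/mat}
        a[T + s][T + s] += 1.0   # row: derivative w.r.t. c_{is}^{corp/mat}
        for t in range(T):
            a[s][t] += s2_mat * lam_mat[t]
            a[s][T + t] += s_cm * lam_corp[t]
            a[T + s][t] += s_cm * lam_mat[t]
            a[T + s][T + t] += s2_corp * lam_corp[t]
        b[s] = lam_next * s2_mat
        b[T + s] = lam_next * s_cm
    x = solve(a, b)
    return x[:T], x[T:]


c, d = annual_coefficients(0.1, [0.08, 0.10, 0.12], [0.02, 0.02, 0.03], 0.5, 0.8, 0.3)
print([round(v, 6) for v in c], [round(v, 6) for v in d])  # constant across years
```

The common values coincide with the aggregated coefficients $c_i^{\text{mat/mat}}$ and $c_i^{\text{corp/mat}}$ computed from $\lambda_{i\bullet}^{\text{mat}}$ and $\lambda_{i\bullet}^{\text{corp}}$.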

However, estimating the parameters $\sigma_{\text{mat}}^2$, $\sigma_{\text{corp}}^2$, and $\sigma_{\text{cm}}$ from the annual (disaggregated) data yields estimators, denoted $\widetilde{\sigma_{\text{mat}}^2}$, $\widetilde{\sigma_{\text{corp}}^2}$, and $\widetilde{\sigma_{\text{cm}}}$, that differ from the estimators $\widehat{\sigma_{\text{mat}}^2}$, $\widehat{\sigma_{\text{corp}}^2}$, and $\widehat{\sigma_{\text{cm}}}$ obtained earlier. In practice, we prefer $\widehat{\sigma_{\text{mat}}^2}$, $\widehat{\sigma_{\text{corp}}^2}$, and $\widehat{\sigma_{\text{cm}}}$, because their variance is lower. Nevertheless, comparing the two sets of estimates allows us to assess the adequacy of the model: since all these estimators are consistent, markedly different values obtained from individual and aggregated data suggest that the model may be misspecified.

11.7 Hierarchical Credibility

11.7.1 Motivation: Fleet Insurance

Fleet insurance is most often held by legal entities. In this case, portfolios of policies covering such fleets have multiple levels: the fleets and the vehicles within them. The size of the fleets is a determining variable in actuarial analysis.

Credibility models developed for fleets are based on hierarchical random effects, which allow separating an effect specific to the fleet and an effect specific to each vehicle within it.

The index $f=1,\ldots,F$ will now represent the fleet, and the vehicles within each fleet will be indexed by $i=1,\ldots,m_f$, where $m_f$ represents the size of fleet $f$. Let $N_{fi}$ be the number of claims related to vehicle $i$ in fleet $f$. Conditionally on a random effect $\Theta_{fi}=\theta$, we assume $N_{fi}$ follows a Poisson distribution $\mathcal{P}oi(\lambda_{fi}\theta)$. The random effect $\Theta_{fi}$ captures the residual effect of all hidden variables on the number of claims.

The expected number of claims for vehicle $i$ in fleet $f$, denoted by $\lambda_{fi}=\mathbb{E}[N_{fi}]$, depends on observable characteristics related to either the fleet or the specific vehicle. More precisely, we consider $\lambda_{fi}$ to be of the form \[ \lambda_{fi}=d_{fi}\exp(\boldsymbol{\alpha}^t\boldsymbol{x}_f+\boldsymbol{\beta}^t\boldsymbol{z}_{fi}) \] where

- $d_{fi}$ is the exposure to risk of vehicle $i$ in fleet $f$;
- $\boldsymbol{x}_f$ contains information related to fleet $f$, such as the industry sector, number of employees, etc.;
- $\boldsymbol{z}_{fi}$ contains information specific to vehicle $i$ within fleet $f$, such as power, fuel type, assignment, mileage, etc.

The specification of $\lambda_{fi}$ intentionally resembles the Poisson regression presented in the previous chapter. However, in this case, we need to consider the stratified nature of the portfolio when estimating the regression parameters $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$.

A convenient way to estimate these parameters is as follows. First, we aggregate the claims at the fleet level by calculating $N_{f\bullet}=\sum_{i=1}^{m_f}N_{fi}$, with an exposure to risk of $d_{f\bullet}=\sum_{i=1}^{m_f}d_{fi}$. We then fit a Poisson regression on $\boldsymbol{x}_f$ to obtain $\widehat{\boldsymbol{\alpha}}$. Next, we form the offset $\widehat{\boldsymbol{\alpha}}^t\boldsymbol{x}_f+\ln d_{fi}$ and regress $N_{fi}$ on $\boldsymbol{z}_{fi}$ to obtain $\widehat{\boldsymbol{\beta}}$. In this second estimation phase, we account for the correlation among vehicles within the same fleet using the GEE technique discussed earlier (the REPEATED statement of the SAS GENMOD procedure). These estimates can then be improved by using them as starting values for the maximization of the joint likelihood function.

The random effects $\Theta_{fi}$ are split into a fleet-specific factor, denoted as $R_f$, and a factor specific to vehicle $i$ within fleet $f$, denoted as $S_{fi}$. We further assume that $\{R_f,\hspace{2mm}f=1,\ldots,F\}$ and $\{S_{fi},\hspace{2mm}f=1,\ldots,F,\hspace{2mm}i=1,\ldots,m_f\}$ are mutually independent and consist of independent and identically distributed random variables. The vehicle effect $S_{fi}$ reflects, among other things, the behavior of the driver(s) of that vehicle. The fleet effect $R_f$ reflects the attention paid by the leaders of the company owning the fleet to safety rules, vehicle maintenance, driver schedules, etc., as well as the financial health of the company (which can influence accident prevention and therefore the fleet’s risk level).

In a semi-parametric approach, the actuary specifies only the first two moments of the random effects (their distribution is left unspecified). Therefore, we assume that $\mathbb{E}[R_f]=\mathbb{E}[S_{fi}]=1$ and denote $\mathbb{V}[R_f]=\sigma_R^2$ and $\mathbb{V}[S_{fi}]=\sigma_S^2$. Then, we have \[\begin{eqnarray*} \mathbb{E}[\Theta_{fi}]&=&\mathbb{E}[R_f]\mathbb{E}[S_{fi}]=1\\ \mathbb{V}[\Theta_{fi}]&=&\sigma_\Theta^2=\mathbb{E}[R_f^2]\mathbb{E}[S_{fi}^2]-1= \sigma_R^2+\sigma_S^2+\sigma_R^2\sigma_S^2. \end{eqnarray*}\] Moreover, \[\begin{eqnarray*} \mathbb{V}[N_{fi}]&=&\lambda_{fi}+\lambda_{fi}^2\sigma_\Theta^2\\ \mathbb{C}[N_{fi},N_{fi'}]&=&\mathbb{E}\Big[\mathbb{C}[N_{fi},N_{fi'}|R_f]\Big]\\ & & + \mathbb{C}\Big[\mathbb{E}[N_{fi}|R_f],\mathbb{E}[N_{fi'}|R_f]\Big]\\ &=&\lambda_{fi}\lambda_{fi'}\sigma_R^2\text{ for }i\neq i'. \end{eqnarray*}\]

After obtaining $\widehat{\lambda}_{fi}$, we estimate the other parameters using \[\begin{eqnarray*} \widehat{\sigma}_R^2&=&\frac{\sum_{f=1}^F\sum_{i\neq i'}(n_{fi}-\widehat{\lambda}_{fi}) (n_{fi'}-\widehat{\lambda}_{fi'})}{\sum_{f=1}^F\sum_{i\neq i'}\widehat{\lambda}_{fi}\widehat{\lambda}_{fi'}}\\ \widehat{\sigma}_\Theta^2&=&\frac{\sum_{f=1}^F\sum_{i=1}^{m_f}\Big\{(n_{fi}-\widehat{\lambda}_{fi})^2- n_{fi}\Big\}}{\sum_{f=1}^F\sum_{i=1}^{m_f}\widehat{\lambda}_{fi}^2}. \end{eqnarray*}\] From the relation $\sigma_\Theta^2=\sigma_R^2+\sigma_S^2+\sigma_R^2\sigma_S^2$, it is then easy to derive an estimator for ${\sigma}_S^2$, namely \[ \widehat{\sigma}_S^2=\frac{\widehat{\sigma}_\Theta^2-\widehat{\sigma}_R^2}{1+\widehat{\sigma}_R^2}. \]

The value of $\widehat{\sigma}_R^2$ reflects the degree of correlation between past claims of different vehicles belonging to the same fleet. If $\widehat{\sigma}_R^2>0$, this dependence is positive, and the history of one vehicle reveals information about the other vehicles in the same fleet. Note that the numerator of the ratio defining $\widehat{\sigma}_R^2$ can also be written as \[\begin{eqnarray*} & & \sum_{f=1}^F\sum_{i\neq i'}(n_{fi}-\widehat{\lambda}_{fi}) (n_{fi'}-\widehat{\lambda}_{fi'})\\ & = & \sum_{f=1}^F\sum_{i=1}^{m_f}\sum_{i'=1}^{m_f}(n_{fi}-\widehat{\lambda}_{fi}) (n_{fi'}-\widehat{\lambda}_{fi'})-\sum_{f=1}^F\sum_{i=1}^{m_f}(n_{fi}-\widehat{\lambda}_{fi})^2\\ & = & \sum_{f=1}^F(n_{f\bullet}-\widehat{\lambda}_{f\bullet})^2 -\sum_{f=1}^F\sum_{i=1}^{m_f}(n_{fi}-\widehat{\lambda}_{fi})^2, \end{eqnarray*}\] where \[ n_{f\bullet}=\sum_{i=1}^{m_f}n_{fi}\text{ and }\widehat{\lambda}_{f\bullet}=\sum_{i=1}^{m_f}\widehat{\lambda}_{fi}. \]

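As explained earlier, we may obtain $\widehat{\sigma}_S^2<0$ in some cases, indicating that the hierarchical credibility model cannot be applied to the analyzed data. Let us now interpret the condition $\widehat{\sigma}_S^2>0$ which, provided $1+\widehat{\sigma}_R^2>0$, is equivalent to $\widehat{\sigma}_\Theta^2> \widehat{\sigma}_R^2$. Using the fleet-total form of the numerator of $\widehat{\sigma}_R^2$, together with $\sum_{f=1}^F n_{f\bullet}=\sum_{f=1}^F\sum_{i=1}^{m_f}n_{fi}$ and $\sum_{i\neq i'}\widehat{\lambda}_{fi}\widehat{\lambda}_{fi'}=\widehat{\lambda}_{f\bullet}^2-\sum_{i=1}^{m_f}\widehat{\lambda}_{fi}^2$, this condition reads \[ \frac{\sum_{f=1}^F\sum_{i=1}^{m_f}\Big\{(n_{fi}-\widehat{\lambda}_{fi})^2- n_{fi}\Big\}}{\sum_{f=1}^F\sum_{i=1}^{m_f}\widehat{\lambda}_{fi}^2} \] \[ > \frac{\sum_{f=1}^F\Big\{(n_{f\bullet}-\widehat{\lambda}_{f\bullet})^2 -n_{f\bullet}\Big\}-\sum_{f=1}^F\sum_{i=1}^{m_f}\Big\{(n_{fi}-\widehat{\lambda}_{fi})^2- n_{fi}\Big\}}{\sum_{f=1}^F\widehat{\lambda}_{f\bullet}^2 -\sum_{f=1}^F\sum_{i=1}^{m_f}\widehat{\lambda}_{fi}^2}. \] Writing the left-hand side as $A/B$ and setting $C=\sum_{f=1}^F\{(n_{f\bullet}-\widehat{\lambda}_{f\bullet})^2-n_{f\bullet}\}$ and $D=\sum_{f=1}^F\widehat{\lambda}_{f\bullet}^2$, the right-hand side is $(C-A)/(D-B)$. When both denominators are positive, cross-multiplying gives $AD-AB>BC-AB$, that is $AD>BC$, so this inequality holds if, and only if, \[ \frac{\sum_{f=1}^F\sum_{i=1}^{m_f}\Big\{(n_{fi}-\widehat{\lambda}_{fi})^2- n_{fi}\Big\}}{\sum_{f=1}^F\sum_{i=1}^{m_f}\widehat{\lambda}_{fi}^2}> \frac{\sum_{f=1}^F\Big\{(n_{f\bullet}-\widehat{\lambda}_{f\bullet})^2 -n_{f\bullet}\Big\}}{\sum_{f=1}^F\widehat{\lambda}_{f\bullet}^2}. \] Therefore, $\widehat{\sigma}_S^2>0$ if, and only if, the relative overdispersion calculated at the vehicle level is higher than the same overdispersion calculated at the fleet level.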
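The two ratios compared in the final inequality can be computed directly. The Python sketch below (hypothetical function name, invented toy data) returns the vehicle-level and fleet-level relative overdispersion; under the assumptions used in the argument (positive denominators and $1+\widehat{\sigma}_R^2>0$), $\widehat{\sigma}_S^2>0$ exactly when the first exceeds the second.

```python
def relative_overdispersion(fleets):
    """Return (vehicle-level, fleet-level) relative overdispersion.

    Vehicle level: sum_f sum_i {(n_fi - lam_fi)^2 - n_fi} / sum_f sum_i lam_fi^2
    Fleet level:   sum_f {(n_f. - lam_f.)^2 - n_f.}      / sum_f lam_f.^2
    """
    num_v = den_v = num_f = den_f = 0.0
    for fleet in fleets:
        n_tot = sum(n for n, _ in fleet)        # n_f.
        lam_tot = sum(lam for _, lam in fleet)  # lambda_hat_f.
        num_f += (n_tot - lam_tot) ** 2 - n_tot
        den_f += lam_tot ** 2
        for n, lam in fleet:
            num_v += (n - lam) ** 2 - n
            den_v += lam ** 2
    return num_v / den_v, num_f / den_f


# Invented toy data: lists of (n_fi, lambda_hat_fi) pairs per fleet
toy_fleets = [[(0, 0.2), (1, 0.3)], [(2, 0.5), (0, 0.4), (1, 0.1)]]
vehicle_level, fleet_level = relative_overdispersion(toy_fleets)
# Under the stated assumptions, sigma_S^2 estimate > 0 iff vehicle_level > fleet_level
print(vehicle_level, fleet_level, vehicle_level > fleet_level)
```

For these toy fleets the vehicle-level ratio is below the fleet-level one, so the sketch signals a negative $\widehat{\sigma}_S^2$.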

11.7.2 Linear Credibility

We are now able to calculate a linear credibility premium for each vehicle in a given fleet, say $f_0$, based on the claims history of the fleet (with credibility coefficients specific to the considered vehicle, say $i_0$). Since credibility coefficients are calculated separately for each fleet, we omit the fleet index $f$ for simplicity.

Suppose that the fleet has been observed over the course of one year, and let us calculate frequency revisions for the following year. The fleet is assumed to include $m$ vehicles during the first year. Assuming that the credibility premium for vehicle $i_0$ depends only on the total number $\sum_{i=1}^mN_i$ of claims reported by the entire fleet, we seek the coefficients $a_{i_0}$ and $b_{i_0}$ that minimize \[ \mathcal{Q}=\mathbb{E}\left[\left(\Theta_{i_0}-a-b\sum_{i=1}^mN_i\right)^2\right]. \] Setting $\partial\mathcal{Q}/\partial a$ to zero, we obtain $1=a_{i_0}+b_{i_0}\sum_{i=1}^m\mathbb{E}[N_i]$, so the credibility premium can be expressed in terms of the credibility coefficient $\text{cred}_{i_0}=b_{i_0}\sum_{i=1}^m\lambda_i$ as \[ 1+b_{i_0}\sum_{i=1}^m(N_i-\lambda_i)= 1-\text{cred}_{i_0}+\text{cred}_{i_0}\frac{\sum_{i=1}^mN_i}{\sum_{i=1}^m\lambda_i}. \] Setting $\partial\mathcal{Q}/\partial b$ to zero then leads to \[ \text{cred}_{i_0}=b_{i_0}\sum_{i=1}^m\lambda_i=\frac{\mathbb{C}\left[\Theta_{i_0},\sum_{i=1}^mN_i\right]} {\mathbb{V}\left[\sum_{i=1}^mN_i\right]}\sum_{i=1}^m\lambda_i. \] The elements involved in the calculations are estimated by \[\begin{eqnarray*} \widehat{\mathbb{C}[\Theta_{i_0},N_i]} &=& \widehat{\lambda}_i\widehat{\mathbb{C}[\Theta_{i_0},\Theta_i]} = \left\{ \begin{array}{l} \widehat{\lambda}_i\widehat{\sigma}_R^2\text{ if }i_0\neq i\\ \widehat{\lambda}_{i_0}\widehat{\sigma}_\Theta^2\text{ if }i_0=i \end{array} \right. \\ \widehat{\mathbb{V}[N_i]} &=& \widehat{\lambda}_i + \widehat{\lambda}_i^2\widehat{\sigma}_\Theta^2\\ \widehat{\mathbb{C}[N_i,N_{i'}]} &=& \widehat{\lambda}_i\widehat{\lambda}_{i'}\widehat{\sigma}_R^2\text{ for }i\neq i'. \end{eqnarray*}\]
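The general formula for $\text{cred}_{i_0}$ can be evaluated by assembling the estimated covariances and variances listed above. The Python sketch below does this by brute force over all pairs of vehicles; the function names, the convention `i0=None` for a vehicle that was not observed, and the numerical inputs are illustrative assumptions, and the estimated variances are taken as given.

```python
def credibility_coefficient(lam, s2_r, s2_theta, i0=None):
    """cred_{i0} = C[Theta_{i0}, sum_i N_i] / V[sum_i N_i] * sum_i lam_i.

    lam : fitted frequencies lambda_hat_i of the m vehicles of the fleet
    i0  : position of the vehicle in `lam`, or None if it was not observed
    """
    m = len(lam)
    # C[Theta_{i0}, N_i] = lam_i * sigma_Theta^2 if i == i0, lam_i * sigma_R^2 otherwise
    cov = sum(lam[i] * (s2_theta if i == i0 else s2_r) for i in range(m))
    # V[sum N_i] = sum_i V[N_i] + sum_{i != i'} C[N_i, N_i']
    var = 0.0
    for i in range(m):
        for j in range(m):
            if i == j:
                var += lam[i] + lam[i] ** 2 * s2_theta
            else:
                var += lam[i] * lam[j] * s2_r
    return cov / var * sum(lam)


def revised_frequency_factor(cred, claims, lam):
    """1 - cred + cred * (sum of observed claims) / (sum of fitted frequencies)."""
    return 1.0 - cred + cred * sum(claims) / sum(lam)


lam = [0.1] * 10  # invented fleet: 10 vehicles, lambda_hat = 0.1 each
cred = credibility_coefficient(lam, s2_r=0.2, s2_theta=0.5, i0=0)
print(cred, revised_frequency_factor(cred, [1, 1] + [0] * 8, lam))
```

The second function applies the coefficient to the observed fleet claim count, producing the estimate $1-\text{cred}_{i_0}+\text{cred}_{i_0}\sum_iN_i/\sum_i\widehat{\lambda}_i$ of $\Theta_{i_0}$, i.e. the multiplicative correction to the a priori frequency of vehicle $i_0$.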

11.7.3 The Case of New Vehicles

If the vehicle was not observed during the first year, then \[\begin{eqnarray*} \text{cred}_{i_0}=\alpha &=& \frac{\widehat{\sigma}_R^2\sum_{i=1}^m\widehat{\lambda}_i} {\sum_{i=1}^m\widehat{\lambda}_i+\sum_{i=1}^m\widehat{\lambda}_i^2\widehat{\sigma}_\Theta^2 +\sum_{i\neq i'}\widehat{\lambda}_i\widehat{\lambda}_{i'}\widehat{\sigma}_R^2} \sum_{i=1}^m\widehat{\lambda}_i\\ &=& \frac{\widehat{\sigma}_R^2\sum_{i=1}^m\widehat{\lambda}_i} {1+\widehat{\sigma}_R^2\sum_{i=1}^m\widehat{\lambda}_i +(\widehat{\sigma}_\Theta^2-\widehat{\sigma}_R^2)\frac{\sum_{i=1}^m\widehat{\lambda}_i^2}{\sum_{i=1}^m \widehat{\lambda}_i}}. \end{eqnarray*}\] All else being equal, $\text{cred}_{i_0}$ increases with the estimated variance $\widehat{\sigma}_R^2$ of the fleet effect and with the expected number of claims $\sum_{i=1}^m\widehat{\lambda}_i$ of the entire fleet.
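The closed form for $\alpha$ makes the stated monotonicity easy to check numerically. The Python sketch below (hypothetical function name and invented inputs, with $\widehat{\sigma}_\Theta^2$ held fixed while $\widehat{\sigma}_R^2$ varies) evaluates $\alpha$ for fleets of increasing size made of identical vehicles, and for an increasing fleet-effect variance.

```python
def alpha_new_vehicle(lam, s2_r, s2_theta):
    """Closed-form credibility coefficient for a vehicle not observed in year one."""
    tot = sum(lam)                # sum_i lambda_hat_i
    sq = sum(l * l for l in lam)  # sum_i lambda_hat_i^2
    return s2_r * tot / (1.0 + s2_r * tot + (s2_theta - s2_r) * sq / tot)


# Invented inputs: identical vehicles with lambda_hat = 0.1, sigma_Theta^2 = 0.5
by_size = [alpha_new_vehicle([0.1] * m, 0.2, 0.5) for m in (5, 10, 50)]
by_s2_r = [alpha_new_vehicle([0.1] * 10, s2, 0.5) for s2 in (0.1, 0.2, 0.4)]
print(by_size)  # grows with the fleet's expected claim count
print(by_s2_r)  # grows with the fleet-effect variance
```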

11.7.4 The Case of Existing Vehicles

If the vehicle belonged to the fleet during the first year, then \[ \text{cred}_{i_0}=\alpha+\beta_{i_0}\text{ with }\beta_{i_0}= \frac{(\widehat{\sigma}_\Theta^2-\widehat{\sigma}_R^2)\widehat{\lambda}_{i_0}} {1+\widehat{\sigma}_R^2\sum_{i=1}^m\widehat{\lambda}_i +(\widehat{\sigma}_\Theta^2-\widehat{\sigma}_R^2)\frac{\sum_{i=1}^m\widehat{\lambda}_i^2}{\sum_{i=1}^m \widehat{\lambda}_i}}. \] The credibility coefficient thus appears as the sum of a term $\alpha$ specific to the fleet (and therefore common to all fleet vehicles) and a term $\beta_{i_0}$ specific to the vehicle.
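The decomposition into $\alpha$ and $\beta_{i_0}$ can be sketched in Python as follows. The function name and the fleet used in the example are hypothetical; the point is that $\alpha$ is the same for every vehicle of the fleet, while $\beta_{i_0}$ is proportional to $\widehat{\lambda}_{i_0}$.

```python
def credibility_existing_vehicle(lam, s2_r, s2_theta, i0):
    """Return (alpha, beta_i0) for vehicle i0 already in the fleet in year one."""
    tot = sum(lam)                # sum_i lambda_hat_i
    sq = sum(l * l for l in lam)  # sum_i lambda_hat_i^2
    denom = 1.0 + s2_r * tot + (s2_theta - s2_r) * sq / tot
    alpha = s2_r * tot / denom                  # common to every vehicle of the fleet
    beta = (s2_theta - s2_r) * lam[i0] / denom  # specific to vehicle i0
    return alpha, beta


# Invented fleet: one riskier vehicle (0.3) among nine with lambda_hat = 0.1
lam = [0.3] + [0.1] * 9
for i0 in (0, 1):
    alpha, beta = credibility_existing_vehicle(lam, 0.2, 0.5, i0)
    print(i0, alpha, beta, alpha + beta)
```

In this invented fleet, the riskier vehicle receives a $\beta_{i_0}$ three times larger than the others, in line with its threefold a priori frequency.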

11.7.5 Open Fleets

Most policies available in the market cover open fleets, where the company owning the fleet is not required to report every new vehicle or any disposal of an existing vehicle to the insurance company. In this context, a credibility premium calculated at the vehicle level seems unrealistic.

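If $\varrho$ denotes the average vehicle turnover rate in the fleet, so that roughly a proportion $1-\varrho$ of the vehicles were already insured during the observation year, a credibility factor of $\alpha+(1-\varrho)\overline{\beta}$ can be adopted for the whole fleet, where $\overline{\beta}$ is the average of the $\beta_i$.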
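The open-fleet rule can be sketched in Python as below. It is an illustration only: the function name and inputs are hypothetical, and $\varrho$ is interpreted (an assumption) as the proportion of vehicles replaced over the year. With $\varrho=0$ the result is the average coefficient $\alpha+\overline{\beta}$ of vehicles already in the fleet, and with $\varrho=1$ it reduces to the coefficient $\alpha$ of new vehicles.

```python
def open_fleet_credibility(lam, s2_r, s2_theta, turnover):
    """alpha + (1 - turnover) * mean(beta_i) for an open fleet.

    turnover: average vehicle turnover rate rho, read here (assumption) as the
    share of vehicles replaced over the year, so 1 - rho were already insured.
    """
    tot = sum(lam)
    sq = sum(l * l for l in lam)
    denom = 1.0 + s2_r * tot + (s2_theta - s2_r) * sq / tot
    alpha = s2_r * tot / denom
    beta_bar = sum((s2_theta - s2_r) * l / denom for l in lam) / len(lam)
    return alpha + (1.0 - turnover) * beta_bar


lam = [0.1] * 10  # invented fleet
for rho in (0.0, 0.25, 1.0):
    print(rho, open_fleet_credibility(lam, 0.2, 0.5, rho))
```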

11.8 Bibliographical Notes

The origins of credibility theory date back to (Mowbray 1914) and (Whitney 1918). However, it wasn’t until the work of (Bühlmann 1967) and (Bühlmann 1969) that a rigorous theory was developed. The book by (Dannenburg et al. 1996) provides a very accessible introduction to the subject, as do (Klugman, Panjer, and Willmot 1998) and (Kaas et al. 2008).

The frequency credibility model with explanatory variables was introduced in (Dionne and Vanasse 1989) and (Dionne and Vanasse 1992). Noteworthy works include those by (Pinquet 1997) and (Pinquet 1998).

The type of dependence induced by actuarial credibility models has been studied by (Purcaru and Denuit 2002) and (Purcaru and Denuit 2003), as well as by (Brouhns et al. 2003). A comparison of the corrections induced by different types of models is provided by (Purcaru, Guillén, and Denuit 2004). (Pinquet, Guillén, and Bolancé 2001) study a Spanish database using a credibility model with time-varying random effects. This makes it possible to account for the age of claims, which the models described in this chapter cannot do. In this regard, readers may also refer to (Gerber, Jones, et al. 1975) and (Sundt 1988).

For more details on the statistical treatment of mixed Poisson models for claim counts (especially in the case of Gamma mixtures and mixed Gamma distributions), the reader may find useful references in (Denuit and Dhaene 2001) and the references mentioned therein. Regarding portfolios covering fleets of vehicles, we refer the reader to (Desjardins, Dionne, and Pinquet 2001).

Throughout this chapter, we have considered predictors that are optimal in terms of minimizing mean squared error. (Bermudez, Denuit, and Dhaene 2001) and (Denuit and Lambert 2001) have considered other loss functions, resulting in posterior corrections whose severity can be controlled with a parameter (while maintaining the property of financial equilibrium).

11.9 Exercises

Exercise 11.1 Consider a population divided into two subpopulations (modeled by a parameter $\Theta$), good and bad drivers, in equal proportions. Half of the drivers, the good ones corresponding to $\Theta = B$, have a probability of $1/5$ of having an accident in a year and $4/5$ of not having one. The other half of the drivers, the bad ones corresponding to $\Theta = M$, have a probability of $2/5$ of having an accident in a year and $3/5$ of not having one.

A driver has taken out an insurance policy for 3 years, and over the 3 years, they have had 2 accidents. Calculate the expected number of claims for the next two years.

Exercise 11.2 We assume that the claim cost $X$ is modeled by a normal distribution with variance $\nu$ (given) and mean $\Theta$, where $\Theta$ follows a normal distribution with mean $\mu$ and variance $a$. Provide the unconditional distribution of $X$, then calculate its expected value and variance. Given $k$ observations $X_1, \ldots, X_k$, provide a posterior estimator for $\Theta$ and compare it to the Bühlmann estimator.

Exercise 11.3 Suppose that for an individual, their annual claim frequency has a mean of $\lambda$ and a variance of $\sigma^2$, where $\lambda$ follows a uniform distribution $\mathcal{U}ni[0.5, 1.5]$, and $\sigma^2$ follows an exponential distribution with a mean of $1.25$. An insured individual chosen at random had no claims in the first year. Using Bühlmann’s credibility theory, estimate the number of claims for the second year.

Exercise 11.4 We have the following statistics on accident frequencies for individual auto insurance policies:
\[
\begin{array}{l|cc|cc}
\hline
 & \text{Professional} & & \text{Personal} & \\
 & \text{Mean} & \text{Variance} & \text{Mean} & \text{Variance} \\
\hline
\text{Rural} & 1.0 & 0.50 & 1.5 & 0.80 \\
\text{Urban} & 2.0 & 1.00 & 2.5 & 1.00 \\
\text{Total} & 1.8 & 1.06 & 2.3 & 1.12 \\
\hline
\end{array}
\]
Assume that there are equal numbers of policies for professional and personal use. Determine the Bühlmann credibility factor for a driver applying for coverage when we cannot determine whether they will use their vehicle for professional or personal purposes, nor whether in a rural or urban area.

Exercise 11.5

In a portfolio, there are three types of drivers (A, B, and C, let’s say). Type A drivers represent half of the portfolio, while type B drivers represent one-third. Knowing the type of driver, the annual numbers of claims $N_1, N_2, \ldots$ are independent random variables with the same distribution described in the following table:

An insured person chosen at random had 1 claim in the first year.
Calculate the expected number of claims for the second year.

Exercise 11.6 An insurance company has two “group” contracts in its portfolio, and the statistics for the last three years are shown below. Calculate the experience premium for year 4 for each of the two companies.

Exercise 11.7

Let $N_i$, $i=1,2,\ldots$, be the number of claims caused by an insured individual during the $i$th year. Conditionally on $\Theta=\theta$, the $N_i$ are independent and follow a Poisson distribution with mean $\theta$. The random effect $\Theta$ is assumed to follow a Gamma distribution with mean $a/\tau$ and variance $a/\tau^2$.

Exercise 11.8 Consider a portfolio consisting of 5 policies, each of which has been observed for 5 years. Each policy results in either 0 or 1 claim per year. Let $I_{it}$ be the number of claims produced by policy $i$ during year $t$, $i,t=1,2,3,4,5$.

Assuming a unit average cost per claim, calculate the premiums for the sixth year in the Bühlmann model, given that \[ \sum_{t=1}^5I_{it}=0\text{ for }i=1,2,3, \] and \[ \sum_{t=1}^5I_{4t}=1,\hspace{2mm}\sum_{t=1}^5I_{5t}=2\text{ and } \sum_{t=1}^5I_{5t}^2=2. \]

Postface

Bermudez, Luis, Michel Denuit, and Jan Dhaene. 2001. “Exponential Bonus-Malus Systems Integrating a Priori Risk Classification.” Journal of Actuarial Practice 9: 84–112.

Brouhns, Natacha, Montserrat Guillén, Michel Denuit, and Jean Pinquet. 2003. “Bonus-Malus Scales in Segmented Tariffs with Stochastic Migration Between Segments.” Journal of Risk and Insurance 70 (4): 577–99.

Bühlmann, Hans. 1967. “Experience Rating and Credibility.” ASTIN Bulletin: The Journal of the IAA 4 (3): 199–207.

———. 1969. “Experience Rating and Credibility II.” ASTIN Bulletin: The Journal of the IAA 5 (3): 157–65.

Dannenburg, Dennis R, Rob Kaas, Marc J Goovaerts, et al. 1996. Practical Actuarial Credibility Models. Institute of Actuarial Science, Amsterdam.

Denuit, Michel, and Jan Dhaene. 2001. “Bonus-Malus Scales Using Exponential Loss Functions.” Blätter Der DGVFM 25 (1): 13–27.

Denuit, Michel, and Philippe Lambert. 2001. “Smoothed NPML Estimation of the Risk Distribution Underlying Bonus-Malus Systems.” In Proceedings of the Casualty Actuarial Society, 88:142. 198.

Desjardins, Denise, Georges Dionne, and Jean Pinquet. 2001. “Experience Rating Schemes for Fleets of Vehicles.” ASTIN Bulletin: The Journal of the IAA 31 (1): 81–105.

Dionne, Georges, and Charles Vanasse. 1989. “A Generalization of Automobile Insurance Rating Models: The Negative Binomial Distribution with a Regression Component.” ASTIN Bulletin: The Journal of the IAA 19 (2): 199–212.

———. 1992. “Automobile Insurance Ratemaking in the Presence of Asymmetrical Information.” Journal of Applied Econometrics 7 (2): 149–65.

Gerber, HU, D Jones, et al. 1975. “Credibility Formulas of the Updating Type.” Transactions of the Society of Actuaries 27 (1): 31–52.

Kaas, Rob, Marc Goovaerts, Jan Dhaene, and Michel Denuit. 2008. Modern Actuarial Risk Theory: Using R. Vol. 128. Springer.

Klugman, Stuart A, Harry H Panjer, and Gordon E Willmot. 1998. Loss Models: From Data to Decisions. John Wiley & Sons.

Mowbray, Albert H. 1914. “How Extensive a Payroll Exposure Is Necessary to Give a Dependable Pure Premium.” In Proceedings of the Casualty Actuarial Society, 1:24–30. 1.

Pinquet, Jean. 1997. “Allowance for Cost of Claims in Bonus-Malus Systems.” ASTIN Bulletin: The Journal of the IAA 27 (1): 33–57.

———. 1998. “Designing Optimal Bonus-Malus Systems from Different Types of Claims.” ASTIN Bulletin: The Journal of the IAA 28 (2): 205–20.

Pinquet, Jean, Montserrat Guillén, and Catalina Bolancé. 2001. “Allowance for the Age of Claims in Bonus-Malus Systems.” ASTIN Bulletin: The Journal of the IAA 31 (2): 337–48.

Purcaru, Oana, and Michel Denuit. 2002. “On the Dependence Induced by Frequency Credibility Models.” Belgian Actuarial Bulletin 2 (1): 73–79.

———. 2003. “Dependence in Dynamic Claim Frequency Credibility Models.” ASTIN Bulletin: The Journal of the IAA 33 (1): 23–40.

Purcaru, Oana, Montserrat Guillén, and Michel Denuit. 2004. “Linear Credibility Models Based on Time Series for Claim Counts.” Belgian Actuarial Bulletin 4 (1): 62–74.

Sundt, Bjørn. 1988. “Credibility Estimators with Geometric Weights.” Insurance: Mathematics and Economics 7 (2): 113–22.

Whitney, Albert W. 1918. “Theory of Experience Rating.” Proceedings of the Casualty Actuarial Society 4: 274–92.