Chapter 15 Large Risks

15.1 Introduction

15.1.1 The Concept of Catastrophe

Before delving into the mathematical approach to catastrophes and extreme events, it is essential to ponder for a moment on the very concept of a catastrophe. Is it an event characterized by a colossal financial cost? Perhaps, but let us remember that in financial markets, a mere few percentage points’ variation in a stock index can correspond to losses of several billion euros, and yet no one speaks of a catastrophe. Should it then be defined by the number of victims? Again, let us recall that tens of thousands of people lose their lives each year on European roads, and these are referred to as accidents, not catastrophes. However, if a commercial airliner crashes with around a hundred passengers on board, it is indeed referred to as a catastrophe. As emphasized by (Godard et al. 2003), perhaps it is the simultaneity and the nature of the damages that play a role in the definition of a catastrophe: it must involve significant consequences originating from the same root cause.

The primary tools used in non-life insurance are the law of large numbers and the central limit theorem (see Chapters 3 and 4). But how can we manage extreme risks with often infinite variance? This chapter aims to provide some insights into this question.

15.1.2 Why Manage Extreme Events Ex Ante?

Historically, the occurrence of a catastrophe was long regarded fatalistically, with religious connotations perpetuated by the Church. However, in the 16th and 17th centuries, the need for protection entered political thought, particularly with Machiavelli but especially with Hobbes, who placed collective protection at the heart of his analysis in Leviathan (1651).

On November 1, 1755, an earthquake devastated Lisbon, followed by a tidal wave (today, we would call it a tsunami) that claimed the lives of 20,000 people. Voltaire then composed a famous poem (the Poème sur le désastre de Lisbonne), which was considered the first secularization of natural risks: these events were interpreted in geophysical terms rather than theological ones. The Flood was no longer a divine punishment but a colossal inundation. Disasters were now within the realm of science.

However, history will also remember Rousseau’s response a few weeks later in his letter to Voltaire on Providence. Nevertheless, with progress prevailing overall, and despite industrialization leading to a significant number of accidents, collective resignation persisted until the end of the 19th century, when the concept of workplace safety began to be taken into account (see, for example, (1986)). The concept of catastrophe would take on the dimension it has today after World War II.

15.1.3 What Types of Catastrophes?

Two types of catastrophes are to be considered: natural disasters and those involving the human factor (corresponding to technological risks).

15.1.3.1 Technological (and “Human”) Catastrophes

In 1974, the explosion at the cyclohexane chemical plant in Flixborough, north of London, killed around thirty people, injured hundreds, and damaged nearly 2,500 homes. At the time, this disaster was considered merely an “accident.” It was the repetition of such events that raised awareness of the risk.

At the end of the 1970s, the chemical accident in Seveso (in 1976), the Amoco-Cadiz oil spill (in 1978), the Three Mile Island nuclear accident (in 1979), the explosion of the Ixtoc-1 offshore platform in Mexico (in 1979), and the railway accident involving the transportation of hazardous materials (TMD), leading to the long-term evacuation of over 200,000 people in Mississauga, Canada (in 1979), introduced the concept of “major” technological risk.

However, these accidents were perceived as false alarms (which highlighted the inefficiencies in crisis management) since they caused no fatalities (at least for the latter incidents). The 1980s marked the transition “from alert to alarm” following the explosion of a gas storage facility in Mexico (in 1984, resulting in several hundred deaths), a methyl isocyanate leak in Bhopal (in 1984, causing several thousand deaths), and the explosion of the Chernobyl nuclear power plant (in 1986, exposing 5 million people to radiation, with at least 1,700,000 being irradiated).

But at that time, the crisis was mainly limited to regions of the Third World or the Soviet bloc, even though incidents allowed Western countries to glimpse the potential risk.

A factory owned by the same group as the one in Bhopal experienced a serious incident in the United States in 1985, and the Sandoz depot caught fire in Basel in 1989. The inability of public authorities to deal with major crises also came to the forefront following the Exxon Valdez oil spill in 1989. However, the list of disasters originating from human factors did not stop there (contaminated blood scandals, asbestos, shipwrecks like the Aegean Sea, Erika, Prestige, or Tasman Spirit, the explosion at the AZF factory in Toulouse, mad cow disease and GMO issues, the September 11 attacks on the World Trade Center in New York, etc.).

15.1.3.2 A Word on Industrial Risk

While not always resulting in catastrophes, industrial risks can generate colossal costs, as evidenced by Table 15.1.

In 1996 in France, out of more than 12,600 business fire claims (amounting to nearly 1.4 billion euros), the top 4% largest claims accounted for over 90% of the total cost. The largest claim (fire at Crédit Lyonnais) alone represented 21% of the total cost, and the second largest (Eurotunnel fire) was 12%. Four years earlier, in 1992, a fire at a Total refinery accounted for 40% of the total claims cost for the year. Finally, more recently, the storms of 1999 cost French insurance companies nearly 3.4 billion euros, equivalent to 75% of the premium income in homeowner’s multi-risk insurance. These orders of magnitude help us better understand the impact of these few claims on a company’s results.

Table 15.1: Cost of commercial fire claims, France, in 1996, with cost distribution for the 12,670 largest claims, and the 5 largest (Source: F.F.S.A. - Fédération Française des Sociétés d’Assurances)

| Cost per claim | Number | Cost | Proportion |
|---|---|---|---|
| €0 - 15,000 | 11,121 | 54.4 M | 3.9% |
| €15,000 - 75,000 | 930 | 57.2 M | 4.1% |
| €75,000 - 150,000 | 182 | 37.7 M | 2.7% |
| €150,000 - 750,000 | 299 | 202.4 M | 14.5% |
| €750,000 - 1,500,000 | 64 | 114.4 M | 8.2% |
| more than €1,500,000 | 74 | 915.6 M | 65.6% |
| Total | 12,670 | 1,395.7 M | 100.0% |
| Crédit Lyonnais | | 297.3 M | 21.3% |
| Eurotunnel | | 172.7 M | 12.3% |
| Ste Française Hoechst | | 25.8 M | 1.8% |
| Superba | | 15.2 M | 1.1% |
| Pedelhez | | 14.9 M | 1.1% |

15.1.3.3 Natural Disasters

It’s worth noting that most major disasters in terms of insured costs (with the exception of the World Trade Center attacks in 2001 and the explosion of an oil platform in 1988) are generally due to natural disasters: hurricanes (typhoons or storms), earthquakes, or floods, as shown in Table 15.2.

While the risk of natural disasters is shared worldwide, its nature is specific to the geographical area considered. In Germany, the north is particularly exposed to severe winds, while the risk of flooding seems to affect the entire country. Belgium is also subject to both of these risks, as flooding can be caused by seawater intrusion or river overflow. Spain is mainly exposed to the risk of flooding. The United States, on the other hand, seems to have a relatively wide range of risks, including earthquakes in California, hurricanes in the east and southeast, and floods in the central region (to which the “tornado alley” should be added).

Table 15.2: Insured amount and number of victims of the largest claims that occurred during the period 1970-2003 (Source: Swiss Re)

| Amount | Victims | Date | Event, Country |
|---|---|---|---|
| 21062 | 3025 | 09/11/2001 | Terrorist Attack (WTC), USA |
| 20900 | 43 | 08/23/1992 | Hurricane Andrew, USA, Bahamas |
| 17312 | 60 | 01/17/1994 | Northridge Earthquake, USA |
| 7598 | 51 | 09/27/1991 | Typhoon Mireille, Japan |
| 6441 | 95 | 01/25/1990 | Winter Storm Daria, France, Europe |
| 6382 | 110 | 12/25/1999 | Winter Storm Lothar, France, Europe |
| 6203 | 71 | 09/15/1989 | Hurricane Hugo, Puerto Rico, USA |
| 6203 | 22 | 10/15/1987 | Storms and Floods, France, Europe |
| 6432 | 64 | 02/25/1990 | Winter Storm Vivian, Central Europe |
| 4445 | 26 | 09/22/1999 | Typhoon Bart, Japan |
| 3969 | 600 | 09/20/1998 | Hurricane Georges, USA, Caribbean |
| 3261 | 33 | 06/05/2001 | Tropical Storm Allison, Flooding, USA |
| 3205 | 45 | 05/02/2003 | Thunderstorm, Hail, USA |
| 3100 | 167 | 07/06/1988 | Piper Alpha Platform Explosion, North Sea |
| 2973 | 6425 | 01/17/1995 | Kobe Earthquake, Japan |
| 2641 | 45 | 12/27/1999 | Winter Storm Martin, France, Switzerland |
| 2597 | 70 | 09/10/1999 | Hurricane Floyd, USA, Bahamas |
| 2548 | 38 | 08/06/2002 | Floods, Europe |
| 2526 | 59 | 10/01/1995 | Hurricane Opal, USA, Mexico |
| 2288 | 26 | 10/20/1991 | Fires, Drought, USA |
| 2277 | NA | 04/06/2001 | Hail, Floods, USA |
| 2220 | 246 | 03/10/1993 | Blizzard, Tornadoes, USA, Mexico |
| 2090 | 4 | 09/11/1992 | Hurricane Iniki, USA |
| 1959 | 23 | 10/23/1989 | Petrochemical Plant Explosion, USA |
| 1899 | NA | 08/29/1979 | Hurricane Frederic, USA |

Figure 15.1 shows the trend in costs of disasters.


Figure 15.1: Insured and uninsured losses, 1970-2019 in billions of dollars at 2020 prices

However, it should be noted that it is sometimes difficult to define a “natural disaster,” partly due to the temporal dimension. Some events are very brief (earthquakes, flash floods), while others are much longer (droughts, epidemics). The direct and especially indirect impacts are also difficult to quantify. In the Kobe earthquake of 1995, most victims were burned or asphyxiated in fires caused by gas pipeline ruptures following the earthquake. In most major disasters, it is the indirect impacts (sometimes far from the source of the disaster) that often claim the most lives.

15.1.4 Data Presentation

In the initial sections of this chapter, we will analyze a set of fire claims that occurred in Denmark between 1980 and 1990. The series of 2167 claims is depicted in Figure 15.2, and a fit to common parametric distributions is presented in Figure ??. The solid line that fits the data best, especially for the larger values, corresponds to a Pareto distribution.


Figure 15.2: Fire claims that occurred in Denmark

15.2 Limit Law of Maxima

15.2.1 Behavior of the Maximum in Large Samples

Consider a sample \(X_1, ..., X_n\) of independent random variables with the same distribution function \(F\), and let us define the associated order statistic, denoted as \(X_{1:n}, ..., X_{n:n}\), by \[ \min\{X_1, ..., X_n\} = X_{1:n} \leq X_{2:n} \leq \ldots \leq X_{n-1:n} \leq X_{n:n} = \max\{X_1, ..., X_n\}. \]

Similar to the case of the sum, we can ask if there exists a “law of large numbers” and a “central limit theorem” for the maximum. In other words, can we find a non-degenerate limit law for the maximum by suitably normalizing it? Does this limit law depend on the chosen normalization? Let \(x_F\) be the upper bound of the support of \(X\), possibly infinite, i.e., \[ x_F = \sup\{x \in \mathbb{R} \mid F(x) < 1\} \leq \infty. \]

When considering the distribution function of \(X_{n:n}\), the exact formula \[ \Pr[X_{n:n} \leq x] = \big(F(x)\big)^n \] is not very useful. Indeed, for any \(x < x_F,\) \[ \lim_{n\to\infty} \Pr[X_{n:n} \leq x] = \lim_{n\to\infty} \big(F(x)\big)^n = 0. \] If \(x_F < \infty,\) then for any \(x \geq x_F,\) \[ \lim_{n\to\infty} \Pr[X_{n:n} \leq x] = \lim_{n\to\infty} \big(F(x)\big)^n = 1. \]

This degeneracy of the limit explains why the asymptotic theory of extreme values has been developed. Therefore, we need to normalize \(X_{n:n}\) appropriately. Similar to the case of sums, a “central limit theorem” for the maximum will be obtained if suitable normalization constants \(a_n \in \mathbb{R}\) and \(b_n > 0\) can be found such that \[ \Pr\left[\frac{X_{n:n} - a_n}{b_n} \leq x\right] = F^n(a_n + b_n x) \rightarrow G(x), \text{ as } n \rightarrow \infty, \] where \(G\) is a non-degenerate distribution function. The first difficulty is to see whether the form of the limit distribution depends on the sequences \((a_n)\) and \((b_n)\). If it does, then what is the relationship between the different limit laws? This leads us to introduce the following concept.

Definition 15.1 Two random variables \(X\) and \(Y\) are said to be of the same type if there exist two constants \(a \in \mathbb{R}\) and \(b > 0\) such that \(X \stackrel{\text{law}}{=} bY + a\).

In other words, random variables that are “of the same type” have the same probability distribution, up to a translation and scaling factor. It can be shown that if we consider different sequences of normalization, the limit laws may be different, but they are necessarily of the same type. This is precisely the subject of the following result.

Similar to limit laws for sums being stable distributions, the limit laws for maxima satisfy the following stability property.

Definition 15.2 A probability distribution with a distribution function \(G\) is called max-stable if, for every \(n \geq 2,\) there exist constants \(a_n \in \mathbb{R}\) and \(b_n > 0\) such that \[ G^n(a_n + b_n x) = G(x), \text{ for all } x \in \mathbb{R}. \]

The following result indicates that the max-stable distribution functions we have defined correspond to the limit laws of normalized maxima.

Proof. Let’s assume that \[ \lim_{n\rightarrow\infty} F^n(a_n+b_n x) = G(x), \text{ for all } x\in{\mathbb{R}}, \] where \(G\) is a non-degenerate distribution function. Note that for any \(k\in\mathbb{N},\) \[\begin{eqnarray*} \lim_{n\rightarrow\infty} F^{nk}(a_n+b_n x) &=& \left(\lim_{n\rightarrow\infty} F^n(a_n+b_n x)\right)^k\\ &=& G^k(x),\hspace{2mm}x\in{\mathbb{R}}, \end{eqnarray*}\] and furthermore, \[ \lim_{n\rightarrow\infty} F^{nk}(a_{nk}+b_{nk} x) = G(x),\hspace{2mm} x\in{\mathbb{R}}. \] Using the property of limit laws of the same type, we deduce that there exist constants \(a'_n\in{\mathbb{R}}\) and \(b'_n>0\) such that \[ G^k(a'_k+b'_k x) = G(x), \text{ for all } x\in{\mathbb{R}}, \] meaning that the distribution \(G\) is max-stable.

15.2.2 Large Deviations

Consider \(\{X_1,X_2,...\}\) a sequence of independent and identically distributed random variables (e.g., insurance claim costs), and let \(S_n=X_1+...+X_n\). The law of large numbers states that if the \(X_i\) have a finite expectation \(\mu\), then \(S_n/n\) converges in probability to \(\mu\). This result suggests using the pure premium as the “reference premium” (Chapter 3 of Volume 1). However, we can investigate the behavior of \(\Pr[S_n/n>\mu+\varepsilon]\) as \(n\rightarrow\infty\), where \(\varepsilon>0\). Under certain assumptions, it is possible to show that \[ \Pr\left[\frac{S_n}{n}>\mu+\varepsilon\right]\sim \exp(-n\cdot h(\varepsilon)) \text{ as }n\rightarrow\infty, \] where \(h\) depends on the distribution of \(X_i\) (and corresponds to the “Cramér transform”).

The central limit theorem corresponds to a “scaling” of \(S_n/n\) around the mean (by multiplying \(S_n/n-\mu\) by a factor of \(\sqrt{n}\)), which allows us to obtain limits of the form \[ \Pr\left[\sqrt{n}\left(\frac{S_n}{n}-\mu\right)> a\cdot \sigma\right]\rightarrow 1-\Phi(a), \] where \(\Phi\) represents the distribution function of the \(\mathcal{N}or(0,1)\) distribution.

Results related to large deviations are different and were established in the 1930s by Khinchine and Cramér. The idea here is to obtain an upper bound of the form \[ \Pr\left[\frac{S_n}{n}>a\right]\leq \exp(-n\cdot h(a)), \] using Markov’s inequality and assuming the existence of the Laplace transform in a neighborhood of \(0\).
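
To make this bound explicit, here is a minimal derivation sketch (the notation \(M(t)=\mathbb{E}[e^{tX_1}]\), assumed finite for \(t\) in a neighborhood of \(0\), is introduced here only for illustration): for any \(t>0\),
\[
\Pr\left[\frac{S_n}{n}>a\right]=\Pr\left[e^{tS_n}>e^{tna}\right]
\leq e^{-tna}\,\mathbb{E}\left[e^{tS_n}\right]
=\exp\Big(-n\big(ta-\log M(t)\big)\Big),
\]
by Markov's inequality and independence, and optimizing over \(t\) yields the rate \(h(a)=\sup_{t>0}\{ta-\log M(t)\}\), i.e. the Legendre (Cramér) transform of \(\log M\).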

Note that another characterization of max-stable distributions is possible, as shown in the following result.

Proposition 15.1 The distribution function \(G\) corresponds to a max-stable distribution if for any \(X_0\), \(X_1\), …, \(X_n\) independent and identically distributed with the same distribution function \(G\), there exist \(a_n\in{\mathbb{R}}\) and \(b_n>0\) such that \[ \max\left\{X_1,...,X_n\right\}=_{\text{law}}a_n+b_n X_0, \text{ for all }n\geq 2. \]

In this form, we find a relationship very similar to the stability definition encountered in Section 4.2.8 of Volume 1.

Now let’s examine the possible limit distributions for maxima, as given in the following result, which is central in extreme value theory.

This result, due to Fisher and Tippett, states that the only possible non-degenerate limits are, up to type, the Fréchet distribution \(G_+(x)=\exp(-x^{-\alpha})\), \(x>0\), the Gumbel distribution \(\Lambda(x)=\exp(-\exp(-x))\), \(x\in\mathbb{R}\), and the Weibull distribution \(G_-(x)=\exp(-(-x)^{\alpha})\), \(x\leq 0\), where \(\alpha>0\). Depending on the limit \(G_+\), \(\Lambda\), or \(G_-\), we will say that \(F\) is in the max-domain of attraction of \(G_+\), \(\Lambda\), or \(G_-\). These three distributions are, in fact, the three special cases of the GEV (Generalized Extreme Value) distribution corresponding to the Jenkinson-von Mises representation:

\[ G_{\mu,\sigma,\xi}\left( x\right) =\left\{ \begin{array}{ll} \exp \left( -\left( 1+\xi \frac{ x-\mu}{\sigma} \right)^{-1/\xi}\right), & \text{if }\xi \neq 0, \\ \exp \left( -\exp \left( -\frac{ x-\mu}{\sigma} \right)\right), & \text{if }\xi =0, \end{array} \right. \] if \(1+\xi(x-\mu)/\sigma>0\). This distribution will be denoted as GEV(\(\mu,\sigma,\xi\)) in the following. Furthermore, we will simply denote \(G_\xi\) as the distribution function \(G_{0,1,\xi}\). Thus, \(F\) belongs to the max-domain of attraction of \(G_\xi\), \(\xi>0\), when \(F\) belongs to the max-domain of attraction of \(G_+\).

15.2.3 Estimation of the Maximum Distribution

In practice, the maximum of \(n\) variables \(X_1,...,X_n\) constitutes only a single observation, making the estimation of the distribution of this observation challenging. Consider a sample \(X_1,...,X_n\) of independent and identically distributed random variables, and let \(Y\) be the maximum. If we have a sample of maxima \(Y_1,...,Y_m\), standard distribution-fitting methods can be applied (maximum likelihood estimation, for example).

The idea proposed by (Gumbel 1958) is to consider maxima in blocks: we divide the sample of size \(n\) into \(m\) samples of size \(n/m\). We then denote \(Y_i\) as the maximum obtained within the \(i\)th block (i.e., obtained from \(\{X_{(i-1)n/m+1},X_{(i-1)n/m+2},...,X_{in/m}\}\)). In particular, if \(\xi\neq 0\), the log-likelihood of the sample \(Y_1,...,Y_m\) from a GEV(\(\mu\),\(\sigma\),\(\xi\)) distribution is written as follows:

\[\begin{eqnarray} \log\mathcal{L}&=&-m\log\sigma-(1+\xi^{-1})\sum_{i=1}^m\log \left( 1+\xi \frac{Y_i-\mu }{\sigma }\right)\nonumber\\ &&-\sum_{i=1}^m \left( 1+\xi \frac{Y_i-\mu }{\sigma }\right)^{-1/\xi}. \tag{15.1} \end{eqnarray}\]

If \(\xi= 0\), the log-likelihood becomes:

\[\begin{eqnarray} \log\mathcal{L}&=&-m\log\sigma-\sum_{i=1}^m\exp \left( -\frac{Y_i-\mu }{\sigma }\right)\nonumber\\ &&-\sum_{i=1}^m \frac{Y_i-\mu }{\sigma }. \tag{15.2} \end{eqnarray}\]

(Hosking and Wallis 1987) showed that if \(\xi>-1/2\), then \[ \sqrt{m}\Big((\widehat{\mu},\widehat{\sigma},\widehat{\xi})-(\mu,\sigma,\xi)\Big) \rightarrow_{\text{law}}\mathcal{N}or(\boldsymbol{0},\boldsymbol{V}), \] where we refer to these authors for the form of the variance-covariance matrix \(\boldsymbol{V}\).
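
As an illustration, here is a minimal numerical sketch of the block-maxima approach in Python (a sketch assuming SciPy is available; the claim data are simulated as a stand-in for real data, and note that `scipy.stats.genextreme` parameterizes the shape as \(c=-\xi\)):

```python
import numpy as np
from scipy.stats import genextreme, pareto

# Simulated heavy-tailed "claims" used as a stand-in for real data
rng = np.random.default_rng(0)
x = pareto.rvs(1.5, size=10_000, random_state=rng)   # true xi = 1/1.5

# Block maxima: split the sample into m blocks and keep each block's maximum
m = 100
y = x.reshape(m, -1).max(axis=1)

# Maximum-likelihood fit of the GEV to the block maxima
c_hat, mu_hat, sigma_hat = genextreme.fit(y)
xi_hat = -c_hat                                      # SciPy convention: c = -xi
print(f"xi = {xi_hat:.3f}, mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
```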

15.2.4 The \(n\)-th Largest Value

The results of the previous section focused on the largest observed value among \(n\). In practice, the \(k\) largest values can have a significant impact on the insurer’s result. This leads us to consider the \(k\) largest order statistics.

15.2.4.1 Some Reminders on Order Statistics

From \(n\) independent random variables \(X_1,...,X_n\) with the same distribution function \(F\), it is possible to deduce the distribution of the \(i\)-th order statistic \(X_{i:n}\), as shown in the following result.

Proposition 15.2 The distribution function of \(X_{i:n}\) is given by \[\begin{equation*} F_{i:n}\left( x\right) =\Pr\left[ X_{i:n}\leq x\right] =\sum_{j=i}^{n}\binom{n}{j}\big( F\left( x\right) \big)^{j}\big( 1-F\left( x\right) \big) ^{n-j}. \end{equation*}\]

Proof. This result is obtained by noting that the indicators \(\mathbb{I}[{X_{i}\leq x}]\) are independent and identically distributed Bernoulli variables with parameter \(p=F(x)\). If we are interested in the event \(\{X_{i:n}\leq x\}\), we are interested in the case where, in the sample of Bernoulli variables, the value 1 appears at least \(i\) times. Since the number of times 1 appears follows a binomial distribution with parameters \(n\) and \(F(x)\), we obtain the announced result.
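
A small numerical check of Proposition 15.2, assuming SciPy is available (the helper name `cdf_order_stat` is ours, introduced only for this illustration):

```python
import numpy as np
from scipy.stats import binom, norm

def cdf_order_stat(x, i, n, F):
    """P[X_{i:n} <= x] = P[Bin(n, F(x)) >= i], as in Proposition 15.2."""
    return binom.sf(i - 1, n, F(x))

# Compare the exact formula with a Monte Carlo estimate for a N(0,1) sample
rng = np.random.default_rng(1)
n, i, x0 = 10, 7, 0.5
samples = np.sort(rng.standard_normal((100_000, n)), axis=1)
print(cdf_order_stat(x0, i, n, norm.cdf))    # exact value
print(np.mean(samples[:, i - 1] <= x0))      # simulation
```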

Example 15.1 Figure ?? shows the density functions of order statistics for a sample from \(\mathcal{N}or(0,1)\) of size \(n\), as well as the evolution of the density of the maximum for samples of different sizes.

Remark. However, this expression can be simplified by introducing incomplete beta functions. For reference, the beta function is defined, in terms of the gamma function, by \[\begin{equation*} b\left( p,q\right) =\int_{0}^{1}t^{p-1}\left( 1-t\right) ^{q-1}dt=\frac{\Gamma \left( p\right) \Gamma \left( q\right) }{\Gamma \left( p+q\right) }. \end{equation*}\] The incomplete beta function is obtained by integrating not over \([0,1]\) but over \([0,s]\), and then normalizing: \[\begin{equation*} b_{s}\left( p,q\right) =\frac{1}{b\left( p,q\right) }\int_{0}^{s}t^{p-1}\left( 1-t\right) ^{q-1}dt. \end{equation*}\] Using this notation, it is easy to show that \[\begin{equation*} F_{k:n}\left( x\right) =b_{F\left( x\right) }\left( k,n-k+1\right). \end{equation*}\] Also, by denoting \(U_{k:n}=F\left( X_{k:n}\right)\), it is straightforward to show that \(U_{k:n}\sim\mathcal{B}et\left( k,n-k+1\right)\) for all \(k\).

15.2.4.2 Limiting Behavior when \(k\) is Fixed and \(n\rightarrow\infty\)

Results similar to those obtained for \(X_{n:n}\) can be established (see, for example, (Galambos 1978) or (Beirlant et al. 2006)). Below, we summarize the most significant facts.

Proposition 15.3 The normalized distribution (using the sequences \((a_n)\) and \((b_n)\)) of the \(k\)th largest value converges to \(G_k\), a non-degenerate distribution, meaning \[ \Pr\left[\frac{X_{n-k+1:n}-a_n}{b_n}\leq x\right]=F_{n-k+1:n}(a_n+b_n x)\rightarrow G_k(x), \] if, and only if, the normalized distribution of the maximum (using the same sequences \((a_n)\) and \((b_n)\)) converges to a GEV distribution with distribution function \(G\). In this case, \[ G_k(x)=G(x) \cdot \left( 1+ \sum_{i=1}^{k-1} \frac{1}{i!} \left(-\log G(x) \right)^i \right). \]

This result can be obtained using the same normalization sequences as for the maximum, and by using the distribution function of the \(k\)th largest value, \(X_{n-k+1:n}\) (obtained in Proposition 15.2).

Example 15.2 For the \(\mathcal{N}or(0,1)\) distribution, by setting \(a_n=\Phi^{-1}(1-1/n)\) and \(b_n=1/a_n\), we obtain \[ \Pr[X_{n:n}\leq a_n+b_n x]\rightarrow \exp(-\exp(-x)), \hspace{2mm}x\geq 0, \] (Gumbel distribution; this result will be detailed in Section 15.4.5). Using this result, for \(k\geq 1\), we obtain that \[ \Pr[X_{n-k+1:n}\leq a_n+b_n x] \] \[ \rightarrow \exp(-\exp(-x))\cdot\left( 1+ \sum_{i=1}^{k-1} \frac{\exp(-ix)}{i!} \right) , \hspace{2mm}x\geq 0. \] For example, for \(k=2\), the limiting distribution is then written as \[ G_2(x)=\exp(-\exp(-x)) + \exp(-\exp(-x)-x) , \hspace{2mm}x\geq 0. \] Also, if \(x\) is sufficiently large, the behavior of the \(2\)nd largest value is close to that of the largest value.

(Beirlant et al. 2006) study more precisely the limiting behavior of order statistics, even if \(k\rightarrow\infty\) but slower than \(n\) (i.e., \(k=o(n)\)).

15.2.5 Random Frequency of Claims

One of the difficulties in insurance is precisely that the number of claims occurring during a period is a random variable. Thus, the sample size \(n\) should be seen as a random variable \(N\).

For this, one can use the so-called Rényi-Mogyoródi method, which allows us to conclude that if a statistic \(T_n\) satisfies \[ U_n(T,a,b)=\frac{T_n-a_n}{b_n}\rightarrow_\text{law} G, \] non-degenerate, as \(n\rightarrow\infty\), for sequences of constants \((a_n)\) and \((b_n)\), and if \(N_t\) is a process independent of the \(X_i\), with \(N_t\rightarrow_\text{probability}+\infty\) as \(t\rightarrow\infty\), then \(U_{N_t}(T,a,b)\) also converges to \(G\).

Relatively general results can be obtained for maxima, with the assumption that \(N_t / t\rightarrow_\text{probability} M\) as \(t\rightarrow\infty\), where \(M\) is a positive random variable. We then have the following result.

Proposition 15.4 If the claim counting process \((N_t)\) is such that \(N_t / t\rightarrow_\text{probability} M\) as \(t\rightarrow\infty\), where \(M\) is a positive random variable, with distribution function \(F_M\), and the variables \(X_i\) are independent and identically distributed, in the max-domain of attraction of a distribution \(H\), with normalization constants \(a_n\) and \(b_n\), then \[ \Pr[X_{N_t:N_t}\leq a_{[t]}+b_{[t]} x]\rightarrow \int_{\mathbb{R}^+} H^m(x)\, dF_M(m) \] as \(t\to +\infty\).

The proof of this result is relatively technical and can be found in (Galambos 1978). The general idea is to work with conditioning (with respect to \(N_t = kt\)), and to use results on limiting behaviors for sequences of exchangeable random variables.

Note that results can also be obtained in the case of sums. This concerns behavior in the tails of composite distributions (introduced in Section 7.3 of Volume 1).

15.3 Tail Thickness of Distributions

In actuarial terminology, a risk that can occur with a low, but not zero, probability, and presents a huge loss (which could be a characteristic of a catastrophe) is often called “fat-tailed.” It seems natural, therefore, that the study of these events can be made from the asymptotic behavior of the tail function.

15.3.1 The Concept of Regular Variation

15.3.1.1 Function with Regular Variation

A function \(h:\mathbb{R}^+\rightarrow \mathbb{R}\) is said to be of regular variation at infinity if, and only if, there exists a function \(g\) such that \[\begin{equation} \lim_{t\rightarrow \infty }\frac{h\left( tx\right) }{h\left( t\right) }=g\left( x\right) \text{ for all }x>0. (\#eq:def-reg_var-dim-1) \end{equation}\]

It can then be noted that for all \(x,y>0\), \[\begin{eqnarray*} \lim_{t\rightarrow \infty }\frac{h\left( txy\right) }{h\left( t\right) } &=&\lim_{t\rightarrow \infty }\frac{h\left( txy\right) }{h\left( tx\right) }\frac{h\left( tx\right) }{h\left( t\right) }\\ &=&\lim_{t\rightarrow \infty }\frac{h\left( txy\right) }{h\left( tx\right) }\lim_{t\rightarrow \infty }\frac{h\left( tx\right) }{h\left( t\right) }\\ &=&g\left( y\right)g\left( x\right) \text{.} \end{eqnarray*}\] In other words, the function \(g\) necessarily satisfies, for all \(x,y>0\), the Cauchy functional equation \[\begin{equation} g\left( xy\right) =g\left( x\right) g\left( y\right) \tag{15.3} \end{equation}\] whose general solution is of the form \(g\left( x\right) =x^{\theta }\) for some \(\theta \in {\mathbb{R}}\).

Definition 15.3 A positive function \(h:\left] 0,+\infty \right[ \rightarrow {\mathbb{R}}\) is said to be of regular variation (at \(+\infty\)) with index \(\alpha\), denoted \(\mathcal{R}(\alpha)\), if \(h\) satisfies \[\begin{equation*} \lim_{x\rightarrow \infty }\frac{h\left( tx\right) }{h\left( x\right) }=t^{\alpha }\text{ for all }t>0. \end{equation*}\] If \(\alpha =0\), then it is called a slowly varying function, i.e., \(h\left( tx\right) /h\left( x\right) \rightarrow 1\) as \(x\rightarrow\infty\) for all \(t\). And if \(\alpha =\infty\), then it is called a rapidly varying function.

Also, trivially, if \(h\) is a function of regular variation with index \(\alpha\), then there exists a slowly varying function \(\mathcal{L}\) such that \(h\left( x\right) =x^{\alpha}\mathcal{L}\left( x\right) .\) Furthermore, it can be noted that if \(\alpha>0\) then \(h(t)\rightarrow + \infty\) as \(t\rightarrow +\infty\), and if \(\alpha<0\) then \(h(t)\rightarrow 0\) as \(t\rightarrow +\infty\). However, no conclusions can be drawn for slowly varying functions.

Among slowly varying functions, constant functions, \(x\mapsto \log x\), iterations of the logarithm (i.e., \(x\mapsto \log\log x\), etc.), and functions of the form \(x\mapsto \exp\left( (\log x)^{\gamma }\right)\) where \(0<\gamma <1\) are noteworthy.

15.3.1.2 Properties of Functions with Regular Variation

Many other properties can be obtained for functions with regular variation.

Proposition 15.5 The notion of regular variation is stable under integration, differentiation, composition, addition, multiplication, and more. Specifically,

  1. If \(f\) is of regular variation with index \(\alpha>-1\), then any primitive of \(f\) will be of regular variation with index \(\alpha+1\).
  2. If \(f\) is of regular variation with index \(\alpha\), then if its derivative is monotone from a certain \(z>0\), its derivative will be of regular variation with index \(\alpha-1\).
  3. If \(f\) is of regular variation with index \(\alpha\), then \(f^\kappa\) will be of regular variation with index \(\alpha\kappa\).
  4. If \(f\) and \(g\) are of regular variation with indices \(\alpha\) and \(\beta\), respectively, \(f+g\) will be of regular variation with index \(\max\{\alpha,\beta\}\), and \(f \circ g\) will be of regular variation with index \(\alpha\beta\).

15.3.1.3 Regular Variation for Random Variables

The following result, sometimes called the Tauberian theorem, links the behavior of the tail distribution to the limiting behavior of its Laplace transform.

Theorem 15.1 A random variable \(X\) with cumulative distribution function \(F\) and Laplace transform \(L_F\) is said to be of regular variation with index \(-\alpha\), where \(\alpha \geq 0\), if one of the following equivalent conditions holds, where \(\mathcal{L}_F\), \(\mathcal{L}_{F^{-1}}\), \(\mathcal{L}_L\), and \(\mathcal{L}_f\) represent slowly varying functions:

  1. The survival function \(\overline{F}\) is of regular variation with index \(-\alpha\), meaning that \(\overline{F}\) has the representation \[ \overline{F}(x)=x^{-\alpha}\mathcal{L}_F\left( x\right). \]
  2. The quantile function, or Value-at-Risk, is of regular variation, defined as \[\begin{equation*} F^{-1}\left(1- 1/x\right) =x^{1/\alpha }\mathcal{L}_{F^{-1}}\left( x\right). \end{equation*}\]
  3. The Laplace transform satisfies \(L_{F}\left( t\right) \sim t^{\alpha }\mathcal{L}_L\left( 1/t\right)\).
  4. If the density exists and satisfies \(xf(x)/\overline{F}(x)\rightarrow \alpha\) as \(x\rightarrow\infty\), then the density is of regular variation with index \(-(1+\alpha)\), \[ f(x)=x^{-(1+\alpha)}\mathcal{L}_f\left( x\right). \]

A proof of this result can be found in (Bingham, Goldie, and Teugels 1989). We emphasize here the connection between the properties of the survival function at \(+\infty\) and the behavior of the Laplace transform at \(0\). Recall that in the collective model case, it can be simpler to work with the Laplace transform. Additionally, note that if \(X\) is a positive random variable of regular variation with index \(-\alpha\), then \(\mathbb{E}[X^\kappa]<\infty\) if \(\kappa < \alpha\) and \(\mathbb{E}[X^\kappa]= \infty\) if \(\kappa > \alpha\).

Example 15.3 For a Pareto distribution, \[ \overline{F}(x)=\left(\frac{\theta}{\theta+x}\right)^\alpha, \text{ for }x\geq 0, \] the survival function is of regular variation with index \(-\alpha\). Furthermore, the quantile function is given by \[ F^{-1}(x)=\theta\cdot\left((1-x)^{-1/\alpha}-1\right), \] which can be written as \[ F^{-1}(1-1/x)=\theta\cdot\left(x^{1/\alpha}-1\right), \] and it is of regular variation with index \(1/\alpha\). Finally, the Laplace transform does not have a simple explicit form. However, an asymptotic expansion can provide an equivalent expression near \(0\).
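
A short numerical illustration of the regular variation of the Pareto survival function of Example 15.3 (the parameter values \(\theta=1\) and \(\alpha=1.5\) are arbitrary):

```python
import numpy as np

alpha, theta = 1.5, 1.0

def surv(x):
    """Pareto survival function (theta / (theta + x))^alpha."""
    return (theta / (theta + x)) ** alpha

# Regular variation: surv(t*x) / surv(x) -> t^(-alpha) as x grows
t = 2.0
for x in (10.0, 100.0, 1_000.0, 10_000.0):
    print(x, surv(t * x) / surv(x), t ** (-alpha))
```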

15.3.1.4 Stability of the Property of Regular Variation by Convolution

Note that the property of regular variation is stable under convolution. That is, if \(\overline{F}_X(x)=x^{-\alpha }\mathcal{L}_{X}(x)\) and \(\overline{F}_Y(y)=y^{-\alpha }\mathcal{L}_{Y}(y)\), where \(X\) and \(Y\) are two independent risks, then \(X+Y\) is of regular variation with index \(-\alpha\), i.e., \[ \Pr\left[ X+Y>s\right] \sim s^{-\alpha }\big( \mathcal{L}_{X}\left( s\right) +\mathcal{L}_{Y}\left( s\right) \big) , \] as \(s\rightarrow\infty\).

This result naturally extends to the sum of \(n\) random variables \(X_1,\ldots,X_n\). More precisely, if \(\overline{F}_{X_i}(x)=x^{-\alpha }\mathcal{L}(x)\) for \(i=1,...,n\), then, denoting \(S_{n}=X_{1}+...+X_{n}\), we have \[ \Pr\left[ S_{n}>x\right] \sim nx^{-\alpha }\mathcal{L}\left( x\right) =n\Pr\left[ X_1>x\right] , \] as \(x\rightarrow \infty .\)

Moreover, if \(X_1,\ldots,X_n\) are independent and identically distributed with a distribution of regular variation, then \[\begin{equation} \Pr\left[ X_{1}+\ldots+X_{n}>x\right] \sim \Pr\left[ \max \left\{ X_{1},...,X_{n}\right\} >x\right] , \tag{15.4} \end{equation}\] as \(x\rightarrow \infty\). This means that the tail distribution of the sum of \(n\) losses corresponds to the tail distribution of the maximum (in this case, we talk about sub-exponential distributions).
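
A Monte Carlo sketch of relation (15.4) and of the equivalent \(\Pr[S_n>x]\sim n\Pr[X_1>x]\), for strict Pareto claims (the sample size, number of simulations and threshold below are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_sim, alpha = 5, 1_000_000, 1.5

# Strict Pareto(alpha) claims on (1, infinity): P[X > x] = x^(-alpha)
x = rng.pareto(alpha, size=(n_sim, n)) + 1.0

s = x.sum(axis=1)    # aggregate loss S_n
m = x.max(axis=1)    # largest loss X_{n:n}

threshold = 50.0
print(np.mean(s > threshold))        # P[S_n > x]
print(np.mean(m > threshold))        # P[max > x]
print(n * threshold ** (-alpha))     # n * P[X_1 > x]
```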

15.3.1.5 Karamata’s Theorem: Integration of Functions of Regular Variation

Karamata’s theorem allows us to obtain results on moments of any order for densities of regular variation. More precisely, we have the following result:

Proposition 15.6 If \(h\) is a slowly varying function and \(\alpha<-1\), then \(\int_{y}^\infty x^\alpha h(x)dx\) converges, and (under some technical conditions), \[ -\frac{y^{\alpha+1} h(y)}{\int_{y}^\infty x^\alpha h(x)dx}\rightarrow \alpha +1,\text{ as } y\rightarrow\infty. \]

15.3.2 Regular Variation and the Max-Domain of Attraction of the Fréchet Distribution

In this section, we will relate the behavior of regular variation to the limiting behavior in terms of the max-domain of attraction, in the case of Fréchet or Gumbel limit distributions.

Proposition 15.7 A cumulative distribution function \(F\) belongs to the max-domain of attraction of the Fréchet distribution with parameter \(\xi>0\) if, and only if, \(\overline{F}\) is a function of regular variation with index \(-1/\xi\), i.e., \(\overline{F}(x)=x^{-1/\xi}\mathcal{L}(x)\), where \(\mathcal{L}\) is a slowly varying function.

This characterization of laws in the max-domain of attraction for Fréchet distributions using regular variation results is unfortunately not generalizable to laws in the max-domain of attraction of the Gumbel distribution. To address Gumbel distributions, second-order conditions are required, which will not be presented here.

15.3.3 Sum, Maximum, and Subexponential Distribution

15.3.3.1 Definition

A particularly useful concept in actuarial science, known as subexponentiality, is defined below.

Definition 15.4 A cumulative distribution function \(F\), with support in \(\mathbb{R}^{+}\), is called subexponential if \[ \lim_{x\rightarrow \infty }\frac{\overline{F}^{* (2)}(x)}{\overline{F}(x)}=2. \] We will denote by \(\mathcal{S}\) the class of subexponential distribution functions.

A recursive argument leads to the following result.

Proposition 15.8 The cumulative distribution function \(F\) with \(F(0)=0\) is subexponential if, and only if, for \(n=2,3,\ldots\) \[\begin{equation} \lim_{x\to\infty}\frac{\overline{F}^{* (n)}(x)}{\overline{F}(x)}=n. \tag{15.5} \end{equation}\]

15.3.3.2 Properties

For any cumulative distribution function \(F\), the following equality holds: \[\begin{equation} \frac{\overline{F}^{* (2)}(x)}{\overline{F}(x)}=1+\int_{0}^{x}\frac{\overline{F}(x-y)}{\overline{F}(x)}\,dF(y) \tag{15.6} \end{equation}\] since \[\begin{eqnarray*} \overline{F}^{* (2)}(x)&=&1-F^{* (2)}(x)=1-\int_{0}^{x}F(x-y)\,dF(y)\\ &=&1-\int_{0}^{x}\,dF(y)+\int_{0}^{x}\overline{F}(x-y)\,dF(y) \\ &=&1-F(x)+\int_{0}^{x}\overline{F}(x-y)\,dF(y). \end{eqnarray*}\] We can state the following result:

Proposition 15.9 If \(F\in\mathcal{S}\), then for any \(y>0\), we have \[\begin{equation} \lim_{x\to\infty}\frac{\overline{F}(x-y)}{\overline{F}(x)}=1 \tag{15.7} \end{equation}\] and \[\begin{equation} \lim_{x\to\infty}\int_{0}^{x}\frac{\overline{F}(x-y)}{\overline{F}(x)}\,dF(y)=1 \tag{15.8} \end{equation}\]

Proof. For any \(y\leq x\), identity (15.6) implies \[ \frac{\overline{F}^{* (2)}(x)}{\overline{F}(x)}=1+\int_{0}^{y}\frac{\overline{F}(x-t)}{\overline{F}(x)}\,dF(t)+\int_{y}^{x}\frac{\overline{F}(x-t)}{\overline{F}(x)}\,dF(t). \] From the inequalities \[ \int_{0}^{y}\frac{\overline{F}(x-t)}{\overline{F}(x)}\,dF(t)\geq F(y) \] and \[ \int_{y}^{x}\frac{\overline{F}(x-t)}{\overline{F}(x)}\,dF(t)\geq \frac{\overline{F}(x-y)}{\overline{F}(x)}(F(x)-F(y)), \] we deduce that \[ 1\leq \frac{\overline{F}(x-y)}{\overline{F}(x)}\leq \bigg(\frac{\overline{F}^{* (2)}(x)}{\overline{F}(x)}-1-F(y)\bigg)\big(F(x)-F(y)\big)^{-1} . \] Since \[ \lim_{x\to\infty}\bigg(\frac{\overline{F}^{* (2)}(x)}{\overline{F}(x)}-1-F(y)\bigg)\Big(F(x)-F(y)\Big)^{-1}=1, \] by the definition of elements in \(\mathcal{S}\), we deduce that (15.7) is satisfied. Relation (15.8) is a direct consequence of (15.6).

15.3.3.3 Connection with the Maximum Distribution

As seen in (15.4), for risks with regular variation, the tail distribution of their sum is equivalent to that of their maximum. This property is characteristic of subexponential distributions, as shown by the following result.

Proposition 15.10 If \(X_1, \ldots, X_n\) are \(n\) independent random variables with the same cumulative distribution function \(F \in \mathcal{S}\), then \[\begin{equation} \lim_{x\to\infty}\frac{\Pr\lbrack X_{n:n}> x \rbrack}{\Pr\lbrack S_n>x \rbrack}=1 \tag{15.9} \end{equation}\]

Proof. First, let’s recall that for any real number \(a\) and any integer \(n \geq 1\), we have the identity: \[ 1-a^n=(1-a)\sum_{k=0}^{n-1}a^{k}. \] Since \(\lim_{x\to\infty}F(x)=1\), we also have for any \(n \geq 1\): \[ \lim_{x\to\infty}\sum_{k=0}^{n-1}F^{k}(x)=n. \] From these two identities, it follows that: \[\begin{eqnarray*} 1&=&\lim_{x\to\infty}\frac{\overline{F}^{* (n)}(x)}{n\overline{F}(x)}\\ &=&\lim_{x\to\infty}\frac{\overline{F}^{* (n)}(x)}{\big(1-F(x)\big)\sum_{k=0}^{n-1}F^{k}(x)}\text{ with }a=F(x)\\ &=&\lim_{x\to\infty}\frac{\overline{F}^{* (n)}(x)}{1-F^{n}(x)}\\ &=&\lim_{x\to\infty}\frac{\Pr\lbrack S_n>x \rbrack}{\Pr\lbrack X_{n:n}> x \rbrack}, \end{eqnarray*}\] so that the reciprocal ratio also tends to \(1\), which proves the stated property.

The behavior in the tails of a sum of subexponential risks is essentially determined by the maximum distribution.

15.3.3.4 High-Risk Index

Here we define a high-risk index that allows for the empirical detection of dangerous situations for insurers.

Definition 15.5 If \(F_{X}\) is a continuous cumulative distribution function with finite expectation \(\mathbb{E}[X]\), then we define the high-risk index as follows: \[ D_{F_{X}}\left( p\right) =\frac{1}{\mathbb{E}\left[ X\right] }\int_{1-p}^{1}F_{X}^{-1}\left( t\right) dt\text{ for }p\in \left[ 0,1\right] \text{.} \]

Note the similarity between the index defined above and the Value-at-Risk (VaR) introduced in Chapter 5. Also note that the curve \(p\mapsto 1-D_{F_{X}}\left( 1-p\right)\) is the Lorenz curve.

The empirical version of \(D_{F_X}(p)\) based on a sample of \(n\) independent random variables with the same distribution \(X_1, X_2, \ldots, X_n\) is then \(T_{n}\left( p\right)\), which represents the proportion of the \(\left[ np\right]\) largest claims relative to the sum of all claims, i.e.: \[ T_{n}\left( p\right) =\frac{X_{n:n}+X_{n-1:n}+\ldots+X_{n-\left[ np\right]+1:n}}{X_{1}+X_{2}+\ldots+X_{n}}\text{ where }\frac{1}{n}<p\leq 1\text{.} \] In fact, \(T_{n}\left( p\right)\) can also be written as \[ T_{n}\left( p\right) =D_{\widehat{F}_{n}}\left( \frac{\left[ np\right] }{n}-\frac{1}{n}\right) . \]

Example 15.4 Figure ?? shows the evolution of \(T_n\) for Danish fire claims and for common parametric distributions (log-normal, exponential, Cauchy, etc.) for \([np]=1,2,3,4\) (from bottom to top). It should be noted that Danish fire claims are indeed high-risk, and the shape of the curves resembles that of the Pareto distribution.
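
A minimal implementation of the empirical index \(T_n\) (applied here to simulated Pareto-type claims, since the Danish data are not reproduced in this sketch):

```python
import numpy as np

def T_n(claims, k):
    """Share of the k largest claims in the total claim amount."""
    x = np.sort(np.asarray(claims))[::-1]    # claims in decreasing order
    return x[:k].sum() / x.sum()

rng = np.random.default_rng(3)
claims = rng.pareto(1.5, size=2_000) + 1.0   # heavy-tailed stand-in for claim data

for k in (1, 2, 3, 4):
    print(k, round(T_n(claims, k), 3))
```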

15.4 Study of the Excess Distribution

The study of the maximum has historically been the first method to analyze extreme events. However, in reinsurance, another point of interest can be the distribution of excesses over a sufficiently high threshold \(u\) (i.e., the distribution of \(X\) given \(X>u\)). This section provides approximations for the excess distribution.

15.4.1 Generalized Pareto Distribution

15.4.1.1 Definition

In this section, we are interested in the behavior of \(X|X>u\) for sufficiently high thresholds \(u\), i.e., as \(u\rightarrow\infty\). This study naturally introduces the Generalized Pareto Distribution (GPD), which is defined as follows:

Definition 15.6 Let \(H\) be the cumulative distribution function defined by \[ H\left( x\right) =\left\{ \begin{array}{ll} 1-\left( 1+\xi x\right) ^{-1/\xi }, & \text{if }\xi \neq 0, \\ 1-\exp \left( -x\right), & \text{if }\xi =0, \end{array} \right. \] for \(x\geq 0\) if \(\xi \geq 0\), and \(0\leq x\leq -1/\xi\) if \(\xi<0\). This cumulative distribution function corresponds to the Generalized Pareto Distribution (GPD) with parameter \(\xi\). A three-parameter version \(\mu\), \(\sigma\), and \(\xi\) of this distribution, denoted as GPD\((\xi,\sigma,\mu)\), is obtained by replacing \(x\) with \(\left( x-\mu \right) /\sigma\); the corresponding cumulative distribution function is denoted as \(H_{\xi,\sigma,\mu}\).

Note that the parameter \(\sigma\) plays a fundamental role here, and we denote \(H_{\xi ,\sigma }\) as the distribution obtained when \(\mu =0\), with the cumulative distribution function being \[ H_{\xi ,\sigma }\left( x\right) =1-\left( 1+\xi \frac{x}{\sigma }\right) ^{-1/\xi }. \] This distribution will also be denoted as GPD\((\xi ,\sigma )\).

15.4.1.2 Properties

If \(Y\sim\mathcal{E}xp(1)\), then \[\begin{eqnarray} X=\frac{\beta}{\xi}\big( \exp(\xi Y)-1 \big) \end{eqnarray}\] follows a GPD\((\xi,\beta)\) distribution. Thus, the GPD can be seen as a log-gamma distribution.

Let \(X\) be a random variable following GPD\((\xi,\beta)\). We have \(\mathbb{E}[X]<\infty\) if and only if \(\xi<1.\) In this case, the mean excess function \(e(u)=\mathbb{E}[X-u\mid X>u]\) and the expectation are given by \[\begin{eqnarray} e(u) &=&\frac{\beta + \xi u}{1-\xi}\\ \mathbb{E}[X] &=&\frac{\beta}{1-\xi}. \end{eqnarray}\] If \(\xi<\frac{1}{2}\), then \(\mathbb{V}[X]\) exists and is equal to \[\begin{eqnarray} \mathbb{V}[X] &=&\frac{\beta^{2}}{(1-\xi)^{2}(1-2\xi)}. \end{eqnarray}\]

More generally, the moments of \(X\sim\text{GPD}(\xi,\sigma)\) can be obtained using the relationships \[ \mathbb{E}\left[\left( 1+\frac{\xi }{\sigma }X\right) ^{-r}\right]=\frac{1}{1+\xi r}\text{ where }r>-1/\xi \text{, } \] and \[ \mathbb{E}\left[X^{r}\right]=\frac{\sigma ^{r}}{\xi ^{r+1}}\frac{\Gamma \left( 1/\xi -r\right) }{\Gamma \left( 1/\xi +1\right) }r!\text{ for $\xi <1/r$ and $r\in {\mathbb{N}}$}. \]
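
The mean and variance formulas can be checked numerically, assuming SciPy is available (`scipy.stats.genpareto` uses the shape \(c=\xi\) and scale \(\sigma\) directly; the parameter values below are arbitrary):

```python
import numpy as np
from scipy.stats import genpareto

xi, sigma = 0.3, 2.0

mean_formula = sigma / (1 - xi)
var_formula = sigma**2 / ((1 - xi)**2 * (1 - 2 * xi))

# SciPy's genpareto: shape c = xi, scale = sigma
mean_scipy, var_scipy = genpareto.stats(xi, scale=sigma, moments="mv")

print(mean_formula, float(mean_scipy))
print(var_formula, float(var_scipy))
```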

15.4.1.3 Stability

The GPD is stable when considering exceedances above a given threshold, as shown in the following result.

Proposition 15.11 If \(X\) is a GPD random variable with cumulative distribution function \(H_{\xi ,\sigma }\), then \(\lbrack X-u\mid X>u \rbrack\) is also GPD with cumulative distribution function \(H_{\xi ,\sigma +\xi u}\) for any threshold \(u>0.\)

Proof. Let’s prove the result for \(\xi\neq 0\). It suffices to write \[\begin{eqnarray*} \Pr \lbrack X-u>x\mid X>u \rbrack &=&\frac{\overline{H}_{\xi ,\sigma }(x+u)} {\overline{H}_{\xi,\sigma}(u)}\\ &=&\Big(1+\xi \frac{x}{\sigma+\xi u}\Big)^{-1/\xi}\\ &=&\overline{H}_{\xi,\sigma+\xi u}(x), \end{eqnarray*}\] which proves the result.

15.4.1.4 GPD and Mixture of Exponentials

By setting \(\alpha=1/\xi\) (this notation will be used throughout this chapter) and \(\beta=\sigma/\xi\), it is possible to rewrite the cumulative distribution function of the GPD (with \(\mu=0\)) as \[ F_{\alpha,\beta}(x)=F(x|\alpha,\beta)=1-\left(1+\frac{x}{\beta}\right)^{-\alpha}, \hspace{2mm}x\geq 0, \] and the probability density function as \[ f(x|\alpha,\beta)=\frac{\alpha}{\beta}\left(1+\frac{x}{\beta}\right)^{-\alpha-1}, \hspace{2mm}x\geq 0. \]

In the case where \(\xi>0\), this last expression can be rewritten (as shown by (Reiss, Thomas, and Reiss 1997)) as a mixture \[ f(x|\alpha,\beta)=\int_0^\infty \theta\exp(-\theta x)\, \pi(\theta|\alpha,\beta)d\theta, \hspace{2mm}x\geq 0, \] where the mixing distribution \[ \pi(\theta|\alpha,\beta)=\frac{\beta^\alpha}{\Gamma(\alpha)}\theta^{\alpha-1}\exp(-\beta\theta), \hspace{2mm}\theta\geq 0, \] corresponds to the density of the Gamma distribution with parameters \(\alpha\) and \(\beta\).

Thus, the Generalized Pareto Distribution with \(\xi>0\) appears as a mixture of exponential distributions (which are in the maximum domain of attraction of the Gumbel distribution), with the parameter following a Gamma distribution. Thus, a mixture of light-tailed distributions can lead to a heavy-tailed distribution.
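
A simulation sketch of this mixture representation (the parameter values are arbitrary): drawing the rate from the Gamma mixing distribution and then an exponential claim with that rate reproduces the Pareto/GPD tail.

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, beta = 1.5, 2.0        # alpha = 1/xi, beta = sigma/xi
n_sim = 1_000_000

# Theta ~ Gamma(alpha, rate beta), then X | Theta ~ Exponential(rate Theta)
theta = rng.gamma(shape=alpha, scale=1.0 / beta, size=n_sim)
x = rng.exponential(scale=1.0 / theta)

# The unconditional tail should match (1 + x/beta)^(-alpha)
for t in (1.0, 5.0, 10.0):
    print(t, np.mean(x > t), (1 + t / beta) ** (-alpha))
```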

15.4.2 Pickands-Balkema-de Haan Theorem

The Generalized Pareto Distribution (GPD) is particularly interesting because it corresponds to the limit distribution of excesses, as shown in the following result (known as the Pickands-Balkema-de Haan theorem).

Theorem 15.2 (Pickands-Balkema-de Haan) The cumulative distribution function \(F\) belongs to the maximum domain of attraction of \(G_{\xi}\) if, and only if, \[ \lim_{u\rightarrow x_{F}}\sup_{0<x<x_{F}}\left\{ \left\vert \Pr\left[ X-u\leq x|X>u\right] -H_{\xi ,\sigma \left( u\right) }\left( x\right) \right\vert \right\} =0, \] where \(\sigma\left( \cdot \right)\) is a positive function.

This fundamental result shows that the tail index \(\xi\) involved in the maximum limit is identical to the parameter \(\xi\) of the GPD describing the behavior of exceedances above a sufficiently high threshold \(u\). This demonstrates the connection between the maximum domain of attraction (and the Generalized Extreme Value distribution of the maximum) and the limiting behavior of the distribution of excesses.

The GPD approximation from Theorem 15.2 is illustrated in Figure @ref(fig:Figure14.9) for Danish fire insurance claims, with a threshold \(u=10\).
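
A minimal peaks-over-threshold sketch in the spirit of this approximation, assuming SciPy is available (the claims are simulated rather than the Danish data, and the threshold is simply taken as an empirical quantile):

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(5)
claims = rng.pareto(1.5, size=5_000) + 1.0   # simulated heavy-tailed claims, true xi = 2/3

# Excesses above a high threshold u
u = np.quantile(claims, 0.95)
excesses = claims[claims > u] - u

# Fit a GPD to the excesses (location fixed at 0); SciPy's shape c equals xi
xi_hat, _, sigma_hat = genpareto.fit(excesses, floc=0)
print(f"u = {u:.2f}, xi = {xi_hat:.3f}, sigma = {sigma_hat:.3f}")
```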

15.4.3 Number of Exceedances

Let’s show that the number of exceedances above a sufficiently high threshold approximately follows a Poisson distribution.

Proposition 15.12 Let \(N_{n}\) be the number of exceedances above the threshold \(u_{n}\) in a sample of size \(n\). If the sequence \((u_{n})\) of thresholds satisfies \[\begin{eqnarray} \lim_{n\to \infty}n\overline{F}(u_{n}) =\tau \tag{15.10} \end{eqnarray}\] then \(N_n\to_{\text{law}}\mathcal{P}oi(\tau)\) as \(n\to +\infty\).

Proof. By definition, \[ N_{n}=\sum_{i=1}^{n}\mathbb{I}\lbrack X_{i}>u_{n}\rbrack\sim\mathcal{B}in\big(n,\overline{F}(u_{n})\big). \] Since \(\mathbb{E}\lbrack N_{n} \rbrack=n\overline{F}(u_{n})\) and \[ \lim_{n \to \infty}\mathbb{E}\lbrack N_{n} \rbrack=\lim_{n \to \infty}n\overline{F}(u_{n})= \tau, \] the classical Poisson approximation of the binomial distribution (with \(n\to\infty\) and success probability \(\overline{F}(u_n)\to 0\)) gives \(N_{n}\to_{\text{law}}\mathcal{P}oi(\tau )\) as \(n\to +\infty\).

Thus, the number of claims above a sufficiently high threshold \(u\) approximately follows a Poisson distribution with mean \(\lambda_u\) given by \(\lambda_u = n\,\overline{H}_{\xi,\sigma}(u)\).

15.4.4 Sample Size with a Poisson Distribution

Often, the sample size needs to be considered as a random variable. Indeed, if we are interested in the largest claim that will affect the portfolio next year, we are actually looking at the maximum of a random number of observations (since the number of claims that will affect the portfolio next year is unknown). In this section, we consider the classic case where the number of observations follows a Poisson distribution.

Proposition 15.13 Consider a sequence \(X_1,X_2,\ldots\) of independent random variables with the same cumulative distribution function \(H_{\xi,\beta}.\) If \(N\) is a random variable with a \(\mathcal{P}oi(\lambda)\) distribution, independent of the \(X_i\), then \(M_{N}=\max\{X_{1},\ldots,X_{N}\}\) follows a GEV\((\mu,\sigma,\xi)\) distribution, where \[ \mu=\beta\xi^{-1}(\lambda ^{\xi}-1)\text{ and }\sigma =\beta\lambda ^{\xi}. \]

Proof. Let’s first consider \(\xi\ne 0\) and determine in this case the probability that the maximum claim \(M_{N}\) does not exceed a given threshold \(x\): \[\begin{eqnarray*} \Pr\lbrack M_{N}\leq x \rbrack &=&\Pr\lbrack \max\{X_{1},\ldots,X_{N}\}\leq x \rbrack \\ & =&\sum_{n=0}^{\infty}\Pr\lbrack \max\{X_{1},\ldots ,X_{n}\}\leq x \rbrack \Pr[N=n]\\ & =&\sum_{n=0}^{\infty}\big(\Pr[X_{1}\leq x]\big)^{n}\Pr\lbrack N=n\rbrack\\ & =&\sum_{n=0}^{\infty}\frac{\big(\lambda H_{\xi,\beta}(x)\big)^n}{n!} \exp\{-\lambda \}\\ & =&\exp\Big(-\lambda \big(1-H_{\xi,\beta}(x)\big)\Big)\\ & =&\exp\Big(-\lambda \Big(1 + \xi\frac{x}{\beta}\Big)^{-1/\xi}\Big)\\ & =&\exp\Bigg(-\bigg( 1+\xi \frac{x-\xi^{-1}\beta(\lambda^{\xi}-1)}{\beta\lambda^{\xi}}\bigg)^{-1/\xi}\Bigg), \end{eqnarray*}\] which proves the result when \(\xi\neq 0\). In the case of \(\xi=0\), it is sufficient to note that \[ \Pr\lbrack M_{N}\leq x\rbrack=\exp\left(-\exp\left(-\frac{1}{\beta}(x-\beta \ln \lambda)\right)\right). \]
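
A quick simulation check of Proposition 15.13, assuming SciPy is available (the parameter values are arbitrary, `scipy.stats.genextreme` uses the shape \(c=-\xi\), and the maximum of an empty sample is set to 0 here):

```python
import numpy as np
from scipy.stats import genpareto, genextreme

rng = np.random.default_rng(6)
xi, beta, lam = 0.4, 1.0, 50.0
n_sim = 50_000

# Maximum of a Poisson(lambda) number of GPD(xi, beta) claims
n_claims = rng.poisson(lam, size=n_sim)
max_claim = np.array([
    genpareto.rvs(xi, scale=beta, size=k, random_state=rng).max() if k > 0 else 0.0
    for k in n_claims
])

# GEV parameters given by Proposition 15.13
mu = beta / xi * (lam**xi - 1)
sigma = beta * lam**xi

for x in (5.0, 10.0, 20.0):
    print(x, np.mean(max_claim <= x), genextreme.cdf(x, -xi, loc=mu, scale=sigma))
```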

15.4.5 Examples of Tail Behavior

15.4.5.1 Exponential Distribution

Consider the \(\mathcal{E}xp(1)\) distribution with the cumulative distribution function \(F(x)=1-e^{-x}\). Setting \(a_n=1\) and \(b_n=\log n\), then \[\begin{eqnarray*} F^n(a_nx+b_n)&=& \left(1-e^{-x-\log n}\right)^n\\ &=&\left(1-{e^{-x}\over n}\right)^n\rightarrow \exp\left(-e^{-x}\right), \end{eqnarray*}\] which means that the limit distribution of the normalized maximum is the Gumbel distribution (with GEV parameter \(\xi=0\)). Also, the distributions in the maximum domain of attraction of the Gumbel distribution are sometimes referred to as exponential-type distributions.

For the threshold approach, note that, \[ \Pr\left[ X>u+y|X>u\right] =\frac{1-F\left( u+y\right) }{1-F\left( u\right) }=\exp(-y), \]% for all \(y>0\). Therefore, the limit distribution is the GPD with \(\xi =0\) and \(\sigma _{u}=1\). Note that in this case, the GPD is not just the limit distribution, but it is the exact distribution for any \(u\). Also, note that for both approaches, we obtain the limiting case of a tail index of zero (\(\xi=0\)).
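
Both limit statements can be checked by simulation (the sample sizes and threshold below are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(7)
n, n_sim = 1_000, 10_000

# Exp(1) samples: M_n - log(n) is approximately Gumbel distributed
x = rng.exponential(size=(n_sim, n))
m = x.max(axis=1) - np.log(n)
z = 1.0
print(np.mean(m <= z), np.exp(-np.exp(-z)))   # empirical vs Gumbel cdf at z

# Excesses above a high threshold u are again exactly Exp(1)
u = 5.0
excesses = x[x > u] - u
print(excesses.mean(), excesses.var())        # both close to 1
```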

15.4.5.2 Pareto Distribution Case

Consider the Pareto distribution with the tail function \(\overline{F}(x)= cx^{-\alpha}\), where \(c>0\), \(\alpha>0\). Let \(b_n=0\) and \(a_n=(nc)^{1/\alpha}\), then for \(x>0\), \[\begin{equation*} F^n(a_nx)= \Big(1-c\left(a_nx\right)^{-\alpha}\Big)^n=\left(1-{x^{-\alpha}\over n}\right)^n\rightarrow\exp\left(-x^{-\alpha}\right), \end{equation*}\] where the limit corresponds to the Fréchet distribution. Also, distributions in the domain of attraction of the Fréchet distribution are sometimes called Pareto-type distributions.

For the threshold approach, let \(\sigma(u)=ub\) where \(b>0\), and denote \(F_u\) as the cumulative distribution function of excesses above the threshold \(u\), such that \[\begin{eqnarray*} F_u(\sigma(u) z)&=&{F(u+ubz)-F(u)\over 1-F(u)}\\ &\sim&{cu^{-\alpha}-c(u+ubz)^{-\alpha}\over cu^{-\alpha}}\\ &=&1-(1+bz)^{-\alpha}. \end{eqnarray*}\] By setting \(\xi={1/\alpha}>0\) and \(b=\xi\), the limit is then the GPD with parameter \(\xi\).

15.4.5.3 Bounded (Right-Tailed) Distribution Case

Finally, consider the \(\mathcal{U}ni(0,1)\) distribution with the cumulative distribution function \(F\left( x\right) =x\) on \([0,1]\). For \(x<0\), assume that \(n>-x\) and denote \(a_{n}=1/n\) and \(b_{n}=1\). Then \[ F^{n}\left( a_{n}x+b_{n}\right) =F^{n}\left( n^{-1}x+1\right) =\left( 1+\frac{x}{n}\right) ^{n}\rightarrow e^{x} \] as \(n\rightarrow\infty\). Thus, the limit distribution is the Weibull distribution with parameter \(\xi =-1\).

In terms of the distribution of excesses, note that \[\begin{eqnarray*} \Pr\left[ X>u+y|X>u\right] &=&\frac{1-F\left( u+y\right) }{1-F\left( u\right) }\\ &=&\frac{1-\left( u+y\right) }{1-u}\\ &=&1-\frac{y}{1-u}, \end{eqnarray*}\] for \(0\leq y\leq 1-u\), which corresponds to the GPD with parameter \(\xi =-1\) (obtained by considering \(\sigma _{u}=1-u\) in the Pickands-Balkema-de Haan theorem).

15.4.5.4 Normal Distribution Case

As usual, let \(\Phi(\cdot)\) denote the cumulative distribution function of the \(\mathcal{N}or(0,1)\) distribution. Since \[ 1-\Phi(x)\sim {1\over x\sqrt{2\pi}}\exp(-x^2/2) \text{ as }x\rightarrow\infty, \] we can deduce that as \(u\rightarrow\infty\), \[\begin{eqnarray*} \frac{1-\Phi(u+z/u)}{1-\Phi(u)} &\sim &\left(1+{z\over u^2}\right)^{-1}\cdot \exp\left( -{1\over 2}\left(u+{z\over u}\right)^2+{1\over 2}u^2\right)\\ &\sim &\exp(-z). \end{eqnarray*}\]

Therefore, considering \(b_n\) as a solution of \(\Phi(b_n)=1-1/n\) and \(a_n=1/b_n\), we obtain \[ n\big(1-\Phi\left(a_nx+b_n\right)\big) ={1-\Phi\left(a_nx+b_n\right)\over 1-\Phi\left(b_n\right)}\rightarrow \exp(-x) \] so that \[ \lim_{n\rightarrow\infty}\Phi^n\left(a_nx+b_n\right) =\lim_{n\rightarrow\infty}\left(1-{\exp(-x)\over n}\right)^n =\exp\left(-\exp(-x)\right), \] meaning that the limit of the maximum is the Gumbel distribution.

In terms of the distribution of excesses, let \(\sigma(u)=1/u\), and then \[ {\Phi(u+\sigma(u) z)-\Phi(u)\over 1-\Phi(u)}\rightarrow 1-\exp(-z) \text{ as }u\rightarrow\infty, \] so the limit distribution corresponds to that obtained in the case of the exponential distribution.

15.4.6 Heavy-Tailed Distributions: the Max-Domain of Attraction of the Fréchet Distribution

The Pickands-Balkema-de Haan theorem states that even if the limit distributions belong to different families, the behavior of the tail and the maximum is similar and only involves a single parameter \(\xi\). More precisely, these two approaches allow distinguishing between two cases, \(\xi=0\) and \(\xi>0\), corresponding respectively to exponential-type tails and to heavy, Pareto-type tails.

Among the distributions belonging to the max-domain of attraction of the Fréchet distribution (\(\xi>0\)), we note:

  1. The Cauchy distribution, with density \(f(x)=(\pi(1+x^2))^{-1}\), \(x\in{\mathbb{R}}\).
  2. \(\alpha\)-stable distributions.
  3. The Pareto distribution for \(\alpha>1\), with density \(f(x)=\alpha x^{-(\alpha+1)}\), \(x>1\).
  4. The log-gamma distribution, with density \[ f(x)=\alpha^{\beta}\Gamma(\beta)^{-1}(\log x)^{\beta-1}x^{-\alpha-1}, \hspace{2mm}x>1. \]
  5. The Student’s t-distribution, with density \[ f(x)=\Gamma((n+1)/2)\left(1+x^2/n\right)^{-(n+1)/2}/\sqrt{n\pi}\Gamma(n/2), \hspace{2mm}x\in{\mathbb{R}}. \]

The characterization of the distributions belonging to the max-domain of attraction of the Fréchet distribution can be done by studying the behavior of the tail function (in terms of regular variation). As noted before, \(F\) belongs to the max-domain of attraction of the Fréchet distribution with parameter \(\xi>0\) if, and only if, \(\overline{F}\) belongs to the class \(\mathcal{R}_{-1/\xi}\) of regularly varying functions of index \(-1/\xi\), meaning \[ \lim_{x\rightarrow\infty}\frac{\overline{F}(tx)}{\overline{F}(x)}=t^{-1/\xi},\hspace{2mm} t>0. \] Also, the tail function decreases as a power function.

The following result completes the description of the max-domain of attraction of the Fréchet distribution.

Proposition 15.14 If \(F\) has \(f\) as its density function and \(r\) as its hazard rate, and \[ \lim_{x\rightarrow\infty}\frac{xf(x)}{\overline{F}(x)}=\lim_{x\rightarrow\infty} xr(x) =\frac{1}{\xi}>0, \] then \(F\) belongs to the max-domain of attraction of the Fréchet distribution.

15.4.7 Thin-Tailed Distributions: the Max-Domain of Attraction of the Gumbel Distribution

If the tail function decreases polynomially for elements in the max-domain of attraction of the Fréchet distribution, it decreases exponentially for elements in the max-domain of attraction of the Gumbel distribution. Among the distributions belonging to the max-domain of attraction of the Gumbel distribution (\(\xi=0\)), we note:

  • The exponential distribution.
  • The gamma distribution.
  • The normal distribution.
  • The log-normal distribution.

Here, we assume that \(x_F=\infty\). The following result completes the description of the max-domain of attraction of the Gumbel distribution.

Proposition 15.15 If \(F\) is continuous on \(]x_0,\infty[\), has density \(f\), and has a hazard rate \(r\), and \[ \lim_{x\rightarrow\infty}\frac{xf(x)}{\overline{F}(x)}=\lim_{x\rightarrow\infty} xr(x) =\infty, \] then \(F\) belongs to the max-domain of attraction of the Gumbel distribution.

15.5 Estimation of Extreme Quantiles

Among all the concepts for evaluating extreme risks, the probabilities of rare events (costs exceeding a very high threshold) or, conversely, quantiles (or VaR) associated with high levels (e.g., \(99\%\), \(99.9\%\), \(99.99\%\)…) are the actuary’s preferred tools. Estimating the tail index \(\xi\) indicating the significance of extreme risks for a given distribution is necessary for estimating these quantities.

15.5.1 Estimation of the Tail Index

15.5.1.1 Using the Generalized Extreme Value Distribution (GEV)

Consider independent and identically distributed random variables \(X_1,...,X_n\), and let \(Y\) be the maximum. If we have a sample of maxima \(Y_1,...,Y_m\), classical methods of distribution fitting can be applied (maximum likelihood, for example, with the log-likelihoods (15.1)-(15.2)).

Example 15.5 In the case of Danish fire insurance claims, the maximum likelihood estimation based on maxima by blocks provides the following estimates: \[ \small{ \begin{tabular}{|l|ccc|}\hline & $\widehat{\mu }$ & $\widehat{\sigma }$ & $\widehat{\xi }$ \\\hline Estimation & 1.483 & 0.593 & 0.917 \\ Standard Deviation & 0.015 & 0.018 & 0.030 \\\hline \end{tabular}} \] Figure ?? provides a detailed view of these results.

15.5.1.2 Using the Generalized Pareto Distribution (GPD)

It is also possible to use the maximum likelihood method to estimate the parameters \(\xi\) and \(\sigma\) of the GPD (Generalized Pareto Distribution) for excesses, invoking the Pickands-Balkema-de Haan theorem.

For \(\xi\neq 0\), the log-likelihood calculated from \(k\) excesses above a sufficiently high threshold \(u\), say \(Y_1,...,Y_k\), is given by \[ L(\sigma,\xi)=-k\log\sigma-\left(1+\frac{1}{\xi}\right)\sum_{i=1}^k\log\left(1+\xi\frac{Y_i}{\sigma}\right), \] provided that \(1+\xi Y_i/\sigma>0\) for all \(i=1,\ldots,k\). Figure @ref(fig:Fig14.11) illustrates the results obtained in this way for Danish insurance claims, depending on the chosen threshold.
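A minimal peaks-over-threshold sketch with `scipy.stats.genpareto` (whose shape parameter `c` is \(\xi\) directly); the simulated data and the threshold are rough illustrative choices, while in practice the threshold would be varied as in the figure referred to above.

```python
# Peaks-over-threshold: keep the excesses above a high threshold u and fit a GPD
# by maximum likelihood (scipy's genpareto shape c corresponds to xi).
import numpy as np
from scipy.stats import genpareto, pareto

rng = np.random.default_rng(2)
losses = pareto.rvs(b=1.5, size=30_000, random_state=rng)    # true xi = 1/1.5

u = np.quantile(losses, 0.95)            # illustrative threshold: empirical 95% quantile
excesses = losses[losses > u] - u

xi_hat, _, sigma_hat = genpareto.fit(excesses, floc=0)       # location fixed at 0 for excesses
print(f"u = {u:.2f}, xi = {xi_hat:.3f} (theory 0.667), sigma = {sigma_hat:.3f}")
```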

15.5.1.3 Pickands, Hill, and Variants Estimators

Most tail parameter estimators rely on the use of order statistics. The Hill estimator is probably the most commonly used estimator. It is based on the idea that if the tail function can be represented as \(\overline{F}(x)=x^{-1/\xi}\mathcal{L}(x)\), then the quantile function can be expressed as \(F^{-1}(1-p)=p^{-\xi}\mathcal{L}^\ast(1/p)\), which can be rewritten as \[ \log F^{-1}(1-p) = -\xi\log p +\log\mathcal{L}^\ast(1/p). \] Since \(X_{n-k+1:n}\) is an estimator of \(F^{-1}(1-k/(n+1))\), it suffices to graphically represent the scatterplot \[ \left(-\log\left(\frac{k}{n+1}\right),\log(X_{n-k+1:n})\right), \hspace{2mm}k=1,...,m, \] which should align along a line with slope \(\xi\). The estimator of the slope is then \[ \widehat{\xi}=\frac{\frac{1}{m}\sum_{j=1}^m \log X_{n-j+1:n} -\log X_{n-m:n} }{\log\frac{m}{n+1}-\frac{1}{m}\sum_{j=1}^m\log\frac{j}{n+1} }. \] For sufficiently large \(m\), the denominator is approximately equal to \(1\), giving the estimator \[ \widehat{\xi}_{n,m}^{Hill}=\frac{1}{m}\sum_{j=1}^m \log X_{n-j+1:n} -\log X_{n-m:n} . \] Figure ?? illustrates this technique for estimating \(\xi\).
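A direct implementation of the Hill estimator above (a sketch on simulated Pareto data; the choice of \(m\) is the delicate point discussed below):

```python
# Hill estimator of the tail index xi, based on the m largest order statistics.
import numpy as np
from scipy.stats import pareto

def hill_estimator(x, m):
    """xi_hat = mean of the logs of the m largest obs, minus log of the (m+1)-th largest."""
    x_sorted = np.sort(x)                 # ascending order
    top = x_sorted[-m:]                   # X_{n-m+1:n}, ..., X_{n:n}
    anchor = x_sorted[-m - 1]             # X_{n-m:n}
    return np.mean(np.log(top)) - np.log(anchor)

rng = np.random.default_rng(3)
losses = pareto.rvs(b=1.5, size=10_000, random_state=rng)    # true xi = 1/1.5
for m in (50, 200, 1000):
    print(m, round(hill_estimator(losses, m), 3))
```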

Two other estimators are also widely used, namely the Pickands estimator \[\begin{equation*} \xi _{n,m}^{Pickands}=\frac{1}{\log 2}\log \frac{X_{n-m:n}-X_{n-2m:n}}{X_{n-2m:n}-X_{n-4m:n}} \end{equation*}\] and the Dekkers-Einmahl-de Haan estimator \[\begin{equation*} \xi _{n,m}^{DEdH}=\xi _{n,m}^{H\left( 1\right) }+1-\frac{1}{2}\left( 1-\frac{\left( \xi _{n,m}^{H\left( 1\right) }\right) ^{2}}{\xi _{n,m}^{H\left( 2\right) }}\right) ^{-1}, \end{equation*}\] where \[\begin{equation} \xi _{n,m}^{H\left( r\right) }=\frac{1}{m}\sum_{i=0}^{m-1}\Big( \log X_{n-i:n}-\log X_{n-m:n}\Big) ^{r}\text{, }r=1,2,\ldots (\#eq:DefxiH(r)) \end{equation}\]
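Sketch implementations of these two estimators, with the same order-statistics conventions (assuming \(4m<n\) for the Pickands estimator); the simulated data are illustrative.

```python
# Pickands and Dekkers-Einmahl-de Haan (moment) estimators of the tail index.
import numpy as np

def pickands_estimator(x, m):
    x_sorted = np.sort(x)
    q1 = x_sorted[-(m + 1)]        # X_{n-m:n}
    q2 = x_sorted[-(2 * m + 1)]    # X_{n-2m:n}
    q4 = x_sorted[-(4 * m + 1)]    # X_{n-4m:n}
    return np.log((q1 - q2) / (q2 - q4)) / np.log(2.0)

def dedh_estimator(x, m):
    x_sorted = np.sort(x)
    log_excess = np.log(x_sorted[-m:]) - np.log(x_sorted[-m - 1])   # log X_{n-i:n} - log X_{n-m:n}
    h1 = np.mean(log_excess)          # xi^{H(1)}, i.e. the Hill estimator
    h2 = np.mean(log_excess ** 2)     # xi^{H(2)}
    return h1 + 1.0 - 0.5 / (1.0 - h1 ** 2 / h2)

rng = np.random.default_rng(4)
x = rng.pareto(1.5, size=20_000) + 1.0    # classical Pareto, true xi = 1/1.5
print(round(pickands_estimator(x, 500), 3), round(dedh_estimator(x, 500), 3))
```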

The main difficulty is in choosing \(m\), the number of order statistics used to estimate \(\xi\). If \(m\) is too large, the estimator uses observations that no longer belong to the tail of the distribution; if \(m\) is too small, the small number of observations makes the estimator unstable.

15.5.1.4 Comparison of Estimators

It is possible to show that these estimators converge as \(n\rightarrow \infty\), \(m\rightarrow \infty\), and \(m/n\rightarrow 0\). Under assumptions of regularity and convergence rate of \(m\) with respect to \(n\), we have the asymptotic normality of the estimators:

\[\begin{equation*} \sqrt{m}\left( \xi _{n,m}^{Pickands}-\xi \right) \rightarrow_{\text{Law}} \mathcal{N}or\left( 0,\frac{\xi ^{2}\left( 2^{2\xi +1}+1\right) }{\left( 2\left( 2^{\xi }-1\right) \log 2\right) ^{2}}\right), \end{equation*}\]

\[\begin{equation*} \sqrt{m}\left( \xi _{n,m}^{Hill}-\xi \right) \rightarrow_{\text{Law}}\mathcal{N}or\left( 0,\xi ^{2}\right)\text{ for }\xi > 0, \end{equation*}\]

\[\begin{equation*} \sqrt{m}\left( \xi _{n,m}^{DEdH}-\xi \right) \rightarrow_{\text{Law}}\mathcal{N}or\left( 0,1+\xi ^{2}\right) \text{ for }\xi \geq 0. \end{equation*}\]

The Hill estimator has a lower asymptotic variance than the other two (for \(\xi > 0\)), which explains why it is the most commonly used.
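The asymptotic normality of the Hill estimator yields an approximate confidence interval of the form \(\widehat{\xi}\pm z_{1-\alpha/2}\,\widehat{\xi}/\sqrt{m}\); a small self-contained sketch (simulated data, illustrative \(m\)):

```python
# Approximate confidence interval for xi from the Hill estimator,
# using sqrt(m) * (xi_hat - xi) -> N(0, xi^2) for xi > 0.
import numpy as np
from scipy.stats import norm

def hill_ci(x, m, level=0.95):
    x_sorted = np.sort(x)
    xi_hat = np.mean(np.log(x_sorted[-m:])) - np.log(x_sorted[-m - 1])   # Hill estimator
    z = norm.ppf(0.5 + level / 2.0)
    half = z * xi_hat / np.sqrt(m)
    return xi_hat, (xi_hat - half, xi_hat + half)

rng = np.random.default_rng(5)
x = rng.pareto(2.0, size=10_000) + 1.0      # classical Pareto with true xi = 0.5
print(hill_ci(x, m=400))
```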

15.5.2 Time and Return Period

Reinsurers often speak of a return period rather than quantiles. This concept originated in hydrology (like many concepts and results in extreme values), and can be defined as follows.

Definition 15.7 Let \(X_{1},X_{2},...\) be the amounts of annual maxima, assumed to be independent and identically distributed. Consider a threshold \(u\) beyond which an observation is considered extreme. The return time is the random variable associated with the first exceedance of the threshold \(u\), that is, \[ N\left( u\right) =\inf \left\{ i\geq 1|X_{i}\geq u\right\} . \] The return period is then defined as the average return time, that is, \(\mathbb{E}[N(u)]\).

Note that the variable \(N(u)\) follows a geometric distribution: indeed, \[ \Pr\left[ N\left( u\right) =k+1\right] =\Pr\left[ X_{1}<u,...,X_{k}<u,X_{k+1}\geq u\right] \] which can be rewritten, thanks to the assumption of independence of annual maxima, \[\begin{eqnarray*} \Pr\left[ N\left( u\right) =k+1\right] &=&\Pr\left[ X_{1}<u\right] \Pr\left[ X_{2}<u\right]\\ &&...\Pr\left[X_{k}<u\right] \Pr\left[ X_{k+1}\geq u\right]\\ &=&\left( 1-p\right) ^{k}p\text{ where } p=\Pr\left[ X\geq u\right] . \end{eqnarray*}\]

Under the classical assumptions of independence and equidistribution, it follows that the return period is \[ \mathbb{E}\left[ N\left( u\right) \right] =\frac{1}{p}=\frac{1}{\Pr\left[ X\geq u\right] }. \] Thus, the threshold \(u\) associated with a hundred-year flood satisfies \(\Pr\left[ X\geq u\right] =1/100,\) meaning that the threshold is \(u={\text{VaR}}[X;1/100]\). Conversely, one can write \[ \mathbb{E}\left[ N\left( {\text{VaR}}\left[ X;\frac{1}{T}\right] \right)\right] =T, \] so the threshold associated with a flood with a return period of \(T\) is the VaR at the probability level \(1/T\).

15.5.3 GPD Approximation for VaR

Let \[ N_u=\sum_{i=1}^n \mathbb{I}[X_i>u], \] the number of exceedances of \(u\) in a sample \(X_1,...,X_n\) of independent and identically distributed random variables with the same distribution function \(F\). For \(x>u\), we have \[\begin{eqnarray*} \overline{F}(x)&=&\Pr[X>u]\Pr[X>x|X>u]\\ &=&\overline{F}(u)\Pr[X>x|X>u]\\ &=&\overline{F}(u)\overline{F}_u(x-u), \end{eqnarray*}\] where \(F_u(t)=\Pr[X-u\leq t|X>u]\) can be approximated by a GPD distribution for appropriate values of \(\xi\) and \(\beta\).

Having estimated the parameters of the GPD, a natural estimator for \(\overline{F}(x)\) is based on the use of an empirical estimator of \(\overline{F}(u)\), and the GPD approximation of \(\overline{F}_u(x-u)\), i.e.,

\[ \widehat{F}(x)=1-\frac{N_u}{n}\left(1+\widehat{\xi}\frac{x-u}{\widehat{\beta}}\right)^{-1/\widehat{\xi}} \]

for all \(x>u\), and \(u\) sufficiently large. Consequently, a natural estimator for \({\text{VaR}}[X;p]\) is \(\widehat{x}_p\) defined by

\[ \widehat{x}_p=u+\frac{\widehat{\beta}}{\widehat{\xi}}\left(\left(\frac{n}{N_u}(1-p)\right)^{-\widehat{\xi}}-1\right) \]

Note that an asymptotic confidence interval can be obtained by considering the profile likelihood method. This technique is illustrated in Figures @ref(fig:Fig14.13)-?? for Danish fire insurance claims.

Remark. Suppose we choose a threshold \(u\) such that \(u=X_{n-k:n}\). The threshold is then the \(k\)-th largest observation. In this case, we have

\[ \widehat{x}_{p,k}=X_{n-k:n}+\frac{\widehat{\beta}_k}{\widehat{\xi}_k}\left(\left(\frac{n}{k}(1-p)\right)^{-\widehat{\xi}_k}-1\right) \]

for \(k>n(1-p)\). It should be noted that if \(k=[n(1-p)]\), \(\widehat{x}_{p,k}\) coincides with the empirical estimator of the quantile, i.e., \(X_{[np+1]:n}\).

Remark. The level \(P_m\) associated with a return period of \(m\) years (i.e., the level that will be exceeded, on average, every \(m\) observations) is the solution to

\[ \Pr[X>u]\left(1+\xi\left(\frac{P_m-u}{\sigma}\right)\right)^{-1/\xi}=\frac{1}{m}. \]

If the excess \(X-u\), given \(X>u\), can be modeled by a GPD, then we have

\[ P_m=u+\frac{\sigma}{\xi}\left(\big(m\Pr[X>u]\big)^\xi-1\right) \]

under the assumption that \(\xi\neq0\), and

\[ P_m=u+\sigma\log(m\Pr[X>u]) \]

if \(\xi=0\).
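A sketch putting together the GPD tail estimator, the VaR estimator \(\widehat{x}_p\) and the return level \(P_m\) above; the simulated data, threshold and levels are illustrative choices, and \(\Pr[X>u]\) is estimated by \(N_u/n\).

```python
# GPD approximation of the tail: estimate VaR at level p and the m-observation
# return level from the excesses over a threshold u.
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(6)
losses = rng.pareto(1.5, size=20_000) + 1.0           # classical Pareto, true xi = 2/3

u = np.quantile(losses, 0.95)
excesses = losses[losses > u] - u
n, n_u = losses.size, excesses.size
xi, _, beta = genpareto.fit(excesses, floc=0)

def var_gpd(p):
    """VaR[X; p] = u + beta/xi * ((n/N_u * (1 - p))**(-xi) - 1)."""
    return u + beta / xi * ((n / n_u * (1.0 - p)) ** (-xi) - 1.0)

def return_level(m):
    """Level exceeded on average once every m observations (case xi != 0)."""
    return u + beta / xi * ((m * n_u / n) ** xi - 1.0)

for p in (0.99, 0.999):
    print(f"VaR at {p}: {var_gpd(p):8.2f}   (true: {(1 - p) ** (-1 / 1.5):8.2f})")
print("100-observation return level:", round(return_level(100), 2))
```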

15.5.4 Hill Estimator for VaR

Recall that if \(\overline{F}(x)=x^{-1/\xi}\mathcal{L}(x)\) with \(\xi>0\), then for \(x\geq X_{n-k:n}\) and sufficiently small \(k\), we have

\[ \frac{\overline{F}(x)}{\overline{F}(X_{n-k:n})}=\frac{\mathcal{L}(x)}{\mathcal{L}(X_{n-k:n})} \left(\frac{x}{X_{n-k:n}}\right)^{-1/\xi}. \]

If we assume that the ratio of slowly varying functions is approximately equal to \(1\), then

\[ \overline{F}(x) \sim \overline{F}(X_{n-k:n})\left(\frac{x}{X_{n-k:n}}\right)^{-1/\xi_{n,k}^{Hill}}. \]

Also, a natural estimator of the distribution function is

\[ \widehat{F}(x)=1-\frac{k}{n}\left(\frac{x}{X_{n-k:n}}\right)^{-1/\xi _{n,k}^{Hill}}, \]

for \(x\geq X_{n-k:n}\). By considering the inverse of this function, we obtain the following natural estimation of VaR:

\[ \widehat{x}_p^{Hill}=X_{n-k:n}\left(\frac{n}{k}(1-p)\right)^{-\xi _{n,k}^{Hill}}, \]

for quantiles \(x_p\) such that \(p>1-k/n\). Note that this estimator can also be written as

\[ \widehat{x}_{p,k}^{Hill}=X_{n-k:n}+X_{n-k:n}\left(\left(\frac{n}{k}(1-p)\right)^{-\xi _{n,k}^{Hill}}-1\right); \]

in this latter form, it can be compared to the estimator obtained by maximum likelihood on the GPD model.

Recall that the Hill estimator is only relevant if \(\xi>0\) (and not if \(\xi=0\)). Therefore, the approach described above is only applicable if \(\xi>0\). Additionally, the finite-sample properties of the Hill estimator (especially for small samples) are relatively disappointing. However, the properties of the Hill estimator have been much more studied in the literature than those of the maximum likelihood estimator.

Example 15.6 Consider the case of using the Hill estimator for losses with an exponential distribution, a log-normal distribution, and a Pareto distribution (only the latter case corresponds to \(\xi>0\)). For the first two cases, the theoretical values are \(\xi=0\), \({\text{VaR}}[X;0.99]= 4.6\), and \({\text{VaR}}[X;0.999]= 6.9\) for the exponential distribution, \(\xi=0\), \({\text{VaR}}[X;0.99]= 10.2\), and \({\text{VaR}}[X;0.999]= 22\) for the log-normal distribution. The results are visible in Figures ??, ?? and ??.
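A sketch of the Hill-based quantile estimator above, applied to simulated Pareto losses (the only case of Example 15.6 with \(\xi>0\)); sample size and \(k\) are arbitrary.

```python
# Hill-based VaR estimator: x_p = X_{n-k:n} * (n/k * (1 - p))**(-xi_hill).
import numpy as np

rng = np.random.default_rng(7)
alpha = 1.5                                    # Pareto tail index, true xi = 1/alpha
losses = rng.pareto(alpha, size=20_000) + 1.0  # classical Pareto on [1, infinity)

k = 500
x_sorted = np.sort(losses)
x_k = x_sorted[-k - 1]                                          # X_{n-k:n}
xi_hill = np.mean(np.log(x_sorted[-k:])) - np.log(x_k)          # Hill estimator

for p in (0.99, 0.999):
    var_hill = x_k * (losses.size / k * (1.0 - p)) ** (-xi_hill)
    print(f"p = {p}:  Hill VaR = {var_hill:8.2f}   true VaR = {(1 - p) ** (-1 / alpha):8.2f}")
```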

Remark. Hill or Pickands estimators are based on the use of the \(k\) largest observations. As asymptotic results are obtained when \(k\rightarrow\infty\) (with \(k/n\rightarrow0\)), one seeks an optimal function \(k^\ast(n)\): (De Haan and Peng 1998) suggested retaining, as the optimal value \(k^\ast\), the value that asymptotically minimizes the mean squared error (MSE), that is

\[ k^\ast(n)=1+n^{2 \xi /(2\xi+1)}\cdot \left(\frac{(1+\xi)^2}{2\xi}\right)^{1 /(2\xi+1)}\text{ if }\xi\in]0,1[, \]

and

\[ k^\ast(n)=2 n^{2/3}\text{ if }\xi\in]1,\infty[. \]
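A direct transcription of this rule (a sketch; it presupposes a preliminary value of \(\xi\), which in practice would itself have to be estimated):

```python
# Optimal number of order statistics k*(n), following the rule stated above.
def k_star(n, xi):
    if 0.0 < xi < 1.0:
        return 1 + n ** (2 * xi / (2 * xi + 1)) * ((1 + xi) ** 2 / (2 * xi)) ** (1 / (2 * xi + 1))
    if xi > 1.0:
        return 2 * n ** (2 / 3)
    raise ValueError("rule stated only for xi in ]0,1[ or ]1,infinity[")

print(round(k_star(10_000, 0.5)), round(k_star(10_000, 1.5)))
```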

15.5.5 Cumulative Risk, Extremes, and Compound Distributions

Several reinsurance coverages allow for the risk to be covered by event, not just by policy. Therefore, for an event (e.g., a storm or a flood), we denote \(N\) as the number of policies affected, and \(X_i\) as the amount of the \(i\)-th claim. The following two results can be obtained in this context (see, for example, (Embrechts, Klüppelberg, and Mikosch 1997)).

Proposition 15.16

  1. If \(N \sim \mathcal{P}oi(\lambda)\) and \(G \sim \mathcal{CP}oi(\lambda, F)\), then \(\overline{G}\) is of regularly varying index \(\alpha\) if and only if \(\overline{F}\) is of regularly varying index \(\alpha\). Moreover, in this case, \[ \lim_{x \rightarrow \infty} \frac{\overline{G}(x)}{\overline{F}(x)} = \lim_{x \rightarrow \infty} \frac{\Pr[X_1 + \ldots + X_N > x]}{\Pr[X_i > x]} = \lambda. \]
  2. If \(N\) is a counting random variable, and \(G\) represents the associated compound distribution, with \(\overline{F}\) being subexponential, then \(\overline{G}\) is also subexponential. Furthermore, \[ \lim_{x \rightarrow \infty} \frac{\overline{G}(x)}{\overline{F}(x)} = \lim_{x \rightarrow \infty} \frac{\Pr[X_1 + \ldots + X_N > x]}{\Pr[X_i > x]} = \mathbb{E}[N], \] provided that there exists \(\varepsilon > 0\) such that \[ \sum_{n=1}^\infty (1+\varepsilon)^n \cdot \Pr[N=n] < \infty. \]

15.6 Multivariate Extreme Value Theory

Chapter 8 of Volume 1 introduced various tools for modeling multiple risks. Potential dependence among extremes should be considered since these events often have a significant impact on a company’s results.

One challenge in analyzing these risks is that there is no unique way to define multivariate extremes, largely because there is no “natural” order relationship in \({\mathbb{R}}^n\).

15.6.1 Componentwise Maxima

The extension of extreme value theory to the multivariate setting was developed in the late 1950s. A clear and detailed presentation is provided by (Beirlant et al. 2006). We study the joint distribution of componentwise maxima \(\left( X_{n:n}, Y_{n:n} \right)\). Similar to the univariate case, we assume the existence of normalization parameters \(\alpha_{X,n}\), \(\alpha_{Y,n}\), \(\alpha'_{X,n}\), \(\alpha'_{Y,n} > 0\), and \(\beta_{X,n}\), \(\beta_{Y,n}\), \(\beta'_{X,n}\), \(\beta'_{Y,n}\) such that \[\begin{eqnarray*} &&\Pr\left[ \frac{X_{n:n}-\beta_{X,n}}{\alpha_{X,n}}\leq x, \frac{Y_{n:n}-\beta_{Y,n}}{\alpha_{Y,n}}\leq y \right]\\ &=& F_{X,Y}^{n}\left( \alpha_{X,n}x+\beta_{X,n}, \alpha_{Y,n}y+\beta_{Y,n} \right)\\ &\rightarrow& G\left( x,y \right),\\ \end{eqnarray*}\] as \(n \rightarrow \infty\), where \(G\) is a bivariate cumulative distribution function with non-degenerate marginals. Just like in the univariate case, \(G\) must satisfy max-stability conditions: for all \(n \geq 1\), there exist \(a_{X,n}, a_{Y,n} > 0\) and \(b_{X,n}, b_{Y,n}\) such that \[ G^{n}\left( a_{X,n}x+b_{X,n}, a_{Y,n}y+b_{Y,n} \right) = G\left( x,y \right). \]

15.6.2 Expression of Limit Distributions

Many results obtained in the univariate case remain valid, especially that the limit distribution does not depend on the normalization. Consider a sample \(\left( X_1, Y_1 \right), \ldots, \left( X_n, Y_n \right), \ldots\) of independent pairs with the same cumulative distribution function \(F_{X,Y}\) and normalization coefficients \(\alpha_{X,n}\), \(\alpha_{Y,n}\), \(\alpha'_{X,n}\), \(\alpha'_{Y,n}\), \(\beta_{X,n}\), \(\beta_{Y,n}\), \(\beta'_{X,n}\), \(\beta'_{Y,n}\) such that \[ \left\{ \begin{array}{l} F_{X,Y}^{n}\left( \alpha_{X,n}x+\beta_{X,n}, \alpha_{Y,n}y+\beta_{Y,n} \right) \rightarrow G\left( x,y \right)\\ F_{X,Y}^{n}\left( \alpha'_{X,n}x+\beta'_{X,n}, \alpha'_{Y,n}y+\beta'_{Y,n} \right) \rightarrow G'\left( x,y \right), \end{array} \right. \] as \(n \rightarrow \infty\), where \(G\) and \(G'\) are two non-degenerate cumulative distribution functions. Then, the marginal distributions \(G\) and \(G'\) are unique up to an affine transformation, i.e., there exist \(\alpha_X, \alpha_Y\), \(\beta_X, \beta_Y\) such that \[ G_X(x) = G'_X(\alpha_X x+\beta_X) \text{ and } G_Y(y) = G'_Y(\alpha_Y y+\beta_Y). \] Furthermore, the copulas of \(G\) and \(G'\) are identical, i.e., \(C_G=C_{G'}\). From this property, we can derive the concept of max-domain of attraction for copulas.

Definition 15.8 The pair \((X,Y)\) follows an extreme value law if, and only if, \[ \left\{ \begin{array}{l} \Pr\left[ X\leq x\right] =\exp \left( -1/x\right) ,\quad x>0, \\ \Pr\left[ Y\leq y\right] =\exp \left( -1/y\right) ,\quad y>0, \\ G\left( x,y\right) =\exp \left( -V\left( x,y\right) \right),\quad x,y>0, \\ \end{array} \right. \] where \[ V\left( x,y\right) =2\int_{0}^{1}\max \left\{ \frac{\omega }{x},\frac{1-\omega }{y}\right\} dH\left( \omega \right), \] and \(H\) is a cumulative distribution function on \([0,1]\) with mean \(1/2\).

Example 15.7 If \(H\) is the cumulative distribution function associated with a Dirac mass at \(\omega = 1/2\), then \[ G\left( x,y\right) =\exp \left( -\max \left\{ x^{-1},y^{-1}\right\} \right) \text{, }x,y>0, \] corresponding to a pair of comonotonic random variables with Fréchet marginal distributions (with parameter \(1\)).

Example 15.8 If \(H\) has a density \(h\) on \(]0,1[\), defined as \[ h\left( \omega \right) =\frac{1}{2}\left( \alpha ^{-1}-1\right) \big( \omega \left( 1-\omega \right) \big) ^{-1-1/\alpha }\big( \omega ^{-1/\alpha }+\left( 1-\omega \right) ^{-1/\alpha }\big) ^{\alpha -2}, \] with \(0<\alpha<1\), then \(V(x,y)=\left(x^{-1/\alpha}+y^{-1/\alpha}\right)^{\alpha}\), and \(G\) corresponds to the logistic (or Gumbel) model; independence is obtained as \(\alpha\rightarrow1\) and comonotonicity as \(\alpha\rightarrow0\).

More generally, the function \(V\) can be written in terms of a dependence function \(A\), \[ V(x,y)=\left(\frac{1}{x}+\frac{1}{y}\right)A\left(\frac{y}{x+y}\right), \] where \(A\), defined on \([0,1]\), satisfies \(A(0)=A(1) =1\), and \[\begin{equation} \max \left\{ \omega ,1-\omega \right\} \leq A\left( \omega \right) \leq 1,\quad \omega \in \left[ 0,1\right] , \tag{15.11} \end{equation}\] with \(A\) being a convex function. The lower bound in (15.11) corresponds to comonotonicity, and the upper bound corresponds to independence.

15.6.3 Estimation of the Dependence Function

Several estimators can be considered to estimate the dependence function, including the Pickands and Deheuvels estimators, to name a few: \[ A_{Pickands}(\omega) = n\left\{\sum_{i=1}^n \min\left(\frac{X_{i}}{\omega}, \frac{Y_{i}}{1-\omega}\right)\right\}^{-1} \] and \[\begin{eqnarray*} A_{Deheuvels}(\omega) &=& n\Big\{\sum_{i=1}^n \min\left(\frac{X_{i}}{\omega}, \frac{Y_{i}}{1-\omega}\right) - \omega\sum_{i=1}^n X_{i}\\ &&- (1-\omega) \sum_{i=1}^n Y_{i} +n\Big\}^{-1}. \end{eqnarray*}\]
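A sketch of these two estimators follows. It assumes the observations have already been brought to standard exponential margins (for an extreme value pair with unit Fréchet margins as in Definition 15.8, one can take the reciprocals \(1/X_i\), \(1/Y_i\), or in practice use \(-\log\) of the empirical ranks); this marginal standardization is an assumption of the sketch, not spelled out in the text.

```python
# Pickands and Deheuvels estimators of the dependence function A(omega),
# for pairs (x, y) assumed to have standard exponential margins, 0 < omega < 1.
import numpy as np

def pickands_A(x, y, omega):
    m = np.minimum(x / omega, y / (1.0 - omega))
    return x.size / m.sum()

def deheuvels_A(x, y, omega):
    n = x.size
    m = np.minimum(x / omega, y / (1.0 - omega))
    return n / (m.sum() - omega * x.sum() - (1.0 - omega) * y.sum() + n)

def to_exponential_margins(z):
    # rank-transform to (0,1), then -log gives (approximately) standard exponential margins
    ranks = np.argsort(np.argsort(z)) + 1.0
    return -np.log(ranks / (z.size + 1.0))
```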

15.6.4 Copulas of Multivariate Extreme Value Distributions

Any dependence function \(A\) generates a copula as indicated in the following result.

Proposition 15.17 Let \(A\) be a dependence function, then \(C\) defined by \[ C(u,v) = \exp\left(\left(\log u+\log v\right)A\left(\frac{\log u}{\log u+\log v}\right)\right) \] is a copula.

Example 15.9 If \(A(\omega)=\left((1-\omega)^\theta+\omega^\theta\right)^{1/\theta}\) with \(\theta\geq1\), then \(C\) corresponds to the Gumbel copula.

Example 15.10 If \(A\left( \omega \right) =\max \left\{ 1-\alpha \omega ,1-\beta \left( 1-\omega \right) \right\}\), where \(0\leq \alpha ,\beta\leq 1\), then \(C\) corresponds to the Marshall and Olkin copula.

15.6.5 Correlation Coefficient

Most measures of concordance can be characterized from the dependence function \(A\). In particular, Kendall’s \(\tau\) is given by \[ \tau = \int_0^1\frac{\omega(1-\omega)}{A(\omega)}dA'(\omega) \] and Spearman’s \(\rho\) is given by \[ \rho = 12\int_0^1\frac{d\omega}{(A(\omega)+1)^2}-3. \]
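A numerical illustration of these two formulas for the dependence function of Example 15.9 (Gumbel), assuming \(A\) is twice differentiable so that \(dA'(\omega)=A''(\omega)d\omega\); for the Gumbel copula, Kendall's \(\tau\) is known to equal \(1-1/\theta\), which provides a check (the value of \(\theta\) is arbitrary).

```python
# Kendall's tau and Spearman's rho computed from the dependence function A (Gumbel case).
import numpy as np
from scipy.integrate import quad

theta = 2.0   # Gumbel copula parameter (illustrative)

def A(w):
    return (w ** theta + (1.0 - w) ** theta) ** (1.0 / theta)

def A_second(w, h=1e-5):
    # central-difference second derivative (A is smooth on ]0,1[ for this theta)
    return (A(w + h) - 2.0 * A(w) + A(w - h)) / h ** 2

tau, _ = quad(lambda w: w * (1.0 - w) / A(w) * A_second(w), 0.0, 1.0)
rho, _ = quad(lambda w: 12.0 / (A(w) + 1.0) ** 2, 0.0, 1.0)
rho -= 3.0

print(f"tau = {tau:.4f}   (closed form 1 - 1/theta = {1 - 1 / theta:.4f})")
print(f"rho = {rho:.4f}")
```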

15.6.6 Comparison of Dependence

The dependence function \(A\) determines the degree of association between the two components of the pair, as shown in the following result.

Proposition 15.18 Let \(C_1\) and \(C_2\) be two extreme value copulas with dependence functions \(A_1\) and \(A_2\) respectively. Then, denoting \(\preceq_{\text{sm}}\) the supermodular order introduced in Section 8.4, \(C_1\preceq_{\text{sm}}C_2\) if, and only if, \(A_1(\omega)\geq A_2(\omega)\) for all \(0\leq\omega\leq 1\) (a smaller dependence function thus corresponds to stronger dependence).

15.6.7 Tail Dependence Coefficient

Let’s define the coefficient \(\lambda\) for strong tail dependence.

Definition 15.9 For a random pair \((X,Y)\), we define the lower and upper tail dependence coefficients as follows: \[ \lambda _{L}=\lim_{u\rightarrow 0}\Pr\left[ X\leq F_{X}^{-1}\left( u\right) |Y\leq F_{Y}^{-1}\left( u\right) \right], \] and \[ \lambda _{U}=\lim_{u\rightarrow 1}\Pr\left[ X>F_{X}^{-1}\left( u\right) |Y>F_{Y}^{-1}\left( u\right) \right], \] when the limits exist.

These coefficients can be naturally introduced as follows: let \[ \theta(x)=\frac{\log\Pr[\max\{X,Y\}\leq x]}{\log\Pr[X\leq x]}, \] then \[ \lambda_U=2-\lim_{x\rightarrow\infty}\theta(x), \] since, as \(x\rightarrow\infty\), \[\begin{eqnarray*} 2-\frac{\log\Pr[\max\{X,Y\}\leq x]}{\log\Pr[X\leq x]}&\sim& \frac{\Pr[X>x,Y>x]}{\Pr[X>x]}\\ &=&\Pr[Y>x|X>x]. \end{eqnarray*}\]

Note that these coefficients depend only on the copula of the pair \((X,Y)\) and not on the marginal distributions: if \((X,Y)\) has the copula \(C\), the tail dependence coefficients are defined, provided the limit exists, by \[ \lambda_L=\lim_{u\rightarrow 0}\frac{C(u,u)}{u}\text{ and } \lambda_U=\lim_{u\rightarrow 1}\frac{\overline{C}(u,u)}{1-u}, \] where \(\overline{C}\) denotes the survival copula associated with \(C\) (i.e., the cumulative distribution function of the pair \((1-U,1-V)\) if \((U,V)\) has the cumulative distribution function \(C\)).
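A quick numerical check of the upper tail dependence coefficient for the Gumbel copula, for which \(\lambda_U=2-2^{1/\theta}\) is a classical closed form; the survival-copula diagonal is computed as \(\Pr[U>u,V>u]=1-2u+C(u,u)\), and the parameter value is arbitrary.

```python
# lambda_U = lim_{u -> 1} Pr[U > u, V > u] / (1 - u) for the Gumbel copula.
import numpy as np

theta = 2.0

def gumbel_copula(u, v):
    return np.exp(-((-np.log(u)) ** theta + (-np.log(v)) ** theta) ** (1.0 / theta))

for u in (0.99, 0.999, 0.9999):
    ratio = (1.0 - 2.0 * u + gumbel_copula(u, u)) / (1.0 - u)
    print(f"u = {u}:  ratio = {ratio:.4f}")
print("closed form 2 - 2**(1/theta) =", round(2.0 - 2.0 ** (1.0 / theta), 4))
```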

::: {.remark}[Weak Tail Dependence, \(\eta\)]

In many cases, tail independence (\(\lambda=0\)) will hold. Therefore, it becomes particularly difficult to compare different types of dependence. Another approach that can be considered is as follows: consider a pair \((X,Y)\) whose marginal distributions are Fréchet distributions with parameter \(1\), and assume that \[\begin{equation} \Pr[X>t,Y>t]\sim \mathcal{L}(t)\cdot \Big(\Pr[X>t]\Big)^{1/\eta}, \hspace{2mm}t\rightarrow\infty, \tag{15.12} \end{equation}\] where \(\mathcal{L}\) is a slowly varying function, and \(\eta\in]0,1]\) is called the weak tail dependence coefficient. We can then distinguish the following cases:

  • If \(\eta=1\) and \(\mathcal{L}\rightarrow c>0\), the tails are asymptotically dependent (positive tail dependence, \(\lambda>0\)),
  • If \(1/2<\eta<1\) and \(\mathcal{L}\rightarrow c>0\), the tails are more dependent than in the independent case,
  • If \(\eta=1/2\), the variables are asymptotically independent (\(\lambda=0\)),
  • If \(0<\eta<1/2\), the tails are less dependent than in the independent case.

Note that, here too, the tail coefficient \(\eta\) depends only on the copula.
:::

15.6.8 Application in Reinsurance, Cost vs. Expenses

15.6.8.1 Context

Consider an excess-of-loss reinsurance contract (applied claim by claim) with a retention (or priority) of \(R\). As long as the actual cost (\(C\)) of a claim does not exceed \(R\), the loss remains entirely the responsibility of the insurer (including expenses), and if it exceeds \(R\), the reinsurer reimburses the excess amount (\(C-R\)), as well as a fraction of the expenses (\(F\)), on a pro-rata basis. It can also be assumed that the reinsurer wants to limit its liability by capping the indemnities associated with the actual cost at \(L\). Formally, the amount paid by the reinsurer is expressed, based on the actual costs and expenses, as follows: \[ g(C,F)=\left\{ \begin{array}{l} 0,\text{ if } C\leq R,\\ (C-R)+\frac{C-R}{C}\cdot F,\text{ if }R<C\leq L,\\ (L-R)+\frac{L-R}{C}\cdot F,\text{ if }L<C.\\ \end{array} \right. \]

The pure premium for this type of contract is then given by \[ \pi=\nu\cdot \mathbb{E}[N]\cdot \mathbb{E}[g(C,F)] \] where \(\nu\) is the number of policies in the portfolio, and \(\mathbb{E}[N]\) is the average number of claims per policy.

Figure ?? shows the costs of claims and the associated expenses for 1500 liability claims in the United States. Note that we omit in this section the fact that some observations were censored, as the coverage ceiling was reached for the (actual) cost of claims.

15.6.8.2 Marginal Models for Costs and Expenses

First, it is necessary to model the marginal distributions of the actual cost of claims on one side and the incurred expenses on the other. Table 15.3 describes the fitting with common distributions (Weibull, Lognormal, and Pareto). As noted by (Klugman and Parsa 1999) and (Frees and Valdez 1998), fitting with Pareto distributions may be appropriate.

Table 15.3: Estimation of parameters (by maximum likelihood) for different marginal distributions (Pareto, Weibull, and Lognormal)
| Distribution | Actual costs \(\alpha\) | Actual costs \(\beta\) | Incurred expenses \(\alpha\) | Incurred expenses \(\beta\) |
|---|---|---|---|---|
| Pareto \(\mathcal{P}ar(\alpha,\beta)\) | 14453 | 1.135 | 15133 | 2.223 |
| Weibull \(\mathcal{W}ei(\alpha,\beta)\) | 0.644 | 24740 | 0.753 | 9694 |
| Lognormal \(\mathcal{LN}or(\alpha,\beta)\) | 9.322 | 1.609 | 8.502 | 1.413 |

15.6.8.3 Global Modeling of Dependence

Figure ?? shows the copula representation of the actual cost - incurred expenses pair.

Figure ?? represents a modeling with generalized Pareto distributions for the distribution tails and a Gumbel-type dependence structure. The joint cumulative distribution function is then given by \[\begin{eqnarray*} F(x,y)&=&\exp\left( -\left( \left( -\ln\left(1-\left(1+\xi_X\frac{x-u_X}{\sigma_X}\right)_+^{-1/\xi_X}\right) \right)^{1/\alpha}\right.\right.\\ &&+\left.\left.\left( -\ln\left(1-\left(1+\xi_Y\frac{y-u_Y}{\sigma_Y}\right)_+^{-1/\xi_Y}\right) \right)^{1/\alpha} \right)^\alpha \right). \end{eqnarray*}\]

15.6.8.4 Modeling Dependence in the Upper Tail

In the same way as in the univariate case, it is possible to use the block maxima method to study extremes. For this, from the 1500 joint observations, 50 blocks of size 30 are constructed, and we attempt to model the dependence structure of these 50 block maxima.

Figure ?? shows the estimation of the dependence function \(A\), based on \(50\) block maxima and \(100\) block maxima (shown on the left). On the right is the estimation of the dependence function using the Gumbel copula (solid line), together with two non-parametric estimators of the dependence function.

15.6.8.5 Calculation of Reinsurance Contract Premiums

For simplicity, let’s assume that the upper limit of the reinsurer’s intervention is infinite \((L=+\infty)\). First, note that the indemnity function \(g\) is then supermodular. If the cost-expense pair is positively quadrant-dependent, we can apply Tchen’s theorem (Section 8.4.2 of Volume 1), and deduce \[ \mathbb{E}[g(C^\perp,F^\perp)]\leq \mathbb{E}[g(C,F)]. \] In other words, assuming both components are independent would tend to underestimate the pure premium. Table 15.4 thus compares the pure premiums of reinsurance contracts for different dependence structures, depending on the deductible \(R\).

Table 15.4: Reinsurance contract premium amounts, depending on the dependence structure.
R Independence Gumbel Comonotonicity
10,000 33309 36766 39963
50,000 19108 21228 23271
100,000 12403 13801 15408
500,000 1800 1875 2308
1,000,000 805 850 985
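A Monte Carlo sketch of the inequality stated above, comparing \(\mathbb{E}[g(C,F)]\) under independence and under comonotonicity; the Pareto margins and their parameters are purely illustrative (they are not the fitted values of Table 15.3), and the Gumbel case would additionally require a copula sampler.

```python
# Pure premium E[g(C, F)] for the per-claim cover described above,
# under independent and comonotonic cost/expense pairs (illustrative Pareto margins).
import numpy as np

rng = np.random.default_rng(8)
R, L = 10_000.0, 500_000.0          # retention and cap
n_sim = 200_000

def pareto_quantile(u, shape, scale):
    # Pareto with survival function (scale / (scale + x))**shape
    return scale * ((1.0 - u) ** (-1.0 / shape) - 1.0)

def g(c, f):
    paid_cost = np.clip(c - R, 0.0, L - R)
    return np.where(c > R, paid_cost + paid_cost / c * f, 0.0)

u_c = rng.uniform(size=n_sim)
u_f_indep = rng.uniform(size=n_sim)

costs = pareto_quantile(u_c, shape=1.2, scale=15_000.0)               # hypothetical parameters
expenses_indep = pareto_quantile(u_f_indep, shape=2.2, scale=15_000.0)
expenses_comon = pareto_quantile(u_c, shape=2.2, scale=15_000.0)      # same uniform: comonotonic

print("independence  :", round(np.mean(g(costs, expenses_indep)), 0))
print("comonotonicity:", round(np.mean(g(costs, expenses_comon)), 0))
```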

It is also possible to compare pure premiums with given marginal structures (Pareto distributions) using a Gumbel copula (as done by (Frees and Valdez 1998)) or using the empirical copula. Note that the differences are relatively small. Table 15.5 shows the evolution of the pure premium as a function of \(R\) (\(L\) is assumed to be fixed at 500,000).

Table 15.5: Pure premium of reinsurance contracts with a logistic dependence structure (Gumbel copula) and the empirical version
R Gumbel Empirical
0 49367 49482
125000 17457 17604
250000 9234 9273
375000 4007 4082

15.7 Standard Reinsurance Treaties

15.7.1 Reinsurance

While insurance is relatively old, the first reinsurance transaction appears to date back to the 17th century when an insurer of a ship had ceded part of its commitments to another insurer. In 2000, the number of reinsurers was less than 200, with a net written premium volume of around \(93\) billion dollars (Munich Re and Swiss Re each representing \(15\) billion), of which \(80\%\) is dedicated to non-life insurance. The top \(15\) reinsurers represented \(80\%\) of the premiums, with total equity exceeding \(160\) billion (largely due to the third-largest global reinsurer, Berkshire Hathaway, with an equity/premium ratio exceeding \(4.7\), equivalent to \(50\) billion dollars of equity).

Two forms of reinsurance are generally distinguished:

  1. Proportional reinsurance, where premiums and claims are allocated between the insurer and the reinsurer according to a contractual ratio.
  2. Non-proportional reinsurance, where the amount paid by the reinsurer depends contractually on a threshold (called a priority), below which the insurer will settle all claims, and the reinsurer commits to take on claims exceeding the threshold.

15.7.2 Proportional Reinsurance

Two types of contracts are distinguished: the quota-share treaty and the surplus treaty.

15.7.2.1 Surplus Treaty

For each risk \(i=1,...,n\), a retention rate \(0\leq a_{i}\leq 1\) is associated, and the corresponding premiums and claims are defined for the retained risk and the ceded risk as follows:

\[ \vline% \begin{array}{rccc}\hline \vline & \text{Total Risk} & \text{Retained Risk} & \text{Ceded Risk} \\ \hline \text{Premiums }\vline & P=\sum_{i=1}^{n}P_{i} & \sum_{i=1}^{n}a_{i}P_{i} & \sum_{i=1}^{n}\left( 1-a_{i}\right) P_{i} \\ \text{Claims }\vline & S=\sum_{i=1}^{n}S_{i} & \sum_{i=1}^{n}a_{i}S_{i} & \sum_{i=1}^{n}\left( 1-a_{i}\right) S_{i}\\ \hline \end{array}% \vline \]

This is a proportional treaty in the sense that, regardless of the size of the claim, the reinsurer pays the insurer a portion of the claim corresponding to the proportion of premium they received on the policy. This form of reinsurance is called individual because reinsurance is determined contract by contract and not at the portfolio level.

In practice, there is often a maximum sum insured per policy accepted by the insurer, called the underwriting limit \(K_{i}\). The insurer specifies the maximum commitment per claim and per risk, called the retention \(R_{i}\). The surplus treaty has a capacity \(C_{i}\), which is most often expressed as a multiple of retentions, called “lines” (\(C_{i}=\lambda _{i}K_{i}\)).

The retention rate is then defined as:

\[ 1-a_{i}=\frac{\min \left\{ C_{i},\max \left\{ 0,K_{i}-R_{i}\right\} \right\} }{K_{i}}. \]

15.7.2.2 Quota-Share Treaty

In a quota-share treaty, the retention rate (denoted as \(a\)) is constant for all policies. The retained risk and the ceded risk are as follows:

\[ \vline% \begin{array}{rccc}\hline \vline & \text{Total Risk} & \text{Retained Risk} & \text{Ceded Risk} \\ \hline \text{Premiums }\vline & P=\sum_{i=1}^{n}P_{i} & \sum_{i=1}^{n}aP_{i}=aP & \sum_{i=1}^{n}\left( 1-a\right) P_{i}=\left( 1-a\right) P \\ \text{Claims }\vline & S=\sum_{i=1}^{n}S_{i} & \sum_{i=1}^{n}aS_{i}=aS & \sum_{i=1}^{n}\left( 1-a\right) S_{i}=\left( 1-a\right) S\\ \hline \end{array}% \vline \]

There are two types of quota-share treaties: the general quota-share, or pure participation, which applies to the entire insurer’s portfolio, and the retention quota-share, which applies to a proportional program.

15.7.2.3 Practical Aspects of Proportional Reinsurance

While the theoretical operation of this coverage may seem simple, in practice, several charges need to be considered, which can sometimes be significant. In particular, since the reinsurer typically has fewer claims handling expenses than the insurer, the reinsurer pays a reinsurance commission, which can be a function of the claims-to-premium ratio, the type of risk, the type of treaty, etc. The commission rate can vary between 20% and 40%.

In cases where the results are relatively good for the reinsurer, they may be required to return a portion of their profit through profit-sharing.

15.7.3 Non-Proportional Reinsurance

15.7.3.1 Excess of Loss (XL) by Risk

The mechanism of excess loss is similar to that of a policy with a mandatory deductible: the reinsurer pays the insurer as soon as the claim exceeds a certain level, the priority \(\pi_i\), and within the limit of an amount specified in the treaty, the layer \(M_i - \pi_i\). Therefore, \(M_i\) is the treaty cap.

If the annual amount of claims can be written as

\[ S_i = \sum_{j=1}^{N_i} Y_{ij} \]

where \(N_i\) is the number of claims related to risk \(i\), and \(Y_{ij}\) is the amount of the \(j\)-th claim related to risk \(i\), then the corresponding premiums and claims for the retained risk and the ceded risk are determined as follows:

\[ \vline% \begin{array}{rccc}\hline \text{ }\vline & \text{Total Risk} & \text{Retained Risk} & \text{Ceded Risk} \\ \hline \text{Premiums }\vline & P=\sum_{i=1}^{n}P_{i} & \left( 1-Q\right) P & QP \\ \text{Claims }\vline & S=\sum_{i=1}^{n}S_{i} & \sum_{i=1}^{n}\overline{S}_{i} & \sum_{i=1}^{n}\left( S_{i}-\overline{S}_{i}\right) \\\hline \end{array}% \vline \]

where

\[ \left( S_{i}-\overline{S}_{i}\right) =\sum_{j=1}^{N_{i}}\min \left\{ \left( Y_{ij}-\pi _{i}\right)_{+},M_{i}-\pi_{i}\right\} \]

and where the coefficient \(Q\) is determined by the contract.

It is possible that \(\pi _{i}\) and \(M_{i}\) are the same for different risks. For example, if \(a\) is the priority (or deductible) of the excess loss and \(b\) is the layer (the coverage offered by the reinsurer), this treaty is denoted as \(b\) XS \(a\) or \(layer\) XS \(priority\).

The most common method for determining the reinsurance premium, including the coefficient \(Q\), is to use the “Burning Costs,” which are the amounts of claims relative to premiums in the past. However, caution is advised, and factors such as inflation indices and changing conditions should be considered. Risks in progress and claims not yet reported must also be taken into account.

Similar to proportional reinsurance, the insurer’s commitments are covered by the reinsurer within a non-proportional reinsurance program composed of several stacked excess loss layers. These are referred to as layers of the program.

The equilibrium of excess loss is expressed, among other things, by the rate on line or the payback period:

\[ \text{Rate on Line} = 100 \times \frac{\text{reinsurance premium}}{\text{layer coverage}} \]

and

\[ \text{Pay Back} = \frac{100}{\text{Rate on Line}}. \]

Example 15.11 Consider exponential costs with parameter \(\lambda\). The pure premium of an excess loss treaty with a priority of \(d>0\) and infinite coverage is given by \[ \mathbb{E}[(X-d)_+] =\frac{1}{\lambda}\exp\left(- \lambda \cdot d\right). \] Note that for Pareto claims, whose tail distribution function is \(\Pr[X>x]=(\theta/(\theta+x))^\alpha\), the same premium becomes \[ \mathbb{E}[(X-d)_+] =\frac{\theta+d}{\alpha -1}\left(\frac{\theta}{\theta+d}\right)^{\alpha}, \text{ for }\alpha>1. \]

Suppose the considered premium is not the pure premium but a Wang premium, for example, a PH-transform premium (with distortion operator \(g(x)=x^{1/\rho}\), as presented in Section 5.2.6). Then the premium is written as \[ \pi(g,d)=\int_d^\infty g(\overline{F}(t))dt=\int_d^\infty \overline{F}(t)^{1/\rho}dt. \] In the case of exponential claims, the premium is then given by \[ \pi(g,d)=\frac{\rho}{\lambda}\exp\left(- \frac{\lambda \cdot d}{\rho}\right), \] and for Pareto claims,

\[ \pi(g,d)=\frac{\theta+d}{\alpha/\rho -1}\left(\frac{\theta}{\theta+d}\right)^{\alpha/\rho}, \text{ for }\alpha>\rho. \]
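A quick numerical check of the four closed-form premiums above, obtained by direct integration of \(\overline{F}(t)\) and of \(\overline{F}(t)^{1/\rho}\) beyond the priority; the parameter values are arbitrary (with \(\alpha>\rho\)).

```python
# Check the closed-form XL pure premiums and PH-transform premiums by numerical integration.
import numpy as np
from scipy.integrate import quad

lam, alpha, theta, rho, d = 0.5, 2.5, 10.0, 1.5, 8.0   # illustrative values

surv_exp = lambda t: np.exp(-lam * t)                   # exponential survival function
surv_par = lambda t: (theta / (theta + t)) ** alpha     # Pareto survival function

checks = {
    "exp, pure":    (quad(surv_exp, d, np.inf)[0],
                     np.exp(-lam * d) / lam),
    "Pareto, pure": (quad(surv_par, d, np.inf)[0],
                     (theta + d) / (alpha - 1) * (theta / (theta + d)) ** alpha),
    "exp, PH":      (quad(lambda t: surv_exp(t) ** (1 / rho), d, np.inf)[0],
                     rho / lam * np.exp(-lam * d / rho)),
    "Pareto, PH":   (quad(lambda t: surv_par(t) ** (1 / rho), d, np.inf)[0],
                     (theta + d) / (alpha / rho - 1) * (theta / (theta + d)) ** (alpha / rho)),
}
for name, (numeric, closed) in checks.items():
    print(f"{name:12s} numeric = {numeric:10.5f}   closed form = {closed:10.5f}")
```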

15.7.3.2 Aggregate Loss or Stop-Loss Reinsurance

The non-proportional treaty of the stop-loss type aims to cover the insurer against very high claims and an accumulation of claims.

In the case of an aggregate loss, we look at the cumulative claims for a year. The priority \(T\) and the coverage \(U\) are set at the portfolio level. Then, the premiums and corresponding claims for the retained risk and the ceded risk are determined as follows: \[ \vline% \begin{array}{rccc}\hline \text{ }\vline & \text{Total Risk} & \text{Retained Risk} & \text{Ceded Risk} \\ \hline \text{Premiums }\vline & P=\sum_{i=1}^{n}P_{i} & \left( 1-Q\right) P & QP \\ \text{Claims }\vline & S=\sum_{i=1}^{n}S_{i} & \overline{S} & S-\overline{% S} \\\hline \end{array}% \vline \]% where \[ \overline{S}=\min \left\{ S,T\right\} +\left( \left( S-T\right)_{+}-U\right)_{+} \] and \[ S-\overline{S}=\min \left\{ \left( S-T\right)_{+},U\right\}. \] In the case of a stop-loss treaty, the priority and coverage are generally expressed as a percentage of the insurer’s premium.

This type of treaty allows the insurer to hedge against deviations in its claims-to-premium ratio. However, three major reasons make this type of treaty rarely used:

  1. As soon as the treaty’s priority is reached, the insurer is no longer involved in the worsening or not of the claims experience, making these treaties subject to moral hazard.
  2. Pricing these treaties is challenging.
  3. If the priority is not indexed to inflation, the ceded risk can increase significantly.

15.7.3.3 ECOMOR Treaties

ECOMOR treaties (from the French “excédent du coût moyen relatif”) are a type of excess loss treaty, but their priority is not fixed in advance: it corresponds to the cost of the \(k\)th largest claim. The reinsurer will then pay \[\begin{eqnarray*} R&=&\sum_{i=1}^{N}\left( X_{N-i+1:N}-X_{N-k+1:N}\right) _{+}\\ &=&\sum_{i=1}^{k-1}X_{N-i+1:N}-\left( k-1\right) X_{N-k+1:N}\text{.} \end{eqnarray*}\]

15.7.3.4 Practical Aspects of Non-Proportional Reinsurance

For risks that develop slowly, and for which court decisions or cost inflation can increase the final cost, a “stabilization clause” maintains the risk-sharing proportions that prevailed when the treaty was written: the additional burden due to this drift is then shared between the insurer and its reinsurer. In the same vein, “indexation clauses” allow the thresholds to be adjusted for inflation.

15.7.4 Pricing of Non-Proportional Treaties

Two methods will be presented here: the “Burning Cost” method and the “Pareto Law” method. But before that, let us recall some exploratory statistical tools used to visualize tail behavior.

15.7.5 Quantile-Quantile (Q-Q) Plots

Quantile-quantile (Q-Q) plots are used to graphically test the fit of a family of distributions to data. While historically introduced for normal distributions, note that they can also be useful for a wide range of parametric families.

Example 15.12 For the exponential model, where \(F(x)=1-\exp(-\lambda x)\) for \(x\geq 0\), the quantile function is given by \[\begin{equation*} F^{-1}(u)=-\frac{1}{\lambda }\log (1-u),\hspace{2mm}u\in ]0,1[. \end{equation*}\] The idea is to check whether the points \((-\log (1-u),\widehat{F}_n^{-1}(u))\) are aligned for different values of \(u\). Thus, we graphically represent the points \((-\log (1-u_{i:n}),\widehat{F}_{n}^{-1}(u_{i:n}))\) for \(i=1,...,n\), where \(u_{i:n}=i/( n+1)\). If the exponential model is valid, the points should be aligned, and the slope of the line provides an estimate of \(1/\lambda\).
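A minimal sketch constructing the exponential and Pareto Q-Q plot coordinates (those of Table 15.6 below) and estimating the slope by least squares, here on simulated exponential data; the actual plotting is left out.

```python
# Q-Q plot coordinates for the exponential and Pareto models,
# with a least-squares slope as a rough parameter estimate.
import numpy as np

rng = np.random.default_rng(9)
lam = 0.2
x = rng.exponential(scale=1.0 / lam, size=2_000)

x_sorted = np.sort(x)
u = np.arange(1, x.size + 1) / (x.size + 1.0)
qq_x = -np.log(1.0 - u)

# exponential Q-Q plot: (-log(1 - u_{i:n}), X_{i:n}); slope estimates 1/lambda
slope_exp = np.polyfit(qq_x, x_sorted, 1)[0]
print("estimated 1/lambda:", round(slope_exp, 3), " (true:", 1 / lam, ")")

# Pareto Q-Q plot: (-log(1 - u_{i:n}), log X_{i:n}); slope estimates xi
slope_par = np.polyfit(qq_x, np.log(x_sorted), 1)[0]
print("slope of the Pareto Q-Q plot (approx. xi):", round(slope_par, 3))
```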

Table 15.6 lists the functions to study for drawing Q-Q plots. Figure ?? illustrates the use of QQ plots based on Danish fire claims data and three sets of simulated data.

Table 15.6: Q-Q plot representation for some parametric distributions
Distribution Q-Q plot coordinates
Normal \(\left( \Phi ^{-1}\left( u_{i:n}\right) ,X_{i:n}\right)\)
Lognormal \(\left( \Phi ^{-1}\left( u_{i:n}\right) ,\log X_{i:n}\right)\)
Exponential \(\left( -\log \left( 1-u_{i:n}\right) ,X_{i:n}\right)\)
Pareto \(\left( -\log \left( 1-u_{i:n}\right) ,\log X_{i:n}\right)\)
Weibull \(\left( \log \left[ -\log \left( 1-u_{i:n}\right) \right] ,\log X_{i:n}\right)\)


15.7.6 Mean Excess Function

Studying the conditional distribution of \(X\) given the event \(\{X>x\}\) can be particularly interesting for reinsurance coverage. The mean excess of loss \(e(t)=\mathbb{E}[X-t|X>t]\), defined in Volume I, is very useful in this regard and is often referred to as the mean excess function.

We graphically represent pairs \((X_{n-k:n},e_{k:n})\) where \[\begin{equation*} e_{k:n}=\frac{1}{k}\sum_{i=1}^kX_{n-i+1:n}-X_{n-k:n} \end{equation*}\] represents the empirical version of the mean excess function at the point \(X_{n-k:n}\). Figure ?? illustrates empirical mean excess functions for Danish fire claims and three sets of simulated data.


::: {.example}[The Burning Cost Method]

This method involves using an empirical estimator of the pure premium for the reinsurer. For an excess-of-loss contract with retention \(R\), where the pure premium is \(\mathbb{E}\left[(X-R)_+\right]\), the empirical version is \[ \widehat{\Pi}(R)=\frac{1}{n}\sum_{i=1}^n(X_i-R)_+, \] which can be expressed simply if the retention is not too high, and if one can find a \(k\) such that \(R=X_{n-k:n}\): \[ \widehat{\Pi}(R)=\frac{k}{n}e_{k:n}. \] :::
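A sketch computing the empirical mean excess function and the Burning Cost premium, and checking the identity \(\widehat{\Pi}(R)=\frac{k}{n}e_{k:n}\) when \(R=X_{n-k:n}\) (simulated data, illustrative \(k\)):

```python
# Empirical mean excess function and Burning Cost premium for an XL treaty.
import numpy as np

rng = np.random.default_rng(10)
x = rng.pareto(1.8, size=5_000) + 1.0        # illustrative heavy-tailed losses
x_sorted = np.sort(x)
n = x.size

k = 250
threshold = x_sorted[-k - 1]                                    # X_{n-k:n}
e_k = np.mean(x_sorted[-k:]) - threshold                        # empirical mean excess e_{k:n}

burning_cost = np.mean(np.clip(x - threshold, 0.0, None))       # (1/n) * sum of (X_i - R)_+
print("k/n * e_{k:n}:", round(k / n * e_k, 4))
print("Burning Cost :", round(burning_cost, 4))
```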

15.7.7 The Lorenz Curve

Chapter 5 of Volume 1 presented tools for comparing risks, particularly order relations based on VaR and TVaR. In the context of extreme risk analysis, another commonly used order is the Lorenz order.

Assuming that the claim costs have finite expectations, the concentration function is defined on \([0,1]\) as: \[\begin{eqnarray*} L\left( p\right) &=&\frac{1}{\mathbb{E}[X]}\int_{0}^{{\text{VaR}}[X;p]}xdF\left( x\right)\\ &=&\frac{1}{\mathbb{E}[X]}\int_{0}^{p}{\text{VaR}}[X;u] du. \end{eqnarray*}\]

Noting that \(F(x)\) is simply the proportion of claims with costs less than \(x\), \(L(p)\) can be interpreted as the contribution of the smallest \(p\) proportion of claims to the total claim cost. It follows that \(L\) is increasing and convex on \(]0,1[\). This function is also known as the Lorenz curve, and it is widely used in economics as a measure of concentration.

Example 15.13 For some common distributions, the Lorenz curve sometimes has a simple expression. Specifically, for the exponential distribution \(\mathcal{E}xp(\lambda)\), \[ L\left( p \right) = (1-p )\log \left( 1-p \right) +p \] which does not depend on \(\lambda\), and \[ L(p)=1-\left( 1-p\right)^{1-1/\alpha } \] for the Pareto distribution \(\mathcal{P}ar(\alpha,\theta)\). Finally, for the log-normal distribution \(\mathcal{LN}or(\mu,\sigma)\), \[ L(p)=\Phi \left( \Phi ^{-1}\left( p\right) -\sigma \right). \]
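A quick numerical check of the exponential case: \(L(p)=\frac{1}{\mathbb{E}[X]}\int_0^p{\text{VaR}}[X;u]\,du\) is computed by quadrature and compared with the closed form \((1-p)\log(1-p)+p\) (the value of \(\lambda\) is arbitrary, as it should be).

```python
# Lorenz curve of the exponential distribution: numerical integral of the quantile
# function versus the closed form (1 - p) log(1 - p) + p, which is free of lambda.
import numpy as np
from scipy.integrate import quad

lam = 0.4
quantile = lambda u: -np.log(1.0 - u) / lam     # VaR[X; u] for Exp(lambda)
mean = 1.0 / lam

for p in (0.5, 0.9, 0.99):
    numeric = quad(quantile, 0.0, p)[0] / mean
    closed = (1.0 - p) * np.log(1.0 - p) + p
    print(f"p = {p}:  numeric = {numeric:.5f}   closed form = {closed:.5f}")
```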

15.7.8 Approximation of Pure Premium

Let’s calculate the pure premium for an excess-of-loss (XS) treaty with infinite scope: \[ \pi(d)=\int_d^\infty \overline{F}(t)dt. \] If \(\overline{F}(x)=x^{-1/\xi}\mathcal{L}(x)\), the premium \(\pi(d)\) can be rewritten as: \[ \pi(d)=\int_d^\infty t^{-1/\xi}\mathcal{L}(t)dt. \] If \(\xi<1\), the Karamata theorem allows us to obtain: \[ \pi(d)\sim \frac{d^{1-1/\xi}\mathcal{L}(d)}{1/\xi-1}, \] as \(d\rightarrow \infty\). An approximation of this quantity is then: \[ \pi(d)\sim \frac{1}{1/\xi-1}d\cdot \overline{F}(d), \] which can be estimated by considering: \[ \widehat{\pi}(d)\sim \frac{1}{1/\widehat{\xi}-1}d\cdot \widehat{p}_d, \] where \(\widehat{p}(d)\) is an estimator of \(\Pr[X>d]\).

Remark. If the calculation of the pure premium is performed using the approximation: \[ \pi(d)\sim \pi^\ast(d)= \frac{d^{1-1/\xi}\mathcal{L}(d)}{1/\xi-1}, \] it is possible to show that: \[ \frac{\pi(d)}{\pi^\ast(d)}=1+\left(\frac{1}{\xi}-1\right)\int_{1}^\infty t^{-1/\xi}\left(\frac{\mathcal{L}(dt)}{\mathcal{L}(d)}-1\right)dt. \] Potter's bounds (which can be found, among other places, in (Bingham, Goldie, and Teugels 1989)) control the integrated term, and the dominated convergence theorem then shows that this ratio indeed tends to 1 as \(d\rightarrow\infty\).

15.7.9 Approximation of a Wang Premium

For premiums obtained through distortion, as studied in Volume I, of the form: \[ \pi(g,d)=\int_{d}^{\infty} g(\overline{F}_X(t))dt, \] analogous relationships can be derived. It suffices to assume that \(g(1/\cdot)\) is also regularly varying, i.e., \(g(x)=x^{\beta}\mathcal{L}'(1/x)\) for some \(\beta>0\). Using results on the composition of functions with regular variation, it follows that: \[ g(\overline{F}(x))=x^{-\beta/\xi}\mathcal{L}''(x). \] Still invoking the Karamata theorem (for \(\beta/\xi>1\)), it follows that: \[ \pi(g,d)\sim \frac{d^{1-\beta/\xi}\mathcal{L}''(d)}{\beta/\xi-1}\sim\frac{1}{\beta/\xi-1}d\cdot g(\overline{F}(d)). \] And in the same manner, this quantity can be estimated by considering: \[ \widehat{\pi}(g,d)\sim \frac{1}{\beta/\widehat{\xi}-1}d\cdot g(\widehat{p}_d). \]
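A numerical illustration of the two Karamata-type approximations above, for Pareto losses with \(\overline{F}(x)=(\theta/(\theta+x))^{\alpha}\) (so \(\xi=1/\alpha\)) and the PH distortion \(g(x)=x^{1/\rho}\) (so \(\beta=1/\rho\)); the exact premiums are computed by quadrature and compared with \(\frac{1}{1/\xi-1}\,d\,\overline{F}(d)\) and \(\frac{1}{\beta/\xi-1}\,d\,g(\overline{F}(d))\). Parameter values are arbitrary.

```python
# Karamata-type approximations of the XL pure premium and of a PH-distorted premium,
# checked against numerical integration for Pareto losses (xi = 1/alpha, beta = 1/rho).
import numpy as np
from scipy.integrate import quad

alpha, theta, rho = 2.5, 10.0, 1.5
xi, beta = 1.0 / alpha, 1.0 / rho

surv = lambda t: (theta / (theta + t)) ** alpha

for d in (50.0, 500.0, 5_000.0):
    exact_pure = quad(surv, d, np.inf)[0]
    approx_pure = d * surv(d) / (1.0 / xi - 1.0)
    exact_ph = quad(lambda t: surv(t) ** (1.0 / rho), d, np.inf)[0]
    approx_ph = d * surv(d) ** (1.0 / rho) / (beta / xi - 1.0)
    print(f"d = {d:7.0f}  pure: {exact_pure:.4e} ~ {approx_pure:.4e}"
          f"   PH: {exact_ph:.4e} ~ {approx_ph:.4e}")
```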

15.7.10 Estimation of TVaR

The estimation of TVaR for Danish fire losses is illustrated in Figure ??.

15.7.11 Index-Based Coverage and Securitization

Among the various techniques used to cover large risks, securitization has become, in recent years, a preferred tool for covering natural catastrophe risks.

Most of the largest catastrophes (in terms of insured amounts) have a natural origin. Among the most significant, Hurricane Andrew devastated parts of Florida in 1992, causing over 20 billion dollars in damages. Most of the centennial catastrophe scenarios exceed these amounts in the United States. The capitalization of non-life insurance companies in the United States was \(200\) billion dollars in 2000. Considering that these amounts represent barely 1% of the capitalization of the American financial markets, it may seem interesting to offer financial products that can cover these risks.

Investing in this type of risk can be particularly attractive for an investor because it can provide an interesting source of diversification, as catastrophe risks are generally independent of market risks. Note that participating in the capital of an insurance company does not represent true diversification for an investor because the results of an insurance company depend heavily on financial markets (given the very large number of financial securities held by the company). Securitization can then be a solution, allowing the complete separation of insurance risk from other risks.

Among all the solutions proposed, Catastrophe Bonds, or Cat Bonds, have been the most widely used. The transfer of natural catastrophe risks is done through bond issuance: if the catastrophe occurs, bondholders lose coupons and/or capital in favor of the insurer.

The bond, issued at a variable rate, is placed through a special purpose vehicle (SPV), which holds the funds and provides a traditional reinsurance cover to the insurer. In the event of a catastrophe, the collected funds are transferred to the insurer and the SPV stops repaying the bondholders.

15.8 Insolvency and Large Risks

In Chapter 7, we presented a number of analytical results, such as Lundberg’s inequality, which provides an upper bound on the probability of long-term insolvency (Property 7.3.24). However, all the results were obtained by restricting ourselves to the case where the cumulative distribution functions of claim costs satisfied the Cramér condition, meaning they had a finite moment-generating function \(M_X(t)\) for some \(t>0\). This excludes Pareto distributions, but also log-normal distributions.

15.8.1 Probability of Ruin in the Presence of Large Losses: von Bahr’s Approximation

We assume, as in Chapter 7, that the occurrence process is a Poisson process with a parameter of \(\lambda\). We denote \(c\) as the instantaneous premium rate, \(\kappa\) as the initial capital, and \(\mu\) as the average claim cost. Then (Theorem 7.3.13 from Volume 1), the probability of ruin \(\psi(\kappa)\) is given by \[\begin{equation} \psi(\kappa)=\left(1-\frac{\lambda\mu}{c}\right)\sum_{n=0}^\infty \left(\frac{\lambda\mu}{c}\right)^n \left(1-\tilde{F}^{\ast (n) }(\kappa)\right), \tag{15.13} \end{equation}\] where \(\tilde{F}\) is the cumulative distribution function defined as \[ \tilde{F}(x)=\int_{y=0}^x \frac{\overline{F}(y)}{\mu}dy, \hspace{2mm}x>0. \]

We can obtain the following result by assuming that \(\tilde{F}\) is subexponential.

Proposition 15.19 If \(\tilde{F}\) is a subexponential distribution, then \[ \psi (\kappa)\sim \left(\frac{c}{\lambda \mu}-1\right)^{-1} \big(1-\tilde{F}(\kappa)\big), \] as \(\kappa\rightarrow\infty\).

Proof. The formula (15.13) allows us to write, for \(\kappa>0\), \[ \frac{\psi(\kappa)}{1-\tilde{F}(\kappa)}=\left(1-\frac{\lambda\mu}{c}\right)\sum_{n=0}^\infty \left(\frac{\lambda\mu}{c}\right)^n \frac{1-\tilde{F}^{\ast (n) }(\kappa)}{1-\tilde{F}(\kappa)}. \] Using the subexponential property of \(\tilde{F}\), note that \[ 1-\tilde{F}^{\ast (n) }(\kappa) \sim n \cdot (1-\tilde{F}(\kappa)) \] as \(\kappa\rightarrow\infty\), and thus \[ \lim_{\kappa\rightarrow\infty} \frac{\psi(\kappa)}{1-\tilde{F}(\kappa)} =\left(1-\frac{\lambda\mu}{c}\right)\sum_{n=0}^\infty n \cdot \left(\frac{\lambda\mu}{c}\right)^n = \left(\frac{c}{\lambda \mu}-1\right)^{-1}. \]

The above-established approximation is known as von Bahr’s approximation.

15.8.2 Using the Pollaczek-Khinchine-Beekman Formula

We can also arrive at this result as follows. If \(Z_1,Z_2,\ldots\) are independent random variables with a subexponential cumulative distribution function \(G\), and if \(K\) is a counting variable independent of the \(Z_i\) such that \(\varphi_K(z)=\mathbb{E}[z^K]<+\infty\) for certain values of \(z>1\), then \[ \frac{\Pr\left[\sum_{i=1}^KZ_i>t\right]}{\overline{G}(t)}=\sum_{n=0}^{+\infty}\Pr[K=n]\frac{\overline{G}^{*(n)}(t)}{\overline{G}(t)} \rightarrow \sum_{n=0}^{+\infty}\Pr[K=n]\,n=\mathbb{E}[K], \text{ as }t\rightarrow\infty, \] from which we deduce \[ \Pr\left[\sum_{i=1}^KZ_i>t\right]\approx\mathbb{E}[K]\overline{G}(t). \] Now, we invoke the Pollaczek-Khinchine-Beekman theorem, which guarantees that \(\psi(\kappa)=\Pr\left[\sum_{i=1}^KZ_i>\kappa\right]\) where \(K\) follows a geometric distribution with parameter \(\lambda\mu/c\) and where the \(Z_i\) are independent with cumulative distribution function \(\tilde F\).

Example 15.14 It can be shown that if \(F\) is the LogNormal distribution, then \[ 1-\tilde F(x)\sim\frac{\sigma x\exp\left(-\frac{1}{2\sigma^2}(\ln x-\mu)^2\right)} {\exp\left(\mu+\frac{\sigma^2}{2}\right)(\ln x)^2\sqrt{2\pi}}, \] and \(\tilde F\) is subexponential. In this case, \[ \psi(\kappa)\sim\frac{\frac{\lambda\mu}{c}}{1-\frac{\lambda\mu}{c}} (1-\tilde F(\kappa)). \]

Example 15.15 Let’s consider claims with individual costs following a Pareto distribution, with tail function \(\overline{F}(x)=(\theta/x)^\alpha\) for \(x\geq\theta\). If \(\alpha>1\), the mean is finite, and the tail of \(\tilde{F}\) is given by \[ 1-\tilde{F}(x)=\frac{1}{\alpha}\left(\frac{\theta}{x}\right)^{\alpha-1},\text{ for }x\geq \theta, \] and therefore, the approximation from Proposition 15.19 allows us to write \[ \psi(\kappa)\sim \left(\frac{c}{\lambda \mu}-1\right)^{-1} \cdot \frac{1}{\alpha}\left(\frac{\theta}{\kappa}\right)^{\alpha-1}. \]
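A Monte Carlo sketch of this example: using the Pollaczek-Khinchine-Beekman representation, \(\psi(\kappa)\) is estimated by simulating a geometric number of ladder heights with distribution \(\tilde F\) (whose inverse is available in closed form for these Pareto claims) and compared with von Bahr's approximation. All numerical values are illustrative.

```python
# Ruin probability for Pareto claims via the Pollaczek-Khinchine-Beekman representation,
# compared with von Bahr's approximation (c/(lambda*mu) - 1)^(-1) * (1 - F_tilde(kappa)).
import numpy as np

alpha, theta = 2.5, 1.0
lam, c = 1.0, 2.0                      # claim frequency and premium rate
mu = alpha * theta / (alpha - 1.0)     # mean claim size
rho = lam * mu / c                     # must be < 1 for the formula to apply
assert rho < 1.0

def f_tilde_inverse(u):
    """Inverse of the integrated-tail cdf F_tilde for these Pareto claims."""
    small = u <= (alpha - 1.0) / alpha
    x_small = u * alpha * theta / (alpha - 1.0)                      # F_tilde(x) = x/mu, x <= theta
    x_large = theta * (alpha * (1.0 - u)) ** (-1.0 / (alpha - 1.0))  # tail part, x > theta
    return np.where(small, x_small, x_large)

rng = np.random.default_rng(11)
n_sim = 100_000
K = rng.geometric(1.0 - rho, size=n_sim) - 1      # P[K = k] = (1 - rho) * rho**k, k >= 0

kappa = 50.0
exceeds = np.zeros(n_sim, dtype=bool)
for i, k in enumerate(K):
    if k > 0:
        exceeds[i] = f_tilde_inverse(rng.uniform(size=k)).sum() > kappa

psi_mc = exceeds.mean()
psi_von_bahr = rho / (1.0 - rho) * (theta / kappa) ** (alpha - 1.0) / alpha
print(f"Monte Carlo: {psi_mc:.5f}   von Bahr: {psi_von_bahr:.5f}")
```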

15.9 Bibliographical Notes

Historically, the study of the probability distribution of the maximum of a sample of \(n\) random variables was the first approach to describe extreme events. (Fisher and Tippett 1928) were the first to heuristically deduce the possible limit laws for the maximum of a sequence of independent random variables with the same distribution, before (Gnedenko 1943) rigorously established the convergence. Applications began with the work of (Gumbel 1958), particularly in hydrology.

Several recent books provide a simple and clear approach to extreme value theory. The reference book that summarizes most of the theoretical results on the subject is (Embrechts, Klüppelberg, and Mikosch 1997). (Kotz and Nadarajah 2000) also provides a comprehensive overview of the subject, though less detailed. Order statistics are studied in detail in (David and Nagaraja 2004).

For statistical modeling, we recommend (Coles 2001), (Beirlant, Teugels, and Vynckier 1996), as well as (Beirlant et al. 2006). These books also include several detailed and instructive case studies.

Finally, (Reiss and Thomas 1997) provide numerous practical examples in finance, insurance, and environmental sciences. You may also refer to (Cebrián, Denuit, and Lambert 2003) for a case study on North American hospitalization insurance data. For a detailed study of cost and settlement expense modeling, as discussed in Section 15.6.8, see (Denuit et al. 2004).

15.10 Exercises

Exercise 15.1 Suppose \(X\) follows a \(\mathcal{Uni}(0,1)\) distribution, and consider a sample \(X_{1},...,X_{n}\) drawn from this distribution. Show that \[\begin{eqnarray*} \mathbb{E}\left[ X_{k:n}\right] &=&n\binom{n-1}{k-1} \int_{0}^{1}t^{k}\left[ 1-t\right] ^{n-k}dt=\frac{k}{n+1}\\ \mathbb{V}\left[ X_{k:n}\right] &=&\frac{k\left( n-k+1\right) }{\left( n+1\right) ^{2}\left( n+2\right) }\\ \mathbb{C}\left[ X_{i:n},X_{j:n}\right] &=&\frac{% i\left( n-j+1\right) }{\left( n+1\right) ^{2}\left( n+2\right) }. \end{eqnarray*}\]

Exercise 15.2 Recall that the hazard rate function is defined as \(r(x) =f(x)/\overline{F}(x)\) for \(x\geq 0\) (see Volume I).

  1. Show that \(F\) is a subexponential distribution if and only if \[ \lim_{x\rightarrow \infty }\int_{0}^{x}\exp \left( zr\left( z\right) \right) f\left( z\right) dz=1\text{.} \]
  2. Show that if \(\exp \left( zr\left( z\right) \right) f\left( z\right)\) is integrable over \([0,\infty[\), then \(F\) is a subexponential distribution.
  3. Show that if \(X_{i}\) are independent and have the same distribution function \(F\) that is subexponential, then for \(1\leq k\leq n\), \[ \lim_{x\rightarrow \infty }\Pr \left[ X_{1}+....+X_{k}>x|X_{1}+....+X_{n}>x% \right] =\frac{k}{n}\text{.} \]

Exercise 15.3 Suppose that the occurrence of claims follows a Poisson process with a parameter \(\lambda\), and the individual claim costs are independent and have the same cumulative distribution function \(F\). Show that the distribution function of the largest claim occurring in the period \([0, t]\) is given by: \[ F_{\max}(x) = \frac{\exp\left(-\lambda t \overline{F}(x)\right) - \exp(-\lambda t)}{1 - \exp(-\lambda t)}, \quad x \geq 0. \]

Exercise 15.4 Consider independent but not identically distributed random variables \(X_1, X_2, \ldots, X_n\) such that \(\Pr[X_i > x] = \exp(-\theta_i x)\). Determine the cumulative distribution function of the minimum \(X_{1:n}\) and the maximum \(X_{n:n}\).

Exercise 15.5 Suppose that the variables \(X_i\) are independent and have the same cumulative distribution function \(F\) and probability density function \(f\). Consider the record times \(T_n\) defined as \(T_0 = 1\) and \(T_n = \min\{j > T_{n-1} \,|\, X_j > X_{T_{n-1}}\}\) for \(n \geq 1\). The sequence of record values is then \(R_n(X) = X_{T_n}\). We also define the sequence of record indicators as \(I_n = 1\) if \(X_n\) is a record and \(0\) otherwise. Finally, let \(N_n = I_1 + \ldots + I_n\) be the number of records observed among the first \(n\) observations.

  1. Provide the conditional distribution of \(R_{n+1}(X)\) given \(R_n(X)\). Use this to iteratively show that the probability density function of \(R_n(X)\) is given by: \[ g_n(x) = \frac{1}{n!}(-\ln \overline{F}(x))^n \cdot f(x). \]

  2. Show that records are invariant under strictly increasing transformations of the random variables, meaning that \(R_n(g(X)) \overset{d}{=} g(R_n(X))\).

  3. Demonstrate that the variables \(I_n\) are independent, and that \(I_n\) follows a Bernoulli distribution with \(\mathcal{B}er\left(\frac{1}{n}\right)\).

  4. Show that: \[ \mathbb{E}[N_n] = \sum_{j=1}^n \frac{1}{j} \quad \text{and} \quad \mathbb{V}[N_n] = \sum_{j=1}^n \frac{j-1}{j^2}. \]

  5. Using the asymptotic behavior as \(n\rightarrow\infty\), prove that: \[ \frac{N_n - \ln n}{\sqrt{\ln n}} \xrightarrow{d} \mathcal{N}or(0,1). \]

Postface

Beirlant, Jan, Yuri Goegebeur, Johan Segers, and Jozef L Teugels. 2006. Statistics of Extremes: Theory and Applications. John Wiley & Sons.
Beirlant, Jan, Jozef L Teugels, and Petra Vynckier. 1996. Practical Analysis of Extreme Values. Leuven University Press.
Bingham, Nicholas H, Charles M Goldie, and Jozef L Teugels. 1989. Regular Variation. Vol. 27. Cambridge University Press.
Cebrián, Ana C, Michel Denuit, and Philippe Lambert. 2003. “Generalized Pareto Fit to the Society of Actuaries’ Large Claims Database.” North American Actuarial Journal 7 (3): 18–36.
Coles, Stuart. 2001. An Introduction to Statistical Modeling of Extreme Values. Springer.
David, Herbert A, and Haikady N Nagaraja. 2004. Order Statistics. John Wiley & Sons.
De Haan, Laurens, and Liang Peng. 1998. “Comparison of Tail Index Estimators.” Statistica Neerlandica 52 (1): 60–70.
Denuit, Michel, Oana Purcaru, Ingrid Van Keilegom, et al. 2004. “Bivariate Archimedean Copula Modelling for Loss-ALAE Data in Non-Life Insurance.” IS Discussion Papers 423.
Embrechts, Paul, Claudia Klüppelberg, and Thomas Mikosch. 1997. Modelling Extremal Events: For Insurance and Finance. Springer.
Fisher, Ronald Aylmer, and Leonard Henry Caleb Tippett. 1928. “Limiting Forms of the Frequency Distribution of the Largest or Smallest Member of a Sample.” Mathematical Proceedings of the Cambridge Philosophical Society 24 (2): 180–90. Cambridge University Press.
Frees, Edward W, and Emiliano A Valdez. 1998. “Understanding Relationships Using Copulas.” North American Actuarial Journal 2 (1): 1–25.
Galambos, Janos. 1978. The Asymptotic Theory of Extreme Order Statistics.
Gnedenko, Boris. 1943. “Sur La Distribution Limite Du Terme Maximum d’une Serie Aleatoire.” Annals of Mathematics, 423–53.
Godard, O, C Henry, P Lagadec, and E Michel-Kerjan. 2003. Traite Des Nouveaux Risques: Precaution, Crise, Assurance. Gallimard.
Gumbel, Emil Julius. 1958. Statistics of Extremes. Columbia University Press.
Hosking, Jonathan RM, and James R Wallis. 1987. “Parameter and Quantile Estimation for the Generalized Pareto Distribution.” Technometrics 29 (3): 339–49.
Klugman, Stuart A, and Rahul Parsa. 1999. “Fitting Bivariate Loss Distributions with Copulas.” Insurance: Mathematics and Economics 24 (1-2): 139–48.
Kotz, Samuel, and Saralees Nadarajah. 2000. Extreme Value Distributions: Theory and Applications. World Scientific.
Reiss, Rolf-Dieter, and Michael Thomas. 1997. Statistical Analysis of Extreme Values. Springer.