Geoff Werner highlights the bridge from a 1996 vision proposed by Randall Brubaker to the arrival of TNEDICCA’s precise crash location data, enabling that vision to become a reality.

This research article is authored by Geoff Werner from WERNER ADVISORY LLC. The original article and other great posts by Geoff can be found on his website.

Geographic (or location) risk is a major determinant of personal automobile insurance premiums and is based on a consumer’s garage location. Actuaries employ special techniques to determine territories and associated relativities because of the unique attributes of this rating characteristic. Territory is also commonly criticized by consumer groups, who cite its disparate premium impact on minorities.

This article outlines data and techniques that could be used to make a step-change improvement in the accuracy of estimating and reflecting geographic risk, particularly for medium- and small-sized insurers. These improvements may also alleviate some of the consumer groups’ concerns.

Actual Texas bodily injury (BI) data is used to illustrate the points. Similar or more stable results would be expected if a coverage with higher frequency and less severity variation were used instead.

Background

Rating variables are comprised of levels representing homogeneous groupings of insureds with respect to that characteristic. Rating variables with relatively few levels are easier to analyze as there is more data in each level. Location is a high-dimensional variable, which means even the largest insurers may have limited loss experience in some geographic areas. Fortunately, two neighbors are likely to have similar geographic risk, and actuaries have used this assumption to create techniques to develop territories and territorial rates.

Insurers start by subdividing a state into small building blocks (e.g., zip codes or census block groups) and aggregating the direct loss experience of insureds who reside within each building block (regardless of where the claim occurred). To bolster the credibility, they may augment the experience with that of nearby building blocks and/or external data sources. Once done, the insurer may use clustering routines to combine similar building blocks into a territory with an associated territorial relativity.

In some cases, the resulting territories can span a relatively large geographic area. This is particularly true in sparsely populated areas where there is less direct experience data and more smoothing is required. This can weaken the accuracy of the cost estimate for individuals within a given territory and result in significant premium discontinuities at territorial boundaries.

Figure A is a map of zip codes in and around Boerne, TX 78006, a suburban area on the northwest side of San Antonio, TX. Each color represents a different territory which, in this case, is a grouping of zip codes. 78006 (i.e., level 3) has a BI territorial relativity of 1.03 but is bordered by zip codes with relativities ranging from 0.75 to 1.27. Practically speaking, that means the BI territorial premium for residents located at the northern boundary of 78006 is about 37% (=1.03/0.75-1.0) higher than for their neighbors immediately across the zip code line. Similarly, residents in the zip code directly below 78006 are charged about 23% (=1.27/1.03-1.0) more than their neighbors in 78006. It is unlikely that BI geographic risk changes that drastically at the zip code boundary, which suggests that zip code 78006 is likely comprised of heterogeneous risks with a combined average risk of 1.03.
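The boundary arithmetic above can be reproduced in a few lines. This is only a sketch: the function name `boundary_jump` is made up for illustration, and the three relativities are the ones quoted for 78006 and its lowest- and highest-rated neighbors.

```python
def boundary_jump(higher: float, lower: float) -> float:
    """Fractional premium increase when stepping from the lower-relativity
    side of a territorial boundary to the higher-relativity side."""
    return higher / lower - 1.0


relativity_78006 = 1.03   # BI territorial relativity quoted for 78006
lowest_neighbor = 0.75    # lowest relativity among bordering zip codes
highest_neighbor = 1.27   # highest relativity among bordering zip codes

print(f"{boundary_jump(relativity_78006, lowest_neighbor):.1%}")   # 37.3%
print(f"{boundary_jump(highest_neighbor, relativity_78006):.1%}")  # 23.3%
```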

Randall Brubaker’s Vision

In 1996, Randall Brubaker wrote “Geographic Rating of Individual Risk Transfer Costs Without Territorial Boundaries”. His paper describes a procedure to replace traditional rating territories to address the discontinuity issue.

His procedure involves establishing the geographic risk for a series of “grid points” with the distance between grid points varying depending on expected geographic cost differences in that part of the state. The risk for each grid point would be based on loss and exposure data from the surrounding area and/or from areas with similar geographic characteristics (e.g., similar population density). The geographic risk of a single residential address is then based on interpolation between the closest grid points. The result of his procedure is a continuous risk surface that reflects the gradual change in premium from one location to another devoid of significant premium discontinuities.
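To make the interpolation idea concrete, here is a minimal sketch. It assumes the four nearest grid points form a rectangle and uses ordinary bilinear interpolation; the paper's exact interpolation scheme is not reproduced here, and the function name, coordinates, and corner relativities are all hypothetical.

```python
def interpolate_risk(x, y, x0, x1, y0, y1, r00, r10, r01, r11):
    """Bilinear interpolation of grid-point relativities at location (x, y).

    r00 is the relativity at (x0, y0), r10 at (x1, y0),
    r01 at (x0, y1), and r11 at (x1, y1).
    """
    tx = (x - x0) / (x1 - x0)  # horizontal position within the cell, 0..1
    ty = (y - y0) / (y1 - y0)  # vertical position within the cell, 0..1
    bottom = r00 * (1 - tx) + r10 * tx
    top = r01 * (1 - tx) + r11 * tx
    return bottom * (1 - ty) + top * ty


# Hypothetical 1 x 1 cell whose corner relativities are made up for illustration.
corners = dict(x0=0.0, x1=1.0, y0=0.0, y1=1.0,
               r00=0.75, r10=1.03, r01=1.03, r11=1.27)

print(f"{interpolate_risk(0.5, 0.5, **corners):.2f}")  # 1.02 at the cell center
print(f"{interpolate_risk(0.0, 0.0, **corners):.2f}")  # 0.75 at the grid point itself
```

Because the interpolated value changes gradually as an address moves across the cell, two neighbors receive nearly identical relativities, which is the property the procedure is after.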

Brubaker noted implementation challenges related to technological limitations of computing power and geo-coding. That was 1996; data and technology have improved dramatically over the last 25 years.

Introducing TNEDICCA

TNEDICCA (accident spelled backwards) is a company with a mission to reduce future traffic accidents. They leverage data and analytics to provide solutions to the auto insurance, navigation service, automotive manufacturing, and transportation planning industries. To do so, TNEDICCA ingested approximately 30 million police reports from 40 states representing over 90% of U.S. automobile insurance written premium. Figure B shows the location and number of crashes in and around Boerne, Texas.

Their database has two advantages over traditional actuarial datasets. First, each insurer only has data on claims involving their insureds while TNEDICCA has data for every reported accident. Second, TNEDICCA’s database includes the exact location of all reported accidents (as opposed to actuarial databases that assign each claim to the insured’s address rather than where it occurred). Leveraging this unique database, TNEDICCA has quantified the relative geographic risk for every 50-meter by 50-meter segment of road. The data shows that risk varies significantly by segment and that 10% of the locations account for two-thirds of reported accidents. In other words, where a person drives really matters.

TNEDICCA calculates a Location-based Risk Score for each address given proximity to each 50-meter by 50-meter segment of road. To do this, TNEDICCA uses a proprietary continuous weighting function that gives more weight to road segments nearer the garage location. Therefore, an insured who lives slightly closer to a major accident hot spot than their neighbor will have a slightly higher Location-based Risk Score. The result of this procedure is a smooth, continuous measure of risk.
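TNEDICCA's weighting function is proprietary, so the sketch below is not their method. It simply illustrates the general idea with an assumed exponential distance decay: each nearby road segment contributes its risk with a weight that shrinks as the segment gets farther from the garage location. The function name, the `scale_km` parameter, and all distances and segment risks are hypothetical.

```python
import math


def location_risk_score(segments, scale_km=2.0):
    """Distance-weighted average of road-segment risk around one address.

    segments: iterable of (distance_km, segment_risk) pairs.
    scale_km: assumed decay scale; smaller values concentrate the weight
              on segments closest to the garage location.
    """
    total_weight = 0.0
    weighted_risk = 0.0
    for distance_km, segment_risk in segments:
        weight = math.exp(-distance_km / scale_km)  # assumed kernel, not TNEDICCA's
        total_weight += weight
        weighted_risk += weight * segment_risk
    return weighted_risk / total_weight


# Two neighbors near the same hot spot (risk 5.0) and the same quiet road (risk 1.0).
house_a = [(0.5, 5.0), (1.0, 1.0)]  # slightly closer to the hot spot
house_b = [(0.7, 5.0), (1.0, 1.0)]

score_a = location_risk_score(house_a)
score_b = location_risk_score(house_b)
print(score_a > score_b)  # True: the closer neighbor scores slightly higher
```

Any smooth, decreasing weight function would give the same qualitative behavior: small changes in address produce small changes in score.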

Critics may worry the TNEDICCA database only includes reported accidents and/or does not include a severity element. TNEDICCA has performed dozens of tests with insurers to measure the predictive power; Figure C shows one such test. The company’s insureds were ordered based on the Location-based Risk Score, the risks were aggregated into ten equal-sized buckets (or deciles), and the pure premium was calculated for each bucket. The solid blue line shows the actual BI pure premium relativity for each decile, quantifying the predictive power. The ratio of the actual BI pure premium for the highest and lowest deciles is above 2.00. For comparison’s sake, the comparable ratio for a top 10 insurer was 1.78. In other words, the TNEDICCA Location-based Risk Score is already more predictive than indications developed by a top 10 insurer using traditional procedures. The difference would likely be greater for a medium- or small-sized insurer with less company loss and exposure data. As TNEDICCA implements planned improvements to incorporate a better severity component, the gap in predictive power should only widen.
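The decile test described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure behind Figure C: buckets are assumed to hold equal policy counts, the function name `decile_lift` is made up, and the 100 policies are synthetic data invented so the example runs.

```python
def decile_lift(policies, n_buckets=10):
    """Order policies by score, bucket them, and compare pure premiums.

    policies: list of (score, exposure, loss) tuples.
    Returns (relativities, ratio), where relativities[i] is the pure premium
    of bucket i divided by the overall pure premium, and ratio compares the
    highest-score bucket with the lowest-score bucket.
    """
    ordered = sorted(policies, key=lambda p: p[0])
    n = len(ordered)
    overall_pp = sum(p[2] for p in ordered) / sum(p[1] for p in ordered)

    relativities = []
    for b in range(n_buckets):
        bucket = ordered[b * n // n_buckets:(b + 1) * n // n_buckets]
        exposure = sum(p[1] for p in bucket)
        loss = sum(p[2] for p in bucket)
        relativities.append((loss / exposure) / overall_pp)
    return relativities, relativities[-1] / relativities[0]


# Synthetic book: one car-year each, with losses that rise with the score.
synthetic = [(score, 1.0, 100.0 + 2.0 * score) for score in range(100)]

relativities, ratio = decile_lift(synthetic)
print(f"{ratio:.2f}")  # 2.65 for this made-up data
```

A ratio well above 1.00, with relativities rising steadily from the first bucket to the last, is what indicates the score is separating lower-risk from higher-risk insureds.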

Now that the predictive power has been confirmed, let us return to the Boerne example to examine the impact on individuals. Figure D shows the relative risk for every address in and around 78006 with higher bars representing higher risk. (Please note, the map was rotated slightly to make it easier to see the height of the bars.) 78006 is clearly made up of heterogeneous risks whose risk decreases steadily moving from the southern to the northern part of the zip code.

Figure D is a little challenging to interpret. In Figure E, individual addresses were clustered into five levels (or territories) based on the Location-based Risk Score without regard to zip code boundary lines. Given the variation, the addresses probably should have been divided into more than five levels, but five was chosen to facilitate comparisons with Figure A. Without the requirement to group all risks within a given zip code into the same level, 78006 risks are now in levels 1 through 4. Furthermore, there are no longer any instances of a two-level jump between neighbors. In other words, these new territories result in individual risks being charged a more accurate BI territorial premium, and the maximum discontinuity between any two neighbors (i.e., on each side of a “boundary”) is reduced from over 35% to under 20%. Therefore, an insurer that is not ready to eliminate territories altogether could still make a significant improvement simply by leveraging the TNEDICCA Location-based Risk Score and clustering similar risks into the desired number of territories.


It is interesting to study the movement of individual risks comparing the two approaches (traditional approach with zip code building blocks versus clustered Location-based Risk Score): Figure F shows the comparison. The grey-shaded cells are the risks that remained in the same level in both approaches. Looking at column 3, only 18% of the risks previously categorized in level 3 remained in level 3 after leveraging the TNEDICCA approach. 4% and 78% moved to levels 1 and 2, respectively. Virtually all the movement involved risks moving from a higher-risk to a lower-risk category. Insurers do not have the luxury of 3 million TX crashes (like TNEDICCA), so they employ smoothing approaches. As higher-risk areas will have more crashes, less smoothing is required, which is the likely cause of the category movement bias.
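A movement table like the one described above is just a cross-tabulation of old level against new level. The sketch below shows one way to compute a single column of it; the function name and the 100 sample risks are hypothetical, with the counts chosen to match the level 3 percentages quoted above.

```python
from collections import Counter


def movement_shares(assignments, old_level):
    """Share of risks from one old level that land in each new level.

    assignments: iterable of (old_level, new_level) pairs, one per risk.
    """
    counts = Counter(new for old, new in assignments if old == old_level)
    total = sum(counts.values())
    return {new: count / total for new, count in sorted(counts.items())}


# 100 illustrative risks that were in level 3 under the zip code approach.
sample = [(3, 1)] * 4 + [(3, 2)] * 78 + [(3, 3)] * 18

shares = movement_shares(sample, old_level=3)
print({level: f"{share:.0%}" for level, share in shares.items()})
# {1: '4%', 2: '78%', 3: '18%'}
```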

TNEDICCA: Extending the Accuracy with Telematics Data

The Location-based Risk Score estimates the risk of various road segments and weights them together based on assumptions of where an insured will drive. What if we knew exactly where the person drives? Many insurers already have insurance products that use telematics sensors to capture how, how much, when and where an insured operates their vehicle. With those products, the estimate of geographic risk can be improved even further!


Figure G displays two trips (A to B and A to C). Considering the risk of each road segment traveled in each route, the per mile risk of trip A to C is approximately 33% higher than that of A to B. Thus, two co-habitants living at point A have different geographic risk if one commutes daily to C and the other to B. The telematics data could replace the “assumed” weighting function to provide a precise weighting function tailored to each insured. In other words, this is like a “virtual insurance tollbooth” that could result in a different “territorial charge” for vehicles in the same household! TNEDICCA has performed a handful of tests of this Telematics-based Risk Score and found that it is usually two to three times more predictive than the Location-based Risk Score.
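The per mile comparison can be sketched as a length-weighted average of segment risk along each route. The function name, segment lengths, and risk values below are hypothetical; they were chosen only so that the result lands near the 33% difference quoted above, and they are not taken from Figure G.

```python
def per_mile_risk(route):
    """Length-weighted average risk per mile along a route.

    route: list of (miles, risk_per_mile) pairs, one per road segment traveled.
    """
    total_miles = sum(miles for miles, _ in route)
    total_risk = sum(miles * risk for miles, risk in route)
    return total_risk / total_miles


# Both trips start on the same low-risk segment near point A.
trip_a_to_b = [(1.0, 1.0), (1.0, 1.4)]
trip_a_to_c = [(1.0, 1.0), (3.0, 1.8)]  # continues along a riskier corridor

risk_b = per_mile_risk(trip_a_to_b)
risk_c = per_mile_risk(trip_a_to_c)
print(f"{risk_c / risk_b - 1.0:.0%}")  # 33%
```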

Since telematics data is not currently available at point of sale, an insurer would likely set the initial price for a telematics-based policy using the Location-based Risk Score and slowly replace it as actual driving data is collected. For example, if the “transition” period were 100 days, the company could give 1% weight to the telematics data and 99% weight to the Location-based Risk Score after one day, 2%/98% after two days, and so on. By the end of 100 days, the geographic risk would be completely determined based on actual driving patterns.
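The linear transition in that example amounts to a simple blend. The sketch below assumes the weight on telematics data grows by an equal step each day and is capped at 100%; the function name and the two sample scores are hypothetical.

```python
def blended_geographic_risk(location_score, telematics_score,
                            days_observed, transition_days=100):
    """Blend the two scores, shifting weight to telematics data over time."""
    # Weight on telematics rises linearly from 0 to 1 over the transition period.
    weight = min(max(days_observed, 0) / transition_days, 1.0)
    return weight * telematics_score + (1.0 - weight) * location_score


location_score = 1.20    # illustrative score at point of sale
telematics_score = 0.90  # illustrative score from actual driving

for day in (0, 1, 50, 100, 365):
    blended = blended_geographic_risk(location_score, telematics_score, day)
    print(day, round(blended, 3))
# 0 1.2
# 1 1.197
# 50 1.05
# 100 0.9
# 365 0.9
```

An insurer could just as easily use a non-linear schedule, for example one tied to miles driven rather than days elapsed; the linear version is shown only because it matches the example in the text.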

What Might Consumer Groups Think?

Consumer groups regularly criticize the application of territorial ratemaking citing disparate impact on minorities who may represent a disproportionate share of residents in higher-risk urban centers.

Implementing the Telematics-based Risk Score and basing premium on “where you drive” rather than “where you live” is a step-change social improvement. A person who lives in the city and commutes to the suburbs for work would receive a similar charge as a person who does the reverse commute. In fact, if traffic congestion is considered, the person with the reverse commute may have a lower charge. Additionally, people who live in the city are more likely to have greater access to public transportation. If they choose to regularly use public transportation instead of driving, that would further reduce premium. Note that coverages that apply fully or partially to non-accident claims would not be impacted similarly.

Even the Location-based Risk Score alleviates some of the issue. Replacing traditional territories with a continuous risk surface or smaller homogeneous clusters alleviates large premium discontinuities that exacerbate the disparate premium impact.

Summary

Estimating and reflecting geographic risk is challenging. Even large insurance companies may not have enough data in every geographic area, which can necessitate smoothing of insurance loss data across large areas. These approaches can result in 1) heterogeneous risks being grouped and charged an average rate and 2) large premium discontinuities at territorial boundaries. Randall Brubaker suggested potential enhancements in 1996 while noting some implementation challenges, but data and technology have advanced since then.

TNEDICCA has amassed a large database of reported crashes geo-coded to the exact location of the crash. This data enables TNEDICCA to calculate the relative risk of every road segment. Each individual residential address can have a tailored geographic risk based on the risk of the road segments its residents are likely to drive. This approach results in a more accurate reflection of risk, free of large premium discontinuities.

While some insurers may choose to work with TNEDICCA simply to create smaller, more refined territories, some will consider eliminating traditional territories altogether. Of course, the most progressive companies will leverage their telematics data to get the fairest and most accurate measurement of geographic risk possible.