

More than 400 insurance professionals convened in Des Moines, Iowa, at the Global Insurance Symposium (GIS) for three days of pitches, dialogues, and insights centered on the theme, “Thriving in a Changing World.”
During GIS, teams of students from Drake University, University of Nebraska, and St. Joseph’s University showcased their insurance talents in the second annual Case Study Competition. The competition is open to students in more than 100 Gamma Iota Sigma chapters across the U.S. The three teams earned cash prizes and the opportunity to present at GIS, where attendees voted for the winners.
Drake University’s Blair Anderson, Elise Conrad, and Megan Traxel won the competition with their presentation, “Synthetic Data: Utilizing Data in a Changing World.” We invited them to share their presentation with you. Enjoy!
For anyone looking for new developments in the AI space, one of the biggest upcoming changes across several industries – most notably the insurance industry – is the use of synthetic data. Synthetic data is data that’s completely artificially generated by a computer algorithm. It’s mainly used to train machine learning models. This artificially produced data, while not based on real observations, represents trends and distributions that are seen in the real world.
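To make the idea concrete, here is a minimal sketch of one very simple way artificial data can mirror a real-world distribution. It is not the method any particular insurer or vendor uses. The claim amounts, the choice of a lognormal distribution, and the function names fit_lognormal and generate_synthetic_claims are all assumptions made for illustration.

```python
import math
import random
import statistics

# Hypothetical claim amounts, invented for illustration only.
real_claims = [1200.0, 850.0, 4300.0, 990.0, 2750.0, 1600.0, 7200.0, 1100.0]

def fit_lognormal(claims):
    """Estimate the mean and standard deviation of log(claim amount)."""
    logs = [math.log(c) for c in claims]
    return statistics.mean(logs), statistics.stdev(logs)

def generate_synthetic_claims(claims, n, seed=0):
    """Draw n artificial claim amounts from a lognormal fitted to `claims`."""
    mu, sigma = fit_lognormal(claims)
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]

# None of these 1,000 values is a real observation, but together they
# follow the overall shape of the (assumed) source data.
synthetic_claims = generate_synthetic_claims(real_claims, n=1000)
```

Production tools typically rely on far richer models than a single fitted distribution, but the principle is the same: learn the trends, then sample new records from them.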
Why should the insurance industry care about synthetic data?
Many people may be wondering why they should care about this type of data, specifically within the insurance industry. What better way to show the importance of synthetic data than with statistics? A Gartner analysis predicted that in 2022, 85 percent of algorithms would produce erroneous results due to biases in the data. Whether the bias comes from discrimination by gender, race, age, or another factor, that prediction implies only 15 percent of algorithms would be viable. Gartner made this prediction for 2022; who’s to say whether the situation has improved or worsened in 2023? In contrast, one study found that using synthetic data in a crime-prediction algorithm reduced racial bias from 24 percent to 1 percent.
In addition to being more equitable, synthetic data has proven to be accurate and effective. Insurance companies have used it to price small-premium policies without using loss information, and one machine learning model was found to be 97 percent effective when using synthetic data. These statistics show only the beginning of what synthetic data can do. It also has many other benefits for companies and their customers.
Benefits of using synthetic data
The first and biggest benefit of using synthetic data is that it can save businesses time and money. Gathering accurate and relevant data takes time and energy that a lot of companies just don’t have anymore. It’s also quite expensive to continuously gather data, only to restart the cycle when the original data sources go out of date. Synthetic data solves both of those problems: it removes the time pressure, since the data can be regenerated to reflect current trends, and it saves the money otherwise spent on collection.
The second benefit is that synthetic data makes it possible to investigate rare events in the insurance industry. There are many scenarios to explore – How will the market behave in the event of a once-in-a-lifetime flood? What will health insurance pricing look like in the event of a pandemic? – but not enough data to make accurate predictions and decisions. With synthetic data, a model can fill in the gaps. The more diverse dataset that results may also eventually help eliminate biases that have been built into our society.
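As a rough sketch of how simulation can supply examples of events that real records rarely contain, the snippet below generates many artificial years, each with a small chance of a severe flood. Every number here (the 1 percent flood probability, the loss sizes) and the function name simulate_annual_losses are hypothetical assumptions, not figures from any insurer.

```python
import random

def simulate_annual_losses(n_years, flood_prob=0.01,
                           base_loss=1_000_000.0,
                           flood_loss=50_000_000.0, seed=0):
    """Simulate yearly losses with a rare flood (all parameters are assumed)."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    years = []
    for _ in range(n_years):
        flood = rng.random() < flood_prob
        # Ordinary losses vary by +/- 20 percent around the assumed base.
        loss = base_loss * rng.uniform(0.8, 1.2)
        if flood:
            # A flood year adds a large, variable loss on top.
            loss += flood_loss * rng.uniform(0.5, 1.5)
        years.append((flood, loss))
    return years

years = simulate_annual_losses(100_000)
flood_years = [loss for flood, loss in years if flood]
normal_years = [loss for flood, loss in years if not flood]
print(f"{len(flood_years)} simulated flood years out of {len(years)}")
```

A century of real records would contain roughly one such flood under this assumption, while the simulated set contains enough of them to study. The results are only as good as the assumed probabilities and loss sizes.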
The last benefit relates to the privacy concerns that customers are facing. People are becoming more selective about what information they share with companies, and potential customers are turning away because they don’t want to share certain information. When a company uses synthetic data, customers can feel relieved that they won’t have to give up sensitive information. This will also help companies avoid almost 70 percent of privacy violations. Imagine the great marketing campaign an organization can lean into when it no longer has to gather sensitive, private data from its users.
Drawbacks to be aware of when using synthetic data
No pioneering technology is without its drawbacks. The first drawback is that if the data a company uses to create the synthetic data contains outliers, the outliers could either be made more significant or not shown at all. Outliers are always important to keep in mind, which is why human review is so important when analyzing synthetic data. While analyzing this new data, keep in mind the original data that was fed into the algorithm.
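The effect of a single outlier on a naive generator can be illustrated with a small sketch. The values below are invented, and the helper names fitted_spread and flag_outliers, along with the median-based flagging rule, are assumptions chosen for illustration rather than a recommended standard.

```python
import statistics

def fitted_spread(values):
    """Mean and standard deviation a naive generator would learn."""
    return statistics.mean(values), statistics.stdev(values)

def flag_outliers(values, k=5.0):
    """Flag values more than k median absolute deviations from the median."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []
    return [v for v in values if abs(v - med) > k * mad]

# Hypothetical claim amounts; the last one is an extreme outlier.
typical = [1000.0, 1100.0, 950.0, 1050.0, 980.0, 1020.0]
with_outlier = typical + [25000.0]

mean_clean, sd_clean = fitted_spread(typical)
mean_out, sd_out = fitted_spread(with_outlier)

# One extreme value drags the learned mean and spread far from the
# typical claims, so synthetic data sampled from it would be distorted.
print(f"without outlier: mean={mean_clean:.0f}, sd={sd_clean:.0f}")
print(f"with outlier:    mean={mean_out:.0f}, sd={sd_out:.0f}")
print("flagged for human review:", flag_outliers(with_outlier))
```

Silently dropping the flagged value would produce the opposite problem, where the outlier is not shown at all, which is why a person should decide how each one is handled.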
The second drawback builds on the first. The quality of the data fed into the synthetic data model will be reflected in the results. If dirty data is used to create the model, the model will be of little use to the company. Poor-quality input can also perpetuate the biases that we are trying to leave behind.
The third drawback is that, as with anything related to technology and data, there’s the possibility that data will be leaked or hacked. The best way to counter this risk is to have cybersecurity measures in place. The most important thing to remember is to protect your customers and yourself.
What is the impact of synthetic data on the insurance industry?
The first of the three largest impacts on insurance is a new approach to data completeness and privacy for consumers. Today, data is constantly tracked, sold, and used in ways many people don’t know about. People don’t know where their data is going or how it is being used, which can raise consumer concerns and cause people to withhold information on applications or claim filings. With synthetic data, not only is important information better protected, but information gaps that might otherwise make a predictive model less effective are also corrected.
The second impact the industry would see is much more cost-efficient pricing methods. Collecting data to build pricing models can be both expensive and labor-intensive, and all that effort often ends with somewhat incomplete data and a build-up of costs. Using synthetic data that’s completely generated by an algorithm is much faster, and the cost per data point decreases drastically.
Lastly, one of the biggest impacts could be an insurance industry that’s much more DEI-conscious. From the new privacy considerations to the reduction in data bias, synthetic data could completely change the relationship between customer and company. Personal information that some consumers consider very private and sensitive would be better protected and better understood by both parties. Through the added benefit of data completeness, and by using data that isn’t based on real observations, the industry could reduce biases in underwriting and pricing.
Real-world application of synthetic data
Synthetic data isn’t a new idea. Humana, the third-largest health insurance company in the United States, has been using synthetic data for a while and has seen excellent success. The company wants to help other insurers start using the tool as well, so it created a synthetic data exchange platform with more than 1,500,000 synthetic records for insurance companies to use. Humana wants to see the insurance industry succeed, so it doesn’t feel the need to keep these synthetic records to itself. Along with collaborating with other insurance companies, Humana has worked with Nokia and Microsoft on their synthetic data use. It’ll be interesting to see whether greater use of synthetic data leads to more cross-industry relationships.
The following graph shows how the research firm Gartner predicts synthetic data will become the main form of data used in artificial intelligence, completely overshadowing real data by the year 2030. This shift will benefit not only your company but also your customers. The gap will only continue to widen exponentially as we find more ways to use synthetic data and as models become more accurate and less biased.
How are you going to use your synthetic data?
Synthetic data continues to prove itself innovative, inclusive, and impactful. It’ll positively impact the insurance industry, and your data can positively impact your customers. Contact AgentSync and learn how to make better, data-driven decisions that will save you money.