What are our pension data best practices?
The findings of any analysis are only as strong as the data behind them. This is especially important for policy research, as there are incentives for individuals and groups with conflicting interests to selectively filter data to reach preconceived ends. “Data” can be a largely objective source of information, but how figures are used and which data sources are applied in which contexts can have an outsized effect on policy research and the discussions it aims to inform.
The often contentious nature of public finance discussions related to public pensions means this policy area requires particularly transparent and appropriate use of financial data when conducting research. Any “public pension” research product should be held to high standards and expectations of data source quality and applicability. As any experienced researcher will already know, the decision of which data source to use for a project will depend on a myriad of factors, but it is important to acknowledge that some data sources are better than others. Some data sources are better suited to certain kinds of analysis than others, and even the best possible data have limitations that may affect whether they are appropriate to use for a given project.
Equable’s primary mission is to provide quality educational information related to public sector retirement systems. Our research team at Equable is committed to transparency in data utilization. To meet this commitment, we make the data from our research collection efforts open source and maintain a rigorous focus on applying the right dataset to each analytical task.
How We Evaluate Our Pension Data Sources
Public pension data can be broadly classified into two categories: primary source data and secondary data (typically aggregated by another source, such as a research institution).
Primary source data are based on the documents and data provided by public retirement systems. These documents include actuarial valuation reports, comprehensive annual financial reports (both from the retirement system and the state/municipal government), GASB disclosure reports, and experience studies. These reports offer the audited financials, actuarially calculated funding data, and the direct research findings and statements from a retirement system. As a result, these can be thought of as the most accurate, unvarnished data for any given pension or other retirement plan. However, while these data are the most credible, they are often the most resource intensive to collect.
Secondary data sources offer data that have been collected and processed by a third party into some form that is typically aggregated or otherwise provided across multiple retirement systems. Secondary databases frequently rely on primary source documents, but in some cases institutions will collect secondary data, transform and/or combine them, and release those data based on their own methodology. The main benefit of secondary data sources is that obtaining the data requires far less time and fewer resources. However, secondary data sources require the researcher to accept any limitations that stem from who compiled the data and what methodology they used.
To the extent possible, stakeholders in public sector retirement systems should utilize primary source data. But we recognize that this can be resource intensive and require specific knowledge about how to interpret pension system reports. By contrast, secondary data sources provide opportunities for research and understanding that can inform the discussion around public pensions more broadly.
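One practical way to work with a secondary source while respecting its limitations is to spot-check a sample of its figures against the primary documents. The following is a minimal sketch in Python, not a description of Equable's own process; the function name `flag_discrepancies`, the field names, the 1% tolerance, and all figures are hypothetical and chosen only for illustration.

```python
def flag_discrepancies(primary, secondary, tolerance=0.01):
    """Compare one plan-year record from a secondary source against
    figures taken from the primary documents.

    Returns a dict of field -> note for every field that is missing from
    the secondary record or differs by more than `tolerance` (relative).
    """
    flagged = {}
    for field, primary_value in primary.items():
        if field not in secondary:
            flagged[field] = "missing from secondary source"
            continue
        secondary_value = secondary[field]
        if primary_value == 0:
            # Relative difference is undefined at zero; fall back to equality.
            differs = secondary_value != 0
        else:
            relative_gap = abs(secondary_value - primary_value) / abs(primary_value)
            differs = relative_gap > tolerance
        if differs:
            flagged[field] = f"primary={primary_value}, secondary={secondary_value}"
    return flagged


# Hypothetical figures (in millions) for a single plan-year.
primary_report = {
    "actuarial_liability": 1000.0,
    "actuarial_assets": 720.0,
    "discount_rate": 0.07,
}
secondary_record = {
    "actuarial_liability": 1000.0,
    "actuarial_assets": 700.0,
}

issues = flag_discrepancies(primary_report, secondary_record)
for field, note in issues.items():
    print(field, "->", note)
```

A flagged field is not necessarily an error in the secondary source. It may reflect a different valuation date, a different asset measure, or a documented transformation, which is why the compiler's methodology should be read alongside the numbers.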
At Equable, we compile primary source data for our research. We release all of the data associated with each educational research product that we publish. In the future, we plan to make our entire database available for any researchers who are interested in utilizing it as a valuable secondary data source for their own work.
Understanding the Pros and Cons of Commonly Used Secondary Sources for Public Pension Data
- Strength: Data come directly from primary source documents published by the retirement systems.
- Strength: Data are updated frequently, and plans covered have been expanded over time.
- Strength: PDF files of the primary source documents are available through the project website.
- Limitation: Data do not include detailed plan benefits or benefit policies.
- Limitation: Data are plan-level and not separated across tiers of benefits. This means some data elements that depend on tiers, such as contribution rates, may not be accurate or will involve judgment calls by the data collector to provide a weighted average.
- Limitation: Documentation is difficult to work with and incomplete in some cases.
- Strength: Data are often self-reported by members, meaning that data are comparable to those reported in primary source documents, but can feature other information not normally included in published reports.
- Strength: NASRA has compiled the most comprehensive data on plan design changes, allowing for comprehensive analyses of the policies related to retirement systems. Many of their datasets include narrative information to provide valuable background to researchers.
- Strength: NASRA conducts research across numerous issues related to pension funding, policies, and benefits. This means data across numerous topic areas are available from NASRA.
- Limitation: There is no central dataset. Rather, there are numerous separate, smaller datasets for each respective analysis. Data are spread out over multiple documents instead of being in one place.
- Limitation: Most datasets lack detailed documentation for confirmation of data points. Raw data are not always provided, as datasets are sometimes presented in the form of charts and summary tables included in reports.
- Limitation: Datasets provided do not have consistent coverage of systems, plans, and tiers, so compiling data across all sources is not always possible or practical. There are sound reasons for the way the various datasets are presented, but it is a limitation.
- Strength: The most comprehensive state-specific, and municipality-specific, data for the jurisdictions covered.
- Strength: Data presented also include transformations of the raw, primary data to show alternative figures. For example, users of the PTO website can see a given California city’s pension funded status as published by CalPERS or based on a lower assumption about future investment experience (formally, rediscounting the plan’s accrued liabilities using a lower discount rate).
- Limitation: Data are currently limited to California, Texas, and some national data points. Data for Michigan will be published in the future.
- Limitation: Raw primary source data are not always readily available to download through the website. Some of the data shown are transformed, calculated, or estimated. This has the advantage of being cleaner for the user, and sometimes the analytical work is already done (e.g. the rediscounting mentioned above), but there is limited access to the primary source data.
- Limitation: Data are not readily available to download through their website. Documentation sources are listed for where they draw primary data.
- Strength: Data are collected from primary source documents published by retirement systems and are only reported once all retirement systems in their database have reported for a given plan-year, meaning that data are for a complete fiscal year when released.
- Strength: These data are limited to a small number of variables, making them easier to review for journalists, stakeholders, and others interested in pension systems.
- Limitation: The Pew methodology of waiting until every retirement system has published its reports for a given year means that data are presented on a two-year lag. For example, Pew’s most recent report from June 2020 publishes data for the fiscal year 2018. The benefit is that the data are complete for every plan in that fiscal year; the downside is that the data are dated, given that the majority of 2019 data were already available as of the spring of 2020.
- Limitation: Data are often only offered aggregated at the state level and are spread across each of their publications, as opposed to being offered in a central downloadable database.
- Limitation: Pew’s data are based primarily on GASB 67 reports, which means that for some states they leave out large portions of plan liabilities. For example, CalPERS’s PERF A plan, which carries the largest portion of liabilities in CalPERS, does not have to provide a GASB 67 report because it is an agency multiple employer plan. The benefit of this methodology is that Pew’s data consistently draw from a single source. The limitation is that Pew’s data for CalPERS, and for California by extension, do not report all liabilities.
- Strength: When the SLEPP database was published it provided the most comprehensive collection of benefit design data available.
- Strength: Benefit data are broken down at the tier-level allowing for a more complete reflection of benefits offered by different plans.
- Strength: SLEPP’s data are organized in a manner that renders separate documentation largely unnecessary. Data are sourced clearly, often with links to respective plan websites and documents when available.
- Limitation: The data are only updated sporadically. They are marked as last updated in 2018, with no clear indication that another update is pending. Further, the last update did not include all plan design changes that were adopted in 2017 and 2018.
- Limitation: The data focus only on benefit design and are not paired with any robust financial data at the system, plan, or tier level.
- Strength: Data are compiled back through 1993 for many variables, and even earlier for others, while other databases only provide coverage from 2001 forward.
- Strength: Data are government-compiled with the same rigor as other Census data, and are updated regularly (including detailed methodologies related to data collection and transformations).
- Limitation: Data are not compiled from typical primary sources, instead relying on a rotating survey of plans. This survey includes an annual sample of plans reporting information and imputation of those plans not sampled. The surveys are sometimes sent to different state and municipal government departments than those that publish actuarial reports. As a result, many of the data points provided by the Census Bureau do not always line up with the reports that states have published themselves (though the variances are rarely so substantive as to render the data source useless).
- Limitation: Data are limited to state-level aggregations in primary data releases, requiring inquiry to Census or use of flat files to access more nuanced, plan-level data.
- Limitation: Census frequently revises its data and methodology, including for pension data. This can result in the misclassification of some data across plans or over time, which researchers must carefully account for by reviewing the provided documentation.
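Several of the sources above present transformed figures, such as a funded status restated by rediscounting a plan’s accrued liabilities at a lower discount rate. The sketch below shows one common back-of-the-envelope approximation of that transformation in Python. It is not the method used by any source listed here: it treats all future benefits as a single payment due at an assumed liability duration, and the function names, the 13-year duration, the discount rates, and the dollar figures are all hypothetical.

```python
def rediscount_liability(liability, reported_rate, new_rate, duration_years):
    """Approximate a liability restated at `new_rate`.

    Assumption: all benefits are treated as one payment due in
    `duration_years`. The reported liability is grown forward at the
    reported discount rate, then discounted back at the new rate.
    """
    if reported_rate <= -1 or new_rate <= -1:
        raise ValueError("discount rates must be greater than -100%")
    future_value = liability * (1 + reported_rate) ** duration_years
    return future_value / (1 + new_rate) ** duration_years


def funded_ratio(assets, liability):
    """Funded ratio: assets as a share of accrued liabilities."""
    return assets / liability


# Hypothetical plan: figures in billions, rates as decimals.
assets = 70.0
reported_liability = 100.0
restated_liability = rediscount_liability(
    reported_liability,
    reported_rate=0.07,   # assumed rate used in the plan's own reporting
    new_rate=0.05,        # lower assumed return chosen for illustration
    duration_years=13,    # assumed liability duration
)

print(f"Reported funded ratio: {funded_ratio(assets, reported_liability):.1%}")
print(f"Restated liability:    {restated_liability:.1f}")
print(f"Restated funded ratio: {funded_ratio(assets, restated_liability):.1%}")
```

Because the result is sensitive to the assumed duration and rates, a researcher using transformed figures from a secondary source should check which assumptions the compiler applied before comparing them with reported values.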