MarketScan is a robust secondary data source that provides claims data from employers, health plans, and hospitals, as well as Medicare supplemental claims. It primarily covers people with commercial insurance but also includes limited claims data for Medicare and Medicaid beneficiaries, spanning more than 250 million de-identified enrollee records. For all enrollees, the data include enrollment records; healthcare service claims for outpatient, inpatient, and emergency department visits; and outpatient pharmacy claims.
Interested in exploring how the MarketScan data can further your research? Request a consult and select the Pragmatic Research and Trials Core.
MarketScan is a robust secondary data source of national claims data. Persons in the dataset are largely those with employer-sponsored insurance (ESI), along with some persons enrolled in Medicare and some enrolled in Medicaid (see below). Data are submitted by employers (most of them self-insured) and by payers (insurers). There are approximately 25 million enrolled lives per year in the MarketScan data, and patients can be tracked longitudinally as long as they remain with the same data submitter (employer or insurer).
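Because follow-up is only possible while a person stays with the same data submitter, cohort studies typically check how many consecutive months each enrollee is observed. The sketch below is a minimal illustration, assuming a simplified, hypothetical enrollment table with one row per enrollee-month; the field names `enrollee_id`, `submitter_id`, and `month` are placeholders rather than actual MarketScan variable names, and months are assumed to be coded as consecutive integers.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class EnrollmentMonth(NamedTuple):
    # Hypothetical fields for illustration only; not MarketScan variable names.
    enrollee_id: str
    submitter_id: str
    month: int  # assumed: months coded as consecutive integers (e.g. Jan 2019 -> 0)


def longest_continuous_enrollment(rows: Iterable[EnrollmentMonth]) -> dict[str, int]:
    """Return, per enrollee, the longest run of consecutive months within one submitter."""
    # Group months by (enrollee, submitter): a change of submitter ends follow-up,
    # so runs are never counted across two different submitters.
    months = defaultdict(set)
    for row in rows:
        months[(row.enrollee_id, row.submitter_id)].add(row.month)

    best: dict[str, int] = {}
    for (enrollee, _submitter), observed in months.items():
        longest = 0
        for m in observed:
            if m - 1 not in observed:  # m is the first month of a run
                length = 1
                while m + length in observed:
                    length += 1
                longest = max(longest, length)
        best[enrollee] = max(best.get(enrollee, 0), longest)
    return best
```

For example, an enrollee observed for three months with one employer and then two months with another gets a longest run of 3, because each submitter is treated as a separate observation window.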
The data are claims data, largely covering outpatient care, inpatient care (including emergency department visits), labs, imaging, and drugs. There is also an enrollment file.
MarketScan is not a nationally representative sample. However, for the commercial and Medicare subsets, Merative has created weights that project these enrollees to the national population of individuals with ESI. The MarketScan Commercial Insurance Weights were constructed using the Public Use Microdata Sample (PUMS) of the American Community Survey (ACS) conducted by the U.S. Census Bureau. These weights are designed to make individuals in the MarketScan databases reflect the national ESI population within each demographic stratum, which matters because health care use varies appreciably across demographic strata.
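Projection weights of this kind are applied like survey weights: each enrollee counts in proportion to the number of people in the national ESI population that they represent. Below is a minimal sketch of a weighted mean, assuming each record already carries a single numeric weight; the numbers are made up for illustration, and the actual names and structure of the MarketScan weight files are not described here.

```python
from typing import Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean sum(w_i * y_i) / sum(w_i), as used with survey-style projection weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(y * w for y, w in zip(values, weights)) / total_weight


# Hypothetical example: two enrollees from different demographic strata.
# The first enrollee's stratum is under-represented in the sample, so that
# enrollee stands in for more people in the national ESI population.
annual_spending = [100.0, 300.0]
projection_weights = [3.0, 1.0]

print(sum(annual_spending) / len(annual_spending))         # unweighted: 200.0
print(weighted_mean(annual_spending, projection_weights))  # weighted: 150.0
```

The gap between the two results shows why unweighted estimates can be misleading when some demographic strata are over- or under-represented relative to the national ESI population.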
Inpatient and outpatient claim data elements include beneficiary demographics, limited geographic information, service start and end dates, admission type, principal and secondary diagnosis codes, discharge status, major diagnostic category, principal and secondary procedure codes, DRGs, length of stay, place of service, and quantity of services. The financial data elements include total and net payments, payments to physicians and hospitals, and total admission payments. Financial data include both the allowed amount paid by the insurer and the patient cost-sharing (deductible, coinsurance, and copayment). On the outpatient side, there are up to 4 ICD diagnosis codes per claim.
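Because an outpatient claim can carry several diagnosis codes, identifying a condition usually means checking every diagnosis position rather than only the first. The sketch below assumes a hypothetical claim record holding a list of up to four ICD-10-CM codes; the `OutpatientClaim` structure and `has_diagnosis` helper are illustrative, not the MarketScan file layout, and the example prefix `E11` (the ICD-10-CM category for type 2 diabetes mellitus) is only a demonstration.

```python
from dataclasses import dataclass, field

MAX_OUTPATIENT_DX = 4  # from the text: up to 4 ICD diagnosis codes per outpatient claim


@dataclass
class OutpatientClaim:
    # Hypothetical record structure for illustration only.
    claim_id: str
    dx_codes: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if len(self.dx_codes) > MAX_OUTPATIENT_DX:
            raise ValueError(f"at most {MAX_OUTPATIENT_DX} diagnosis codes per outpatient claim")


def has_diagnosis(claim: OutpatientClaim, prefixes: tuple[str, ...]) -> bool:
    """True if any diagnosis position starts with one of the prefixes.

    Codes are normalized by dropping the decimal point and upper-casing, so
    "E11.9" and "e119" both match the prefix "E11". Prefixes are assumed to be
    given in the same normalized form (no decimal point, upper case).
    """
    return any(
        code.replace(".", "").upper().startswith(prefixes)
        for code in claim.dx_codes
        if code  # skip empty diagnosis positions
    )


claim = OutpatientClaim("c1", ["I10", "E11.9"])
print(has_diagnosis(claim, ("E11",)))  # True: matched in the second position
```

Checking all positions matters because a condition recorded only as a secondary diagnosis would be missed by a first-position-only search.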
Outpatient pharmacy data elements include generic product name, average wholesale price, prescription drug payment, therapeutic class, days supply, national drug code, number of refills, and therapeutic group.
Lab results include LOINC code, reference high and low ranges, lab result categories (e.g., abnormal, equivocal), and service date.
Dental data include provider specialty, procedures, procedure group and service type.
Two versions of the data (Set A and Set B) are available; they differ in how cost (financial) data are handled.
Starting in data year 2019, actual cost data became unavailable for a small subset (approximately 15 percent) of the population in the databases. For this part of the population, Merative (the MarketScan data vendor) offers imputed cost data. While the use of imputed data is a common industry practice, we understand that, depending on a client's intended use of the data (including study objectives, specific study concepts, and researcher preferences), the addition of some imputed data may not be the preferred solution. Hence, Merative offers clients requesting data for 2019 and later a choice between two datasets:
Set A: Contains 100 percent of the population, with actual cost data for approximately 85 percent and imputed cost data for the remaining 15 percent. To protect the privacy of patients and of data contributors and suppliers, this dataset does not contain an imputation flag, so imputed records cannot be distinguished from records with actual cost data. The imputed cost data were generated with a combination of hot-deck imputation and stochastic regression; no further information on the imputation is available from the data vendor.
Set B: Contains approximately 85 percent of the population and contains actual cost data for all observations.
Strengths:
Limitations:
Note: Once the Secondary Data Resource is established, there will be a fee, to sustain its operations, for accessing the data resources and the information technology platform for analyzing these data. We will provide advance notice before beginning to charge a fee.
Currently there are fees for using the MarketScan data that are specified in our licensing agreement. Use of the MarketScan data for grant-funded research (foundation, state, or federal grant) requires payment of an additional licensing fee that depends on the total amount of the grant award. If the award is less than or equal to $250,000, the incremental fee is $10,000 or 10% of the total award per study, whichever is greater. If the award exceeds $250,000, the incremental fee is fixed at $25,000 per study. The incremental fee for commercially funded research will be negotiated on a case-by-case basis with Merative, the provider of MarketScan data. This fee covers one specific project.
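The grant-funded fee rule above is simple arithmetic, and a small helper makes the boundary cases explicit. This is a minimal sketch of the rule as stated in the paragraph; the function name is hypothetical, amounts are in U.S. dollars per study, and commercially funded research (negotiated case by case) is deliberately not modeled.

```python
def incremental_grant_fee(award: float) -> float:
    """Incremental MarketScan licensing fee (USD, per study) for grant-funded research.

    Award <= $250,000: the greater of $10,000 or 10% of the total award.
    Award >  $250,000: fixed at $25,000.
    Commercially funded research is negotiated case by case and is not covered here.
    """
    if award < 0:
        raise ValueError("award must be non-negative")
    if award <= 250_000:
        # award / 10 is 10% of the award; dividing avoids float error from 0.1
        return max(10_000.0, award / 10)
    return 25_000.0


for award in (50_000, 150_000, 250_000, 1_000_000):
    print(award, incremental_grant_fee(award))
```

The two rules meet at $250,000, since 10% of $250,000 is $25,000, so the fee never drops when an award crosses that threshold. The $10,000 floor is the binding amount for any award up to $100,000.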
To start the process, please complete and submit a consult request form and select the Pragmatic Research and Trials Core.
We will reach out within 5 business days to schedule a one-hour consultation to discuss your request. If needed, a second consultation with our MarketScan Subject Matter Expert can help you decide whether the MarketScan data are appropriate for your project and can answer your research question(s).
Gidwani R, Yank V, Asch SM, et al. JAMA Netw Open. 2025;8(4):e258045. doi:10.1001/jamanetworkopen.2025.8045
Soppe SE, Robinson WR, Lachiewicz MP, Wood ME. JAMA Netw Open. 2025;8(3):e250842. Published online March 17, 2025. doi:10.1001/jamanetworkopen.2025.0842
Rowe SL, Sullivan SG, Munoz FM, Coates MM, Arah OA, Regan AK. Am J Public Health. 2025;115(3):354-363. Published online March 2025. doi:10.2105/AJPH.2024.307899
Rae M, Cox C, Dingel H. Peterson-KFF Health System Tracker. July 13, 2022.
Gidwani R, Yank V, Burgette L, Kofner A, Asch SM, Wagner Z. Health Serv Res. 2024;59(6):e14343. Published online August 13, 2024. doi:10.1111/1475-6773.14343
Shapiro DJ, Hall M, Neuman MI, Hersh AL, Cotter JM, Cogen JD, Brogan TV, Ambroggio L, Blaschke AJ, Lipsett SC, Gerber JS, Florin TA. JAMA Netw Open. 2024;7(10):e2441821. Published online October 29, 2024. doi:10.1001/jamanetworkopen.2024.41821
Song Z, Ji Y, Safran DG, Chernew ME. N Engl J Med. 2019;381(3):252-263. doi:10.1056/NEJMsa1813621
Fry CE, Jeffery AD, Horta M, Li Y, Osmundson SS, Phillippi J, Schirle L, Morgan JR, Leech AA. JAMA Health Forum. 2024;5(11):e244216. Published online November 1, 2024. doi:10.1001/jamahealthforum.2024.4216
CU Anschutz
Anschutz Health Sciences Building
1890 N Revere Ct
Third floor
Aurora, CO 80045
303-724-8995