HHS drops biggest Medicaid dataset ever for fraud hunt

Today the HHS DOGE team open-sourced the largest Medicaid dataset in department history.

This dataset contains aggregated, provider-level claims data, broken out by billing code over time.

For example, using this dataset, it would have been possible to easily detect the large-scale autism diagnosis fraud seen in Minnesota.

Download the data yourself:

https://opendata.hhs.gov/

Per Grok:

To search the Medicaid Provider Spending dataset:

1. Download it (10.32 GB) from the HHS Open Data platform at http://healthdata.gov (search for “Medicaid Provider Spending”).

2. It’s aggregated tabular data (likely CSV) by provider, procedure code, month (2018-2024).

3. Use Python for analysis: import pandas as pd; df = pd.read_csv('file.csv'). Then filter, e.g., df[df['procedure_code'] == 'code'] or df[df['provider_id'] == 'id']. For large files, use Dask.

4. To detect anomalies like fraud, group by code/location and plot trends with matplotlib.
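
Steps 3 and 4 could be sketched roughly as follows. This is a minimal illustration, not the official schema or a real fraud model: the 'month' and 'paid_amount' column names, the tiny inline sample, and the "5x the peer median" rule are all assumptions made for the example (only 'procedure_code' and 'provider_id' appear in the steps above), so check the real file's header first. It reads the CSV in chunks with plain pandas rather than Dask, which keeps memory use bounded for a single-code filter.

```python
import io
import pandas as pd

# Hypothetical column names; verify against the real file's header.
PROVIDER, CODE, MONTH, PAID = "provider_id", "procedure_code", "month", "paid_amount"


def filter_csv(source, code, chunksize=1_000_000):
    """Stream a large CSV in chunks, keeping only rows for one procedure code."""
    kept = []
    reader = pd.read_csv(source, dtype={PROVIDER: str, CODE: str}, chunksize=chunksize)
    for chunk in reader:
        kept.append(chunk[chunk[CODE] == code])
    return pd.concat(kept, ignore_index=True)


def flag_spikes(df, ratio=5.0):
    """Flag provider-months whose spend exceeds `ratio` times the median
    spend of all providers billing the same code in the same month."""
    monthly = df.groupby([PROVIDER, CODE, MONTH], as_index=False)[PAID].sum()
    monthly["peer_median"] = monthly.groupby([CODE, MONTH])[PAID].transform("median")
    return monthly[monthly[PAID] > ratio * monthly["peer_median"]]


# Made-up sample rows standing in for the real 10 GB file.
SAMPLE = """provider_id,procedure_code,month,paid_amount
A,X,2023-01,100
B,X,2023-01,120
C,X,2023-01,90
A,X,2023-02,110
B,X,2023-02,100
C,X,2023-02,5000
A,Y,2023-01,999999
"""

rows = filter_csv(io.StringIO(SAMPLE), code="X", chunksize=3)
flagged = flag_spikes(rows)
print(flagged)
```

To run this against the real download, pass the file path instead of the StringIO object. A flagged row is only a lead for a closer look (a large provider can legitimately bill far more than the median), and the per-provider monthly totals can be plotted with matplotlib to see the trend.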