Civic Hacking for Public Health
Post date: Feb 7, 2015
A friend of mine has recently become involved with a movement known as “civic hacking.” For the uninitiated, me included, one definition posted by Code for America is “Collaborating with others to create, build, and invent open source solutions using publicly-released data, code, and technology to solve challenges relevant to our neighborhoods, our cities, our states, and our country.”
After reading this it occurred to me: that’s what we do, and have long been doing, in public health. Granted, the term may be new and trendy, but the concept of collaboration in science is well established. For example, see this article I coauthored on collaboration in autism research. Another example is this “hackathon” event to develop graphical resources for infectious disease epidemiology. The end product of a collaboration may be a publication or presentation (particularly in academia), or some kind of tool or resource, with the intention of moving the field forward. Often the code used in the analysis (particularly in a methodologically demanding piece) can be obtained from the authors or as an online appendix to benefit others working on similar problems. And sometimes, perhaps not all that rarely, the research may be self-serving (and just as valuable).
Regardless of whether you’re doing this for altruistic or individual reasons, there are some great minds working on real public health problems that at the end of the day may very well impact you. As a public health professional, most likely you have already personally worked with publicly released data (e.g., NHANES) or have used an existing tool that mines big data (e.g., Google Flu Trends; see chart below). For the non-public health professional, the knowledge that these public data exist and are freely available may be the impetus to begin working in this field.
As I began to brainstorm the possibilities using my friend’s coding expertise and my knowledge of public health, I realized there is a plethora of public sources of health and health-related data that we can tap into at no cost. Further, the greater population of civic hackers may not be aware of these data sources. Some need to be downloaded as stand-alone datasets; others provide an API for tapping into them on the fly. I’ve previously written about geocoding and mapping to census geographies in the journal Epidemiology (R code available for download as an eAppendix to both articles), which is perhaps one example of epidemiologic civic hacking (wow, that’s a mouthful). I also intend to write a future blog post here that is more of an all-encompassing how-to guide, building on the methods detailed in each of the two research letters I link to above.
Therefore, this post serves as my initial list of these data (at least the major sources), and while I’m using it to keep track for my own needs, I thought this list might benefit other civic hackers. It is intended as a starting point, or as a way to generate ideas, and probably won’t be maintained over time (unless there is enough interest). The categories are for convenience and are not mutually exclusive. Please also note that many of these data use sophisticated sampling methodologies that must be taken into account for valid analysis and inference; when in doubt, consult an epidemiologist or biostatistician. There may also be IRB requirements, so contact the owners of these data for specific data-use requirements.
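To make the point about sampling design concrete, here is a minimal sketch in Python of how ignoring survey weights can distort a simple prevalence estimate. The respondents and weights are made up for illustration and do not come from any of the surveys listed here; real surveys such as NHANES also require strata and cluster information for valid standard errors, which this sketch does not attempt.

```python
# Hypothetical respondents: (has_condition, sampling_weight).
# Each weight is the number of people in the population that the
# respondent is assumed to represent.
respondents = [
    (True, 100.0),
    (True, 100.0),
    (False, 400.0),
    (False, 400.0),
]


def unweighted_prevalence(rows):
    """Share of respondents with the condition, ignoring the design."""
    return sum(1 for has_condition, _ in rows if has_condition) / len(rows)


def weighted_prevalence(rows):
    """Share of the represented population with the condition."""
    total_weight = sum(weight for _, weight in rows)
    weight_with_condition = sum(weight for has_condition, weight in rows if has_condition)
    return weight_with_condition / total_weight


naive = unweighted_prevalence(respondents)
weighted = weighted_prevalence(respondents)
print(f"unweighted: {naive:.2f}, weighted: {weighted:.2f}")
```

In this toy example the people with the condition were oversampled, so the unweighted estimate (0.50) overstates the weighted one (0.20). Weights only correct the point estimate; for anything beyond a rough look, use a survey analysis package and the documentation that accompanies the dataset.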
Lastly, if you’re interested in collaborating on something, let’s talk.
Clinical Trials
- Yale University Open Data Access: Individual patient level analysis data, must request access and be approved.
- Clinical Study Data Request: Individual patient level analysis data, must request access and be approved.
Government-sponsored
- National Library of Medicine Databases, Resources & APIs: Data sets, APIs, and tools for clinical data, biological and genetic data, medical literature, and a variety of other interests.
- National Cancer Institute, Division of Cancer Epidemiology and Genetics: Glycemic index data, radiation fallout data, cancer SNPs.
- American FactFinder: Population, housing, economic and geographic information aggregated by census geographies.
- TIGER products: Geographical boundary data that can readily be linked to other census information.
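As a rough illustration of linking records to census geographies, the sketch below uses the census tract GEOID, an 11-digit identifier made up of a 2-digit state code, a 3-digit county code, and a 6-digit tract code. The case records and population counts are hypothetical, and the function and variable names are mine rather than from any census tool; in practice the GEOIDs would come from a geocoder and the populations from a census table.

```python
from collections import Counter


def split_tract_geoid(geoid):
    """Split an 11-digit census tract GEOID into state, county, and tract codes."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected 11 digits, got {geoid!r}")
    return {"state": geoid[:2], "county": geoid[2:5], "tract": geoid[5:]}


# Hypothetical inputs: one tract GEOID per geocoded case, plus made-up
# tract populations keyed by the same GEOID.
case_geoids = ["42101000100", "42101000100", "42101000200"]
tract_population = {"42101000100": 2000, "42101000200": 4000}

# Count cases per tract, then link to the population by GEOID to get a
# crude rate per 1,000 residents.
cases_per_tract = Counter(case_geoids)
rate_per_1000 = {
    geoid: 1000 * cases_per_tract[geoid] / population
    for geoid, population in tract_population.items()
}

print(split_tract_geoid("42101000100"))
print(rate_per_1000)
```

Keeping GEOIDs as strings rather than integers preserves leading zeros, which matters for states with codes below 10.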
Population-based Health Surveys
- National Health and Nutrition Examination Survey (NHANES): Health and nutritional status of adults and children in the United States; individual level respondent data.
- Behavioral Risk Factor Surveillance System (BRFSS): Health practices and risk behaviors linked to chronic diseases, injuries, and preventable infectious disease; individual level respondent data.
- The National Longitudinal Study of Adolescent to Adult Health (Add Health): Longitudinal study of social, economic, psychological and physical well-being of adolescents maturing to adulthood; individual level repeated measure data.
- The National Health Interview Survey (NHIS): Cross-sectional survey of civilian, non-institutionalized population to monitor health of U.S. population on broad range of demographic and socioeconomic indicators; individual level respondent data.
- Harvard University Program on Survey Research: From website, "gateway to a large collection survey data including individual researchers as well as large data archives."
- Community Health Data Base: Aggregated data specific to Southeastern Pennsylvania.
- LGBTData.com: List of population-based health surveys that have included sexual orientation.
- PublicAPIs: API list for access to a variety of data and programming tools.