Social Network Analysis in Epidemiology: Part 2
A prerequisite to modeling social networks is familiarity with the terms and concepts used in the field. This glossary is based on two sources: 1) the Network Modeling for Epidemics workshop at the University of Washington (2015), and 2) Borgatti, Everett, and Johnson (2013), Analyzing Social Networks.
Networks, as applied to public health, consist of two kinds of objects: people and the relationships between them. These relationships may be sexual, emotional, transactional, and so on. In social network science, people are referred to as nodes (also called actors or vertices), and the relationships between them are referred to as edges (also called ties or links).
A key question in social network analysis is how relationships form. Selection is the process by which actors choose each other, and it may be based on a shared characteristic (termed homophily; "birds of a feather flock together"). Alternatively, friends of friends may become friends, which is known as transitivity (if persons A, B, and C have edges A-B and A-C, an edge B-C tends to form as well). Both homophily and transitivity can produce triangles in the network (A, B, and C are all connected to one another), but fortunately the two processes can be disentangled statistically. Selection may also occur within subgroups, producing clusters such as cliques or factions.
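To make homophily, triangles, and transitivity concrete, here is a minimal sketch on a made-up five-person network. The node names, the group labels, and the edge list are all hypothetical, and the measures are computed by hand with the standard library rather than with any particular network package.

```python
from itertools import combinations

# Hypothetical toy network: five people, each with a group label.
group = {"A": "x", "B": "x", "C": "x", "D": "y", "E": "y"}
edges = {("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")}


def connected(u, v):
    """Return True if an undirected edge joins u and v."""
    return (u, v) in edges or (v, u) in edges


# Homophily: share of edges that join two people with the same label.
same_group_edges = sum(1 for u, v in edges if group[u] == group[v])
homophily_share = same_group_edges / len(edges)

# Triangles: triples of people in which all three pairs are connected.
triangles = [
    trio
    for trio in combinations(sorted(group), 3)
    if all(connected(u, v) for u, v in combinations(trio, 2))
]

# Transitivity: closed two-paths divided by all two-paths.
two_paths = 0
closed_two_paths = 0
for center in group:
    neighbors = [n for n in group if n != center and connected(center, n)]
    for u, v in combinations(neighbors, 2):
        two_paths += 1
        closed_two_paths += connected(u, v)
transitivity = closed_two_paths / two_paths

print(homophily_share, triangles, transitivity)
```

In this toy network the single triangle (A, B, C) sits entirely inside group "x", so the raw counts alone cannot say whether homophily or transitivity produced it. That is the kind of ambiguity the statistical models are meant to resolve.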
We can describe the number of relationships in a social network using measures that reflect cohesion.
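As an illustration, two commonly used cohesion measures are density (the proportion of possible edges that are actually present) and mean degree (the average number of edges per node). The sketch below computes both for a hypothetical undirected five-node network; the node names and edge list are invented for the example.

```python
# Hypothetical undirected toy network.
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]

n = len(nodes)

# Density: observed edges divided by the number of possible pairs of nodes.
possible_edges = n * (n - 1) // 2
density = len(edges) / possible_edges

# Degree: number of edges attached to each node.
degree = {node: 0 for node in nodes}
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# Mean degree: average number of edges per node.
mean_degree = sum(degree.values()) / n

print(density, mean_degree)
```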
In addition to describing properties of the edges, we can describe how important each node is to the network through measures of centrality. Highly central nodes may have a large number of relationships, or may be nodes whose removal disconnects the network (breaks connections between the remaining nodes).
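The two notions of importance described above can be sketched as follows, assuming a hypothetical five-node network stored as an adjacency dictionary. Degree centrality is used here as one simple centrality measure among several, and the helper `is_connected` is an illustrative function written for this example, not part of any library.

```python
from collections import deque

# Hypothetical undirected network: each node maps to its set of neighbors.
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}


def is_connected(adj, removed=None):
    """Check by breadth-first search whether the network stays in one
    piece after dropping the node named in `removed`."""
    remaining = [node for node in adj if node != removed]
    if not remaining:
        return True
    seen = {remaining[0]}
    queue = deque([remaining[0]])
    while queue:
        current = queue.popleft()
        for neighbor in adj[current]:
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(remaining)


# Degree centrality: a node's degree divided by the maximum possible degree.
degree_centrality = {
    node: len(neighbors) / (len(adjacency) - 1)
    for node, neighbors in adjacency.items()
}

# Nodes whose removal disconnects the remaining network.
cut_nodes = sorted(
    node for node in adjacency if not is_connected(adjacency, removed=node)
)

print(degree_centrality, cut_nodes)
```

Note that the two notions need not agree: in this toy network D has only two relationships, yet removing D isolates E from everyone else.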
The analyses presented in these blog posts are based on exponential random graph models (ERGMs), a class of statistical models for network inference that use simulation to test hypotheses about cohesion, where the null hypothesis is that connections form by chance alone (i.e., a random network). These models are analogous to the familiar regression models of epidemiological analysis and carry some of the same assumptions and caveats. Model degeneracy is the failure of the estimated model to reproduce the observed network; it can manifest as failed convergence or poor model fit. Fitting these models requires network data: the individuals in the network and their relationships. In an ideal world, we would have network census data, where we know every person in the network and their relations to everyone else. In practice, epidemiologists most likely have egocentric data, where we do not know the individual relationships but have summary statistics such as the number of relationships, concurrent partners, relationship duration, and so on.
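The idea of testing an observed network against a chance-alone null can be illustrated with a much simpler procedure than an ERGM. The sketch below is not an ERGM fit: it is a plain Monte Carlo comparison that counts triangles in a hypothetical observed network and in random networks with the same number of nodes and edges. The network, the choice of triangles as the statistic, the number of simulations, and the seed are all assumptions made for the example.

```python
import random
from itertools import combinations

# Hypothetical observed network (undirected).
nodes = ["A", "B", "C", "D", "E"]
observed_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]


def count_triangles(edge_list):
    """Count triples of nodes in which all three pairs share an edge."""
    edge_set = {frozenset(edge) for edge in edge_list}
    return sum(
        1
        for trio in combinations(nodes, 3)
        if all(frozenset(pair) in edge_set for pair in combinations(trio, 2))
    )


def simulate_null(n_sims=2000, seed=1):
    """Triangle counts in random networks that keep the observed number
    of nodes and edges but place the edges by chance alone."""
    rng = random.Random(seed)
    all_pairs = list(combinations(nodes, 2))
    return [
        count_triangles(rng.sample(all_pairs, len(observed_edges)))
        for _ in range(n_sims)
    ]


observed = count_triangles(observed_edges)
null_counts = simulate_null()

# Share of random networks with at least as many triangles as observed.
p_value = sum(count >= observed for count in null_counts) / len(null_counts)

print(observed, p_value)
```

An ERGM goes further by estimating coefficients for several network statistics at once, in the same way a regression model estimates coefficients for several covariates, but the underlying comparison of observed against simulated networks is similar in spirit.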
...continue to Part 3...