I have always found it both remarkable and puzzling that there are only two widely accepted, geospatially explicit, gridded data sets describing the current* population of the entire globe. Why only two? It’s not as if the population of the world is a niche subject! Knowing where people live is essential to understanding almost any phenomenon that has global scope and is geospatially heterogeneous (i.e., varies spatially).
Unfortunately, the ubiquitous national-scale map, with every country color-coded to correspond to a single indicator value, is the default solution to many global mapping problems. I say “unfortunately” because national-scale maps can be as misleading as they are ubiquitous, for the simple reason that most change has varying impacts on people in different locations. For example, national-scale maps that are based on per capita figures can be grossly misleading when they show a densely populated area like a megacity with the same data value as a deserted wasteland.
If you want to go beyond the cartoon-like level of the national-scale map, and really understand what is happening to people and places on a global scale, you have to involve spatially explicit population data. Yet, if you need a gridded global population data set that provides high geospatial resolution, LandScan 2007 from Oak Ridge National Laboratory and Gridded Population of the World v 3.0 from the NASA-funded Socioeconomic Data and Applications Center at the Center for International Earth Science Information Network are the only two options.
What’s more, many, if not most, users of gridded global population data are only aware of the existence of one or the other of these global population data sets, but not both. People seem to use the first data set they find, or the first one that is readily available. Even rarer than seeing both data sets mentioned is seeing an analysis that shows the contrasting implications of using both data sets, or a methodological discussion that explicitly articulates the competing considerations in choosing one over the other. To encourage such conscious analysis is the purpose of this post.
The fundamental insight is that LandScan and GPW are different: they have somewhat different objectives, make different assumptions, use different methodologies, and are designed to measure two different indicators. It would not be fair to say that one is “better” than the other; it’s more a question of which tool is best for the job at hand.
The LandScan dataset estimates “ambient” population density — where people are likely to be at noon. LandScan pixels are 30 arc-seconds (or about 1 km wide at the equator). LandScan is updated every year on a one-year lag (thus, LandScan 2007 is available now).
The CIESIN GPW dataset estimates residential population density: where people live and are likely to be at night. GPW pixels are 2.5 arc-minutes (150 arc-seconds), so each one is 5x wider than a LandScan pixel and covers 25x the area. GPW is updated every five years.
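To make the difference in resolution concrete, here is a quick back-of-the-envelope calculation. It is only a sketch: it assumes a cell measured along the equator and uses a rounded value for the Earth's equatorial circumference, and the function and variable names are mine, not anything from either data set.

```python
# Equatorial circumference of the Earth in kilometres (rounded; an assumption
# that is good enough for a rough comparison of cell sizes).
EQUATOR_KM = 40075.017


def cell_width_km(arc_seconds: float) -> float:
    """Approximate width of a grid cell at the equator, in kilometres."""
    # A full circle is 360 degrees of 3600 arc-seconds each.
    return EQUATOR_KM * arc_seconds / (360 * 3600)


landscan_km = cell_width_km(30)        # LandScan: 30 arc-seconds
gpw_km = cell_width_km(2.5 * 60)       # GPW: 2.5 arc-minutes = 150 arc-seconds

width_ratio = gpw_km / landscan_km     # how much wider a GPW cell is
area_ratio = width_ratio ** 2          # how much more ground it covers

print(f"LandScan cell: {landscan_km:.2f} km, GPW cell: {gpw_km:.2f} km")
print(f"Width ratio: {width_ratio:.0f}, area ratio: {area_ratio:.0f}")
```

Away from the equator the east-west width of a cell shrinks, but the ratio between the two grids stays the same.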
LandScan uses population reports from official publications, then disaggregates the population according to an algorithm that is based mostly on land use, topography and transportation information. The algorithms used are regionally tailored and reflect local culture. LandScan moves people within administrative reporting units based on suitability models driven by terrain features, infrastructure density, and the like (for example, people like to be near but not in water bodies, tend to avoid high-slope areas with northern exposure, and spend time on or near roads). So people are not placed in lakes even though the population may be reported in an administrative unit that includes lakes, and population is dispersed much more sparsely over barren land. Locations of central places are noted so that greater densities appear at those locations. The net effect is that LandScan tends to put people in cities and along roads.
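The general idea of disaggregating a reported total according to suitability can be sketched in a few lines. To be clear, this is not LandScan's actual algorithm, which is far more elaborate and regionally tuned. It is a minimal illustration of proportional weighting, and the weights and the function name are hypothetical values I made up for the example.

```python
def disaggregate_by_weight(total_population, weights):
    """Split one reported total across cells in proportion to their weights."""
    weight_sum = sum(weights)
    if weight_sum == 0:
        raise ValueError("at least one cell must have a positive weight")
    return [total_population * w / weight_sum for w in weights]


# Hypothetical suitability weights for five cells in one administrative unit:
# lake, steep slope, barren land, roadside, town centre.
weights = [0.0, 0.5, 1.0, 6.0, 12.5]

# The administrative unit reports 10,000 people in total.
cells = disaggregate_by_weight(10_000, weights)

for label, people in zip(
    ["lake", "steep slope", "barren", "roadside", "town centre"], cells
):
    print(f"{label:12s} {people:8.0f}")
```

The lake cell gets a weight of zero, so nobody is placed there, while the roadside and town-centre cells absorb most of the reported total. The sum across cells still matches the official figure.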
CIESIN opts for researching the most highly resolved census data possible. For the US and other developed countries the maps are more detailed because the reporting is more detailed. As a result, they usually have more polygons per country than LandScan does. Their methods do take into account some land use considerations: people are excluded from water bodies, and reported urban population is placed within delineated urban areas (and vice versa for rural population). After that, they spread the people evenly into the pixels of each administrative unit.
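By way of contrast, the even-spreading approach described above can be sketched like this. Again, this is a simplified illustration under my own assumptions, not CIESIN's production method: the cell classes, population figures, and function name are all hypothetical.

```python
def spread_evenly(population, cell_classes, target_class):
    """Divide a population equally among cells of one class; others get 0."""
    n = cell_classes.count(target_class)
    if n == 0:
        raise ValueError(f"no cells of class {target_class!r}")
    share = population / n
    return [share if c == target_class else 0.0 for c in cell_classes]


# Hypothetical administrative unit with six cells.
cell_classes = ["water", "rural", "rural", "rural", "urban", "urban"]

# Hypothetical census report: 8,000 urban residents and 2,000 rural residents.
urban = spread_evenly(8_000, cell_classes, "urban")
rural = spread_evenly(2_000, cell_classes, "rural")

# Combine the two layers into one residential grid for the unit.
grid = [u + r for u, r in zip(urban, rural)]
print(grid)
```

Within each class every cell receives the same count, so all of the spatial detail comes from the size of the census units and the urban boundaries rather than from a suitability model.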
Obviously, the nature of the problem at hand is important in choosing whether you care more about ambient or residential population density, or whether you must be concerned with both. Take the example of vulnerability to natural disasters. For most natural disasters (perhaps excepting tornadoes), it is impossible to anticipate whether the disaster will occur at night or during the day, so an estimate of the number of people vulnerable at a single location really ought to be a range, not a single value. Sensitivity analysis is a useful tool for assessing the different implications of these values.
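A very simple form of this kind of sensitivity analysis is to report exposure as a range bounded by the two estimates. The sketch below assumes you have already resampled an ambient grid and a residential grid to the same cells. The numbers, the hazard footprint, and the function name are hypothetical.

```python
def exposure_range(ambient_grid, residential_grid, hazard_cells):
    """Return (low, high) population exposed within the hazard footprint."""
    ambient = sum(ambient_grid[i] for i in hazard_cells)
    residential = sum(residential_grid[i] for i in hazard_cells)
    return min(ambient, residential), max(ambient, residential)


# Hypothetical counts for the same five cells from two different sources.
ambient_grid = [0, 250, 500, 3000, 6250]          # daytime-style estimate
residential_grid = [0, 2500, 2500, 2500, 2500]    # night-time-style estimate

# Hypothetical hazard footprint covering the last two cells.
low, high = exposure_range(ambient_grid, residential_grid, hazard_cells=[3, 4])
print(f"People exposed: between {low} and {high}")
```

If the low and high values are close, the choice of data set hardly matters for that location. If they are far apart, that gap is itself a finding worth reporting.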