Global Data Hound

All about global data sets


The Human Impact on Land

July 1st, 2008 · No Comments

Last month we looked at human impacts on oceans; this month we turn to the human impact on land. The NASA-funded Socioeconomic Data and Applications Center (SEDAC), operated by CIESIN, has long been a pioneer in supporting the development of data pertaining to the “human dimensions” of global environmental change. This spring ISciences has partnered with CIESIN to release the 2008 version of the TerraViva! SEDAC data viewer. A free version of Global Data Viewer (GDV) on a CD preloaded with 51 SEDAC data sets is available from CIESIN, and several of the SEDAC data sets are available here at TerraViva.net from ISciences.

Now, however, we focus on one particular set of products from CIESIN: version 2 of “The Last of the Wild,” produced by CIESIN in conjunction with the Wildlife Conservation Society (WCS). LTW2 (as it is known) includes three constituent data sets: the Human Influence Index, the Human Footprint, and the Last of the Wild. It is a substantial improvement over LTW1 because the new version uses updated data on human population density, urban boundaries (GRUMP), lower-level administrative units, roads (especially in Africa and Latin America), and navigable rivers. (CIESIN notes, however, that LTW2 cannot be compared retrospectively to LTW1 because too many of the data layers are different.)

Before digging into the details of the data available via GDV, I should mention that the WCS site devoted to the Human Footprint adds a valuable perspective with photographs and case studies that humanize the data far more than I am about to do.

The Human Influence Index is calculated using a point scoring system that is described in detail at http://sedac.ciesin.columbia.edu/wildareas/methods.jsp.

methods.jpg

To illustrate how the HII is computed, and to provide some insight into how indices like this are constructed, I’ll rate a favorite 1 km x 1 km pixel, the one containing the former Ann Arbor headquarters of ISciences LLC, which is located at 300 North Fifth Avenue in the picturesque Kerrytown neighborhood, right near a farmers’ market and our arguably world-famous deli, Zingerman’s. As you can see, our location is right on the border between two pixels, but it looks like it should get an HII rating of 56, out of a maximum of 64.

aahii56.jpg

According to SEDAC’s Gridded Population of the World (available on TerraViva.net), the population density in this mixed-use residential and retail area is 2,067 per square km, well in excess of the 10 people/sq km that produces an HII subscore of 10.

pop density 300

The pixel is  less than 2 km from the nearest railroad.  The HII uses VMAP 0 Roads and Railways, but, not having time to fire up Arc or import the dataset into GDA, I eyeballed things using Google Earth.  Score: 8.

rr36.jpg

We are within 2 km of a major road: M-14 runs north of Ann Arbor. Ditto on the source data. Score: 8.

major-road.jpg

Stretches of the Huron River, which runs through Ann Arbor, are popular locations for canoeing, but a religious group that recently asked the city of Ann Arbor for permission to conduct a baptism in its waters was advised “not a good idea.” However, the HII doesn’t care about this particular form of water quality; it only asks whether there is a navigable river nearby. The Huron is not, thank goodness, suitable for barges, so it scores 0.

Although Michigan is surrounded by the longest freshwater coastline in the world, Ann Arbor is about fifty miles from the lakeshore, and here in the Midwest we are a long way from any oceans. Score: 0.

The HII uses night lights as a proxy for human activity. In TerraViva, we use the Radiance-Calibrated Lights of the World (RCLW) database produced by the NOAA National Geophysical Data Center (NGDC) using the Defense Meteorological Satellite Program. There is a wrinkle: our presentation of the data displays radiant energy received by the pixel, but the HII is calculated by assigning the score based on the percentage of days night lights are visible in the pixel. In other words, the two rely on different views of the same underlying data. In this case, I know from walking around the pixel, which includes portions of Ann Arbor’s restaurant district and some student ghetto housing, that there are lights on all the time. Score: 10.

AA at night

It’s interesting that both this night lights image and the Gridded Population of the World give a more pronounced image of Ann Arbor as a “peninsula” jutting out from metropolitan Detroit than the Human Influence Index does. The HII may be an accurate reflection of a human influence that is more subtle than mere habitation. I have driven through the areas north and west of Ann Arbor on many occasions, and although they are lightly populated relative to the city, they are crisscrossed by a rectangular grid of parallel county roads that effectively subdivides the habitat into squares roughly 1 km on a side.

Our pixel is within an urban polygon, according to the urban extents from SEDAC’s Global Rural-Urban Mapping Project (GRUMP). Score: 10.

grump.jpg

Our land cover type is urban. Score: 10.

The total HII score for our pixel is, thus, 56, as predicted.
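The tally can be checked with a few lines of Python. This is only a minimal sketch: the subscores are the ones assigned in the walk-through above (the population density subscore of 10 follows from the density being well over the threshold), and the dictionary keys are illustrative labels, not field names from the SEDAC data.

```python
# Subscores assigned to the pixel in this post.
# The keys are illustrative labels, not SEDAC field names.
subscores = {
    "population_density": 10,
    "railroads": 8,
    "major_roads": 8,
    "navigable_rivers": 0,
    "coastlines": 0,
    "night_lights": 10,
    "urban_polygon": 10,
    "land_cover": 10,
}

HII_MAX = 64  # maximum possible score, as stated earlier in the post

# The Human Influence Index for the pixel is the plain sum of its subscores.
hii = sum(subscores.values())
print(f"HII = {hii} out of {HII_MAX}")
```

The point of writing it out is that the index is purely additive: each layer contributes its points independently, so a pixel can score high on proximity to roads even if nobody lives in it.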

Now let’s look at how the Human Footprint is computed. We know that some areas of the world are more easily influenced by humans than others. An HII score of 25 that occurs in a rain forest pixel is therefore more “impressive” evidence of human influence than one that occurs in a temperate forest biome like ours. To account for this, the Human Footprint is calculated by ranking all the pixels that share the same biome. Here is what our pixel looks like in the Human Footprint.

footprint.jpg

As we would expect from the discussion above, our pixel is among the most heavily influenced in its biome.

The Last of the Wild is simply the bottom 10% of the Human Footprint. What’s really striking (and rather sad) is that the nearest relatively wild areas are found in the northern portion of the Lower Peninsula, several hours away by automobile.

wildaresa1.jpg
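To make the two steps concrete, here is a minimal sketch of ranking pixels within a biome and then picking out the bottom 10%. It assumes that “ranking” means a simple percentile rank among pixels of the same biome; the function names and the HII values are made up for illustration, and the actual CIESIN/WCS procedure may differ in its details.

```python
from collections import defaultdict

def human_footprint(pixels):
    """Rank each pixel's HII against the other pixels in the same biome.

    `pixels` is a list of (pixel_id, biome, hii) tuples. Returns a dict
    mapping pixel_id to a 0-100 percentile rank within its biome.
    """
    by_biome = defaultdict(list)
    for pixel_id, biome, hii in pixels:
        by_biome[biome].append((pixel_id, hii))

    footprint = {}
    for members in by_biome.values():
        scores = [hii for _, hii in members]
        for pixel_id, hii in members:
            # Share of same-biome pixels with an HII at or below this one.
            at_or_below = sum(1 for s in scores if s <= hii)
            footprint[pixel_id] = 100.0 * at_or_below / len(scores)
    return footprint

def last_of_the_wild(footprint, cutoff=10.0):
    """Pixels in the bottom `cutoff` percent of the Human Footprint."""
    return sorted(p for p, rank in footprint.items() if rank <= cutoff)

# Hypothetical HII values for ten pixels in each of two biomes.
temperate = [2, 5, 9, 14, 20, 25, 31, 40, 48, 56]
rain_forest = [0, 1, 2, 3, 4, 6, 8, 12, 18, 25]
pixels = [(f"t{i}", "temperate forest", h) for i, h in enumerate(temperate)]
pixels += [(f"r{i}", "rain forest", h) for i, h in enumerate(rain_forest)]

fp = human_footprint(pixels)
print(fp["t5"], fp["r9"])    # the same HII of 25, ranked within each biome
print(last_of_the_wild(fp))  # least-influenced pixels in each biome
```

In this toy data an HII of 25 sits in the middle of the temperate forest pixels but at the very top of the rain forest pixels, which is the effect the biome-relative ranking is meant to capture.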

→ No Comments
Tags: About ISciences · The Anthrosphere

A Global Map of Human Impact on Marine Ecosystems

May 12th, 2008 · No Comments

In this month’s TerraViva! spotlight is a global map of human impact on marine ecosystems that was created by the National Science Foundation’s National Center for Ecological Analysis and Synthesis (NCEAS) at the University of California at Santa Barbara. The map is based on a study that was published in the February 15, 2008 issue of Science (Halpern, B.S., et al., “A Global Map of Human Impact on Marine Ecosystems,” Science 319, 948 (2008)). The study is significant because it is the first-ever spatially explicit global assessment of human impacts on marine ecosystems.

The strategy for building the human impacts map was to estimate the global impact of 17 different types of (at least partially) anthropogenic drivers, such as oil rigs, invasive species, and fisheries, on 14 types of marine ecosystems, such as beaches, coral reefs, and deep waters. The driver layers were rescaled to unitless values between 0 and 1. Then an expert survey provided weighting variables by assessing the vulnerability of each ecosystem type to each driver on five different ecological criteria. The weighted 0-1 driver impacts for each 1 sq km cell were summed, and the final number represented the relative cumulative impact of human activities on all ecosystems in that cell.

The techniques used to create each data layer are described in detail in the supporting online material published in Science, which, for a global data hound, is the equivalent of a bunch of juicy steaks. Often, substantial cleverness was involved. For example, oil rigs at sea were identified using the Stable Lights of the World data set from NOAA/DMSP. Invasive species were modeled as a function of the amount of cargo traffic at a port, based on peer-reviewed scholarly articles establishing a relationship between the two variables; port traffic was estimated using several port databases from the American Association of Port Authorities, Australia, Lloyd’s, and the National Geospatial-Intelligence Agency. (I was thrilled to find out about the additional port databases, as I had to rely on Digital Chart of the World’s port layer for a project a couple of years ago and was frustrated with its age and lack of detail about ports.) Fisheries impacts were based on half-degree global commercial catch data developed by the Sea Around Us Project. In short, if you need global ocean data, this is a great place to start!
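The rescale-weight-sum recipe described above can be sketched for a single cell as follows. This is a hedged illustration, not the study's code: the function names, the min-max rescaling, and every number in the example are assumptions made for the sake of a runnable example, and the study's supporting material describes the actual transformations used for each layer.

```python
def rescale(values):
    """Rescale raw driver values to unitless numbers between 0 and 1.

    Uses simple min-max scaling, which is an assumption of this sketch.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def cumulative_impact(drivers, ecosystems, weights):
    """Weighted sum of driver intensities over the ecosystems in one cell.

    drivers:    {driver_name: intensity already rescaled to 0-1}
    ecosystems: set of ecosystem types present in the cell
    weights:    {(driver_name, ecosystem): vulnerability weight}
    """
    total = 0.0
    for driver, intensity in drivers.items():
        for ecosystem in ecosystems:
            # A missing (driver, ecosystem) pair is treated as no vulnerability.
            total += intensity * weights.get((driver, ecosystem), 0.0)
    return total

# Hypothetical cell containing two ecosystems and exposed to two drivers.
drivers = {"oil rigs": 0.5, "fisheries": 1.0}
ecosystems = {"coral reefs", "beach"}
weights = {
    ("oil rigs", "coral reefs"): 2.0,
    ("fisheries", "coral reefs"): 3.0,
    ("fisheries", "beach"): 1.0,
}
print(rescale([10, 20, 30]))
print(cumulative_impact(drivers, ecosystems, weights))
```

Because the result is a sum over every driver and ecosystem pair, a cell with many ecosystem types can accumulate a high score even when no single driver is extreme, which is worth keeping in mind when reading the map.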

Here is a global view (at 20 arc minutes) of the summary data:

NCEAS Global Map of Human Effects on Oceans

This is a useful synoptic view of the situation. What it seems to show is that there are substantial human impacts on the oceans in most of the world and that there are particularly severe impacts in highly developed coastal fisheries. It is interesting to compare this view of ocean impacts with this view of “the human footprint” on land, produced by SEDAC:

footprint.jpg

and with this view of Ocean Primary Productivity 1997-2002 from NASA.

OPP 1997-2002

It’s not surprising that human impacts are high in areas where there is high ocean primary productivity, but note that the east and west coasts of Africa and South America have relatively high primary productivity with, as yet, less intense human impacts. One wonders whether human impacts on these coastlines will increase over time.

We used an alpha feature in Global Data Analyst to prepare the following map, which combines the Human Impact on Marine Ecosystems with the SEDAC Human Footprint to provide a unique view of human impact on the world.

Global Human Footprint on Land and Oceans

This version of the integrated map combines two separate color schemes. In both land and sea areas, reds mean high human impact, while on land green means less impact and at sea pale yellow means less impact. The overall impression of “busy-ness” caused by the clashing color schemes is not far from the truth, as we know from pictures of human debris washed up on isolated beaches hundreds of miles from the nearest human habitation.

debris on Kure Atoll

(Kure Atoll, NW of Hawaii, 2006)

Most of Earth, including both land and oceans, is significantly influenced by human activity.

P.S. The map files for the TerraViva! viewer can be downloaded from ISciences in two versions: a smaller one (49 MB) that contains maps at resolutions of 2 arc-minutes, 5 arc-minutes, and 20 arc-minutes, and a larger one (289 MB) that also contains a 30 arc-second map.

→ No Comments
Tags: About TerraViva! · New Global Data Set List · The Anthrosphere

Welcome to TerraViva.Net

April 22nd, 2008 · No Comments

Today, Earth Day 2008, we release a new version of our TerraViva.net website. We are doing this so that we can improve access to a rich world of global data sets about Earth. Although many of these primarily scientific data sets are available freely at government and other sites, they are out of reach for many of those with a serious interest in our Earth; the storage formats are complicated and often incompatible, and the software needed to view and study the data is expensive and complex. So we have identified some of the key data sets, we have harmonized them, and we have made them easily accessible using our package of the TerraViva! software tools.

Before I get started on a tour, I will spend a few minutes setting up some vocabulary that we use. Geostatistics are data that are organized by administrative unit. Administrative units are nations, provinces, districts, and so on: the fundamental building blocks by which we govern ourselves, and hence document socioeconomic and environmental statistics. Thematic maps are maps about a particular theme or topic (e.g., land use, population density, and so on). The underlying data for these are often raster or gridded data sets. We can use geostatistics to create dynamic maps, i.e., thematic maps on the fly. (The technical term is “choropleth maps”; you may also think of them simply as administrative unit outline maps with the areas colored according to the value of a particular variable.) Thematic maps and geostatistics may be recorded over time, resulting in time-series data sets. Such time-series, global data sets are absolutely the foundation of a great deal of work in human, atmospheric, oceanic, and biological science.

TerraViva! enables you to work with all these things together. This is important because most of the major issues concerning our Earth are at the intersection of the physical, biological world and the human world. For this reason, TerraViva! has been used by government agencies, NGOs, and academics for some years now.

We want these capabilities to be available to more people, so with this new release of TerraViva.net we are giving you a set of tools for accessing global data so you can digitally explore and learn more about the world you live in:

· We are providing information about what global data sets are available with our TerraViva! GeoServer

· We are providing viewing tools to understand and analyze the data: a free Windows client and a service that you can access in your browser.

· We are providing the opportunity to download the data in our viewer format (just click on the “download data” button right below the data set name).

GDV comes standard with an up-to-date political boundary data set (more on that in a future blog entry) and fundamental information about the nations of the world drawn from the CIA World Factbook, World Resources Institute, the Department of Energy and other sources. Today, close to 40 additional data sets are already listed inside Global Data Viewer, and you can download them with a single mouse click. And we will be adding more as we go forward.

Our hope is to make downloading global data as easy as downloading music. We may not be quite all the way there yet, but this is a significant step forward.

If you don’t want to download the free software, or you’re away from your personal computer, you can use the browser-based Interactive MapViewer. And remember that we offer more than just maps: we also have geostatistical data sets with literally hundreds of national-scale variables for all the countries of the world.

The premium version of the software, Global Data Analyst, comes with powerful additional features like spatial query, multivariate regression, masking, variable creator, shape import, map converter, and data download. Global Data Analyst is perfect for a team that is gathering a wide range of data and wants to make it available to its end-users in an “analyst-friendly” way that does not require the analyst to know a GIS mapping system.

This might be a good moment to say a word about how this software is different from tools that may be familiar to some of you, such as ArcGIS and Google Earth. We love ArcGIS and use it all the time. But it is not just a viewer and not just an analysis tool; it is an (incredibly) feature-laden development environment. The learning curve can be steep.

We love Google Earth too. Generalizations are hazardous, but Google Earth is more about place-based data and zooming and hopping around from placemark to placemark, while we are more about studying earth features and supporting quantitative analysis. With Google Earth you can see the color of vegetation; with TerraViva! you can evaluate the intensity of that color…and the dimensions of the neighboring terrain…and the density of nearby population…and the local climate conditions…and the magnitude of night time lights…

Our software was developed in response to a unique set of goals, and the package of features is an interesting one. We think Global Data Viewer is a competent complement to the data viewing and analysis tools that are already widely known; we invite you to use it to get started exploring global data.

→ No Comments
Tags: About TerraViva! · About Global Data Hound · About ISciences · Glossary

Gridded Global Population Data Sets

January 28th, 2008 · No Comments

 

For almost any mapping problem that has a human dimension, it is important to know where the people are. In this blog post I will tell you about some of the more useful gridded global population data sets I have encountered. The two most important global gridded population data sets are:

  • LandScan, from Oak Ridge National Laboratory (ORNL)
  • Gridded Population of the World (GPW), from SEDAC

Both are available in TerraViva! format. You can download them from inside the Global Data Viewer or via the GeoServer (LandScan, GPW population and population density).

Simply knowing that there are two global gridded population data sets available is very important, because a lot of people seem to be aware of only one or the other. They are created using different techniques, and it can be important to choose the one that is more appropriate for the task at hand.

LandScan pixels are 30 arc-seconds wide (about 1 km at the equator). The challenge is that for much of the world, there are no existing census studies that accurately locate the population in terms of a 1 km x 1 km grid. More typically, population data is available for a district or city that might have tens or hundreds of square kilometers of area. That raises the question: how do you decide what the population of a particular 1 km x 1 km pixel is? LandScan takes its cues from a 30 arc-second map of global land use. As you can see in this image of Bulgaria, one effect of this allocation strategy is to put people near or on roads.

Bulgaria — ORNL population data
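The general idea of spreading a district total across pixels according to land use can be sketched as below. This is only an illustration of the allocation concept: the function name, the land-use classes, and the weights are all invented for the example, and the post does not describe LandScan's actual model, which this sketch should not be taken to reproduce.

```python
def allocate_population(district_population, pixel_land_use, weights):
    """Split a district's population across its pixels by land-use weight.

    pixel_land_use: list with one land-use class per pixel in the district
    weights:        {land_use_class: relative likelihood that people live there}
    """
    pixel_weights = [weights.get(land_use, 0.0) for land_use in pixel_land_use]
    total = sum(pixel_weights)
    if total == 0:
        # No informative land use: fall back to an even split.
        share = district_population / len(pixel_land_use)
        return [share] * len(pixel_land_use)
    return [district_population * w / total for w in pixel_weights]

# Hypothetical weights and a four-pixel district with 1,000 residents.
weights = {"urban": 6.0, "roadside": 3.0, "cropland": 1.0, "water": 0.0}
pixels = ["urban", "roadside", "cropland", "water"]
print(allocate_population(1000, pixels, weights))
```

With weights like these, pixels near roads pick up a large share of the district total, which is consistent with the pattern visible in the Bulgaria image.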

GPW pixels are 2.5 arc-minutes wide, or 5 times wider than the LandScan pixels (25 times the area). But one advantage of GPW is that the underlying population data is typically derived from smaller administrative units (e.g., districts instead of provinces, sub-districts instead of districts). This table gives some of the gory details: for example, the GPW map of Bulgaria is based on 261 administrative units, while the GPW map of the U.S. is based on 60,884 administrative units. (And in case you were wondering “why Bulgaria?” this is the answer: the administrative boundaries show up clearly in this image.)

bulgaria-popden.png

A lot of the time what you really want to know is “where are the cities?” and the urban/rural mask from SEDAC’s Global Rural-Urban Mapping Project (GRUMP) does a great job of that.

Bulgaria Gridded Urban/Rural Mask

The night lights product created by NOAA from the Defense Meteorological Satellite Program (DMSP) provides another spin on the question “where are the cities?” (Maybe someday we’ll know how to make cities that aren’t giant light-polluters, but that day is a long way away.)

bulgaria-nightlights.png

In future blog entries I’ll discuss some of the cool things you can do with global population data sets in Global Data Viewer and Global Data Analyst, and then discuss some of the global population data sets that we’d like to have.

→ No Comments
Tags: The Anthrosphere

CIESIN’s new Global Data Center

September 29th, 2007 · 1 Comment

Our friends at CIESIN recently released a major new update to their global data set offerings. From the press release:

CIESIN RELEASES NEW WEB PORTAL FOR GLOBAL-SCALE DATA

CIESIN has released a new state-of-the-art Web portal, the World Data Center (WDC) for Human Interactions in the Environment, at http://sedac.ciesin.columbia.edu/wdc/.

CIESIN’s WDC provides leading-edge search and data visualization tools and easy access to global-scale data and associated information on key themes related to human-environment interactions, including population, climate, conservation, poverty, hazards, health, and sustainability.

Features of the site include:

  • A new mapping tool that lets users customize, save, and share maps based on SEDAC- and distributed data sets for the user’s region and theme of interest
  • The ability to search for global data sets on specific topics using the GeoNetwork-based distributed catalog search tool developed by FAO (Food and Agriculture Organization of the United Nations)
  • An extensive map gallery of images, easy to download for use in documents and presentations
  • Thematic portals that include updated news and information on recently-released data sets, data sources, and articles or reports based on important data

CIESIN’s WDC is one of 51 data centers of the World Data Center system of the International Council for Science (ICSU). The CIESIN WDC, established in 1995, was the first WDC to focus on data at the intersection of the natural and social sciences.

This is important and exciting news for a variety of reasons, and I will have more to say about it later. For now let me observe that it is well worth checking out … CIESIN has been a pioneer in developing and providing global data sets. We work with their stuff all the time.

I’ve been wanting to start including some images in this blog, so as an “appeteaser” let me start with this one from the CIESIN WDC data set on Human Appropriation of Net Primary Productivity:

HANPP/NPP

This shows human appropriation of net primary productivity as a percentage of local primary productivity, so you can think of it as a measure of the stress that humans are putting on the local ecological resources (or, if you will, as a measure of the efficiency with which humans have turned the local ecology to our purposes). If you read the fine print, there are some humongous caveats, which is typical for this genre of conceptually innovative global map:

The method assumes a homogenous per capita consumption rate within each country, which although obviously incorrect, represents a starting point. The authors note that terrestrial HANPP does not directly capture other forms of environmental impact, such as freshwater abstraction, use of fossil fuels, pollutant emissions, and appropriation of NPP from freshwater and marine systems. Finally, unlike earlier studies, the authors did not include the components of NPP that are lost due to land transformations (e.g. shifting cultivation and land clearing for development).

The point is not that the HANPP map is exactly “right” as a literal representation of reality, but rather that it is directionally illuminating and thought-provoking. That’s often the case for any global map that does more than count known entities. Cartography has a strong element of art.

→ 1 Comment
Tags: New Global Data Set List · The Anthrosphere

Pixel Sizes and Map Scales for Global Gridded Data Sets

September 28th, 2007 · No Comments

One of the most challenging issues for rookies entering the arena of global gridded data sets is that grid cell size and spatial resolution are expressed in a variety of confusing ways. Sometimes grid cell size is described using the lat/long system, in degrees, arc-minutes, and arc-seconds. Other times, grid cell size is expressed in meters. Unless you are a whiz at mental arithmetic, it can be very difficult to understand the relationships. Compounding the issue is that cell sizes vary depending on the projection and the latitude (usually, cells are bigger at the equator).

I could, and probably should, say much more, but the reason for this particular post is to call to your attention a very handy spreadsheet that was put together by my officemate Steve Metzler here at ISciences and that ISciences has now made available using Google Apps. The spreadsheet translates the most commonly used arc-degree, minute, second grid cell sizes into equatorial length in meters and calculates the resulting number of rows, columns, and pixels in a geographic projection. Click here for the pixel size spreadsheet; you’ll need to supply a Google id, or sign up for one.

A few key take-aways for future reference:

  • a one-degree grid cell at the equator is about 111 kilometers wide;
  • a thirty arc-second global map has cells almost 1000 meters wide, and includes almost a billion pixels; and
  • a global map at six-inch resolution (the best that’s usually available in Google Earth, from commercial air photography) would have more than 33 quadrillion pixels.
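The first two take-aways can be reproduced with a short calculation. This is a minimal sketch in the spirit of the spreadsheet, not a copy of it: the function name is made up, and it assumes a global geographic (lat/long) grid of 360 by 180 degrees and an equatorial circumference of roughly 40,075 km.

```python
EQUATORIAL_CIRCUMFERENCE_M = 40_075_017  # approximate length of the equator in meters

def grid_stats(cell_arc_seconds):
    """Equatorial cell width and grid dimensions for a global geographic grid."""
    cells_per_degree = 3600 / cell_arc_seconds
    columns = round(360 * cells_per_degree)  # cells around the equator
    rows = round(180 * cells_per_degree)     # cells from pole to pole
    width_m = EQUATORIAL_CIRCUMFERENCE_M / columns
    return {
        "width_m": width_m,
        "columns": columns,
        "rows": rows,
        "pixels": rows * columns,
    }

for label, arc_seconds in [
    ("1 degree", 3600),
    ("2.5 arc-minutes", 150),
    ("30 arc-seconds", 30),
]:
    s = grid_stats(arc_seconds)
    print(
        f"{label}: {s['width_m']:,.0f} m wide at the equator, "
        f"{s['columns']:,} x {s['rows']:,} = {s['pixels']:,} pixels"
    )
```

Note that the width is only valid at the equator; in a geographic projection the east-west extent of a cell on the ground shrinks toward the poles even though the grid still has the same number of columns in every row.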

→ No Comments
Tags: Glossary

Some Other Global Data Ontologies

September 13th, 2007 · No Comments

It’s always a good idea to consider whether an existing ontology will meet your needs. The proliferation of ontologies simply makes things more complex for those who are trying to work across multiple information systems.

As a point of reference, there are, of course, some other global data ontologies already in existence.

For example, NASA’s Global Change Master Directory is organized as appended below. What’s wrong with that? Nothing, except that the GCMD was organized for a different purpose than ours: to manage the primarily scientific data flowing from the U.S. government’s Global Change Research Program. ISciences and its customers are closer to the “pointy end” of the stick that connects science and policy.

Another global data ontology that many of you will be familiar with is the list of layers in Google Earth, which you can view in the “layers” panel in the Google Earth software. This is a great example of an ad hoc ontology that has grown a bit out of control: there is a hodgepodge of categories with many uncategorized stragglers and “only children.” The GE list of layers doesn’t look anything like it would if Google had embarked on a totally rational, top-down planned approach to developing its product. The upside, of course, is that they’ve grown the product rapidly by being opportunistic and not letting their own ontology get in their own way.

The GCMD ontology

   
  • Agriculture: forest science, soils …
  • Atmosphere: precipitation, air quality …
  • Biosphere: ecosystems, vegetation …
  • Biological Classification: animals/invertebrates, plants …
  • Climate Indicators: air temperature, drought …
  • Cryosphere: frozen ground, sea ice …
  • Human Dimensions: land use, population …
  • Land Surface: erosion, topography …
  • Oceans: ocean temperature, salinity …
  • Paleoclimate: ice cores, land records …
  • Solid Earth: geochemistry, seismology …
  • Sun-Earth Interactions: auroras, solar activity …
  • Terrestrial Hydrosphere: ground water, water quality …

→ No Comments
Tags: New Global Data Set List

First Crack at our New Global Data Ontology

September 13th, 2007 · No Comments

The vision of an Atlas of the Living Earth inspired this first crack at our new global ontology: a world organized by scientific “spheres” of interest.

  • The Lithosphere
  • The Atmosphere
  • The Hydrosphere
  • The Biosphere
  • The Anthrosphere
  • The Cryosphere

After some discussion, the next version looked like this.

  • The Exosphere
  • The Geosphere (replacing The Lithosphere)
  • The Atmosphere and Climate
  • The Hydrosphere
  • The Biosphere
  • The Anthrosphere
  • The Cryosphere

I have looked around, but so far have not found a canonical scientific statement listing all the recognized scientific “spheres”.  I have a feeling that there will not be such a thing, since it would be (by definition) cross-disciplinary, and science is organized bottom-up by practitioners in disciplines: there’s no authority to establish a canonical list, only voluntary interactions among trade guilds.

Exosphere was added because it’s important to remember that life on Earth takes place in the context of astronomy: it is now generally agreed that meteor impacts have been responsible for several mass extinctions, including the dinosaurs, and the source and sine qua non of all life on Earth is, of course, the energy radiated from the (fortunately rather stable) Sun.

Lithosphere was axed because an outside reviewer commented that lithosphere is usually restricted to the area underneath the Earth’s crust. Geosphere more accurately conveys the area of our concern: the skin of the orange and all its wrinkles, folds, and cracks!

“and Climate” was added to Atmosphere because we realized that our “spheres” construct did not provide a logical place for important attributes of the living Earth such as surface temperature. This is a good example of “ontologies in action”: here we are in version 0.0.2, and we are already allowing a deviation from the standard rule for category creation! We’re willing to do that because we know from years of experience in constructing data ontologies that no ontology is perfect. There are always exceptions and stragglers, and the point is whether the ontology effectively serves our needs and our customers’.

We noted the possible relevance of Magnetosphere, but at the moment we are not sure we will need to add it to our categories. As far as I can think of, the only time the physical contours of the magnetosphere are relevant in daily life is in the task of navigation, and we are not a navigation company!

→ No Comments
Tags: New Global Data Set List

The Vision Motivating Our Global Data Ontology

September 13th, 2007 · 1 Comment

We kicked around a few ways to encapsulate our vision, and the one that’s stuck best (with me, at least) is the idea that TerraViva! should be something like a digital Atlas of the Living Earth. I like “Atlas” because it has a lot of on-target connotations. Atlases are usually

  • large
  • data-rich
  • useful (as reference)
  • illuminating (for insight)
  • accessible to both laypeople and specialists
  • helpful, if not indispensable, to analysts
  • scientifically sound
  • aesthetically pleasing
  • valuable

Those are all attributes of our data collection today, and those are all attributes that we intend to preserve and heighten.

One additional point that I like is that the term “atlas” conveys the right order of numerosity. There are, usually, tens or hundreds of plates in an atlas. We are talking about approximately the same number of data sets for TerraViva!

“Living Earth” is good for several reasons. First, that’s the literal meaning of TerraViva! More importantly, “Living Earth” accurately conveys our area of interest. We focus on data sets that fall under both headings of “Living” and “Earth”: the intersection of the social, environmental, and political with the physical world.

When we talk about “Living”, we’re not talking about just plants and animals — we’re also talking about humans — and when we talk about humans, we’re not just talking about them in isolation from the natural world. We’re talking about human society in its full richness and complexity and in its full environmental context in the natural world. The projects we’ve done for our federal, corporate, and NGO customers reflect our ability to handle projects where environmental and security concerns are intermingled.

“Earth” is a good term because we have a special interest here in global data sets.

So, with “Atlas of the Living Earth” as our touchstone, what should our global data ontology look like?

→ 1 Comment
Tags: New Global Data Set List

We Need a Global Data Ontology

September 13th, 2007 · 1 Comment

The first stop on our quest for a new list of global data sets for TerraViva! is a global data ontology.

An ontology, according to Wikipedia,

seeks to describe or posit the basic categories and relationships of being or existence to define entities and types of entities within its framework.

A more scholastically reputable definition can be found at the website of Tom Gruber of the Stanford University Knowledge Systems Lab.

It is, I think, fair to say that twenty years ago, the term “ontology” was obscure and of interest only to philosophers and librarians. It’s rather amazing that today ontologies are an integral part of the new economy, and that ontological features like tag-based folksonomies are recognized by leading commentators like Dilbert.

As Gruber comments, the key thing about ontologies is what they are for.

In this case, we need an ontology to help us figure out which global data sets we want in our product.

It may be obvious, and it is definitely true, that there is no such thing as the “perfect” or “correct” ontology, at least in a situation like this. The ontology is good if it serves our needs. Other ontologies might do just as well, and sometimes it can be useful to deploy more than one ontology when attacking a problem. (The cross-tabulated ontology is a standard tool in management consulting: see Stephen Covey’s well-known cross-tabulation of daily tasks into important vs. urgent, or Gartner’s notorious Magic Quadrant.)

In this case, we started out with several different ontologies in the mix.

First, we created a simple spreadsheet that classified data sets into ones that we already have on hand, ones that we know about, and ones that we want.

The data sets that we have in TerraViva today are already organized into a simple set of categories (and yes, that’s all that an ontology is).

  • Climatic
  • Image
  • Landcover
  • People
  • Plants and Animals
  • Topographic

The problem with this ontology is not that it’s inaccurate (it accurately describes the current roster of standard data sets); it’s that it’s ad hoc and incomplete. We want the new ontology to be more complete and more “planful”: a better response to the set of issues that we want our product to address. To get there, we need a vision that motivates our ontology.

→ 1 Comment
Tags: New Global Data Set List · Glossary