NGDS exchange methods and metadata

Metadata

All NGDS data records are organized and accessed using metadata. Metadata describes NGDS resources to support finding resources, evaluating them for a user's intended purpose, and obtaining the data. Each metadata record describes a resource that is considered an individual product with an author, editor, or compiler, created for some particular purpose. The metadata record may describe a variety of 'distributions', or procedures for accessing the resource, such as filling out an order form, clicking on a link, or connecting to a web service with an appropriate client.
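As an illustration only, the relationship between a metadata record and its distributions can be sketched as a small data structure. The class names, field names, distribution labels, and URLs below are hypothetical and are not taken from any NGDS or USGIN schema; the sketch only shows one record offering several access options and a client picking the one it can handle.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Distribution:
    """One way of obtaining the resource (hypothetical fields)."""
    kind: str  # illustrative labels such as "file", "order-form", "OGC:WFS"
    url: str


@dataclass
class MetadataRecord:
    """A single described resource together with its access options."""
    title: str
    author: str
    distributions: List[Distribution] = field(default_factory=list)

    def pick(self, preferred_kinds: List[str]) -> Optional[Distribution]:
        """Return the first distribution matching the caller's preference order."""
        for kind in preferred_kinds:
            for dist in self.distributions:
                if dist.kind == kind:
                    return dist
        return None


# Hypothetical record with two distributions of the same resource.
record = MetadataRecord(
    title="Example borehole temperatures",
    author="Example Survey",
    distributions=[
        Distribution("file", "https://example.org/data/boreholes.xlsx"),
        Distribution("OGC:WFS", "https://example.org/services/boreholes/wfs"),
    ],
)

# A client that understands WFS prefers the service and falls back to the file.
chosen = record.pick(["OGC:WFS", "file"])
print(chosen.kind, chosen.url)
```

A client without web service support would pass a different preference list, for example only "file", and receive the download link instead.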

After hosting data using their own computing resources, NGDS data contributors submit one or more metadata records describing their data to an NGDS node. Metadata from NGDS nodes is harvested by the NGDS portal, which provides a single entry point to search the entire NGDS catalog (the catalog contains metadata describing information resources distributed across the various nodes in the network). Metadata records in this catalog provide the information necessary to locate the data resources they describe and to obtain them through various distributions (file formats, web service protocols), allowing you to select the access method best suited to the software you are using.

Metadata records published by NGDS nodes are harvested by the central portal, which provides a web site for searching across all NGDS content, enabling users to discover NGDS data resources and learn how to access them. For more information about metadata, see the USGIN Metadata Tutorial.
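The harvesting arrangement described above can be pictured with a minimal sketch. It assumes each node simply exposes a list of metadata records; the node names, record fields, and the harvest and search functions are hypothetical and do not represent the actual NGDS portal software.

```python
from typing import Dict, List

# Hypothetical node contents: node name -> metadata records published there.
NODES: Dict[str, List[dict]] = {
    "node-a": [{"id": "a-1", "title": "Borehole temperatures"}],
    "node-b": [
        {"id": "b-1", "title": "Geologic map of an example quadrangle"},
        {"id": "b-2", "title": "Thermal spring temperatures"},
    ],
}


def harvest(nodes: Dict[str, List[dict]]) -> List[dict]:
    """Copy every node's records into one catalog, remembering the source node."""
    catalog = []
    for node_name, records in nodes.items():
        for record in records:
            catalog.append({**record, "node": node_name})
    return catalog


def search(catalog: List[dict], term: str) -> List[dict]:
    """Case-insensitive title search across the whole catalog."""
    term = term.lower()
    return [record for record in catalog if term in record["title"].lower()]


catalog = harvest(NODES)
hits = search(catalog, "temperatures")
print([(hit["id"], hit["node"]) for hit in hits])
```

The point of the sketch is that the data itself stays on the nodes; only the descriptions are gathered in one searchable place.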

Once the desired NGDS data resource is located using metadata that appears in the NGDS catalog, a user has several options for accessing that resource. Access options depend on the structure of the data resource. Typically, highly structured resources offer broader options for access and automated analysis.

Data Access Options

NGDS uses the World Wide Web as a platform for accessing and exchanging data. Typical web-based data exchange methods are as follows:

  • File-based approaches: Data is accessed in discrete digital packages called files. A file is located at a specific URL (Web location). The content of a file can be structured to enable machine processing, but the file is only accessible as a unit.
  • Web application-based approaches: Data is made available to users via Web-browser-based software. These applications typically provide a human user with functions to explore data with various kinds of visualizations, and in some cases to select particular data and download in files. These applications are typically tightly coupled to specific data sources.
  • Service-based approaches: Data is available through web services, which provide interfaces for machine-to-machine interaction for finding, selecting and accessing data. Web services require more sophisticated client and server software stacks and rigid quality control, but allow a greater level of automation in the data access and utilization process, fully taking advantage of interoperable interchange protocols and formats.

What are Web Services?

A web service is an interface through which software exchanges data with other software over the internet. Using web services allows NGDS data to be used in a variety of free-and-open-source applications, making it more freely available and easier to use.

NGDS aims to use web services with well-documented community specifications for service protocols and data interchange formats. Though web services require more sophisticated client and server software stacks and rigid quality control, they have a number of advantages that meet the requirements of NGDS:

  • Web services are web-accessible
  • Data available as a web service can be consumed by web applications
  • Web services are read-only and preserve data ownership
  • A number of web service protocols are openly specified and have free-and-open-source implementations, meaning that web services using these protocols can be hosted and accessed without proprietary software

Most NGDS web services provide functions to get data or metadata. The NGDS Node-in-a-Box (NIAB) software stack supports the creation and publication of Open Geospatial Consortium (OGC) compliant web services from data assembled in Excel spreadsheets.

Web Service Protocols in use with NGDS

  • WMS: OGC Web Map Service (WMS). Provides raster images of georeferenced features for display in map views.
  • WFS: OGC Web Feature Service (WFS). Provides structured data for georeferenced features including descriptive attributes.
  • ESRI Map Service: Provides georeferenced rasters and features described by attributes. Requires ESRI ArcGIS server software.
  • CSW: OGC Catalog Service for the Web (CSW). Helps applications access metadata records used to locate NGDS information resources.
  • OPeNDAP: Provides raster datasets as numeric array data.

As implemented in NGDS, these services provide the user read-only access to data published through the services. Transactional implementations allowing users to change content of the services are possible, but these are outside the scope of current NGDS specifications.
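The read-only requests behind the OGC protocols listed above are commonly sent as HTTP GET requests with key-value-pair parameters. The following is a minimal sketch that only builds such request URLs without sending them. The endpoint addresses and the feature type name are hypothetical, and the version numbers are assumptions about what a given server supports; a real service advertises its supported versions and layers in its GetCapabilities response.

```python
from urllib.parse import urlencode


def ogc_request_url(endpoint: str, service: str, version: str,
                    request: str, **extra: str) -> str:
    """Build an OGC key-value-pair GET request URL (no network access)."""
    params = {"service": service, "version": version, "request": request}
    params.update(extra)
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoints, for illustration only.
MAP_ENDPOINT = "https://example.org/services/wms"
FEATURE_ENDPOINT = "https://example.org/services/wfs"
CATALOG_ENDPOINT = "https://example.org/services/csw"

# Ask each service to describe itself.
wms_caps = ogc_request_url(MAP_ENDPOINT, "WMS", "1.3.0", "GetCapabilities")
csw_caps = ogc_request_url(CATALOG_ENDPOINT, "CSW", "2.0.2", "GetCapabilities")

# Ask a WFS 1.1.0 service for up to ten features of one (hypothetical) type.
wfs_features = ogc_request_url(
    FEATURE_ENDPOINT, "WFS", "1.1.0", "GetFeature",
    typeName="example:BoreholeTemperature", maxFeatures="10",
)

for url in (wms_caps, csw_caps, wfs_features):
    print(url)
```

Each URL could be pasted into a browser or passed to an HTTP client. All three requests only retrieve information, consistent with the read-only access described above.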

Data Interoperability

NGDS facilitates access to data conforming to the three aforementioned exchange methods through publication using interoperable protocols and formats. This enables greater automation of information exchange, processing, and analysis, allowing human users to focus on understanding and learning from the data.

Data is categorized into one of three interoperability tiers; the higher the tier, the greater the level of interoperability. These tiers roughly correspond to whether data can be accessed through a web service or must be downloaded or viewed as a file.

Tier 1: Unstructured data (text, images, or sound)

The simplest and most common access to resources is provided by simple Web links that result in a file download. Information contained in files can be accessed by users who have software that can recognize and open these files. This is the standard model for files accessible on the web, supported by HTTP servers and desktop web browser software.

Unstructured data requires user interpretation before it can be used for analysis. Users can utilize the information if they can understand the encoding and language, but the system provides no support for this understanding, and little or no automation is possible. Audio files must be transcribed; text files must be parsed and mined for data that is then broken down and structured in ways that can be processed by computers; images must be scanned, interpreted, and often georeferenced. Preparing Tier 1 data for analysis can be a painstaking and time-consuming process.

Tier 2: Structured data that does not conform to an NGDS schema

Tier 2 interoperability indicates that information content is structured (consistently organized) in a spreadsheet or database file such that it is amenable to computer processing; that said, Tier 2 data does not use a shared, documented interchange format. Data in this tier must be transformed by the data consumer on a case-by-case basis for integration with other datasets, requiring them to study each new data source to figure out how to extract the information they need. Obtaining data in a structured format is a step towards interoperability because once the format is understood, computer programs can be instructed to extract the desired information.
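The case-by-case transformation described above might look like the following sketch. The source columns, the target field names, and the unit conversion are all hypothetical: they stand for whatever a consumer learns by studying one particular Tier 2 source, and they are not taken from an NGDS content model.

```python
import csv
import io

# Hypothetical Tier 2 source: structured, but with provider-specific
# column names and units.
SOURCE_CSV = """well,temp_F,lat,lon
A-1,212.0,39.5,-119.8
B-7,140.0,40.1,-117.2
"""

# Hand-written mapping worked out for this one source only.
COLUMN_MAP = {"well": "WellName", "lat": "LatDegree", "lon": "LongDegree"}


def fahrenheit_to_celsius(value: float) -> float:
    return (value - 32.0) * 5.0 / 9.0


def transform(text: str) -> list:
    """Rename columns and convert units so the rows match the consumer's own layout."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {target: raw[source] for source, target in COLUMN_MAP.items()}
        row["TemperatureC"] = round(fahrenheit_to_celsius(float(raw["temp_F"])), 1)
        rows.append(row)
    return rows


for row in transform(SOURCE_CSV):
    print(row)
```

A second provider with different column names or units would need its own mapping, which is the per-source effort that a shared interchange format is meant to remove.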

Tier 3: Structured data that conforms to an NGDS schema

Tier 3 data is structured data that conforms to an NGDS information exchange specification. Data that is published according to the exchange specification (content model, interchange format, service protocol) is interoperable with any other data published using that exchange. This is referred to as Tier 3 interoperability.
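Conformance to a content model can be thought of as a check that every record carries the agreed fields with the agreed types. The sketch below assumes a tiny, hypothetical content model; actual NGDS content models define their own field names and rules, which are not reproduced here.

```python
# Hypothetical content model: field name -> required Python type.
CONTENT_MODEL = {
    "WellName": str,
    "LatDegree": float,
    "LongDegree": float,
    "TemperatureC": float,
}


def conformance_errors(record: dict) -> list:
    """List the ways a record departs from the (hypothetical) content model."""
    errors = []
    for name, expected in CONTENT_MODEL.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            errors.append(f"{name}: expected {expected.__name__}")
    return errors


good = {"WellName": "A-1", "LatDegree": 39.5, "LongDegree": -119.8, "TemperatureC": 100.0}
bad = {"WellName": "B-7", "LatDegree": "40.1", "TemperatureC": 60.0}

print(conformance_errors(good))
print(conformance_errors(bad))
```

Because every conforming dataset passes the same check, a client written once against the content model can read all of them without source-specific code.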

Creating Interoperable Data

Tier 3 data structured using a content model is among the most highly interoperable data found on the web. Tier 3 data can be accessed and analyzed automatically through a number of web services. NGDS uses content models based on USGIN-adopted standards and protocols to structure data for Tier 3 interoperability.

USGIN: Powering NGDS

The goal of USGIN is to facilitate the development of interoperable geoscience data-sharing networks that improve public access to geoscience information and aid scientific queries. For more information visit the USGIN website.

USGIN proposes:

  • Protocols and standards.
  • Profiles for the utilization of USGIN protocols and standards.
  • Development tools to build distributed data-sharing networks using open-source software and existing World Wide Web infrastructure.

USGIN Architecture

To promote interoperability, USGIN adopts existing standards, protocols, and formats whenever possible. These include:

  • Standards for metadata: ISO 19115, ISO 19119, and ISO 19139
  • Protocols for data exchange:
    • Open Geospatial Consortium (OGC) Web Map Service (WMS), Web Feature Service (WFS), and Catalog Service for the Web (CSW)
    • ESRI Map Service
    • OPeNDAP
  • GeoSciML-Portrayal schemas for geologic map data

NGDS is based on protocols and standards adopted by USGIN. All NGDS content models are structured according to USGIN standards.