4 - Data Management Architectures
By Terence Critchlow, Pacific Northwest National Laboratory; Ghaleb Abdulla, Lawrence Livermore National Laboratory; Jacek Becla, Stanford University; Kerstin Kleese-Van Dam, Pacific Northwest National Laboratory; Sam Lang, Pacific Northwest National Laboratory; Deborah L. McGuinness, Rensselaer Polytechnic Institute
Edited by Ian Gorton, Deborah K. Gracio
Book: Data-Intensive Computing
Published online: 05 December 2012
Print publication: 29 October 2012, pp 48-84
Chapter
Summary
Data management is the organization of information to support efficient access and analysis. For data-intensive computing applications, the speed at which relevant data can be accessed is a limiting factor on the size and complexity of the computation that can be performed. Data access speed is affected by the size of the relevant subset of the data, the complexity of the query used to define it, and the layout of the data relative to the query. As the underlying data sets become increasingly complex, the questions asked of them become more involved as well. For example, geospatial data associated with a city is no longer limited to the map data representing its streets, but now also includes layers identifying utility lines, key points, the locations and types of businesses within the city limits, tax information for each land parcel, satellite imagery, and possibly even street-level views. As a result, queries have gone from simple questions, such as “How long is Main Street?”, to much more complex questions, such as “Taking all other factors into consideration, are the property values of houses near parks higher than those under power lines, and if so, by what percentage?” Answering these questions requires a coherent infrastructure, integrating the relevant data into a format optimized for the questions being asked.
Data management is critical to supporting analysis because, for large data sets, reading the entire collection is simply not feasible. Instead, the relevant subset of the data must be efficiently described, identified, and retrieved. As a result, the data management approach taken effectively defines the analysis that can be efficiently performed over the data.
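As a loose illustration of the parks-versus-power-lines question above, the sketch below runs a query over a tiny in-memory SQLite table standing in for an integrated parcel data set. The schema, the sample values, and the precomputed `near_park`/`under_power_line` flags are all hypothetical; a real system would derive those flags from spatial joins against the map layers the summary describes.

```python
import sqlite3

# Toy stand-in for an integrated parcel database. The schema and the
# boolean near_park / under_power_line flags are hypothetical; a real
# system would compute them from spatial joins against map layers.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parcels (id INTEGER PRIMARY KEY,"
    " value REAL, near_park INTEGER, under_power_line INTEGER)"
)
conn.executemany(
    "INSERT INTO parcels (value, near_park, under_power_line)"
    " VALUES (?, ?, ?)",
    [
        (420000, 1, 0),  # near a park
        (390000, 1, 0),
        (310000, 0, 1),  # under a power line
        (295000, 0, 1),
        (350000, 0, 0),  # neither
    ],
)

# Compare average value of park-adjacent parcels with parcels
# under power lines.
(park_avg,) = conn.execute(
    "SELECT AVG(value) FROM parcels WHERE near_park = 1"
).fetchone()
(line_avg,) = conn.execute(
    "SELECT AVG(value) FROM parcels WHERE under_power_line = 1"
).fetchone()

pct_higher = 100.0 * (park_avg - line_avg) / line_avg
print(f"park-adjacent average: {park_avg:.0f}")
print(f"power-line average:    {line_avg:.0f}")
print(f"difference:            {pct_higher:.1f}%")
```

The point of the example is the summary's closing observation: the comparison is only cheap because the relevant attributes have already been integrated into a layout the query can exploit, rather than recomputed from raw map layers at query time.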
Standards-based data interoperability in the climate sciences
Andrew Woolf, Ray Cramer, Marta Gutierrez, Kerstin Kleese van Dam, Siva Kondapalli, Susan Latham, Bryan Lawrence, Roy Lowry, Kevin O'Neill
Journal: Meteorological Applications / Volume 12 / Issue 1 / March 2005
Published online by Cambridge University Press: 12 April 2005, pp. 9-22
Print publication: March 2005
Article
Emerging developments in geographic information systems and distributed computing offer a roadmap towards an unprecedented spatial data infrastructure in the climate sciences. Key to this are the standards developments for digital geographic information being led by the International Organisation for Standardisation (ISO) technical committee on geographic information/geomatics (TC211) and the Open Geospatial Consortium (OGC). These, coupled with the evolution of standardised web services for applications on the internet by the World Wide Web Consortium (W3C), mean that opportunities for both new applications and increased interoperability exist. These are exemplified by the ability to construct ISO-compliant data models that expose legacy data sources through OGC web services. This paper concentrates on the applicability of these standards to climate data by introducing some examples and outlining the challenges ahead. An abstract data model is developed, based on ISO standards, and applied to a range of climate data, both observational and modelled. An OGC Web Map Server interface is constructed for numerical weather prediction (NWP) data stored in legacy data files. A W3C web service for remotely accessing gridded climate data is illustrated. The challenges identified include the following: first, both the ISO and OGC specifications require extensions to support climate data. Second, OGC services need to comply fully with W3C web services and to support complex access control. Finally, to achieve real interoperability, broadly accepted community-based semantic data models are required across the range of climate data types. These challenges are being actively pursued, and broad data interoperability for the climate sciences appears within reach.
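To make the Web Map Server interface mentioned in the abstract concrete, the sketch below builds an OGC WMS 1.1.1 GetMap request URL of the kind such a server would answer. The endpoint (`example.org/wms`) and the layer name (`temperature`) are placeholders, not anything published by the paper's authors; a real client would first discover the service URL and its layers from a GetCapabilities response.

```python
from urllib.parse import urlencode

# Hypothetical WMS endpoint; a real deployment publishes its own URL
# and advertises its layers via a GetCapabilities response.
BASE_URL = "http://example.org/wms"

def getmap_url(layer, bbox, width=800, height=400):
    """Build an OGC WMS 1.1.1 GetMap request for a single layer.

    bbox is (minx, miny, maxx, maxy) in the coordinate system
    named by the SRS parameter.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",                # default style for each layer
        "SRS": "EPSG:4326",          # latitude/longitude coordinates
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": "image/png",
    }
    return BASE_URL + "?" + urlencode(params)

# "temperature" is a placeholder name for an NWP field exposed as a
# map layer; the request asks for a global rendering of that field.
url = getmap_url("temperature", (-180, -90, 180, 90))
print(url)
```

Because GetMap is a plain keyword-value HTTP request, a server that maps these parameters onto legacy NWP files can serve them to any standards-aware GIS client, which is exactly the interoperability argument the abstract makes.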