Keen attention to structure lies at the core of Structured Dynamics' competitive advantage. Knowing and recognizing structure leads to better ways to organize and learn from information, as well as to reason over it and use it computationally. Structure resides at multiple levels, and all of it is grounded in what can be represented as lists, graphs or fractals.
Structured Dynamics uses the RDF (Resource Description Framework) data model to represent all canonical information. RDF, or the OWL ontology language, is used for all reference structures, while domain structures capture the particulars of the domain at hand. Together these form the target foundations for mapping schema and for transforming data in the wild into an operable, canonical form. Any structure, even the most lightweight lists and metadata, can contribute to and be mapped into this model.
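As a minimal sketch of what representing a flat record in the RDF data model involves, the Python below turns a record's key-attribute pairs into (subject, predicate, object) triples. The `http://example.org/` namespace and the `record_to_triples` helper are hypothetical, chosen only for illustration; a production pipeline would more likely use an RDF library and predicates grounded in a reference ontology.

```python
# Hypothetical base IRI for illustration only; not a Structured Dynamics namespace.
BASE = "http://example.org/"


def record_to_triples(record_id, record):
    """Turn one flat record (a dict of key-attribute pairs) into
    (subject, predicate, object) triples following the RDF data model."""
    subject = BASE + "record/" + record_id
    triples = []
    for key, value in record.items():
        # Each key becomes a predicate IRI; each value becomes the object.
        predicate = BASE + "property/" + key
        triples.append((subject, predicate, value))
    return triples


row = {"name": "Acme Corp", "city": "Boston", "employees": 120}
for triple in record_to_triples("42", row):
    print(triple)
```

Because every record shares the same subject-predicate-object shape, triples from unrelated sources can be merged into one graph simply by pooling them.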
Virtually any form of data in the wild, including databases converted in various ways, can be represented in RDF or as key-attribute pairs (lists) and related to other datasets, as this example wall of structure shows:
Unstructured text and various structures within knowledge bases can be related to conventional structured data by recognizing these kinds of information within them:
Semantic standards, informed by reference knowledge bases organized into knowledge graphs, provide the logical, reasoning and disambiguation basis for appropriately matching information sources to one another.
Described below are some of these structures, in rough descending order of completeness and usefulness, for making data interoperable and reasoning over it. Please note that, if desired, we can process and make any of these structures available as linked data.
In both semantics and artificial intelligence — and certainly in the realm of data interoperability — there is always the problem of symbol grounding. In the conceptual realm, symbol grounding means that when we use a term or phrase we are referring to the same thing. In the data value realm, symbol grounding means that when we refer to an object or a number, we are referring to the same metric.
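In the data value realm, a minimal sketch assuming lengths are reported in mixed units: grounding every value to one reference unit (metres here, chosen only for illustration) lets values from different sources be compared directly. The `TO_METRES` table and the `ground_length` helper are hypothetical; the conversion factors themselves are the exact international definitions.

```python
# Hypothetical grounding table: conversion factors to one reference unit (metres).
TO_METRES = {"m": 1.0, "km": 1000.0, "ft": 0.3048, "mi": 1609.344}


def ground_length(value, unit):
    """Express a length in the reference unit so values from different sources compare directly."""
    if unit not in TO_METRES:
        # Refuse to guess: an ungrounded unit is reported rather than silently passed through.
        raise ValueError(f"no grounding defined for unit {unit!r}")
    return value * TO_METRES[unit]


# Two sources reporting roughly the same distance in different units.
print(ground_length(5, "km"), ground_length(16404.2, "ft"))
```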
UMBEL is the standard reference ontology used by Structured Dynamics. It contains 35,000 concepts (classes and relationships) derived from the Cyc knowledge base. The reference concepts of UMBEL are mapped to Wikipedia, schema.org (used in Google's knowledge graph), DBpedia ontology classes, GeoNames and PROTON. Similar reference structures are used to ground the actual data values and attributes.
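On the conceptual side, grounding amounts to resolving each local label to a shared reference concept. The sketch below assumes a tiny, hypothetical reference vocabulary; its IRIs are placeholders, not actual UMBEL identifiers, and exact label lookup stands in for the richer disambiguation a real reference structure supports.

```python
# Hypothetical reference concepts standing in for a reference ontology such as UMBEL.
# The IRIs and label sets are illustrative placeholders, not real UMBEL identifiers.
REFERENCE = {
    "http://example.org/ref/Automobile": {"automobile", "car", "motor car"},
    "http://example.org/ref/City": {"city", "town", "municipality"},
}


def normalize(label):
    """Lower-case, turn underscores into spaces and collapse whitespace."""
    return " ".join(label.lower().replace("_", " ").split())


# Reverse index from every known label to its reference concept.
LABEL_INDEX = {
    normalize(label): concept
    for concept, labels in REFERENCE.items()
    for label in labels
}


def ground(label):
    """Return the reference concept for a local label, or None if it is not grounded yet."""
    return LABEL_INDEX.get(normalize(label))


# Two sources naming the same thing differently resolve to one concept.
print(ground("Motor_Car") == ground("car"))
```

Labels that return `None` are exactly the ones that still need mapping work or a richer reference structure.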
Other reference structures may be used, so long as they are reasonably complete in scope and coherent in their relationships. Logical consistency is a key requirement for grounding.
Knowledge bases combine schema with data in a logical manner; well-constructed ones support computation, inference and reasoning. The primary knowledge bases that we use are Wikipedia, Wikidata, DBpedia, UMBEL and Cyc. However, many domain-specific knowledge bases also exist and can be mapped to this structure.
Knowledge bases are important sources for symbol grounding. In addition, because they are computable, they may be used with artificial intelligence methods both to extend the knowledge base and to refine the feature estimates used in the AI algorithms.
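One simple instance of the inference a well-constructed knowledge base supports is subsumption: if a sedan is an automobile and an automobile is a vehicle, then any sedan is also a vehicle. The sketch below assumes a toy, hypothetical knowledge base held in Python dicts and computes the transitive closure of the subclass relation, in the spirit of the RDFS subclass entailment rules; real knowledge bases would use a reasoner over RDF/OWL instead.

```python
from collections import deque

# A toy, hypothetical knowledge base: subclass assertions plus one typed instance.
SUBCLASS_OF = {
    "Sedan": ["Automobile"],
    "Automobile": ["MotorVehicle"],
    "MotorVehicle": ["Vehicle"],
    "Vehicle": [],
}
INSTANCE_OF = {"myCar": "Sedan"}


def superclasses(cls):
    """All classes reachable via subClassOf (its transitive closure), excluding cls itself."""
    seen, queue = set(), deque(SUBCLASS_OF.get(cls, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:  # the seen set also guards against cycles
            seen.add(parent)
            queue.extend(SUBCLASS_OF.get(parent, []))
    return seen


def inferred_types(instance):
    """The asserted type of an instance plus every type it inherits."""
    direct = INSTANCE_OF[instance]
    return {direct} | superclasses(direct)


print(sorted(inferred_types("myCar")))
```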
Domain ontologies, constructed as graphs, are the principal working structures in data interoperability. Though they are grounded in the reference structures, the domain structures are the ones that specifically capture the concepts and data attributes of the target information domain. More effort is focused on this level of the wall of structure than on any other.
Domain structures provide unique benefits in discovery, flexible access, and information integration due to their inherent connectedness. Further, these domain structures can be layered on top of existing information assets, which means they enhance, rather than displace, prior investments. And because these domain structures can be matured incrementally, their development is cost-effective.
Data and schema in the wild need to be mapped and transformed into these canonical structures. What is known as data wrangling is an aspect of these mappings and transformations. Mappings thus become the glue that ties native data to interoperable forms.
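To make the idea of a mapping as glue concrete, here is a minimal sketch assuming a native table whose column names differ from the canonical property names. The `MAPPING` dictionary, the canonical names and the `transform` helper are all hypothetical. Unmapped fields are reported rather than silently dropped, so they can feed the next round of mapping work.

```python
# Hypothetical mapping from native column names to canonical property names.
MAPPING = {
    "cust_nm": "name",
    "cust_city": "locality",
    "emp_cnt": "numberOfEmployees",
}


def transform(row, mapping):
    """Rename mapped fields to canonical properties and collect any unmapped fields."""
    canonical, unmapped = {}, []
    for field, value in row.items():
        if field in mapping:
            canonical[mapping[field]] = value
        else:
            unmapped.append(field)
    return canonical, unmapped


native = {"cust_nm": "Acme Corp", "cust_city": "Boston", "fax": "555-0100"}
record, leftovers = transform(native, MAPPING)
print(record, leftovers)
```

Note that the mapping is defined once per source schema and then applied to every row, which is why mapping effort tracks schema diversity rather than data volume.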
Mapping is the critical bridging function in data interoperability. It requires tools and background intelligence to suggest possible correspondences; how well these suggestions are made is key to making the semi-automatic mapping process as efficient as possible. Mapping structures are the result of the final correspondences. Mapping effort is a function of the scope and diversity of the structures involved, not the volume of instance data.
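A semi-automatic mapping step can be sketched as ranking candidate correspondences for a person to confirm. The example below swaps in plain string similarity from Python's standard `difflib.SequenceMatcher` as a stand-in for the richer background intelligence described above; the `suggest` helper, the 0.6 threshold and the labels are assumptions for illustration, not Structured Dynamics' actual matching method.

```python
from difflib import SequenceMatcher


def suggest(source_labels, target_labels, threshold=0.6):
    """Rank candidate correspondences by string similarity for a human to confirm or reject."""
    suggestions = []
    for s in source_labels:
        for t in target_labels:
            score = SequenceMatcher(None, s.lower(), t.lower()).ratio()
            if score >= threshold:
                suggestions.append((s, t, round(score, 2)))
    # Highest-scoring candidates first, so reviewers see the likeliest matches early.
    return sorted(suggestions, key=lambda x: x[2], reverse=True)


source = ["CustomerName", "PostalCode", "Phone"]
target = ["customer_name", "postcode", "telephone", "email"]
for s, t, score in suggest(source, target):
    print(f"{s} -> {t} ({score})")
```

The threshold trades reviewer effort against missed matches: lowering it surfaces more candidates, each of which someone must accept or reject.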
A broad variety of structures occur in the wild — from database schema and taxonomies to dictionaries and lists — that need to be represented in a common form and then mapped in order to support interoperability. The common representation used by Structured Dynamics is the RDF data model.
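As one example of bringing a structure from the wild into the common RDF form, the sketch below converts a small taxonomy, assumed to arrive as nested dictionaries, into triples that use the standard `rdfs:subClassOf` predicate. The taxonomy contents and the `taxonomy_to_triples` helper are hypothetical, and plain class names stand in for the IRIs a real conversion would mint.

```python
# A small, hypothetical taxonomy as it might arrive in the wild: parent -> children.
TAXONOMY = {
    "Vehicle": {
        "Automobile": {"Sedan": {}, "Coupe": {}},
        "Bicycle": {},
    }
}

SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def taxonomy_to_triples(tree, parent=None):
    """Walk the tree depth-first, emitting one (child, rdfs:subClassOf, parent) triple per edge."""
    triples = []
    for node, children in tree.items():
        if parent is not None:
            triples.append((node, SUBCLASS_OF, parent))
        triples.extend(taxonomy_to_triples(children, node))
    return triples


for triple in taxonomy_to_triples(TAXONOMY):
    print(triple)
```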
Scripting and tooling are essential to help process all of these structures efficiently and to test for effectiveness and errors. Overall measures of effectiveness like precision and recall are essential when training machine learners.
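Precision and recall can be computed directly from a set of predicted correspondences and a hand-checked gold standard. The sketch below assumes both are sets of (source field, canonical property) pairs; the specific pairs and the `precision_recall` helper are hypothetical.

```python
def precision_recall(predicted, gold):
    """Precision and recall of predicted correspondences against a hand-checked gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of predictions that are correct. Recall: share of gold pairs found.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


gold = {("cust_nm", "name"), ("cust_city", "locality"), ("emp_cnt", "numberOfEmployees")}
predicted = {("cust_nm", "name"), ("cust_city", "locality"), ("fax", "telephone")}
p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Tracking both numbers matters because a matcher can trivially raise one at the expense of the other, for example by proposing every possible pair to maximize recall.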