scones Tagger
Structured Dynamics' scones (Subject Concept Or Named
EntitieS) tagger provides information extraction of
domain-specific subject concepts and entities from
unstructured text. It also disambiguates the extracted concepts
and entities using the context in which they appear in the source.
The scones system uses a combination of heuristics, statistical
methods and machine-learning algorithms to identify subject
concepts and named entities separately within the target text.
Then, using existing domain ontologies and entity dictionaries,
the system further identifies and weights candidate extractions.
Uniquely, the system also triangulates between the concept and
entity extractions to aid disambiguation (that is, choosing the
correct entity or concept among the candidates).
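As a rough illustration of the triangulation idea, the sketch below weights
candidate entities by how strongly their associated concept is supported by
the surrounding text. It is not the scones implementation: the dictionaries,
the cue-word scoring and the names CONCEPTS, ENTITIES and tag are all
hypothetical, chosen only to show concepts and entities informing each other.

```python
import re

# Hypothetical toy references. A real deployment would load a domain
# ontology and entity dictionaries instead of these hand-written sets.
CONCEPTS = {
    "company": {"shares", "iphone", "ceo"},
    "fruit": {"orchard", "juice", "harvest"},
}
# Each surface form maps to candidate entities, each tied to one concept.
ENTITIES = {
    "apple": [("Apple Inc.", "company"), ("Apple (fruit)", "fruit")],
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tag(text):
    """Return one tag per matched surface form, disambiguated by context."""
    tokens = set(tokenize(text))
    # Step 1: score each concept by the number of its cue words present.
    concept_scores = {name: len(cues & tokens) for name, cues in CONCEPTS.items()}
    # Step 2: for each matched surface form, keep the candidate entity
    # whose concept has the highest score (ties go to the first candidate).
    tags = []
    for surface in sorted(ENTITIES):
        if surface not in tokens:
            continue
        entity, concept = max(ENTITIES[surface], key=lambda c: concept_scores[c[1]])
        tags.append({
            "surface": surface,
            "entity": entity,
            "concept": concept,
            "weight": concept_scores[concept],
        })
    return tags


if __name__ == "__main__":
    print(tag("Apple shares rose after the CEO unveiled a new iPhone"))
    print(tag("The orchard pressed apple juice after harvest"))
```

The same surface form resolves to a different entity in each sentence because
the concept evidence differs, which is the kind of disambiguation described
above, reduced to its simplest form.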
The tagged information can be exported in any of the formats
supported by the structWSF Web services framework, including XML,
CSV, JSON and various RDF serializations. Optionally, if the source
is a Web page, scones can also reinject the tagged information into
that page as RDFa.
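To make the idea of multiple output formats concrete, the following minimal
sketch writes the same tag records as JSON and as CSV using only the Python
standard library. It does not call structWSF, and the record fields (surface,
entity, concept, weight) are assumptions made for illustration rather than
the schema scones actually emits.

```python
import csv
import io
import json

# Hypothetical tag record layout, assumed for this example only.
FIELDS = ["surface", "entity", "concept", "weight"]

SAMPLE_TAGS = [
    {"surface": "apple", "entity": "Apple Inc.", "concept": "company", "weight": 3},
]


def to_json(tags):
    """Serialize the tag records as an indented JSON array."""
    return json.dumps(tags, indent=2)


def to_csv(tags):
    """Serialize the tag records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(tags)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_json(SAMPLE_TAGS))
    print(to_csv(SAMPLE_TAGS))
```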
Source content can be submitted as individual snippets,
cut-and-pasted content, or entire documents or Web pages.
Optionally, scones can be integrated into a semi-automated
workflow in which users or subject matter experts make the final
tag determinations before the tags are written to file.
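One way such a semi-automated workflow might be organized is sketched below:
tags above a weight threshold are accepted automatically, and the rest are
held for a reviewer's decision. The threshold, the record fields and the
function names are hypothetical; the source does not say how scones decides
which tags need human review.

```python
# Hypothetical cut-off: tags at or above this weight skip manual review.
REVIEW_THRESHOLD = 2


def split_for_review(tags, threshold=REVIEW_THRESHOLD):
    """Partition tags into (auto-accepted, pending human review)."""
    accepted = [t for t in tags if t["weight"] >= threshold]
    pending = [t for t in tags if t["weight"] < threshold]
    return accepted, pending


def apply_decisions(pending, decisions):
    """Keep only the pending tags a reviewer approved.

    `decisions` maps a tag's surface form to True (keep) or False (drop);
    tags with no recorded decision are dropped.
    """
    return [t for t in pending if decisions.get(t["surface"], False)]


if __name__ == "__main__":
    candidates = [
        {"surface": "apple", "entity": "Apple Inc.", "weight": 3},
        {"surface": "jaguar", "entity": "Jaguar Cars", "weight": 1},
    ]
    accepted, pending = split_for_review(candidates)
    final = accepted + apply_decisions(pending, {"jaguar": True})
    print(final)
```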
In its standard baseline configuration, scones uses the UMBEL
subject concepts ontology and entities from Wikipedia as its
references. In production use, these references are best
supplemented with domain-specific concept ontologies and entity
dictionaries relevant to the enterprise.
The scones system also includes methods for creating these
entity dictionaries, which are a valuable complement to the
methodology.
