Semantic web tools you can use now
Tap into the sea of machine-readable data available
By Elisabeth Horwitt | Computerworld US | Published: 14:40, 24 March 2011
Vince Fioramonti had an epiphany back in 2001. He realised that valuable investment information was becoming increasingly available on the web, and that a growing number of vendors were offering software to capture and interpret that information in terms of its importance and relevance.
"I already had a team of analysts reading and trying to digest financial news on companies," says Fioramonti, a partner and senior international portfolio analyst at investment firm Alpha Equity Management. But the process was too slow and results tended to be subjective and inconsistent.
The following year, Fioramonti licensed Autonomy Corp's semantic platform, Intelligent Data Operating Layer (IDOL), to process various forms of digital information automatically. Deployment ran into a snag, however: IDOL provided only general semantic algorithms. Alpha Equity would have had to assign a team of programmers and financial analysts to develop finance-specific algorithms and metadata, Fioramonti says. Management scrapped the project because it was too expensive.
The breakthrough for Alpha Equity came in 2008, when the firm signed up for Thomson Reuters' Machine Readable News. The service collects and analyses online news from 3,000 Reuters reporters, and from third party sources such as online newspapers and blogs. It then analyses and scores the material for sentiment (how the public feels about a company or product), relevance and novelty.
The results are streamed to customers, who include public relations and marketing professionals, stock traders performing automated black box trading and portfolio managers who aggregate and incorporate such data into longer term investment decisions.
A monthly subscription to the service isn't cheap, Fioramonti says. According to one estimate, which Thomson Reuters would not comment on, the cost of real-time data updates is between $15,000 and $50,000 per month. But Fioramonti says the service's value more than justifies the price Alpha Equity pays for it. He says the information has helped boost the performance of the firm's portfolio and it has enabled Alpha Equity to get a jump on competitors. "Thomson Reuters gives us the news and the analysis, so we can continue to grow as a quantitative practitioner," he says.
Alpha Equity's experience is hardly unique. Whether a business decides to build in-house or hire a service provider, it often pays a hefty price to fully exploit semantic web technology. This is particularly true if the information being searched and analysed contains jargon, concepts and acronyms that are specific to a particular business domain.
Here's an overview of what's available to help businesses deploy and exploit semantic web infrastructures, along with a look at what's still needed for the technology to achieve its potential.
The key standards
At the core of Tim Berners-Lee's as-yet-unrealised vision of a semantic web is federated search. This would enable a search engine, automated agent or application to query hundreds or thousands of information sources on the web, discover and semantically analyse relevant content and retrieve exactly the product, answer or information the user was seeking.
Although federated search is catching on, most notably in Windows 7, which supports it as a feature, it remains a long way from being a webwide phenomenon.
To help federated search gain traction, the World Wide Web Consortium (W3C) has developed several key standards that define a basic semantic infrastructure. They include the following:
- SPARQL Protocol and RDF Query Language (SPARQL), which defines a standard language for querying and accessing data.
- Resource Description Framework (RDF) and RDF Schema (RDFS), which describe how information is represented and structured in a semantic ontology (also called a vocabulary).
- Web Ontology Language (or OWL), which provides a richer description of the ontology and also includes some RDFS elements.
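As a rough illustration of the data model behind these standards, the sketch below stores facts as subject-predicate-object triples, matches them against a pattern and follows subclass links in the way an RDFS-aware tool might. It is a toy in plain Python, not an RDF library: the `ex:` names and the `match` and `types_of` helpers are invented for this example.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple.
# Every "ex:" identifier here is hypothetical and used only for illustration.
TRIPLES = {
    ("ex:tv42", "rdf:type", "ex:FlatScreenTV"),
    ("ex:FlatScreenTV", "rdfs:subClassOf", "ex:Television"),
    ("ex:tv42", "ex:screenSizeInches", 42),
    ("ex:tv42", "ex:price", 499.0),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        (
            t
            for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        ),
        key=str,
    )


def types_of(triples, resource):
    """Collect direct types plus superclasses reached via rdfs:subClassOf."""
    found = set()
    pending = [t[2] for t in match(triples, s=resource, p="rdf:type")]
    while pending:
        cls = pending.pop()
        if cls not in found:
            found.add(cls)
            pending.extend(
                t[2] for t in match(triples, s=cls, p="rdfs:subClassOf")
            )
    return found


if __name__ == "__main__":
    print(match(TRIPLES, s="ex:tv42"))
    print(types_of(TRIPLES, "ex:tv42"))
```

Because the vocabulary states that a flat screen TV is a kind of television, a query for televisions can find `ex:tv42` even though no triple says so directly. That is the sort of inference a shared ontology is meant to make possible.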
The final versions of these standards are supported by leading semantic web platform vendors such as Cambridge Semantics, Expert System, Revelytix, Endeca, Lexalytics, Autonomy and TopQuadrant. Major web search engines, including Google, Yahoo and Microsoft Bing, are starting to use semantic metadata to prioritise search results and to support W3C standards like RDF.
And enterprise software vendors like Oracle, SAS Institute and IBM are jumping on board, too. Their offerings include Oracle Database 11g Semantic Technologies, SAS Ontology Management and IBM's InfoSphere BigInsights.
Semantic software uses a variety of techniques to analyse and describe the meaning of data objects and their inter-relationships. These include a dictionary of generic and often industry-specific definitions of terms, as well as analysis of grammar and context to resolve language ambiguities such as words with multiple meanings.
The purpose of resolving language ambiguities is to help ensure, for example, that a shopper who does a search using a phrase like "used red cars" will also get results from websites that use slightly different terms with similar meanings, such as "pre-owned" instead of "used" and "automobile" instead of "car".
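The "used red cars" case can be sketched with a small synonym table that maps different surface terms onto one canonical concept before matching. This is a deliberately simplified assumption about how such a dictionary might be applied. Real semantic software also analyses grammar and context, and the `CANONICAL` table and both helper functions below are hypothetical.

```python
# Hypothetical synonym table: surface term -> canonical concept.
CANONICAL = {
    "pre-owned": "used",
    "second-hand": "used",
    "automobile": "car",
    "automobiles": "car",
    "cars": "car",
}


def normalise(text):
    """Lower-case the text, split on whitespace, map terms to canonical form."""
    return {CANONICAL.get(word, word) for word in text.lower().split()}


def matches(query, document):
    """True when every canonical query term also appears in the document."""
    return normalise(query) <= normalise(document)


if __name__ == "__main__":
    print(matches("used red cars", "Pre-owned red automobile for sale"))
    print(matches("used red cars", "new red automobile"))
```

The first listing matches because "pre-owned" and "automobile" resolve to the same concepts as "used" and "cars". The second does not, because nothing in it maps to "used".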
For more information about semantic technologies, including search, see Part 1 of this story, "The semantic web gets down to business." It explores the technology's potential uses and paybacks, illustrated with real business cases, including ones involving the use of sentiment analysis. It also provides some best practices and tips from the trenches for anyone planning, or at least considering, a deployment.
W3C standards are designed to resolve inconsistencies in the way various organisations organise, describe, present and structure information, and thereby pave the way for cross-domain semantic querying and federated search.
To illustrate the advantage of using such standards, Michael Lang, CEO of Revelytix, a maker of ontology-management tools, offers the following scenario: If 200 online consumer electronics retailers used semantic web standards such as RDF to develop ontologies that describe their product catalogs, Revelytix's software could make that information accessible via a SPARQL query point. Then, says Lang, online shoppers could use W3C-compliant browser tools to search for products across those sites, using queries such as: "Show all flat screen TVs that are 42-52 inches, and rank the results by price."
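Lang's flat screen TV scenario might look something like the sketch below. The SPARQL text shows how the shopper's request could be phrased against a shared vocabulary. The Python function then mimics what a federated client would do with the answers: apply the size filter to each retailer's rows, merge what remains and rank it by price. The `ex:` vocabulary, the retailer names and the sample rows are all invented for illustration. Nothing here reflects Revelytix's actual software or any real catalogue.

```python
# A query of the kind a shopper's tool might send to each retailer's endpoint.
# The "ex:" vocabulary is hypothetical; real retailers would need to agree on one.
SPARQL_QUERY = """
PREFIX ex: <http://example.org/shop#>
SELECT ?product ?size ?price
WHERE {
  ?product a ex:FlatScreenTV ;
           ex:screenSizeInches ?size ;
           ex:price ?price .
  FILTER (?size >= 42 && ?size <= 52)
}
ORDER BY ?price
"""

# Invented sample rows standing in for data held by two retailers.
RETAILER_RESULTS = {
    "retailer-a": [
        {"product": "A-TV-46", "size": 46, "price": 650.0},
        {"product": "A-TV-60", "size": 60, "price": 900.0},
    ],
    "retailer-b": [
        {"product": "B-TV-42", "size": 42, "price": 480.0},
        {"product": "B-TV-32", "size": 32, "price": 250.0},
    ],
}


def federated_search(results_by_retailer, min_size=42, max_size=52):
    """Filter each retailer's rows by screen size, merge them, sort by price."""
    merged = [
        {"retailer": retailer, **row}
        for retailer, rows in results_by_retailer.items()
        for row in rows
        if min_size <= row["size"] <= max_size
    ]
    return sorted(merged, key=lambda row: row["price"])


if __name__ == "__main__":
    for row in federated_search(RETAILER_RESULTS):
        print(row["retailer"], row["product"], row["size"], row["price"])
```

The merge step only works because every retailer describes screen size and price in the same terms. Without a common vocabulary, each source would need its own translation layer before results could be compared.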
Search engines and some third-party web shopping sites offer product comparisons, but those comparisons tend to be limited in terms of the range of attributes covered by a given search. Moreover, shoppers will often find that the data provided by third-party shopping sources is out of date or otherwise incorrect or misleading. It may not, for example, have accurate information about the availability of a particular size or colour. Standards-based querying across the merchants' own websites would enable shoppers to compare richer, more up-to-date information provided by the merchants themselves.
The W3C SPARQL Working Group is currently developing a SPARQL Service Description designed to standardise how SPARQL "endpoints", or information sources, present their data. It will include specific standards for how endpoints describe the types and amount of data they have, says Lee Feigenbaum, vice president of technology at Cambridge Semantics and co-chair of the W3C SPARQL Working Group.