schema.to

Semantic markup is cool again

Meet schemato, the unified validator for the next generation of metadata.

"Semantic web?"

If you've used the internet in the last few years, chances are you've been the beneficiary of embedded semantic markup in some way. Sites and applications like Parse.ly Dash rely heavily on semantic technologies to easily parse the important metadata from websites that implement them. This data describes the document it's a part of - it's "data about data".

What metadata looks like

Until recently, the internet was set up primarily for consumption by humans. A person can easily look at a news article on the web and point to, for example, the author's name, but that's much harder for a computer. Embedded metadata aims to make it easy for machines to parse the important information from websites using standardized labels, like:
<div rel="rnews:creator">Emmett</div>
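To make the idea concrete, here is a minimal sketch of how a machine might pull labeled values out of a page. It uses only Python's standard `html.parser` module and is not how schemato itself works (schemato relies on pyRdfa and pyMicrodata). The names `LabelExtractor` and `extract_labels` are made up for this illustration, and treating both `rel` and `property` attributes as labels is an assumption.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they are skipped so that the
# stack of open elements stays balanced.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class LabelExtractor(HTMLParser):
    """Collect the text directly inside elements that carry a label."""

    def __init__(self):
        super().__init__()
        self.found = {}    # label -> list of text values
        self._stack = []   # one entry per open element: its label or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        # Assumption: a label lives in either `rel` or `property`.
        self._stack.append(attrs.get("rel") or attrs.get("property"))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        label = self._stack[-1] if self._stack else None
        if label and text:
            self.found.setdefault(label, []).append(text)


def extract_labels(html):
    """Return a dict mapping each label to the text values found under it."""
    parser = LabelExtractor()
    parser.feed(html)
    parser.close()
    return parser.found


print(extract_labels('<div rel="rnews:creator">Emmett</div>'))
```

A real RDFa or Microdata parser also has to resolve prefixes, nesting, and item scopes, which this sketch ignores.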

Metadata protocols

As with any web standard, there is more than one. rNews is an RDFa-based standard that was recently developed to provide a comprehensive metadata model to online publishers, and is implemented by the New York Times. Schema.org defines a broad set of metadata fields for various use cases in HTML5 Microdata. Parse.ly also uses a proprietary JSON metadata standard, with which publishers can ease integration with Dash.


schemato provides comprehensive validation of the rNews, Schema.org, Facebook Open Graph, and parsely-page metadata standards. It's also extensible, so you can write your own validator module. Paste a URL below to see how any site's implementation stacks up.

This validator reads the latest official versions of the metadata standards it tracks (usually in the Turtle RDF format) and compares the validated document's metadata content against these standards. The standards define a number of classes, each of which is permitted to contain certain data members. This validator performs cross-checks to ensure that any data members present are valid, unduplicated, and in the right places.
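The cross-checks described above can be sketched in a few lines of Python. This is only an illustration of the idea, not schemato's implementation: the `SCHEMA` dictionary is a hand-written, hypothetical stand-in for a parsed ontology, and `check_members` is a made-up helper name.

```python
from collections import Counter

# Hypothetical stand-in for a parsed standard: each class maps to the
# set of data members it is permitted to contain.
SCHEMA = {
    "Article": {"headline", "creator", "datePublished"},
    "Person": {"name"},
}


def check_members(class_name, members, schema=SCHEMA):
    """Return a list of error strings for one metadata item."""
    if class_name not in schema:
        return ["unknown class: %s" % class_name]
    allowed = schema[class_name]
    errors = []
    for name, count in sorted(Counter(members).items()):
        if name not in allowed:
            # Is the member defined on another class (wrong place),
            # or nowhere in the standard (invalid)?
            owners = sorted(c for c, props in schema.items() if name in props)
            if owners:
                errors.append("%s belongs to %s, not %s"
                              % (name, ", ".join(owners), class_name))
            else:
                errors.append("unknown member: %s" % name)
        elif count > 1:
            errors.append("duplicate member: %s" % name)
    return errors


for error in check_members("Article", ["headline", "headline", "name", "bogus"]):
    print(error)
```

Each kind of check (valid, unduplicated, in the right place) yields its own message, so a report can point at the specific problem rather than a bare pass or fail.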
schema.to is written and maintained by Emmett Butler and powered by W3C's semantic web tools for Python, written by Ivan Herman - specifically pyMicrodata and pyRdfa. If you'd like to contribute to the project, fork it on GitHub or email hello@parsely.com.

It's easy to extend schemato to support additional standards. Take a look at how little rNews-specific code schemato needs:
from validator import RdfValidator
from schemadef import RdfSchemaDef

# Validator for rNews; the shared checks are inherited from RdfValidator.
class RNewsValidator(RdfValidator):
    def __init__(self, graph, doc_lines, url=""):
        super(RNewsValidator, self).__init__(graph, doc_lines, url=url)
        self.schema_def = RNewsSchemaDef()
        # Namespaces whose terms this validator accepts.
        self.allowed_namespaces = ["http://iptc.org/std/rNews/2011-10-07#"]

# Schema definition for rNews: names the ontology file and parses it.
class RNewsSchemaDef(RdfSchemaDef):
    def __init__(self):
        super(RNewsSchemaDef, self).__init__()
        self._ontology_file = "http://dev.iptc.org/files/rNews/rnews_1.0_draft3_rdfxml.owl"
        self._representation = "rnews_schemadef"
        self.parse_ontology()