Portal Ingest Using RDF

Sean Kelly edited this page Apr 16, 2019 · 2 revisions

The Informatics Center maintains the public portal for the Early Detection Research Network (EDRN). The Plone-5-based portal ingests information expressed in the Resource Description Framework (RDF) and creates pages within its database for later search and retrieval.

This document describes how the RDF ingest works and how to extend and maintain it.

Theory of Operation

The portal is based on the Plone content management system (CMS). Like other CMSs, Plone lets you define custom content types. These content types let you describe the structure of the objects you want to search, edit, render, etc., within the portal. Such content types might include:

  • Publication (with a title, abstract, author list, journal, year of publication)
  • Site (with a name, mailing address, principal investigator)
  • Person (belonging to a site, with a phone number and email address)
  • Biomarker (with a name, aliases, indicated diseases, affected organ, related publications, and so forth)
  • Etc.

However, the Informatics Center is averse to letting Plone's own validation, edit forms, and database curate the above information automatically. We have instead chosen to write our own applications to do so, and then publish that information in RDF for display within each portal.

The RDF ingest mechanism attempts to minimize the amount of code needed within each CMS: you write the content type and then tag its fields with RDF predicates that tell which parts of the RDF describe which fields in the content type. In addition, to help prevent two or more ingests from happening at the same time, the ingest framework writes a timestamp into the Plone Registry telling when the last ingest was started. If this value is None, then no ingest is deemed to be currently running.
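The timestamp guard described above can be sketched as follows. This is a minimal illustration only, not the framework's actual code: a plain dict stands in for the Plone Registry, and the function names try_begin_ingest and end_ingest are hypothetical. The check-then-set shown here is also not atomic on its own.

```python
from datetime import datetime

# Registry key holding the start time of the running ingest (None when idle).
INGEST_START = 'eke.knowledge.interfaces.ISettings.ingestStart'


def try_begin_ingest(registry, now=None):
    """Record the start time and return True, or return False if an
    ingest already appears to be running.

    `registry` is a hypothetical dict standing in for the Plone Registry.
    """
    if registry.get(INGEST_START) is not None:
        return False  # a timestamp is present, so an ingest is deemed running
    registry[INGEST_START] = now if now is not None else datetime.now()
    return True


def end_ingest(registry):
    """Clear the timestamp so that the next ingest may start."""
    registry[INGEST_START] = None


registry = {INGEST_START: None}
first = try_begin_ingest(registry)   # no ingest running yet
second = try_begin_ingest(registry)  # refused: timestamp still set
end_ingest(registry)
third = try_begin_ingest(registry)   # allowed again after clearing
```

One consequence of this design is that an ingest that dies without clearing the timestamp would block later ingests until the value is reset.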

Tagging Content Types

In order to make a new content type ingestible by RDF, do the following:

  1. Define the content type using Plone's Dexterity content type framework as usual, but derived from IKnowledgeObject instead of model.Schema.
  2. Add a tagged value "typeURI" that gives the URI of the RDF content type. This lets the RDF ingest framework match a set of RDF statements to a specific type.
  3. Add a tagged value "fti" that tells the name of the Factory Type Information for the new type. This lets the RDF ingest framework construct an object of the type matching the type URI.
  4. Add a tagged value "predicateMap" that maps RDF predicate URIs to pairs of (fieldName, boolean-if-reference). The fieldName should match the name of a field in the content type. The boolean-if-reference should be true if the field is a reference to another object (and therefore the RDF predicate's object is a URI to the matching object) or false if the field contains a literal value (and therefore the RDF predicate's object is a literal value).

Lastly, create a container content type, derived from IIngestableFolder, to hold objects of the new content type. Then create a subclass of Ingestor whose context is the container type you just created, and implement its getContainedObjectInterface method so that it returns the interface of the new ingestible content type.

Example

In this example, we define a sample content type for a Person and the matching PersonFolder. First, the Person and its tagged values:

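from zope import schema
# IKnowledgeObject is provided by the ingest framework; import it from there.

class IPerson(IKnowledgeObject):
    title = schema.TextLine(title=u'Name', required=True)
    description = schema.Text(title=u'Description')
    email = schema.TextLine(title=u'Email Address')
    phone = schema.TextLine(title=u'Phone Number')

# Name of the Factory Type Information used to construct new Person objects
IPerson.setTaggedValue('fti', 'jpl.example.person')
# RDF type URI that identifies statements describing a Person
IPerson.setTaggedValue('typeURI', u'urn:jpl:example:Person')
# RDF predicate URI -> (fieldName, True if reference / False if literal)
IPerson.setTaggedValue('predicateMap', {
    u'http://xmlns.com/foaf/0.1/mbox': ('email', False),
    u'http://xmlns.com/foaf/0.1/phone': ('phone', False)
})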

Note that if any of the fields referenced other IKnowledgeObject instances, we would have to use "True" instead of "False" for those fields in the predicateMap tagged value.
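To make the role of the predicateMap more concrete, here is a rough sketch of how such a map could be applied to a set of RDF statements. It is an illustration under stated assumptions, not the framework's implementation: the statements are modeled as a plain dict from predicate URI to a list of objects, the function map_statements and its resolve parameter are hypothetical, and only the first literal value is kept for each non-reference field.

```python
PREDICATE_MAP = {
    u'http://xmlns.com/foaf/0.1/mbox': ('email', False),
    u'http://xmlns.com/foaf/0.1/phone': ('phone', False),
}


def map_statements(predicate_map, statements, resolve=lambda uri: uri):
    """Turn {predicate URI: [objects]} into {fieldName: value}.

    `resolve` is a hypothetical callback that looks up the object
    identified by a URI; it is used only for reference fields.
    """
    fields = {}
    for predicate, (field_name, is_reference) in predicate_map.items():
        objects = statements.get(predicate)
        if not objects:
            continue  # the RDF says nothing about this field
        if is_reference:
            # Objects are URIs of other knowledge objects: look each one up
            fields[field_name] = [resolve(uri) for uri in objects]
        else:
            # Objects are literal values: keep the first one
            fields[field_name] = objects[0]
    return fields


statements = {
    u'http://xmlns.com/foaf/0.1/mbox': [u'mailto:jane@example.com'],
    u'http://purl.org/dc/terms/title': [u'Not in the map, so ignored'],
}
fields = map_statements(PREDICATE_MAP, statements)
```

Predicates that do not appear in the map are simply skipped, which is why only the tagged fields need to be declared.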

Now for the PersonFolder:

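# Container type that holds Person objects
class IPersonFolder(IIngestableFolder):
    pass

# Ingestor for the container; tells the framework what type it contains
class PersonIngestor(Ingestor):
    grok.context(IPersonFolder)
    def getContainedObjectInterface(self):
        return IPerson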

That's it. Note that for custom ingest cases, it's possible for the ingest class (PersonIngestor in this example) to override or augment methods inherited from Ingestor. For example, to set the title field to a person's name (assuming that the FOAF surname and given name predicates are defined), we might add this method to PersonIngestor:

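# Assumes URIRef is rdflib's URIRef and that FOAF_SURNAME and FOAF_GIVENNAME
# hold the FOAF surname and given name predicate URIs.
def getTitles(self, predicates):
    first = last = None
    # Each predicate maps to a list of objects; use the first of each, if any
    lasts = predicates.get(URIRef(FOAF_SURNAME))
    firsts = predicates.get(URIRef(FOAF_GIVENNAME))
    if lasts and lasts[0]:
        last = unicode(lasts[0])
    if firsts and firsts[0]:
        first = unicode(firsts[0])
    # Both parts known: combine them into a single title
    if first and last:
        return [u'{} {}'.format(last, first)]
    # Otherwise fall back to whichever part is available
    name = [i for i in (last, first) if i]
    if name:
        return name
    else:
        return None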

Naturally you may want custom views; for classes derived from IIngestableFolder, derive from IngestableFolderView to gain the convenience methods isManager (returns True if the logged-in user has management permissions) and contents (returns catalog brains of the folder's contents). For the contained content type, deriving from grok.View or compatible view classes is sufficient.

Plone Registry

The ingest framework makes use of the following keys in the Plone Registry:

  • eke.knowledge.interfaces.ISettings.objects: a list of paths in the portal of IIngestableFolder subclass objects that should be ingested
  • eke.knowledge.interfaces.ISettings.ingestStart: a datetime object telling when the last ingest was started, or None if no ingest is currently running
  • eke.knowledge.interfaces.ISettings.ingestEnabled: a boolean that globally enables or disables ingest; note that IIngestableFolder objects also have their own boolean ingest toggle, which gives fine-grained control over what gets RDF-ingested
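
How these settings might combine can be sketched as below. This is only an illustration of the logic described in the list above: the settings are modeled as a plain dict keyed by the short record names, the per-folder toggle is modeled as a dict from path to boolean, and the function folders_to_ingest is hypothetical.

```python
def folders_to_ingest(settings, folder_toggles):
    """Return the folder paths that should be ingested.

    `settings` stands in for the registry records ('ingestEnabled', 'objects');
    `folder_toggles` maps each folder path to that folder's own ingest toggle.
    """
    if not settings.get('ingestEnabled'):
        return []  # the global switch is off: ingest nothing
    return [
        path for path in settings.get('objects', [])
        if folder_toggles.get(path, False)  # respect the per-folder toggle
    ]


settings = {
    'ingestEnabled': True,
    'objects': ['/portal/people', '/portal/sites'],
}
folder_toggles = {'/portal/people': True, '/portal/sites': False}
selected = folders_to_ingest(settings, folder_toggles)
```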

Triggering Ingest

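To trigger an ingest, visit the @@ingestKnowledge view on the portal, for example http://portal.com/portal/@@ingestKnowledge (replace http://portal.com/portal with the real portal URL). You can do this with a browser, in which case you must be logged in with manager permissions, or from a Zope clock event.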
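
The same view could also be requested from a script run outside the portal. The sketch below, written for Python 3, only builds the request. It assumes that the portal accepts HTTP Basic authentication for a user with manager permissions; the function name build_ingest_request and the credentials shown are placeholders.

```python
import base64
from urllib.request import Request


def build_ingest_request(portal_url, username, password):
    """Build a request for the @@ingestKnowledge view of the given portal.

    Assumes the portal accepts HTTP Basic authentication.
    """
    url = portal_url.rstrip('/') + '/@@ingestKnowledge'
    credentials = '{}:{}'.format(username, password).encode('utf-8')
    token = base64.b64encode(credentials).decode('ascii')
    return Request(url, headers={'Authorization': 'Basic ' + token})


# Placeholder URL and credentials; replace with real values.
request = build_ingest_request('http://portal.com/portal/', 'manager', 'secret')
# To actually trigger the ingest: urllib.request.urlopen(request)
```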