Overview

This document describes the GlyTouCan registration system and its flow, together with the details of the plugin system.

Background

Originally, glycan registration was a single process that executed every enrichment step by default. It became bloated, error-prone, and difficult to manage. Most functions were compartmentalized, but there was no concrete framework for adding new functionality.

Improvement

Version two introduced a new modular, step-wise method; this workflow gives the user more transparency into the registration process.

Batch Method

The original registration method processed all enrichment of a GlycoCT sequence in one process. If an error occurred within this process, the entire transaction was rolled back and an error message was displayed. This method was not user-friendly and made it difficult to process multiple sequences at once.

The new method stores the user’s input at the beginning, with minimal validation. At regular intervals or at the user’s request, a registration batch process is executed which enriches the data with the required fields and can quickly convey results for large numbers of structures in a simple user interface.
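The per-input isolation described above can be sketched as follows. This is a minimal illustration, not the project's implementation: `BatchRunner`, `BatchResult`, and the `enrich` function are hypothetical names, and a plain list stands in for the draft store. Each stored input is processed on its own, so a failure becomes a recorded result instead of rolling back the whole batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical outcome of processing one stored input. */
final class BatchResult {
    final String input;
    final boolean success;
    final String message;

    BatchResult(String input, boolean success, String message) {
        this.input = input;
        this.success = success;
        this.message = message;
    }
}

/** Hypothetical batch runner: processes every draft input independently. */
final class BatchRunner {
    /**
     * Applies the enrichment step to each input. An exception raised for one
     * input is captured as a failed result and does not stop the other inputs.
     */
    static List<BatchResult> run(List<String> draftInputs, Function<String, String> enrich) {
        List<BatchResult> results = new ArrayList<>();
        for (String input : draftInputs) {
            try {
                results.add(new BatchResult(input, true, enrich.apply(input)));
            } catch (RuntimeException e) {
                results.add(new BatchResult(input, false, e.getMessage()));
            }
        }
        return results;
    }
}
```

The returned list holds one entry per input, which is the shape a results dashboard would need to show successes and failures side by side.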

Data Flow

A separate RDF triplestore will be used to store draft data for the initial user content. This data is not public through the repository website and should be considered unpublished information. When a user inputs data to register (such as a glycan sequence string) via the website or the client/REST API, the information is stored immediately in RDF. A log entry is also created to relate the information to the user.

All of the queries are also available via the API.
The module execution flow will be explained below.

Pre-Register Data

The following are the data fields accepted and the associated class under which each will be stored. Each type will have its own validation checks and enrichment. Each entry in the field column links to the detailed registration process.

Data field	Class
Glycan Sequence	GlycoRDF GlycoSequence
Glycan Name	Saccharide Alias
Glycan Motif	GlycoRDF Motif class
Pubmed ID	`dc:references <http://identifiers.org/pubmed/IDHERE>` as described in guidelines for the ToGo Project
Taxonomy	`glycan:taxon <http://identifiers.org/taxonomy/IDHERE>` as described in guidelines for the ToGo Project
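
As an illustration of the last two rows, the following sketch renders a PubMed ID and a taxonomy ID as triple strings using the patterns from the table. `PreRegisterTriples` and its methods are hypothetical helpers, the subject IRI is supplied by the caller, and the check that both identifiers are purely numeric is an assumption made for this sketch. Prefixes are left unexpanded, exactly as written in the table.

```java
/** Hypothetical helper that renders the identifier-based fields as triple strings. */
final class PreRegisterTriples {

    /** Builds: <subject> dc:references <http://identifiers.org/pubmed/ID> . */
    static String pubmedReference(String subjectIri, String pubmedId) {
        requireDigits(pubmedId, "PubMed ID");
        return "<" + subjectIri + "> dc:references <http://identifiers.org/pubmed/" + pubmedId + "> .";
    }

    /** Builds: <subject> glycan:taxon <http://identifiers.org/taxonomy/ID> . */
    static String taxon(String subjectIri, String taxonomyId) {
        requireDigits(taxonomyId, "Taxonomy ID");
        return "<" + subjectIri + "> glycan:taxon <http://identifiers.org/taxonomy/" + taxonomyId + "> .";
    }

    /** Minimal validation (an assumption of this sketch): digits only. */
    private static void requireDigits(String value, String label) {
        if (value == null || !value.matches("[0-9]+")) {
            throw new IllegalArgumentException(label + " must be numeric: " + value);
        }
    }
}
```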

Logging

The logging process is fairly complicated and will be described in a separate article. In general, information such as the ID, contributor, description, and type of the action will be stored.

Staging Process

As explained above, the information received will be stored in a draft-version graph section of the triplestore. When the batch process is initiated, all generated content and logging will also be stored in this draft section. Through the Draft View functionality, the user will be able to view this information in combination with live data, showing how it would appear once the data is committed.

This simply combines the draft data with public data currently stored in the repository.

Once the user is ready to publish the information, the preregister data is then transferred to the public graph. Once again the same enrichment modules are executed and all results can be checked from the Log View Dashboard.
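
The relationship between the draft graph, the Draft View, and publication can be sketched with two in-memory sets. `StagingArea` is a hypothetical stand-in for the triplestore graphs, triples are plain strings, and the re-execution of the enrichment modules on publication is left out of this sketch.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical in-memory stand-in for the draft and public graphs. */
final class StagingArea {
    private final Set<String> draftTriples = new LinkedHashSet<>();
    private final Set<String> publicTriples = new LinkedHashSet<>();

    void addDraft(String triple) {
        draftTriples.add(triple);
    }

    void addPublic(String triple) {
        publicTriples.add(triple);
    }

    /** Draft View: the public data combined with the unpublished draft data. */
    Set<String> draftView() {
        Set<String> view = new LinkedHashSet<>(publicTriples);
        view.addAll(draftTriples);
        return view;
    }

    /** Public view: draft data is never visible here. */
    Set<String> publicView() {
        return new LinkedHashSet<>(publicTriples);
    }

    /** Publishing transfers the draft data to the public graph and empties the draft. */
    void publish() {
        publicTriples.addAll(draftTriples);
        draftTriples.clear();
    }
}
```

Before `publish()` the Draft View already shows what the public view will contain afterwards, which is the point of the preview.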

Module Execution Flow

The most complicated registration process is that of the sequence structure.
The following describes the processing that occurs when a structure is first recorded. Modules are executed and results can be reviewed from the dashboard.

Once ready, the structure can be committed by the user.

Once committed, the data is destroyed; the graph can also be completely reset in case any enrichment data remains.

The “move structure to production” process executes the first workflow, running the modules again, this time into the associated graphs visible on production.
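
A minimal sketch of running the same module list twice, once per target, is given below. `EnrichmentModule`, `ModuleWorkflow`, and the graph names passed as strings are all hypothetical; the real modules and graph identifiers are not shown in this document.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical module: processes one sequence and writes into the named target graph. */
interface EnrichmentModule {
    String run(String sequence, String targetGraph);
}

/** Hypothetical workflow: the same ordered module list, reusable for any target graph. */
final class ModuleWorkflow {
    private final List<EnrichmentModule> modules;

    ModuleWorkflow(List<EnrichmentModule> modules) {
        this.modules = modules;
    }

    /** Runs every module in order against the given graph and collects their messages. */
    List<String> execute(String sequence, String targetGraph) {
        List<String> messages = new ArrayList<>();
        for (EnrichmentModule module : modules) {
            messages.add(module.run(sequence, targetGraph));
        }
        return messages;
    }
}
```

Under these assumptions, the first recording would call `execute(sequence, "draft")` and the move to production would call `execute(sequence, "production")` on the same workflow object.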

RDF Process Framework

The GlyTouCan project has created a framework for storing glycoinfo data in RDF. This section describes the functionality that modules inherit from the parent classes. It is a high-level view of the framework’s intentions.

The main purpose of this framework is to give developers a simple interface to query and store GlyTouCan RDF repository data. A module that inherits the parent classes gets an instance of an RDF DAO, which allows it to execute any SPARQL necessary. There is then no need to worry about the details of connecting to the RDF triplestore, software driver dependencies, transactions, or error handling. The logging system is also made available as a very simple way to communicate the status of the program to the end user.

As an example, the following is the interface for GlycoSequence RDF processing. This is the minimal interface required by a module to process a user-submitted GlycoSequence.

/** Minimal contract for a module that processes a user-submitted GlycoSequence. */
public interface GlycoSequenceResourceProcess extends ResourceProcess {
    ResourceProcessResult processGlycoSequence(String sequence, String contributorId) throws ResourceProcessException;
}

/** Base contract: gives every module access to the RDF repository through a SparqlDAO. */
public interface ResourceProcess {
    SparqlDAO getSparqlDAO();

    void setSparqlDAO(SparqlDAO sparqlDAO);
}

Please note: in most cases developers will extend the child classes for each type of input. This interface is quite simple and is used by the framework even before sequence validation. Please refer to the “Pre-Register Data” section above for the types of data input and better examples of how to create modules for them. This example is used to show the simplicity of the framework, which allows for flexibility in how the batch process is used now and in the future.
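
To make the interface concrete, here is a minimal sketch of a module that implements it. The types are re-declared as reduced stand-ins so the example compiles on its own; the real framework classes have more members. `SequenceLengthModule` and the one-argument `ResourceProcessResult` constructor are hypothetical and exist only for this illustration.

```java
// Reduced stand-ins so the sketch compiles by itself; the real types live in the framework.
interface SparqlDAO { }

class ResourceProcessResult {
    private final String message;

    ResourceProcessResult(String message) {   // hypothetical constructor
        this.message = message;
    }

    String getMessage() {
        return message;
    }
}

class ResourceProcessException extends Exception {
    ResourceProcessException(ResourceProcessResult result) {
        super(result.getMessage());
    }
}

interface ResourceProcess {
    SparqlDAO getSparqlDAO();

    void setSparqlDAO(SparqlDAO sparqlDAO);
}

interface GlycoSequenceResourceProcess extends ResourceProcess {
    ResourceProcessResult processGlycoSequence(String sequence, String contributorId)
            throws ResourceProcessException;
}

/** Hypothetical module: rejects blank sequences, otherwise reports the sequence length. */
class SequenceLengthModule implements GlycoSequenceResourceProcess {
    private SparqlDAO sparqlDAO;

    public SparqlDAO getSparqlDAO() {
        return sparqlDAO;
    }

    public void setSparqlDAO(SparqlDAO sparqlDAO) {
        this.sparqlDAO = sparqlDAO;
    }

    public ResourceProcessResult processGlycoSequence(String sequence, String contributorId)
            throws ResourceProcessException {
        if (sequence == null || sequence.trim().isEmpty()) {
            // Failure is reported through the framework's exception type.
            throw new ResourceProcessException(
                    new ResourceProcessResult("empty sequence from contributor " + contributorId));
        }
        return new ResourceProcessResult("sequence length " + sequence.length());
    }
}
```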

More details can be seen in the returned ResourceProcessResult class:

public class ResourceProcessResult {

    Entry logMessage;
    String id;
...

}

The logMessage is an Entry class, which is very similar to a log4j log entry:


public class Entry {

    protected LevelType level;
    protected XMLGregorianCalendar date;
    protected String className;
    protected String resource;
    protected String message;
...
}

The level type is a status level, which is enumerated as follows:

public enum LevelType {

    ALL,
    TRACE,
    DEBUG,
    INFO,
    WARN,
    ERROR,
    FATAL,
    OFF;
...
}

This will be logged directly into the logging system.
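
Assuming the enumeration order is meant as increasing severity, as it is in log4j, a threshold check can be written by comparing enum positions. `LevelFilter` and `isShown` are hypothetical names for this sketch; how the logging system actually filters entries is not described here.

```java
/** Same constants and order as the enumeration shown above. */
enum LevelType { ALL, TRACE, DEBUG, INFO, WARN, ERROR, FATAL, OFF }

/** Hypothetical helper: decides whether an entry passes a severity threshold. */
final class LevelFilter {
    /**
     * Returns true when an entry at entryLevel should be shown for the given
     * threshold. A threshold of OFF hides everything, ALL shows everything.
     */
    static boolean isShown(LevelType threshold, LevelType entryLevel) {
        if (threshold == LevelType.OFF) {
            return false;
        }
        // Enum constants compare by declaration order.
        return entryLevel.compareTo(threshold) >= 0;
    }
}
```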

The interesting part is access to the SparqlDAO class, which is part of the GlyTouCan batch project and provides access to the RDF repository. Any SPARQL can be run through this class without infrastructure worries.

Note the exception handling also uses a specialized class:

public class ResourceProcessException extends Exception {

    public ResourceProcessException(ResourceProcessResult result) {
        super(result.getMessage());
    }
}

The exception is constructed from a ResourceProcessResult. Thus, in case of errors, the result’s log message can be relayed to the user through the logging interface.
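
The class as listed passes only the result's message to the superclass. One possible way to keep the whole result reachable by the caller is sketched below. `ResultCarryingException`, `getResult()`, `ErrorReporting`, and the reduced `ResourceProcessResult` are hypothetical and are not part of the framework as documented here.

```java
/** Reduced stand-in for the framework's result class; only a message is kept here. */
class ResourceProcessResult {
    private final String message;

    ResourceProcessResult(String message) {   // hypothetical constructor
        this.message = message;
    }

    String getMessage() {
        return message;
    }
}

/** Hypothetical exception variant that keeps the full result, not just its message. */
class ResultCarryingException extends Exception {
    private final ResourceProcessResult result;

    ResultCarryingException(ResourceProcessResult result) {
        super(result.getMessage());
        this.result = result;
    }

    ResourceProcessResult getResult() {
        return result;
    }
}

final class ErrorReporting {
    /** Hypothetical processing step: fails on a blank sequence. */
    static void process(String sequence) throws ResultCarryingException {
        if (sequence == null || sequence.trim().isEmpty()) {
            throw new ResultCarryingException(new ResourceProcessResult("sequence is empty"));
        }
    }

    /** Runs the step and turns the outcome into one line for the user-facing log. */
    static String runAndReport(String sequence) {
        try {
            process(sequence);
            return "INFO: processed";
        } catch (ResultCarryingException e) {
            return "ERROR: " + e.getResult().getMessage();
        }
    }
}
```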

If no SPARQL-update insert is to be made, it should not be set.

To simplify things, an implementation using the Spring Framework is available from the parent class. It is recommended to inherit this for child module interfaces:

public class ResourceProcessParent implements ResourceProcess { 
    @Autowired
    SparqlDAO sparqlDAO;

    public SparqlDAO getSparqlDAO() {
        return sparqlDAO;
    }

    public void setSparqlDAO(SparqlDAO sparqlDAO) {
        this.sparqlDAO = sparqlDAO;
    }
}

SparqlDAO has methods such as query() and insert() which can be used to execute SELECT and INSERT SPARQL, respectively.
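
The following sketch shows how a module might use an inherited DAO. It rests on several assumptions: the string-based `query(String)` and `insert(String)` signatures are hypothetical (the real SparqlDAO may take other argument types), the parent class is reduced to plain Java without Spring so the DAO is set by hand through the setter, and `AliasModule`, `RecordingDAO`, and the `example.org` IRIs are placeholders.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, string-based stand-in; the real SparqlDAO signatures may differ. */
interface SparqlDAO {
    List<String> query(String selectSparql);

    void insert(String insertSparql);
}

/** Parent class as in the text, reduced to plain Java (no Spring autowiring). */
class ResourceProcessParent {
    SparqlDAO sparqlDAO;

    public SparqlDAO getSparqlDAO() {
        return sparqlDAO;
    }

    public void setSparqlDAO(SparqlDAO sparqlDAO) {
        this.sparqlDAO = sparqlDAO;
    }
}

/** Hypothetical module: stores an alias only when a lookup finds nothing. */
class AliasModule extends ResourceProcessParent {
    boolean storeAlias(String alias) {
        // Escape backslashes and quotes so the alias is a valid SPARQL string literal.
        String escaped = alias.replace("\\", "\\\\").replace("\"", "\\\"");
        List<String> existing = getSparqlDAO().query(
                "SELECT ?s WHERE { ?s ?p \"" + escaped + "\" }");
        if (!existing.isEmpty()) {
            return false;
        }
        getSparqlDAO().insert(
                "INSERT DATA { <http://example.org/draft/alias> <http://example.org/label> \""
                        + escaped + "\" }");
        return true;
    }
}

/** In-memory fake used only to exercise the sketch. */
class RecordingDAO implements SparqlDAO {
    final List<String> inserts = new ArrayList<>();

    public List<String> query(String selectSparql) {
        // Crude simulation: every earlier insert counts as a matching row.
        return new ArrayList<>(inserts);
    }

    public void insert(String insertSparql) {
        inserts.add(insertSparql);
    }
}
```

The module never touches connection or transaction details; it only builds SPARQL text and hands it to whatever DAO was injected.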

Demonstration

As a demonstration of how to use this framework, please refer to the Glycan Registration for a detailed test case.

Batch Project

For more details on SparqlBeans and the SparqlDAO, please refer to the GlyTouCan batch project.

Graphs

A graph policy specific to the partner program already exists; it applies once the registration is committed by the user.

GlyTouCan Registration System Flow