Photo provided by wikimedia.

1.2.6 was released yesterday, after major backend upgrades and collaboration with partner organizations.

Main Features

An interesting aspect of Linked Data stored in triples is the storing of relationships between each piece of data. Due to this, as the data expands, it is possible to formulate more ways to view this information. For this release we took advantage of the data formalization steps taken last time (to clean up all of the volunteered data we received), as well as the new data being registered within the glytoucan partner program. This shows the nice boosts in productivity and efficiency achieved by taking the time to organize data into a well-defined ontology.

Database Filtering

The View All page, which allows users to filter down to distinct structures, can now filter based on specific databases where the structure was supplied. It is also possible to view the intersection between these databases; so if you would like to see the structures stored both within your own database and those that intersect with PDB, it is now possible.

databases

Partner Glycan Id Registration

With this release, all of the major backend upgrades are finally complete. Thus the work that started with the STINT program while we were in Sweden is finally coming to fruition. During the program we had designed a method to register structures securely through an API and a client. Both of these subsystems have been matured to the point where it is possible for any registered partner to register structures simply through a command line.

It is also possible to register your own glycan structures id’s and the URL to link it back to. This will allow for easier registration of structures and automatic maintenance of links stored within the repository.

During the partner registration process we also request the database name that should be used for your glycan id. This name will be used within the Database filtering (explained above) to get a quick summary of the linked data between the repository and your public database.

A lot of work was put into this release with help from many developers of partner organizations. Thanks to their input we added some very nice last-minute features into the partner process.

Now that this is complete, it’s possible to look at what other new features should be added for partners and the data they would like to share.

Future work (and Other Technical Mumbo)

Instead of going into details about the backend deficiencies that existed previous to release, I think a more positive point of view would be to look at what this means for glytoucan in the future.

The reason why major backend overhauls require so much time and planning is because it is a difficult balancing act between many aspects: the data that exists currently, the system as it functions currently, the new functionality required, the new data required, the new systems that need to be introduced.

These aspects impact every piece of the entire glytoucan system, including the migration of the code out of the development environment and into the public view.

It is quite a relief to be able to have overcome this milestone. One major reason why is because now that the foundation has matured, it will be possible to easily expand and grow any new features that user’s would like. I don’t want to go into too much detail at this point, as some things are finalized and others are not. Please volunteer any suggestions via the contact information below!

toco in flight

The Toco is the largest and probably the best known species in the toucan family.

Accepting User Data

The core values

There are many philosophies behind how a web-based system should be structured, and one major issue is how much user-supplied information should be incorporated into the system.

At it’s core, GlyTouCan is very simple. When one considers a “repository to identify structures” the keywords here are identify and structure. At the least, the repository would store two items: a standard method of representing a structure, and an identifier. To take this idea even further, if the structure can be represented in a unique fashion the representation itself could be considered the identifier, thus leaving only one field required.

What this means is that GlyTouCan will eventually have to replace the randomly-generated identifiers with a unique representation method (WURCS), similar to IUPAC InChI. This is important to consider for future work.

Clarifying Curation

Considering the above, there would be limited curation required for the repository. This is because the primary role of the repository is to hold a unique identifier. If the underlying structure attached to the identifier is invalid or incorrect, then the importance of the identifier would naturally decrease. Citing this type of identifier is simply the same as referring to an invalid structure.

This methodology is necessary for the more realistic requirements: GlyTouCan does not have a full-time support staff; it should be considered that the amount of GlyTouCan-specific maintenance resources is zero.

With this in mind, the amount of curation required for other user-input data should be handled in the same method. However, zero support is not future-proof, a volunteer-based group of moderators is required.

The GlyTouCan Partner program exists as part of the GlycoInformatics Consortium (GLIC) to provide collaboration and support between Glycomics researchers, including both biological and information technology specialists. GlyTouCan users that are members of a partner organization would be considered to have a Moderator-role. These two roles of regular user and moderator can be utilized for all supplementary data fields.

Supplementary Data

Before describing in detail the accepted data fields, it should be noted how this data is accepted from the system. On each Glycan Entry page, a button links to a Register Supplementary Data Page. This page will display the base structural information, and fields below will allow user input for each supplementary data type accepted.

In all cases, Spam-prevention will be implemented with an Image Captcha method.

Literature

One of the first requests regarding supplementary information was to attach associated publications in which structures are referenced. Sample data for this case was already provided by partner databases, and can be seen on the Glycan Entry under the Literature section.

This utilizes a pubmed id to link to, like the following:

9565568

Permissions

Any registered user will have the ability to submit a pubmed id similar to the glycan structure text input. This will be implemented with the standard GlyTouCan System framework, starting with the API and having the web interface interact with the API server.

There should exist a method to maintain this data with removal functionality. However any registered user should not be able to freely remove these id’s that other users have submitted. A higher-level trusted user - a moderator role - is needed to be able to have other users remove this data.

Therefore any user that is a member of a partner organization will have the ability to remove literature id’s associated to structures. This will be implemented with the client interface, as having a delete button on a web page will be prone to mistakes.

Curation

The amount of curation involved for this is limited. There is no realistic method of checking to confirm if the structure truly exists within the publication, the user will be trusted with this responsibility.

One level of curation is utilizing the NCBI API to confirm the id truly exists and the link will not be broken to a 404 error message.

Status

This document is the first proposal of this functionality. Screenshot images and other details will be added as development progresses. Once implementation and release is complete, this document will be moved to the core support documentation.

Glytoucan Code

Latest Toco Upgrade Release