Conversion Workflows

Introduction

LINCS has developed a series of conversion workflows to cover the most common starting points for creating Linked Open Data (LOD).

The general information about contributing data to LINCS, along with the initial steps of expressing interest and completing the data-intake interview, applies to every workflow; see the Publish Data with LINCS and Learn about Contributing pages.

Browse through the following four tabs for an overview of each workflow and to understand how to categorize your data. The rest of the pages in this conversion workflow documentation cover each individual conversion step in order. Each step contains these same four tabs so that you can tailor the instructions to your data.

Structured Data can take the form of spreadsheets (e.g., CSV, TSV, XLS, XLSX), relational databases, JSON files, RDF files, and XML files.

We count data as structured if:

  • the entities are all tagged individually (e.g., one entity per spreadsheet cell or XML element)

and the entities are connected in one of the following ways:

  • in a hierarchical way (e.g., nested XML elements)
  • with relationships between entities expressed following some clearly-defined schema and data structure (e.g., spreadsheet headings relating columns of entities together)

Data Example

Here are data samples from two projects published with LINCS that began as structured data.

The Canadian Centre for Ethnomusicology data started as several spreadsheets with a row for each artifact.

| ID | Title | placeMade | placeMadeID | material | materialID |
| --- | --- | --- | --- | --- | --- |
| CCEA-L1995.63 | Bamboo Flute | Edmonton | https://sws.geonames.org/5946768 | bamboo | http://www.wikidata.org/entity/Q27891820 |
| CCEA1995.21 | Pair of Taiko Drums | Shinano | https://sws.geonames.org/1852136 | hide | http://www.wikidata.org/entity/Q3291230 |
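
As a rough illustration of how spreadsheet headings relate columns of entities together, the following Python sketch reads the two sample rows and emits one (artifact ID, heading, value) statement per cell. It is not the LINCS conversion method: the function name `rows_to_statements`, the choice of the ID column as the subject, and the use of raw headings as relationship names are assumptions made only for this example.

```python
import csv
import io

# The two sample rows from the data example, written out as CSV text.
SAMPLE_CSV = """ID,Title,placeMade,placeMadeID,material,materialID
CCEA-L1995.63,Bamboo Flute,Edmonton,https://sws.geonames.org/5946768,bamboo,http://www.wikidata.org/entity/Q27891820
CCEA1995.21,Pair of Taiko Drums,Shinano,https://sws.geonames.org/1852136,hide,http://www.wikidata.org/entity/Q3291230
"""


def rows_to_statements(csv_text):
    """Return (artifact ID, column heading, cell value) tuples for every cell.

    Assumption for this sketch: the "ID" column identifies the artifact, and
    every other heading names a relationship from that artifact to the cell.
    """
    statements = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = row["ID"]
        for heading, value in row.items():
            # Skip the identifier itself and any empty cells.
            if heading != "ID" and value:
                statements.append((subject, heading, value))
    return statements


statements = rows_to_statements(SAMPLE_CSV)
for statement in statements:
    print(statement)
```

Because each cell holds exactly one entity and each heading has a consistent meaning, every cell can be read as a single statement about its row's artifact. That property is what makes this kind of data straightforward to map.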

The University of Saskatchewan Art Collection data began as an XML file with a parent element for each art object.

<?xml version="1.0" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description>
    <ObjectIdentifier>1910.001.001</ObjectIdentifier>
    <AcquistionDate>1910</AcquistionDate>
    <ObjectTitle>Portrait of Thomas Copland</ObjectTitle>
    <ArtistName url="http://www.wikidata.org/entity/Q100921439">Victor Albert Long</ArtistName>
    <Medium url="http://vocab.getty.edu/aat/300015050">oil paint</Medium>
    <Category url="http://vocab.getty.edu/aat/300033618">painting</Category>
  </rdf:Description>
  <rdf:Description>
    <ObjectIdentifier>2018.026.001</ObjectIdentifier>
    <AcquistionDate>2018</AcquistionDate>
    <ObjectTitle>Grace</ObjectTitle>
    <ArtistName url="http://www.wikidata.org/entity/Q19609740">Lori Blondeau</ArtistName>
    <Medium url="http://vocab.getty.edu/aat/300265621">inkjet print</Medium>
    <Category url="http://vocab.getty.edu/aat/300046300">Photograph</Category>
  </rdf:Description>
</rdf:RDF>
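
To show how the hierarchical structure of this XML sample can be read programmatically, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It treats each `rdf:Description` parent element as one art object and collects the text and `url` attribute of each child element. The function name `parse_objects` and the shape of the returned dictionaries are assumptions for illustration only, not part of any LINCS tool.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The sample from the data example above, with element names kept exactly
# as they appear in the source data.
SAMPLE_XML = """<?xml version="1.0" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description>
<ObjectIdentifier>1910.001.001</ObjectIdentifier>
<AcquistionDate>1910</AcquistionDate>
<ObjectTitle>Portrait of Thomas Copland</ObjectTitle>
<ArtistName url="http://www.wikidata.org/entity/Q100921439">Victor Albert Long</ArtistName>
<Medium url="http://vocab.getty.edu/aat/300015050">oil paint</Medium>
<Category url="http://vocab.getty.edu/aat/300033618">painting</Category>
</rdf:Description>
<rdf:Description>
<ObjectIdentifier>2018.026.001</ObjectIdentifier>
<AcquistionDate>2018</AcquistionDate>
<ObjectTitle>Grace</ObjectTitle>
<ArtistName url="http://www.wikidata.org/entity/Q19609740">Lori Blondeau</ArtistName>
<Medium url="http://vocab.getty.edu/aat/300265621">inkjet print</Medium>
<Category url="http://vocab.getty.edu/aat/300046300">Photograph</Category>
</rdf:Description>
</rdf:RDF>"""


def parse_objects(xml_text):
    """Return one dict per rdf:Description, keyed by child element name.

    Each value holds the element text ("label") and its url attribute
    ("uri"), which is None when the element carries no url.
    """
    root = ET.fromstring(xml_text)
    objects = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        record = {}
        for child in description:
            record[child.tag] = {"label": child.text, "uri": child.get("url")}
        objects.append(record)
    return objects


objects = parse_objects(SAMPLE_XML)
for obj in objects:
    print(obj["ObjectTitle"]["label"], "by", obj["ArtistName"]["label"])
```

The nesting does the work here: every child of a parent element describes that one art object, so no extra schema is needed to tell which artist or medium belongs to which object.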

Workflow Overview

This workflow is our most customizable and curatable because the entities and relationships are clearly defined in the source data. We typically create a custom conceptual mapping for each dataset, reusing past mappings where possible, and convert the data using the 3M mapping tool.

Is your data close to fitting into this category, but needs a bit of cleaning first? Follow along through the data cleaning step for guidance.

Questions

Below is a list of questions that are important to consider before, during, and after the conversion process.

Before Conversion

Interested in contributing your converted data to LINCS? Review the Contribute Data to LINCS and Learn about Contributing pages to understand if your data is a good fit for LINCS, what is required of your team, what support LINCS can offer, and how to contact LINCS.

Are the conversion workflows automated?

The projects contributing to LINCS vary in the content and format of their source data. To accommodate this diversity, LINCS has prioritized workflows made up of independent steps that can be carried out in different ways depending on the type of data and the needs of the Research Team. This means that we do not have a single automated workflow.

Each workflow contains a mix of steps that are more automated and others that require manual work. Similarly, some steps will feed into the next without any additional effort, while others may require data manipulation.

Not ready to commit to the entire process?

There are parts of the conversion process that the Research Team can begin before fully committing to converting data with LINCS. In particular, cleaning and reconciling your data will help you understand your data better and make it easier for you to work with it, even if you do not convert it to LOD and contribute it to the LINCS triplestore. For more information, see Data Cleaning and Reconciliation.

These conversion steps and associated tools can be used even if you are not contributing to LINCS. For example, you can use these steps if you have data from a domain outside of the LINCS Areas of Inquiry and will contribute to another LOD project.

During Conversion

How does LINCS collaborate?

Once the conversion process has started, the Research Team will be connected with LINCS team members who will help with each step of the process. The Research Team will also need to provide LINCS with contact information for their team members so that everyone can be kept informed.

Conversion is an iterative process. The Research Team can expect to have regular meetings with LINCS team members to discuss the conversion process and work collaboratively. The more time that the Research Team can make available for these meetings and for the work required throughout the process, the faster the data will be prepared.

When will my data be public?

After your dataset is converted, it will be ingested into the LINCS triplestore as a trial. This ingestion allows the Research Team to view their dataset in ResearchSpace and look for errors. While the dataset is technically public, it is not yet published LOD. The dataset should not be used in publications at this stage.

After errors have been spotted and changes have been made, the final version of the dataset will be uploaded to the LINCS triplestore. The final dataset is then accessible via the official version of ResearchSpace and is published LOD. This data can be used in publications and shared with others. It will also be available to others who want to use and connect to the data, except in limited, mutually agreed-upon circumstances.

After Conversion

What if I want to edit my data?

Once the dataset is published to ResearchSpace, the Research Team can make changes to the data directly in ResearchSpace. Changes made by the Research Team affect the version of the dataset that is in the LINCS triplestore, which means that the conversion workflow does not need to be repeated.

What if I want to add more data later?

If the Research Team wants to add more data after the conversion process, LINCS can rerun the data conversion without repeating the consultation process if the new data has the exact same structure as the initial data. If the new data does not have the same structure, the conversion process will need to be altered and repeated. Steps like reconciliation will always need to be redone if there are new entities in the new data. Note that the new data can then be merged with the existing project or can be made into a new, separate project.
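
For spreadsheet data, a quick first check on whether new data has the same structure is to compare column headings. The Python sketch below does only that, under the assumption that "same structure" means identical headings in identical order. LINCS may apply additional criteria, so treat it as a preliminary check rather than a definitive test. The function names and the sample data are hypothetical.

```python
import csv
import io


def read_headings(csv_text):
    """Return the first row of the CSV text as a list of column headings."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader, [])


def same_structure(original_csv, new_csv):
    """Assumption for this sketch: structure matches when the headings
    are identical and appear in the same order."""
    return read_headings(original_csv) == read_headings(new_csv)


# Hypothetical data for illustration.
ORIGINAL = "ID,Title,placeMade,placeMadeID\nA1,Flute,Edmonton,https://sws.geonames.org/5946768\n"
NEW_SAME = "ID,Title,placeMade,placeMadeID\nA2,Drum,Shinano,https://sws.geonames.org/1852136\n"
NEW_DIFFERENT = "ID,Name,madeIn\nA3,Bell,Edmonton\n"

print(same_structure(ORIGINAL, NEW_SAME))
print(same_structure(ORIGINAL, NEW_DIFFERENT))
```

Even when the headings match, any entities in the new rows that were not in the original data would still need reconciliation, as noted above.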