Data Management Plan
The LINCS project
The Linked Infrastructure for Networked Cultural Scholarship (LINCS) project converts, connects, enhances, and makes available Canadian cultural research by providing a stable core of infrastructure for linked open data (LOD) in the humanities, with a focus on transforming existing data and establishing robust, sustainable long-term storage and access. LINCS enables retrieval and analysis not only within but also across heterogeneous datasets, as well as across open and protected datasets; it also mobilizes researcher datasets to create a strong foundation for scaling up cultural research.
LINCS provides a suite of tools for access to and transformation or creation of data. Components of the LINCS transformation workflow are available through Application Programming Interfaces (APIs) and web-based tools (e.g., LINCS ResearchSpace, LINCS Context Explorer). LINCS prioritizes best practices in data management—recognizing that different datasets have different requirements based on their provenance and the individuals and organizations associated with them—working with partners in the digital research infrastructure ecosystem.
Data inclusion policy
What data does LINCS publish?
LINCS publishes a broad range of cultural data across the humanities and social sciences. Datasets that fall within one of the LINCS Research Areas are strongly encouraged, but LINCS is also open to publishing data in other areas. LINCS is especially interested in datasets that are predominantly Canadian or related to geographically contiguous territorial and border Indigenous nations.
LINCS can create linked open data from structured, semi-structured, and unstructured datasets. Everything from images to texts, maps to music is eligible—so long as the source data is readily available.
Who can publish data with LINCS?
Contributors must be Canadian, based in Canada, or working on a dataset that is predominantly Canadian or related to geographically contiguous territorial and border Indigenous nations. LINCS also invites contributors who are from underrepresented communities or who are working with datasets related to underrepresented communities within and beyond Canada.
What conditions are placed on the source dataset?
LINCS places three conditions on the source datasets on which the linked open data is based:
- The source data, if any, should have stable storage, be supported by a data management plan, and ideally be archived in a national repository such as Borealis. Specifics will vary by type of project. The Data Publication License Agreement advises on the management of linked datasets that are being published with LINCS, the source collections from which they are derived, and any third-party data from which researchers draw (see Source data management recommendations).
- If the data stewards or contributors are not themselves the owners of the data and the data is not open under conditions that allow for such use, the data stewards or contributors must have the source data owner’s permission to publish linked open data derived from the source data.
- For datasets that require ethics clearance according to the Tri-Agency Policy for Ethical Conduct for Research Involving Humans, the data stewards or contributors must have research ethics board approval from their home institutions or from a LINCS host institution.
What is the publication application process?
Contributors who would like to publish data with LINCS are encouraged to reach out before they start the data transformation process, although enquiries are welcome at any time.
Contributors or data stewards are required to sign the LINCS Data Publication License Agreement to consent to the eventual publication of their data with LINCS.
Data and metadata
LINCS data
The following datasets are openly available:
- The LINCS knowledge graph of published datasets, expressed using the Resource Description Framework (RDF) W3C standard data format, through multiple interfaces:
  - ResearchSpace, a linked data platform of which LINCS runs two instances:
    - LINCS ResearchSpace: Published datasets that have been finalized and approved.
    - LINCS ResearchSpace Review: Datasets made available with restricted access to dataset owners and administrators after initial processing. Draft data can be reviewed and edited (manually or through batch workflows) in Review before being published to LINCS ResearchSpace.
  - Triplestores
    - The ResearchSpace Blazegraph triplestores are available for both the LINCS ResearchSpace and LINCS ResearchSpace Review instances (for published and draft data, respectively).
    - LINCS Fuseki triplestore endpoints are available for both published and draft data. These endpoints serve the LINCS Portal, Context Explorer, and LINCS Application Programming Interfaces (APIs).
- Vocabularies based on the SKOS data model through the LINCS Vocabulary Browser
- Code in the LINCS Code Repository
- Documentation on the LINCS Portal
Metadata
Individual datasets that are part of the LINCS knowledge graph are distinguished from one another by one or more specific namespaces for their linked data graphs. Some graphs contain other graphs. For example, the Canadian History dataset contains subgraphs for Historical Canadian Persons and for the historical department of Indian Affairs. The Project Datasets page provides a current list of the named graphs in the LINCS dataset.
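Because each dataset occupies one or more named graphs, the graphs in a triplestore can be enumerated with a standard SPARQL query. The Python sketch below builds a SPARQL 1.1 Protocol GET request URL for such a query. The endpoint address is a placeholder rather than a real LINCS URL, and the function name is illustrative; consult the LINCS documentation for the actual endpoint.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the documented LINCS SPARQL endpoint.
ENDPOINT = "https://example.org/sparql"

# Standard SPARQL: list every named graph that holds at least one triple.
LIST_GRAPHS = "SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } } ORDER BY ?g"


def build_query_url(endpoint: str, query: str) -> str:
    """Return a GET URL that passes the query in the 'query' parameter,
    as defined by the SPARQL 1.1 Protocol."""
    return f"{endpoint}?{urlencode({'query': query})}"


if __name__ == "__main__":
    # Printing the URL only; no network request is made here.
    print(build_query_url(ENDPOINT, LIST_GRAPHS))
```

The resulting URL can be fetched with any HTTP client, typically with an Accept header such as application/sparql-results+json.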
Dataset-specific metadata is recorded as linked open data as part of the dataset graph, using standard CIDOC CRM properties, and it is stored within the triplestore. The content of dataset-specific metadata is determined in the early stages of data ingestion by researchers, contributors, and/or data stewards and the LINCS technical team. Source and ownership details and contribution credits are provided in the metadata (e.g., dataset metadata, vocabulary metadata) for each LINCS dataset. Datasets and vocabularies archived in Borealis Dataverse are assigned a DOI, which makes it possible to cite the data as hosted by LINCS and to cite the archived data in Borealis.
The dataset metadata schema uses the CIDOC CRM properties for metadata specified in the CRMdig ontology, along with external vocabulary terms and/or LINCS-minted terms as necessary. Metadata is used by LINCS tools to support access to and description of LINCS datasets, as well as to populate archival metadata records associated with archived LINCS datasets in Borealis Dataverse.
Documentation
LINCS uses CIDOC CRM as its core ontology and the Web Annotation (OA) Data Model for relating entities, assertions, and source materials; LINCS also follows related standards such as the Web Ontology Language (OWL) and Simple Knowledge Organization System (SKOS). Additionally, LINCS uses CIDOC CRM extensions such as FRBRoo for bibliographic data and CRMtex for ancient textual data, as well as a wide range of external vocabularies. The standards currently in use by LINCS come from existing work; further standards may be added as new datasets are added.
Information about LINCS’s metadata standards, ontology choices, and vocabularies is available on the project website; in particular, see the main Application Profiles.
Dataset-specific documentation is found in the Application Profile for each dataset, accessible from the Project Datasets page.
Data management policies
Ethics and legal compliance
Ethical and legal compliance needs vary from one dataset to another. LINCS strives to ensure ethical and legal management of data and intellectual property matters with respect to the data it publishes. With respect to Indigenous and Traditional Knowledges, LINCS works to promote data sovereignty and to adhere to the CARE Principles for Indigenous Data Governance.
Data stewards are responsible for ensuring that the production and publication of their datasets, where applicable, adhere to the Tri-Agency Policy for Ethical Conduct for Research Involving Humans; particular attention should be paid to the section on linkage of sensitive data. Where some or all of a dataset should not be openly published for reasons of sovereignty or sensitivity, LINCS will advise data stewards on other storage options but does not provide secure storage beyond what is afforded by the project’s protected environment. Data stewards should abide by protocols as implemented by their home institutions or organizations regarding ownership, attribution, and ability to license; primary and secondary uses of the data; and obtaining and documenting consent where informed consent is required. In the absence of home institution policies, the University of Guelph policies will be used. If there are meaningful discrepancies between home institution guidelines and University of Guelph guidelines, LINCS reserves the right to request adherence to University of Guelph policies. LINCS will decline or withdraw publication if, in the opinion of its Board, it would be inadvisable to publish due to any of the considerations mentioned in this paragraph.
All data published or stored by LINCS must also adhere to Digital Research Alliance of Canada (DRAC) requirements for data hosted on its system.
Access and security
Open access
LINCS datasets are protected during the preparation phase, but the final processed linked open data for these datasets is intended to be published openly and freely shared as findable, accessible, interoperable, and reusable (FAIR) data; LINCS typically publishes data under the CC BY 4.0 License. Datasets that have been granted a sharing exemption in the LINCS Data Publication License Agreement for reasons related to matters such as data sovereignty, community rights, or accessibility may be excluded from the openly available materials. Ideally, some metadata or some portion of the data in exempted datasets will still be shared openly.
User permissions
- LINCS operates a user management system that implements access control and permissions policies. Where possible, user access is coordinated across platforms so that proper permission profiles are assigned to users as they log on to the LINCS tools and platforms.
- Access to some datasets may be restricted or closed on a case-by-case basis.
DRAC security
- LINCS operates within the DRAC cloud and, as such, relies on the security measures implemented by DRAC and must conform to them where appropriate. For more information, refer to the DRAC Security Policies, particularly the Cybersecurity Policy, the Data Classification Policy, and the Data Handling Policy.
Tool and software access
The LINCS Portal (lincsproject.ca) acts as a point of access and discovery for all LINCS web services and for services available to LINCS researchers at other sites, as well as for various scripts and software that are not deployed on the web. Documentation is available through the portal for commonly used tools and in the code repository for less used ones. The LINCS Portal administrator works to ensure that documentation in the portal is up to date and created consistently.
LINCS access, analysis, and data transformation tools, scripts, and software are freely accessible as hosted services on the DRAC infrastructure and also as projects in open-source repositories such as GitLab.
LINCS tools and software include:
- Code and programs created by LINCS staff and researchers
  - Code and programs are open source with an appropriate license (e.g., GNU Affero General Public License) as described in the LINCS Software License.
- Programs and systems from external sources
  - External programs and systems may be either open source or proprietary.
  - LINCS researchers may also have access to, and make use of, computing programs and resources at partner institutions.
Sustainability
LINCS was conceived and funded as long-term infrastructure; however, circumstances may make continued operations impossible. LINCS reserves the right to reassign Data Publication License Agreements to another organization or legal entity, if an entity is elected or established to represent LINCS. Reasonable efforts will be made to notify data stewards of any changes. If LINCS terminates its data-archiving activities, it will attempt to transfer data and/or software to a similar organization in accordance with the terms of the Data Publication License Agreement.
Resources
Data storage
LINCS takes responsibility for the management and long-term preservation of the linked open data that it hosts and publishes at no cost to individual researchers, relying on the LINCS partnership with University of Victoria Libraries, the Borealis Dataverse national data archiving infrastructure (part of the Canadian Dataverse repository), and other ecosystem partners. LINCS does not host source or image data beyond the few images required for the project landing pages in LINCS tools.
The storage requirements of the LINCS system are growing as the number of hosted datasets grows. Storage requirements are updated in annual reports and allocation requests to the Digital Research Alliance of Canada. Allocations as of January 2024 were:
- Digital collections, including short-term backup storage: 30 TB
- User accounts and scratch storage: 3 TB (estimating a minimum of 500 users with accounts)
- Triplestores: 1 TB
- Documentation, linkage to other collections, metadata, local copies of ontologies, software: 1 TB
- Total: 35 TB
Users who require large amounts of data storage over an extended period of time can request storage from LINCS. Canadian researchers, librarians, and sponsored graduate students may also apply for a separate processing or storage allocation on their own account through DRAC.
Computation
- LINCS member computation
  - Computation by researchers, contributors, and data stewards should comply with the established policies for the DRAC advanced research computing resources. LINCS technical staff will facilitate access to resources for LINCS-related activities, where required.
- Background processing
  - Some large datasets will have to be analyzed and processed in ways that require considerable computational resources. LINCS allocates resources for these activities, with priority given to new datasets.
- Web services
  - LINCS management will ensure, within the constraints of resources allocated from DRAC, that there are sufficient computational and network resources for all LINCS web services to function and to serve the research community in a timely manner. LINCS has no control over system and storage outages on the DRAC system but monitors systems and alerts DRAC to problems as they arise. LINCS has no guarantee that there will be future DRAC allocations for LINCS but will take all necessary actions to ensure that resources continue to be made available.
Data management and preservation
LINCS datasets, vocabularies and ontologies
LINCS file and data formats are strongly positioned for data reuse, sharing and long-term access.
The source of truth for published LINCS data is the Blazegraph triplestore for the ResearchSpace Review environment. The source of truth for vocabularies is the LINCS Vocabularies repository.
This data is backed up weekly by LINCS on DRAC storage. Backup policies and procedures are documented in the backup section of the LINCS configuration project.
With our partner, the University of Victoria, LINCS works to ensure that vital data is backed up for quick recovery in the short term and preserved in the long term. A mirror site for ResearchSpace is hosted in a separate DRAC datacenter, which contains a copy of the LINCS single sign-on system so that anyone with access to the LINCS ResearchSpace can access read-only ResearchSpace data in the event of complete failure of the primary infrastructure. Snapshots of the production triplestore are archived at least annually and individual datasets are archived for long-term preservation in the LINCS Borealis Dataverse.
Through these and other measures, LINCS aims to keep the linked data it hosts accessible in the long term, recognizing that interfaces have finite lifespans. See the LINCS sustainability statement for additional information and context.
Dataset versioning
LINCS allows edits and additions to the data in the triplestore. LINCS’s dataset versioning follows the versioning practices set out by Borealis Dataverse. When edits and additions are made to the data, the dataset is versioned using semantic versioning (major.minor numbering; e.g., from version 1.0 to version 1.1, or from version 1.0 to version 2.0). Changes are defined as follows:
- Minor changes include changing metadata, making non-breaking changes to individual triples, and adding data.
- Major changes include replacing existing triples, replacing the entire dataset, and deprecating one or more triples.
LINCS recognizes that adding to a dataset can be a significant amount of work and can result in a new dataset that is substantially different from the previously published dataset. Although adding data is technically classed as a minor change, at the point of publication, researchers are welcome to request the addition be entered as a major change, thus triggering a new major number for the dataset (e.g., from version 1.0 to version 2.0).
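The versioning rules above can be sketched as a small Python function. The change-type labels, the function name, and the `promote_addition` flag are illustrative names rather than part of any LINCS tool, and the sketch assumes that a major bump resets the minor number to 0, as in the 1.0 to 2.0 example.

```python
# Change categories as described in the text; the labels are hypothetical.
MINOR_CHANGES = {"metadata", "non_breaking_edit", "addition"}
MAJOR_CHANGES = {"replace_triples", "replace_dataset", "deprecate_triples"}


def next_version(current: str, change: str, promote_addition: bool = False) -> str:
    """Return the next major.minor version for a given type of change.

    promote_addition models a researcher's request to record an
    addition of data as a major change.
    """
    major_text, minor_text = current.split(".")
    major, minor = int(major_text), int(minor_text)

    if change in MAJOR_CHANGES or (change == "addition" and promote_addition):
        # Assumption: the minor number resets to 0 on a major bump.
        return f"{major + 1}.0"
    if change in MINOR_CHANGES:
        return f"{major}.{minor + 1}"
    raise ValueError(f"unknown change type: {change!r}")


if __name__ == "__main__":
    print(next_version("1.0", "metadata"))
    print(next_version("1.0", "replace_dataset"))
    print(next_version("1.1", "addition", promote_addition=True))
```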
Only current data is visible in the LINCS triplestore. Current and previous data is accessible in Borealis Dataverse, except where data has been deaccessioned.
Deprecated and deaccessioned data
LINCS may deprecate URIs (dataset entities or vocabulary terms) that are erroneous, problematic, or duplicates of other entities, among other reasons.
When needed, LINCS will deprecate URIs in the published LINCS knowledge graph and in hosted vocabularies. Deprecation does not mean that the entity will be deleted. Incoming and outgoing statements may be removed or moved (i.e., to point to another entity). The URI will remain and will have an outgoing statement of owl:deprecated true added to describe the deprecation. Other statements may also be added, where needed, such as:
- dcterms:replacedBy newTerm
- dc:modified deprecation date
- skos:scopeNote "deprecated due to …"
LINCS respects the right to be forgotten, and where required will deaccession objected-to statements connected to the URIs.
Data stewards can ask the LINCS team to deprecate terms. Requests to deprecate vocabulary terms can be made by any LINCS user through the Vocabulary Browser’s feedback form, or by data stewards as part of the data processing workflow. LINCS will notify the data stewards of any third-party deprecation requests.
LINCS will not deprecate entities in archives. Where deemed necessary, as determined by the LINCS team at their sole discretion, LINCS will deaccession archived versions of datasets if they contain entities that have been deprecated and for which the LINCS team has determined that access should be revoked. Data will be deaccessioned according to the Borealis Dataverse deaccessioning guidelines. A record of the deaccessioned page and an associated persistent identifier will be maintained in Borealis Dataverse.
Researchers will be referred to archived data if they encounter, in the live dataset, deprecated entities for which they require more information. In cases where a dataset has been deaccessioned, LINCS will not be able to provide additional information about what was contained in the dataset.
LINCS processing data
LINCS has devised processes for taking in datasets and processing them where needed to produce enhanced versions of files. Processes performed to create LOD from source datasets are documented in general as part of the LINCS internal transformation workflow process. The dataset Application Profiles found with each Project Dataset description record the decisions made with respect to modelling, vocabularies, and major patterns. If desired, data stewards may create dataset-specific documentation of the processes and decisions underlying their data.
Processing data is stored by LINCS for at least two years. If requested, enhanced files will be returned to dataset contributors or stewards for long-term storage and preservation (see Source data management recommendations).
LINCS code
LINCS code is managed through the GitLab version control repository. LINCS tries to follow best practices in open-source software management.
Source data management recommendations
LINCS relies on the researchers, data stewards, and partners who publish linked data with LINCS—in conjunction with other partners within the digital research infrastructure ecosystem—to manage and preserve any source datasets from which LINCS data is derived. Such source datasets constitute publications in their own right and should be preserved and assigned a Digital Object Identifier (DOI) so that they can be cited in the metadata for the associated LINCS dataset.
Researchers and research projects associated with LINCS are encouraged to work in consultation with their digital scholarship or institutional repository librarians, with our partners at Scholars Portal, or with other organizations nationally and internationally to ensure sound management and preservation of their source data. LINCS strongly advises working with an organization connected with the Federated Research Data Repository initiative.
LINCS can provide general advice on source data management. Many datasets coming into LINCS will already have stable hosting in other locations, such as a repository like the Collaboratory for Writing and Research on Culture (CWRC), which provides long-term preservation for the data it hosts.
Data Management Plans (DMPs) for individual projects or datasets may refer to this document, the LINCS Data Management Plan, and may want to account for three types of data:
- Source data to which LINCS-hosted triples are linked or from which they derive. The DMP should account for the stable, long-term hosting of these resources.
- Researcher-created or researcher-modified data created or used during the data transformation workflow with LINCS. Researchers are responsible, where necessary, for long-term archiving of this data created as part of the transformation process, if this interim dataset will be required for reference, citation, and use in creating future data.
- The linked dataset(s) created with LINCS.
For datasets not connected to a long-term repository, the LINCS Project Lead and/or LINCS Research Board Chair will advise on how a dataset can be archived and try to broker support for the process, if needed, with an appropriate partner in the research data management ecosystem.
Some datasets may also require administrative work to obtain licensing permissions, as well as processing of the data to achieve anonymization and de-identification and to demonstrate provenance. They may also require maintenance and updating. Such matters are the responsibility of the dataset contributors or data stewards.
Document Details
Version: 1.6
Authors: Susan Brown (University of Guelph), Pieter Botha (University of Guelph), Erin Canning (University of Guelph), Natalie Hervieux (University of Alberta), Kim Martin (University of Guelph), Daniele Metilli (University of Guelph), Alliyya Mo (University of Guelph), Sarah Roger (University of Guelph), Zachary Schoenberger (University of Victoria), Deb Stacey (University of Guelph), Jessica Ye (University of Guelph)
Last Updated: 2025-05-01
Released: 2025-09-23