Corpora

Browse
Match Entities
Search
Transform
Visualize
Intermediate
Expert
LINCS Partner

Corpora is a web-based application with a robust database system for Digital Humanities (DH) projects. You can use Corpora to perform Optical Character Recognition (OCR) on uploaded documents, ascribe Uniform Resource Identifiers (URIs) and Corpora content types to entities, build network visualizations, and more.


Corpora and LINCS

In collaboration with LINCS, Corpora is being used to ascribe URIs to named entities in the Advanced Research Consortium (ARC) catalogue and transform its data into triples so that it can be ingested into the LINCS triplestore.
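As a rough illustration of what "transforming data into triples" means, the sketch below turns one entity record with an assigned URI into subject-predicate-object triples. The record layout, the example.org URIs, and the function name record_to_triples are assumptions made for this example; they do not reflect Corpora's actual data model or the LINCS ontology.

```python
# Illustrative sketch only: the record layout, URIs, and function name are
# hypothetical and are not taken from Corpora or LINCS.

def record_to_triples(record):
    """Turn one entity record into (subject, predicate, object) triples."""
    subject = record["uri"]  # the URI ascribed to the entity
    triples = []
    for predicate, value in record["properties"].items():
        # Each property of the entity becomes one statement about it.
        triples.append((subject, predicate, value))
    return triples


# Hypothetical bibliographic record whose entity already has a URI.
record = {
    "uri": "http://example.org/entity/1",
    "properties": {
        "http://www.w3.org/2000/01/rdf-schema#label": "Example Title",
        "http://purl.org/dc/terms/creator": "http://example.org/entity/2",
    },
}

triples = record_to_triples(record)
for s, p, o in triples:
    print(s, p, o)
```

Each printed line is one statement about the entity. A triplestore holds many such statements and links them through shared URIs.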

Corpora lets users identify and assign URIs to entities in ARC, and will soon incorporate functionality from LINCS’s Natural Language Processing (NLP) tools such as NERVE. Corpora is also associated with the Rich Prospect Browser (RPB), an in-development visualization tool for Linked Data (LD) that allows users to browse between and within linked databases. Once complete, the RPB will be integrated into Corpora in place of the current network visualization tool.

At present, Corpora is tailored to working with bibliographic data in traditional DH projects that are focused on individual artifacts and entities. Corpora is particularly suited to transforming these types of datasets.

Corpora can be used online, or it can be downloaded and run locally, with data saved on your own machine. Although Corpora backs up uploaded datasets when used online, it is not committed to long-term data storage.

Prerequisites

You need to bring your own dataset
You need to create a user account
- A GitLab or GitHub account can also be used to import a repository directly to Corpora.
A basic understanding of Python and JSON is required to access full functionality

Corpora supports the following inputs and outputs:

Input: PDF, JPEG, MARC, XML, and more
Output: JSON

Resources

To learn more about Corpora, see the following resources:

LINCS (2021) “Corpora Demo”
LINCS (2021) “Corpora Demo” [Video]

Information about the team that developed Corpora is available on the Tool Credits page.
