Jessica and Goliath: Learning 3M and CIDOC CRM
- LINCS Project
- October 20, 2022
— Ze Xi (Jessica) Ye, LINCS metadata co-op —
During my graduate courses in the Faculty of Information at the University of Toronto, I gained a high-level understanding of Linked Open Data (LOD) and the CIDOC CRM ontology, a theoretical and practical tool for information integration in the field of cultural heritage. Because I am an Archives & Records Management student, I never expected to understand LOD and CIDOC CRM to a significant degree, and certainly not to the degree that my position as a Metadata Specialist co-op at LINCS requires of me.
LINCS takes Canadian humanities researchers’ data and converts it into LOD. Structurally, we convert it using the CIDOC CRM ontology. After the researchers’ data has been cleaned and mapping patterns have been approved, the data is converted using an open-source tool called 3M, the Mapping Memory Manager. My primary role at LINCS is to set up and run these conversions. To do this, not only did I have to gain a deeper understanding of LOD, I also had to learn CIDOC CRM and 3M.
The training process took almost a month, twice as long as my fellow summer student hires spent on training for their respective roles. I spent half of the month digesting readings and the other half battling 3M. Taking in so much knowledge within such a short time frame was sometimes overwhelming, but even though I finished my workday feeling like my skull had been cracked open by a gaggle of drunk monkeys performing brain surgery, I loved it. I love learning! Always have. And now I’m getting paid to learn? I’m getting paid to acquire this extremely technically and conceptually difficult skill set? I’m getting paid to think about the theoretical quagmire of whether an action is an addition to a previous production event, or whether it is—in fact—an entirely new production event, and how the distinction between these two things comes down to the intent of the data itself? What a wonder, what a privilege! I haven’t taken notes this detailed since first-year undergrad.
Thanks to a training assignment, by the end of the third week I understood the CIDOC CRM well enough to explain how entities and properties relate to each other, and I felt comfortable navigating the scope notes of the latest stable version. When given a mapping pattern and source data, I could manually write out a representation of the data following the CRM using Turtle syntax. Of course, doing this manually for thousands upon thousands of entities would be a highly impractical endeavour, which is where 3M comes in. If I could set up the mapping in 3M, then it would apply the CRM ontology and write the Turtle file for me.
The trouble was setting it up. I remember being disoriented the first time I saw 3M, partly because it looked very much like early internet software and partly because the manual used extremely technical language. I had to make sense of how the matching table (which assigned CRM classes and properties to elements in the source data) related to a generator (which assigned Uniform Resource Identifiers (URIs) and rdfs:labels to entities), how to use the generator to create temporary LINCS URIs or to direct it to a pre-existing URI, how to write a generator file so I could get the appropriate custom labels, and more. It was frustrating at times, but the frustrations only made the successes all the more satisfying. When I uploaded the input and target schema files correctly, when I used a variable for the first time, when I pressed “Transform” and there weren’t any errors, all these successes made me feel like a level 20 tech wizard.
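To give a sense of what writing the Turtle out by hand involves, and why a tool that does it for thousands of entities is so welcome, here is a minimal Python sketch. It is not how 3M works internally and it is not an approved LINCS mapping pattern. The base URI, the row fields (`id`, `name`), and the choice of pattern (an E21_Person identified by an E41_Appellation) are all assumptions made up for illustration, and the sample rows are invented.

```python
# Illustrative only: a tiny hand-rolled "mapping" from source rows to Turtle.
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
BASE = "http://example.org/temp/"  # hypothetical placeholder, not a real LINCS URI


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def person_to_turtle(row):
    """Write one person and their name following an assumed CRM pattern.

    Assumes row["id"] is already safe to use inside a URI.
    """
    person = f"<{BASE}person/{row['id']}>"
    name = f"<{BASE}name/{row['id']}>"
    label = escape_literal(row["name"])
    return (
        f"{person} a crm:E21_Person ;\n"
        f'    rdfs:label "{label}" ;\n'
        f"    crm:P1_is_identified_by {name} .\n"
        f"\n"
        f"{name} a crm:E41_Appellation ;\n"
        f'    crm:P190_has_symbolic_content "{label}" .\n'
    )


def rows_to_turtle(rows):
    """Prefix declarations followed by one block of triples per row."""
    header = f"@prefix crm: <{CRM}> .\n@prefix rdfs: <{RDFS}> .\n\n"
    return header + "\n".join(person_to_turtle(row) for row in rows)


# Invented sample rows standing in for cleaned source data.
rows = [
    {"id": "p1", "name": "Example Person One"},
    {"id": "p2", "name": "Example Person Two"},
]
print(rows_to_turtle(rows))
```

In this sketch, the class and property choices play roughly the part of the matching table, and the URI and label construction plays roughly the part of a generator. Even a toy pattern like this takes two blocks of triples per person, which is why doing it by hand does not scale.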
Looking back, the first dataset I converted for LINCS, the Map of Early Modern London (MoEML) Personography, was incredibly straightforward compared to the datasets I’m currently converting. Nonetheless, I still feel immense fondness for it as my first conversion, not to mention deep gratitude to Erin Canning (Ontology Systems Analyst) and Natalie Hervieux (Senior IT Analyst) for helping me through the process. In fact, converting data in 3M is now one of my favourite parts of my job. I can spend hours in 3M fixing errors, making edits, and chasing the thrill of finally getting things to work. It is all the more gratifying to know that the work that I’m doing will benefit others in a tangible way. I’m doing this work so that other people, whether they are researchers or other members of the LINCS team, can copy my mappings or build off of them, use my documentation to create their own mappings or review datasets, and conduct research and find new connections in their work thanks to the data I converted. I learned something difficult so that it will be easier for others!
3M is a beast and I have conquered it. Sort of. The thing about learning is that it is a continuous process. One of my former bosses told me that, and I nodded along because I desperately wanted to chomp down on one of the croissants he had brought in; half-listening to his career advice seemed to be the prerequisite for acquiring the pastries, but my teenage self should have listened in earnest because he was right. Every day I learn something new about 3M or CIDOC CRM. Every day there’s something else to tweak, some alternative perspective to consider. When I attended the 2022 LD4 Conference on Linked Data, I had the pleasure of listening to LOD professionals discuss problems that I had never considered before, that I didn’t even know were problems requiring solutions. It was eye-opening and invigorating to learn that all the knowledge I have acquired so far is only a small slice of what there is to know. There are innovative applications, organizations, and people out there doing things I can’t even imagine. Linked Open Data is an exciting, ever-evolving field, and I cannot wait to see what I learn next.