OH Recap: Metadata for Open Textbooks

Are you an author, librarian, or staff member creating or using Open Textbooks at your institution? Learn more about the metadata that helps make these books discoverable in this month’s Office Hours session! Scroll down to read a recap, or watch the video recording.

This month’s Office Hours event, hosted by the Open Textbook Network and the Rebus Community, covered a technical but important topic in the growing world of Open Textbooks – metadata. To help us understand how metadata works, we invited special guests Naomi Eichenlaub (Ryerson University), Sarah Cohen (Open Textbook Network), and Hugh McGuire (Rebus). Laura Dawson (Numerical Gurus) was unfortunately unable to attend the event, but you can read what she has said about metadata in the past.

Watch a recap of the session below, or continue reading for the complete summary. Metadata is a complex topic, and there were a lot of acronyms thrown around during this call. Scroll down to get some clarification on the technical terms mentioned during this event!

Rebus Foundation co-founder Hugh McGuire started the session by introducing the Rebus Community, which is building a new, collaborative model for open textbook publishing. Next, Sarah Cohen introduced the Open Textbook Network, which is active on more than 600 campuses and promotes access, affordability, and student success through the use of open textbooks. She said there were currently 425 books in their Open Textbook Library, and that the number was growing.

As the universe of Open Textbooks expands, Hugh said, it is more important than ever to think about how these resources are categorized and how faculty and other users can discover them, which means using metadata.

Metadata is a bit of a buzzword, but what does it mean? According to the Government of Canada Records Management Metadata Standard, metadata is “structured information about the characteristics of an analog or digital resource which helps identify and manage that resource.” In the context of Open Textbooks, metadata is information about a book, attached to a book file, including the usual things like title, author(s), subject, license, and ISBNs, as well as potentially more complex data around versioning and accessibility.
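To make this concrete, here is a minimal sketch in Python of what such a record might look like. The class name, field names, and sample values are all hypothetical and chosen only to mirror the fields listed above; no particular standard is implied.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class TextbookMetadata:
    """Hypothetical minimal record; field names are illustrative only."""
    title: str
    authors: list[str]
    subject: str
    license: str
    isbns: list[str] = field(default_factory=list)
    version: str = ""               # optional: versioning information
    accessibility_notes: str = ""   # optional: accessibility information


record = TextbookMetadata(
    title="An Example Open Textbook",
    authors=["A. Author", "B. Author"],
    subject="Biology",
    license="CC BY 4.0",
)

# The same information can be read by a person as a Python object
# and by a machine once it is serialized, for example as JSON.
record_json = json.dumps(asdict(record), indent=2)
print(record_json)
```

Serializing the record is what lets it travel with the book file and be read by other systems.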

Okay… but why should I care about it? Because metadata:

provides everyone with useful information about a book and its content;
can be both machine- and human-readable;
makes a book you create discoverable in different repositories, libraries, and catalogues; and
helps people in their search for the right book to adopt.

You may not be involved in determining how information about a book is shared with different software systems (like library catalogues or repositories), but it’s important to know that information is being sent and received! Without it, books would be all but impossible to find and collections impossible to navigate, meaning that valuable resources couldn’t reach the people who benefit from them.

Naomi Eichenlaub, a catalogue librarian at Ryerson University, first came into contact with metadata while working on an Open Publishing Infrastructure Project to extend BCcampus’ Open Textbook collection and migrate it to eCampus Ontario’s new Open Textbook Library. During this project, Naomi looked at trends in metadata, trying to find the best schema to help integrate BCcampus’ repository. (ISO defines a schema as a “framework that specifies and describes a standard set of metadata elements and their interrelationships.”) Naomi said they looked at various schemas and settled on Dublin Core for this prototype. She hopes that this project will allow them to integrate other schemas so that they can submit content to different repositories and, in so doing, expand access to all kinds of content (not just books).
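For readers who have not seen Dublin Core before, the sketch below shows roughly what a simple Dublin Core description of a book could look like when written out as XML. The element names (title, creator, subject, rights, format) and the namespace come from the Dublin Core element set; the wrapper element name, the helper function, and the sample values are assumptions for illustration and do not describe the project's actual records.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def to_dublin_core(fields):
    """Build a wrapper element holding one dc:* child per (name, value) pair."""
    root = ET.Element("metadata")  # wrapper name is an assumption
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Hypothetical sample book.
book = [
    ("title", "An Example Open Textbook"),
    ("creator", "A. Author"),
    ("subject", "Biology"),
    ("rights", "CC BY 4.0"),
    ("format", "application/pdf"),
]

dc_xml = ET.tostring(to_dublin_core(book), encoding="unicode")
print(dc_xml)
```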

Sarah Cohen, managing director at the Open Textbook Network, said that they used Machine-Readable Cataloging (MARC) records in the Open Textbook Library, which users can download if needed. The library does not host materials itself, but rather refers to other repositories, so OTN wanted a schema that worked well with the Online Public Access Catalogs (OPACs) that most universities have. Sarah said that the challenge was to point to the right location for the content being searched, and to allow for easy correction of any broken links. They are working with Colorado State University and the Online Computer Library Center (OCLC) to clean these records.
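As a rough illustration of the link problem, the sketch below uses a heavily simplified, dictionary-based stand-in for MARC records. The tags are real MARC fields (245 for the title, 100 for the main author, 856 subfield u for the URL of the electronic resource), but the data structure, the function, and the example URLs are hypothetical and are not how OTN manages its records.

```python
# Simplified stand-in for MARC records: tag -> {subfield code: value}.
# Real MARC records also carry a leader, indicators, and repeatable fields.
records = [
    {"245": {"a": "An Example Open Textbook"},
     "100": {"a": "Author, A."},
     "856": {"u": "https://old.example.org/book-1"}},
    {"245": {"a": "Another Example Textbook"},
     "100": {"a": "Author, B."},
     "856": {"u": "https://repo.example.org/book-2"}},
]


def fix_links(records, corrections):
    """Replace 856 $u links using an old-URL -> new-URL map; return the count fixed."""
    fixed = 0
    for rec in records:
        url = rec.get("856", {}).get("u")
        if url in corrections:
            rec["856"]["u"] = corrections[url]
            fixed += 1
    return fixed


# Hypothetical correction list, e.g. gathered from a link check.
corrections = {"https://old.example.org/book-1": "https://repo.example.org/book-1"}
fixed_count = fix_links(records, corrections)
print(fixed_count, records[0]["856"]["u"])
```

Because the library only points to content hosted elsewhere, keeping this one field accurate is what keeps each record useful.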

Hugh McGuire, co-founder of the Rebus Foundation, described the World Wide Web Consortium’s (W3C) initiative to create web-native standards for web publications. The process is a lengthy one. It involves first determining which metadata fields are mandatory (like author, title, license) and which can be optional. Next, the web publication working group will look at ways to link this standardized metadata file to existing schemas. Hugh said that Open Textbooks will be the first use case for this new specification.
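The mandatory-versus-optional distinction can be sketched as a simple check. The mandatory list below uses the three examples mentioned in the session (title, author, license); the optional list, the function name, and the sample record are assumptions, since the specification was still being worked out.

```python
# Assumed field lists: the session named author, title, and license as
# examples of mandatory fields; the optional set here is purely illustrative.
MANDATORY = ("title", "author", "license")
OPTIONAL = ("subject", "isbn", "version")


def check_metadata(record):
    """Return (missing mandatory fields, unrecognized fields) for a dict record."""
    missing = [name for name in MANDATORY if not record.get(name)]
    unknown = [name for name in record if name not in MANDATORY + OPTIONAL]
    return missing, unknown


# Hypothetical draft record with no license and one unexpected field.
draft = {"title": "An Example Open Textbook", "author": "A. Author", "edition": "2"}
missing, unknown = check_metadata(draft)
print(missing, unknown)
```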

Melinda Boland, a guest at Office Hours from OER Commons, explained that they host and link to over 60,000 pieces of OER in their digital public library. Michelle Brennan, their information services manager, said that they follow the IEEE standard for Learning Object Metadata as a guiding profile to make it easy for content to be searchable and for users to find these resources. Their approach is to build different modules on top of this core that map to different metadata standards in the field.
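The idea of a core record with modules that map to other standards is often called a crosswalk. Here is a minimal sketch of one, assuming a small core record with made-up internal field names. The target names are real Dublin Core elements and schema.org properties, but the mapping choices are illustrative and do not describe OER Commons' actual system.

```python
# Core record with hypothetical internal field names.
core = {
    "title": "An Example Open Textbook",
    "author": "A. Author",
    "subject": "Biology",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}

# One "module" per target standard: core field name -> target field name.
CROSSWALKS = {
    "dublin_core": {"title": "dc:title", "author": "dc:creator",
                    "subject": "dc:subject", "license": "dc:rights"},
    "schema_org": {"title": "name", "author": "author",
                   "subject": "about", "license": "license"},
}


def export(core_record, standard):
    """Rename core fields to the chosen standard, dropping unmapped fields."""
    mapping = CROSSWALKS[standard]
    return {mapping[key]: value for key, value in core_record.items() if key in mapping}


print(export(core, "dublin_core"))
print(export(core, "schema_org"))
```

Adding support for another standard then means adding one more mapping rather than changing the core record.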

Thanks to Naomi for sharing this comic on Standards by Randall Munroe (xkcd.com). This work is licensed under a Creative Commons Attribution-NonCommercial 2.5 License.

After the speakers discussed the importance of metadata and their different approaches, participants had some questions. Some wondered what kinds of accessibility metadata were being used. Michelle Brennan of OER Commons explained that they use the A11Y Project, which is a community-driven effort to improve web accessibility.

Others had questions about versioning and its implications for a book’s metadata. Hugh said that this was something to think about as we work to build a formalized means of handling metadata for books on the web. Melinda Boland, also from OER Commons, said that including a version history with each book (or web object) is good practice. Participants also wondered how different versions of a book would be indicated to users searching in repositories or catalogues. Jonathan Poritz, professor at Colorado State University, pointed to versioning systems like those of GitHub and Wikipedia as ways to track the lineage of an Open Textbook as it undergoes revisions or remixing. Another participant suggested the GITenberg project as an example.
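One simple way to picture a version history is as a set of entries that each point back to the version they were derived from. The sketch below is only that: the identifiers, notes, and function are hypothetical, and this is not how GitHub, Wikipedia, or GITenberg store their histories.

```python
# Hypothetical version records: each entry names the version it was derived from.
versions = {
    "v1.0": {"parent": None, "note": "Original edition"},
    "v1.1": {"parent": "v1.0", "note": "Corrections"},
    "remix-1": {"parent": "v1.1", "note": "Adapted for another course"},
}


def lineage(version_id):
    """Follow parent links back to the original; return the chain oldest first."""
    chain = []
    while version_id is not None:
        chain.append(version_id)
        version_id = versions[version_id]["parent"]
    return list(reversed(chain))


print(lineage("remix-1"))
```

A catalogue could use a chain like this to show a reader which edition or remix they have found and where it came from.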

This session revealed that we still have a long way to go in working out best practices for metadata in the Open Textbook arena, and that many conversations need to take place to best lay out a universal standard for all kinds of web-native open content. However, metadata is a fundamental (if complex) building block for Open Education, and we hope to have more discussions about it down the line!

To keep the conversation going, head over to the Rebus Community Forum, or join us at another Office Hours event.

Resources:

Here’s a list of some metadata-related technical terms, and what they mean.
Technical Term	Description
LRMI (Learning Resource Metadata Initiative)	Co-led by the Association of Educational Publishers and Creative Commons to build a common metadata vocabulary for educational resources. It is for learning objects only, and was recently accepted into schema.org.
IEEE LOM (Institute of Electrical and Electronics Engineers Standards Association Learning Object Metadata)	An IEEE standard that specifies the structure of metadata for learning objects.
DCMI (Dublin Core Metadata Initiative)	Supports innovations in metadata design and best practices.
Schema.org	The closest thing to a standard for web content. It includes different schemas that help structure the web.
A11Y	“a11y” is a numeronym for accessibility. The A11Y Project looks to make web accessibility easier for developers to implement.
MARC (MAchine-Readable Cataloging)	A data format introduced by the Library of Congress and now widely used in libraries.
NSDL_DC (National Science Digital Library Dublin Core)	A variant of the Dublin Core standard.
