Could Libraries Be a New Source of AI Training Data for LLMs?

University and public libraries could play a central role in supplying more training data for LLMs in the future. Libraries hold books in editions that have no digital copies, and books on diverse subjects that offer longer, deeper insights than articles on the web.

Libraries seeking additional relevance could use the buzz around AI, and its need for data, to get students and library members to curate information informally by flipping through books. People often find something interesting in library books. Whatever they find interesting, linked to an epistemological set, can be noted. For example, under invention, a reader could note a well-known invention whose inventor is not widely known. The note may also be about how, where or why something happened. There could be other interesting material, ranging from simple to complex.

The notes, typed or handwritten, can then be submitted to the library administration, where they can become a new collection of interesting information drawn from the books in that library. The same notes can then be used to train LLMs, organized by category.

For example, a category like taste could hold notes on decoration choices, and a category like courage could hold notes on infantry, and so forth. These categories and examples can be fed to LLMs so that they are able to identify similar material in the other data they have.

Students could earn some credit per semester for taking part. Members of public libraries could also have fees waived, or receive perks for borrowing or for using spaces, if they decide to participate. Every month, some shelves would be assigned to library users, who would flip through every book there and note whatever they find interesting. The same would apply to students, with shelf assignments per semester.

The books on the shelves and those in storage would all be flipped through, without exception. There could be other perks for people who come to public libraries, especially those who are unemployed or underemployed, as a form of alternative remuneration.

Pictures, or their captions, may also be added. Each note could record the book, edition, page and author, so that other people can easily find the source. The process can be called organized serendipity: excavating more exceptional details.
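
One way to picture such a note is as a small structured record. The sketch below, in Python, is only an illustration under stated assumptions: the `LibraryNote` class, its field names, the two helper functions and the JSON Lines output are all hypothetical, not part of any existing library system, and the sample values are placeholders rather than real books.

```python
import json
from collections import defaultdict
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class LibraryNote:
    """One interesting detail noted by a library user (hypothetical schema)."""
    category: str                  # e.g. "invention", "taste", "courage"
    note: str                      # what the reader found interesting
    book: str                      # title of the source book
    edition: str                   # edition, so the exact copy can be found
    page: int                      # page where the detail appears
    author: str                    # author of the source book
    caption: Optional[str] = None  # caption of a picture, if one was noted


def group_by_category(notes):
    """Group notes under a normalized (trimmed, lowercase) category name."""
    grouped = defaultdict(list)
    for n in notes:
        grouped[n.category.strip().lower()].append(n)
    return dict(grouped)


def to_jsonl(notes):
    """Serialize notes as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(asdict(n), ensure_ascii=False) for n in notes)


# Placeholder notes for illustration only; not drawn from real books.
example_notes = [
    LibraryNote(
        category="Courage",
        note="Placeholder detail about an infantry unit.",
        book="Placeholder Title A",
        edition="2nd",
        page=41,
        author="Placeholder Author A",
    ),
    LibraryNote(
        category="taste",
        note="Placeholder detail about a decoration choice.",
        book="Placeholder Title B",
        edition="1st",
        page=7,
        author="Placeholder Author B",
    ),
]

if __name__ == "__main__":
    for category, items in group_by_category(example_notes).items():
        print(category, len(items))
    print(to_jsonl(example_notes))
```

Keeping the book, edition, page and author on every record means each note can be traced back to its source, and grouping by category keeps the collection organized in the way the categories above suggest.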

There are going to be more diverse ways to train LLMs in the future, and this could be one of them, especially because the notes are specific and could help lower the probability of hallucination.

—

This post is republished on Medium.

—

Photo credit: iStock