Collections as Data: Tell Us About It! Forum #2, May 7-8, 2018, hosted in Las Vegas by the University of Nevada, Las Vegas

This week, about 30 scholars, archivists, cultural heritage workers, and data enthusiasts gathered in the desert to weigh in on the progress of the IMLS-funded Collections as Data grant objectives. After a series of presentations, we were guided through small-group exercises to examine project personas, frameworks, and functional requirements. Below is a structured summary of the public-facing portions of the program (all presentations were live streamed). I'm looking forward to sharing this work with my colleagues at UNC-Chapel Hill and to keeping in touch with the Collections as Data project team and my fellow attendees!

Neon sign from the Yucca Motel (currently closed), placed in the "Neon Boneyard," also known as the Neon Museum, in Las Vegas, NV
Who is "Collections as Data" for?

Dot Porter (UPenn):
Curators: Looking at collections in new ways helps curators make value propositions to donors and researchers. Should curators be trained in programming, or perhaps have a support group?

Shawn Averkamp (NYPL):
Researchers: The metadata we use creates illusions of completeness (it's not all there), granularity (folder vs. item), consistency (results change from one search to another), and authority (we've decided what's most important to capture). If we don't look at data frameworks critically, we do our researchers a disservice.

Bergis Jules (UC-Riverside):
Activists: It can be dangerous for activists to have their data collected (for example, by social media mining companies, or by police working with social media companies). Most users don't know the risks, the complexities are so great that it can be tempting to bypass ethics, and marginalized people face a high level of vulnerability.

What is the coolest thing about your Collections as Data work?

Micki Kaufman (CUNY)
3D Modeling: A mass data-scraping exercise charting 40 subjects mapped across time, drawn from the National Security Archive at George Washington University

Inna Kouper (Indiana University)
Measuring representation in large repositories: How well does HathiTrust (15 million records) capture human knowledge over 2,000 years? Library classifications were used to examine languages, topics, and countries of origin, revealing gaps and strengths in the collection.

Greg Cram (NYPL)
Exposing collections: The U.S. Copyright Office's virtual card catalog (45 million cards) is coming online. Its blocks of text will need to be broken apart, and it will help us find rights holders. Putting NYPL's menu collection online allows for crowdsourced transcription and geolocation, and everything is downloadable.

Laurie Allen (UPenn)
Data Rescue/Endangered Data: The movement that arose around the current presidential administration also allows us to break open the library and let the light in.

How have you implemented Collections as Data?

Meghan Ferriter (LoC)
Resource Sharing: Memory Labs offer tools and training to students and lifelong learners.

Mary Elings (UC-Berkeley)
Strategic Partnerships: used to solve complex legacy issues. As early adopters, they face a unique problem: many heavily text-based digital projects from 20-25 years ago.

Helen Bailey (MIT)
LibGuide: shares library APIs. She also offered a nuanced examination of their process for maintaining data in highly customized systems and allocating resources in a measured way.

Veronica Ikeshoji-Orlati (Vanderbilt)
Data Ecosystem Framework: thinks through the resource suck of managing current vs. legacy projects, prioritizing content that exhibits intellectual labor and is capable of reuse. Vanderbilt also hosts working groups and seminars to share tools and talk with humanists about building and sharing datasets, with a strong sense of sustainability and minimalism.

Tools:
Facial Recognition Scripts (Padilla)
RDF Open World Framework (making collections from different repositories searchable; NYPL, Princeton, others) (Averkamp)
VisColl (XML files that describe individual leaves of a manuscript) (Porter)
Omeka (no faceted searching) (Porter)
MALLET (topic modeling) (Kaufman)
Parallax (simulate 3D) (Kaufman)
JSON (Ferriter)
Jupyter notebooks (Ferriter)
Glitch (Ferriter)
OpenRefine (used to clean up data) (Ikeshoji-Orlati)
XQuery (Ikeshoji-Orlati)

Readings:
John Unsworth, Scholarly Primitives (Padilla)
Chela Scott Weber, OCLC Research Position Report (Data and Special Collections) (Padilla)
The Santa Barbara Statement on Collections as Data (Forum 1) (Padilla)
Safiya Noble, Algorithms of Oppression: "outcomes and results > intent" (Averkamp)
Artists in the Archive podcast: Episode 5 ? (Allen)
Johanna Drucker, Captured (Capta) data from researchers
Ithaka Survey: library trends toward open access, data management, and instruction

Collections/Projects:
Jerome Robbins Collection @ NYPL (digitized and featuring Carmen de Lavallade) (Averkamp)
Digital Walters (LoC subject headings) (Porter)
OPenn (local keywords) (Porter)
BiblioPhilly (discrete keywords) (Porter)
LC for Robots (APIs for libraries) (Ferriter)
Lomax Collection Visualization (Ferriter)
Congressional Data Challenge (Ferriter)
“Inside Baseball” program coming this summer (Ferriter)
Free Speech Movement data hackathon (Elings)
ArchExtract (supported arrangement and description of a large archival collection) (Elings)
New Netherlands Project (data cleanup) (Elings)
Algorithmic Justice League (Bailey)
Charles Baudelaire (Ikeshoji-Orlati)



