Needles in a Digital Haystack: Improving Digital Archive Research

Posted by Cole Crawford
Digital Futures Discovery Series, Events

March 26, 2019
3:30pm-4:30pm
Discovery Bar, Cabot Science Library

Please join us for the March installment of the 2018-2019 Digital Futures Discovery Series, a year-long program led by Harvard's Digital Futures Consortium that explores the ongoing transformation of scholarship through innovative technology.

This month's presentation showcases the groundbreaking techniques scholars are developing to tackle the increasingly challenging task of conducting digital research. Manually searching through more than 46 million digital records is an intractable research task. This is what lay ahead of Benjamin Lee as a Digital Humanities Associate Fellow at the United States Holocaust Memorial Museum, as he sought to aggregate the death certificate reference cards of individuals who perished in concentration camps during the Holocaust, cards that are scattered throughout the International Tracing Service digital archive. By automating the retrieval of data through template matching and machine learning, Ben used everyday technology to power his algorithm and produce results with 100% precision and accuracy. To find needles in a digital haystack, Ben built a magnet, and in his talk he’ll share how he did it.

Benjamin Charles Germain Lee is currently in the Ph.D. program in Computer Science & Engineering at the University of Washington. He graduated summa cum laude from Harvard College in 2017, where he received the Thomas T. Hoopes Prize for “extraordinary undergraduate research,” was named a Harvard Undergraduate Science Research Fellow and a John Harvard Scholar, and was later named a visiting fellow in the Department of History.

Digital Futures Consortium

A GitHub Pages port of https://www.digitalfuturesconsortium.org/