This event has passed.

Transforming Classical Chinese Texts into Searchable Databases with AI

Name: Transforming Classical Chinese Texts into Searchable Databases with AI
Start: 2024-11-07T12:00:00-05:00
End: 2024-11-07T13:00:00-05:00
Location: CGIS South Room S354

November 7, 2024 @ 12:00 pm – 1:00 pm

Speaker: Guenther Lomas, Founder, Sigtica

As artificial intelligence becomes integral to the digital humanities, it offers innovative methods that transform research capabilities and uncover new insights into historical texts and cultural narratives. This talk will demonstrate how AI-powered pipelines can process large volumes of unstructured classical Chinese texts, such as genealogies and Qing dynasty government employee records, including those from the Da Qing jin shen quan shu, into organized, searchable databases.

The pipeline addresses a longstanding challenge in classical Chinese studies: labor-intensive manual data entry. It is designed to process millions of pages of historical Chinese texts efficiently, tackling complexities such as layout identification and precise text extraction. Central to this effort is customized Optical Character Recognition (OCR), which improves the accuracy of text extraction, paired with Named Entity Recognition (NER) models that identify key fields in the recognized text. The result is a set of clean, tabular databases that improve accessibility and allow researchers to analyze Chinese historical content far more efficiently than manual entry permits. Furthermore, this methodology holds potential applications for other languages, including Japanese, Korean, Arabic, and Latin, broadening its impact.
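For readers who want a concrete picture of the final steps described above (recognized text, then field identification, then a searchable table), here is a minimal sketch. It is not the speaker's pipeline: the sample lines are invented, a simple regular expression stands in for a trained NER model, and the names `OCR_LINES`, `extract_fields`, `build_database`, and the `officials` table are all hypothetical. Layout analysis and OCR are assumed to have already produced one roster entry per line.

```python
import re
import sqlite3
from typing import Dict, Iterable, Optional

# Hypothetical OCR output, one roster entry per line. These strings are
# invented for illustration and are not taken from any real source.
OCR_LINES = [
    "知縣張甲浙江錢塘人進士",
    "知府李乙江蘇吳縣人舉人",
    "無法辨識的殘行",  # a line the toy pattern cannot parse
]

# Toy stand-in for an NER model (an assumption of this sketch): a fixed
# pattern of office title, personal name, native place, and degree.
ENTRY = re.compile(
    r"^(?P<office>知縣|知府)"
    r"(?P<name>.{2,3}?)"
    r"(?P<native_place>.+?)人"
    r"(?P<degree>進士|舉人)$"
)


def extract_fields(line: str) -> Optional[Dict[str, str]]:
    """Return the tagged fields for one line, or None if it does not parse."""
    match = ENTRY.match(line)
    return match.groupdict() if match else None


def build_database(lines: Iterable[str]) -> sqlite3.Connection:
    """Load every parseable line into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE officials (office TEXT, name TEXT, native_place TEXT, degree TEXT)"
    )
    records = [rec for rec in map(extract_fields, lines) if rec is not None]
    conn.executemany(
        "INSERT INTO officials VALUES (:office, :name, :native_place, :degree)",
        records,
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_database(OCR_LINES)
    # Search the structured table instead of rereading page images.
    for row in conn.execute("SELECT * FROM officials WHERE degree = ?", ("進士",)):
        print(row)
```

Unparseable lines are skipped here for brevity; a production pipeline would more plausibly route them to manual review rather than drop them.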

By exploring these methodologies and their implications, this presentation aims to show how integrating advanced technological tools enriches scholarly inquiry in the digital humanities, providing deeper insights into patterns and narratives within Chinese history and beyond. This approach promises to revolutionize data collection, paving the way for alternative research practices across various linguistic contexts.

Lunch will be provided. Registration is required.

Details

Date: November 7, 2024
Time: 12:00 pm – 1:00 pm
Event Category: Special Event
Website: https://forms.office.com/r/BD82VZ6r2L

Organizer

Digital China Initiative

Venue

CGIS South Room S354

1730 Cambridge St
Cambridge, MA 02138 United States
