daiR: OCR with Google Document AI in R

Optical character recognition (OCR) promises to open vast bodies of historical data to scientific inquiry, but OCR can be cumbersome when documents are noisy. The past 18 months have seen the launch of new OCR processors with vastly improved accuracy. In this seminar, Thomas Hegghammer will give an overview of the latest tools and present a new R package that offers access to the most powerful of them all, Google Document AI.

Arabic text as an image

The R package can be found at dair.info



Keywords: R, OCR, Political Science, MENA, Data Science
Published Apr. 8, 2021 09:43 - Last modified Sep. 6, 2021 10:53