The Document and Pattern Recognition Lab (dprl) started in Summer 2007. We research technologies for extracting and searching graphics and text in documents and videos, with an emphasis on math notation and chemical diagrams. Along the way, we've also done work on Video CAPTCHAs, recognizing music notation, text detection, and evaluating structural pattern recognition systems. Details can be found in our pages listing publications and software/data.
Our work involves multiple areas of Computer Science, including Information Retrieval (IR), Pattern Recognition and Machine Learning (ML), and even some Human-Computer Interaction (HCI).
We have created state-of-the-art math recognition modules and math formula search engines, ran the ARQMath labs for CLEF, and ran the CROHME handwritten math recognition competitions at ICDAR and ICFHR. We also created the MathDeck math-aware search engine, as well as the ChemScraper PDF molecule extraction tool in collaboration with the Denmark Lab (UIUC) and NCSA. For details, see our overview of past projects.
University students of all levels (BSc, MSc, and PhD) with a wide variety of interests and backgrounds have worked in the lab. See our Members and Thesis/Project pages for details.
Support. The dprl has been supported by a number of organizations, including the NSF, Xerox, Google, and the Alfred P. Sloan Foundation. Please see our support page for details.
Present & Past Projects
NSF-funded AI Center aiming to democratize molecule making
Math-aware search project funded by NSF & Sloan Foundation
Online tool for extracting molecules from PDF (*in development)
CROHME 2019 + TFD competition at ICDAR 2019 (handwritten math recognition, typeset formula detection)
NTCIR-12 MathIR Lab (Math-aware search tasks)
AccessMath math formula search and navigation in video
DTW-based within-speaker audio search in math lectures
min math-aware search interface and multi-modal editor
Video CAPTCHA
More Information About the Lab
dprl, Fall 2018
Left-to-right: Parag, Puneeth, Wei, Behrooz, Mahshad, Thomas, Abishai, and Dr. Zanibbi.
Inside the Lab
Wei, Mahshad and Thomas taking a break, trying out our ping-pong set, Fall 2018.
dprl Potluck
Picture of the food at our first potluck! The dishes originate from China, France, Honduras, India, Iran, Canada, and the USA. dprl carrot art: Behrooz Mansouri.
Math-Aware Search Researchers
SIGIR 2016, Pisa, Italy. Left-to-right: Richard Zanibbi, Kenny Davila, Moritz Schubotz, Iadh Ounis, Bela Gipp, and Lingcai Gao.
The dprl and CUBS (Univ. Buffalo) hosted ICFHR 2018 in Niagara Falls, USA. Richard Zanibbi was a Co-Chair; Mahshad Mahdavi, Thomas Choi, and Kenny Davila assisted with running the conference. Details can be found on the ICFHR 2018 website.
ECIR 2019, Cologne, Germany, for the paper "Structural Similarity Search for Formulas using Leaf-Root Paths in Operator Subtrees" by Wei Zhong and Richard Zanibbi.
dprl, Spring 2021
Rows left-to-right, top to bottom: Abhisek Dey, Prof. Zanibbi, Matt Langsenkamp, Behrooz Mansouri, Yancarlos Diaz, Robin Avenoso, and Ayush Kumar Shah.
dprl, May 2022
Left-to-right: Ayush Kumar Shah, Matt Langsenkamp, Richard Zanibbi, Behrooz Mansouri, Abhisek Dey, and JP Ramissini.
ICDAR 2023 (San Jose, Aug.)
Dr. Zanibbi was a Program Co-Chair alongside Gernot Fink, Rajiv Jain, and Koichi Kise. Ayush Kumar Shah participated in the Doctoral Consortium and presented his work on recognizing math from images.
ICDAR 2024 (Athens, Greece)
Ayush Kumar Shah presenting his journal track (ICDAR/IJDAR) paper on the ChemScraper born-digital and visual parsers. This work was a collaboration between computer scientists at RIT and chemists at UIUC through the MMLI NSF AI Institute. Paper: Shah, A.K., Amador, B., Dey, A., Creekmore, M., Ocampo, B., Denmark, S. and Zanibbi, R. ChemScraper: leveraging PDF graphics instructions for molecular diagram parsing, IJDAR 27: 395-414.
Math IR Book
Richard Zanibbi, Behrooz Mansouri (dprl alumnus), and Anurag Agarwal have completed a draft manuscript for Foundations and Trends in Information Retrieval. The final book will appear in early 2025. (Draft available on arXiv.)