(Ayush Kumar Shah, Bryan Amador, Abhisek Dey, Ming Creekmore, et al., 2023-24) Tool for extracting and recognizing molecules in PDF files, generating CDXML (ChemDraw), SMILES, and SVG output.
GitLab repo: https://gitlab.com/dprl/graphics-extraction/-/tree/icdar2024
(R. Zanibbi, 2023) Avoids the need to repeatedly add and remove print, input, and assert statements to check values and types when program requirements keep changing and bugs abound. Also provides functions to record and report execution times. GitLab repo: https://gitlab.com/dprl/msg_debug
(B. Mansouri et al., 2022) Created over three years for CLEF 2020-2022, this collection contains over 200 topics (i.e., queries with evaluated search results) for both retrieving answers to questions posted in Math Stack Exchange, and retrieving relevant formulas using a formula from question posts as a query (i.e., contextual formula retrieval). In the third year of the task, there was also an open-domain question answering task. Search results from submitted systems were evaluated by undergraduate and graduate students from RIT.
(B. Mansouri and Matt Langsenkamp, June 2022) MathFIRE. An OpenSearch retrieval model implementation using faiss for fast retrieval of formulas embedded using TangentCFT. GitLab page
(B. Mansouri, Oct. 2019) TangentCFT. An embedding-based formula search engine. Tangent-CFT embeds representations of formula appearance and semantics in fixed-length vectors using fastText. Retrieval is performed using cosine similarity over the vectors. The system obtains very high coarse/partial similarity scores on the NTCIR-12 Formula Browsing Task, and when combined with Approach0 exceeds the state-of-the-art (ICTIR 2019 paper). GitLab page -- **Old** GitHub page
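To illustrate the retrieval step described for Tangent-CFT, here is a minimal sketch of ranking by cosine similarity over fixed-length vectors. It is not taken from the Tangent-CFT code base: the function names (`cosine_similarity`, `rank_formulas`), the formula identifiers, and the three-dimensional toy vectors are all hypothetical, and real embeddings would come from a trained fastText model rather than being written by hand.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def rank_formulas(query_vec, index, top_k=3):
    """Score every indexed vector against the query; return the top_k
    (formula_id, score) pairs, highest score first."""
    scored = [(fid, cosine_similarity(query_vec, vec))
              for fid, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy index: formula id -> embedding vector.
index = {
    "f1": [1.0, 0.0, 0.0],
    "f2": [0.9, 0.1, 0.0],
    "f3": [0.0, 1.0, 0.0],
}
query = [1.0, 0.0, 0.0]
results = rank_formulas(query, index, top_k=2)
for formula_id, score in results:
    print(formula_id, round(score, 4))
```

This brute-force scan compares the query against every vector, which is fine for a toy index. For large collections, an approximate or batched nearest-neighbor library is the usual choice.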
(W. Zhong, Jan. 2019) A new formula search engine using paths in operator trees (representing operations in a formula), with support for multiple subexpression matches. Released as a companion to Wei's ECIR 2019 paper describing the system. The system obtains state-of-the-art results for queries without wildcards in the NTCIR-12 Formula Browsing Task.
(K. Davila, Nov. 2017) Generating keyframe summaries of lecture videos containing only whiteboard contents. The system works with single-shot recordings of lecture videos. Released as a companion to Kenny's ICDAR 2017 paper on the same work. This work was later used to support keyframe-based video navigation, and cross-modal visual math search (for the Tangent-V (visual) search engine; details: K. Davila's PhD dissertation).
Please note: the files below are quite large, in part so that others have a better chance of replicating our results at NTCIR-11 (2014; NTCIR-11 paper).
(S. Zhu, Apr. 2016) The code below was used to produce the results published in Siyu Zhu's 2016 CVPR paper, A Text Detection System for Natural Scenes with Convolutional Feature Learning and Cascaded Classification, which obtained state-of-the-art results on the ICDAR 2015 Focused Scene Text Detection task at the time of publication.