Natural history collections are invaluable resources for scientific research, education, and public engagement. However, the sheer volume and diversity of specimens often make it challenging to efficiently search and retrieve specific items. The goal of this project is to develop a sophisticated search interface that leverages advanced machine learning techniques to embed images from natural history collections, enabling users to search the database using images or natural language queries.
Scope of Project
This project is a collaboration between research groups at Georgia Tech and the University of Florida, supervised by Dr. Arthur Porto, Dr. Jose Fortes, and PhD student Breanna Shi. Its scope centers on the design and implementation of an AI infrastructure for biology as part of the ongoing BioCosmo project. Key components of this infrastructure include a data ingestor, a machine learning model, a vector database, and a frontend interface.
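To make the role of the vector database concrete, here is a minimal sketch of embedding-based retrieval. It is not the BioCosmo implementation. The class `ToyVectorIndex`, the specimen IDs, and the three-dimensional embeddings are all hypothetical and made up for illustration. In a real system the embeddings would come from the machine learning model, and the lookup would be handled by a dedicated vector database.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


@dataclass
class ToyVectorIndex:
    """Hypothetical in-memory stand-in for the vector database."""
    records: list = field(default_factory=list)

    def add(self, specimen_id, embedding):
        # In a full pipeline the data ingestor would call this once per image.
        self.records.append((specimen_id, list(embedding)))

    def search(self, query_embedding, k=3):
        # Brute-force scan: score every record, then keep the k best matches.
        scored = [
            (cosine_similarity(query_embedding, emb), sid)
            for sid, emb in self.records
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(sid, score) for score, sid in scored[:k]]


# Made-up embeddings standing in for model output (assumption, not real data).
index = ToyVectorIndex()
index.add("specimen-a", [1.0, 0.0, 0.0])
index.add("specimen-b", [0.0, 1.0, 0.0])
index.add("specimen-c", [0.9, 0.1, 0.0])

# The query vector would be the embedding of an uploaded image or a text query.
results = index.search([1.0, 0.0, 0.0], k=2)
for specimen_id, score in results:
    print(f"{specimen_id}: {score:.3f}")
```

The same search call serves both image and natural language queries, provided the model maps both kinds of input into one shared embedding space. A production vector database would replace the brute-force scan with an approximate nearest-neighbor index.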
Thomas Deatherage, one of the main researchers from the summer 2024 cohort, addressed challenges related to computational resources and server management infrastructure. Based on the collective insights, we agreed to integrate a new feature: an LLM layer that acts as a query controller. Tasks were then assigned to collaborators, and the group reached consensus on the tools and communication strategies to be used.