WEEK 1
Today, we had a kick-off meeting for the collaboration between research groups at Georgia Tech and the University of Florida under the supervision of Dr. Arthur Porto, Dr. Jose Fortes, and Breanna Shi, PhD student. The discussion focused on the design and implementation of an AI infrastructure for biology as part of the ongoing BioCosmos project. Key components of this infrastructure include a data ingestor, a machine learning model, a vector database, and a frontend interface.
Thomas Deatherage, one of the main researchers from the summer cohort, summarized the milestones and deliverables achieved thus far and briefly addressed challenges related to computational resources and server-management infrastructure. Based on the collective insights, we agreed to integrate a new feature leveraging an LLM layer as a query controller. Tasks were then assigned to collaborators, and consensus was reached on the tools and communication strategies to be used.
WEEK 2
- Weekly Progress:
- Thomas: Fixing bugs; preparing for tomorrow’s meeting.
- Vy: Learning Vue Framework through online tutorials.
- Roman: Improving Postgres startup in Docker; plans to refactor Vue code into class-based components.
- Upcoming Meeting:
- Scheduled with collaborators from University of Florida and Dr. Porto on August 29.
- Focus expected on project hosting and team role clarification.
- Future Developments:
- Application using open-source CLIP model with no fine-tuning.
- Future tasks may include model comparison and targeted training if resources allow.
- Task Management:
- Discussion on using a tool for asynchronous work.
- Roman suggested GitHub Projects over Jira for simplicity.
- Weekly Meetings:
- Confirmed for Wednesdays at 6 PM.
- Roman will prepare a slide deck for team presentations on weekly accomplishments.
WEEK 3
Group Weekly Progress
- Roman: Worked on frontend refactoring, researched models by Dr. Porto, and addressed startup issues with devcontainers; limited ML work due to communication gaps with post-grad collaborators.
- Thomas: Collaborated with UofF on configuring Docker Compose with Salt for deployment and improved backend robustness by adding connection pools.
- Vy: Improved the frontend but encountered GitHub access issues and local Docker image problems; received team assistance.
Catch-up meeting with Dr. Porto
- The project is currently using OpenCLIP for image embedding and searching but is exploring additional models like BioCLIP and Florence-2 for improved results in species classification.
- Dr. Porto highlighted Florence-2’s versatility for tasks like classification and segmentation, suggesting its integration with the VLM4Bio dataset for experimentation.
- The team will start with zero-shot evaluation, assessing model predictions without fine-tuning, before moving to fine-tuning with datasets. Performance will be measured on tasks such as species classification and trait identification.
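For reference, here is a minimal sketch of what zero-shot species classification with OpenCLIP can look like, following the open_clip library’s standard usage; the checkpoint, species list, and image path are placeholders rather than the project’s actual evaluation code.

```python
# Minimal zero-shot classification sketch with OpenCLIP (placeholder species and image).
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

species = ["Lepomis macrochirus", "Micropterus salmoides", "Perca flavescens"]  # placeholder labels
text = tokenizer([f"a photo of a {s}" for s in species])
image = preprocess(Image.open("specimen.jpg")).unsqueeze(0)  # placeholder path

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(dict(zip(species, probs.squeeze().tolist())))  # predicted probability per species name
```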
WEEK 4
- Thomas Deatherage: Updated on working with pipelines and the database; discussed the pros and cons of different database-versioning tools.
- Romouald Dombrovski: Shared insights and updates on working with the Florence model, including its object detection and captioning capabilities; addressed challenges with local execution of the Florence model and database versioning; compared Florence’s performance with other models and shared script results.
- Vy Nguyen: Gave an update on her experience running inference with Florence-2 on sample images; shared plans to continue working on dataset training.
WEEK 5
- Thomas Deatherage: Merged https://github.com/BioCosmos-AI/BioCosmos/pull/2; raised whether we should archive https://github.com/Human-Augment-Analytics/NFHM; will begin looking into and experimenting with Florence-2 (https://blog.roboflow.com/florence-2/), focusing on trait grounding and trait referring, per our plan with Dr. Porto; will be absent starting tomorrow (9/18) through Monday (9/23).
- Romouald Dombrovski: Read up on Florence-2 in the paper; updated the VLM4Bio scripts to be more usable by the BioCosmos team; fixed an index-out-of-range bug in the classifier script; investigating embeddings generated by Florence-2.
- Vy Nguyen: After detecting species using Florence-2, continued exploring another model (Mask R-CNN with a ResNet-50 backbone) to perform segmentation for trait identification; this segmentation will be used for purposes such as trait counting and identification. Also updated our website.
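For context, a rough sketch of off-the-shelf Mask R-CNN (ResNet-50 FPN backbone) inference with torchvision is shown below; the COCO-pretrained weights, image path, and score threshold are placeholders, and trait-specific segmentation would still require fine-tuning on labeled trait masks.

```python
# Sketch: instance segmentation with a pretrained Mask R-CNN (ResNet-50 FPN backbone).
import torch
from PIL import Image
from torchvision.models.detection import maskrcnn_resnet50_fpn, MaskRCNN_ResNet50_FPN_Weights
from torchvision.transforms.functional import to_tensor

weights = MaskRCNN_ResNet50_FPN_Weights.DEFAULT  # COCO-pretrained; placeholder for trait-specific weights
model = maskrcnn_resnet50_fpn(weights=weights).eval()

image = Image.open("bird.jpg").convert("RGB")  # placeholder path
with torch.no_grad():
    output = model([to_tensor(image)])[0]

# Keep confident detections; masks are per-instance soft masks in [0, 1].
keep = output["scores"] > 0.5
masks = output["masks"][keep] > 0.5
labels = output["labels"][keep]
print(f"{masks.shape[0]} instances detected")  # e.g., count detected instances per predicted class
```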
WEEK 6
- Thomas Deatherage: Updated on working with Florence-2 grounding performance evaluation and benchmarking; noted a blocker due to the inability to find segmentation datasets in VLM4Bio and contacted the author on 9/25, awaiting a response; found the Fish-Vista dataset for fish segmentation created by VLM4Bio’s author’s PhD advisor, and has some “almost working” code for it; next goal is trait referral.
- Roman Dombrovski: Shared updates on using Dr. Porto’s code to create an evaluation pipeline with NN and cosine similarity for comparing image encoding; ran the pipeline to assess accuracy across individual taxa and all taxa for three models of interest; experimenting with a larger dataset (~200 images/taxa), stratification for uniform representation, different measurement methods (e.g., Euclidean), adding a new model (ArborCLIP), and implementing k-NN with k=5.
- Vy Nguyen: Updated on experiences with inference from Florence-2 on sample images; shared results from her experiment on the VLM4Bio bird dataset for trait detection and counting, initially using <OPEN_VOCABULARY_TEXT> but receiving empty label polygons; switched to <DENSE_REGION_CAPTION>, which provided better descriptive details and captured common traits; ran Florence-2 with <DENSE_REGION_CAPTION> as the prompt-task for creating training and validation datasets, and is about to evaluate model performance on the testing image dataset.
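For reference, below is a minimal Florence-2 inference sketch using the <DENSE_REGION_CAPTION> task prompt, following the usage pattern on the Hugging Face model card; the checkpoint size, image path, and generation settings are placeholders rather than the exact settings used in the experiments above.

```python
# Sketch: Florence-2 dense region captioning, following the Hugging Face model card pattern.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

model_id = "microsoft/Florence-2-base"  # assumed checkpoint; the team may use a different size
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.float32
).eval()
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

task = "<DENSE_REGION_CAPTION>"
image = Image.open("bird.jpg").convert("RGB")  # placeholder path
inputs = processor(text=task, images=image, return_tensors="pt")

generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    num_beams=3,
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
# post_process_generation returns {"<DENSE_REGION_CAPTION>": {"bboxes": [...], "labels": [...]}}
parsed = processor.post_process_generation(generated_text, task=task, image_size=image.size)
print(parsed[task])
```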
WEEK 7
- Thomas Deatherage: Conducted performance evaluation of trait grounding and referral of the Florence-2 model against Fish-Vista; focused on trait grounding and referral as per the plan with Dr. Porto.
- Romouald Dombrovski: Evaluated model performance metrics; compared image embeddings from Florence-2, OpenCLIP, BioCLIP, and ArborCLIP across 10,000 images of Bird, Butterfly, and Fish taxa; used cosine similarity and Euclidean distance for species- and genus-level comparisons (a small retrieval sketch appears after this list).
- Vy Nguyen: Examined the CUB-200-2011 dataset for bird traits; generated a ground truth dataset using PIL; applied Florence-2 with prompt <MORE_DETAILED_CAPTION> for detailed image analysis; merged results with the ground-truth dataset and evaluated model accuracy, precision, recall, and F1-Score.
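As an illustration of the embedding comparisons described above, here is a small retrieval sketch contrasting cosine similarity and Euclidean distance over precomputed embeddings; the arrays and labels are random placeholders, not the project’s data.

```python
# Sketch: nearest-neighbor retrieval over precomputed image embeddings,
# comparing cosine similarity and Euclidean distance (placeholder arrays).
import numpy as np

rng = np.random.default_rng(0)
gallery = rng.normal(size=(1000, 512)).astype("float32")   # stand-in for stored embeddings
labels = rng.integers(0, 50, size=1000)                     # stand-in species labels
query = rng.normal(size=(512,)).astype("float32")           # stand-in query embedding

# Cosine similarity: normalize, then take the dot product.
g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
q = query / np.linalg.norm(query)
cos_rank = np.argsort(-(g @ q))

# Euclidean distance on the raw (unnormalized) embeddings.
euc_rank = np.argsort(np.linalg.norm(gallery - query, axis=1))

k = 5
print("top-5 species (cosine):   ", labels[cos_rank[:k]])
print("top-5 species (euclidean):", labels[euc_rank[:k]])
```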
WEEK 8
- Thomas Deatherage: Worked on fine-tuning Florence-2 for trait grounding; faced challenges with weights, but plans to compare model versions; focused on setting up InternVL for grounding and referral tasks and benchmarking it against Florence-2.
- Romouald Dombrovski: Integrated InternVL into the model evaluation pipeline; refactored frontend code into Vue components; started testing image retrieval with VLM4Bio; plans to further explore UNICOM and image retrieval tasks.
- Ben Yu: Assisted with LLM tasks part-time; integrated OpenAI with bio data tools and demonstrated querying iDigBio; plans to fine-tune InternVL for tool usage.
- Breanna D: Progressing with Docker and Vue, focusing on her thesis proposal until December; proposed holding practice sessions to refine her proposal arguments with team feedback.
- Arthur Porto: Discussed background similarity’s effect on model performance; suggested improving database performance for LLM tools and encouraged Thomas to explore instruction tuning with LoRA adapters for LLM fine-tuning.
- Elan Grossman: Shared insights on running LLMs efficiently on low-memory devices using model quantization; discussed packaging models for local deployment.
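As a rough illustration of the quantization idea Elan described, the sketch below loads a causal LM in 4-bit via the transformers/bitsandbytes integration; the model ID and generation settings are placeholders, and this is not the exact setup he used.

```python
# Sketch: loading an LLM in 4-bit to fit low-memory devices (model ID is a placeholder).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder; any causal LM on the Hub
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for speed/stability
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

prompt = "List three freshwater fish genera."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```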
WEEK 9
- Romouald Dombrovski: Worked with UNICOM for clustering feature representations on the VLM4Bio dataset; compared six models, highlighting UNICOM’s strong generalization compared to InternVL; plans to train UNICOM using the Arboretum dataset.
- Thomas Deatherage: Fine-tuned Florence-2 for segmentation grounding tasks but encountered inconsistent results; plans to integrate InternVL into the development environment via Docker Compose and explore Ollama as an API wrapper.
- Ben Yu: Experimented with “agents” that allow LLMs to execute tasks using external tools; adapted a tool for interacting with iDigBio to generate captions; plans to integrate CLIP embeddings as a tool for enhanced LLM search functionality.
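For reference, a minimal sketch of the tool-calling pattern Ben described: an iDigBio record search exposed to an LLM as a callable tool. The tool name, schema, and model choice are our own placeholders; the endpoint shown is iDigBio’s public v2 records search.

```python
# Sketch: exposing an iDigBio records search to an LLM as a callable "tool".
import json
import requests
from openai import OpenAI

def search_idigbio(scientific_name: str, limit: int = 5) -> list[dict]:
    """Query iDigBio for specimen records matching a scientific name."""
    resp = requests.get(
        "https://search.idigbio.org/v2/search/records",
        params={"rq": json.dumps({"scientificname": scientific_name}), "limit": limit},
        timeout=30,
    )
    resp.raise_for_status()
    return [item["data"] for item in resp.json().get("items", [])]

tools = [{
    "type": "function",
    "function": {
        "name": "search_idigbio",
        "description": "Look up specimen records in iDigBio by scientific name.",
        "parameters": {
            "type": "object",
            "properties": {
                "scientific_name": {"type": "string"},
                "limit": {"type": "integer"},
            },
            "required": ["scientific_name"],
        },
    },
}]

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Find specimens of Lepomis macrochirus."}],
    tools=tools,
)
# If the model decides to call the tool, run the real search with its arguments.
for call in response.choices[0].message.tool_calls or []:
    args = json.loads(call.function.arguments)
    print(search_idigbio(**args))
```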
WEEK 10
- Romouald Dombrovski: Discussed the UNICOM codebase, explaining the workflow of clustering embeddings with the Faiss GPU library and the creation of clustered samples. He streamlined his process by saving embeddings as .mat files and using prototypes (centroids) for training, which gives the process structure (a minimal clustering sketch appears after this list). He clarified terminology related to clustering and explained the training process, which involves both cluster comparison and feature-dimension selection.
- Thomas Deatherage: Shared insights on caption generation using InternVL, aligning with Romouald’s clustering code. He plans to use smaller datasets initially to iterate quickly and improve the process. Thomas emphasized understanding the entire pipeline, from clustering through full model training, and identified potential for their approach to excel at recognizing long-tail species.
- Collaboration and Next Steps: Both Romouald and Thomas discussed upcoming work sessions, focusing on further reviewing the code and pipeline. They aim to validate their method, particularly for identifying less common species, with plans to refine and scale the process. They also highlighted the importance of Dr. Porto’s guidance in shaping their research direction.
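A minimal sketch of the clustering step Romouald described, assuming embeddings are already computed: k-means with Faiss, cluster assignments used as pseudo-labels, and centroids saved as prototypes in a .mat file. File names, array names, and the cluster count are placeholders, not the actual UNICOM configuration.

```python
# Sketch: k-means over image embeddings with Faiss; save embeddings and prototypes to .mat.
import numpy as np
import faiss
from scipy.io import savemat

embeddings = np.load("embeddings.npy").astype("float32")  # placeholder: (n_images, dim)
n_clusters = 2000                                          # placeholder cluster count

kmeans = faiss.Kmeans(d=embeddings.shape[1], k=n_clusters, niter=20, gpu=True, seed=0)
kmeans.train(embeddings)

# Assign every embedding to its nearest centroid; these assignments act as pseudo-labels.
_, assignments = kmeans.index.search(embeddings, 1)

savemat("clusters.mat", {
    "embeddings": embeddings,
    "prototypes": kmeans.centroids,     # (n_clusters, dim) cluster prototypes
    "labels": assignments.reshape(-1),  # pseudo-label per image
})
```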
WEEK 11
During the week, team members met with Dr. Porto and Moritz to present their progress on setting up the training loop for the UNICOM model. Thomas successfully created a pipeline for generating captions for images in the dataset. Initially, the training loop was not reducing loss as expected, but after some adjustments, the team managed to get it working correctly. This milestone marks a significant step forward in refining their pipeline and progressing with the overall project.
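For context, the shape of such a training loop is sketched below as a plain cross-entropy classifier over cluster pseudo-labels; this is a deliberate simplification (UNICOM itself uses a margin-based softmax over clusters), with placeholder tensors standing in for real features and assignments.

```python
# Minimal training-loop sketch over cluster pseudo-labels (simplified; not the UNICOM objective).
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

features = torch.randn(4096, 512)                 # placeholder precomputed embeddings
pseudo_labels = torch.randint(0, 2000, (4096,))   # placeholder cluster assignments
loader = DataLoader(TensorDataset(features, pseudo_labels), batch_size=256, shuffle=True)

head = nn.Linear(512, 2000)                       # classify embeddings into clusters
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(5):
    total = 0.0
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(head(x), y)
        loss.backward()
        optimizer.step()
        total += loss.item()
    print(f"epoch {epoch}: mean loss {total / len(loader):.4f}")  # loss should trend downward
```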
WEEK 12
- Romouald Dombrovski: Started by summarizing recent progress, including initializing a training loop with UNICOM code and experimenting with clustering techniques. He aims to train a UNICOM model specifically for a large ecological dataset. Later, he reviewed current performance results with Thomas, observing poor outcomes with purely visual data and suggesting testing different clustering methods to improve accuracy. Contributed to the discussion on enhancing clustering, exploring the potential of using different cluster counts and captions in training to improve model performance. Toward the end, he discussed his plans for staying involved with the project, expressing interest in continuing as a volunteer for personal growth and publication opportunities.
- Thomas Deatherage: Provided insights on the InternVL and UNICOM code, particularly regarding species classification and performance comparisons between the VLM4Bio dataset and CLIP models, noting promising results. Shared a detailed analysis of how data distribution affects performance by species, emphasizing that training sample size correlates with accuracy. Engaged in a discussion about balancing model training for different cases and explored the limitations of using only visual data. Later, he questioned the rationale in the UNICOM paper for not using pre-trained models and discussed clarifying specific parameters within UNICOM’s repository to make the methods easier to replicate.
- Bree: Raised questions about trends in species performance, highlighting the challenges of fine-grained classification with imbalanced classes. She suggested different visualization techniques to address performance disparities for categories with few samples. Suggested strategies to balance the dataset, such as reducing samples for overrepresented species and implementing tailored approaches for specific tasks. Asked about the UNICOM approach, particularly regarding the use of clusters as pseudo-labels instead of traditional individual labels. Contributed to discussions on project planning, noting the importance of research continuity and recruiting for UI/UX and DevOps roles to support future project needs. Toward the end, she encouraged structuring multiple smaller publications and planned to consult Dr. Porto on publication strategy.
WEEK 13
- Romouald Dombrovski: Started by discussing attendance and balancing commitments such as PhD applications and volunteer work. Reviewed progress on saving results and comparing cluster sizes using evaluation metrics. Engaged in detailed discussions on generating embeddings, clustering methods, and evaluation workflows, emphasizing individual accuracy and average accuracy across classes. Proposed organizing scripts and evaluating hyperparameter settings as next steps. Toward the end, aligned with the group on prioritizing consistent evaluation methods and adapting BioCLIP benchmarks for their analysis.
- Thomas Deatherage: Shared insights into evaluation results with OpenCLIP models, noting better performance with larger cluster sizes. Discussed testing accuracy at different taxonomic levels and highlighted the need for consistent dataset splits for reproducibility (a split sketch appears after this list). Contributed visualizations of recall performance, identifying discrepancies across categories and exploring potential causes like dataset labeling issues. Agreed to focus on distributed training setups on HiPerGator and validate results with smaller datasets. Committed to collaborating on unified scripts and adapting BioCLIP benchmarks for future work.
- Breanna Shi: Joined discussions on hyperparameters and model evaluation methods, raising concerns about consistency and methodology clarity. Provided guidance on avoiding data leakage and aligning evaluation metrics with established standards like BioCLIP. Highlighted the importance of unified evaluation scripts and consistent metrics for publication. Suggested focusing on comparable datasets to enhance model evaluation and positioning their paper for reviewers. Encouraged the team to structure their work for publication, balancing exploratory research with focused execution, and planned to consult their advisor on methodological decisions.
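A small sketch of one way to produce the consistent, reproducible splits discussed above, using a fixed seed and stratification by species; the file and column names are placeholders, and this is not the team’s actual split script.

```python
# Sketch: a reproducible, stratified train/val/test split (fixed seed) so evaluations
# across models reuse identical data; file and column names are placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("metadata.csv")  # placeholder: one row per image with a "species" column

train_df, temp_df = train_test_split(
    df, test_size=0.30, stratify=df["species"], random_state=42
)
val_df, test_df = train_test_split(
    temp_df, test_size=0.50, stratify=temp_df["species"], random_state=42
)

# Persist the split so every evaluation run reuses exactly the same partitions.
for name, part in [("train", train_df), ("val", val_df), ("test", test_df)]:
    part.to_csv(f"split_{name}.csv", index=False)
```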
WEEK 14
- Arthur Porto: Shared updates on writing a grant proposal and discovering a new Hugging Face dataset, “InternVL 2.5.” Proposed repurposing BioCLIP’s evaluation scripts for intra-modality comparisons and aligning methodologies with benchmarks. Emphasized the importance of grounding models in textual data to reduce hallucinations and improve interpretability. Recommended ICCV and ICLR as publication venues, with tight timelines for submission. Suggested balancing novel methods with benchmark adherence and prioritizing image-based clustering models for the current project.
- Romouald Dombrovski: Presented cluster evaluations on the VLM4Bio dataset, highlighting an optimal cluster size of 2,000 and comparing it to BioCLIP’s superior performance. Discussed challenges with weaker models (ViT-B/32) and computational constraints. Collaborated with Thomas on SLURM scripting for dataset downloads and proposed alternatives like Git LFS. Shared insights into species distribution, revealing sparsity for certain classes and its impact on clustering. Worked on aligning dataset splits and refining evaluation scripts for consistency.
- Thomas Deatherage: Analyzed discrepancies in evaluations due to differences in dataset splitting and stratification. Highlighted performance issues with specific species, such as birds, due to limited samples and varied conditions. Explored SLURM alternatives for data-download efficiency and strategized organizing the Tree of Life dataset. Contributed to discussions on adapting BioCLIP benchmarks and balancing new methods with established practices.
- Moritz D. Luerig: Explored captioning experiments with different tiling configurations and their effects on image distortion and model performance. Investigated computational challenges with larger models, including the quantized 76-billion parameter version. Suggested grounding models in text to enhance clustering accuracy and reduce hallucinations. Collaborated on balancing model size, captioning accuracy, and computational constraints in experiments.
- Breanna D. Shi: Provided feedback on aligning evaluations with BioCLIP’s metrics for consistency. Raised concerns about avoiding data leakage and maintaining methodological clarity. Reiterated the importance of adhering to established benchmarks for publication. Encouraged balancing exploratory research with focused execution and supported publication planning for ICCV and ICLR.
- Key Next Steps: Refine evaluation scripts and align with BioCLIP’s methodology. Finalize dataset preparations and strategize downloading larger datasets like Tree of Life. Prioritize image-based clustering models for the upcoming paper while leaving text-based enhancements for future work. Target ICCV as the primary publication venue, with ICLR as a backup.