LVM Models Plan
The project currently uses OpenCLIP for image embedding and search. We are exploring additional models, including BioCLIP and Florence-2, to improve results on tasks such as species classification.
Dr. Porto mentioned Florence-2, a versatile model that supports tasks such as classification, captioning, visual grounding, and segmentation. The team discussed integrating it for experimentation, particularly with the VLM4Bio dataset.
Zero-Shot Evaluation
The team will first evaluate the models on zero-shot tasks, in which a model makes predictions without any task-specific fine-tuning, and will then move on to fine-tuning them on datasets.
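To make the zero-shot idea concrete, here is a minimal sketch of how a CLIP-style model can classify an image without fine-tuning: the image embedding is compared against the embeddings of candidate text prompts, and the closest prompt wins. The vectors, prompt wording, and function names below are illustrative assumptions; in practice the embeddings would come from the image and text encoders of a model such as OpenCLIP or BioCLIP.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embedding, label_embeddings):
    """Return the best-matching prompt and the score of every prompt.

    label_embeddings maps each text prompt to its embedding. No training
    happens here: the prediction relies only on embedding similarity.
    """
    scores = {
        label: cosine_similarity(image_embedding, embedding)
        for label, embedding in label_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy 3-dimensional vectors standing in for real encoder outputs (assumption).
label_embeddings = {
    "a photo of a bird": [0.9, 0.1, 0.0],
    "a photo of a fish": [0.1, 0.9, 0.1],
    "a photo of a butterfly": [0.0, 0.2, 0.9],
}
image_embedding = [0.2, 0.8, 0.1]

predicted_label, scores = zero_shot_classify(image_embedding, label_embeddings)
print(predicted_label)
```

Because the candidate classes are supplied as text prompts, the same procedure can be pointed at a new set of species simply by changing the prompts.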
Model Performance Evaluation
Model performance will be evaluated on tasks such as species classification and trait identification.
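The plan does not yet name specific metrics. As one possible starting point for a classification task, the sketch below computes overall top-1 accuracy and per-class accuracy from lists of true and predicted labels. The metric choice, function names, and species labels are assumptions made for illustration only.

```python
from collections import defaultdict


def top1_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_accuracy(y_true, y_pred):
    """Accuracy computed separately for each true label.

    Useful when some species are much rarer than others, since overall
    accuracy can hide poor performance on rare classes.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical labels for four images (placeholders, not real results).
y_true = ["species_a", "species_a", "species_b", "species_c"]
y_pred = ["species_a", "species_b", "species_b", "species_c"]

overall = top1_accuracy(y_true, y_pred)
by_class = per_class_accuracy(y_true, y_pred)
print(overall)
print(by_class)
```

Reporting both numbers for every candidate model, first in the zero-shot setting and again after fine-tuning, would give a like-for-like comparison across models.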