Achievement

Researchers explored and applied the OpenCLIP model to the iDigBio dataset, which consists of biological images. The OpenCLIP model converts inputs (text or images) into feature vectors in a shared embedding space. These vectors capture semantic information, so the similarity between two inputs can be measured by comparing their vectors.
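To make the idea of comparing vectors concrete, here is a minimal sketch of cosine similarity, a common way to compare embeddings. The source does not say which similarity measure the project uses, so treat that choice as an assumption. The function name and the three-dimensional toy vectors are made up for illustration; real OpenCLIP embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (hypothetical values).
text_vec = [0.9, 0.1, 0.0]      # e.g. the embedding of a text query
image_vec_a = [0.8, 0.2, 0.1]   # an image that points in a similar direction
image_vec_b = [0.0, 0.1, 0.9]   # an image that points elsewhere

score_a = cosine_similarity(text_vec, image_vec_a)
score_b = cosine_similarity(text_vec, image_vec_b)
print(score_a > score_b)  # the closer image scores higher
```

A higher score means the two vectors point in a more similar direction in the embedding space, which is what lets a text query be ranked against image embeddings.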

Tools

  • Redis: A buffering solution for incoming data from data sources and a system for queuing data in a reliable and scalable manner. An ingestor worker (e.g., the iDigBio ingestor) can push incoming data to a Redis queue and process it asynchronously.
  • MongoDB: Handles the storage and retrieval of transformed data from ingestor workers. It provides a flexible, document-oriented database that is well-suited for managing and querying semi-structured or unstructured data.
  • OpenCLIP Model: Vectorizes data inputs, including text and images. This process involves converting these inputs into high-dimensional vectors that capture their semantic meaning, enabling effective similarity searches and comparisons.
  • PostgreSQL: Stores the vectorized data produced by the OpenCLIP model. As a powerful relational database management system, PostgreSQL provides robust support for storing, querying, and managing structured data, including the high-dimensional vectors generated by the model.
  • Vue.js: Implements features on the front end of the application. Vue.js is a progressive JavaScript framework used for building user interfaces and single-page applications, allowing for dynamic and interactive web experiences.
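The queue-then-store flow described for Redis and MongoDB can be sketched with the Python standard library alone. This is only an illustration of the pattern, not the project's code: queue.Queue stands in for the Redis queue, a plain list stands in for the MongoDB collection, and run_pipeline, transform, and the sample record are hypothetical names and data.

```python
import queue
import threading


def run_pipeline(records, transform):
    """Push records onto a queue and let a worker thread transform and store them."""
    work_queue = queue.Queue()  # stand-in for the Redis queue (assumption)
    store = []                  # stand-in for the MongoDB collection (assumption)

    def worker():
        # Consume items until the None sentinel arrives.
        while True:
            item = work_queue.get()
            if item is None:
                work_queue.task_done()
                break
            store.append(transform(item))
            work_queue.task_done()

    thread = threading.Thread(target=worker)
    thread.start()

    # The producer side: an ingestor pushes incoming data without waiting
    # for each record to be processed.
    for record in records:
        work_queue.put(record)
    work_queue.put(None)  # signal that no more data is coming

    work_queue.join()  # wait until every queued item has been handled
    thread.join()
    return store


# Hypothetical record and transformation, for illustration only.
raw = [{"id": 1, "name": "  Quercus alba "}]
cleaned = run_pipeline(raw, lambda r: {**r, "name": r["name"].strip()})
print(cleaned)
```

The point of the queue is that the producer and the worker run independently: a slow transformation does not block ingestion, it only lets the queue grow until the worker catches up.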

Presentation Link