WEEK 1

Meeting Link (01/06/2025)

Summary of Meeting
  • Project Roles & Responsibilities 
    • Two primary project tracks: 
      • Bird Audio – Assigned to Andrei Hushcha and Vanessa Prema; plan to explore use of BirdNet or similar tools to analyze audio data. 
      • Bird Computer Vision (CV) – Assigned to Scott Walters, Bina Patel, and Kaushika Mohan; goal is to build classifiers for bird identification from video footage. 
  • Data Sharing 
    • Pranav has videos ready to share and will describe or provide audio files as needed. 
    • Ben and Pranav will coordinate on making files available via Slack (Bird CV channel). 
  • Project Scope & Goals 
    • Identifying birds in both audio and video data to understand ecological impacts, especially in high-elevation Himalayan environments. 
    • Emphasis on open communication between the ecology perspective (Ben, Pranav) and the machine learning/computer vision perspective (Scott, Bina, Andrei, Vanessa, Kaushika). 
  • Importance of Collaboration 
    • Ben reiterated the need for clear, frequent communication. 
    • Team members should ask questions to ensure accurate understanding of objectives and technical requirements. 
Action Items
  1. Pranav will share all relevant data (video files and a sample of audio files) in the designated Slack channel. 
  2. Audio Team (Andrei, Vanessa) will begin exploring how to run BirdNet (or similar) on a subset of the audio recordings. 
  3. CV Team (Scott, Bina, Kaushika) will begin examining the video samples to plan an approach for image classification/detection. 
  4. Ben will post a summary of these action items in Slack, and the team will continue to coordinate in the Bird CV channel. 

WEEK 2

Meeting Link (01/17/2025)

Summary of Meeting
  • Introductions & Backgrounds: 
    • Andrei and Vanessa introduced themselves as the audio‐classification team, sharing their academic and professional backgrounds. 
    • This was Vanessa’s first research project, and she expressed enthusiasm about learning more. 
    • Marco asked about participants’ backgrounds, and they briefly described their current roles and fields of study. 
  • Audio Classification Project: 
    • Andrei and Vanessa mentioned they had an initial discussion the previous day about how to approach the audio classification problem. 
    • They plan to outline a strategy and next steps for the project, including data gathering and defining a methodology. 
  • Advisor/Coordinator Notes: 
    • Breanna indicated this was the first meeting with the comp advisors and researchers together. She will remain available through Slack, but may not attend all subsequent meetings unless needed. 
    • Going forward, the weekly meetings will focus on progress updates from the researchers (Andrei, Vanessa, others), discussion of any issues, and setting goals for the next week. 
Action Items
  1. Audio Team Next Steps: 
    • Andrei and Vanessa will continue refining their plan for the audio classification project and prepare updates for the next meeting. 
  2. Progress Updates: 
    • At the next session, the researchers (Andrei, Vanessa, etc.) will provide a summary of what they accomplished since this meeting. 
  3. Communication with Breanna: 
    • If the audio team or other researchers need further support, they can reach out to Breanna via Slack or request her presence at a future meeting. 
  4. Documentation Exchange: 
    • Marco and Breanna will coordinate on receiving any project documents from the students and sharing them with the advisors as needed. 

WEEK 3

Meeting Link (01/24/2025)

Summary of Meeting
  • General Updates & Background Discussion
    • Pranav Gokhale provided background information on ecological bird studies and the purpose of the research.
    • The team discussed the expected data and analysis methods to ensure alignment.
    • Everyone is now on the same page regarding the study’s goals and expected outputs.
  • Image Analysis Challenges
    • Scott Walters mentioned difficulties in identifying birds in some images.
    • The team acknowledged the need for possible confirmation on certain images.
  • Proof of Concept & Data Selection
    • The team plans to start with a proof of concept on one nesting site before expanding.
    • Pranav suggested selecting smaller folders with higher detection rates for initial testing.
  • Website Management
    • Vanessa Prema is managing the project website and has drafted an introductory blurb based on the provided presentation.
    • Team members were asked to submit any additional content or corrections for the website.
    • Discussion on whether the CV team has a separate webmaster; Scott confirmed they do, but little content has been added yet.
    • Vanessa offered to assist with website updates for both teams if needed.
  • Upcoming Research Talk
    • Pranav will give a talk next Thursday about the research project.
    • He will check if there is a Zoom option available for remote participants.
    • He requested two slides summarizing team members and their roles to include in the presentation.
  • Communication & Slack Updates
    • The team will continue discussions on Slack for any follow-ups.
    • Vanessa thanked Pranav for his engagement and support in Slack discussions.
Action Items
  1. Pranav Gokhale
    • Check if a Zoom option is available for the research talk.
    • Gather two slides summarizing team members and their roles for the presentation.
  2. Scott Walters
    • Continue testing the proof of concept on one nesting site.
    • Identify smaller folders with higher detection rates for initial trials.
  3. Vanessa Prema
    • Continue managing the project website and update it with meeting notes.
    • Collect bios and images if needed for the website.
    • Assist the CV team with their website if required.
  4. All Team Members
    • Review images where bird identification is unclear and determine if additional confirmation is needed.
    • Submit any website updates or content suggestions to Vanessa.
    • Provide team role details to Pranav for the research talk slides.
    • Stay active on Slack for follow-up discussions.

WEEK 4

Meeting Link (01/31/2025)

Summary of Meeting
  • Access to Cornell Lab Account 
    • The team attempted to get an account for Cornell Lab’s bird sound library but could not because they are not Cornell affiliates. 
    • The library in question may have Hume’s leaf warbler recordings, which the team would like to use. 
  • Bird Sound Data Sources & Xeno-Canto 
    • The team currently uses Xeno-Canto for verified bird calls. The group also discussed leveraging BirdNet to confirm species in field recordings. 
    • Field Data vs. Xeno-Canto: 
      • Field data is large (up to 500 GB) but not fully labeled. 
      • Xeno-Canto files are smaller, cleaner, and come with verified labels. 
  • Classification Goals 
    • Primary interest: Distinguish between leaf warbler calls (song vs. buzz) and “not leaf warbler.” 
    • Secondary steps might include investigating other species that have similar calls to improve the model’s generalization. 
    • Discussion on how BirdNet’s performance should be measured and whether confidence thresholds (e.g., 0.9) are justified. 
    • Need to determine the best way to incorporate domain expertise (e.g., suggestions for which other species to include). 
  • Performance Evaluation and Labeling 
    • The group discussed the importance of having a small subset of manually verified labels to measure BirdNet’s accuracy and tune thresholds (precision vs. recall). 
    • The group also noted the need to split data carefully into training/validation/test sets so that clips from the same audio recording do not end up in more than one set. 
  • Data Processing Approach 
    • Agreement to use Python (e.g., Librosa or SoundFile libraries) to handle audio slicing and classification rather than manual audio software. 
    • The team will further explore how to automatically trim audio files and assign labels (e.g., 3-second clips with bird calls). 
  • Meeting Summaries and Weekly Reports 
    • Andrei and Vanessa hold frequent meetings and produce summaries; the team can decide which ones to upload to GitHub. 
    • A private GitHub repository will serve as a place to store meeting notes and project files so everyone has access. 
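
As a rough illustration of the Python-based slicing discussed above, the sketch below computes the sample ranges for fixed-length clips. The function name, the 3-second default, and the choice to drop a short trailing fragment are assumptions made for illustration, not decisions recorded in the meeting.

```python
def clip_bounds(n_samples, sample_rate, clip_seconds=3.0, hop_seconds=None):
    """Return (start, end) sample indices for fixed-length clips.

    Hypothetical helper: a trailing fragment shorter than clip_seconds
    is dropped rather than padded.
    """
    if hop_seconds is None:
        hop_seconds = clip_seconds  # non-overlapping clips by default
    clip_len = int(round(clip_seconds * sample_rate))
    hop_len = int(round(hop_seconds * sample_rate))
    if clip_len <= 0 or hop_len <= 0:
        raise ValueError("clip and hop lengths must be positive")

    bounds = []
    start = 0
    while start + clip_len <= n_samples:
        bounds.append((start, start + clip_len))
        start += hop_len
    return bounds
```

With SoundFile, the audio array and sample rate could be loaded with `soundfile.read(path)`, each clip taken as `audio[start:end]`, and written back out with `soundfile.write`. How labels are attached to each clip is left open here.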
Action Items
  1. Evaluate Cornell Lab Options 
    • Determine if there’s an alternative method or contact at Cornell to obtain needed bird calls. 
    • If unsuccessful, seek comparable data sources or rely on existing Xeno-Canto resources. 
  2. Small Labeled Subset for Performance Checks 
    • Identify a subset of field data for manual verification. 
    • Use this labeled subset to tune BirdNet thresholds and measure performance. 
  3. Python-Based Audio Processing 
    • Explore appropriate Python libraries (e.g., Librosa, SoundFile) to slice audio recordings and manage train/validation/test splits. 
  4. Integrate Meeting Summaries on GitHub 
    • Upload relevant meeting notes and weekly reports to the private GitHub repository for archival and reference purposes. 
  5. Discuss Inclusion of Additional Bird Species 
    • Determine if adding data from similar species (recommended by domain experts) would help improve model generalization. 
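
One possible way to handle the train/validation/test split from action item 3 is to assign whole recordings, rather than individual clips, to each set. The sketch below assumes each clip is described by a hypothetical (clip_id, recording_id) pair; the 70/15/15 fractions are placeholders, not values agreed by the team.

```python
import random
from collections import defaultdict


def split_by_recording(clips, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split clips into train/val/test so no recording spans two sets.

    clips: iterable of (clip_id, recording_id) pairs (assumed format).
    Returns a dict mapping "train", "val", "test" to lists of clip_ids.
    """
    by_recording = defaultdict(list)
    for clip_id, recording_id in clips:
        by_recording[recording_id].append(clip_id)

    # Shuffle recordings (not clips) with a fixed seed for reproducibility.
    recordings = sorted(by_recording)
    random.Random(seed).shuffle(recordings)

    n = len(recordings)
    n_train = int(n * fractions[0])
    n_val = int(n * fractions[1])
    groups = {
        "train": recordings[:n_train],
        "val": recordings[n_train:n_train + n_val],
        "test": recordings[n_train + n_val:],  # remainder goes to test
    }
    return {
        name: [clip for rec in recs for clip in by_recording[rec]]
        for name, recs in groups.items()
    }
```

Because the unit of assignment is the recording, adjacent clips cut from the same file cannot end up on both sides of the split.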
Next Steps
  • Follow up on labeling efforts and threshold selection for BirdNet. 
  • Continue refining the data pipeline, especially the Python-based slicing and classification approach. 
  • Schedule a future meeting to review progress on data labeling, model training, and repository updates. 
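
For the threshold selection mentioned in the next steps, a small manually verified subset could be scored as in the sketch below. It assumes each verified clip is reduced to a (confidence, is_target) pair, where confidence is the score reported for the target species and is_target is the human label. That data format is an assumption for illustration.

```python
def precision_recall(scored_clips, threshold):
    """Precision and recall of 'confidence >= threshold' as a detector.

    scored_clips: iterable of (confidence, is_target) pairs (assumed format).
    Returns (precision, recall); a value is None when it is undefined.
    """
    tp = fp = fn = 0
    for confidence, is_target in scored_clips:
        predicted = confidence >= threshold
        if predicted and is_target:
            tp += 1
        elif predicted and not is_target:
            fp += 1
        elif not predicted and is_target:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return precision, recall
```

Running this over a range of thresholds (for example 0.5 to 0.95) on the verified subset would show how much recall is given up for each gain in precision.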

WEEK 5

Meeting Link (02/07/2025)

Summary of Meeting
  • Vision Process and Model Accuracy
    • Pranav asked about the vision process. Andrei mentioned the current use of BirdNet with a confidence level of 0.9 to analyze data.
    • However, they do not yet have a model of their own and are still working with the data to evaluate its quality.
  • Building a New Model
    • Andrei explained that the next step is to create their own model based on known data (buzzes and calls) and run it to compare results with BirdNet.
  • Data and Annotations Progress
    • Scott shared that progress is being made in annotating frames for training data. He had initial issues with a poor annotation rate but found a better nesting site that improved the rate of true positives.
  • Computational Resources and Data Storage
    • Scott mentioned the potential need for Amazon S3 for data storage as external hard drives are becoming insufficient for the growing data needs.
  • Project Updates
    • Bina joined late and reported that she is working on the data with YOLO models and plans to sync up with Scott and Kaushika based on Scott’s annotated data.
Action Items
  1. Pranav: Continue posting videos on Dropbox for sharing with the team.
  2. Scott: Keep annotating the frames for training and move towards fine-tuning the models next week with the annotated data. Explore additional data storage solutions, like Amazon S3.
  3. Bina: Sync up with Scott and Kaushika to work with the annotated data for model training.
  4. All: Stay in contact regarding the meeting schedules and resources, and reach out if there are any questions or need for further support.
Next Steps
  • The meeting ended with clear action items. The next steps include finalizing the annotated data and exploring storage solutions. The team will continue coordinating their efforts and keep each other updated on their progress.

WEEK 6

Meeting Link (02/14/2025)

Summary of Meeting
  • Confidence Levels and Misclassifications:
    • The group discussed how confidence levels affect the accuracy of model predictions.
    • There were concerns about how to interpret a 24% misclassification rate, with opinions that the size of the list matters when judging accuracy.
    • The group also discussed the trade-off between using a high confidence threshold (fewer but more reliable samples) and a lower threshold (more samples but potentially more noise).
  • Data Quality and Cleanliness:
    • The data still requires cleaning and refinement to be useful, as it’s not currently reliable enough for accurate analysis.
    • Basic analysis has been done, but the data’s current state means the results are not final or useful yet.
  • Bird Song vs. Call Analysis:
    • Prema manually went through the Xeno-Canto files, adding columns to distinguish between bird calls and buzzes. However, further analysis is needed, particularly for files that contain both types of sounds.
  • Model Evaluation and Metrics:
    • The team debated whether using absolute numbers or percentages of misclassified species would be more helpful. They agreed that looking at the percentage of misclassified data across different thresholds would be more informative.
  • Open Source Resources:
    • The team discussed the open-source nature of BirdNet, but it was noted that the Cornell Merlin ID application is not open-source, which might limit access to its data.
  • Data Clustering:
    • The team briefly mentioned clustering analysis but has not made significant progress on this task yet.
Action Items
  1. Investigate Confidence Level Discrepancies:
    • Further analysis will be done on how confidence levels impact model predictions. A histogram of confidence levels associated with zero labels (misclassifications) will be created to understand the distribution.
  2. Data Cleaning and Preparation:
    • The team will continue working on cleaning the dataset to ensure it’s ready for analysis. Given the current state, results are not reliable enough to show yet.
  3. Finalize Bird Song vs. Call Data:
    • Prema will continue working on categorizing bird songs and calls, and the team will later integrate this data into the analysis.
  4. Threshold Analysis for Misclassification:
    • Focus on analyzing the percentage of misclassified species at different thresholds and comparing them to better understand model performance.
  5. Clustering Analysis:
    • Once data is cleaned, the team will revisit clustering analysis and its application to the bird call data.
  6. Consult Papers and Open Source Data:
    • Andrei will explore open-source papers and research on BirdNet and related resources to understand better how the data and confidence levels should be interpreted.
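
The threshold analysis and histogram in action items 1 and 4 could be prototyped along these lines. The sketch assumes detections are available as (confidence, label) pairs, with label 0 marking a misclassification as in the notes and label 1 a correct detection. The function names and the 10-bin default are illustrative choices.

```python
def misclassification_rate(detections, threshold):
    """Percentage of misclassified detections among those kept.

    detections: iterable of (confidence, label), label 0 = misclassified.
    Returns (percent_misclassified, n_kept); percent is None if n_kept == 0.
    """
    kept = [label for confidence, label in detections if confidence >= threshold]
    if not kept:
        return None, 0
    return 100.0 * kept.count(0) / len(kept), len(kept)


def zero_label_histogram(detections, n_bins=10):
    """Count zero-label detections per equal-width confidence bin on [0, 1]."""
    counts = [0] * n_bins
    for confidence, label in detections:
        if label == 0:
            # A confidence of exactly 1.0 falls into the last bin.
            index = min(int(confidence * n_bins), n_bins - 1)
            counts[index] += 1
    return counts
```

Reporting the number of detections kept alongside the percentage reflects the point raised in the meeting that a rate such as 24% is hard to judge without knowing the size of the list it was computed on.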
Next Steps
  • Focus on data cleaning and analysis over the next week, with plans to move into the model building phase after completing this foundational work.

WEEK 7

Meeting Link (02/21/2025)

Summary of Meeting
  • BirdNet Model Analysis:
    • The team discussed the current state of the BirdNet model, using it to analyze bird species based on confidence levels and local species lists. They are working to figure out what the confidence levels mean and how to use them effectively for species classification.
  • Confidence Thresholds:
    • They compared different confidence levels and observed that as the confidence threshold increases, the accuracy of species identification improves. However, a higher threshold also discards more data, so there is a trade-off between accuracy and completeness.
  • Species Misclassification:
    • Vanessa raised concerns about the misclassification of birds, particularly with species like the Pacific Wren, which is not native to the area but is detected with high confidence.
  • Audio File Analysis:
    • The audio files are processed in 15-minute chunks, with various species detected throughout. They discussed how to handle misclassified sounds, with one option being to only accept species detections with high confidence.
  • Elevation Data:
    • They discussed the importance of high-elevation data, which is still being collected. Pranav noted that the species distribution will change drastically at higher elevations, with fewer species detected.
  • Sound Pattern Analysis:
    • Andrei asked whether sounds from other birds around the Hume’s warbler could be useful. Pranav suggested looking for sound partitioning patterns, as multiple species might sing in overlapping time windows.
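
A minimal sketch of the filtering idea discussed above: keep a detection only if it meets a confidence threshold and its species appears on a local species list. The dictionary keys, the 0.8 default, and the contents of the local list are assumptions for illustration; the real list would come from the ecology side.

```python
def filter_detections(detections, local_species, threshold=0.8):
    """Split detections into (kept, rejected).

    detections: list of dicts with "species" and "confidence" keys
    (an assumed format, not a documented BirdNet output structure).
    local_species: set of species names expected at the site.
    """
    kept, rejected = [], []
    for detection in detections:
        confident = detection["confidence"] >= threshold
        expected = detection["species"] in local_species
        if confident and expected:
            kept.append(detection)
        else:
            rejected.append(detection)
    return kept, rejected
```

Keeping the rejected detections in a separate list, rather than discarding them, makes it possible to review how often a non-local species is reported with high confidence.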
Action Items
  1. Set Confidence Threshold:
    • The team agreed on using a confidence threshold between 0.8 and 0.9 for species identification, balancing data accuracy with completeness.
  2. Handle Misclassifications:
    • Species not native to the area, like the Pacific Wren, should be excluded from the dataset.
  3. Continue Data Collection:
    • Pranav will work on gathering high-elevation data, which will provide a more complete picture of species distribution.
  4. Review Audio Windows:
    • Andrei will experiment with different audio window sizes (e.g., 3, 6, and 9 seconds) to detect patterns in bird sounds and determine if interactions between species can be observed.
  5. Build Model for Hume’s Warbler:
    • Once the data is cleaned and the confidence threshold is set, the team will build a model based on Hume’s warbler sounds, running it through the collected dataset and evaluating its performance.
  6. Slack Communication:
    • Pranav will be intermittently available on Slack due to an upcoming exam but will continue monitoring updates and providing feedback.
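
For the window-size experiment in action item 4, one simple starting point is to count how often pairs of species are detected within the same time window. The sketch below assumes detections are (start_time_seconds, species) pairs and uses fixed, non-overlapping windows. Both are simplifying assumptions, and co-occurrence counts alone would not establish an interaction between species.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(detections, window_seconds):
    """Count species pairs detected in the same fixed-length window.

    detections: iterable of (start_time_seconds, species) pairs
    (assumed format). Returns a Counter keyed by sorted (a, b) pairs.
    """
    windows = {}
    for start, species in detections:
        index = int(start // window_seconds)
        windows.setdefault(index, set()).add(species)

    pairs = Counter()
    for species_in_window in windows.values():
        for a, b in combinations(sorted(species_in_window), 2):
            pairs[(a, b)] += 1
    return pairs
```

Calling the function with window sizes of 3, 6, and 9 seconds on the same detections would show how sensitive the pair counts are to the window length.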

WEEK 8

Meeting Link (02/28/2025)

Summary of Meeting
  • Model & Confidence Levels:
    • The team is working on extracting the top 3 species probabilities (highest, second, and third) from the BirdNet model. If they can’t implement this in Python, they’ll proceed with the 0.8 confidence threshold in the BirdNet app.
  • Accuracy Estimation:
    • Annan suggested estimating the model’s true accuracy using assumptions about misclassification probability. This will help adjust thresholds and provide better accuracy estimates.
  • Data Labeling & Model Training:
    • Annan clarified that removing “unexpected species” from training won’t harm the model, but may affect accuracy reporting.
  • Weekly Reports:
    • There were issues accessing weekly updates on Teams, which are being addressed.
  • Bird Behavior Insights:
    • Vanessa shared that birds do not overlap in certain areas, which simplifies data filtering.
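
If per-species scores for an audio segment can be obtained in Python, picking the top 3 is straightforward. The sketch below assumes the scores are already available as a dictionary mapping species name to confidence. How to get that dictionary out of BirdNet is not shown and is the open question noted above.

```python
import heapq


def top_k_species(scores, k=3):
    """Return the k highest-scoring (species, confidence) pairs.

    scores: dict mapping species name to confidence for one segment
    (assumed format). Results are ordered from highest to lowest.
    """
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])
```

Keeping the second and third candidates makes it possible to check how close the runner-up scores are to the top prediction, which may help when interpreting confidence levels.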
Action Items
  1. Andrei & Vanessa:
    • Finalize the BirdNet model analysis, and if possible, implement top 3 probabilities in Python.
    • Continue adjusting thresholds and cleaning data based on Annan’s feedback.
  2. Annan:
    • Provide a formula for estimating total accuracy based on misclassification assumptions.
    • Review and give feedback on the weekly reports.
Next Steps
  • Annan will review reports and provide feedback soon.
  • The team will continue refining the model and resolving the Teams file access issue.