Projects – CS 8803 LRV – Large Scale & Real-Time Visual Analysis

Overview

A major component of this course is a semester-long research project. The project has several goals:

Propose a graduate-level research project in writing, creating a clear roadmap of what you will explore and the challenges you expect to address
Execute your proposed research by collecting or creating datasets, building cloud/edge infrastructure, and gathering performance metrics
Present your research at various stages, from the initial proposal to the final results
Incorporate feedback throughout these stages

The project is divided into three phases: proposal, midterm, and final. All three phases include a presentation, while the proposal and final also include written components. Presentations serve as checkpoints, giving you the opportunity to share progress and receive feedback from both instructors and classmates.

Projects are done in groups of 2-3 students.

Proposal (Week 4)

The proposal should be 2-3 pages (single column, minimum 10pt font size) and include the following:

Motivation: Why is this relevant? Describe the application, use case, and real-world implications. What aspect are you focusing on, and why is it worth researching?
Project summary: How will you address the problem? Outline your approach, success metrics (e.g., latency, throughput, cost), what you plan to build, and the measurements you’ll take. A high-level overview is fine at this stage
Relevant work/baselines: What prior work, systems, or algorithms exist? Summarize their contributions, limitations, and assumptions. Explain why further exploration is needed. If you’ll build on these systems, note that (especially if open-source)
Datasets/benchmarks: What datasets will you use? If known, describe them. Otherwise, specify the type of data you’ll need
Resources: What resources will you use (e.g., cloud, laptop, university servers)? This helps staff support your project
Timeline: Propose a semester-long outline. It can evolve, but should give a clear roadmap (especially past the midterm)
Open questions: List any uncertainties or challenges you haven’t resolved.

Presentation: You will prepare an 8-10 minute overview of your project, covering the sections above. Make sure to include the open questions for discussion and feedback.

Midterm (Week 10)

The midterm check-in is a presentation to share early results, demonstrate progress, get feedback, and receive help if you’re stuck.

Your 8-10 minute presentation should include the following:

Recap: Briefly remind the audience of your project and what problem you’re tackling
Approach overview: Describe what you’ve done so far: what’s worked, what hasn’t, and any key insights (especially counter-intuitive ones). Spend most of your time here, including any challenges or roadblocks
Next steps/timeline: Outline what remains to be done. Update your original timeline if needed
Open questions: Highlight areas where you’d like feedback or assistance. Questions we don’t get to can be addressed offline (either through office hours or by appointment)

Final (Weeks 14 and 15)

Report and Presentation

Report: Your final report should be written as a conference research paper. Please use the double-column ACM paper template linked below (available in Word or LaTeX). It should be 5-6 pages (not including references or the division of work attestation). While the exact structure is not fixed, it should include the following:

Introduction: Problem you’re trying to solve, the challenges, why existing solutions are insufficient, and an overview of your solution
Background: Any relevant background the reader should be familiar with. It can also be a deeper dive into the motivation
Design/architecture: This may span multiple sections, but should cover your solution and corresponding insights. If you tried multiple options and some didn’t work, discuss them as well!
Evaluation: Your results. Please make all graph text legible without significant effort (e.g., without needing 10x zoom). As with Design/architecture, you are encouraged to show and discuss “bad” or “suboptimal” results!
- Note: results should be explained and discussed beyond just “here are the numbers.” For bad/suboptimal results, discuss where you think you can improve or why they turned out as such
Relevant work: An overview of the relevant work in this space, and how your solution differs
Future work/limitations: Discuss known limitations of your work, as well as potential future directions to explore. If relevant, also discuss what new avenues of work are enabled by your work (e.g., a benchmark suite to develop new types of systems for video analysis)
- Note: this section is not optional: we want you to think about how someone can expand on your work, or what you might explore if given more time
Conclusion: A summary of the problem, your solution, and the evaluation
Division of work attestation: Explain the division of work among project members. This includes: dataset gathering, infrastructure setup, software development, presentation preparation, and proposal/final report writing. Describe any use of generative AI for your project here. Please refer to the course policy on the use of generative AI.

We recommend using the research papers you read in the course as examples for the sections, topics per section, and structure.

Presentation: Each group will be allotted a 20-minute slot (15 minutes for the presentation + 5 minutes for questions). A general suggestion is to structure it as a shorter version of what you presented with your paper reviews. Do not try to present all of your results: focus on 1-2.

Software

You are required to submit working code as part of your project. This includes a README that describes how to run and replicate the results from your report, as well as any known limitations or corner cases. If you use any open-source datasets, please link them in your report and README.

Submission will be done on Gradescope, which allows you to either upload your code or link a GitHub repository and branch.

Example Projects

Note: these examples will need to be expanded on in your proposal. You are welcome to propose your own project, as long as it is relevant in the visual analysis space. If you are unsure if your proposed project is relevant, please discuss it with the instructors.

For edge-cloud systems on metered networks or in cost-constrained environments, deciding how to use the network budget is challenging, especially with video and audio involved

Consider different scenarios to model in terms of the amount of data used, the telemetry tracked, how frequently it is updated, etc. For example, with video you will want to consider bitrate, but also resolution, compression technique, and FPS, since video quality also matters. You will need to collect profiling numbers to build this model.
Build an algorithm to optimize network usage under different factors, such as the number of devices, the amount of video, and network conditions (e.g., degraded networks)
Determine different ways to reduce the amount of data transferred (e.g., filter more on an edge device). Build this into your model
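As a starting point for the model described above, here is a minimal sketch in Python. It assumes each candidate encoding has an average bitrate and a quality score taken from your own profiling. The `StreamConfig` class, the `pick_config` function, and all numbers are hypothetical placeholders, not measurements or a prescribed design.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class StreamConfig:
    """One candidate encoding. Values are placeholders, not profiled numbers."""
    name: str
    bitrate_kbps: float  # average encoded bitrate in kilobits per second
    quality: float       # hypothetical quality score from profiling, in [0, 1]


def bytes_per_hour(bitrate_kbps: float) -> float:
    """Convert an average bitrate (kilobits/s) into bytes transferred per hour."""
    return bitrate_kbps * 1000 / 8 * 3600


def pick_config(
    configs: Sequence[StreamConfig],
    num_devices: int,
    hours_per_device: float,
    budget_bytes: float,
) -> Optional[StreamConfig]:
    """Return the highest-quality config whose total transfer fits the budget.

    Returns None when no candidate fits.
    """
    feasible = [
        c for c in configs
        if bytes_per_hour(c.bitrate_kbps) * hours_per_device * num_devices
        <= budget_bytes
    ]
    return max(feasible, key=lambda c: c.quality, default=None)


if __name__ == "__main__":
    candidates = [
        StreamConfig("low", bitrate_kbps=500, quality=0.6),
        StreamConfig("high", bitrate_kbps=4000, quality=0.9),
    ]
    # 10 devices, 2 hours each, 10 GB budget (assumed values).
    print(pick_config(candidates, 10, 2, 10e9))
```

A real project would replace the single quality score with profiled accuracy per query type, and would account for varying network conditions and edge-side filtering rather than a fixed bitrate per device.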

Choose a type of model in visual analysis (e.g., VLMs, ALMs, translation, etc.) and characterize its use for visual analysis applications

Understand the accuracy, latency, power, and resource consumption of these models across a wide range of video/audio inputs and queries
For what applications or use cases would we use one (or more) of these models? Propose an algorithm/scheduler that optimizes for different queries and user requirements

Put together a benchmark suite and testing framework for evaluating different visual analysis systems or optimizers using different multimodal inputs (audio, video, etc.)

Consider the different characteristics that need to be studied and analyzed (e.g., length of input, content of audio, etc.)
Consider the underlying system-under-test framework
Evaluate different systems or approaches and report metrics of interest (throughput, latency, accuracy, etc.) using your framework
Alternative: explore dataset curation systems. Use Mixtera as inspiration for video/audio to meet cost/performance/accuracy goals
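To make the testing-framework idea above concrete, here is a minimal sketch of a harness in Python that times a system-under-test on a list of inputs and reports latency and throughput. It treats the system as a plain callable, which is an assumption. The `run_benchmark` function and its result keys are hypothetical names chosen for illustration.

```python
import math
import statistics
import time
from typing import Any, Callable, Dict, Iterable


def run_benchmark(
    system: Callable[[Any], Any], inputs: Iterable[Any]
) -> Dict[str, float]:
    """Run `system` once per input and collect simple performance metrics."""
    latencies = []
    start = time.perf_counter()
    for item in inputs:
        t0 = time.perf_counter()
        system(item)  # the system-under-test; its output is ignored here
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    if not latencies:
        raise ValueError("need at least one input")

    # Nearest-rank 95th percentile over the recorded latencies.
    ordered = sorted(latencies)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)

    return {
        "count": len(latencies),
        "mean_latency_s": statistics.mean(latencies),
        "p95_latency_s": ordered[p95_index],
        "throughput_per_s": len(latencies) / elapsed if elapsed > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Stand-in workload: replace with a call into the system being evaluated.
    print(run_benchmark(lambda frame: sum(range(frame)), [10_000] * 50))
```

Accuracy is deliberately left out of this sketch, since it depends on the task and on ground-truth labels. A fuller framework would also add warm-up runs, repeated trials, and per-input metadata such as input length or content type.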

Explore context engineering for visual analytics

Choose 2+ VLMs and understand how different video inputs (resolution, FPS, bitrate, etc.) and contexts impact their accuracy for different queries
Based on your findings, propose a context management solution (e.g., a vector database or a RAG system) to provide the relevant context to improve accuracy/latency/throughput
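One simple form of context management is retrieving the stored context entries most similar to a query. The sketch below, in plain Python, ranks entries by cosine similarity between embedding vectors. It assumes you already have embeddings from some model. The `top_k` function and the toy two-dimensional vectors are hypothetical and only illustrate the retrieval step, not a full RAG system.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    store: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[str]:
    """Return the texts of the k stored entries most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy store: (context text, embedding). Real embeddings come from a model.
    store = [
        ("red car enters frame", [1.0, 0.0]),
        ("crowd cheering", [0.0, 1.0]),
        ("car near crowd", [1.0, 1.0]),
    ]
    print(top_k([1.0, 0.0], store, k=2))
```

A linear scan like this is only practical for small stores. A vector database would replace it at scale, and the project's findings should drive what gets stored and retrieved.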

Estimate how video/visual processing impacts carbon emissions

Consider the different stages in video processing and analysis
Consider different scenarios: edge, cloud, etc.
Consider different load patterns
CarbonExplorer can be a good starting point
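For a first back-of-envelope estimate, operational emissions can be approximated as energy consumed multiplied by the carbon intensity of the electricity grid. The Python sketch below applies that to per-stage power and runtime. It is independent of CarbonExplorer, and the stage names, power draws, and intensity value are placeholder assumptions rather than measured data.

```python
from typing import Iterable, Tuple


def stage_emissions_g(
    power_watts: float, hours: float, intensity_g_per_kwh: float
) -> float:
    """Estimate grams of CO2-equivalent for one stage: energy (kWh) x intensity."""
    energy_kwh = power_watts * hours / 1000
    return energy_kwh * intensity_g_per_kwh


def pipeline_emissions_g(
    stages: Iterable[Tuple[str, float, float]], intensity_g_per_kwh: float
) -> float:
    """Sum emissions over stages given as (name, average watts, hours)."""
    return sum(
        stage_emissions_g(watts, hours, intensity_g_per_kwh)
        for _, watts, hours in stages
    )


if __name__ == "__main__":
    # Placeholder stages and numbers; substitute measured power and runtime.
    stages = [("decode", 50.0, 2.0), ("inference", 250.0, 2.0)]
    print(pipeline_emissions_g(stages, intensity_g_per_kwh=400.0))
```

This ignores embodied carbon, network transfer, and time-varying grid intensity, all of which matter when comparing edge and cloud scenarios or different load patterns.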

Resources

Paper and presentation

Compute resources

PACE-ICE
Google Cloud (thank you to those who have provided compute credits for this course)

Datasets

VIRAT (traffic surveillance)
BDD100K (dashcam footage)
SoccerNet (broadcast soccer games)
YouTube-8M (diverse set of YouTube videos)
UCF101 (action recognition)
ActivityNet (human activity)
Microsoft Azure traces (for generating different load patterns)