Overview

A major component of this course is a semester-long research project. The project has several goals:

  • Propose a graduate-level research project in writing, creating a clear roadmap of what you will explore and the challenges you expect to address
  • Execute your proposed research by collecting or creating datasets, building cloud/edge infrastructure, and gathering performance metrics
  • Present your research at various stages, from the initial proposal to the final results
  • Incorporate feedback throughout these stages

The project is divided into three phases: proposal, midterm, and final. All three phases include a presentation, while the proposal and final also include written components. Presentations serve as checkpoints, giving you the opportunity to share progress and receive feedback from both instructors and classmates.

Projects are done in groups of 2-3 students.

Proposal (Week 4)

The proposal should be 2-3 pages (single column, minimum 10pt font size) and include the following:

  • Motivation: Why is this relevant? Describe the application, use case, and real-world implications. What aspect are you focusing on, and why is it worth researching?
  • Project summary: How will you address the problem? Outline your approach, success metrics (e.g., latency, throughput, cost), what you plan to build, and the measurements you’ll take. A high-level overview is fine at this stage
  • Relevant work/baselines: What prior work, systems, or algorithms exist? Summarize their contributions, limitations, and assumptions. Explain why further exploration is needed. If you’ll build on these systems, note that (especially if open-source)
  • Datasets/benchmarks: What datasets will you use? If known, describe them. Otherwise, specify the type of data you’ll need
  • Resources: What resources will you use (e.g., cloud, laptop, university servers)? This helps staff support your project
  • Timeline: Propose a semester-long outline. It can evolve, but should give a clear roadmap (especially past the midterm)
  • Open questions: List any uncertainties or challenges you haven’t resolved.

Presentation: You will prepare an 8-10 minute overview of your project, covering the sections above. Make sure to include the open questions for discussion and feedback.

Midterm (Week 10)

The midterm check-in is a presentation to share early results, demonstrate progress, get feedback, and receive help if you’re stuck.

Your 8-10 minute presentation should include the following:

  • Recap: Briefly remind the audience of your project and what problem you’re tackling
  • Approach overview: Describe what you’ve done so far: what’s worked, what hasn’t, and any key insights (especially counter-intuitive ones). Spend most of your time here, including any challenges or roadblocks
  • Next steps/timeline: Outline what remains to be done. Update your original timeline if needed
  • Open questions: Highlight areas where you’d like feedback or assistance. Questions we don’t get to can be addressed offline (either through office hours or by appointment)

Final (Weeks 14 and 15)

Report and Presentation

Report: Your final report should be written as a conference research paper. Please use the double-column ACM paper template linked below (available in Word or LaTeX). The report should be 5-6 pages (not including references or the division of work attestation). While the exact structure is not fixed, it should include the following:

  • Introduction: Problem you’re trying to solve, the challenges, why existing solutions are insufficient, and an overview of your solution
  • Background: Any relevant background the reader should be familiar with. It can also be a deeper dive into the motivation
  • Design/architecture: This may span multiple sections, but should cover your solution and corresponding insights. If you tried multiple options and some didn’t work, discuss them as well!
  • Evaluation: Your results. Please make all graph text legible without significant effort (e.g., needing 10x zoom). As with Design/architecture, you are encouraged to show and discuss “bad” or “suboptimal” results!
    • Note: results should be explained and discussed beyond just “here are the numbers.” For bad/suboptimal results, discuss where you think you can improve or why they turned out as such
  • Relevant work: An overview of the relevant work in this space, and how your solution differs
  • Future work/limitations: Discuss known limitations of your work, as well as potential future directions to explore. If relevant, also discuss what new avenues of work are enabled by your work (e.g., a benchmark suite to develop new types of systems for video analysis)
    • Note: this section is not optional: we want you to think about how someone can expand on your work, or what you might explore if given more time
  • Conclusion: A summary of the problem, your solution, and the evaluation
  • Division of work attestation: Explain the division of work among project members. This includes: dataset gathering, infrastructure setup, software development, presentation preparation, and proposal/final report writing. Describe any use of generative AI for your project here. Please refer to the course policy on the use of generative AI.

We recommend using the research papers you read in the course as examples for the sections, topics per section, and structure.

Presentation: Each group will be allotted a 20-minute slot (15 minutes for presentation + 5 minutes for questions). A general suggestion is to structure it as a shorter version of what you presented with your paper reviews. Do not try to present all of your results: focus on 1-2.

Software

You are required to submit working code as part of your project. This includes a README that describes how to run and replicate the results from your report, as well as any known limitations or corner cases. If you use any open-source datasets, please link them in your report and README.

Submission will be done on GradeScope, which allows you to either upload your code or link a GitHub repository and branch.

    Example Projects

    Note: these examples will need to be expanded on in your proposal. You are welcome to propose your own project, as long as it is relevant in the visual analysis space. If you are unsure if your proposed project is relevant, please discuss it with the instructors.

    For edge-cloud systems on metered networks or in cost-constrained environments, deciding how to use the network budget is challenging, especially when video and audio are involved

    • Consider different scenarios to model in terms of the amount of data used, the telemetry tracked, how frequently it is updated, etc. For example, with video you will want to consider bitrate, but also resolution, compression techniques, and FPS, because video quality also matters. You will need to collect profiling numbers to build this model.
    • Build an algorithm to optimize network usage while considering factors like the number of devices, the amount of video, and different network conditions (e.g., degraded networks)
    • Determine different ways to reduce or otherwise optimize the amount of data transferred (e.g., filter more on an edge device). Build this into your model
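
    As a starting point for the network budget model described above, here is a minimal Python sketch, not a prescribed method. It assumes bitrate can be approximated as pixels per frame times FPS times a bits-per-pixel factor. `StreamConfig`, `bits_per_pixel`, and the function names are hypothetical, and real values should come from your own profiling numbers.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StreamConfig:
    """One candidate encoding for a single device (hypothetical structure)."""
    width: int
    height: int
    fps: float
    bits_per_pixel: float  # assumed compression factor; replace with profiled values


def bitrate_bps(cfg: StreamConfig) -> float:
    """Approximate bitrate: pixels per frame * frames per second * bits per pixel."""
    return cfg.width * cfg.height * cfg.fps * cfg.bits_per_pixel


def monthly_gigabytes(cfg: StreamConfig, devices: int,
                      hours_per_day: float, days: int = 30) -> float:
    """Estimated data transferred by all devices over the period, in GB (10^9 bytes)."""
    seconds = hours_per_day * 3600 * days
    total_bits = bitrate_bps(cfg) * seconds * devices
    return total_bits / 8 / 1e9


def configs_within_budget(candidates: List[StreamConfig], devices: int,
                          hours_per_day: float, budget_gb: float) -> List[StreamConfig]:
    """Keep only the candidate encodings whose estimated usage fits the budget."""
    return [c for c in candidates
            if monthly_gigabytes(c, devices, hours_per_day) <= budget_gb]
```

    A fuller model would also score each surviving configuration by video quality or downstream accuracy, which this sketch leaves out.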

    Choose a type of model in visual analysis (e.g., VLMs, ALMs, translation) and characterize its use for visual analysis applications

    • Understand its accuracy, latency, power, and resource consumption across a wide range of video/audio inputs and queries
    • For which applications or use cases would we use one (or more) of these models? Propose an algorithm/scheduler that optimizes for different queries and user requirements
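
    One way to begin the latency part of this characterization is a small timing harness. The Python sketch below is illustrative only: `profile_latency` and `run_model` are hypothetical names, and `run_model` stands in for whatever inference call your chosen model exposes. Accuracy, power, and resource consumption would need separate instrumentation.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def profile_latency(run_model: Callable[[object], object],
                    inputs: Sequence[object],
                    warmup: int = 1) -> Dict[str, float]:
    """Time run_model on each input and summarize per-request latency."""
    if not inputs:
        raise ValueError("inputs must not be empty")

    # Warm-up calls are excluded from the measurements (caches, lazy loading).
    for item in inputs[:warmup]:
        run_model(item)

    samples = []
    for item in inputs:
        start = time.perf_counter()
        run_model(item)
        samples.append(time.perf_counter() - start)

    samples.sort()
    # Simple nearest-rank style index for the 95th percentile.
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    total = sum(samples)
    return {
        "mean_s": statistics.mean(samples),
        "median_s": statistics.median(samples),
        "p95_s": samples[p95_index],
        "throughput_per_s": len(samples) / total if total > 0 else float("inf"),
    }
```

    Running this harness across inputs of different lengths, resolutions, or query types gives the per-scenario numbers a scheduler could later use.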

    Put together a benchmark suite and testing framework for evaluating different visual analysis systems or optimizers using different multimodal inputs (audio, video, etc.)

    • Consider the different characteristics that need to be studied and analyzed (e.g., length of input, content of audio, etc.)
    • Consider the underlying system-under-test framework
    • Evaluate different systems or approaches and report metrics of interest (throughput, latency, accuracy, etc.) using your framework
    • Alternative: explore dataset curation systems. Use Mixtera as inspiration for curating video/audio datasets that meet cost/performance/accuracy goals
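
    To make the system-under-test idea concrete, the Python sketch below shows one possible shape for a benchmark harness. It is an assumption, not a required design: `Sample`, `run_benchmark`, and the `group` label (for example, clip length) are hypothetical, and the system under test is modeled as a plain callable.

```python
import time
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Sample:
    """One benchmark input with its expected answer (hypothetical structure)."""
    payload: object
    expected: object
    group: str  # input characteristic to report on, e.g. "short_clip"


def run_benchmark(system: Callable[[object], object],
                  samples: List[Sample]) -> Dict[str, float]:
    """Run every sample through the system and report accuracy and mean latency."""
    if not samples:
        raise ValueError("samples must not be empty")

    correct = defaultdict(int)
    total = defaultdict(int)
    start = time.perf_counter()
    for sample in samples:
        total[sample.group] += 1
        if system(sample.payload) == sample.expected:
            correct[sample.group] += 1
    elapsed = time.perf_counter() - start

    # Accuracy per input group, plus overall accuracy and mean latency.
    report = {f"accuracy/{g}": correct[g] / total[g] for g in total}
    report["accuracy/overall"] = sum(correct.values()) / len(samples)
    report["mean_latency_s"] = elapsed / len(samples)
    return report
```

    Treating each system as a callable keeps the harness independent of any one framework, so different systems or optimizers can be compared on the same samples.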

    Explore context engineering for visual analytics

    • Choose two or more VLMs and understand how different video inputs (resolution, FPS, bitrate, etc.) and contexts impact their accuracy for different queries
    • Based on your findings, propose a context management solution (e.g., a vector database or a RAG system) to provide the relevant context to improve accuracy/latency/throughput
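
    As a rough illustration of the retrieval step behind a vector database or RAG system, the Python sketch below ranks stored context snippets by cosine similarity to a query embedding. It assumes the embeddings already exist as plain lists of floats. `cosine_similarity` and `top_k_context` are hypothetical helper names, and a real system would use an embedding model and an indexed store rather than a linear scan.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_context(query: Sequence[float],
                  store: List[Tuple[str, Sequence[float]]],
                  k: int = 2) -> List[str]:
    """Return the k stored snippets whose embeddings are most similar to the query."""
    ranked = sorted(store,
                    key=lambda item: cosine_similarity(query, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]
```

    The retrieved snippets would then be added to the VLM prompt, and the effect on accuracy, latency, and throughput measured against a no-context baseline.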

    Estimate how video/visual processing impacts carbon emissions

    • Consider the different stages in video processing and analysis
    • Consider different scenarios: edge, cloud, etc.
    • Consider different load patterns
    • CarbonExplorer can be a good starting point
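
    A first-order estimate can be written as energy consumed times the carbon intensity of the electricity used. The Python sketch below shows only that arithmetic and does not use CarbonExplorer. The function names, the `pue` overhead factor, and any numbers you plug in are assumptions to replace with measured power draw and real grid data.

```python
def energy_kwh(avg_power_watts: float, duration_hours: float, pue: float = 1.0) -> float:
    """Energy in kWh: average power (W) * time (h) / 1000, scaled by an overhead factor.

    pue is an assumed facility overhead multiplier (1.0 means no overhead,
    e.g. for a standalone edge device).
    """
    return avg_power_watts * duration_hours * pue / 1000.0


def emissions_gco2e(energy_in_kwh: float, intensity_g_per_kwh: float) -> float:
    """Emissions in grams CO2-equivalent for a given grid carbon intensity (g/kWh)."""
    return energy_in_kwh * intensity_g_per_kwh


def pipeline_emissions_gco2e(stages, intensity_g_per_kwh: float, pue: float = 1.0) -> float:
    """Sum emissions over pipeline stages given as (avg_power_watts, duration_hours) pairs."""
    total_kwh = sum(energy_kwh(watts, hours, pue) for watts, hours in stages)
    return emissions_gco2e(total_kwh, intensity_g_per_kwh)
```

    Evaluating the same stages with different `pue` and intensity values is one simple way to compare edge and cloud scenarios or different load patterns.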

    Resources

    Paper and presentation

    Compute resources

    Datasets