Time and Venue
3 October 2025, 3:30 to 5:00 PM (ET)
Klaus 1116, Klaus Advanced Computing Building (KACB), Georgia Tech
Title and Abstract
Fine-Tuning (supervised and reinforcement learning) for Science
Fine-tuning large language models (LLMs) is key to adapting them for scientific problems, but choosing the right approach and preparing data can be challenging. We will motivate why fine-tuning is necessary, starting with the role of data and using examples from chemistry and other sciences to show how to identify, curate, and, when necessary, create datasets to address new research questions. This discussion leads naturally to the question of when each fine-tuning method is appropriate.
We will then cover supervised fine-tuning with parameter-efficient fine-tuning (PEFT) methods such as Low-Rank Adaptation (LoRA), which make high-quality training feasible on modest GPU resources. Building on this foundation, we will introduce reinforcement learning approaches, including reinforcement learning from human feedback (RLHF) and RL with symbolic feedback (RLSF), which was developed by our group, and will touch on recent methods such as Direct Preference Optimization (DPO) and Group Relative Policy Optimization (GRPO).
By the end of this session, participants will understand how to move from a scientific problem to a fine-tuned model: deciding when to curate or create data, selecting appropriate fine-tuning methods, and recognizing when reinforcement learning is required.
Instructors

Piyush Jha
Ph.D. Student,
School of Computer Science,
Georgia Tech

Prithwish Jana
Ph.D. Student,
School of Computer Science,
Georgia Tech
Prithwish’s research lies at the intersection of neuro-symbolic AI, formal methods, AI for code, and AI for mathematics. He develops neuro-symbolic techniques for fine-tuning large language models (LLMs), integrating symbolic reasoning tools and formal methods to enhance their reasoning capabilities in software engineering (e.g., code translation and code generation) and in mathematical reasoning (e.g., automated proof synthesis in Lean and proof auto-formalization).