Location: Midtown Room #126
Multi-Resolution Spatio-Temporal Prediction with Application to Wind Power Generation
Zheng Dong (Georgia Institute of Technology)
A-1
Authors and Co-authors: Zheng Dong, Hanyu Zhang, Yao Xie, Shixiang Zhu, Pascal Van Hentenryck
Abstract: Wind energy is becoming an increasingly crucial component of a sustainable grid, but its inherent variability and limited predictability present challenges for grid operators. This paper presents a novel approach to wind speed prediction with uncertainty quantification using a multi-resolution spatio-temporal model. By leveraging information from multiple sources of predictions with varying accuracies and uncertainties, the joint framework provides a more accurate and robust prediction of wind speed while measuring the uncertainty in these predictions. We assess the effectiveness of our proposed framework using real-world wind data obtained from the Midwest region of the United States. Our results demonstrate that the framework enables predictors with varying data resolutions to learn from each other, leading to an enhancement in overall predictive performance. The goal of this research is to improve grid operation and management by aiding system operators and policymakers in making better-informed decisions related to energy demand management, energy storage system deployment, and energy supply scheduling. This results in potentially further integration of renewable energy sources into the existing power systems.
Quasi-Likelihoods for DNA Substitution Models With Site Rate Variability
Greg Ellison (University of Georgia)
A-2
Authors and Co-authors: Greg Ellison & Liang Liu
Abstract: Molecular evolutionary processes give rise to new species by mutating the DNA structure over generations. Understanding precisely when species divergence occurred can grant insight into historical evolutionary processes and clarify relationships among similar organisms. Models of DNA mutation such as the General Time Reversible (GTR) model allow estimation of divergence times and rates of mutation among nucleotides within a probabilistic framework. Using the GTR model, we formulate quasi-likelihoods based on pairwise and triplet comparisons of DNA sequences, allowing for fast estimation of evolutionary distances in the presence of variable mutation rates across sites. We propose a two-step estimator which simultaneously estimates average divergence times and the rate variability. We examine the properties of our approach and compare this estimator to existing full-likelihood methods through extemsive simulation studies.
Knowledge Cascade: Reverse Knowledge Distillation on Nonparametric Multivariate Functional Estimation
Luyang Fang (University of Georgia)
A-3
Authors and Co-authors: Luyang Fang, Haoran Lu, Wenxuan Zhong, Ping Ma
Abstract: The knowledge distillation methods have achieved remarkable performance in compressing the knowledge learned by a large-and-complex model (teacher) to a small-and-simple model (student), resulting in easier deployment while maintaining performance. However, classic knowledge distillation methods highly depend on a well-trained teacher model, which can be computationally expensive to train and may not always be available. Motivated by the challenges in developing teacher models, we propose the knowledge cascade (KCas), a reversed version of knowledge distillation that utilizes the knowledge learned by the student model to help train the teacher model. Although this is challenging since the teacher model often contains more information, we show that KCas is possible by taking advantage of statistical theories. We demonstrate KCas on the nonparametric multivariate functional estimation in reproducing kernel Hilbert space. One crucial problem is the daunting computational cost of selecting smoothing parameters, whose number increases exponentially with the number of predictors. KCas transfers the knowledge of smoothing parameters learned from the student model to the teacher model based on empirical and asymptotic results, significantly reducing the computational burden in high-dimensional and large datasets. Empirical evaluations on simulated and real data demonstrate the effectiveness of our KCas method.
Targeted Machine Learning for Average Causal Effect Estimation Using the Front-Door Functional
Anna Guo (Emory University)
A-4
Authors and Co-authors: Anna Guo, David Benkeser, and Razieh Nabi
Abstract: A commonly employed strategy for evaluating the average causal effect (ACE) of a treatment on an outcome of interest is through finding a back-door adjustment set that blocks all the confounding paths between treatment and outcome. However, the use of back-door covariate adjustment as a basis for inference is often untenable in observational studies due to the presence of unmeasured factors confounding the treatment-outcome relationship. An alternative strategy is to use the front-door criterion, which can allow for ACE identification even in the presence of unmeasured treatment-outcome confounders. The front-door criterion requires the existence of mediator(s) that are not affected by the unmeasured confounding sources and that fully mediate the effect of treatment on outcome. Here, we propose and evaluate a set of estimation strategies for the front-door functional based on the targeted minimum loss estimation theory. The proposed estimators are applicable to various scenarios accommodating binary, continuous, and multivariate mediators. They address the limitations of existing approaches by using data-adaptive machine learning algorithms that encode fewer modeling assumptions while ensuring desirable statistical properties such as asymptotic linearity, double-robustness, efficiency, and guaranteed estimates within the target parameter space. We establish conditions for nuisance functional estimations that are sufficient to ensure the $\sqrt{n}$-consistency of estimators for ACE as the target parameter of inference. Our numerical experiments demonstrate the favorable finite sample performance of the proposed estimators. We illustrate the applicability of these estimators to analyze the effect of early stage academic performance on future yearly income using data from the Finnish Social Science Data Archive.
Enhancing Sample Quality through Minimum Energy Importance Weights
Chaofan Huang (Georgia Institute of Technology)
A-6
Authors and Co-authors: Chaofan Huang and V. Roshan Joseph
Abstract: Importance sampling is a powerful tool for correcting the distributional mismatch in many statistics and machine learning problems, but in practice its performance is limited by the usage of simple proposals such that the importance weights can be computed analytically. To address this limitation, Liu and Lee (2017) proposed a Black-Box Importance Sampling (BBIS) algorithm that computes the importance weights for arbitrary simulated samples by minimizing the kernelized Stein discrepancy. However, this requires knowing the score function of the target distribution, which is not easy to obtain for many Bayesian problems. Hence, in this paper we propose another novel BBIS algorithm, BBIS-MED, that requires only the unnormalized density function. We demonstrate the effectiveness and wide applicability of our proposed BBIS-MED on extensive simulations and a real-world Bayesian model calibration problem where the score function cannot be derived analytically.
Jackknife empirical likelihood methods for the Cox Regression Model
Ali Jinnah (Georgia State University)
A-7
Authors and Co-authors: Ali Jinnah, Lauren Drinkard & Yichuan Zhao
Abstract: In this paper, we propose a jackknife empirical likelihood method to draw inference for the regression parameters in Cox regression model. We develop the jackknife empirical likelihood (JEL), adjusted jackknife empirical likelihood (AJEL), mean jackknife empirical likelihood (MJEL), and transformed jackknife empirical likelihood (TJEL) methods. We profile the set of nuisance parameters to study the parameter of interest. Extensive simulation studies show that the proposed jackknife empirical likelihood methods have better performance than the normal approximation in certain cases. We apply the proposed methods to study the Bone Marrow Transplant Patients (BMT), Larynx, and Myeloma real datasets for illustration.
GENERAL LOSS FUNCTIONS LEAD TO (APPROXIMATE) INTERPOLATION IN HIGH DIMENSIONS
Kuo-Wei Lai (Georgia Institute of Technology)
A-8
Authors and Co-authors: Kuo-Wei Lai, Vidya Muthukumar
Abstract: We provide a unified framework, applicable to a general family of convex losses and across binary and multiclass settings in the overparameterized regime, to approximately characterize the implicit bias of gradient descent in closed form. Specifically, we show that the implicit bias is approximated (but not exactly equal to) the minimum-norm interpolation in high dimensions, which arises from training on the squared loss. In contrast to prior work which was tailored to exponentially-tailed losses and used the intermediate support-vector-machine formulation, our framework directly builds on the primal-dual analysis of Ji and Telgarsky (2021), allowing us to provide new approximate equivalences for general convex losses through a novel sensitivity analysis. Our framework also recovers existing exact equivalence results for exponentially-tailed losses across binary and multiclass settings. Finally, we provide evidence for the tightness of our techniques and use our
results to demonstrate the effect of certain loss functions designed for out-of-distribution problems on the closed-form solution.
results to demonstrate the effect of certain loss functions designed for out-of-distribution problems on the closed-form solution.
Accelerated and instance-optimal policy evaluation with linear function approximation
Tianjiao Li (Georgia Institute of Technology)
A-10
Authors and Co-authors: Tianjiao Li, Guanghui Lan, Ashwin Pananjady
Abstract: We study the problem of policy evaluation with linear function approximation and present efficient and practical algorithms that come with strong optimality guarantees. We begin by proving lower bounds that establish baselines on both the deterministic error and stochastic error in this problem. In particular, we prove an oracle complexity lower bound on the deterministic error in an instance-dependent norm associated with the stationary distribution of the transition kernel, and use the local asymptotic minimax machinery to prove an instance-dependent lower bound on the stochastic error in the i.i.d. observation model.
Existing algorithms fail to match at least one of these lower bounds: To illustrate, we analyze a variance-reduced variant of temporal difference learning, showing in particular that it fails to achieve the oracle complexity lower bound. To remedy this issue, we develop an accelerated, variance-reduced fast temporal difference algorithm (VRFTD) that simultaneously matches both lower bounds and attains a strong notion of instance-optimality. Finally, we extend the VRFTD algorithm to the setting with Markovian observations, and provide instance-dependent convergence results. Our theoretical guarantees of optimality are corroborated by numerical experiments.
Existing algorithms fail to match at least one of these lower bounds: To illustrate, we analyze a variance-reduced variant of temporal difference learning, showing in particular that it fails to achieve the oracle complexity lower bound. To remedy this issue, we develop an accelerated, variance-reduced fast temporal difference algorithm (VRFTD) that simultaneously matches both lower bounds and attains a strong notion of instance-optimality. Finally, we extend the VRFTD algorithm to the setting with Markovian observations, and provide instance-dependent convergence results. Our theoretical guarantees of optimality are corroborated by numerical experiments.
Optimal Sensor Allocation for Linear Physical Dispersion Model with Nonnegativity Constraint
Xinchao Liu (Georgia Institute of Technology)
B-1
Authors and Co-authors: Xinchao Liu, Dzung Phan, Youngdeok Hwang, Levente Klein, Xiao Liu*, and Kyongmin Yeo
Abstract: Optimal experimental design (OED) is a principled approach for sensor placement to optimally collect data during engineered or natural experiments, such as medical X-ray imaging and methane leakage detection. We consider linear inverse problems with nonnegativity constraints, such as estimating methane emission rates (>0) based on sensor recordings. Our motivating problem is how to optimally deploy sensors for atmospheric inverse modeling based on a linear physical dispersion model (i.e., Gaussian plume model), while the closed-form design criteria (e.g., A-optimal and D-optimal) do not exist due to the nonnegativity constraint of emission rates. A bi-level optimization framework is employed to model the OED problem. A stochastic objective is considered in the outer-level optimization (i.e., total estimation loss) and elastic net regularization is set to the inner-level optimization (i.e., constrained inverse model). We solve the stochastic bi-level optimization problem by deriving a hyper-gradient of the outer objective w.r.t sensor locations while considering the inner KKT system. For the application of our bi-level optimization methodology, we proposed a framework consisting of a global solver (i.e., the approximate A-optimal considering various possible wind conditions) and a local solver (i.e., the proposed bi-level optimization with non-negativity constraints). The combination of a global solver and a local solver gave us the final optimal designs of sensor locations. For numerical examples, two first-order iterative algorithms (i.e., repeated SAA-based and SGD-based) are compared, analyzed, and tested on illustrative examples. The SGD-based algorithm is also tested in realistic examples. The scalability of the SGD-based approach has been demonstrated. Theoretical convergence analysis of the proposed bi-level optimization is also provided.
Spectra Decomposition in Surface-enhanced Raman Spectroscopy (SERS): Extract Pure Spectra Using Deep Learning Method
Yufang Liu (University of Georgia)
B-2
Authors and Co-authors: Yufang Liu, Haoran Lu, Jiaheng Cui, Yanjun Yang, Xianyan Chen, Wenxuan Zhong, Ping Ma, Yiping Zhao
Abstract: The SERS-based analysis involves acquiring virus samples, placing them onto a SERS substrate, and subsequently recording SERS spectra. An essential step in the measurement of SERS spectra for the detection of pure analyte spectra is the decomposition of these spectra. In our current investigation, we address the challenge of extracting the pure spectrum of a component from mixed spectra that include a known diluent. To surmount this obstacle, we employ a deep learning approach to deduce the untainted analyte spectra using the mixture spectra collected at various analyte concentration levels. The proposed approach not only identifies pure analyte spectra but also discerns the influence of analyte spectra and background spectra in different concentration levels.
Alternating minimization for generalized rank one matrix sensing: Sharp predictions from a random initialization
Mengqi Lou (Georgia Institute of Technology)
B-3
Authors and Co-authors: Kabir Aladin Chandrasekher, Mengqi Lou, Ashwin Pananjady
Abstract: We consider the problem of estimating the factors of a rank-1 matrix with i.i.d. Gaussian, rank-1 measurements that are nonlinearly transformed and corrupted by noise. Considering two prototypical choices for the nonlinearity, we study the convergence properties of a natural alternating update rule for this nonconvex optimization problem starting from a random initialization. We show sharp convergence guarantees for a sample-split version of the algorithm by deriving a deterministic recursion that is accurate even in high-dimensional problems. Notably, while the infinite-sample population update is uninformative and suggests exact recovery in a single step, the algorithm—and our deterministic prediction—converges geometrically fast from a random initialization. Our sharp, non-asymptotic analysis also exposes several other fine-grained properties of this problem, including how the nonlinearity and noise level affect convergence behavior.
On a technical level, our results are enabled by showing that the empirical error recursion can be predicted by our deterministic sequence within fluctuations of the order n^{−1/2} when each iteration is run with n observations. Our technique leverages leave-one-out tools originating in the literature on high-dimensional M-estimation and provides an avenue for sharply analyzing complex iterative algorithms from a random initialization in other high-dimensional optimization problems with random data.
Refining Multivariate Forecasting with Adaptive Temporal-Contextual Learning
Jiecheng Lu (Georgia Institute of Technology)
B-4
Authors and Co-authors: Jiecheng Lu, Xu Han, Shihao Yang
Abstract: Long-term time series forecasting (LTSF) is important for various domains but is confronted by challenges in handling the complex temporal-contextual relationships. As multivariate input models underperforming some recent univariate counterparts, we posit that the issue lies in the inefficiency of existing multivariate LTSF Transformers to model series-wise relationships: the characteristic differences between series are often captured incorrectly. To address this, we introduce ARM: a multivariate temporal-contextual adaptive learning method, which is an enhanced architecture specifically designed for multivariate LTSF modelling. ARM employs Adaptive Univariate Effect Learning (AUEL), Random Dropping (RD) training strategy, and Multi-kernel Local Smoothing (MKLS), to better handle individual series temporal patterns and correctly learn inter-series dependencies. ARM demonstrates superior performance on multiple benchmarks without significantly increasing computational costs compared to vanilla Transformer, thereby advancing the state-of-the-art in LTSF. ARM is also generally applicable to other LTSF architecture beyond vanilla Transformer.
Graphon Estimation via Moment Embedding and Smoothing Spline
Haoran Lu (University of Georgia)
B-5
Authors and Co-authors: Huimin Cheng; Ping Ma; Wenxuan Zhong
Abstract: Nowadays, network data analysis has drawn much attention. An elegant way to analyze network data is via network limits, graphon. However, the estimation of graphon is theoretically difficult due to the non-identifiable nature of graphon and hidden locations. Also, most existing methods are of high computation cost, some are even NP-hard. In this paper, we propose a novel framework for functional graphon estimation via graph moment embedding and smoothing spline (GESS). Our method solves the non-identifiable issue in graphon with very mild assumptions. Theoretical results show that GESS’s privileges in estimation accuracy and computation efficiency. Simulation study and real data analysis validated the empirical performance of GESS.
Privacy Attacks against Federated Tensor Decomposition Models
Yichen Ma (Georgia Institute of Technology)
B-6
Authors and Co-authors: Michael Biehler, Yichen Ma, Dr. Jianjun Shi
Abstract: In this study, we explore the security of federated dimensionality reduction techniques, often used due to data sharing limitations. While these methods promise privacy, we reveal that many can be vulnerable to model inversion attacks, where attackers can guess the input data by just looking at some of the model’s details. Through various tests and case studies, we delve deep into how susceptible federated tensor decomposition models are to these threats. We also provide a new framework to assess these attacks and emphasize the need for stricter evaluations of such models to ensure user data privacy.
Modeling interactions among medically attended respiratory viral diseases based on sentinel surveillance
Yu Miao (University of Georgia)
B-7
Authors and Co-authors: Yu Miao, Yang Yang
Abstract: Respiratory viral infections pose a persistent global health challenge, and co-infection with multiple viruses is often observed. While some research has explored community-level interactions among respiratory viruses using aggregated data, this study addresses a critical knowledge gap in virus-virus interactions at the individual level. Utilizing sentinel surveillance data from four major Chinese cities spanning a decade (2009-2019), we employ a Bayesian Multivariate Probit Model to elucidate the dynamics of viral interactions. As many viruses share seasonality, to better understand immunological interactions, this model accounts for the innate seasonal fluctuations in viral epidemics. Our model unveils substantial interactions among specific respiratory viruses, offering valuable insights into the intricate interplay of these viruses at the individual host level. Additionally, we identified a significant annual cycle in Respiratory Syncytial Virus type A, type B, and Human Parainfluenza Viruses. Semi-annual periodicity was absent across all the nine examined viruses. These insights can help the development of effective public health interventions targetting multiple pathogens simultaneously. In future research, we will extend our model to incorporate time-varying transmission rates, individual-level (e.g., age and gender) and environmental risk factors (e.g., weather conditions).
Instrument Effect Calibration and Analytes Classification with Functional Regression and Machine Learning
Tao Wang (University of Georgia)
B-9
Authors and Co-authors: Tao Wang, Haoran Lu, Ping Ma, Wenxuan Zhong, Yiping Zhao
Abstract: It has been found that two instruments can yield different numerical values when used to measure identical points. Accurate measurements are crucial for quality control, compliance with regulations, safety, and research validity. In this project, different instruments are used to measure the spectrum of the same analyte. The functional regression is used to find the relative response function between the spectrum measured by two instruments. Based on the functional regression, we reconstruct measurements of one instrument which have no label of the analyte type and make classification on the reconstructed measurements using the classification model trained by measurements of the other instrument. We achieved a classification accuracy of 88% on real data.
Variable Selection for Kernel Two-Sample Tests
Jie Wang (Georgia Institute of Technology)
B-10
Authors and Co-authors: Jie Wang, Santanu S. Dey, Yao Xie
Abstract: We consider the variable selection problem for two-sample tests, aiming to select the most informative variables to distinguish samples from two groups. To solve this problem, we propose a framework based on the kernel maximum mean discrepancy (MMD). Our approach seeks a group of variables with a pre-specified size that maximizes the variance-regularized MMD statistics. This formulation also corresponds to the minimization of asymptotic type-II error while controlling type-I error, as studied in the literature. We present mixed-integer programming formulations and develop exact and approximation algorithms with performance guarantees for different choices of kernel functions. Furthermore, we provide a non-asymptotic testing power analysis of our proposed framework. Experiment results on synthetic and real datasets demonstrate the superior performance of our approach.
Sparse Independent Component Analysis with an Application to Cortical Surface fMRI Data in Autism
Zihang Wang (Emory University)
C-1
Authors and Co-authors: Zihang Wang, Irina Gaynanova, Aleksandr Aravkin, Benjamin B. Risk
Abstract: Independent component analysis (ICA) is widely used to estimate spatial resting-state networks and their time courses in neuroimaging studies. It is thought that independent components correspond to sparse patterns of co-activating brain locations. Previous approaches for introducing sparsity to ICA replace the non-smooth objective function with smooth approximations, resulting in components that do not achieve exact zeros. We propose a novel Sparse ICA method that enables sparse estimation of independent source components by solving a non-smooth non-convex optimization problem via the relax-and-split framework. The proposed Sparse ICA method balances statistical independence and sparsity simultaneously and is computationally fast. In simulations, we demonstrate improved estimation accuracy of both source signals and signal time courses compared to existing approaches. We apply our Sparse ICA to cortical surface resting-state fMRI in school-aged autistic children. A functional connectivity analysis of the components reveals decreased posterior cingulate cortex to medial prefrontal cortex connectivity in the default mode network in autistic children, as well as decreased connectivity between the temporal parietal junction and frontal areas of executive function. Sparse ICA selects coactivating locations, which we argue is more interpretable than dense components from popular approaches. Sparse ICA is fast and easy to apply to big data.
Convolutional Non-homogeneous Poisson Process with Application to Wildfire Risk Quantification for Power Delivery Networks
Guanzhou Wei (Georgia Institute of Technology)
C-2
Authors and Co-authors: Guanzhou Wei, Xiao Liu, Feng Qiu
Abstract: The current projection shows that much of the continental U.S. will have significantly hotter and drier days in the following decades, leading to more wildfire hazards that threaten the safety of power grid. Unfortunately, the U.S. power industry is not well prepared and still predominantly relies on empirical fire indices which do not consider the full spectrum of dynamic environmental factors. This paper proposes a new spatio-temporal point process model, Convolutional Non-homogeneous Poisson Process (cNHPP), to quantify wildfire risks for power delivery networks. The pro- posed model captures both the current short-term and cumulative long-term effects of covariates on wildfire risks, and the spatio-temporal dependency among different segments of the power delivery network. The computation and interpretation of the intensity function are thoroughly investigated, and the connection between cNHPP and Recurrent Neural Network is also discussed. We apply the proposed approach to estimate wildfire risks on major transmission lines in California, utilizing historical fire data, meteorological and vegetation data obtained from the National Oceanic and Atmospheric Administration and National Aeronautics and Space Administra- tion. Comparison studies are performed to show the applicability and predictive capability of the proposed approach. Useful insights are obtained that potentially enhance power grid resilience against wildfires.
Causal Graph Discovery from Self and Mutually Exciting Time Series
Song Wei (Georgia Institute of Technology)
C-3
Authors and Co-authors: Song Wei, Yao Xie, Christopher S. Josef, Rishikesan Kamaleswaran.
Abstract: We present a generalized linear structural causal model, coupled with a novel data-adaptive linear regularization, to recover causal directed acyclic graphs (DAGs) from time series. By leveraging a recently developed stochastic monotone Variational Inequality (VI) formulation, we cast the causal discovery problem as a general convex optimization. Furthermore, we develop a non-asymptotic recovery guarantee and quantifiable uncertainty by solving a linear program to establish confidence intervals for a wide range of non-linear monotone link functions. We validate our theoretical results and show the competitive performance of our method via extensive numerical experiments. Most importantly, we demonstrate the effectiveness of our approach in recovering highly interpretable causal DAGs over Sepsis Associated Derangements (SADs) while achieving comparable prediction performance to powerful “black-box” models such as XGBoost. Thus, the future adoption of our proposed method to conduct continuous surveillance of high-risk patients by clinicians is much more likely.
Subsampling in Large Graphs Using Ricci Curvature
Shushan Wu (University of Georgia)
C-4
Authors and Co-authors: Wu, Shushan, Huimin Cheng, Jiazhang Cai, Ping Ma, and Wenxuan Zhong
Abstract: In recent decades, an increasing number of large graphs with millions of nodes have been collected and constructed. Despite their utility, analyzing such graphs is hindered by high computational costs and visualization difficulties. To tackle these issues, researchers have developed graph subsampling approaches that extract representative nodes to provide a rough sketch of the graph that preserves its global properties. By selecting representative nodes, these graph subsampling methods can help researchers estimate the graph statistics, e.g., the number of communities, of the large graph from the subsample. However, the available subsampling methods, such as degree node sampler and random walk sampler, tend to overlook minority communities, as they prioritize nodes with high degrees. To address this limitation, we propose leveraging community information hidden within the graph for subsampling. Although the community structure is typically unknown, geometric methods can reveal community structure information by defining an analog of Ricci curvature on the graph, known as the Ollivier Ricci curvature. We introduce a new subsampling algorithm called the Ollivier Ricci curvature Gradient-based subsampling (ORG-sub) algorithm based on our asymptotic results regarding the within-community and between-communities edges’ OR curvature. The ORG-sub algorithm makes two significant contributions: Firstly, it provides a rigorous theoretical guarantee that the probability of taking all communities into the final subgraph converges to one. Secondly, extensive experiments on synthetic and benchmark datasets demonstrate the advantages of our algorithm.
Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds
Zhenghao Xu (Georgia Institute of Technology)
C-5
Authors and Co-authors: Zhenghao Xu, Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao
Abstract: Policy-based algorithms equipped with deep neural networks have achieved great success in solving high-dimensional policy optimization problems in reinforcement learning. However, current analyses cannot explain why they are resistant to the curse of dimensionality.
In this work, we study the sample complexity of the neural policy mirror descent (NPMD) algorithm with convolutional neural networks (CNN) as function approximators. Motivated by the empirical observation that many high-dimensional environments have state spaces possessing low-dimensional structures, such as those taking images as states, we consider the state space to be a $d$-dimensional manifold embedded in the $D$-dimensional Euclidean space with intrinsic dimension $d\ll D$.
We show that in each iteration of NPMD, both the value function and the policy can be well approximated by CNNs. The approximation errors are controlled by the size of the networks, and the smoothness of the previous networks can be inherited.
As a result, by properly choosing the network size and hyperparameters, NPMD can find an $\epsilon$-optimal policy with $\Tilde{O}(\epsilon^{-\frac{d}{\alpha}-2})$ samples in expectation, where $\alpha\in(0,1]$ indicates the smoothness of environment.
Compared to previous work, our result exhibits that NPMD can leverage the low-dimensional structure of state space to escape from the curse of dimensionality, providing an explanation for the efficacy of deep policy-based algorithms.
In this work, we study the sample complexity of the neural policy mirror descent (NPMD) algorithm with convolutional neural networks (CNN) as function approximators. Motivated by the empirical observation that many high-dimensional environments have state spaces possessing low-dimensional structures, such as those taking images as states, we consider the state space to be a $d$-dimensional manifold embedded in the $D$-dimensional Euclidean space with intrinsic dimension $d\ll D$.
We show that in each iteration of NPMD, both the value function and the policy can be well approximated by CNNs. The approximation errors are controlled by the size of the networks, and the smoothness of the previous networks can be inherited.
As a result, by properly choosing the network size and hyperparameters, NPMD can find an $\epsilon$-optimal policy with $\Tilde{O}(\epsilon^{-\frac{d}{\alpha}-2})$ samples in expectation, where $\alpha\in(0,1]$ indicates the smoothness of environment.
Compared to previous work, our result exhibits that NPMD can leverage the low-dimensional structure of state space to escape from the curse of dimensionality, providing an explanation for the efficacy of deep policy-based algorithms.
Understanding the Role of Compensation Coefficient in Active Quickest Detection Problem
Qunzhi Xu (Georgia Institute of Technology)
C-6
Authors and Co-authors: Qunzhi Xu and Yajun Mei
Abstract: In this work, we study the role of compensation coefficient in the active quickest detection problem, which was first proposed in Liu, Mei and Shi but without theoretical justification. We show that when the compensation coefficient is small, the algorithm has the asymptotic optimality properties under some certain scenarios. Numerical studies are conducted to validate our results.
Sequential Predictive Conformal Inference for Time Series
Chen Xu (Georgia Institute of Technology)
C-7
Authors and Co-authors: Chen Xu, Yao Xie
Abstract: We present a new distribution-free conformal prediction algorithm for sequential data (e.g., time series), called the sequential predictive conformal inference (SPCI). We specifically account for the nature that time series data are non-exchangeable, and thus many existing conformal prediction algorithms are not applicable. The main idea is to adaptively re-estimate the conditional quantile of non-conformity scores (e.g., prediction residuals), upon exploiting the temporal dependence among them. More precisely, we cast the problem of conformal prediction interval as predicting the quantile of a future residual, given a user-specified point prediction algorithm. Theoretically, we establish asymptotic valid conditional coverage upon extending consistency analyses in quantile regression. Using simulation and real-data experiments, we demonstrate a significant reduction in interval width of SPCI compared to other existing methods under the desired empirical coverage.
Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories
Zixuan Zhang (Georgia Institute of Technology)
C-8
Authors and Co-authors: Zixuan Zhang, Minshuo Chen, Mengdi Wang, Wenjing Liao, Tuo Zhao.
Abstract: Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to the intrinsic data structures. In real world applications, such an assumption of data lying exactly on a low dimensional manifold is stringent. This paper introduces a relaxed assumption that the input data are concentrated around a subset $S$, and the intrinsic dimension of $S$ can be characterized by a new complexity notation — effective Minkowski dimension. We prove that, the sample complexity of deep nonparametric regression only depends on the effective Minkowski dimension of $S$ denoted by $p$. We further illustrate our theoretical findings by considering nonparametric regression with an anisotropic Gaussian random design $N(0,\Sigma)$, where $\Sigma$ is full rank. When the eigenvalues of $\Sigma$ have an exponential or polynomial decay, the effective Minkowski dimension of such an Gaussian random design is $p=O(\sqrt{\log n})$ or $p=\cO(n^\gamma)$, respectively, where $n$ is the sample size and $\gamma\in(0,1)$ is a small constant depending on the polynomial decay rate. Our theory shows that, when the manifold assumption does not hold, deep neural networks can still adapt to the effective Minkowski dimension of the data, and circumvent the curse of the ambient dimensionality for moderate sample sizes.
Detection-Recovery Gap for Planted Dense Cycles
Shenduo Zhang (Georgia Institute of Technology)
C-9
Authors and Co-authors: Cheng Mao, Alexander S. Wein, Shenduo Zhang
Abstract: Planted dense cycles are a type of latent structure that appears in many applications, such as small-world networks in social sciences and sequence assembly in computational biology. We consider a model where a dense cycle with expected bandwidth nτ and edge density p is planted in an Erdős-Rényi graph G(n,q). We characterize the computational thresholds for the associated detection and recovery problems for the class of low-degree polynomial algorithms. In particular, a gap exists between the two thresholds in a certain regime of parameters. For example, if n−3/4≪τ≪n−1/2 and p=Cq=Θ(1) for a constant C>1, the detection problem is computationally easy while the recovery problem is hard for low-degree algorithms.
Asset Bundling for Wind Power Forecasting
Hanyu Zhang (Georgia Institute of Technology)
C-10
Authors and Co-authors: Hanyu Zhang, Mathieu Tanneau, Chaofan Huang, V. Roshan Joseph , Shangkun Wang, Pascal Van Hentenryck
Abstract: The growing penetration of intermittent, renewable generation in US power grids results in increased operational uncertainty. In that context, accurate forecasts are critical, especially for wind generation, which exhibits large variability and is historically harder to predict. To overcome this challenge, this work proposes a novel Bundle-Predict-Reconcile (BPR) framework that integrates asset bundling, machine learning, and forecast reconciliation techniques to accurately predict wind power at the asset, bundle, and fleet level. Notably, our approach effectively introduces an auxiliary learning task (predicting the bundle-level time series) to help the main learning tasks (fleet-level time series) and proposes new asset-bundling criteria to capture the spatio-temporal dynamics of wind power time series. Extensive numerical experiments are conducted on an industry-size dataset of wind farms, demonstrating the benefits of BPR, which consistently and significantly improves forecast accuracy over the baseline approach, especially at the fleet level.
Novel Empirical Likelihood Method for the Cumulative Hazard Ratio
Dazhi Zhao (Georgia State University)
D-1
Authors and Co-authors: Dazhi Zhao and Yichuan Zhao
Abstract: In clinical studies, how to evaluate the treatment effect is a crucial topic. Nowadays, the ratio of cumulative hazards is often applied to accomplish this task, especially when those hazards may be nonproportional. The stratified Cox model, as an important extension of the classical Cox model, has the ability to flexibly handle nonproportional hazards. In this paper, we propose a novel empirical likelihood (EL) method to construct the confidence interval for cumulative hazard ratio under stratified Cox model. The large sample properties of the proposed EL ratio statistic are investigated, and the finite sample properties of the EL-based estimators under some different situations are explored in simulation studies. The proposed method was finally applied to perform statistical analysis on a real world dataset on the survival experience of patients with heart failure.
A Group Distributional ICA Method for Decomposing Multi-subject Diffusion Tensor Imaging
Guangming Yang (Emory University)
D-2
Authors and Co-authors: Ben Wu, Jian Kang, Ying Guo
Abstract: e propose a group distributional ICA (DICA) method for decomposing multi-subject Diffusion tensor imaging (DTI) data to identify underlying structure networks. DTI is a frequently used imaging modality to investigate white matter fiber connections of human brain. However, there has been very limited work for conducting blind source separation on multi-subject DTI data. Due to the special characteristics of the 3D diffusion tensor measured in DTI, existing methods such as standard independent component analysis (ICA) can’t be directly applied. Motivated by a new DICA framework (Wu et al., 2021)\textsuperscript{1}, the proposed group DICA aims to fill this gap by performing a fundamentally new blind source separation approach that separates the parameters in the distribution function of the observed imaging data as a mixture of independent source signals.
Learning Ability of Interpolating Deep Convolutional Neural Network
Tian-Yi Zhou (Georgia Institute of Technology)
D-3
Authors and Co-authors: Tian-Yi Zhou; Xiaoming Huo
Abstract: It is frequently observed that overparameterized neural networks generalize well. Regarding such phenomena, existing theoretical work is mainly devoted to linear settings or fully-connected neural networks. This paper studies the learning ability of an important family of deep neural networks, deep convolutional neural networks (DCNNs), under both underparameterized and overparameterized settings. We establish the first learning rates of underparameterized DCNNs without parameter or function variable structure restrictions presented in the literature. We also show that by adding well-defined layers to a non-interpolating DCNN, we can obtain some interpolating DCNNs that maintain the good learning rates of the non-interpolating DCNN. This result is achieved by a novel network deepening scheme designed for DCNNs. Our work provides theoretical verification of how overfitted DCNNs generalize well.