Data Science

Data Compression

  • Mak, S. and V. Roshan Joseph. (2018). “Support Points”. Annals of Statistics, 46, 2562-2592. R package: support.
  • Mak, S. and V. Roshan Joseph. “Projected Support Points: A New Method for High-Dimensional Data Reduction”. arXiv.
  • V. Roshan Joseph and Mak, S. (2021). “Supervised Compression of Big Data”. Statistical Analysis and Data Mining: The ASA Data Science Journal, 14, 217-229. R package: supercompress.

Data Splitting

  • V. Roshan Joseph and Vakayil, A. (2022). “SPlit: An Optimal Method for Data Splitting”. Technometrics, 64, 166-176. R package: SPlit. (Wilcoxon Award).
  • V. Roshan Joseph. (2022). “Optimal Ratio for Data Splitting”. Statistical Analysis and Data Mining: The ASA Data Science Journal, 15, 531-538. R package: SPlit.
  • V. Roshan Joseph. (2024). “Comment: Data Fission: Splitting a Single Data Point by J. Leiner, B. Duan, L. Wasserman & A. Ramdas”. Journal of the American Statistical Association, to appear.

Data Twinning

Factor Importance

  • Huang, C. and V. Roshan Joseph. “Factor Importance Ranking and Selection using Total Indices”. arXiv. R package: first.