Gradient descent optimizes over-parameterized deep ReLU networks D Zou, Y Cao, D Zhou, Q Gu Machine Learning 109 (3), 467-492, 2020 | 348 | 2020 |

Generalization bounds of stochastic gradient descent for wide and deep neural networks Y Cao, Q Gu Advances in Neural Information Processing Systems 32, 10836-10846, 2019 | 142 | 2019 |

Generalization error bounds of gradient descent for learning over-parameterized deep ReLU networks Y Cao, Q Gu Proceedings of the AAAI Conference on Artificial Intelligence 34 (04), 3349-3356, 2020 | 113* | 2020 |

Closing the generalization gap of adaptive gradient methods in training deep neural networks J Chen, D Zhou, Y Tang, Z Yang, Y Cao, Q Gu arXiv preprint arXiv:1806.06763, 2018 | 86 | 2018 |

On the convergence of adaptive gradient methods for nonconvex optimization D Zhou, J Chen, Y Cao, Y Tang, Z Yang, Q Gu arXiv preprint arXiv:1808.05671, 2018 | 80 | 2018 |

How much over-parameterization is sufficient to learn deep ReLU networks? Z Chen, Y Cao, D Zou, Q Gu arXiv preprint arXiv:1911.12360, 2019 | 48 | 2019 |

Towards understanding the spectral bias of deep learning Y Cao, Z Fang, Y Wu, DX Zhou, Q Gu arXiv preprint arXiv:1912.01198, 2019 | 41 | 2019 |

Local and global inference for high dimensional nonparanormal graphical models Q Gu, Y Cao, Y Ning, H Liu arXiv preprint arXiv:1502.02347, 2015 | 30* | 2015 |

A generalized neural tangent kernel analysis for two-layer neural networks Z Chen, Y Cao, Q Gu, T Zhang arXiv preprint arXiv:2002.04026, 2020 | 26* | 2020 |

Agnostic learning of a single neuron with gradient descent S Frei, Y Cao, Q Gu arXiv preprint arXiv:2005.14426, 2020 | 17 | 2020 |

Algorithm-dependent generalization bounds for overparameterized deep residual networks S Frei, Y Cao, Q Gu arXiv preprint arXiv:1910.02934, 2019 | 14 | 2019 |

Tight sample complexity of learning one-hidden-layer convolutional neural networks Y Cao, Q Gu arXiv preprint arXiv:1911.05059, 2019 | 13 | 2019 |

The edge density barrier: Computational-statistical tradeoffs in combinatorial inference H Lu, Y Cao, Z Yang, J Lu, H Liu, Z Wang International Conference on Machine Learning, 3247-3256, 2018 | 7 | 2018 |

High-temperature structure detection in ferromagnets Y Cao, M Neykov, H Liu arXiv preprint arXiv:1809.08204, 2018 | 6 | 2018 |

Risk bounds for over-parameterized maximum margin classification on sub-Gaussian mixtures Y Cao, Q Gu, M Belkin arXiv preprint arXiv:2104.13628, 2021 | 5 | 2021 |

Agnostic learning of halfspaces with gradient descent via soft margins S Frei, Y Cao, Q Gu International Conference on Machine Learning, 3417-3426, 2021 | 4 | 2021 |

Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise S Frei, Y Cao, Q Gu arXiv preprint arXiv:2101.01152, 2021 | 2 | 2021 |

Accelerated factored gradient descent for low-rank matrix factorization D Zhou, Y Cao, Q Gu International Conference on Artificial Intelligence and Statistics, 4430-4440, 2020 | 2 | 2020 |

Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization D Zou, Y Cao, Y Li, Q Gu arXiv preprint arXiv:2108.11371, 2021 | | 2021 |

Structure Detection in High Dimensional Graphical Models Y Cao Princeton, NJ: Princeton University, 2018 | | 2018 |