I am a Ph.D. student in Computer Science at the University of Maryland, College Park, where I am fortunate to be advised by Prof. John Dickerson. Before that, I obtained my Bachelor's degree in Computer Science and Mathematics from Bryn Mawr College. My research focuses on understanding and enriching the reasoning capabilities of current deep learning models. I believe having the ability to reason is an important and necessary step to achieving general intelligence.

Across my Ph.D., I have interned at Microsoft Research under the guidance of Adith Swaminathan on designing better hindsight learning schemes for combinatorial optimization problems. I have also interned at DeepMind under the supervision of Dr. Petar Veličković, working on how to re-use learned knowledge and skills in reinforcement learning. In addition, I have worked closely with and received great guidance from Prof. Taiji Suzuki during research internships at Vector Institute and RIKEN AIP.

Noisy labels are inevitable in large real-world datasets. In this work, we explore an area understudied by previous works - how the network's architecture impacts its robustness to noisy labels. We provide a formal framework connecting the robustness of a network to the alignments between its architecture and target/noise functions. Our framework measures a network's robustness via the predictive power in its representations - the test performance of a linear model trained on the learned representations using a small set of clean labels. We hypothesize that a network is more robust to noisy labels if its architecture is more aligned with the target function than the noise. To support our hypothesis, we provide both theoretical and empirical evidence across various neural network architectures and different domains. We also find that when the network is well-aligned with the target function, its predictive power in representations could improve upon state-of-the-art (SOTA) noisy-label-training methods in terms of test accuracy, and even outperform sophisticated methods that use clean labels.

We study how neural networks trained by gradient descent extrapolate, i.e., what they learn outside the support of the training distribution. Previous works report mixed empirical results when extrapolating with neural networks: while feedforward neural networks, a.k.a. multilayer perceptrons (MLPs), do not extrapolate well in certain simple tasks, Graph Neural Networks (GNNs) – structured networks with MLP modules – have shown some success in more complex tasks. Working towards a theoretical explanation, we identify conditions under which MLPs and GNNs extrapolate well. First, we quantify the observation that ReLU MLPs quickly converge to linear functions along any direction from the origin, which implies that ReLU MLPs do not extrapolate most nonlinear functions. But they can provably learn a linear target function when the training distribution is sufficiently "diverse". Second, in connection to analyzing the successes and limitations of GNNs, these results suggest a hypothesis for which we provide theoretical and empirical evidence: the success of GNNs in extrapolating algorithmic tasks to new data (e.g., larger graphs or edge weights) relies on encoding task-specific non-linearities in the architecture or features. Our theoretical analysis builds on a connection of over-parameterized networks to the neural tangent kernel. Empirically, our theory holds across different training settings.

Neural networks have succeeded in many reasoning tasks. Empirically, these tasks require specialized network structures, e.g., Graph Neural Networks (GNNs) perform well on many such tasks, but less structured networks fail. Theoretically, there is limited understanding of why and when one network structure generalizes better than another, although they have equal expressive power. In this paper, we develop a framework to characterize which reasoning tasks a network can learn well, by studying how well its computation structure aligns with the algorithmic structure of the relevant reasoning process. We formally define this algorithmic alignment and derive a sample complexity bound that decreases with better alignment.
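The extrapolation work's first observation - that a ReLU MLP becomes exactly linear along any ray sufficiently far from the origin, because the ReLU activation pattern eventually stops changing - can be checked numerically. The sketch below uses an untrained random two-layer MLP purely to illustrate the geometric fact; it is not the paper's experiment, and all sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# A random two-layer ReLU MLP: f(x) = W2 @ relu(W1 @ x + b1) + b2.
d, h = 4, 32
W1, b1 = rng.standard_normal((h, d)), rng.standard_normal(h)
W2, b2 = rng.standard_normal(h), rng.standard_normal()

def mlp(x):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

# Evaluate along a ray t * v for large t. Once every ReLU's sign is fixed,
# f(t * v) is affine in t, so its second differences should vanish
# (up to floating-point error).
v = rng.standard_normal(d)
ts = np.array([1e6, 2e6, 3e6, 4e6])
vals = np.array([mlp(t * v) for t in ts])
second_diffs = np.diff(vals, n=2)
print("second differences along the ray:", second_diffs)
```

Closer to the origin, where different inputs land in different activation patterns, the same second differences would generally be nonzero; far out, the network is indistinguishable from a linear function, which is why it cannot extrapolate most nonlinear targets.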
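The "predictive power in representations" measure from the noisy-labels framework above can be illustrated with a toy stand-in: freeze a feature map as the "learned representation", fit a linear probe on a small clean subset, and score it on held-out data. This is a minimal sketch under assumed toy data and a random-feature representation, not the paper's actual protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary task: the label is the sign of the first input coordinate.
d, n_train, n_test = 10, 200, 500
X_tr, X_te = rng.standard_normal((n_train, d)), rng.standard_normal((n_test, d))
y_tr, y_te = np.sign(X_tr[:, 0]), np.sign(X_te[:, 0])

# Stand-in for a trained network's representation: fixed random ReLU features.
W = rng.standard_normal((d, 64))
rep = lambda X: np.maximum(X @ W, 0.0)

# Linear probe fit by least squares on a small *clean* subset (20 labels).
k = 20
probe, *_ = np.linalg.lstsq(rep(X_tr[:k]), y_tr[:k], rcond=None)

# "Predictive power in representations": test accuracy of the linear probe.
acc = np.mean(np.sign(rep(X_te) @ probe) == y_te)
print(f"probe test accuracy: {acc:.2f}")
```

In the framework above, this score would be computed on the representations of a network trained with noisy labels; a higher probe accuracy indicates that the architecture absorbed the target function rather than the noise.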