Program comprehension is a fundamental task in software development and
maintenance. Software developers often need to understand a large amount of
existing code before they can develop new features or fix bugs in existing
programs. Automatically processing program code and accurately summarizing its
functionality can significantly reduce the time developers spend navigating and
understanding code, and thus increase productivity. Unlike natural language
text, source code in programming
languages often follows rigid syntactic structures, and code elements located
far apart can depend on each other through complex control flows and data
flows. Existing approaches based on tree-based convolutional neural networks
(TBCNN) and gated graph neural networks (GGNN) are not able to capture
essential semantic dependencies among code elements accurately. In
this paper, we propose novel tree-based capsule networks (TreeCaps) and related
techniques for processing program code automatically in a way that encodes
syntactic structures and captures code dependencies more accurately. In an
evaluation on programs written in different programming languages, we show that
our TreeCaps-based approach can outperform other approaches in classifying the
functionalities of many programs.
License type:
PublisherCopyrights
Funding Info:
This research is supported by the National Research Foundation Singapore under its AI Singapore
Programme (Award Number: AISG-RP-2019-010).