GitHub presently hosts approximately 0.5 PB of open source code data. These data include the code itself and the various contributions to it, such as commits, pull requests, issues, comments, and users. A great deal can be learned from these data about code and the open source community that creates it.
- How can graphs of code be used to obtain information about software and open source development?
- What are appropriate methods for deep learning on such graphs?
- What are the best-in-class methods for encoding this type of data?