6 game-changing AI-on-code papers published in 2021 that you should know about

Are you confused between Ponicode and CircleCI? It’s not you, it’s us. Ponicode was acquired by CircleCI in March 2022. Content and material published before that date remain under Ponicode’s name. When in doubt: Ponicode = CircleCI.

Following up on our 2020 list of top AI trends to keep an eye on, our data science team has closely followed the latest developments in artificial intelligence on code this year. It was a year full of publications in which creative approaches and novel applications of machine learning competed for the spotlight. Since our team is focused on developing AI on code, we wanted to share the papers we think are the most impactful in our field and that will consequently shape the software industry in the coming year(s).

First on our list is a literature review on the use of deep learning in software engineering research. It is a topic close to our hearts (https://www.ponicode.com/blog/ai-and-voice-recognition-trends-to-follow-closely), since Ponicode has used artificial intelligence from the start to generate natural code suggestions and help developers work faster.

If you want to find out where and how AI can now support software engineering then this is the place to start.

A Systematic Literature Review on the Use of Deep Learning in Software Engineering Research

Next, “Deep Learning based Vulnerability Detection: Are We There Yet?” explores different techniques for achieving high-accuracy vulnerability detection in software. The promises of data science for this field, one of the most critical for the software industry, seem diminished when confronted with real-life use cases. However, the paper goes beyond measuring accuracy and suggests a practical roadmap for making better use of deep learning in vulnerability detection.

Deep Learning based Vulnerability Detection: Are We There Yet?

One topic that generated a lot of hype in the data science community was Codex, OpenAI’s code-specific model built on GPT-3. “Evaluating Large Language Models Trained on Code” is the introductory paper presenting the applications and limitations of Codex. It also discusses the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics. It was one of the highlights of the year for the AI-on-code ecosystem.

Evaluating Large Language Models Trained on Code

Leveraging artificial intelligence to reduce the number of flaws in code is a key use case that could enable the industry to scale up software production like never before. While the second paper on our list focused on vulnerability detection, this one focuses on bug fixing. “Generating Bug-Fixes Using Pretrained Transformers”, written by a team of Microsoft data scientists, introduces DeepDebug, a data-driven program repair approach that learns to detect and fix bugs. The field of automated bug fixing is moving forward, and we hope to see impactful startups tackling this problem with powerful solutions in 2022.

Generating Bug-Fixes Using Pretrained Transformers

For our last paper, we go a little deeper into the AI-on-code field with a paper demonstrating that machine learning can build a language-agnostic representation of source code by taking both its context and its structure as inputs. The ability to build language-agnostic models is powerful because it lets new solutions scale quickly across the whole software ecosystem. With languages and frameworks changing faster than we can keep up with, a language-agnostic approach seems to be the only viable long-term path if we want AI-on-code solutions to be adopted quickly across the industry. It is also an interesting take on code representation, so make sure to check it out.

Language-agnostic representation learning of source code from structure and context

Bonus: The first workshop on NLP for programming took place last summer, and our data science team particularly enjoyed the quality of the discussions and presentations. Their website has all the information about the workshop, including replays and an amazing list of research papers. Have fun!

That’s it! We hope this selection made you as excited about the future of AI-on-code as it made us. We expect 2022 to be just as intense for research and development in our field, and we are taking an active part in it. Give Ponicode a try and discover the power of artificial intelligence for accelerating code quality (including for data scientists) today!
