Researchers taught a machine learning system to identify developers from code samples, potentially helping to address plagiarism and hacking but raising privacy concerns

Briefing

August 20, 2018

  • Deanonymizing Developers – Two researchers from Drexel University and George Washington University have developed and trained a machine learning system that can identify programmers by their work after being fed only a few samples of their code
  • Code Identifiers – The AI was programmed to identify 50 features in code samples that can be used to distinguish one developer from another
  • AI Training – The system was trained on code samples from Google’s annual Code Jam competition
  • Accuracy – The algorithm identified programmers with 96% accuracy in a pool of 100 programmers, and with 83% accuracy in a pool of 600
  • Other Insights – Experienced developers are easier to identify than novices; in a test with 62 programmers, the algorithm’s accuracy rose to 95% on solutions to hard problems, compared with 90% on easy problems
  • Implications – The technique could identify plagiarizing students, hackers, and the developers behind censorship-circumvention tools, but it also raises privacy concerns for contributors to coding community platforms such as GitHub

Accelerator

Sector

Information Technology

Source

Original Publication Date

August 10, 2018

Leave a comment