Alongside its Gemini generative AI model, Google this morning took the wraps off AlphaCode 2, an improved version of the code-generating AlphaCode that Google's DeepMind lab introduced roughly a year ago.
AlphaCode 2 is in fact powered by Gemini, or at least some variant of it (Gemini Pro) fine-tuned on coding contest data. And it’s far more capable than its predecessor, Google says — at least on one benchmark.
In a subset of programming competitions hosted on Codeforces, a platform for programming contests, AlphaCode 2 — coding in languages spanning Python, Java, C++ and Go — performed better than an estimated 85% of competitors on average, according to Google. That’s compared to the roughly 50% of competitors its predecessor managed to best on the same subset.
“We selected 12 recent contests with more than 8,000 participants, either from division 2 or the harder division ‘1+2.’ This makes for a total of 77 problems,” a technical whitepaper on AlphaCode 2 reads. “AlphaCode 2 solves 43% of problems within 10 attempts, close to twice as many problems as the original AlphaCode (25%).”
AlphaCode 2 can understand programming challenges involving “complex” math and theoretical computer science. And, among other reasonably sophisticated techniques, AlphaCode 2 is capable of dynamic programming, explains DeepMind research scientist Rémi Leblond in a pre-recorded video.
Dynamic programming entails simplifying a complex problem by breaking it into simpler sub-problems and reusing their solutions rather than recomputing them; Leblond says that AlphaCode 2 knows not only when to use this strategy but also where to apply it. That's noteworthy, considering that programming problems requiring dynamic programming were a major trip-up for the original AlphaCode.
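To make the idea concrete, here is a minimal, hand-written sketch of the technique applied to a classic coin-change task of the kind that appears in contests (illustrative code, not AlphaCode 2's output):

```python
def min_coins(coins, amount):
    """Fewest coins needed to sum to `amount`, via bottom-up dynamic programming.

    Each sub-problem ("fewest coins for total i") is solved once and its
    answer reused, rather than being recomputed over and over.
    """
    INF = float("inf")
    best = [0] + [INF] * amount  # best[i] = fewest coins summing to i
    for i in range(1, amount + 1):
        for c in coins:
            if c <= i and best[i - c] + 1 < best[i]:
                best[i] = best[i - c] + 1
    return best[amount] if best[amount] != INF else -1

print(min_coins([1, 3, 4], 6))  # prints 2 (3 + 3)
```

The hard part in a contest, and the part Leblond is pointing to, isn't writing a loop like this; it's recognizing that a problem decomposes into reusable sub-problems in the first place.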
“[AlphaCode 2] needs to show some level of understanding, some level of reasoning and designing of code solutions before it can get to the actual implementation to solve [a] coding problem,” Leblond said. “And it does all that on problems it’s never seen before.”
AlphaCode 2 solves problems by first tapping a family of "policy models" that generate a number of code samples for each problem. Code samples that don't fit the problem description are filtered out, and a clustering algorithm groups "semantically similar code samples" to avoid redundancy. Finally, a scoring model within AlphaCode 2 surfaces the best candidate from each of the ten largest clusters of code samples; those candidates constitute AlphaCode 2's answers to the problem.
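In rough pseudocode, that sample-filter-cluster-rerank loop looks something like the sketch below. Every name here (`pick_submissions`, `passes_examples`, `signature`, `score`) is a hypothetical placeholder standing in for a component the whitepaper describes, not DeepMind's actual API:

```python
from collections import defaultdict

def pick_submissions(candidates, passes_examples, signature, score, k=10):
    """Filter, cluster, and rerank candidate programs (illustrative sketch).

    candidates       iterable of candidate program strings from the policy models
    passes_examples  candidate -> bool: does it pass the problem's sample tests?
    signature        candidate -> hashable key; candidates that behave the same
                     (e.g., produce identical outputs on generated inputs) share a key
    score            candidate -> float from the scoring model
    Returns up to k programs: the best one from each of the k largest clusters.
    """
    # Filter out samples that obviously don't fit the problem description.
    survivors = [c for c in candidates if passes_examples(c)]

    # Group semantically similar samples to avoid redundant submissions.
    clusters = defaultdict(list)
    for c in survivors:
        clusters[signature(c)].append(c)

    # From each of the largest clusters, surface the highest-scoring candidate.
    biggest = sorted(clusters.values(), key=len, reverse=True)[:k]
    return [max(cluster, key=score) for cluster in biggest]
```

The design intuition is that many independently generated programs which behave identically are more likely to be correct, so cluster size acts as a cheap proxy for consensus before the scoring model makes the final call.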
Now, all AI models have flaws — and AlphaCode 2 is no exception. According to the whitepaper, AlphaCode 2 requires a lot of trial and error, is too costly to operate at scale and relies heavily on being able to filter out obviously bad code samples. Migrating to a more capable version of Gemini, such as Gemini Ultra, might mitigate some of this, the whitepaper speculates.
As for whether AlphaCode 2 will eventually find its way into a product (the original AlphaCode never shipped), Eli Collins, VP of product at DeepMind, alluded to the possibility in a briefing.
“One of the things that was most exciting to me about the latest results is that when programmers collaborate with [AlphaCode 2 powered by] Gemini, by defining certain properties for the code to follow, the performance [of the model] gets even better,” Collins said. “In the future, we see programmers making use of highly capable AI models as collaborative tools that assist with the entire software development process from reasoning about problems to assisting with implementation.”