@KingRandomGuy - Lemmy-B

KingRandomGuy@lemmy.world · 2 days ago

Yeah, in typical Google fashion they used to have two deep learning teams: Google Brain and DeepMind. Google Brain was Google’s in-house team, responsible for inventing the transformer. DeepMind focused more on RL agents than Google Brain did, hence discoveries like AlphaZero and AlphaFold.

KingRandomGuy@lemmy.world · 2 days ago

The general framework for evolutionary methods/genetic algorithms is indeed old but it’s extremely broad. What matters is how you actually mutate the algorithm being run given feedback. In this case, they’re using the same framework as genetic algorithms (iteratively building up solutions by repeatedly modifying an existing attempt after receiving feedback) but they use an LLM for two things:

1. Overall better sampling (the LLM has better heuristics for figuring out what to fix compared to handwritten techniques), meaning higher efficiency at finding a working solution.
2. “Open set” mutations: you don’t need to pre-define what changes can be made to the solution. The LLM can generate arbitrary mutations instead. In particular, AlphaEvolve can modify entire codebases as mutations, whereas prior work only modified single functions.
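
To make the framework concrete, here is a minimal sketch of that kind of loop, not AlphaEvolve's actual implementation. The names `evolve`, `score`, and `random_tweak` are all hypothetical; `random_tweak` is a handwritten mutation operator standing in for the spot where an LLM would propose a change given the parent and its feedback.

```python
import random
from typing import Callable, List, Tuple

Candidate = List[int]

def evolve(
    initial: Candidate,
    score: Callable[[Candidate], float],
    mutate: Callable[[Candidate, float, random.Random], Candidate],
    generations: int = 200,
    population_size: int = 8,
    seed: int = 0,
) -> Tuple[Candidate, float]:
    """Generic evolutionary loop: pick a parent, mutate it, score the child, keep the best."""
    rng = random.Random(seed)
    population = [(score(initial), initial)]
    for _ in range(generations):
        # Tournament selection: sample a few candidates and take the best as parent.
        contenders = rng.sample(population, k=min(3, len(population)))
        parent_score, parent = max(contenders, key=lambda p: p[0])
        # The mutation operator sees the parent and its feedback (here, just the score).
        # This is the slot an LLM would fill in an LLM-driven system.
        child = mutate(parent, parent_score, rng)
        population.append((score(child), child))
        # Keep only the top candidates, so the best score never gets worse.
        population.sort(key=lambda p: p[0], reverse=True)
        del population[population_size:]
    best_score, best = population[0]
    return best, best_score

# Toy problem (an assumption for illustration): match a fixed list of integers.
TARGET = [3, 1, 4, 1, 5]

def score(candidate: Candidate) -> float:
    """Higher is better; 0.0 means the candidate equals TARGET."""
    return -float(sum(abs(a - b) for a, b in zip(candidate, TARGET)))

def random_tweak(parent: Candidate, parent_score: float, rng: random.Random) -> Candidate:
    """Closed-set, handwritten mutation: nudge one position by +1 or -1."""
    child = list(parent)
    index = rng.randrange(len(child))
    child[index] += rng.choice([-1, 1])
    return child

best, best_score = evolve([0, 0, 0, 0, 0], score, random_tweak)
print(best, best_score)
```

Note that `random_tweak` can only ever make the one kind of change it was written to make. Swapping it for a model that returns an arbitrary edited candidate is what would make the mutations "open set", while the surrounding loop stays the same.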

The “Related Work” section (section 5) of their whitepaper is probably what you’re looking for; see here.

KingRandomGuy@lemmy.world · 1 month ago

Thanks for the respectful discussion! I work in ML (not LLMs, but computer vision), so of course I’m biased. But I think it’s understandable to dislike ML/AI stuff considering that there are unfortunately many unsavory practices taking place (potential copyright infringement, very high power consumption, etc.).

KingRandomGuy@lemmy.world · edited 1 month ago

> All of the “AI” garbage that is getting jammed into everything is merely scaled up from what has been before. Scaling up is not advancement.

I disagree. Scaling might seem trivial now, but the state-of-the-art architectures for NLP a decade ago (LSTMs) would not be able to scale to the degree that our current methods can. Designing new architectures that perform better on GPUs (such as attention and Mamba) is a legitimate advancement. Furthermore, the viability of this level of scaling wasn’t really understood until phenomena like double descent (in which test error surprisingly goes down, rather than up, after increasing model complexity past a certain point) were discovered.
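
As a rough illustration of why attention maps onto GPUs better than recurrence, here is a toy sketch in plain Python. The function names are hypothetical, and `toy_recurrence` is a deliberately simplified stand-in for a recurrent model, not an LSTM. The point is only the dependency structure: each attention output depends on the inputs alone, while each recurrent state needs the previous state first.

```python
import math
from typing import List

Vector = List[float]

def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention. Each query is handled independently of the
    others, so on real hardware all positions can be computed at the same time."""
    d = len(keys[0])
    outputs = []
    for q in queries:  # no iteration reads the result of another iteration
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs

def toy_recurrence(inputs: Vector, decay: float = 0.5) -> Vector:
    """Toy recurrence h_t = decay * h_{t-1} + x_t. Step t cannot start until
    step t-1 has finished, so the time dimension is inherently sequential."""
    states = []
    h = 0.0
    for x in inputs:
        h = decay * h + x
        states.append(h)
    return states

# Identical keys give uniform attention weights, so the output is the mean of the values.
print(attention([[1.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]], [[2.0], [4.0]]))
print(toy_recurrence([1.0, 1.0, 1.0]))
```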

On top of that, lots of advancements were necessary to train deep networks at all. Better optimizers like Adam instead of pure SGD, plus tricks like residual layers and batch normalization, were all necessary to scale up even small ConvNets, since naively training deep networks tends to run into issues such as vanishing gradients and covariate shift.
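
A back-of-the-envelope sketch of the vanishing gradient point, under a strong simplifying assumption: every layer is a scalar function with the same small local derivative. The helper `chain_gradient` is hypothetical. By the chain rule, a plain stack multiplies the local derivatives together, whereas a residual layer y = x + f(x) contributes a factor of 1 + f'(x), so the identity path keeps the gradient from shrinking toward zero.

```python
from typing import List

def chain_gradient(local_derivs: List[float], residual: bool) -> float:
    """Gradient of the output with respect to the input for a stack of scalar layers.

    Plain layer    y = f(x):     factor f'(x)
    Residual layer y = x + f(x): factor 1 + f'(x)
    """
    grad = 1.0
    for d in local_derivs:
        grad *= (1.0 + d) if residual else d
    return grad

# Assumed toy setting: 20 layers, each with local derivative 0.1.
derivs = [0.1] * 20
plain = chain_gradient(derivs, residual=False)
with_skip = chain_gradient(derivs, residual=True)
print(f"plain stack:    {plain:.3e}")
print(f"residual stack: {with_skip:.3e}")
```

Real networks are not scalar chains, and in this toy setting the residual factor can just as easily grow too large, which is part of why normalization layers and careful initialization are used alongside skip connections.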