There is wide disagreement regarding AI timelines, especially for human-level AI (AGI). There are several ways to estimate AI progress, some more informative than others. A few of them are described below, with links to relevant information. This is going to be a fairly long post.
Forecasting TAI with Biological anchors – This section is a summary of the Draft report on AI timelines by Ajeya Cotra from Open Philanthropy, which describes a quantitative model to forecast AI progress. Transformative AI (TAI) is defined as “software” that has at least as profound an impact on the world’s trajectory as the Industrial Revolution did. This would be equivalent to a tenfold acceleration in the growth rate of Gross World Product (GWP), from the current ~2-3% per year to 20-30% per year. At that rate, if TAI is developed in year Y, the entire world economy would more than double by year Y + 4.
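As a quick check on the arithmetic, compounding the stated growth rates over four years (a sketch; the rates are from the report, the code is purely illustrative):

```python
def gwp_multiple(growth_rate: float, years: int) -> float:
    """GWP multiple after compounding at a constant annual growth rate."""
    return (1 + growth_rate) ** years

print(gwp_multiple(0.03, 4))  # ~1.13: business-as-usual growth
print(gwp_multiple(0.20, 4))  # ~2.07: even the low end of TAI growth doubles GWP
print(gwp_multiple(0.30, 4))  # ~2.86: the high end nearly triples it
```

So the "more than double by Y + 4" claim follows directly from the 20-30% growth range.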
One path to TAI would be training an ML model comparable to the human brain in computational power (i.e., similar inference compute). The author first generates a subjective probability distribution over how much computation it would take to train such a model using 2020 ML architectures, measured in FLOP. By incorporating trends in algorithmic progress, hardware prices and willingness to spend, one obtains a probability distribution over when the computation required to train such a model becomes affordable.
To estimate 2020 training computation requirements, the author uses 4 hypotheses informed by biology and calculates the probability distribution of computational requirements for each of them separately. These distributions are then updated against levels of training FLOP already spent, and probabilities are assigned to the different hypotheses, which are combined into a mixture distribution. All the hypotheses rely on an estimate of the amount of computation done by the human brain: a right-skewed distribution with a median of ~1e15 FLOP/s. The four hypotheses are:
Lifetime Anchor: This hypothesis assumes that training computation requirements will resemble the amount of computation done by a child’s brain over the course of growing to be an adult, because we should expect our architectures and optimization algorithms to be about as efficient as human learning. The hypothesis anchors to a median of (1e15 FLOP/s) * (1e9 seconds) = 1e24 FLOP. It is adjusted upward by ~3 OOM (orders of magnitude) based on comparisons of the data requirements of current models vs. humans, the efficiency of natural artifacts vs. human-engineered goods, and the fact that human babies are likely born with various priors that we must instead train. This results in a right-skewed distribution with a median of ~3e27 FLOP.
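The anchor's central value is simple arithmetic (a sketch; applying the ~3 OOM adjustment as a single point multiplier is a simplification, since the report adjusts the whole distribution):

```python
brain_flops = 1e15         # median estimate of human-brain FLOP/s
childhood_seconds = 1e9    # ~32 years of subjective experience
lifetime_anchor = brain_flops * childhood_seconds  # 1e24 FLOP

adjustment_oom = 3         # report's upward adjustment (data needs, priors, etc.)
adjusted_median = lifetime_anchor * 10 ** adjustment_oom
print(f"{adjusted_median:.0e} FLOP")  # 1e+27 FLOP, in the vicinity of the ~3e27 median
```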
Evolution Anchor: This hypothesis assumes that training computation requirements will resemble the amount of computation done over the course of evolution from the earliest animals with neurons to modern humans, because we should expect our architectures and optimization algorithms to be about as efficient as natural selection. Assuming about 1 billion years of evolution from the earliest neurons and multiplying by the average population size and average brain FLOP/s of our evolutionary ancestors gives a median of ~1e41 FLOP. Since it is more plausible that our models are more efficient than evolution rather than less, the distribution is left-skewed around that median.
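The headline number can be reproduced with rough inputs (the population size and average brain compute below are my own illustrative assumptions chosen to land near the stated median, not the report's exact values):

```python
years_of_evolution = 1e9      # since the earliest animals with neurons
seconds_per_year = 3.15e7
avg_population = 1e21         # assumed: average number of ancestors alive at once
avg_brain_flops = 1e4         # assumed: average ancestor brain FLOP/s (mostly tiny animals)

evolution_anchor = years_of_evolution * seconds_per_year * avg_population * avg_brain_flops
print(f"{evolution_anchor:.0e} FLOP")  # ~3e41, the same order of magnitude as ~1e41
```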
The final two hypotheses anchor to human brain FLOP/s (~1e15) to estimate the FLOP per subjective second ('subj sec' is a unit of data corresponding to the amount of information that a typical human can process in one second) performed by a transformative model. Accounting for the relative crudeness of current ML architectures compared to the brain, this is estimated to be ~1e16 FLOP / subj sec (~1 OOM larger). Total training FLOP would be (FLOP / subj sec) x (subj sec of training). Subjective seconds of training is assumed to scale as a power law in the number of parameters (based on current architectures) and linearly with the 'effective horizon length': how much data (measured in subj sec) the model must process (on average) to tell with a given level of confidence whether a perturbation to the model improves or worsens performance.
Neural Network: This hypothesis assumes that a transformative model would run at ~1e16 FLOP/subj sec and have ~3e14 parameters (estimated from the ratio of computation to parameters in current architectures). It then extrapolates the amount of FLOP required to train such a model using the above scaling laws. The author splits this into short-, medium- and long-horizon neural network hypotheses based on three plausible effective horizon lengths (1e0-1e3, 1e3-1e6, and 1e6-1e9 seconds respectively), with medians of 1e32, 3e34 and 1e37 FLOP.
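The scaling relation can be sketched as follows (the power-law exponent `alpha` and constant `k` are illustrative assumptions, not the report's fitted values):

```python
def training_flop(flop_per_subj_sec: float, params: float, horizon: float,
                  alpha: float = 0.8, k: float = 1.0) -> float:
    """Training FLOP under a report-style scaling law: subjective seconds
    of training grow as a power law in parameter count (exponent alpha)
    and linearly in the effective horizon length."""
    subj_sec_of_training = k * params ** alpha * horizon
    return flop_per_subj_sec * subj_sec_of_training

# With this hypothesis's anchors (1e16 FLOP/subj sec, 3e14 parameters),
# each 3-OOM jump in horizon length multiplies training FLOP by 3 OOM,
# which is roughly the spacing between the 1e32 / 3e34 / 1e37 medians.
short = training_flop(1e16, 3e14, horizon=1e1)
medium = training_flop(1e16, 3e14, horizon=1e4)
print(medium / short)
```

Linearity in horizon is why the three sub-hypotheses sit roughly 3 OOM apart.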
Genome Anchor: This hypothesis assumes that a transformative model will be structurally analogous to natural selection, albeit more computationally efficient — that it will involve searching for a “genome” with high “fitness” by optimizing over a large number of “generations.” It would run at ~1e16 FLOP / subj sec and have about as many parameters as there are bytes in the human genome (~7.5e8). Since signals about the fitness level of a particular candidate genome are sparse, this implies that the effective horizon length would be between ~1-32 subjective years. This hypothesis uses the above scaling laws to estimate training FLOP with median of ~1e33 FLOP.
Some hypotheses (especially the Lifetime Anchor hypothesis) assign non-trivial probability to levels of computation that are already affordable today. The distribution is therefore truncated below the amount of compute used to train AlphaStar (~1e23 FLOP), and probability is down-weighted log-linearly for FLOP values between 1e23 and 1e27, since a transformative model trainable that cheaply would likely already have been trained.
Mixture distribution: The author assigns 20% probability to Short Horizon, 30% to Medium Horizon, 15% to Long Horizon Neural network hypotheses, 5% to Lifetime Anchor hypothesis, 10% to Genome Anchor hypothesis, 10% to Evolution Anchor hypothesis, and 10% to the possibility that the amount of computation that would be required to train a transformative model with 2020 architectures and algorithms is higher (perhaps astronomically higher) than any of the hypotheses predict. The resulting distribution is very wide and places non-trivial probability mass on a range of 26 OOM, from 1e24 FLOP to 1e50 FLOP.
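A rough sketch of such a mixture (the weights and medians are from the text; the per-hypothesis spread `sigma_oom`, the stand-in value for the "higher than all hypotheses" mass, and the omission of the truncation are my assumptions):

```python
import random

random.seed(0)

# (weight, log10 of median FLOP) pairs.
hypotheses = [
    (0.20, 32.0),   # short-horizon neural net (1e32)
    (0.30, 34.5),   # medium-horizon neural net (3e34)
    (0.15, 37.0),   # long-horizon neural net (1e37)
    (0.05, 27.5),   # lifetime anchor (3e27)
    (0.10, 33.0),   # genome anchor (1e33)
    (0.10, 41.0),   # evolution anchor (1e41)
    (0.10, 55.0),   # "higher than all hypotheses": illustrative stand-in
]
sigma_oom = 2.0     # assumed spread of each lognormal, in OOM

weights = [w for w, _ in hypotheses]
samples = []
for _ in range(100_000):
    (_, mu), = random.choices(hypotheses, weights=weights)
    samples.append(mu + random.gauss(0, sigma_oom))

samples.sort()
median_log10 = samples[len(samples) // 2]
print(f"mixture median ~1e{median_log10:.1f} FLOP")
```

The resulting median lands near the medium-horizon hypothesis, while the tails stretch across the full ~26 OOM range.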
Hardware prices: Modeled as a logistic curve, with the FLOP/dollar doubling every ~2.5 years and leveling off after 6 OOM of progress.
Spending on computation: In the near-term, expected to rise rapidly to about $1B by 2025 (a doubling time of about 6 months), then slow to a doubling time of 2 years as AI labs run into material capital constraints. Spending would saturate at 1% of the GDP of the largest country (with GDP assumed to grow indefinitely at ~3% per year), anchoring to national and international megaprojects such as the Manhattan Project and the Apollo Project.
Algorithmic progress: Modeled as a logistic curve with a halving time of ~2-3 years (different for each hypothesis). Hypotheses that predict higher 2020 training computation requirements have more room to improve over time, so the cap on progress differs by hypothesis, ranging from 1 OOM (for Lifetime Anchor) to 5 OOM (for Evolution Anchor). Additionally, the probability that the required amount of computation is larger than all hypotheses predict is assumed to fall linearly over time, from 10% in 2025 to 3% by 2100, to model the possibility of algorithmic breakthroughs.
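Putting the three trend components together, affordable effective compute in year t is roughly (FLOP per dollar) x (dollars spent) x (algorithmic multiplier). A sketch with the stated doubling times (the hard cap stands in for the report's smooth logistic, and the 2020 price-performance figure is an assumption):

```python
import math

def capped_progress_oom(year: float, start: float, doubling_years: float,
                        cap_oom: float) -> float:
    """OOM of progress since `start`: exponential growth with the given
    doubling time, crudely capped after cap_oom orders of magnitude
    (the report uses a smooth logistic instead of a hard cap)."""
    oom = (year - start) / doubling_years * math.log10(2)
    return min(oom, cap_oom)

def affordable_flop(year: float, algo_cap_oom: float = 3) -> float:
    flop_per_dollar_2020 = 1e17  # assumed 2020 price-performance
    hw_oom = capped_progress_oom(year, 2020, 2.5, 6)            # hardware: 2x / 2.5 yr, 6 OOM cap
    algo_oom = capped_progress_oom(year, 2020, 2.5, algo_cap_oom)  # algorithmic halving ~2.5 yr
    # Spending: ~6-month doubling to $1B by 2025, then a 2-year doubling time.
    if year <= 2025:
        dollars = 1e9 * 2 ** ((year - 2025) / 0.5)
    else:
        dollars = 1e9 * 2 ** ((year - 2025) / 2)
    return flop_per_dollar_2020 * 10 ** hw_oom * dollars * 10 ** algo_oom

print(f"{affordable_flop(2030):.1e} effective FLOP")
```

Comparing this affordability curve against the mixture distribution over required FLOP is what yields the probability of TAI by a given year.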
Sanity check: Back-testing the predictions of the model shows that the model would not have confidently predicted TAI in past years. Also, it is not obviously inconsistent with the capabilities of small animals. The model predicts that replicating 1-hour-horizon bee and mouse behaviors would require ~3e23 and ~3e28 FLOP respectively. GPT-3 (1e23-1e24 FLOP) should be somewhat more capable than a bee on medium horizons and not quite as capable as mice, which seems plausible.
TAI timelines: The model doesn't incorporate the possibility of other paths to TAI; the availability of training data, environments and human feedback; or exogenous events such as global catastrophic risks, severe economic downturns, government regulation of AI, and additional effort and testing for safety and robustness. For these reasons, the model is expected to overestimate the probability of TAI in the short term and underestimate it in the long term. The author tentatively estimates ~12-17% probability of TAI by 2036, ~50% probability by 2052 and ~80% probability by 2100. For the complete probability distributions and clarifications of all the assumptions, refer to the original report.
Expert surveys: Expert predictions about AGI are diverse, ranging from never, to centuries away, to just around the corner, and anything in between. Predictions before 2015 are collected here. The most informative recent surveys are Grace (2016), Walsh (2017), Ford (2018), and Gruetzemacher (2019). Small changes in question framing lead to very different answers, and extrapolating from fractional progress results in longer estimates. There are reasons to expect expert predictions to be poor; quantitative trend analyses are expected to be more accurate than expert judgments.
Economic evidence: AI is widely considered a general-purpose technology (GPT), one that can affect the entire economy, with the potential to drastically alter society. The economy could therefore provide (fairly weak) evidence to inform timelines. Some economists suggest that on a long enough timescale, GWP has grown super-exponentially and is best described by hyperbolic growth; extrapolations of these trends predict an explosion in economic growth around the middle of this century. Others suggest the opposite. Here is an excellent write-up with relevant links about what markets can tell us.
In conclusion, there is wide uncertainty regarding AI timelines. There is some evidence to suggest TAI is likely to arrive this century, although this is by no means conclusive. The quantitative model can help not only in informing your own predictions but also in narrowing down the exact reasons for disagreement with others' positions.