
Given the right prompt, Google’s GenAI model Gemini will produce false content about topics such as the U.S. presidential election, the Super Bowl or the Titan submersible implosion. That has drawn criticism from policymakers, who are concerned that GenAI tools could be misused to spread disinformation and mislead the public.
As a result, Google is investing more in AI safety, or so it says. Google DeepMind, the division behind Gemini and the company’s other GenAI projects, today announced the formation of a new organization, AI Safety and Alignment, made up of existing AI safety teams along with new groups of GenAI experts.
Google did not disclose how many new hires the organization will entail, but it did say that AI Safety and Alignment will include a new team dedicated to ensuring the safety of artificial general intelligence (AGI), hypothetical systems that can do anything a human can.
The new team within AI Safety and Alignment has a goal similar to that of the Superalignment division OpenAI formed last July: tackling the technical challenge of controlling superintelligent AI, which does not yet exist. The new team will collaborate with Scalable Alignment, DeepMind’s existing AI safety research team in London.
Why have two teams working on the same problem? Fair question, and one we can only speculate about, since Google is not offering many details at this point. But it seems notable that the new team, the one within AI Safety and Alignment, is based in the U.S. near Google headquarters, at a time when the company is racing to keep pace with its AI rivals while trying to project a responsible, measured approach to AI.
The other teams in AI Safety and Alignment are in charge of creating and applying concrete safeguards to Google’s Gemini models, both existing and future ones. Safety is a wide-ranging area. But some of the organization’s immediate priorities will be avoiding bad medical advice, protecting children and “stopping the increase of bias and other unfairness.”
The organization will be led by Anca Dragan, who was previously a staff research scientist at Waymo and is a computer science professor at UC Berkeley.
Dragan told TechCrunch via email that the goal of the work at the AI Safety and Alignment organization is “to help models understand human preferences and values better and more reliably, to acknowledge their own uncertainty, to cooperate with people to comprehend their needs and to seek informed guidance, to resist adversarial attacks and to consider the diversity and changeability of human values and perspectives.”
Dragan’s background in AI safety systems at Waymo might raise eyebrows, given the Google self-driving car venture’s recent troubles.
So might her decision to split her time between DeepMind and UC Berkeley, where she leads a lab researching algorithms for human-AI and human-robot interaction. One might assume that AGI safety, along with the longer-term risks the AI Safety and Alignment organization plans to investigate, such as stopping AI from “supporting terrorism” and “undermining society,” would demand a director’s full-time focus.
Dragan, however, claims that her UC Berkeley lab’s and DeepMind’s research are both connected and complementary.
She said, “My lab and I have been working on … value alignment in preparation for advancing AI capabilities, [and] my own Ph.D. was in robots understanding human goals and being honest about their own goals to humans, which is how I got interested in this area.” She added, “I think [DeepMind CEO] Demis Hassabis and [chief AGI scientist] Shane Legg were eager to hire me because of this research experience and because of my view that dealing with current-day issues and catastrophic risks are not incompatible — that on the technical side solutions often overlap, and work that helps the long term also improves the present day, and vice versa.”
Dragan has a lot of work to do, to put it mildly.
GenAI tools face widespread skepticism, especially where deepfakes and misinformation are concerned. In a YouGov poll, 85% of Americans said they were very or somewhat worried about the spread of deceptive video and audio deepfakes. A separate survey from The Associated Press-NORC Center for Public Affairs Research found that nearly 60% of adults think AI tools will increase the spread of false and misleading information during the 2024 U.S. election cycle.
Enterprises, the big customers that Google and its competitors hope to win over with GenAI innovations, are also wary of the tech’s shortcomings and their implications.
A survey by Intel subsidiary Cnvrg.io found that about a fourth of companies testing or deploying GenAI apps had reservations about GenAI compliance and privacy, reliability, the high cost of implementation and the lack of technical skills needed to use the tools to their fullest.
Another survey, from Riskonnect, a provider of risk management software, found that more than half of the executives surveyed were worried about employees making decisions based on incorrect information from GenAI apps.
These concerns are not unfounded. The Wall Street Journal reported last week that Microsoft’s Copilot suite, which uses GenAI models similar to Gemini in architecture, often makes errors in meeting summaries and spreadsheet formulas. The culprit is hallucination — the term for GenAI’s tendency to make things up — and many experts think it can never be completely fixed.
Dragan acknowledges the difficulty of the AI safety challenge and does not guarantee a flawless model — she only says that DeepMind plans to invest more resources in this area and announce a framework for assessing GenAI model safety risk “soon.”
She said, “I think the key is to … [account] for the remaining human cognitive biases in the data we use to train, good uncertainty estimates to know where the gaps are, adding inference-time monitoring that can detect failures and confirmation dialogues for important decisions and tracking where [a] model’s capabilities are to engage in potentially harmful behavior.” She added, “But that still leaves the open problem of how to be sure that a model won’t misbehave some small fraction of the time that’s hard to find empirically, but may show up at deployment time.”
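To make the shape of those safeguards concrete, here is a minimal sketch of the kind of inference-time monitoring and confirmation dialogue Dragan describes, assuming a model response that carries its own uncertainty estimate and a topic label from an upstream classifier. Everything here, from the `ModelResponse` type to the confidence threshold, is hypothetical and illustrative; it is not drawn from any Gemini or DeepMind API.

```python
# Illustrative sketch only: a toy inference-time monitor that withholds
# low-confidence answers on sensitive topics pending user confirmation.
# All names, topics and thresholds are hypothetical, not a real Gemini/DeepMind API.
from dataclasses import dataclass

SENSITIVE_TOPICS = {"medical", "election", "finance"}  # assumed high-stakes topics
CONFIDENCE_THRESHOLD = 0.8                             # assumed cutoff for "sure enough"


@dataclass
class ModelResponse:
    text: str          # the generated answer
    confidence: float  # the model's own uncertainty estimate, 0.0 to 1.0
    topic: str         # topic label from an upstream classifier


def needs_confirmation(response: ModelResponse) -> bool:
    """Flag answers that are both high-stakes and low-confidence."""
    return (response.topic in SENSITIVE_TOPICS
            and response.confidence < CONFIDENCE_THRESHOLD)


def deliver(response: ModelResponse) -> str:
    """Route a response: show it directly, or wrap it in a confirmation dialogue."""
    if needs_confirmation(response):
        return (f"I'm not fully confident in this {response.topic} answer. "
                f"Draft: {response.text!r} "
                "Would you like me to check it against a trusted source first?")
    return response.text


if __name__ == "__main__":
    risky = ModelResponse("Take 4g of ibuprofen daily.", confidence=0.55, topic="medical")
    safe = ModelResponse("Water boils at 100°C at sea level.", confidence=0.97, topic="science")
    print(deliver(risky))  # triggers the confirmation dialogue
    print(deliver(safe))   # passes straight through
```

The routing logic itself is trivial; as Dragan notes, the hard parts are producing uncertainty estimates good enough to know where the gaps are and catching the rare misbehavior that never shows up in testing but surfaces at deployment time.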
I’m not sure customers, the public and regulators will be so forgiving. It’ll depend, I guess, on how bad those misbehaviors are — and who’s affected by them.
Dragan said, “Our users should hopefully experience a model that is more and more helpful and safe over time.” Indeed.