Privacy regulators, including the EU data protection authorities (EU DPAs), are expected to increase their scrutiny of data scraping used to train AI algorithms. Data scraping involves the collection of large amounts of publicly available information, an activity that may trigger obligations under privacy laws.
While data scraping for AI training has only recently emerged on privacy regulators’ radars, action by the EU DPAs has been robust. In 2022, the Italian DPA fined Clearview AI €20 million for scraping facial images and generating biometric information, rejecting the company’s claim that its legitimate interests provided a lawful basis for such processing. Subsequently, the Greek and French DPAs each also imposed €20 million fines on Clearview AI, and the German and Austrian DPAs have declared Clearview’s activities illegal.
These investigations all focused primarily on the unlawful processing of sensitive data (i.e., biometric information), but the focus of the EU DPAs is anticipated to expand.
Examples of the DPAs’ broader focus on data collection for AI training include the French DPA’s October 2023 guidance on the research and development of AI systems, which notes that the re-use of (public) datasets is permissible only if the data were lawfully collected and the purpose of the re-use is compatible with the initial collection purpose. Most recently, in November 2023, the Italian DPA announced an investigation into data scraping practices for AI training, to determine whether adequate measures have been implemented to prevent the mass collection of personal information.
Scrutiny of data scraping for AI training is not expected to remain limited to the EU. In August 2023, 12 multinational privacy regulators issued a joint statement cautioning both companies that engage in web scraping and companies that publish large amounts of personal information online about the privacy risks and legal obligations associated with scraping.
Carson Martinez contributed to authoring this blog post.