The Evolution of Fair Use in Copyrighted Material
Denas Grybauskas, Chief Governance and Strategy Officer at Oxylabs, predicts that US legal debate will focus on the transformative use of copyrighted material. Under the fair use doctrine, a transformative purpose weighs in favor of permitting unlicensed use, so it will likely be a key point of contention when web data is used for AI training. In jurisdictions like the EU, technological solutions for credit attribution and creator remuneration will be crucial.
The Rise of Agentic Systems for Data Collection
Julius Černiauskas, CEO at Oxylabs, anticipates advancements in agentic systems for public data collection. AI agents will play a significant role in automating tasks like web scraping, making data access more widely available and cost-effective. The market can expect a surge in tools and features that streamline data collection processes.
LLM Usage for Parsing on the Rise
Juras Juršėnas, COO at Oxylabs, foresees an increase in the use of LLMs for parsing data over the next year. With improvements in data parsing technologies, developers will have access to tools that can handle parsing tasks more efficiently, reducing the need for manual cleaning of HTML data. The market is witnessing a growing number of options for automated parsing tools.
Quality Over Quantity in Data Collection
Rytis Ulys, Head of Data & AI at Oxylabs, emphasizes the importance of focusing on data quality over quantity in 2026. Research has shown that even small amounts of low-quality data can have a detrimental impact on datasets. As businesses prioritize robust data collection practices, the emphasis will be on quality, lineage, and efficiency in data retrieval processes.
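As a rough illustration of what prioritizing quality and lineage can look like in practice, here is a minimal Python sketch. It is not an Oxylabs tool or method; the `Record` type, the `make_record` and `filter_records` helpers, and the five-word threshold are all hypothetical choices made for this example. It drops very short records and exact duplicates, and keeps the source URL and fetch time with every record that survives.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Record:
    """A collected text snippet plus basic lineage metadata (hypothetical type)."""
    text: str
    source_url: str    # lineage: where the record came from
    fetched_at: str    # lineage: when it was collected (ISO 8601, UTC)
    content_hash: str  # fingerprint used to detect exact duplicates


def make_record(text: str, source_url: str) -> Record:
    # Collapse whitespace so trivially different copies hash the same.
    normalized = " ".join(text.split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    fetched_at = datetime.now(timezone.utc).isoformat()
    return Record(normalized, source_url, fetched_at, digest)


def filter_records(records, min_words=5):
    """Keep the first copy of each record that has at least min_words words."""
    seen = set()
    kept = []
    for record in records:
        if len(record.text.split()) < min_words:
            continue  # too short to be useful (assumed threshold)
        if record.content_hash in seen:
            continue  # exact duplicate of a record already kept
        seen.add(record.content_hash)
        kept.append(record)
    return kept


raw = [
    ("Widget A costs 10 euros and ships today", "https://example.com/a"),
    ("Widget A  costs 10 euros and ships today", "https://example.com/b"),
    ("Buy now", "https://example.com/c"),
]
records = [make_record(text, url) for text, url in raw]
kept = filter_records(records)
for record in kept:
    print(record.source_url, record.text)
```

Real pipelines would likely add near-duplicate detection and content-specific checks; the point here is only that each kept record carries enough metadata to be traced back to its source.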
Enhancing Understanding of Online Data Collection
Looking ahead, comprehensive agentic systems, increased LLM usage for parsing, and a shift toward data quality over quantity are on the horizon. Legal decisions on copyright will also shape the future of data collection. As businesses navigate these changes, new tools and capabilities will emerge to streamline collection processes and deepen understanding of why web data collection matters.