Data Scientists Leaning on External Sources

A new survey of data scientists revealed that the vast majority have significantly improved their models by using external data, although poor-quality data remains their number one challenge.

According to “The Importance and Impact of Using External Data to Make Critical Decisions,” published by Doorda, 92 percent of respondents said that applying external data had significantly improved their outcomes.

“In today’s data-driven business landscape, understanding the role of external data in making critical decisions is crucial,” the market survey notes.

The report provides insights into how external data is key to enhancing business outcomes. “While it highlights the potential benefits, it also emphasizes the importance of due diligence in managing external data supply chains, particularly in comparison to the robust governance of internal data,” the report says.

The survey revealed that the majority of models (70 percent) contain at least 20 percent externally sourced data. According to the report, this external data is key to continuing to improve existing data models and to innovate new ones.

In particular, introducing fresh and trustworthy data is becoming increasingly important with the rise of AI technologies.

This is why it is crucial to investigate external data regularly. The survey found that nearly 100 percent of respondents consider investigating external data important.

Despite 92 percent of the 300 data scientists surveyed stating that their models had been significantly improved at some point by adding external data, 40 percent of them also said that they had used data that later proved untrustworthy or unavailable.

Respondents named poor data quality, incomplete data, and incompatible data as the main challenges when integrating external data.

“There is no single, perfect way of finding good external data, with results spread across a number of source potentials, but existing data suppliers, established marketplaces and internal colleagues are unsurprisingly the main ways of finding out about external data,” the report finds.

Additionally, the survey turned up 140 different job titles among respondents whose roles are principally in data science and data analytics.

The number and inconsistency of job titles show that data science and analytics are still evolving and, Doorda says, raise the question of whether companies realize just how widespread and important data science has become within their own organizations.

“Navigating this evolving data landscape is a complex task, and this report aims to guide businesses by offering insights and recommendations that will help them harness the potential of external data while addressing the challenges it presents,” the survey notes.