Big data is changing the way we do business, socialize, conduct research, and govern society. Data are collected on anything, at any time, and in any place. Organizations are investing heavily in Big data technologies, and data science has emerged as a new scientific discipline providing techniques, methods, and tools to gain value and insights from new and existing data sets. Data abundance combined with powerful data science techniques has the potential to dramatically improve our lives by enabling new services and products and by improving the efficiency and quality of existing ones. Many of today’s scientific discoveries (e.g., in health) are already fueled by developments in statistics, data mining, machine learning, databases, and visualization.
The importance of data science is widely acknowledged, but there are also great concerns about the use of data. Increasingly, customers, patients, and other stakeholders are concerned about irresponsible data use. Automated data decisions may be unfair or non-transparent. Confidential data may be shared unintentionally or abused by third parties. Each step in the data science pipeline (from raw data to conclusions, see figure) may create inaccuracies. For example, if the data used to learn a model reflect existing social biases, the algorithm is likely to incorporate these biases. These concerns could lead to resistance against the large-scale use of data and make it impossible to reap the benefits of data science. Rather than avoiding the use of data altogether, we strongly believe that data science techniques, infrastructures, and approaches can be made responsible by design. Therefore, we started the Responsible Data Science (RDS) consortium, in which leading Dutch research groups from multiple disciplines join forces. The problems addressed by RDS are extremely urgent and challenging. Not addressing these problems will lead to a Big data winter in which data are widely misused and data science results are deeply mistrusted. We believe that the RDS consortium provides a unique mix of experts and ideas to realize the scientific breakthroughs needed to use data (Big or small) in a truly positive manner.
RDS revolves around four main challenges:
- Data science without prejudice – How to avoid unfair conclusions even if they are true?
- Data science without guesswork – How to answer questions with a guaranteed level of accuracy?
- Data science that ensures confidentiality – How to answer questions without revealing secrets?
- Data science that provides transparency – How to clarify answers such that they become indisputable?
To future-proof responsible data science methods, foundational research is needed focusing on FACT, i.e., questions related to Fairness, Accuracy, Confidentiality, and Transparency.
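As a rough illustration only, three of the FACT concerns can be related to well-known, generic techniques. The sketch below is not an RDS method: the function names are hypothetical, and the choice of a demographic parity gap (fairness), a Wilson score interval (accuracy), and the Laplace mechanism (confidentiality) is an assumption made here for the sake of example. Transparency is left out because it does not reduce to a single formula.

```python
import math
import random


def demographic_parity_gap(decisions, groups):
    """Fairness: largest difference in positive-decision rate between groups.

    decisions: list of 0/1 outcomes; groups: list of group labels (same length).
    """
    rates = {}
    for g in set(groups):
        picked = [d for d, label in zip(decisions, groups) if label == g]
        rates[g] = sum(picked) / len(picked)
    return max(rates.values()) - min(rates.values())


def wilson_interval(successes, n, z=1.96):
    """Accuracy: approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def laplace_count(true_count, epsilon, rng):
    """Confidentiality: Laplace mechanism for a counting query (sensitivity 1).

    The difference of two exponential variates with rate epsilon follows a
    Laplace distribution with scale 1/epsilon.
    """
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


if __name__ == "__main__":
    # Toy data, invented purely for this example.
    decisions = [1, 1, 0, 0, 1, 0]
    groups = ["a", "a", "a", "b", "b", "b"]
    print("parity gap:", demographic_parity_gap(decisions, groups))
    print("interval for 50/100:", wilson_interval(50, 100))
    print("noisy count:", laplace_count(100, 1.0, random.Random(0)))
```

A smaller parity gap, a narrower interval, and a smaller epsilon correspond to stronger fairness, accuracy, and confidentiality guarantees respectively, and in practice these goals can pull against one another.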
The methods developed within RDS will be inspired by questions from four thematic areas: Responsible Science, Responsible Health, Responsible Business, and Responsible Government. The results will be evaluated using real-world data sets and cases from these four thematic areas. Organizations such as the Rathenau Institute, Statistics Netherlands (CBS), and the Netherlands Scientific Council for Government Policy (WRR) also support the RDS initiative.
The RDS program is unique in that it will:
- focus on one of the most pressing challenges of our time: enabling and ensuring responsible use of data without inhibiting the power of data science;
- provide the technology to safeguard fairness, accuracy, confidentiality, and transparency by design;
- bring together top researchers in the Netherlands from key disciplines like data/process mining, digital humanities, ethics, information retrieval, knowledge representation, law, machine learning, natural language processing, security, statistics, and visualization;
- create a nation-wide multi-disciplinary platform focusing on challenges related to data science and Big data.
We anticipate scientific breakthroughs enabled by these unique features. Results of the RDS program will include new data science concepts, novel analysis and data management techniques, powerful software tools and infrastructures, and RDS education and training. The RDS team is committed to providing the means to use data in a truly responsible manner.