Responsible Data Science Seminar Series – Event 08
April 13, 2017 @ 4:00 pm - 5:30 pm
On 13 April, we will hold our monthly seminar series on Responsible Data Science (RDS), a collaboration of expert researchers from 11 knowledge institutions across the Netherlands: Academisch Medisch Centrum (AMC), Centrum Wiskunde en Informatica (CWI), Delft University of Technology (TUD), Eindhoven University of Technology (TU/e), Leiden University (LU), Leiden University Medical Center (LUMC), Radboud University Nijmegen (RU), Tilburg University (UvT), University of Amsterdam (UvA), VU Medical Center Amsterdam (VUmc), and VU University Amsterdam (VU).
The RDS initiative is driven by the omnipresence of data, which makes society increasingly dependent on data science. Despite the great potential of data science, there are many concerns about irresponsible data use. Unfair or biased conclusions, disclosure of private information, and non-transparent data use may inhibit future data science applications.
- 16:00-16:05
- Introduction & Overview of Responsible Data Science by Frank van Harmelen, Professor in Knowledge Representation and Reasoning, Department of Computer Science (VU)
- 16:05-16:30
- Speaker 1: Mark van de Wiel, Professor in Statistics for Genomics at the Department of Epidemiology & Biostatistics (VUmc) and the Department of Mathematics (VU)
Better omics-based predictions through the use of big co-data
Abstract: In typical cancer genomics studies, the number of samples, n, is relatively small, say 50 to 500, compared to the number of features, p, say 10^3 to 10^6. Fortunately, a potentially large amount of prior information on the features may be available. Some examples of such ‘co-data’ are: p-values from an external study, additional molecular measurements, or genomic annotation. The statistical challenge is to make responsible use of such co-data to potentially improve predictions and classifications for individuals. We discuss Empirical Bayes as an approach to automatically and objectively include the co-data information in several prediction algorithms, such as penalized regression and the random forest. The emphasis will be on concepts rather than on mathematical technicalities. The systematic use of co-data can considerably improve predictions and feature selection, which we demonstrate with an application of the methodology to molecular cervical cancer diagnostics. Finally, some extensions to other problems, such as network estimation, are briefly discussed.
- 16:30-16:55
- Speaker 2: Melanie Peters, Director of the Rathenau Institute
Society first, data second
Abstract: The possibilities of data are endless. Collecting, analyzing and combining data allows scientists to discover patterns in behavior or discourse that were hard to study before. Much is expected from these new possibilities, especially in the area of health. However, what people do and write is not always what they need or what the collective needs. For marketers this type of information may be enough, if the objective is to stimulate buying and satisfy individual instant needs. For health professionals and those working in the public interest, this is not what their task is about; their task is about long-term public health. Understanding the public interest and deciding which interventions in health are desired requires deliberation, a back-and-forth discussion. This discussion can be informed by data, but not decided by data. How to design health research and health policy? And how to make use of data and empower patients and society?
The Rathenau Institute is a public think tank that informs politicians, decision makers and all of us about science and technology so that we can make better-informed choices. We do research and stimulate debate about the societal impact of new technologies, such as data. In the last year we published on data in the medical field (“Measurable man”) and on the digital society and human rights (“Upgrade”).
- 16:55-17:00
- Wrap-up
- 17:00-
- Networking and drinks
- 17:30-
- Close