Data science, a new discovery paradigm, is potentially one of the most significant advances of
the early 21st century. Originating in scientific discovery, it is being applied to every human endeavor for
which there is adequate data. While remarkable successes have been achieved, even greater claims have
been made. Benefits, challenges, and risks abound. The science underlying data science has yet to emerge.
Maturity is more than a decade away. This claim is based firstly on observing the centuries-long
development of its predecessor paradigms: empirical, theoretical, and Jim Gray's Fourth Paradigm of
Scientific Discovery (Hey, Tansley & Tolle, 2009) (aka eScience, data-intensive, computational, procedural);
and secondly on my studies of over 150 data science use cases and several data science-based startups, and
on my scientific advisory role for Insight, a Data Science Research Institute (DSRI), which requires me to
understand the opportunities, state of the art, and research challenges for the emerging discipline of data
science. This chapter addresses two essential questions for a DSRI: "What is data science?" and "What is world-class data science research?" A companion chapter, On Developing Data Science (Brodie, 2018b), addresses the development of data science applications and of the data science discipline itself.