What is Data Science and why would you care about Mathematics as a data scientist?

Julien Arino (julien.arino@umanitoba.ca)

Department of Mathematics & Data Science Nexus
University of Manitoba

Canadian Centre for Disease Modelling
NSERC-PHAC EID Modelling Consortium - CANMOD, OMNI/RÉUNIS & MfPH

In days of yore (circa 2010)

Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it...

Attributed to Dan Ariely (Duke University)

The vocabulary has evolved (big data → complex data → data science), but Data Science remains a loosely defined concept, although things are improving

Data Science (according to Wikipedia)

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data, and apply knowledge and actionable insights from data across a broad range of application domains.

[..] It uses techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, information science, and domain knowledge.

The data deluge

  • Data science is nothing new (some statisticians argue it is just another name for statistics), but it has become prominent in recent years as a consequence of the unprecedented mass of information generated and collected by our modern societies
  • One speaks of information explosion or data deluge. See some considerations, e.g., here

A wide variety of jobs

We have absolutely insane amounts of data and we try to make sense of it

⇒ data science

However, except for the name, the situation has not improved significantly since the days of yore of Ariely's quote: data science is a hodge-podge that contains everything but the kitchen sink

To caricature

  • two main types of data: structured and unstructured
  • two main branches: statistics and computer science
  • two main types of jobs: users and developers

Math of Data Science?

Recall I said

  • two main branches: statistics and computer science
  • two main types of jobs: users and developers

So why a course on Math of Data Science?

If you plan to be a user, are not curious about the how and the why, and can tolerate errors due to misuse of methods, then you probably don't care about this course

In all other cases, math matters: many of the concepts used have their roots in math. To understand where the methods come from and, even more importantly, to develop new methods, math is often required

Warning!

We barely scratch the surface here:

  • Some techniques from linear algebra
  • Some graph theory ideas

There is a lot more to see!!!

Where to go for more information

Faculty of Science @ U of M has created the Data Science NEXUS (link)

  • Education component

    • Data Science Undergraduate Program (Fall 2021)
    • Data Science Master’s Program
    • Master of Business Analytics
  • Training (workshops, COOP, internships, etc.)

  • Events (conferences, etc.)
