Last updated: 2023-06-04
This page is designed to provide you with the necessary resources and a step-by-step guide to learn the fundamentals of Data Science in R. It is mainly designed for a social scientist audience, meaning you are training/trained as a psychologist, sociologist, political scientist, etc., and you want to augment your skills to run all your analyses in a powerful programming language while also expanding into more novel data science techniques such as Machine Learning, Web Scraping and more. The materials provided here are designed for people with no previous programming experience or familiarity with traditional social scientific methods.
The entire course is build on the textbook Introduction to Data Science by Prof Rafael A. Irizarry (Harvard University). The textbook has also been adapted for an online course on edX. Here, I provide an augmented version of these materials as an interactive learning course created with the learnr
package datsci
. The datsci
package blends together the textbook materials with several complementary courses available on DataCamp, allowing you to practice and apply your skills on real-world datasets. Together, these resources form a step-by-step syllabus which will allow you to teach yourself R in an interactive and fun way! The materials used here are carefully selected and blended such that they complement and reinforce each other. The combination of online reading materials, comprehension checks, assessments, and practical courses and projects will allow for a smooth and efficient learning experience.
The entire course is spread over 8 sequential modules (datsci_01 - datsci_08): R Basics, Data Visualization, Probability, Inference and Modeling, Productivity Tools, Data Wrangling, Linear Regression, and Machine Learning. In each module, we use motivating case studies. In each case study, we try to realistically mimic a data scientist’s experience. For each of the concepts covered, we start by asking specific questions and answer these through data analysis. We learn the concepts as a means to answer the questions.
The materials are free and open, including the primary textbook Introduction to Data Science. We will provide assessment materials to test your learning success as you work through the syllabus. For students taking my course at the University I will provide free access to the DataCamp materials by the beginning of the trimester. If you are not participating in my University course, you will need to purchase a subscription for access to DataCamp.
This website is written using R Markdown. The source code for the website is adapted from Dr. Matt Crump’s webpage here Link. For anyone interested in adopting these materials, the idea is you can fork the repo for the website. Then edit as desired for your purposes, as did I. I am grateful for the amazing materials provided by Prof. Rafael A. Irizarry (Harvard University) which are licenced under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International CC BY-NC-SA 4.0. More materials by Prof. Rafael A. Irizarry can be found here.