Data Science: Productivity Tools
Keep your projects organized and produce reproducible reports using GitHub, git, Unix/Linux, and RStudio.
As part of our Professional Certificate Program in Data Science, this course covers the basics of data visualization and exploratory data analysis. We will use three motivating examples and ggplot2, a data visualization package for the statistical programming language R. We will start with simple datasets and then graduate to case studies about world health, economics, and infectious disease trends in the United States.
We’ll also be looking at how mistakes, biases, systematic errors, and other unexpected problems often lead to data that should be handled with care. The fact that it can be difficult or impossible to notice a mistake within a dataset makes data visualization particularly important.
The growing availability of informative datasets and software tools has led to increased reliance on data visualizations across many areas. Data visualization provides a powerful way to communicate data-driven findings, motivate analyses, and detect flaws. This course will give you the skills you need to leverage data to reveal valuable insights and advance your career.
An up-to-date browser is recommended to enable programming directly in a browser-based interface.
Estimated Effort: 1-2 Hours / Week
Upon successful completion, participants will earn a professional certificate from Harvard University.
- How to use Unix/Linux to manage your file system
- How to perform version control with git
- How to start a repository on GitHub
- How to leverage the many useful features provided by RStudio
- R is listed as a required skill in 64% of data science job postings, and data scientist was Glassdoor’s Best Job in America in 2016 and 2017. (source: Glassdoor)
- Companies are leveraging the power of data analysis to drive innovation. Google data analysts use R to track trends in ad pricing and illuminate patterns in search data. Pfizer created customized packages for R so scientists can manipulate their own data.
- 32% of full-time data scientists started learning machine learning or data science through a MOOC, while 27% were self-taught. (source: Kaggle, 2017)
- Data scientists are few in number and in high demand. (source: TechRepublic)