Data Practices Courseware Data Scientist Code of Ethics

Welcome!


This content is a collection of knowledge from many experts and practitioners across the data ecosystem. We invite you to explore the content below freely and share, present, or incorporate this work into your own content or commercial enterprises (all content is licensed under CC-BY-4.0).


Experts: We welcome any contributions, refinements, or additions to this body of work. Our aim is to improve the general level of data literacy for anyone who is interested. Feel free to submit changes via our GitHub repository, or contact us directly for other options.

1. Project Lifecycle Curriculum

1.1 Kicking off a data project

Effective data collaboration takes more than giving everyone access. Inclusion, flexibility, setting, and prioritization are also keys to fruitful data work.

This workshop will help you to understand the Data Practices Values and Principles, which describe the most effective, ethical, and modern approach to data teamwork, and how best to kick off a modern data project. Signed by some of the brightest minds in data, these points can create positive change wherever data collaboration occurs.

Topics covered:

  1. History of data practices movement
  2. Break down objectives
  3. Consider stakeholders
  4. Refining questions
  5. Planning for output / solution

1.2 Sourcing data

Once you have planned out a data project, it’s time to gather your resources. There are many things to consider, including where data, documentation, and additional team members can be sourced.

This workshop will help you to understand how to evaluate data project resources, including how to aggregate your existing data, the benefits of open data (both using and generating it), and the pros and cons of purchased data. It also covers how to avoid costly pitfalls such as problems with data provenance or data preparation.

Topics covered:

  • Defining constraints for questions
  • Importance of the "right" data
  • Spectrum of Open vs Closed
  • Finding Data
  • What to Do About Dark Data
  • Data Purchasing or Bartering
  • Other Things to Consider

1.3 "Profile and Prepare"

Once you have all of your data, team, documentation, and other resources in place, it’s time to dig in and go to work.

This workshop will help you to understand how to effectively evaluate and define your data. Understanding the shape of your data, what features and limitations may be inherent in it, and what data might be missing or improperly formatted can help a data team get the most out of their data.

After profiling your data you’ll need to clean and prepare your data for actual use. This workshop will help you understand some of the common methods of cleaning data and how both technical and non-technical team members can assist with this process.

Topics covered:

Profile

  • Data Joining / Normalization
  • Profiling Data
    • Characteristics of Data
    • Defining Your Data
    • Discover Content
    • Discover Relationships

Prepare

  • Data Cleaning
  • Backflow and Automation
  • Data Wrangling
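
As a rough illustration of what profiling can involve (checking the shape of the data and counting missing or improperly formatted values), here is a minimal sketch in Python using only the standard library. The sample records and the `profile` helper are hypothetical and exist only for this example; they are not part of the workshop material.

```python
import csv
import io

# Hypothetical sample data; a real project would read from its own sources.
SAMPLE_CSV = """id,signup_date,age
1,2021-03-04,34
2,,29
3,04/05/2021,abc
"""


def profile(rows, columns):
    """Count rows, plus empty and non-integer values for each column."""
    report = {"row_count": len(rows), "missing": {}, "non_numeric": {}}
    for col in columns:
        values = [row[col].strip() for row in rows]
        # An empty string is treated as a missing value.
        report["missing"][col] = sum(1 for v in values if v == "")
        # Non-empty values that are not plain digits are flagged for review.
        report["non_numeric"][col] = sum(
            1 for v in values if v and not v.isdigit()
        )
    return report


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = list(reader)
report = profile(rows, reader.fieldnames)
print(report)
```

A report like this can point out where cleaning is needed, for example the empty `signup_date` and the non-numeric `age` in the sample above. Date columns will naturally show up as non-numeric, so the counts are a prompt for inspection rather than a verdict.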


1.4 Data Exploration

This workshop will help you develop a deeper understanding of your data through querying, visualizing, or other initial exploration techniques.

Topics covered:

  • Establish / Refine Hypothesis
  • Queries for Everyone
  • Data Sampling
  • Analysis Methods
  • Exploratory Visualization
  • Beyond Statistics
  • The Art of Feature Engineering

1.5 "Analyze and Report"

This workshop will help you to understand how to build models and answer questions in a reproducible way and then deliver the results of that work to your target audience.

Topics covered:

Ask

  • Evolve Your Hypothesis
  • Hindsight vs Insight vs Foresight
  • Tools and Processes

Analyze

  • Types of Analysis
  • Governance
  • Introduction to Modeling
  • Notebooks and You!

Report

  • Data Storytelling
  • Matching Reporting to Your Audience
  • Building Diversity of Outputs
  • Data Visualization

1.6 "Data Project Post-Mortem"

This workshop will help you to understand how to review a completed data project, using questionnaires and both qualitative and quantitative evaluation to identify action items for future work.

Topics covered:

  • Writing a Data Project Questionnaire
  • Qualitative vs Quantitative
  • Things to Evaluate
  • Data Process
  • Example Questionnaires
  • Action Items

2. Culture / Practice Curriculum

2.1 How to build a data-driven culture

The world of modern data teamwork isn’t one that can be created by software and business process alone. Individuals will need to alter their behavior, which is the hardest part about change.

This workshop will show you not only how to help your team evolve, but also the reasons why they should.

Topics covered:

  • Framing the Problem
  • Pillars of a data-driven company
  • Data-driven leadership
  • Decision making
  • Treat data like an asset! (operationalized!)
  • Data governance isn't a dirty word
  • Break down silos
  • Ask Questions
  • "Culture Tweaks"

2.2 A survey of data viz (Coming Soon!)

When it comes to data visualization there are many options available. This workshop will compare and contrast many of the popular data visualization tools and why different tools might make sense for different use cases.

2.3 Moving from Computer Science to Data Science (Coming Soon!)

Data Scientist is the hottest new job title, and people with many different backgrounds are choosing to pivot their careers to become data practitioners. This workshop will help those coming from a developer background to understand the similarities and differences between computer science / software development and data science.

Topics covered:

  • Statistics
  • Data processing using code and notebooks
  • Reproducibility
  • Provenance

2.4 Data Science for People Who Don't Code (Coming Soon!)

Too often, data practitioners dismiss their role in the data ecosystem because they aren’t writing code. Talented spreadsheet users, database admins, and a host of others all have the ability to contribute heavily to the data work within an organization.

This workshop will cover how to be as impactful as possible, even if you aren’t a technical user.

Topics covered:

  • Introduction to queries
  • Formulas and calculations
  • How to deliver/share your data and insights with a team

2.5 Data Visualization Best Practices

Too often, when we discuss data visualization, it is to cover the tactical details of how to perform a task within a specific piece of software (e.g., “how to make a bar chart”). Rarely do we discuss the correct time to use different types of viz, and how to build them using established best practices for the greatest impact.

This workshop will cover the technology-agnostic best practices for visualization design.

Topics covered:

  • Types of visualizations
  • When to use different families of viz
  • Data as art
  • How to avoid visual clutter
  • Data viz pitfalls

2.6 Data Ethics 101 (Coming Soon!)

The pervasiveness of data, especially in business decisions, is exploding. Unfortunately, many of those who are new to the data ecosystem don’t stop to consider that being able to do something doesn’t necessarily mean they should. This course will examine the FORTS (Fairness, Openness, Reliability, Trust, and Social Behavior) framework and establish the principles for good ethical data work. This work is based on the Global Data Ethics Pledge from Data for Democracy.

Topics covered:

  • FORTS Framework
  • Ethical Principles
  • Ethical best practices

2.7 SQL for Non-Technical People

Too often, data access and understanding are limited to those with a development background because the idea of writing SQL (Structured Query Language) statements is intimidating. This course is designed to demystify this data access method and help broaden understanding of what kinds of questions can be asked of our data, and how.

This workshop will cover some of the rationale behind databases, common missteps, and an introduction to advanced concepts.

Topics covered:

  • Why Databases?
  • Database Jargon
  • Accessing Data
  • Common Errors
  • Advanced Concepts
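
To give a sense of the kind of question SQL can answer, here is a minimal sketch that runs a single query against an in-memory SQLite database from Python. The `orders` table, its columns, and the sample rows are assumptions made up for this example; they do not come from the course material.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, used only for illustration.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("Ada", 20.0), ("Ben", 15.5), ("Ada", 30.0)],
)

# The question: how much has each customer spent in total?
query = """
SELECT customer, SUM(amount) AS total_spent
FROM orders
GROUP BY customer
ORDER BY total_spent DESC
"""
totals = conn.execute(query).fetchall()
print(totals)

conn.close()
```

The query reads close to the plain-language question: pick a customer, add up their amounts, group the rows by customer, and list the biggest spenders first.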

About the Data Practices Community

History

The Data Practices movement originated at the 2017 Open Data Science Leadership Summit hosted in San Francisco. This event gathered together leaders in data science, semantics, open source, visualization, and industry to discuss the current state of the data community. It was discovered that there were many similarities between the then-current challenges around data and the earlier difficulties in software development that were addressed by Agile.

The goal of the Data Practices movement was to start a similar "Agile for Data" movement that could help offer direction and improved data literacy across the ecosystem. While the first step was the "Manifesto for Data Practices", the intent was always to move past that and apply the values and principles to a series of free and open courseware that could benefit anyone who was interested.


Code of Conduct

Any time you gather a large group of people together there is the potential for conflict. We believe that having a set of guidelines to help put people in the right frame of mind is helpful and can avoid problems down the line. Take a look at our code of conduct for a basis for behavior in this community.