Data Practices:

1.1 Kicking off a data project


Christine Doig

Data scientist

Netflix

Ian Greenleigh

Marketer

data.world

Sharon Brener

Product designer

data.world

Patrick McGarry

Community builder

data.world

Special thanks to Alexander Egorenkov, author of “How to ask questions data science can solve” (medium.com/@codefluently).

Values and Principles for Data Practices

What set of values and principles describes the most effective, ethical, and modern approach to data teamwork?

4 core values

12 principles

39 authors

1400+ signatories

datapractices.org/manifesto

Four core values

datapractices.org/manifesto

Inclusion

Maximize diversity, connectivity, and accessibility among data sources, collaboration, and outputs.

Experimentation

Emphasize continuous, iterative testing and data analysis.

Supported by leaders of the data community

39 authors, including:

  • Eric Colson, Chief Algorithms Officer, StitchFix
  • Amy Gershkoff, former Chief Data Scientist, Ancestry.com
  • Fernando Perez, creator of IPython, Assistant Professor, Statistics, UC Berkeley
  • Andrew Therriault, Chief Data Officer, City of Boston
  • Therese Couture, Human Trafficking Data Analyst, Polaris
  • Wes McKinney, BDFL, pandas

1,300+ signatories, including:

  • DJ Patil, former Chief Data Scientist of the United States
  • Monica Rogati, former VP of Data Science, Jawbone
  • Kirk Borne, Principal Data Scientist, Booz Allen Hamilton
  • Tricia Wang, Fellow, Harvard Berkman Center
  • Jonathan Albright, Research Director, Tow Center for Digital Journalism
  • Gregory Piatetsky, founder, KDnuggets.com

Now what?

Data Teamwork Exercises

How can we bring Design Thinking to data teams?

How can cross-functional teams work together to create better data projects and practices?

Design thinking discipline + data project workflow = better data teamwork (profit)

Data project workflow

based on data.world user research

Kickoff

Start a new project with a question, problem, or dataset.

Source

Gather data sources, documentation, research, and team.

Profile

Define the data’s shape, features, and any limitations.

Prepare

Clean, munge, and wrangle the data into a usable form.

Explore

Develop a deeper understanding by querying and visualizing the data.

Analyze

Answer questions and build models in a reproducible way.

Deliver

Share output with stakeholders and report discoveries.

Kickoff exercises

Start a project with a good question.

Consider stakeholders’ points of view.

Inclusion

Maximize diversity, connectivity, and accessibility among data sources, collaboration, and outputs.

Experimentation

Emphasize continuous, iterative testing and data analysis.

Exercise

Divide into teams of five.

Nominate a team lead.

Think about each exercise from a particular point of view (data analyst, sales/marketing lead, product owner, etc.).

Diverge

Create options by brainstorming individually

Converge

Make decisions by discussing as a group

You work for a one-year-old startup that has built an on-demand childcare services platform.

The board has asked your team to double revenue in the next 12 months.

Where do you start?

Exercise 1: Break down your objective.

Write down ideas and questions that, if explored or answered, could help you meet your objective.

Frame questions as: “How might we… [achieve this goal]?”

How might we double revenue in the next 12 months?

Write down as many sub-questions as you can, one per post-it.

5-minute exercise

Draw a horizontal line. Label the ends (low, high) and the midpoint. Place your post-its according to their impact level, clustering as you go.

10-minute exercise

Exercise 2: Consider your stakeholders.

Who will be affected by each question and motivated to act by its answer? Sales? Marketing? New users?

Map only the highest-impact questions to a key stakeholder.

Map your three most impactful questions to their key stakeholder.

Write down a few things each stakeholder cares about.

Marketing

Cares about:

  • Retention
  • Marketing mix
  • Proving value of efforts

Sales

Cares about:

  • Healthy sales pipeline
  • Increasing revenue
  • Number of touches to purchase

10-minute exercise

Exercise 3: Refine your questions.

With your primary stakeholder in mind, clarify any ambiguities in each high-impact question.

As a group, discuss your clarified questions and refine further.

Refine your top 3 questions by asking as a group:

Can I imagine any part of this idea or question being misinterpreted by another stakeholder?

Have I defined all ambiguous terms?

Am I being concise?

Is this question really multiple questions?

Is the intent of my question clear?

Is this a question that can be answered with available data?

10-minute exercise

You should have a stack of clear, high-impact questions that are relevant to your key stakeholder and your overall objective.

We’re almost ready to kick off a productive data project; we just need a vision of what our end result should look like.

How might we double revenue in the next 12 months?

Marketing

Cares about:

  • Retention
  • Marketing mix
  • Proving value of efforts

Exercise 4: Draw the solution.

Now that you’ve refined your questions with your primary stakeholder in mind, what’s the best way to visualize* the answers and share the output with them?

*Express results however you want.

Brainstorm how best to communicate the results.

What’s the best medium for communicating with this audience?

Is the result best conveyed as a chart? A single number? A new process? A story?

What level of detail do they want/need? How technical is your audience? How busy?

What’s the single most important metric? How do we define success (and failure)?

What comparisons make sense?

Do you need to see changes over time?

5-minute exercise

Present and discuss solutions within your group.

What similarities do you see?

What are the strengths of each representation? What are the shortcomings?

What additional context will the stakeholders need to understand the representations?

What do the stakeholders expect?

What representations would be best for the largest number of stakeholders?

10-minute exercise

Present one “solution stack” to the rest of the room.

Original question

Key stakeholder + concerns

Refined question

Proposed representation

Want to run a workshop like this at your company?


Don't forget to sign the values and principles! https://datapractices.org/manifesto