Data Scientist Code of Ethics

This code of ethics for data sharing is created and proposed for adoption by the data science community to reflect the behaviors and principles for the responsible and ethical use and sharing of data by data scientists.

This is a community-driven, crowdsourced effort: you can join the discussion and contribute to the next version of the Community Principles on Ethical Data Sharing.

Overview

The Community Principles on Ethical Data Practices are being developed by people from the data science community in conjunction with data science organizations. These principles focus on defining ethical and responsible behaviors for sourcing, sharing and implementing data in a manner that will cause no harm and maximize positive impact. The goal of this initiative is to develop a community-driven code of ethics for data collection, sharing and utilization that provides people in the data science community a standard set of easily digestible, recognizable principles for guiding their behaviors.

This code is not intended to be all-encompassing. Rather, these principles will provide academia, industry, and individual data scientists a common set of guidelines for driving the development of standards, curriculums, and best practices for the ethical use and sharing of data, ultimately advancing the responsible and ethical use of data as a collective force for good.

Background

Since the initiative launched in September 2017 at Bloomberg’s Data for Good Exchange (D4GX) conference in New York City, the data science community has formed several working groups, led by Data For Democracy, to conduct literature reviews and draft principles for consideration.

As a living document, the principles will be reviewed regularly, and D4GX will serve as a convener for voting on and updating the Community Principles on Ethical Data Sharing. Because this is a community-driven, crowdsourced effort, you can also join the ongoing discussion and contribute to the next version of the CPEDS Code of Ethics by signing up here.

Code of Ethics

Values

  • Fairness

    Understand, mitigate and communicate the presence of bias in both data practice and consumption.

  • Benefit

    Set people before data and be responsible for maximizing social benefit and minimizing harm.

  • Openness

    Practice humility and openness. Transparent practices, community engagement, and responsible communications are an integral part of data ethics.

  • Reliability

    Ensure that every effort is made to glean a complete understanding of what is contained within data, where it came from, and how it was created. Extend this effort for future users of all data and derivative data.

Principles

As data practitioners and data consumers, we aim to...

  1. Consider (if not collect) informed and purposeful consent of data subjects for all projects, and discard resulting data when that consent expires.

  2. Make best effort to guarantee the security of data, subjects, and algorithms to prevent unauthorized access, policy violations, tampering, or other harm or actions outside the data subjects’ consent.

  3. Make best effort to protect anonymous data subjects, and any associated data, against any attempts to reverse-engineer, de-anonymize, or otherwise expose confidential information.

    • This includes all intermediate results, working with individuals or companies to help them maintain the anonymity of all data and parties involved, and supporting the rights to explanation, recourse, and rectification for any data subjects impacted by data work.

  4. Practice responsible transparency as the default where possible, throughout the entire data lifecycle.

    • This includes providing enough context and documentation to enable other trained practitioners to understand and evaluate the use of data.

  5. Foster diversity by making efforts to ensure inclusion of participants, representation of viewpoints and communities, and openness. The data community should be open to, welcoming of, and inclusive of people from diverse backgrounds.

    • This can be achieved by: being conscious of, and owning the results of actions, regardless of intent; promoting the voices of marginalized groups; acknowledging and self-checking privilege; accepting checks of privilege by others in good faith, and using privilege to advocate for equity.
    • The data community will not remain silent when witnessing others behaving in a manner that is not accessible, open, welcoming and inclusive.

  6. Acknowledge and mitigate unfair bias throughout all aspects of data work.

    • This includes but is not limited to providing details and methodologies around data collection, processing and storage, and actively working to identify and disclose bias in algorithms, training data, and test data.

  7. Hold up datasets with clearly established provenance as the expected norm, rather than the exception.

    • As a data collector, be responsible for recording provenance; as a data publisher, be responsible for propagating provenance; as a data scientist, be responsible for reviewing, considering, and declaring what is known about data provenance.
    • Provenance is a living part of data work and can evolve with the project; all reasonable efforts should be made to understand and pass on provenance work.

  8. Respect relevant tensions of all stakeholders as they relate to privacy and data ownership.

  9. Take great care to communicate responsibly and accessibly.

    • This includes: acknowledging and disclosing caveats and limitations to the process and outputs; considering and providing clear opportunities for feedback from all stakeholders; considering and discussing whether something should be done (not just if it can be done); and clearly communicating who may be impacted, and how they are impacted, in order to minimize any potential harm from data work.

  10. Ensure that all data practitioners take responsibility for exercising ethical imagination in their work, including considering the implication of what came before and what may come after, and actively working to increase benefit and prevent harm to others.
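As a rough illustration only, principles 1 and 7 above can be sketched in code. The Python below is not part of the Community Principles; it assumes a hypothetical `Record` type that stores a consent expiry date and a hypothetical `Provenance` type that travels with a dataset. All names, fields, and the expiry rule are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    """Hypothetical provenance record carried alongside a dataset."""
    source: str          # where the data came from
    collected_by: str    # who collected it
    method: str          # how it was created
    history: tuple = ()  # processing steps applied so far

    def with_step(self, step: str) -> "Provenance":
        # Propagate provenance: derived data keeps the full history.
        return Provenance(self.source, self.collected_by, self.method,
                          self.history + (step,))


@dataclass(frozen=True)
class Record:
    """Hypothetical data record tied to one data subject."""
    subject_id: str
    value: float
    consent_expires: Optional[date]  # None means no consent was recorded


def discard_expired(records, today):
    """Keep only records whose consent is recorded and still valid."""
    return [r for r in records
            if r.consent_expires is not None and r.consent_expires >= today]


records = [
    Record("a", 1.0, date(2030, 1, 1)),
    Record("b", 2.0, date(2020, 1, 1)),  # consent has expired
    Record("c", 3.0, None),              # consent was never recorded
]
kept = discard_expired(records, today=date(2025, 6, 1))

provenance = Provenance("survey", "research team", "web form")
provenance = provenance.with_step("discarded records without valid consent")
```

In this sketch, records with missing consent are treated the same as records with expired consent. That is a conservative design choice for the example, not a requirement stated in the principles.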

View the D4GX Recording

You can view the recording of the livestream of the event on Vimeo, and participate in the ongoing discussion on Slack.

The initial version of the Community Principles on Ethical Data Sharing will be made available here for signing and/or further comments in the near future.

Working Group Authors

Thought Diversity

  • Bethany Patrick, Code for America, Civic Data Alliance, Code Louisville
  • Maria Filippelli, DataKind
  • Carl Marcelus, Bloomberg
  • Becki Hyde, Humana, Code for America, Civic Data Alliance
  • Maya Sabatello
  • Lisa Green, Domino Data Lab
  • Joshua Cohen, University of California, Berkeley, Apple University, Boston Review
  • Susan McGregor, Columbia University Graduate School of Journalism
  • Sharon Sputz, Data Science Institute at Columbia University
  • Marlon Harris
  • Nicolas Le Roux
  • Rachael Riley, Two Sigma
  • Shawn Janzen, American University
  • Gil Appel, USC Marshall School of Business
  • Margeaux Spring, Atria Senior Living, Code for America, Civic Data Alliance, Code Louisville

Bias

  • Dan Gould, Tinder
  • Catie Bialick, Arnold Foundation
  • Marie-Apolline Barbara, Cornell Tech
  • Juan LaVista, Microsoft
  • Jake Metcalf, Data & Society
  • Julia Stoyanovich, Drexel University
  • Brittny Saunders, NYC
  • Ben Wellington, Two Sigma
  • Clay Eltzroth, Bloomberg
  • Maria Filippelli (Moderator)
  • Joe Blankenship
  • Jose Luis Delgado Davara

Privacy and Security

  • Matt Gee, BrightHive
  • Lutz Finger, Cornell University
  • Jeannette Wing, Columbia University's Data Science Institute
  • Shamus Khan, Columbia University
  • Susan McGregor, Tow Center for Digital Journalism / Columbia Journalism School
  • Bruce Kogut, Sanford C. Bernstein Center for Leadership and Ethics at Columbia Business School
  • Arnaud Sahuguet, Cornell Tech
  • Juan LaVista, Microsoft
  • Gabrielle Berman, UNICEF
  • Stefaan Verhulst, The GovLab
  • Samantha Grassle, NYC CTO
  • Erin Stein, Overdeck
  • Amanda Stent, Bloomberg

Responsible Communication

  • Maureen (Mo) Johnson, Unpredict | Double Union (writing/research lead)
  • Bernease Herman, University of Washington’s eScience Institute (writing/research lead)
  • Emily Grimes (writing/research lead)
  • Craig Fryar, Data and Analytics executive; data.world Advisor (writing/research lead)
  • Megan Risdal, Kaggle
  • Meghan O'Connell, Kaggle
  • Susan McGregor, Tow Center for Digital Journalism / Columbia Journalism School
  • Catie Bialick, Arnold Foundation
  • Lisa Green, Domino Data Lab
  • Andrew Means, BrightHive
  • Amanda Stent

Provenance and Ownership

  • David Morar, George Mason University
  • Bill Howe, University of Washington’s eScience Institute
  • Julia Stoyanovich, Drexel University
  • H.V. Jagadish, University of Michigan
  • Laura Noren, NYU
  • Patrick McGarry, data.world
  • Matt Gee, University of Chicago
  • Ilana Lichtenstein, USyd grad, Australia
  • Anne Washington, NYU
  • Bryon Jacob, data.world
  • Stefaan Verhulst, The GovLab
  • Samantha Grassle, NYC CTO Office

Transparency and Openness

  • Abhijeet Chavan (Moderator)
  • Alondra Nelson, Social Science Research Council and Columbia University
  • Bernease Herman, University of Washington’s eScience Institute
  • Catie Bialick, Arnold Foundation
  • Carl Marcelus, Bloomberg
  • Christine Chung, Meetup
  • H.V. Jagadish, University of Michigan
  • Holly Taylor, Johns Hopkins Bloomberg School of Public Health
  • Julia Stoyanovich, Drexel University
  • Lisa Green, Domino Data Lab
  • Margeaux Spring, Civic Data Alliance
  • Mehdi Jamei, Bayes Impact
  • Mo Johnson, Unpredict | Double Union

Questions and Answers

  • Lilian Huang, Data for Democracy (Moderator)
  • Gil Appel, University of Southern California
  • Craig Fryar, data.world Advisor
  • Dan Gould, Tinder
  • Lisa Green, Domino Data Lab
  • Christine Henry, DataKind UK

Be part of this

Join the ongoing discussion and contribute to the next version of the Community Principles on Ethical Data Sharing.

In order of signing

47 signatories

  1. Joe Boutros

  2. Patrick McGarry

  3. Giulio Valentino Dalla Riva

  4. Jon Loyens

  5. Emily Grimes

  6. Donald Vetal

  7. Natalie Evans Harris

  8. Gabriel Benatti Alvim

  9. Keith McCloskey

  10. Mike Merry

  11. Seraphim Alvanides

  12. Maureen (Mo) Johnson

  13. sss

  14. Daniel de Wet

  15. Christopher Milne

  16. Basil Hayek

  17. Hugo Lopes

  18. Zach

  19. Andrew Tom

  20. Michael Sandberg

  21. Dave Goodsmith

  22. Bernease Herman

  23. Jenny Richards

  24. Jeff Parks

  25. Shubham Gandhi

  26. David Lawson

  27. Lori Hood Lawson

  28. Joseph Coelho

  29. Pin Gee Lim

  30. Tamas Szuromi

  31. Jeff Parks

  32. Scott N Kurland

  33. Daniel Nicely

  34. Myron L. Pulier

  35. Franziska Gonder

  36. Joanna

  37. Guilherme Machado

  38. Astrid Countee

  39. Peter Horniak

  40. David Guardia

  41. Jeffrey Silverman

  42. Tynan Daly

  43. Hamza BEN AMARA

  44. Maddulety Swamy Yettikadi

  45. Andres Velasquez Martinez

  46. David Smith

  47. Clare Bates Congdon