The Community Principles on Ethical Data Practices are being developed by people from the data science community in conjunction with data science organizations. These principles focus on defining ethical and responsible behaviors for sourcing, sharing and implementing data in a manner that will cause no harm and maximize positive impact. The goal of this initiative is to develop a community-driven code of ethics for data collection, sharing and utilization that provides people in the data science community a standard set of easily digestible, recognizable principles for guiding their behaviors.
This code is not intended to be all-encompassing. Rather, these principles will provide academia, industry, and individual data scientists with a common set of guidelines for driving the development of standards, curricula, and best practices for the ethical use and sharing of data, ultimately advancing the responsible and ethical use of data as a collective force for good.
Since the initiative launched in September 2017 at Bloomberg’s Data for Good Exchange (D4GX) conference in New York City, the data science community has formed several working groups, led by Data for Democracy, to conduct literature reviews and draft principles for consideration.
As a living document, the principles will be reviewed regularly, and D4GX will serve as a convener for voting on and updating the Community Principles on Ethical Data Sharing. Because this is a community-driven, crowdsourced effort, you can also join the ongoing discussion and contribute to the next version of the CPEDS Code of Ethics by signing up here.
Understand, mitigate and communicate the presence of bias in both data practice and consumption.
Set people before data and be responsible for maximizing social benefit and minimizing harm.
Practice humility and openness. Transparent practices, community engagement, and responsible communications are an integral part of data ethics.
Ensure that every effort is made to glean a complete understanding of what is contained within data, where it came from, and how it was created. Extend this effort for future users of all data and derivative data.
Consider (if not collect) informed and purposeful consent of data subjects for all projects, and discard resulting data when that consent expires.
Make best effort to guarantee the security of data, subjects, and algorithms to prevent unauthorized access, policy violations, tampering, or other harm or actions outside the data subjects’ consent.
Make best effort to protect anonymous data subjects, and any associated data, against any attempts to reverse-engineer, de-anonymize, or otherwise expose confidential information.
Practice responsible transparency as the default where possible, throughout the entire data lifecycle.
Foster diversity by making efforts to ensure inclusion of participants, representation of viewpoints and communities, and openness. The data community should be open to, welcoming of, and inclusive of people from diverse backgrounds.
Acknowledge and mitigate unfair bias throughout all aspects of data work.
Hold up datasets with clearly established provenance as the expected norm, rather than the exception.
Respect the relevant tensions among all stakeholders as they relate to privacy and data ownership.
Take great care to communicate responsibly and accessibly.
Ensure that all data practitioners take responsibility for exercising ethical imagination in their work, including considering the implications of what came before and what may come after, and actively working to increase benefit and prevent harm to others.
The initial version of the Community Principles on Ethical Data Sharing will be made available here for signing and/or further comments in the near future.
Bethany Patrick, Code for America, Civic Data Alliance, Code Louisville; Maria Filippelli, DataKind; Carl Marcelus, Bloomberg; Becki Hyde, Humana, Code for America, Civic Data Alliance; Maya Sabatello; Lisa Green, Domino Data Lab; Joshua Cohen, University of California, Berkeley, Apple University, Boston Review; Susan McGregor, Columbia University Graduate School of Journalism; Sharon Sputz, Data Science Institute at Columbia University; Marlon Harris; Nicolas Le Roux; Rachael Riley, Two Sigma; Shawn Janzen, American University; Gil Appel, USC Marshall School of Business; Margeaux Spring, Atria Senior Living, Code for America, Civic Data Alliance, Code Louisville
Dan Gould, Tinder; Catie Bialick, Arnold Foundation; Marie-Apolline Barbara, Cornell Tech; Juan LaVista, Microsoft; Jake Metcalf, Data & Society; Julia Stoyanovich, Drexel University; Brittny Saunders, NYC; Ben Wellington, Two Sigma; Clay Eltzroth, Bloomberg; Maria Filippelli (Moderator); Joe Blankenship; Jose Luis Delgado Davara
Matt Gee, BrightHive; Lutz Finger, Cornell University; Jeannette Wing, Columbia University's Data Science Institute; Shamus Khan, Columbia University; Susan McGregor, Tow Center for Digital Journalism / Columbia Journalism School; Bruce Kogut, Sanford C. Bernstein Center for Leadership and Ethics at Columbia Business School; Arnaud Sahuguet, Cornell Tech; Juan LaVista, Microsoft; Gabrielle Berman, UNICEF; Stefaan Verhulst, The GovLab; Samantha Grassle, NYC CTO; Erin Stein, Overdeck; Amanda Stent, Bloomberg
Maureen (Mo) Johnson, Unpredict | Double Union (writing/research lead); Bernease Herman, University of Washington’s eScience Institute (writing/research lead); Emily Grimes (writing/research lead); Craig Fryar, Data and Analytics executive, data.world Advisor (writing/research lead); Megan Risdal, Kaggle; Meghan O'Connell, Kaggle; Susan McGregor, Tow Center for Digital Journalism / Columbia Journalism School; Catie Bialick, Arnold Foundation; Lisa Green, Domino Data Lab; Andrew Means, BrightHive; Amanda Stent
David Morar, George Mason University; Bill Howe, University of Washington’s eScience Institute; Julia Stoyanovich, Drexel University; H.V. Jagadish, University of Michigan; Laura Noren, NYU; Patrick McGarry, data.world; Matt Gee, University of Chicago; Ilana Lichtenstein, USyd grad, Australia; Anne Washington, NYU; Bryon Jacob, data.world; Stefaan Verhulst, The GovLab; Samantha Grassle, NYC CTO Office
Abhijeet Chavan (Moderator); Alondra Nelson, Social Science Research Council and Columbia University; Bernease Herman, University of Washington’s eScience Institute; Catie Bialick, Arnold Foundation; Carl Marcelus, Bloomberg; Christine Chung, Meetup; H.V. Jagadish, University of Michigan; Holly Taylor, Johns Hopkins Bloomberg School of Public Health; Julia Stoyanovich, Drexel University; Lisa Green, Domino Data Lab; Margeaux Spring, Civic Data Alliance; Mehdi Jamei, Bayes Impact; Mo Johnson, Unpredict | Double Union
Lilian Huang, Data for Democracy (Moderator); Gil Appel, University of Southern California; Craig Fryar, data.world Advisor; Dan Gould, Tinder; Lisa Green, Domino Data Lab; Christine Henry, DataKind UK
Be part of this
In order of signing
Giulio Valentino Dalla Riva
Natalie Evans Harris
Gabriel Benatti Alvim
Maureen (Mo) Johnson
Daniel de Wet
Lori Hood Lawson
Pin Gee Lim
Scott N Kurland
Myron L. Pulier