Commit dd9a8b9

add comma
1 parent 69e30d5 commit dd9a8b9

9 files changed: +9 -9 lines changed

README.md (+1 -1)

@@ -158,7 +158,7 @@ Options:
 - [ ] **A.1 Informed consent**: If there are human subjects, have they given informed consent, where subjects affirmatively opt-in and have a clear understanding of the data uses to which they consent?
 - [ ] **A.2 Collection bias**: Have we considered sources of bias that could be introduced during data collection and survey design and taken steps to mitigate those?
 - [ ] **A.3 Limit PII exposure**: Have we considered ways to minimize exposure of personally identifiable information (PII) for example through anonymization or not collecting information that isn't relevant for analysis?
-- [ ] **A.4 Downstream bias mitigation**: Have we considered ways to enable testing downstream results for biased outcomes (e.g. collecting data on protected group status like race or gender)?
+- [ ] **A.4 Downstream bias mitigation**: Have we considered ways to enable testing downstream results for biased outcomes (e.g., collecting data on protected group status like race or gender)?

 ## B. Data Storage
 - [ ] **B.1 Data security**: Do we have a plan to protect and secure data (e.g., encryption at rest and in transit, access controls on internal users and third parties, access logs, and up-to-date software)?

deon/assets/checklist.yml (+1 -1)

@@ -14,7 +14,7 @@ sections:
         line: Have we considered ways to minimize exposure of personally identifiable information (PII) for example through anonymization or not collecting information that isn't relevant for analysis?
       - line_id: A.4
        line_summary: Downstream bias mitigation
-        line: Have we considered ways to enable testing downstream results for biased outcomes (e.g. collecting data on protected group status like race or gender)?
+        line: Have we considered ways to enable testing downstream results for biased outcomes (e.g., collecting data on protected group status like race or gender)?
   - title: Data Storage
     section_id: B
     lines:
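For context, the same one-character edit shows up in every rendered format in this commit (Markdown, HTML, RST, plain text, notebook), which is consistent with those files being generated from this YAML asset rather than edited by hand. Below is a minimal sketch of the schema implied by the hunk above; the field names come from the diff, while the exact indentation and the enclosing section A entry are assumptions:

# Hypothetical, abridged fragment of deon/assets/checklist.yml (indentation assumed)
sections:
  - title: Data Collection
    section_id: A
    lines:
      - line_id: A.4
        line_summary: Downstream bias mitigation
        line: Have we considered ways to enable testing downstream results for biased outcomes (e.g., collecting data on protected group status like race or gender)?
  - title: Data Storage
    section_id: B
    lines:
      # B.1 Data security, B.2 Right to be forgotten, ... entries continue here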

docs/docs/examples.md (+1 -1)

@@ -10,7 +10,7 @@ To make the ideas contained in the checklist more concrete, we've compiled examp
 **A.1 Informed consent**: If there are human subjects, have they given informed consent, where subjects affirmatively opt-in and have a clear understanding of the data uses to which they consent? | <ul><li>[Facebook uses phone numbers provided for two-factor authentication to target users with ads.](https://techcrunch.com/2018/09/27/yes-facebook-is-using-your-2fa-phone-number-to-target-you-with-ads/)</li><li>[African-American men were enrolled in the Tuskagee Study on the progression of syphilis without being told the true purpose of the study or that treatment for syphilis was being withheld.](https://en.wikipedia.org/wiki/Tuskegee_syphilis_experiment)</li></ul>
 **A.2 Collection bias**: Have we considered sources of bias that could be introduced during data collection and survey design and taken steps to mitigate those? | <ul><li>[StreetBump, a smartphone app to passively detect potholes, may fail to direct public resources to areas where smartphone penetration is lower, such as lower income areas or areas with a larger elderly population.](https://hbr.org/2013/04/the-hidden-biases-in-big-data)</li><li>[Facial recognition cameras used for passport control register Asian's eyes as closed.](http://content.time.com/time/business/article/0,8599,1954643,00.html)</li></ul>
 **A.3 Limit PII exposure**: Have we considered ways to minimize exposure of personally identifiable information (PII) for example through anonymization or not collecting information that isn't relevant for analysis? | <ul><li>[Personal information on taxi drivers can be accessed in poorly anonymized taxi trips dataset released by New York City.](https://www.theguardian.com/technology/2014/jun/27/new-york-taxi-details-anonymised-data-researchers-warn)</li><li>[Netflix prize dataset of movie rankings by 500,000 customers is easily de-anonymized through cross referencing with other publicly available datasets.](https://www.wired.com/2007/12/why-anonymous-data-sometimes-isnt/)</li></ul>
-**A.4 Downstream bias mitigation**: Have we considered ways to enable testing downstream results for biased outcomes (e.g. collecting data on protected group status like race or gender)? | <ul><li>[In six major cities, Amazon's same day delivery service excludes many predominantly black neighborhoods.](https://www.bloomberg.com/graphics/2016-amazon-same-day/)</li><li>[Facial recognition software is significanty worse at identifying people with darker skin.](https://www.theregister.co.uk/2018/02/13/facial_recognition_software_is_better_at_white_men_than_black_women/)</li><li>[-- Related academic study.](http://proceedings.mlr.press/v81/buolamwini18a.html)</li></ul>
+**A.4 Downstream bias mitigation**: Have we considered ways to enable testing downstream results for biased outcomes (e.g., collecting data on protected group status like race or gender)? | <ul><li>[In six major cities, Amazon's same day delivery service excludes many predominantly black neighborhoods.](https://www.bloomberg.com/graphics/2016-amazon-same-day/)</li><li>[Facial recognition software is significanty worse at identifying people with darker skin.](https://www.theregister.co.uk/2018/02/13/facial_recognition_software_is_better_at_white_men_than_black_women/)</li><li>[-- Related academic study.](http://proceedings.mlr.press/v81/buolamwini18a.html)</li></ul>
 | <center>**Data Storage**</center>
 **B.1 Data security**: Do we have a plan to protect and secure data (e.g., encryption at rest and in transit, access controls on internal users and third parties, access logs, and up-to-date software)? | <ul><li>[Personal and financial data for more than 146 million people was stolen in Equifax data breach.](https://www.nbcnews.com/news/us-news/equifax-breaks-down-just-how-bad-last-year-s-data-n872496)</li><li>[Cambridge Analytica harvested private information from over 50 million Facebook profiles without users' permission.](https://www.nytimes.com/2018/03/17/us/politics/cambridge-analytica-trump-campaign.html)</li><li>[AOL accidentally released 20 million search queries from 658,000 customers.](https://www.wired.com/2006/08/faq-aols-search-gaffe-and-you/)</li></ul>
 **B.2 Right to be forgotten**: Do we have a mechanism through which an individual can request their personal information be removed? | <ul><li>[The EU's General Data Protection Regulation (GDPR) includes the "right to be forgotten."](https://www.eugdpr.org/the-regulation.html)</li></ul>

docs/docs/index.md (+1 -1)

@@ -151,7 +151,7 @@ Options:
 - [ ] **A.1 Informed consent**: If there are human subjects, have they given informed consent, where subjects affirmatively opt-in and have a clear understanding of the data uses to which they consent?
 - [ ] **A.2 Collection bias**: Have we considered sources of bias that could be introduced during data collection and survey design and taken steps to mitigate those?
 - [ ] **A.3 Limit PII exposure**: Have we considered ways to minimize exposure of personally identifiable information (PII) for example through anonymization or not collecting information that isn't relevant for analysis?
-- [ ] **A.4 Downstream bias mitigation**: Have we considered ways to enable testing downstream results for biased outcomes (e.g. collecting data on protected group status like race or gender)?
+- [ ] **A.4 Downstream bias mitigation**: Have we considered ways to enable testing downstream results for biased outcomes (e.g., collecting data on protected group status like race or gender)?

 ## B. Data Storage
 - [ ] **B.1 Data security**: Do we have a plan to protect and secure data (e.g., encryption at rest and in transit, access controls on internal users and third parties, access logs, and up-to-date software)?

examples/ethics.html (+1 -1)

@@ -41,7 +41,7 @@ <h2>
 <strong>
 A.4 Downstream bias mitigation:
 </strong>
-Have we considered ways to enable testing downstream results for biased outcomes (e.g. collecting data on protected group status like race or gender)?
+Have we considered ways to enable testing downstream results for biased outcomes (e.g., collecting data on protected group status like race or gender)?
 </li>
 </ul>
 <br/>

examples/ethics.ipynb (+1 -1)

@@ -1 +1 @@
-{"nbformat": 4, "nbformat_minor": 2, "metadata": {}, "cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Data Science Ethics Checklist\n", "\n", "[![Deon badge](https://img.shields.io/badge/ethics%20checklist-deon-brightgreen.svg?style=popout-square)](http://deon.drivendata.org/)\n", "\n", "## A. Data Collection\n", " - [ ] **A.1 Informed consent**: If there are human subjects, have they given informed consent, where subjects affirmatively opt-in and have a clear understanding of the data uses to which they consent?\n", " - [ ] **A.2 Collection bias**: Have we considered sources of bias that could be introduced during data collection and survey design and taken steps to mitigate those?\n", " - [ ] **A.3 Limit PII exposure**: Have we considered ways to minimize exposure of personally identifiable information (PII) for example through anonymization or not collecting information that isn't relevant for analysis?\n", " - [ ] **A.4 Downstream bias mitigation**: Have we considered ways to enable testing downstream results for biased outcomes (e.g. collecting data on protected group status like race or gender)?\n", "\n", "## B. Data Storage\n", " - [ ] **B.1 Data security**: Do we have a plan to protect and secure data (e.g., encryption at rest and in transit, access controls on internal users and third parties, access logs, and up-to-date software)?\n", " - [ ] **B.2 Right to be forgotten**: Do we have a mechanism through which an individual can request their personal information be removed?\n", " - [ ] **B.3 Data retention plan**: Is there a schedule or plan to delete the data after it is no longer needed?\n", "\n", "## C. Analysis\n", " - [ ] **C.1 Missing perspectives**: Have we sought to address blindspots in the analysis through engagement with relevant stakeholders (e.g., checking assumptions and discussing implications with affected communities and subject matter experts)?\n", " - [ ] **C.2 Dataset bias**: Have we examined the data for possible sources of bias and taken steps to mitigate or address these biases (e.g., stereotype perpetuation, confirmation bias, imbalanced classes, or omitted confounding variables)?\n", " - [ ] **C.3 Honest representation**: Are our visualizations, summary statistics, and reports designed to honestly represent the underlying data?\n", " - [ ] **C.4 Privacy in analysis**: Have we ensured that data with PII are not used or displayed unless necessary for the analysis?\n", " - [ ] **C.5 Auditability**: Is the process of generating the analysis well documented and reproducible if we discover issues in the future?\n", "\n", "## D. Modeling\n", " - [ ] **D.1 Proxy discrimination**: Have we ensured that the model does not rely on variables or proxies for variables that are unfairly discriminatory?\n", " - [ ] **D.2 Fairness across groups**: Have we tested model results for fairness with respect to different affected groups (e.g., tested for disparate error rates)?\n", " - [ ] **D.3 Metric selection**: Have we considered the effects of optimizing for our defined metrics and considered additional metrics?\n", " - [ ] **D.4 Explainability**: Can we explain in understandable terms a decision the model made in cases where a justification is needed?\n", " - [ ] **D.5 Communicate bias**: Have we communicated the shortcomings, limitations, and biases of the model to relevant stakeholders in ways that can be generally understood?\n", "\n", "## E. Deployment\n", " - [ ] **E.1 Redress**: Have we discussed with our organization a plan for response if users are harmed by the results (e.g., how does the data science team evaluate these cases and update analysis and models to prevent future harm)?\n", " - [ ] **E.2 Roll back**: Is there a way to turn off or roll back the model in production if necessary?\n", " - [ ] **E.3 Concept drift**: Do we test and monitor for concept drift to ensure the model remains fair over time?\n", " - [ ] **E.4 Unintended use**: Have we taken steps to identify and prevent unintended uses and abuse of the model and do we have a plan to monitor these once the model is deployed?\n", "\n", "*Data Science Ethics Checklist generated with [deon](http://deon.drivendata.org).*\n"]}]}
+{"nbformat": 4, "nbformat_minor": 2, "metadata": {}, "cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Data Science Ethics Checklist\n", "\n", "[![Deon badge](https://img.shields.io/badge/ethics%20checklist-deon-brightgreen.svg?style=popout-square)](http://deon.drivendata.org/)\n", "\n", "## A. Data Collection\n", " - [ ] **A.1 Informed consent**: If there are human subjects, have they given informed consent, where subjects affirmatively opt-in and have a clear understanding of the data uses to which they consent?\n", " - [ ] **A.2 Collection bias**: Have we considered sources of bias that could be introduced during data collection and survey design and taken steps to mitigate those?\n", " - [ ] **A.3 Limit PII exposure**: Have we considered ways to minimize exposure of personally identifiable information (PII) for example through anonymization or not collecting information that isn't relevant for analysis?\n", " - [ ] **A.4 Downstream bias mitigation**: Have we considered ways to enable testing downstream results for biased outcomes (e.g., collecting data on protected group status like race or gender)?\n", "\n", "## B. Data Storage\n", " - [ ] **B.1 Data security**: Do we have a plan to protect and secure data (e.g., encryption at rest and in transit, access controls on internal users and third parties, access logs, and up-to-date software)?\n", " - [ ] **B.2 Right to be forgotten**: Do we have a mechanism through which an individual can request their personal information be removed?\n", " - [ ] **B.3 Data retention plan**: Is there a schedule or plan to delete the data after it is no longer needed?\n", "\n", "## C. Analysis\n", " - [ ] **C.1 Missing perspectives**: Have we sought to address blindspots in the analysis through engagement with relevant stakeholders (e.g., checking assumptions and discussing implications with affected communities and subject matter experts)?\n", " - [ ] **C.2 Dataset bias**: Have we examined the data for possible sources of bias and taken steps to mitigate or address these biases (e.g., stereotype perpetuation, confirmation bias, imbalanced classes, or omitted confounding variables)?\n", " - [ ] **C.3 Honest representation**: Are our visualizations, summary statistics, and reports designed to honestly represent the underlying data?\n", " - [ ] **C.4 Privacy in analysis**: Have we ensured that data with PII are not used or displayed unless necessary for the analysis?\n", " - [ ] **C.5 Auditability**: Is the process of generating the analysis well documented and reproducible if we discover issues in the future?\n", "\n", "## D. Modeling\n", " - [ ] **D.1 Proxy discrimination**: Have we ensured that the model does not rely on variables or proxies for variables that are unfairly discriminatory?\n", " - [ ] **D.2 Fairness across groups**: Have we tested model results for fairness with respect to different affected groups (e.g., tested for disparate error rates)?\n", " - [ ] **D.3 Metric selection**: Have we considered the effects of optimizing for our defined metrics and considered additional metrics?\n", " - [ ] **D.4 Explainability**: Can we explain in understandable terms a decision the model made in cases where a justification is needed?\n", " - [ ] **D.5 Communicate bias**: Have we communicated the shortcomings, limitations, and biases of the model to relevant stakeholders in ways that can be generally understood?\n", "\n", "## E. Deployment\n", " - [ ] **E.1 Redress**: Have we discussed with our organization a plan for response if users are harmed by the results (e.g., how does the data science team evaluate these cases and update analysis and models to prevent future harm)?\n", " - [ ] **E.2 Roll back**: Is there a way to turn off or roll back the model in production if necessary?\n", " - [ ] **E.3 Concept drift**: Do we test and monitor for concept drift to ensure the model remains fair over time?\n", " - [ ] **E.4 Unintended use**: Have we taken steps to identify and prevent unintended uses and abuse of the model and do we have a plan to monitor these once the model is deployed?\n", "\n", "*Data Science Ethics Checklist generated with [deon](http://deon.drivendata.org).*\n"]}]}

examples/ethics.md (+1 -1)

@@ -6,7 +6,7 @@
 - [ ] **A.1 Informed consent**: If there are human subjects, have they given informed consent, where subjects affirmatively opt-in and have a clear understanding of the data uses to which they consent?
 - [ ] **A.2 Collection bias**: Have we considered sources of bias that could be introduced during data collection and survey design and taken steps to mitigate those?
 - [ ] **A.3 Limit PII exposure**: Have we considered ways to minimize exposure of personally identifiable information (PII) for example through anonymization or not collecting information that isn't relevant for analysis?
-- [ ] **A.4 Downstream bias mitigation**: Have we considered ways to enable testing downstream results for biased outcomes (e.g. collecting data on protected group status like race or gender)?
+- [ ] **A.4 Downstream bias mitigation**: Have we considered ways to enable testing downstream results for biased outcomes (e.g., collecting data on protected group status like race or gender)?

 ## B. Data Storage
 - [ ] **B.1 Data security**: Do we have a plan to protect and secure data (e.g., encryption at rest and in transit, access controls on internal users and third parties, access logs, and up-to-date software)?

examples/ethics.rst (+1 -1)

@@ -10,7 +10,7 @@ A. Data Collection
 * [ ] **A.1 Informed consent**: If there are human subjects, have they given informed consent, where subjects affirmatively opt-in and have a clear understanding of the data uses to which they consent?
 * [ ] **A.2 Collection bias**: Have we considered sources of bias that could be introduced during data collection and survey design and taken steps to mitigate those?
 * [ ] **A.3 Limit PII exposure**: Have we considered ways to minimize exposure of personally identifiable information (PII) for example through anonymization or not collecting information that isn't relevant for analysis?
-* [ ] **A.4 Downstream bias mitigation**: Have we considered ways to enable testing downstream results for biased outcomes (e.g. collecting data on protected group status like race or gender)?
+* [ ] **A.4 Downstream bias mitigation**: Have we considered ways to enable testing downstream results for biased outcomes (e.g., collecting data on protected group status like race or gender)?

 B. Data Storage
 ---------

examples/ethics.txt (+1 -1)

@@ -4,7 +4,7 @@ A. Data Collection
 * A.1 Informed consent: If there are human subjects, have they given informed consent, where subjects affirmatively opt-in and have a clear understanding of the data uses to which they consent?
 * A.2 Collection bias: Have we considered sources of bias that could be introduced during data collection and survey design and taken steps to mitigate those?
 * A.3 Limit PII exposure: Have we considered ways to minimize exposure of personally identifiable information (PII) for example through anonymization or not collecting information that isn't relevant for analysis?
-* A.4 Downstream bias mitigation: Have we considered ways to enable testing downstream results for biased outcomes (e.g. collecting data on protected group status like race or gender)?
+* A.4 Downstream bias mitigation: Have we considered ways to enable testing downstream results for biased outcomes (e.g., collecting data on protected group status like race or gender)?

 B. Data Storage
 * B.1 Data security: Do we have a plan to protect and secure data (e.g., encryption at rest and in transit, access controls on internal users and third parties, access logs, and up-to-date software)?
