Scanning the Database
On Feminism, Bias in Data and Classification
data, classification, algorithm, bias, database, feminism, narrative, scanning, selecting, reproducible, artificial intelligence
(sorted by relevance)
Scanning means reproducing, but what exactly is being reproduced? The reproduced is being reproduced. The book is a reproduction of a manuscript → the scan is a reproduction of the book → the website is a reproduction of the scan → this website might then be cited again in a book. If a reference is quoted again and again, this seems to indicate a certain quality. But that quality rests only on quantity, not on the actual quality of the source. This is how bias spreads. This reader is about selecting data and classification.
Scanning is also a practice of reproduction and classification. When scanning we also have to take these questions into account: What do we scan? What do we select? Is it visible that it is only a selection?
Revealing data
Data is invisible: how we interact with the world is determined by data that is often hidden and not accessible. As soon as a machine learning algorithm is trained, the data behind it becomes invisible. But the algorithm still acts according to its training set while pretending to be an exhaustive, autonomous and even objective machine. Data nowadays means a lot of power. But in this data there is a certain bias, and if this bias is ignored, the data can be threatening.
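To make this concrete, here is a minimal sketch in Python, with entirely made-up toy data, of how a trained model keeps acting according to its training set even after that data has disappeared from view:

# A minimal sketch (hypothetical toy data, not a real dataset) of how a
# trained classifier silently reproduces whatever bias its training set
# contains.
from collections import Counter

# "Human-labelled" training set: each record is (feature, label).
# The labeller has systematically marked group "b" as bad; this skew is
# invented purely for illustration.
training_set = [
    ("a", "good"), ("a", "good"), ("a", "good"), ("a", "bad"),
    ("b", "bad"),  ("b", "bad"),  ("b", "bad"),  ("b", "good"),
]

def train(examples):
    """'Training' here just memorises the majority label per feature.
    Once this dict is built, the original examples are no longer needed:
    the data has become invisible, but its bias lives on in the model."""
    by_feature = {}
    for feature, label in examples:
        by_feature.setdefault(feature, []).append(label)
    return {f: Counter(labels).most_common(1)[0][0]
            for f, labels in by_feature.items()}

model = train(training_set)
print(model["b"])  # -> "bad": the labeller's prejudice, now automated

The training examples can be thrown away after train() returns; the prejudice encoded in the labels survives inside the model.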
Classifying algorithms
The datasets that machine learning algorithms are built with are still created by humans, and with them come opinions, prejudices, political attitudes, etc. → bias. What these humans do is classify data: they select whether something is included or excluded, they decide whether something is good or bad. And so do the algorithms that follow those very same rules.
Classifying also means: which articles did I include, and which did I exclude from this reader? This reader is built upon feminist methodologies. Among other things, the texts are selected using quantitative and qualitative feminist methods, trying to balance gender representation from the authors to the type designers. How does this change the perception of the topic? What if I didn't mention it? → By revealing the personal »algorithm« (methodology) by which I built this »dataset« (reader), I try to provide transparency.
I know and indicate: this reader is biased.
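Written out as code, such a »revealed algorithm« might look like the following sketch. All field names, ratings and the balancing rule here are hypothetical, not the actual method used for this reader; the point is only that the selection criteria are stated openly enough to be inspected and contested:

# A sketch of "revealing the algorithm": the selection criteria are an
# explicit, inspectable function instead of an implicit editorial habit.
from dataclasses import dataclass

@dataclass
class Text:
    title: str
    author: str
    author_gender: str   # self-described; an assumed metadata field
    relevance: int       # editor's own rating, 1..5 (assumed scale)

def select(candidates, min_relevance=3):
    """Keep relevant texts, then balance the selection by gender.
    The rule itself is debatable; what matters is that anyone can read
    it, question it, and see what was excluded."""
    relevant = [t for t in candidates if t.relevance >= min_relevance]
    by_gender = {}
    for t in relevant:
        by_gender.setdefault(t.author_gender, []).append(t)
    if not by_gender:
        return []
    # Take equally many texts from each group (quantitative balance).
    quota = min(len(group) for group in by_gender.values())
    return [t for group in by_gender.values() for t in group[:quota]]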
Included Texts
1. Bias in machine learning, and how to stop it (Hope Reese)
2. Microsoft’s racist chatbot returns with drug-smoking Twitter meltdown (Samuel Gibbs)
3. Women’s Ways of Structuring Data (Christine L. Masters)
4. Sorting Things Out (Geoffrey C. Bowker, Susan Leigh Star)
5. Database as Symbolic Form (Lev Manovich)
6. Das Kunstwerk im Zeitalter seiner technischen Reproduzierbarkeit [The Work of Art in the Age of Its Technological Reproducibility] (Walter Benjamin)
7. Weapons of Math Destruction: How big data increases inequality and threatens democracy (Cathy O’Neil)
8. Inspecting Algorithms for Bias (Matthias Spielkamp)
Format
The reader comes in the form of three folded sheets with small typography, in the manner of a package insert. It accompanies the Database book just as a medication insert accompanies its medicine.