User:Nadiners/ Thesis draft

From XPUB & Lens-Based wiki

7 feb

"The safest sex is no sex" quoted from a weekly sex ed class by teacher from secondary school

https://www.mediawiki.org/wiki/Manual:Preventing_access#Restrict_viewing_of_certain_specific_pages

Intro/Chapter 1?

‘To Unpublish’: As I type this word my autocorrect software does not recognise it. It is a new verb introduced in the online Oxford Dictionary:

       “Make (content that has previously been published online) unavailable to the public.”

Proper usage of the word is illustrated with two telling and relevant examples:

  • ‘Once the images have been published on the internet it will be practically impossible for any court order to unpublish them.’
  • ‘After an outcry on Twitter, the magazine unpublished the column, but the editors at the blog Retraction Watch managed to find a cached version, reminding us all that the internet never forgets.’

In our daily lives, we now constantly publish content. This is often intentional, as a way to express ourselves. However, just as often we unknowingly leave a trail of digital content which automatically disseminates into the network. We lose track both of where we have been and of what we have told the world about ourselves. To unpublish is to try to retake control. We actively look to remove specific content from the internet, to limit what others can know about us. The dictionary examples remind us that this is easier said than done.

Before proceeding, I'd like to make a distinction between publishing and disseminating data. Data is something given: it is raw, a symbol. Not all data is published. Our privately held medical records are not public in the manner of our Facebook posts. The data that we produce only becomes published once it has been analysed and disseminated through the network into the public sphere. Publishing, thus, is about making knowledge public (giving it to the people).

As digital memory storage expands, the trail of data that we produce, and ultimately publish, grows inexhaustibly. Once it reaches people and becomes knowledge, it becomes more meaningful, and much harder to forget, let alone to destroy.

If we break it down even more, knowledge is typically defined in terms of information, and information in terms of data (the DIKW hierarchy, first specified in detail by R. L. Ackoff in 1988). Data, from the Latin 'datum', means 'something given'. Data constantly gives itself, disseminates itself, but how do you get data back? David Thorne's email thread with Jane illustrates the essence of digital information.


               From: Jane Gilles
               Date: Wednesday 8 Oct 2008 12.19pm
               To: David Thorne
               Subject: Overdue account
                                       
               Dear David,                                                                                  
               Our records indicate that your account is overdue by the amount of $233.95. 
               If you have already made this payment please contact us within the next 7 days
               to confirm payment has been applied to your account and is no longer outstanding.
                                                               
               Yours sincerely, Jane Gilles
                                                                                
               From: David Thorne
               Date: Wednesday 8 Oct 2008 12.37pm
               To: Jane Gilles
               Subject: Re: Overdue account
                                       
               Dear Jane,
               
               I do not have any money so am sending you this drawing I did of a spider instead. 
               I value the drawing at $233.95 so trust that this settles the matter.
                                                                
               Regards, David.
                                       
                                       
               From: Jane Gilles
               Date: Thursday 9 Oct 2008 10.07am
               To: David Thorne
               Subject: Overdue account
                                       
               Dear David,
               
                Thank you for contacting us. Unfortunately we are unable to accept drawings 
                as payment and your account remains in arrears of $233.95. Please contact us 
                within the next 7 days to confirm payment has been applied to your account 
                and is no longer outstanding.
               
               Yours sincerely, Jane Gilles
               
               From: David Thorne
               Date: Thursday 9 Oct 2008 10.32am
               To: Jane Gilles
               Subject: Re: Overdue account
                                       
              Dear Jane,
              
              Can I have my drawing of a spider back then please.
              
              Regards, David.
              
              From: Jane Gilles
              Date: Thursday 9 Oct 2008 11.42am
              To: David Thorne
              Subject: Re: Re: Overdue account
                                       
              Dear David,
              
              You emailed the drawing to me. Do you want me to email it back to you?
              
              Yours sincerely, Jane Gilles
              
              From: David Thorne
              Date: Thursday 9 Oct 2008 11.56am
              To: Jane Gilles
              Subject: Re: Re: Re: Overdue account
              
              Dear Jane,
              
              Yes please.
              
              Regards, David.


The dialogue continues, with Jane insisting that she has returned the original spider: "I copied and pasted it from the email you sent me". But of course, there is no original spider in the way that there would be an original paper drawing. And with every new email in the chain, the spider is replicated again. This is how data is. Once data is created it can only disseminate; the very idea of receiving your data back is absurd. The very nature of data makes it impossible to delete. Any human trying to get rid of it would be working against the force of nature.
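The replication in the email thread can be made concrete with a small sketch. It is only a toy model: the mailbox names, the `send` function and the byte string standing in for the drawing are all hypothetical, and real mail systems keep many more copies (servers, backups, caches) than are counted here.

```python
import hashlib

def fingerprint(data) -> str:
    """Return a SHA-256 fingerprint; identical bytes give identical fingerprints."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical stand-in for the attached drawing of a spider.
spider = b"drawing of a spider, valued at $233.95"

# Toy model: every mailbox that handles a message keeps its own copy.
mailboxes = {"david_sent": [], "jane_inbox": [], "jane_sent": [], "david_inbox": []}

def send(attachment, sender_box: str, recipient_box: str) -> None:
    # bytearray(...) allocates a new object each time, so these are real copies.
    mailboxes[sender_box].append(bytearray(attachment))
    mailboxes[recipient_box].append(bytearray(attachment))

send(spider, "david_sent", "jane_inbox")    # David "pays" with the spider
returned = mailboxes["jane_inbox"][0]
send(returned, "jane_sent", "david_inbox")  # Jane "gives it back"

copies = [c for box in mailboxes.values() for c in box]
all_identical = len({fingerprint(c) for c in copies}) == 1

print(len(copies), "copies exist; indistinguishable:", all_identical)
```

Giving the drawing "back" has doubled the number of copies, and nothing in the bytes marks any one of them as the original.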

The intention behind unpublishing is not to delete or destroy anything; the purpose is to limit which public can access the information. An individual wishing to remove previously published content might do so to protect their privacy and identity. Or a large corporation might want to remove content as a form of censorship, in order to protect its reputation or its commercial success. There are many reasons why someone might want to unpublish.

Nonetheless, the very nature of data can make unpublishing unviable. Once knowledge is public, then even if its destruction is in principle possible, the process of unpublishing becomes harder. If "knowledge" has already spread through the network, limiting access to it won’t stop it from spreading.

Chapter 2?

distinguishing between publishing and data. publishing is social.

There are many reasons why deleting data is, in practice, impossible. For data to be truly deleted, you must physically destroy the hardware, because recoveries can always be made. For data that is on the internet, meaning on someone else's server, it would be impossible to physically go there and destroy the hardware holding the specific unwanted content. For example, to destroy hardware, one would need a plan as sophisticated as the one described in the TV series Mr. Robot - in which Elliot Alderson, a cybersecurity engineer and hacker, is recruited by "Mr. Robot" to join a group of hacktivists called "fsociety". The group aims to destroy all debt records by encrypting the financial data of the largest conglomerate in the world, E Corp.

Even if the actual content you want deleted is in fact deleted, its metadata will live on. This is a record of when, where, and by whom a specific piece of content was deleted. Metadata is sometimes more telling than the content itself.

Dissection: unpublishing a wiki page.

From the MediaWiki usage guide:

To prevent anyone but sysops from viewing a page, it can simply be deleted. To prevent even sysops from viewing it, it can be removed more permanently with the Oversight extension. To completely destroy the text of the page, it can be manually removed from the database. In any case, the page cannot be edited while in this state, and for most purposes no longer exists.

1. The user is offered the possibility to "delete"; in fact what occurs is a change of access, in that users classified as administrators (sysops) still have access to the content.

2. In order to actually delete, the guide first recommends an extension, and then, to "completely destroy" the page, recommends "manually removing from the database".

"Manual" is an interesting term here, as it marks a relative level of technical access rather than anything literally done by hand: a shift in technical access, e.g. the use of a specialised tool such as phpMyAdmin or a command line.

MediaWiki

The discussion above is part of the "restrict viewing of certain specific pages" suggestions, a FAQ-style section about how to restrict viewing access to a page, so it is in fact suggesting "deletion" as a workaround, even though the system does not really provide deletion. While you could interpret this as a shortcoming of the MediaWiki software itself, it is in fact very representative of the complexities of many digital systems and the ambiguities of the term "to delete".

Notes (Dissection):

  • find more examples of deletion
  • the delete button: the interface is a promise, it starts from a social desire
  • if you want to destroy the wiki database, you can't; it has social impact
  • when you look at the database it's 40-something tables, of which 9 relate to "pages", and the content of a page is spread between tables named revision, archive, text, etc. This shows the complexity: deleting is actually nearly impossible
  • each layer requires increasingly more technical knowledge
  • nuances, different levels, responsibility, because it's a shared subject
  • if you remove one row from a table you might take down the entire system
  • admins hide a complicated, multilayered structure
  • friction between the social and the technical
  • inherent complexity based on the speed and scale of the shared digital networks (the internet)
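The layers in the MediaWiki guide quoted above (delete, then remove manually from the database) can be sketched with a toy database. This is a minimal illustration under stated assumptions, not MediaWiki's real schema or code: the three tables, their columns, and the functions `interface_delete`, `view` and `manual_destroy` are all hypothetical names, chosen only to echo the idea that a page's content is spread across several tables.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE page      (id INTEGER PRIMARY KEY, title TEXT, text_id INTEGER);
    CREATE TABLE archive   (id INTEGER PRIMARY KEY, title TEXT, text_id INTEGER);
    CREATE TABLE page_text (id INTEGER PRIMARY KEY, body TEXT);
""")
db.execute("INSERT INTO page_text (id, body) VALUES (1, 'something I regret publishing')")
db.execute("INSERT INTO page (id, title, text_id) VALUES (1, 'Regret', 1)")

def interface_delete(title):
    """The 'delete' button in this toy model: move the row to the archive table."""
    db.execute("INSERT INTO archive (title, text_id) "
               "SELECT title, text_id FROM page WHERE title = ?", (title,))
    db.execute("DELETE FROM page WHERE title = ?", (title,))

def view(title, is_sysop=False):
    """Ordinary readers see live pages only; sysops can also read the archive."""
    row = db.execute("SELECT body FROM page JOIN page_text "
                     "ON page_text.id = page.text_id WHERE title = ?", (title,)).fetchone()
    if row is None and is_sysop:
        row = db.execute("SELECT body FROM archive JOIN page_text "
                         "ON page_text.id = archive.text_id WHERE title = ?", (title,)).fetchone()
    return row[0] if row else None

def manual_destroy(title):
    """'Manual' removal: delete the stored text itself, then the archive row."""
    ids = [r[0] for r in db.execute("SELECT text_id FROM archive WHERE title = ?", (title,))]
    for text_id in ids:
        db.execute("DELETE FROM page_text WHERE id = ?", (text_id,))
    db.execute("DELETE FROM archive WHERE title = ?", (title,))

interface_delete("Regret")
public_view = view("Regret")                 # hidden from ordinary readers
sysop_view = view("Regret", is_sysop=True)   # still readable by a sysop
manual_destroy("Regret")
after_destroy = view("Regret", is_sysop=True)
```

In this sketch the delete button only changes who can read the text. The text disappears only when someone with database access removes the rows themselves, and even then backups and copies made elsewhere are outside the model.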


Here is another example of unpublishing, this time with git: note how many layers there are in order to really remove something, and the reminder at the bottom not to commit sensitive data in the first place. https://help.github.com/articles/removing-sensitive-data-from-a-repository/ Key points:

  • 9 step process
  • another tool to fix it/do it better
  • a warning that anything pushed to GitHub should be considered compromised: git itself is potentially self-contained, whereas GitHub is a social network
  • distinction online offline.. proprietary system, dont have access to all layers
  • final precaution: Avoiding accidental commits in the future!!
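Why removing a file in a new commit is not enough can be shown with a toy model of a version history. This is a sketch under loud assumptions: it is not git's implementation or API, and `objects`, `commits`, `store`, `commit`, `rewrite_history` and the file names are all hypothetical. It only borrows the general idea of a content-addressed store in which every commit is a snapshot.

```python
import hashlib

objects = {}   # content-addressed store: hash -> file content
commits = []   # each commit is a snapshot: filename -> hash

def store(content: bytes) -> str:
    key = hashlib.sha1(content).hexdigest()
    objects[key] = content
    return key

def commit(files: dict) -> None:
    commits.append({name: store(body) for name, body in files.items()})

commit({"config.txt": b"password=hunter2", "readme.txt": b"hello"})
commit({"readme.txt": b"hello"})   # the naive fix: drop the file in a new commit

secret_key = commits[0]["config.txt"]
secret_in_latest = "config.txt" in commits[-1]                       # gone from view
secret_in_history = any("config.txt" in snap for snap in commits)   # still recorded
still_stored = objects[secret_key]                                   # still retrievable

clone = dict(objects)   # someone else copied the repository in the meantime

def rewrite_history(filename: str) -> None:
    """Drop the file from every snapshot, then discard unreferenced objects."""
    for snap in commits:
        snap.pop(filename, None)
    referenced = {key for snap in commits for key in snap.values()}
    for key in list(objects):
        if key not in referenced:
            del objects[key]

rewrite_history("config.txt")
removed_locally = secret_key not in objects
survives_in_clone = secret_key in clone
```

Even after the history is rewritten, the earlier copy is untouched, which is the situation the warning about pushed data describes.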

Facebook:

https://www.facebook.com/help/250563911970368?helpref=hc_global_nav

Deactivating or Deleting Your Account

What's the difference between deactivating and deleting my account?

If you deactivate your account:

  • You can reactivate whenever you want.
  • People can't see your timeline or search for you.
  • Some info may remain visible (example: messages you sent).

If you delete your account:

  • You can't regain access once it's deleted.
  • We delay deletion a few days after it's requested. A deletion request is cancelled if you log back into your Facebook account during this time.
  • It may take up to 90 days to delete data stored in backup systems. Your info isn't accessible on Facebook during this time.
  • Some things aren't stored in your account. For example, a friend may have messages from you after deletion.
  • Copies of some material (example: log records) may remain in our database but are disassociated from personal identifiers.
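The difference between deactivating and deleting described in the help text above can be read as a small state machine. The sketch below is an interpretation, not Facebook's implementation: the `Account` class and its method names are hypothetical, and because the help text only says "a few days", the length of the delay is left as a parameter (`grace_days`) whose value in the example is purely illustrative.

```python
from datetime import date, timedelta

class Account:
    def __init__(self, grace_days: int):
        # grace_days is an assumption: the help text does not give a number.
        self.grace = timedelta(days=grace_days)
        self.state = "active"
        self.requested_on = None

    def deactivate(self) -> None:
        self.state = "deactivated"

    def request_deletion(self, today: date) -> None:
        self.state = "pending_deletion"
        self.requested_on = today

    def log_in(self, today: date) -> bool:
        """Return True if the login succeeds."""
        if self.state == "deactivated":
            self.state = "active"            # reactivate whenever you want
            return True
        if self.state == "pending_deletion":
            if today - self.requested_on < self.grace:
                self.state = "active"        # logging in cancels the request
                self.requested_on = None
                return True
            self.state = "deleted"           # too late: access cannot be regained
            return False
        return self.state == "active"

account = Account(grace_days=3)              # illustrative value only
account.deactivate()
reactivated = account.log_in(date(2018, 2, 7))
account.request_deletion(date(2018, 2, 7))
cancelled = account.log_in(date(2018, 2, 8))
account.request_deletion(date(2018, 2, 8))
too_late = not account.log_in(date(2018, 3, 1))
```

What the model leaves out is the point of the passage: messages held by friends, backups and log records all sit outside the account and are unaffected by any of these state changes.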


Another backlash effect is very common. When you really try to take something off the internet, you will in fact bring more attention to what you are trying to delete. This is called the Streisand effect. It "is the phenomenon whereby an attempt to hide, remove, or censor a piece of information has the unintended consequence of publicizing the information more widely, usually facilitated by the Internet" (Wikipedia). Newspapers like the Guardian regularly report on failed attempts by individuals to have sensitive information redacted. And even though Google now removes some search results from its listings, it tells you that it has done so. To a determined bloodhound, this can function like a trail of blood.

All of these obstacles to unpublishing are exacerbated by the growing army of trolls who, in the name of free speech, are willing to go to great effort to disrupt others' attempts to control their data trail. This can be done with good intent (perhaps as in the early days of WikiLeaks). But often it is not. One of the greatest obstacles to countering revenge porn attacks is the willingness of others, found lurking on 4chan or in Reddit forums, to replicate and reupload images of innocent victims whom they have never met.

Given these challenges to unpublishing, if you'd like something to be unpublished, deleting it is rarely an option. The only way to divert attention from it is to carry on producing content. "In the past, censorship worked by blocking the flow of information. In the twenty-first century censorship works by flooding people with irrelevant information." (Homo Deus, Harari)

Chapter 3?

The difference between data and publishing is one that relates to humans and machines. For data to persist and propagate, it no longer needs human creation, consumption or involvement. For example, James Bridle, in an [article] about YouTube videos for kids, describes the concern that videos are being created automatically by algorithms and as a result turn out to be incredibly creepy. "A huge number of these videos are essentially created by bots and viewed by bots, and even commented on by bots." "What I find somewhat disturbing about the proliferation of even (relatively) normal kids videos is the impossibility of determining the degree of automation which is at work here; how to parse out the gap between human and machine."

Humans create disturbing content just as well, of course, and I have many examples of it that I won't be showing. But the main difference is that when humans publish, their aim is to reach an audience.

We can all agree there is an excess of content in the world. Every interface connects and collects data from anything to anywhere. As long as data enters the network, it will remain there. Whether it will eventually reach a human to become knowledge is no longer even important. The data feeds the algorithms and the human agent is not relevant any longer.

"William Binney, a former technical director of the NSA, told a British parliamentary committee in January 2016 that bulk collection of communications data was “99 percent useless”. “This approach costs lives, and has cost lives in Britain because it inundates analysts with too much data.” This “approach” is the collect-it-all, brute-force process of quantified thinking. “More information” does not produce “more truth”, it endangers it." (James Bridle)

So when does this excess of information become significant to humans? When we can feel it infiltrating our personal lives; when suddenly one bit of information bugs us. Only then does an individual pay attention to the unwanted "knowledge", which is not necessarily truth: knowledge becomes fact, and the source of verification becomes insignificant. Most importantly, it will keep on propagating through the network.


some more words?

Attempts at unpublishing are being made through laws such as "the right to be forgotten", a concept implemented in the EU (and Argentina) that creates a lot of controversy. On the one hand, it is a human right to be in control of access to one's own information: the right to privacy. On the other hand, some people believe it might endanger freedom of expression.

To quote Boris Beaude from his essay The Ends of the Internet: "In our present age, no matter which principles are upheld or which rights are enshrined in law, no society in the world grants an absolute freedom of expression.... In Europe, besides security and copyright, respect for human dignity is also usually considered to take precedence over freedom of expression. Even though the EU and the United Nations defend freedom of expression worldwide as a precondition for democracy, they also have set limits to this freedom."

Freedom of expression seems to be an idealistic concept; there are always exceptions, and it could never truly work. The moment anyone expresses anything, someone among the 3.2 billion people who have access to the internet will have reason to find fault with it. The meme "haters gonna hate" sums it up beautifully.


“radical transparency” or “ultimate transparency”. The theory holds that the sunshine of sharing our intimate details will disinfect the moral mess of our lives. With the looming threat that our embarrassing information will be broadcast, we’ll behave better. (The Guardian)



The structure

1- Introduction:
  • 1.1 Setting the scene
  • 1.2 Introducing the problem
  • 1.3 short explanation of how you will solve this problem*
  • 1.4 Short explanation of why this is important*

2- The Problem, in detail

  • 2.1 Why content cannot be deleted
  • 2.2 How/why the nature of the internet is that data replicates
  • 2.3 Examples and support of this in the literature

3- Access/Laws/regulations

  • 3.1 how people have tried to solve this problem before
  • 3.2 examples of this/ evidence from literature
  • 3.3 why this doesn't actually work
  • 3.4 examples of how/why it doesn't work and evidence from literature

4- CLAIM: It is impossible to truly unpublish

  • 4.1 recap in a sentence chapters 1-3
  • 4.2 expand on what this claim means (be precise here) [5]
  • 4.3 why this matters

5- Summary (conclusion)

  • 5.1 recap whole argument
  • 5.2 explain again, in slightly different terms, your contribution to the debate (that it is in fact impossible to unpublish)
  • 5.3 suggest/sketch out future avenues to address the problem


DUMP

The Nature of information is impossible to Unpublish

Abstract

In my thesis I will discuss how the very nature of information does not allow it to disappear. When content goes onto the internet, we tend to call it publishing; what I propose is that the reverse, 'unpublishing', is not possible. I will begin by comparing data (the raw material of publishing) to energy: both, once created, follow a natural course in which they can be either stored or disseminated. Today we are living in a data-centric world, constantly leaving trails of data, whether voluntarily or unconsciously. This will lead me to discuss how data, from anything living or material, is constantly feeding the algorithms on the network. This defeats the purpose of publishing: if data is made to grow the web, and it no longer serves the people (who gain knowledge) but the people serve the algorithm, then what does publishing become? There is an overproduction of data, and given the ease with which data disseminates there is inevitably a problem of excess. It is all being stored in huge data centres; what do we do with the junk? If approximately 90% of the DNA our bodies carry is useless, there must be a natural reason for this. Is nature telling us again that we can't get rid of data? Does the question then become similar to that of what we do with nuclear waste? As destruction is beyond our capabilities, we just have to find a way to store it harmlessly and make it completely inaccessible. The next chapter turns to accessibility: although we cannot delete, destroy or unpublish information, we still try hard to block access and regulate the flow. I will give various examples of how net neutrality, laws such as 'the right to be forgotten', and content moderators try to regulate the web; for better or for worse, they are working against the force of nature. This does not mean, however, that we should give up: there is some data we can control and put away, or that can exist beside the network.

intro

Read this to begin: Overdue Account

Data, from the Latin 'datum', means 'something given'. Data constantly gives, but how do you give data back? David Thorne's email thread with Jane illustrates the essence of digital information. Once data is created it can only disseminate; the very idea of receiving your data back is absurd.


Chapter 1 - the nature of information

Before entering the topic I'd like to introduce the DIKW classification model, which represents the relationships between the following:

  • Data : raw symbols or signs
  • Information : data that are processed to be useful; provides answers to "who", "what", "where", and "when" questions
  • Knowledge : application of data and information; answers "how" questions
  • Wisdom : evaluated understanding. (optional)

(first specified in detail by R. L. Ackoff in 1988) Typically information is defined in terms of data, knowledge in terms of information, and wisdom in terms of knowledge.

To Publish: to make knowledge public. 'Public' comes from the Latin 'publicus', from 'populus', meaning 'people', so to publish is to give knowledge to people.

If we agree on these definitions, then data, the raw segments, are not considered published merely because they are available to the public.

Chapter 2 - Feeding the algorithm rather than publishing

Chapter 3 - excess problem overproduction .. what to do with the junk

Chapter 3 or 4 - Access and regulation