User:Nadiners/ Thesis draft feb: Difference between revisions

From XPUB & Lens-Based wiki

Revision as of 01:13, 23 February 2018

Intro/Chapter 1?

‘To Unpublish’: As I type this word my autocorrect software does not recognise it. It is a new verb introduced in the online Oxford Dictionary:

       “Make (content that has previously been published online) unavailable to the public.”

Proper usage of the word is illustrated with two telling and relevant examples:

  • ‘Once the images have been published on the internet it will be practically impossible for any court order to unpublish them.’
  • ‘After an outcry on Twitter, the magazine unpublished the column, but the editors at the blog Retraction Watch managed to find a cached version, reminding us all that the internet never forgets.’

In our daily lives, we now constantly publish content. This is often intentional, as a way to express ourselves. However, just as often we unknowingly leave a trail of digital content which automatically disseminates into the network. We lose track both of where we have been and of what we have told the world about ourselves. To unpublish is to try to retake control. We actively look to remove specific content from the internet, to limit what others can know about us. The dictionary examples remind us that this is easier said than done.

Before proceeding, I'd like to make a distinction between publishing and disseminating data. Data is something given; it is raw, a symbol. Not all data is published. Our privately held medical records are not public in the manner of our Facebook posts. The data that we produce only becomes published once it has been analysed and disseminated through the network into the public sphere. Publishing, thus, is about intentionally making knowledge public (to the people). It can range from writing a novel that is published in book form to declaring your love for someone on the wall of a public toilet.

As digital memory storage expands, the trail of data that we produce, and ultimately publish, is inexhaustibly growing. Once it reaches the people and comes out as knowledge, it becomes more meaningful, and much harder to forget, let alone to destroy. [<is this where data becomes information?]

If we break it down even further, knowledge is typically defined in terms of information, and information in terms of data (the DIKW hierarchy, first specified in detail by R. L. Ackoff in 1988). 'Data', from the Latin 'datum', means 'something given'. Data constantly gives itself, disseminates itself, but how do you get data back? David Thorne's email thread with Jane illustrates the essence of digital information.


               From: Jane Gilles
               Date: Wednesday 8 Oct 2008 12.19pm
               To: David Thorne
               Subject: Overdue account
                                       
               Dear David,                                                                                  
               Our records indicate that your account is overdue by the amount of $233.95. 
               If you have already made this payment please contact us within the next 7 days
               to confirm payment has been applied to your account and is no longer outstanding.
                                                               
               Yours sincerely, Jane Gilles
                                                                                
               From: David Thorne
               Date: Wednesday 8 Oct 2008 12.37pm
               To: Jane Gilles
               Subject: Re: Overdue account
                                       
               Dear Jane,
               
               I do not have any money so am sending you this drawing I did of a spider instead. 
               I value the drawing at $233.95 so trust that this settles the matter.
                                                                
               Regards, David.
                                       
                                       
               From: Jane Gilles
               Date: Thursday 9 Oct 2008 10.07am
               To: David Thorne
               Subject: Overdue account
                                       
               Dear David,
               
                Thank you for contacting us. Unfortunately we are unable to accept drawings 
                as payment and your account remains in arrears of $233.95. Please contact us 
                within the next 7 days to confirm payment has been applied to your account 
                and is no longer outstanding.
               
               Yours sincerely, Jane Gilles
               
               From: David Thorne
               Date: Thursday 9 Oct 2008 10.32am
               To: Jane Gilles
               Subject: Re: Overdue account
                                       
              Dear Jane,
              
              Can I have my drawing of a spider back then please.
              
              Regards, David.
              
              From: Jane Gilles
              Date: Thursday 9 Oct 2008 11.42am
              To: David Thorne
              Subject: Re: Re: Overdue account
                                       
              Dear David,
              
              You emailed the drawing to me. Do you want me to email it back to you?
              
              Yours sincerely, Jane Gilles
              
              From: David Thorne
              Date: Thursday 9 Oct 2008 11.56am
              To: Jane Gilles
              Subject: Re: Re: Re: Overdue account
              
              Dear Jane,
              
              Yes please.
              
              Regards, David.


The dialogue continues, with Jane insisting that she has returned the original spider: "I copied and pasted it from the email you sent me". But of course, there is no original spider in the way that there would be an original paper drawing. And with every new email in the chain, the spider is replicated again. This is the nature of data: once it is created it can only disseminate, and the very idea of receiving your data back is absurd. The very nature of data makes it impossible to delete. Any human trying to get rid of it would be working against a force of nature.
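The spider exchange can be restated as a small sketch. It is only an illustration under stated assumptions: the byte string standing in for the drawing and the three-message thread are hypothetical, not taken from Thorne's emails. It shows that each message carries its own copy, that the copies cannot be told apart by their content, and that removing one message does nothing to the others.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return a SHA-256 digest that identifies the content, not the container."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical stand-in for the bytes of the emailed drawing.
spider = b"drawing of a spider, valued at $233.95"

# Every reply in the chain attaches a fresh copy to a new message.
thread = []
attachment = spider
for sender in ["David", "Jane", "David"]:
    attachment = bytes(bytearray(attachment))  # a new copy of the same bytes
    thread.append({"from": sender, "attachment": attachment})

# All copies share one fingerprint: none of them is "the original".
digests = {fingerprint(message["attachment"]) for message in thread}

# Deleting one message leaves the other copies untouched.
del thread[0]
remaining = len(thread)
```

In this toy model there is nothing to "give back": returning the spider only means producing yet another copy.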

The intention behind unpublishing is not to delete or destroy anything; the purpose is to limit which public can access the information. An individual wishing to remove previously published content might do so to protect their privacy and identity. Or a large corporation might want to remove content as an act of censorship, in order to protect its reputation or its commercial success. There are many reasons why someone might want to unpublish.

Nonetheless, the very nature of data can make unpublishing unviable. Once knowledge is public, then even if its destruction is in principle possible, the process of unpublishing becomes harder. If "knowledge" has already spread through the network, limiting access to it won’t stop it from spreading.

Chapter 2

There are many reasons why deleting data is, in practice, impossible. For data to be truly deleted, you must physically destroy the hardware, because recoveries can otherwise always be made. For data that is on the internet, meaning on someone else's server, it would be impossible to physically go there and destroy the hardware holding the specific unwanted content. To destroy such hardware, one would need a plan as sophisticated as the one described in the TV series Mr. Robot, in which Elliot Alderson, a cybersecurity engineer and hacker, is recruited by "Mr. Robot" to join a group of hacktivists called "fsociety". The group aims to destroy all debt records by encrypting the financial data of the largest conglomerate in the world, E Corp. Elliot plots to physically destroy their hardware by hacking the batteries so that they heat up and blow up the data centres. His plan is not physically impossible, yet it is so complex that reality seems unlikely to catch up any time soon. Physical destruction does happen, though: the spy agency GCHQ made Guardian editors physically destroy the hardware containing top-secret documents leaked by Edward Snowden. They used “angle-grinders, dremels – a drill with a revolving bit – and masks. The spy agency provided one piece of hi-tech equipment, a "degausser", which destroys magnetic fields, and erases data.” Yet for all this effort and hi-tech equipment, the destruction of the Snowden files would not stop the flow of intelligence-related stories, since the documents existed in several jurisdictions.

Even if the actual content you want deleted is in fact deleted, its metadata will live on: a record of when, where and by whom a specific piece of content was deleted. Metadata is sometimes more telling than the content itself. So as not to be too meta about metadata, I will go into more detail and dissect specific examples of how hard it is to truly delete published content. I begin with Facebook, an opaque platform that until recently did not even have a delete button, only “deactivate”. When I checked recently there was a delete option; however, as expected, their FAQ explains that “copies of some material may remain in our database…” and “Some of the things you do on Facebook aren’t stored in your account. For example, a friend may still have messages from you even after you delete your account. That information remains after you delete your account.” https://www.facebook.com/help/250563911970368?helpref=hc_global_nav

If we look deeper into unpublishing with git, a tool with which developers track changes in files, we can see just how many layers there are to get through in order to really remove something from this tracking tool. The help page gives a nine-step process for removing data, and even offers another tool that might help at the beginning. Git itself is self-contained and works offline, whereas GitHub is an online social network and a proprietary system in which users do not have access to all layers. The help page's final precaution is interesting: “Avoiding accidental commits in the future”. This brings back a vivid memory from a Sex Education class at secondary school. Every week the whole class was made to repeat out loud what became a saying: “the safest sex is no sex”. There is not much space for interpretation here. If you’re going to regret it, don’t do it in the first place. Just don’t do anything, quite the opposite of Nike’s explicit message: “Just do it”. The message was pretty clear: the institution was trying to discourage teenage pregnancy and STDs. Basically, don’t take the risk of transmitting data or diseases. The safest way to unpublish is don’t publish, don’t commit, let alone push!
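The reason removal from a tracking tool takes so many steps can be shown with a toy model. This is a minimal sketch, assuming a drastically simplified append-only history; the class `ToyHistory`, its methods and the example file content are all hypothetical and are not git's actual implementation or commands. It only illustrates the general principle that deleting a file in a new commit leaves every earlier snapshot in the history.

```python
import hashlib

class ToyHistory:
    """A toy append-only history, loosely inspired by how version control keeps snapshots."""

    def __init__(self):
        self.objects = {}   # digest -> content of every snapshot ever committed
        self.commits = []   # ordered list of (message, digest or None)

    def commit(self, message, content):
        """Record a new snapshot; content=None means the file was deleted."""
        digest = None
        if content is not None:
            digest = hashlib.sha1(content.encode()).hexdigest()
            self.objects[digest] = content
        self.commits.append((message, digest))

    def current(self):
        """What a casual visitor sees: only the latest snapshot."""
        digest = self.commits[-1][1]
        return self.objects.get(digest) if digest else None

    def recoverable(self):
        """What anyone with access to the history can still read."""
        return [self.objects[d] for _, d in self.commits if d is not None]

repo = ToyHistory()
repo.commit("add config", "password=hunter2")   # hypothetical accidental commit
repo.commit("remove config", None)              # the file is "deleted" in a new commit
```

After the second commit, `repo.current()` shows nothing, yet `repo.recoverable()` still returns the committed content. In this model, really removing it would mean rewriting the history itself and every copy of that history held by others.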

With MediaWiki, unpublishing a wiki page can be shown visually. From the MediaWiki manual (https://www.mediawiki.org/wiki/Manual:Preventing_access#Restrict_viewing_of_certain_specific_pages): “To prevent anyone but sysops (system operators) from viewing a page, it can simply be deleted. To prevent even sysops from viewing it, it can be removed more permanently with the Oversight extension. To completely destroy the text of the page, it can be manually removed from the database. In any case, the page cannot be edited while in this state, and for most purposes no longer exists.”

  1. The user is offered the possibility to "delete"; in fact, what occurs is a change of access, in that users classified as administrators (sysops) still have access to the content.
  2. In order to actually delete, the manual first recommends an extension and then, to "completely destroy" the page, recommends "manually removing" it from the database.

"Manual" is an interesting term here. It is not meant literally; rather, it marks a shift in the level of technical access, for example the use of a specialised tool such as phpMyAdmin or a command line.

The passage quoted above is part of a FAQ-style section on how to restrict viewing of certain pages, so "deletion" is in fact suggested as a workaround for restricting access, which shows that the system does not really provide deletion at all. Although you could interpret this as a shortcoming of the MediaWiki software itself, it is in fact very representative of the complexities of many digital systems and of the ambiguities of the term "to delete".

The delete button in the interface is a promise that starts from a social desire.

If you want to destroy the wiki database, you can't: it has social impact.

When you look at the database, it has forty-something tables, of which nine relate to "pages", and the content of a page is spread between tables named revision, archive, text and so on. This shows the complexity: deleting is actually nearly impossible, and each layer requires increasingly technical knowledge. There are nuances, different levels and a shared responsibility, because it is a shared subject: if you remove one row from a table you might take down the entire system. Admins hide a complicated, multilayered structure. There is friction between the social and the technical, and an inherent complexity based on the speed and scale of shared digital networks (the internet).
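To make the layering concrete, here is a minimal sketch using an in-memory SQLite database. It borrows only the table names revision, archive and text from the description above; the columns, the helper functions and the example page are assumptions made for illustration, and this is not MediaWiki's actual schema or code. It models "delete" as a move between tables, which changes who can see the content without destroying it.

```python
import sqlite3

# Toy schema: three tables named after those mentioned above; columns are hypothetical.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE text     (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE revision (id INTEGER PRIMARY KEY, page TEXT, text_id INTEGER);
    CREATE TABLE archive  (id INTEGER PRIMARY KEY, page TEXT, text_id INTEGER);
""")
db.execute("INSERT INTO text (id, body) VALUES (1, 'my secret draft')")
db.execute("INSERT INTO revision (id, page, text_id) VALUES (1, 'Thesis', 1)")

def delete_page(page):
    """'Delete' as the interface offers it: move rows to the archive, destroy nothing."""
    db.execute(
        "INSERT INTO archive SELECT id, page, text_id FROM revision WHERE page = ?",
        (page,),
    )
    db.execute("DELETE FROM revision WHERE page = ?", (page,))

def visible_to_readers(page):
    """Ordinary readers only see pages that still have live revisions."""
    return db.execute(
        "SELECT body FROM revision JOIN text ON text.id = revision.text_id WHERE page = ?",
        (page,),
    ).fetchall()

def visible_to_sysops(page):
    """Privileged users can still read what sits in the archive."""
    return db.execute(
        "SELECT body FROM archive JOIN text ON text.id = archive.text_id WHERE page = ?",
        (page,),
    ).fetchall()

delete_page("Thesis")
```

After `delete_page("Thesis")`, readers get an empty result while the sysop query still returns the draft, and the row in `text` has not been touched. Destroying the content for good would mean editing the tables directly, which is the "manual" step that requires a different level of access.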


Another backlash effect is very common. When you attempt to remove, hide, censor (or unpublish) certain content, you will in fact bring more attention to it and consequently, with the friendly help of the internet, publicise more widely what you are trying to delete. This is called the Streisand effect. “The term alluded to Barbra Streisand, who had sued photographer Kenneth Adelman and Pictopia.com for violation of privacy. The US$50 million lawsuit endeavoured to remove an aerial photograph of Streisand's mansion from the publicly available collection of 12,000 California coastline photographs. Adelman photographed the beachfront property to document coastal erosion as part of the California Coastal Records Project, which was intended to influence government policymakers. Before Streisand filed her lawsuit, "Image 3850" had been downloaded from Adelman's website only six times; two of those downloads were by Streisand's attorneys. As a result of the case, public knowledge of the picture increased exponentially; more than 420,000 people visited the site over the following month. The lawsuit was dismissed and Streisand was ordered to pay Adelman's legal fees, which amounted to $155,567.” (Wikipedia)

Newspapers regularly report on failed attempts by individuals to have sensitive information redacted. An example from the New Yorker (https://www.newyorker.com/magazine/2014/09/29/solace-oblivion): an American girl named Nikki died in a car accident, and a couple of employees working at the scene admitted to having taken and shared images of it. Because of the scale of the horror, the images circulated widely on the internet, purely for shock value. When Nikki’s father, inevitably, wanted to have these photos removed from the internet, not only was he told “Don’t worry, it’ll blow over”, he did not even have a law to protect his “right to be forgotten”, as people do in the EU.
And even though Google now removes some search results from its listings, it tells you that it has done so. To a determined bloodhound, this can function like a trail of blood.

All of these obstacles to unpublishing are exacerbated by the growing army of trolls who, in the name of free speech, are willing to go out of their way to disrupt others' attempts to control their data trail. This can be done with good intent (perhaps as in the early days of WikiLeaks). But often it is not. One of the greatest obstacles to countering revenge porn attacks is the willingness of others, found lurking on 4chan or in Reddit forums, to replicate and re-upload images of innocent victims whom they have never met. [<this can be unpacked, there are stories to be told (examples to be given) about each of these instances]

Given these challenges to unpublishing, if you'd like something to be unpublished, deleting it is rarely an option. The only way to divert attention from it is to carry on producing content. In Homo Deus, Yuval Noah Harari points out: “In the past, censorship worked by blocking the flow of information. In the twenty-first century censorship works by flooding people with irrelevant information.” The same logic of flooding underlies the common Distributed Denial of Service (DDoS) attack, which brings a site offline or creates a disturbance in the network by flooding the system with traffic from multiple sources. It is a way for attackers to silence websites they disagree with or to disrupt an organisation’s online operations.

Chapter 3?

The difference between data and publishing is one that relates to humans and machines. For data to sustain and propagate, it no longer necessitates human creation, consumption or involvement. For example, James Bridle, in his article “Something is wrong on the Internet” (https://medium.com/@jamesbridle/something-is-wrong-on-the-internet-c39c471271d2), discusses YouTube videos for kids and the concern that these videos are automatically created by bots (and viewed and even commented on by bots) and as a result turn out to be incredibly creepy. I will take his example of the Finger Family videos (https://www.youtube.com/watch?time_continue=86&v=3xqqj9o7TgA), which are already disturbing enough for an adult to watch. Bridle then finds a video whose title, “Wrong Heads Disney Wrong Ears Wrong Legs Kids Learn Colors Finger Family 2017 Nursery Rhymes”, sounds suspiciously automated, and when you view such videos they look as automated as they sound: https://www.youtube.com/watch?v=D52hg9ogvWc “I have no idea where the “Wrong Heads” trope originates, but I can imagine, as with the Finger Family Song, that somewhere there is a totally original and harmless version that made enough kids laugh that it started to climb the algorithmic rankings until it made it onto the word salad lists, combining with Learn Colors, Finger Family, and Nursery Rhymes, and all of these tropes – not merely as words but as images, processes, and actions – to be mixed into what we see here.”

This is content production in the age of algorithmic discovery. Many of the videos he describes in his article have been removed, and as a result Bridle has amended his article to mention their removal. Interestingly, they carry on living in his descriptions and screenshots. So if the creation and dissemination of these videos becomes completely autonomous, and the audience is no longer human, the use of the word “published” no longer makes sense. However, if the audience of these automated videos is human, then we reach a grey area.

"What I find somewhat disturbing about the proliferation of even (relatively) normal kids videos is the impossibility of determining the degree of automation which is at work here; how to parse out the gap between human and machine." [where are these quotes from? Give examples of how they are ‘creepy’ what happens in them? what do they look like?, illustrate lavishly]

Of course, humans create disturbing content too, and I have many examples that I won't be showing. But the main difference is that when they publish, their aim is to reach an audience.

We can all agree there is an excess of content in the world. Every interface connects and collects data from anything to anywhere. As long as data enters the network, it will remain there. Whether it will eventually reach a human to become knowledge is no longer even important. The data feeds the algorithms and the human agent is not relevant any longer.


“William Binney, a former technical director of the NSA, told a British parliamentary committee in January 2016 that bulk collection of communications data was ‘99 percent useless’. ‘This approach costs lives, and has cost lives in Britain because it inundates analysts with too much data.’ This ‘approach’ is the collect-it-all, brute-force process of quantified thinking. ‘More information’ does not produce ‘more truth’, it endangers it.” (James Bridle) [Just to build on the difference between information and data – when someone says “communications data was ‘99 percent useless’”, this is because there is no information in the data] So when does this excess of information become significant to humans? When we can feel it infiltrating our personal lives; when suddenly one bit of information bugs us. Only then does an individual pay attention to the unwanted "knowledge", which is not necessarily true: knowledge becomes fact, and the source of verification becomes insignificant. Most importantly, it will keep on propagating through the network.

some more words?

[I think you can build on the stuff below.]

Attempts at unpublishing are being made through laws such as "the right to be forgotten", a concept implemented in the EU (and Argentina) that creates a lot of controversy. On the one hand, it is a human right to be in control of access to one's own information: the right to privacy. On the other hand, some people believe it might endanger freedom of expression.

To quote Boris Beaude from his essay The Ends of the Internet: "In our present age, no matter which principles are upheld or which rights are enshrined in law, no society in the world grants an absolute freedom of expression.... In Europe, besides security and copyright, respect for human dignity is also usually considered to take precedence over freedom of expression. Even though the EU and the United Nations defend freedom of expression worldwide as a precondition for democracy, they also have set limits to this freedom."

Freedom of expression seems to be an idealistic concept; however, there are always exceptions and it could never truly work. The moment anyone expresses anything, someone among the 3.2 billion people who have access to the internet will have reason to find fault with it. The meme "haters gonna hate" sums it up beautifully.


“radical transparency” or “ultimate transparency”. The theory holds that the sunshine of sharing our intimate details will disinfect the moral mess of our lives. With the looming threat that our embarrassing information will be broadcast, we’ll behave better. (The Guardian)

Bibliography