User:Manetta/i-could-have-written-that/from-mining-minerals-to-mining-data

From XPUB & Lens-Based wiki

mining

'Data mining' is a fashionable term for the practice of data analytics. The term has become more fashionable with the increasing amounts of data that are (claimed to be? [how to nuance this commonly used statement?]) available online, a phenomenon that has been called 'big data'. By analyzing sets of data, a data mining algorithm searches for patterns that overlap. By comparing the content inside such a pattern, the algorithm can predict whether some other data will match the pattern or not. (This is very much a shortcut explanation of what data mining is.)
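
As a rough illustration of the 'search for overlapping patterns, then predict a match' idea, here is a minimal Python sketch. It is not how any particular data mining package works: the function names `find_pattern` and `matches`, the example sentences and the thresholds are all assumptions made up for this page, and a 'pattern' is reduced to nothing more than a set of shared words.

```python
from collections import Counter

def find_pattern(texts, min_share=1.0):
    """Return the words that occur in at least `min_share` of the texts."""
    counts = Counter()
    for text in texts:
        # count every word once per text, so we measure overlap between texts
        counts.update(set(text.lower().split()))
    return {word for word, n in counts.items() if n / len(texts) >= min_share}

def matches(pattern, text, threshold=0.5):
    """Predict a match when enough of the pattern's words occur in the text."""
    if not pattern:
        return False
    words = set(text.lower().split())
    return len(pattern & words) / len(pattern) >= threshold

# hypothetical example sentences, not real data
examples = [
    "the mine produces copper and gold",
    "the company extracts gold and iron ore",
    "gold and uranium are mined here",
]
pattern = find_pattern(examples)          # the words shared by all examples
print(pattern)
print(matches(pattern, "silver and gold"))
print(matches(pattern, "a walk to the church"))
```

Even in this toy version the 'miner' never returns data: it returns a pattern, and then a yes-or-no judgement about new data.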

The term 'data mining' is actually not very accurate, though. Calling data analysis software 'data mining software' is misleading. The term contains the metaphor of 'mining', which hints that the software is mining for data. Looking at the 'data mining' process shows that it is not data that the 'miner' is looking for: it looks for patterns that occur in sets of data. 'Pattern mining' would then be a more accurate term.

But that is not enough. 'Pattern mining' cannot be done without data, as that is the material in which the software looks for patterns. Where does 'data' come from? It is more or less common sense that data is a type of material that is freely available online and can be downloaded by anyone. But data is actually only data once it has been prepared for a specific use. This process of text processing (when working with written text) contains different layers of simplification and abstraction.
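
To make the 'layers of simplification and abstraction' concrete, here is a small Python sketch of one possible text-processing chain. The layers chosen (lowercasing, dropping punctuation and numbers, removing stopwords, counting words) are a common convention and an assumption here, not a description of a specific tool; the function name `prepare`, the tiny stopword list and the example sentence are hypothetical.

```python
import re
from collections import Counter

# hypothetical and far from complete stopword list
STOPWORDS = {"the", "a", "is", "of", "in", "and"}

def prepare(text):
    lowered = text.lower()                    # layer 1: capitals are discarded
    tokens = re.findall(r"[a-z]+", lowered)   # layer 2: punctuation and numbers are discarded
    content = [t for t in tokens if t not in STOPWORDS]  # layer 3: 'meaningless' words are discarded
    return Counter(content)                   # layer 4: word order is discarded

bag = prepare("Data is the new oil. The Data of 2016 is in the cloud!")
print(bag)
```

Each layer throws something away. What is left at the end is the 'data' the mining software will look at, which is already quite far from the written text.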

In the academic field there is another term that covers the full process from text processing until publishing results: Knowledge Discovery in Data (KDD). 'Data mining' is only one of its five steps. (bit more about KDD here)

Knowledge Discovery in Data (KDD)
step 1 --> data collection
step 2 --> data preparation
step 3 --> data mining
step 4 --> interpretation
step 5 --> determine actions
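
The five steps can be sketched as a chain of small Python functions, to show that 'data mining' is only the middle link. This is a toy sketch under stated assumptions: the function names, the hard-coded example texts and the very crude 'mining' rule (words that recur in more than one text) are all invented for illustration.

```python
from collections import Counter

def collect():
    # step 1: data collection (hypothetical hard-coded texts instead of a real source)
    return ["Gold price rises", "gold mine closes", "Rain expected tomorrow"]

def prepare(texts):
    # step 2: data preparation (lowercase, split into sets of words)
    return [set(text.lower().split()) for text in texts]

def mine(prepared):
    # step 3: data mining (here: words that occur in more than one text)
    counts = Counter(word for words in prepared for word in words)
    return {word for word, n in counts.items() if n > 1}

def interpret(patterns):
    # step 4: interpretation (turn patterns into statements a person can read)
    return [f"'{word}' recurs across texts" for word in sorted(patterns)]

def determine_actions(findings):
    # step 5: determine actions
    return "investigate" if findings else "collect more data"

texts = collect()
prepared = prepare(texts)
patterns = mine(prepared)
findings = interpret(patterns)
action = determine_actions(findings)
print(findings, action)
```

Steps 1, 2, 4 and 5 all involve choices made by a person, even in this small example; only step 3 is what usually gets called 'data mining'.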

from: mining minerals

Meta-metaphors data-mining.gif

Where does 'mining' overlap with or differ from the act of actual 'mining'? What influence does the use of this metaphor have on the image of a data mining practice? Is it misleading?

'mining' differences

mining minerals                 | mining data
natural resource                | not natural resource
non-renewable                   | constantly renewed
mineral = (commercial) product  | data = cultural by-product
to mine = to extract            | to mine = to derive

commodification

Mining is required to obtain any material that cannot be grown through agricultural processes, or created artificially in a laboratory or factory. Mining in a wider sense includes extraction of any non-renewable resource such as petroleum, natural gas, or even water.

Mining operations usually create a negative environmental impact, both during the mining activity and after the mine has closed.

Mineral processing (or mineral dressing) is a specialized area in the science of metallurgy that studies the mechanical means of crushing, grinding, and washing that enable the separation (extractive metallurgy) of valuable metals or minerals from their gangue (waste material).

from: Wikipedia (5th Jan. 2016)


Products that make modern life work. (...) Our major products are aluminium, copper, diamonds, gold, industrial minerals (borates, titanium dioxide and salt), iron ore, thermal and metallurgical coal and uranium.

from: riotinto.com

notes

creating value out of material that used to belong to nobody
analogy to regulations of the amount of oxygen

via: archeology

If data mining could be regarded as 'digging for information', there is a close connection between data mining and archeology, where archeologists search for artefacts that could offer information about past human activities at that location.

Archeology is not focused on the monetization of natural resources, but rather on writing history and making sense of a particular site.

'sense making' differences

archeology                                | data mining
searching for stories                     | searching for classifiers
through artefacts                         | through patterns
non-renewable                             | constantly renewed
artefact = cultural product               | data = cultural by-product
story = derived                           | mining results = derived
standards as main instrument for quality  | ?

standards

Standardization is the main instrument for managing quality (Simons, 1994). In archeology, standardization is important not only for high-quality, clean databases; the quality of definitions, thesauri, processes and procedures also relies on an active management of standards (Wiemer, 2002).

archeology activities

In the Netherlands there is an official guideline called 'KNA' that commercial, scientific and governmental archeologists need to follow in order to deliver research that complies with the basic quality requirements. (KNA's table of contents) 'KNA' stands for 'Dutch Quality Norm for Archeology'. It contains protocols for desk research, writing a program of requirements, exploratory field research with test trenches, excavation fieldwork, the physical protection of the sites, specialist research, archeological supervision of construction works, and the registration and management of the findings. (KNA, 2016)

The KNA also distributes a register of standard codes to describe every possible artefact type that can be found. (see here) KNA offers similar lists for materials, time periods, collection methods and more.

Thanks to these protocols the archeological process is described accurately, which makes it possible for someone else to retrace the artefact's roots. This data has become the resource of the artefact. It is also the only resource there is once an archeologist has taken the artefact from the ground, and it is the resource from which another archeologist can work to write an alternative history around the artefact.

Archis database

Archis, the Dutch national database for archeological information, contains multiple types of information:

  • data model of what has been captured, how, and how data relates to other data
    • areas of research
    • observations
    • artefacts
    • complexes
    • monuments (valuable complexes)
  • value of archeological complex (according to provincial and national government)
  • filter for differences in observation quality related to the probability of the presence of an archeological complex*
  • quality evaluation of the observation files
* Wiemer gives an example of a location description: 
1/2 hours walk in south-east direction starting at the church of ....
(Wiemer, 2002)

(http://archeologieinnederland.nl/sites/default/files/attachments/Archis%20en%20kwaliteit%20van%20informatie.pdf)

terminology

  • artefacts
  • archeological complex (multiple artefacts on same location)
  • monument (valuable complex(es))
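
The three terms above can be read as a small nested data structure. The Python sketch below only illustrates that reading and is not the actual Archis data model: the class names follow the terminology list, but every field (`description`, `location`, `valuable`) is a hypothetical stand-in chosen for this example.

```python
from dataclasses import dataclass, field

@dataclass
class Artefact:
    description: str  # hypothetical field
    location: str     # hypothetical field

@dataclass
class Complex:
    """Multiple artefacts found on the same location."""
    location: str
    artefacts: list = field(default_factory=list)
    valuable: bool = False  # hypothetical flag standing in for the governmental valuation

    def add(self, artefact):
        # a complex only groups artefacts from one and the same location
        if artefact.location != self.location:
            raise ValueError("artefact was found at a different location")
        self.artefacts.append(artefact)

    @property
    def is_monument(self):
        # a monument is a complex that has been valued as valuable
        return self.valuable

site = Complex(location="site A")
site.add(Artefact("pottery shard", "site A"))
site.add(Artefact("flint tool", "site A"))
print(len(site.artefacts), site.is_monument)
```

In this reading a monument is not a separate kind of object, only a complex with a valuation attached to it.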

to: mining data

computer science

data science

data as resource

history of 'data mining' as term

wiki page: Data mining

appearances:

60s


90s - 20??

  • Yahoo Groups mailinglist for the Stanford Data Mining group, first message from 30th Oct. 1997; Sergey Brin + Jeff Ullman + Howard Ho + George H. John + Pat Langley; (list of first messages, 1997), (group overview, 2002), (group overview, 2008)
  • journal started in 1997, Data Mining and Knowledge Discovery; (overview of issues, 1997 - now)
  • MIT thesis; ParaSite: Mining Structural Information on the Web, by Ellen Spertus; (paper, Feb. 1997/8) [also the name of] The Sixth International WWW Conference (WWW 97). Santa Clara, USA, April 7-11, 1997. (reference from Brin + Page, 1998) Also appearing in Computer Networks and ISDN Systems: The International Journal of Computer and Telecommunications Networking 29 (1997) 1205-1215.
  • Stanford research paper presenting Google, The Anatomy of a Large-Scale Hypertextual Web Search Engine, by Sergey Brin + Lawrence Page (paper html, 1998), (paper pdf, 1998)
  • research paper by Sergey Brin on Nearest Neighbours; (paper, 2005)


notes

none

gallery