User:Manetta/i-could-have-written-that/parsing

From XPUB & Lens-Based wiki

Revision as of 16:43, 18 January 2016

parsing

text is unstructured, amorphous, and difficult to deal with. (...) The motivation for trying to extract information from it is compelling—even if success is only partial. (Witten 2011, p.386)

Weka 3

Pattern

notes

The computer scientists view of textual content as "unstructured", be it in a webpage or the pages of a scanned text, underscore / reflect the negligence to the processes and labor of writing, editing, design, layout, typesetting, and eventually publishing, collecting and cataloging (Murtaugh 2016, A_bag_but_is_language_nothing_of_words: http://www.mondotheque.be/wiki/index.php/A_bag_but_is_language_nothing_of_words)


The superficial similarity between text and data mining conceals real differences. In the Preface (page xxi), we characterized data mining as the extraction of implicit, previously unknown, and potentially useful information from data. With text mining, however, the information to be extracted is clearly and explicitly stated in the text. It is not hidden at all—most authors go to great pains to make sure that they express themselves clearly and unambiguously. From a human point of view, the only sense in which it is “previously unknown” is that time restrictions make it infeasible for people to read the text themselves. (Witten 2011, p. 386)

gallery