User:Manetta/i-could-have-written-that/parsing

From XPUB & Lens-Based wiki
<div style="width:100%;max-width:800px;">
__TOC__
=parsing=
''text is unstructured, amorphous, and difficult to deal with. (...) The motivation for trying to extract information from it is compelling—even if success is only partial.'' (Witten 2011, p.386)


''In other words, by "unstructured" it is meant: unstructured in relation to the machine -- that is, not explicitly structured in a format directly amenable to use by automated means.'' (Murtaugh 2016, [http://www.mondotheque.be/wiki/index.php/A_bag_but_is_language_nothing_of_words A bag but is language nothing of words])
==Weka 3==
==Pattern==
==notes==
''The computer scientists view of textual content as "unstructured", be it in a webpage or the pages of a scanned text, underscore / reflect the negligence to the processes and labor of writing, editing, design, layout, typesetting, and eventually publishing, collecting and cataloging'' (Murtaugh 2016, [http://www.mondotheque.be/wiki/index.php/A_bag_but_is_language_nothing_of_words A bag but is language nothing of words])
------------------------------------
''The superficial similarity between text and data mining conceals real differences. In the Preface (page xxi), we characterized data mining as the extraction of implicit, previously unknown, and potentially useful information from data. With text mining, however, the information to be extracted is clearly and explicitly stated in the text. It is not hidden at all—most authors go to great pains to make sure that they express themselves clearly and unambiguously. From a human point of view, the only sense in which it is “previously unknown” is that time restrictions make it infeasible for people to read the text themselves.'' (Witten 2011, p. 386)
==gallery==
</div>

Latest revision as of 16:43, 19 January 2016