|
|
| =mining software=
|
| |
|
| ==CLiPS (Computational Linguistics & Psycholinguistics Research Center)==
| |
|
| |
|
| The research center at the University of Antwerp where Pattern comes from.
| |
|
| |
| {| class="wikitable" style="padding:20px;width:750px;"
| |
| |-
| |
| ! text mining project !! application !! date !! link
| |
| |-
| |
| | automatic detection of crucial information || in clinical reports || 01/01/2016 - 31/12/2019 || [http://www.clips.ua.ac.be/projects/accumulate-acquiring-crucial-medical-information-using-language-technology link]
| |
| |-
| |
| | computational creativity || TheRiddlerBot, a Twitter bot that generates riddles about well-known characters || 2015 || [http://www.clips.ua.ac.be/sites/default/files/the_riddler_bot_a_next_step_on_the_ladder_towards_creative_twitter_bots.pdf link]
| |
| |-
| |
| | automatic opinion detection (using commercial web services) || analysing media coverage of political issues; reporting on the 'sentiment' tone of news reports about the new Belgian government of 2011 (formed after 541 days of negotiation) || 01/02/2014 - 31/01/2015 || [http://www.clips.ua.ac.be/projects/text-analytics-web-services-for-profiling-and-opinion-mining link]
| |
| |-
| |
| | stylometry || authorship attribution, personality prediction, gender prediction || 01/10/2014 - 30/09/2018 || [http://www.clips.ua.ac.be/projects/deep-linguistic-features-for-computational-stylometry link]
| |
| |-
| |
| | connecting knowledge from different domains || investigating how very young children learn language || 01/01/2014 - 31/12/2017 || [http://www.clips.ua.ac.be/projects/bootstrapping-operations-in-language-acquisition-a-computational-psycholinguistic-approach link]
| |
| |-
| |
| | Automatic Monitoring for Cyberspace Applications (AMiCA) || mining blogs, chat rooms, and social networking sites to trace harmful content, contact, or conduct (cyber-bullying, pedophilia); detecting risks and sending alerts to moderators; collecting accurate data to support providers, science, and governments in decision-making processes with respect to child safety online || 01/01/2013 - 31/12/2016 || [http://www.amicaproject.be/ link]
| |
| |-
| |
| | language technology development for African languages || improving language software for minority languages, such as translation engines and corpus development || 2006 - 2011 || [http://www.clips.ua.ac.be/projects/data-driven-techniques-in-african-language-technology link]
| |
| |-
| |
| | '''improving text-mining techniques''' || sentiment analysis of suicide notes, distinguishing between fifteen emotion labels, from guilt, sorrow, and hopelessness to hopefulness and happiness (emotion detection, suicide prevention) || 2012 || [http://www.clips.ua.ac.be/sites/default/files/f_bii-fine-grained-emotion-detection-in-suicide-notes-a-thresholding-approach_4099.pdf link]
| |
| |}
| |
|
| |
| ==Weka 3==
| |
| http://www.cs.waikato.ac.nz/ml/Title-Bird-Header.gif
| |
|
| |
| Weka is a data mining application written in Java, developed at the University of Waikato, New Zealand. [http://www.cs.waikato.ac.nz/ml/weka/ This is a link to Weka's project page.]
| |
|
| |
| Description of text mining (2003): ''It most commonly targets text whose function is the '''communication of factual information or opinions''', (...) “Text mining” (sometimes called “text data mining”; [4]) defies tight definition but encompasses a wide range of activities: text summarization; document retrieval; document clustering; text categorization; language identification; authorship ascription; identifying phrases, phrase structures, and key phrases; extracting “entities” such as names, dates, and abbreviations; locating acronyms and their definitions; filling predefined templates with extracted information; and even learning rules from such templates [8].'' [http://researchcommons.waikato.ac.nz/bitstream/handle/10289/1298/text%20mining%20in%20a%20digital%20library.pdf?sequence=1&isAllowed=y (Witten ed., 2003)]
| |
|
| |
|
| |
| {| class="wikitable" style="padding:20px;width:750px;"
| |
| |-
| |
| ! (text) mining project !! application !! date !! link
| |
| |-
| |
| | Document Copy Detector || plagiarism detection || 2016 || [http://www.cs.waikato.ac.nz/~fjb11/publications/inffus15.pdf link]
| |
| |-
| |
| | large-scale continuous global optimisation || increasing efficiency in search engines || 2015 || [http://www.ncbi.nlm.nih.gov/pubmed/25950391 link]
| |
| |-
| |
| | opinion mining || tourism product reviews || 2014 || [http://www.cs.waikato.ac.nz/~fjb11/publications/ESWA2014.pdf link]
| |
| |-
| |
| | mining on the web || ranking order of web pages in search engines (p.21); search query patterns for advertisement profiling; customer recommendations to increase sales; film recommendations to make sure users come back to the website; ''And then there are social networks and other personal data;'' decision procedures at loan companies based on questionnaires, which such companies keep using once they see their results improve: it 'works' (p.22); ''detect intrusion by recognizing unusual patterns of operation'' (p.28) || 2011 || [http://www.cs.waikato.ac.nz/ml/weka/book.html book]
| |
| |-
| |
| | marketing and sales || (''In these applications, predictions themselves are the chief interest: The structure of how decisions are made is often completely irrelevant.'') 'wooing' customers back by offering special treatment; product positioning in supermarkets after 'market basket analysis', ''customers who buy beer also buy chips''; personal discounts: ''Supermarkets want you to feel that although [prices are rising], they don’t increase so much for you because the bargains offered by personalized coupons make it attractive for you to stock up on things that you wouldn’t normally have bought.''; direct marketing, focused promotions; demographic information is correlated with product demand (p.26-) || 2011 || [http://www.cs.waikato.ac.nz/ml/weka/book.html book]
| |
| |-
| |
| | (healthcare) || increasing the success rate of artificial insemination || ? || ?
| |
| |-
| |
| | (image recognition) || detecting oil slicks in satellite images to give early warning of ecological disasters and deter illegal dumping (p.23) || 2011 || [http://www.cs.waikato.ac.nz/ml/weka/book.html book]
| |
| |-
| |
| | (electricity industry) || determine future demand for power as far in advance as possible (p.24) || 2011 || [http://www.cs.waikato.ac.nz/ml/weka/book.html book]
| |
| |-
| |
| | (technological diagnoses) || forestall failures that disrupt industrial processes (p.25) || 2011 || [http://www.cs.waikato.ac.nz/ml/weka/book.html book]
| |
| |-
| |
| | text mining in a digital library || enrich the library reader’s experience; ''a carefully chosen set of authoritative documents in a particular topic area is far more useful to those working in the area than a huge, unfocused collection (like the Web)'' || 2003 - 2004 || [http://researchcommons.waikato.ac.nz/bitstream/handle/10289/1298/text%20mining%20in%20a%20digital%20library.pdf?sequence=1&isAllowed=y link], [http://researchcommons.waikato.ac.nz/handle/10289/1298 link]
| |
| |-
| |
| | '''improving text mining techniques''' || (un)supervised creation of a Twitter opinion/sentiment (POS/NEUT/NEG) corpus; working towards text mining solutions that are general, effective, and scalable || 2015 || [http://ijcai.org/papers15/Papers/IJCAI15-177.pdf link], [http://www.cs.waikato.ac.nz/~ml/publications.html link]
| |
|
| |
| |}
| |
|
| |
|
| |
| ==notes==
| |
| ''Automation is especially welcome in situations involving continuous monitoring,''
| |
| ''a job that is time consuming and exceptionally tedious for humans.'' (Witten ed., 2011)
| |
|
| |
| ''Statistical tests are used to validate machine learning models ''
| |
| ''and to evaluate machine learning algorithms.'' (Witten ed., 2011)
| |
|
| |
| ''If you do come up with conclusions (e.g., red car owners being greater credit risks),''
| |
| ''you need to attach caveats to them and back them up with arguments other than''
| |
| ''purely statistical ones. The point is that data mining is just a tool in the whole''
| |
| ''process. It is people who take the results, along with other knowledge, and decide''
| |
| ''what action to apply.'' (Witten ed., 2011)
| |
|
| |
| ==gallery==
| |
|
| |
| ==references==
| |
| * Witten, Frank, Hall (2011). [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition]
| |
|
| |
|
| </div>