User:Manetta/i-could-have-written-that/mining-software: Difference between revisions

From XPUB & Lens-Based wiki
| Automatic Monitoring for Cyberspace Applications (AMiCA) || mine blogs, chat rooms, social networking sites, tracing harmful content, contact, or conduct (cyber-bullying, pedophilia); detecting risks, and sending alerts to moderators; collecting accurate data to support providers, science and governments in decision-making processes with respect to child safety online || 01/01/2013 - 31/12/2016 || [http://www.amicaproject.be/ link]
|-
| language technology development for African languages || improving language software for minority languages, like translation engines and corpus development || 2006 - 2011 || [http://www.clips.ua.ac.be/projects/data-driven-techniques-in-african-language-technology link]
|-
| '''improving text-mining techniques''' || sentiment-analysis on suicide notes, to distinguish between fifteen emotion labels, from guilt, sorrow, and hopelessness to hopefulness and happiness; (emotion detection, suicide prevention) || 2012 || [http://www.clips.ua.ac.be/sites/default/files/f_bii-fine-grained-emotion-detection-in-suicide-notes-a-thresholding-approach_4099.pdf link]
|}
|}


http://www.cs.waikato.ac.nz/ml/Title-Bird-Header.gif


Weka is a data mining application written in Java, developed at the University of Waikato, New Zealand. [http://www.cs.waikato.ac.nz/ml/weka/ This is a link to Weka's project page.]


description of text mining (2003): ''It most commonly targets text whose function is the '''communication of factual information or opinions''', (...) “Text mining” (sometimes called “text data mining”; [4]) defies tight definition but encompasses a wide range of activities: text summarization; document retrieval; document clustering; text categorization; language identification; authorship ascription; identifying phrases, phrase structures, and key phrases; extracting “entities” such as names, dates, and abbreviations; locating acronyms and their definitions; filling predefined templates with extracted information; and even learning rules from such templates [8].'' [http://researchcommons.waikato.ac.nz/bitstream/handle/10289/1298/text%20mining%20in%20a%20digital%20library.pdf?sequence=1&isAllowed=y (Witten ed., 2003)]


{| class="wikitable" style="padding:20px;width:750px;"
|-
! (text) mining project !! application !! date !! link
|-
| Document Copy Detector || plagiarism detection || 2016 || [http://www.cs.waikato.ac.nz/~fjb11/publications/inffus15.pdf link]
|-
| large-scale continuous global optimisation || increase efficiency in search engines; || 2015 || [http://www.ncbi.nlm.nih.gov/pubmed/25950391 link]
|-
| opinion mining || tourism product reviews || 2014 || [http://www.cs.waikato.ac.nz/~fjb11/publications/ESWA2014.pdf link]
|-
| mining on the web || ranking order of web pages in search engines (p.21); search query patterns for advertisement profiling; customer recommendations to increase sales; user recommendations for films to make sure they come back to the website; ''And then there are social networks and other personal data;'' decision procedures at loan-companies through questionnaires, which motivates such companies when seeing their results increase: it 'works' (p.22); ''detect intrusion by recognizing unusual patterns of operation'' (p.28) || 2011 || [http://www.cs.waikato.ac.nz/ml/weka/book.html book]
|-
| marketing and sales || (''In these applications, predictions themselves are the chief interest: The structure of how decisions are made is often completely irrelevant.'') to 'woo' customers back by offering special treatments; product positioning in supermarkets after 'Market basket analysis', ''customers who buy beer also buy chips,''; personal discounts: ''Supermarkets want you to feel that although [prices are rising], they don’t increase so much for you because the bargains offered by personalized coupons make it attractive for you to stock up on things that you wouldn’t normally have bought.''; direct marketing, focused promotions; demographic information is correlated to product demands; (p.26-) || 2011 || [http://www.cs.waikato.ac.nz/ml/weka/book.html book]
|-
| (healthcare) || increasing success rate of artificial insemination || ? || ?
|-
| (image recognition) || detect oil slicks from satellite images to give early warning of ecological disasters and deter illegal dumping (p.23)  || 2011 || [http://www.cs.waikato.ac.nz/ml/weka/book.html book]
|-
| (electricity industry) || determine future demand for power as far in advance as possible (p.24) || 2011 || [http://www.cs.waikato.ac.nz/ml/weka/book.html book]
|-
| (technological diagnoses) || forestall failures that disrupt industrial processes (p.25) || 2011 || [http://www.cs.waikato.ac.nz/ml/weka/book.html book]
|-
| text mining in a digital library || enrich the library reader’s experience; ''a carefully chosen set of authoritative documents in a particular topic area is far more useful to those working in the area than a huge, unfocused collection (like the Web)'' || 2003 - 2004 || [http://researchcommons.waikato.ac.nz/bitstream/handle/10289/1298/text%20mining%20in%20a%20digital%20library.pdf?sequence=1&isAllowed=y link], [http://researchcommons.waikato.ac.nz/handle/10289/1298 link]
|-
| '''improving text mining techniques''' || (un)supervised creation of Twitter opinion/sentiment (POS/NEUT/NEG) corpus; towards solutions for text mining that are general, effective, and scalable; || 2015 || [http://ijcai.org/papers15/Papers/IJCAI15-177.pdf link], [http://www.cs.waikato.ac.nz/~ml/publications.html link]
|}




==notes==
   
  ''Automation is especially welcome in situations involving continuous monitoring,''
''a job that is time consuming and exceptionally tedious for humans.'' (Witten ed., 2011)
 
''Statistical tests are used to validate machine learning models ''
''and to evaluate machine learning algorithms.'' (Witten ed., 2011)
 
''If you do come up with conclusions (e.g., red car owners being greater credit risks),''
''you need to attach caveats to them and back them up with arguments other than''
''purely statistical ones. The point is that data mining is just a tool in the whole''
''process. It is people who take the results, along with other knowledge, and decide''
''what action to apply.'' (Witten ed., 2011)


==gallery==


==references==
* Witten, Frank, Hall 2011 - [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining - Practical Machine Learning Tools and Techniques, 3rd Edition]


</div>
</div>

Revision as of 13:03, 28 January 2016

mining software

CLiPS (Computational Linguistics & Psycholinguistics Research Center)

The research center at the University of Antwerp where Pattern comes from

text mining project application date link
automatic detection of crucial information in clinical reports 01/01/2016 - 31/12/2019 link
computational creativity TheRiddlerBot, a Twitter bot generating riddles about well-known characters 2015 link
automatic opinion detection (using commercial web-services) analyse media coverage on political issues; reporting on the 'sentiment' tone of news reports about the new Belgian parliament of 2011 (established after 541 days of negotiation) 01/02/2014 - 31/01/2015 link
stylometry authorship attribution, personality prediction, gender prediction 01/10/2014 - 30/09/2018 link
connecting knowledge from different domains investigating the learning process of language by very young children 01/01/2014 - 31/12/2017 link
Automatic Monitoring for Cyberspace Applications (AMiCA) mine blogs, chat rooms, social networking sites, tracing harmful content, contact, or conduct (cyber-bullying, pedophilia); detecting risks, and sending alerts to moderators; collecting accurate data to support providers, science and governments in decision-making processes with respect to child safety online 01/01/2013 - 31/12/2016 link
language technology development for African languages improving language software for minority languages, like translation engines and corpus development 2006 - 2011 link
improving text-mining techniques sentiment-analysis on suicide notes, to distinguish between fifteen emotion labels, from guilt, sorrow, and hopelessness to hopefulness and happiness; (emotion detection, suicide prevention) 2012 link

Weka 3

Title-Bird-Header.gif

Weka is a data mining application written in Java, developed at the University of Waikato, New Zealand. This is a link to Weka's project page.

description of text mining (2003): It most commonly targets text whose function is the communication of factual information or opinions, (...) “Text mining” (sometimes called “text data mining”; [4]) defies tight definition but encompasses a wide range of activities: text summarization; document retrieval; document clustering; text categorization; language identification; authorship ascription; identifying phrases, phrase structures, and key phrases; extracting “entities” such as names, dates, and abbreviations; locating acronyms and their definitions; filling predefined templates with extracted information; and even learning rules from such templates [8]. (Witten ed., 2003)


(text) mining project application date link
Document Copy Detector plagiarism detection 2016 link
large-scale continuous global optimisation increase efficiency in search engines; 2015 link
opinion mining tourism product reviews 2014 link
mining on the web ranking order of web pages in search engines (p.21); search query patterns for advertisement profiling; customer recommendations to increase sales; user recommendations for films to make sure they come back to the website; And then there are social networks and other personal data; decision procedures at loan-companies through questionnaires, which motivates such companies when seeing their results increase: it 'works' (p.22); detect intrusion by recognizing unusual patterns of operation (p.28) 2011 book
marketing and sales (In these applications, predictions themselves are the chief interest: The structure of how decisions are made is often completely irrelevant.) to 'woo' customers back by offering special treatments; product positioning in supermarkets after 'Market basket analysis', customers who buy beer also buy chips,; personal discounts: Supermarkets want you to feel that although [prices are rising], they don’t increase so much for you because the bargains offered by personalized coupons make it attractive for you to stock up on things that you wouldn’t normally have bought.; direct marketing, focused promotions; demographic information is correlated to product demands; (p.26-) 2011 book
(healthcare) increasing success rate of artificial insemination ? ?
(image recognition) detect oil slicks from satellite images to give early warning of ecological disasters and deter illegal dumping (p.23) 2011 book
(electricity industry) determine future demand for power as far in advance as possible (p.24) 2011 book
(technological diagnoses) forestall failures that disrupt industrial processes (p.25) 2011 book
text mining in a digital library enrich the library reader’s experience; a carefully chosen set of authoritative documents in a particular topic area is far more useful to those working in the area than a huge, unfocused collection (like the Web) 2003 - 2004 link, link
improving text mining techniques (un)supervised creation of Twitter opinion/sentiment (POS/NEUT/NEG) corpus; towards solutions for text mining that are general, effective, and scalable; 2015 link, link
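
A rough sketch of what the 'Market basket analysis' mentioned in the marketing and sales row could look like in its simplest form. This is not taken from Weka or from the book: the baskets are invented toy data, and the function name `pair_rules` and the `min_support` threshold are assumptions made for illustration. It counts how often two items occur together (support) and how often the second item appears given the first (confidence).

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions, invented for illustration only.
baskets = [
    {"beer", "chips"},
    {"beer", "chips", "bread"},
    {"beer", "bread"},
    {"chips", "milk"},
    {"bread", "milk"},
]


def pair_rules(baskets, min_support=0.2):
    """Return {(a, b): (support, confidence)} for rules 'a -> b' over item pairs."""
    n = len(baskets)
    # How many baskets contain each single item.
    item_counts = Counter(item for basket in baskets for item in basket)
    # How many baskets contain each unordered pair of items.
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to say anything about
        # Confidence of a -> b: share of baskets with a that also contain b.
        rules[(a, b)] = (support, count / item_counts[a])
        rules[(b, a)] = (support, count / item_counts[b])
    return rules


rules = pair_rules(baskets)
support, confidence = rules[("beer", "chips")]
print(f"beer -> chips: support={support:.2f}, confidence={confidence:.2f}")
```

With this toy data, two of the five baskets contain both beer and chips, and two of the three beer baskets also contain chips. Real tools use more efficient algorithms over many more items; the sketch only shows the counting behind a statement like "customers who buy beer also buy chips".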


notes

Automation is especially welcome in situations involving continuous monitoring, 
a job that is time consuming and exceptionally tedious for humans. (Witten ed., 2011)
Statistical tests are used to validate machine learning models 
and to evaluate machine learning algorithms. (Witten ed., 2011)
If you do come up with conclusions (e.g., red car owners being greater credit risks),
you need to attach caveats to them and back them up with arguments other than
purely statistical ones. The point is that data mining is just a tool in the whole
process. It is people who take the results, along with other knowledge, and decide
what action to apply. (Witten ed., 2011)

gallery

references