User:Manetta/i-could-have-written-that/kdd-applications

From XPUB & Lens-Based wiki
Latest revision as of 14:41, 28 February 2016

knowledge discovery in data applications

  • supermarket customer behavior prediction, increasing sales & profit (Witten 2011)
  • artificial insemination, increasing the chance of pregnancy (Witten 2011)
  • policy wind tunnels, efficiency & policy optimization --- (NRC 15-01-2016)
  • vacancy websites, CV parsing & personalised recommendations
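One technique behind supermarket customer behavior prediction is market basket analysis, which the Witten textbook illustrates with the observation that customers who buy beer also buy chips. The Python sketch below counts support (how often items occur together in a basket) and confidence (how often the second item appears when the first one does) for single-item rules. The baskets and the thresholds `min_support` and `min_confidence` are invented for illustration; real systems use algorithms such as Apriori to handle larger itemsets efficiently.

```python
from itertools import combinations

# Toy shopping baskets, invented for this example.
transactions = [
    {"beer", "chips", "salsa"},
    {"beer", "chips"},
    {"beer", "bread"},
    {"chips", "salsa"},
    {"beer", "chips", "bread"},
]


def count(itemset, transactions):
    """Number of baskets that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for basket in transactions if items <= basket)


def support(itemset, transactions):
    """Fraction of all baskets that contain the whole itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of baskets with the antecedent that also hold the consequent."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)


def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """List single-item rules x -> y that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for x, y in ((a, b), (b, a)):
            s = support({x, y}, transactions)
            if s >= min_support:
                c = confidence({x}, {y}, transactions)
                if c >= min_confidence:
                    rules.append((x, y, s, c))
    return rules


for x, y, s, c in pair_rules(transactions):
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
```

On these toy baskets the rule beer -> chips comes out with support 0.60 and confidence 0.75: three of five baskets hold both, and three of the four beer baskets also hold chips.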



WCC, Smart Search & Match

www.wcc-group.com

WCC is the world’s leading developer of advanced search and match technology. Our software enables large organizations to make optimal business decisions through better use of available data sources. - WCC about page

text mining application focus: optimizing business decision-making processes


text mining project | application | date | link
... | ... | ... | ...
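WCC does not describe its matching method here. As a hypothetical illustration of "search and match" applied to vacancy websites and CV parsing, the sketch ranks vacancy texts against a CV by TF-IDF weighted cosine similarity, a common textbook baseline. The function names `tfidf_vectors` and `rank_vacancies` and the example texts are assumptions; this is not WCC's technology.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Weight each term by raw count times log(N / document frequency)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    return [
        {w: c * math.log(n / df[w]) for w, c in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_vacancies(cv, vacancies):
    """Return (score, vacancy index) pairs, best match first."""
    vectors = tfidf_vectors([cv] + vacancies)
    cv_vec = vectors[0]
    scores = [(cosine(cv_vec, vec), i) for i, vec in enumerate(vectors[1:])]
    return sorted(scores, reverse=True)


cv = "python developer with machine learning experience"
vacancies = [
    "senior python developer machine learning",
    "truck driver with license",
    "accountant with excel experience",
]
for score, i in rank_vacancies(cv, vacancies):
    print(f"{score:.3f}  {vacancies[i]}")
```

Document frequencies are computed over the CV and the vacancies together, so words shared by every text get zero weight and do not influence the ranking.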


CLiPS (Computational Linguistics & Psycholinguistics Research Center)

CLiPS project website, CLiPS research mission
CLiPS page at the University of Antwerp

the research center of the University of Antwerp where Pattern, a web mining module for Python, was developed

text mining application focus: cybersecurity, sentiment analysis and authorship detection, text understanding, language development research with young children


text mining project | application | date | link
automatic detection of crucial information | in clinical reports | 01/01/2016 - 31/12/2019 | link
computational creativity | TheRiddlerBot, a Twitter bot generating riddles about well-known characters | 2015 | link
automatic opinion detection (using commercial web services) | analysing media coverage of political issues --- reporting on the 'sentiment' tone of news reports about the new Belgian parliament of 2011 (established after 541 days of negotiation) | 01/02/2014 - 31/01/2015 | link
stylometry | authorship attribution, personality prediction, gender prediction | 01/10/2014 - 30/09/2018 | link
connecting knowledge from different domains | investigating how very young children learn language | 01/01/2014 - 31/12/2017 | link
Automatic Monitoring for Cyberspace Applications (AMiCA) | mining blogs, chat rooms and social networking sites to trace harmful content, contact or conduct (cyberbullying, pedophilia) --- detecting risks and sending alerts to moderators --- collecting accurate data to support providers, science and governments in decision-making processes with respect to child safety online | 01/01/2013 - 31/12/2016 | link
language technology development for African languages | improving language software for minority languages, such as translation engines and corpus development | 2006 - 2011 | link
improving text mining techniques | sentiment analysis on suicide notes, distinguishing between fifteen emotion labels, from guilt, sorrow and hopelessness to hopefulness and happiness --- (emotion detection, suicide prevention) | 2012 | link
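Stylometry, as in the CLiPS authorship attribution project, commonly compares how often authors use frequent function words. A minimal sketch, assuming a tiny hand-picked word list (`FUNCTION_WORDS`, hypothetical) and plain Manhattan distance between relative-frequency profiles. Established methods such as Burrows's Delta standardize the frequencies with z-scores first, and nothing here reflects CLiPS's actual pipeline.

```python
import re
from collections import Counter

# A small, hypothetical set of function words; real studies use longer lists.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "is", "i"]


def profile(text):
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def manhattan(p, q):
    """Sum of absolute differences between two profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))


def attribute(unknown, samples):
    """Pick the author whose sample text has the closest profile."""
    target = profile(unknown)
    return min(samples, key=lambda author: manhattan(target, profile(samples[author])))


samples = {
    "A": "the cat and the dog and the bird of the house",
    "B": "i think it is that i know it is",
}
print(attribute("the fish and the frog of the pond", samples))
```

Function words are used because they are largely independent of topic, so two texts about different subjects by the same author can still have similar profiles.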


Weka 3


Weka is a data mining application written in Java, developed at the University of Waikato, New Zealand (see Weka's project page).

description of text mining (2003): It most commonly targets text whose function is the communication of factual information or opinions, (...) “Text mining” (sometimes called “text data mining” [4]) defies tight definition but encompasses a wide range of activities: text summarization --- document retrieval --- document clustering --- text categorization --- language identification --- authorship ascription --- identifying phrases, phrase structures, and key phrases --- extracting “entities” such as names, dates, and abbreviations --- locating acronyms and their definitions --- filling predefined templates with extracted information --- and even learning rules from such templates [8]. (Witten ed., 2003)

text mining application focus: technical improvements to text mining techniques; other applications are only described as examples in the Weka textbook (Witten ed., 2011).
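Text categorization, one of the activities in the 2003 description, is often introduced with a naive Bayes classifier, a learner Weka also provides. A minimal pure-Python sketch, not Weka's implementation: the class name `NaiveBayes`, the toy spam/ham documents and the choice to skip words never seen in training are all assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # log prior plus the log likelihood of each known word
            score = math.log(n / n_docs)
            for w in tokenize(doc):
                if w not in self.vocab:
                    continue  # words never seen in training are ignored
                score += math.log(
                    (self.word_counts[label][w] + 1) / (self.total_words[label] + v)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


docs = ["win money now", "cheap money offer", "meeting agenda attached", "project meeting tomorrow"]
labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayes().fit(docs, labels)
print(model.predict("money offer now"), model.predict("agenda for the meeting"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the add-one smoothing keeps a single unseen class/word combination from zeroing out a class.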


(text) mining project | application | date | link
Document Copy Detector | plagiarism detection | 2016 | link
large-scale continuous global optimisation | increase efficiency in search engines | 2015 | link
opinion mining | tourism product reviews | 2014 | link
mining on the web | ranking order of webpages in search engines (p.21) --- search query patterns for advertisement profiling --- customer recommendations to increase sales --- user recommendations for films to make sure users come back to the website --- and then there are social networks and other personal data --- decision procedures at loan companies through questionnaires, which encourage such companies when they see their results improve: it 'works' (p.22) --- detect intrusion by recognizing unusual patterns of operation (p.28) | 2011 | book
marketing and sales | (In these applications, predictions themselves are the chief interest: The structure of how decisions are made is often completely irrelevant.) to 'woo' customers back by offering special treatments --- product positioning in supermarkets after 'market basket analysis': customers who buy beer also buy chips --- personal discounts: Supermarkets want you to feel that although [prices are rising], they don’t increase so much for you because the bargains offered by personalized coupons make it attractive for you to stock up on things that you wouldn’t normally have bought. --- direct marketing, focused promotions --- demographic information correlated with product demand --- (p.26-) | 2011 | book
(healthcare) | increasing the success rate of artificial insemination | ? | ?
(image recognition) | detect oil slicks in satellite images to give early warning of ecological disasters and deter illegal dumping (p.23) | 2011 | book
(electricity industry) | determine future demand for power as far in advance as possible (p.24) | 2011 | book
(technological diagnoses) | forestall failures that disrupt industrial processes (p.25) | 2011 | book
text mining in a digital library | enrich the library reader’s experience --- a carefully chosen set of authoritative documents in a particular topic area is far more useful to those working in the area than a huge, unfocused collection (like the Web) | 2003 - 2004 | link, link
improving text mining techniques | (un)supervised creation of a Twitter opinion/sentiment (POS/NEUT/NEG) corpus --- towards solutions for text mining that are general, effective, and scalable | 2015 | link, link
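One well-known way to build a Twitter sentiment corpus without manual annotation is distant supervision: tweets are labelled by the emoticons they contain. Whether the projects listed here used this exact heuristic is not stated; the sketch assumes small hypothetical emoticon sets (`POSITIVE`, `NEGATIVE`) and only produces POS and NEG labels, since a NEUT class needs a separate heuristic.

```python
# Hypothetical emoticon lists; real projects use larger sets.
POSITIVE = {":)", ":-)", ":D", "=)"}
NEGATIVE = {":(", ":-(", ":'("}


def distant_label(tweet):
    """Label a tweet POS or NEG from its emoticons, or return None.

    Tweets with no emoticons, or with both kinds, are left out. The
    emoticons are removed so a classifier trained on the corpus
    cannot simply learn them back.
    """
    tokens = tweet.split()
    has_pos = any(t in POSITIVE for t in tokens)
    has_neg = any(t in NEGATIVE for t in tokens)
    if has_pos == has_neg:
        return None
    text = " ".join(t for t in tokens if t not in POSITIVE and t not in NEGATIVE)
    return text, ("POS" if has_pos else "NEG")


def build_corpus(tweets):
    """Keep only the tweets that received a label."""
    return [pair for pair in map(distant_label, tweets) if pair is not None]


tweets = ["great show tonight :)", "missed the bus :(", "just a tweet", "ok :) but :("]
print(build_corpus(tweets))
```

The labels are noisy (sarcasm, emoticons used ironically), which is the price paid for getting a large corpus cheaply.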


World Well Being Project (WWBP)

www.wwbp.org

The WWBP is a research project developed at the University of Pennsylvania's Positive Psychology Center. The project works on techniques for measuring psychological well-being and physical health based on the analysis of language in social media. The aim of the project is described as: advancing understanding of human flourishing using language analysis. - WWBP research project website

text mining application focus: correlations between written language on social media (the possible becomes the desirable) and psychological profiles, personal characteristics such as age, gender or income, and effects on the writer's health and well-being


(text) mining project | application | date | link
(analytics) well-being prediction | measuring life satisfaction ('the good life') from tweets and FB messages by rating PERMA (Positive Emotions, Engagement, Relationships, Meaning, and Accomplishment) | 2016 | link
(analytics) user attribute stylistic differences | correlation of linguistic Twitter style with gender, age and occupational class --- income through tweets --- occupational class of users through tweets --- the mechanics of human achievement through talent and effort | 2015 - 2016 | link, link, link, link
(analytics) health prediction | linguistic style on Twitter as a marker of cardiovascular mortality at the community level --- subjective well-being, health, gender differences & personality --- personality, age, gender & mental illness | 2015 | link, link, link
(analytics) psychological profiles | temporal orientation (speaking about past/present/future) via Facebook, correlated with conscientiousness, age and gender, to explore social scientific questions in association with openness to experience, satisfaction with life, depression, IQ and one's number of friends --- personality predictions trained on 66,000+ FB users, using questionnaires, social facts (such as number of friends or political attitudes) and friends' ratings to develop a quick and cheap assessment of people's personalities --- detecting changes in degree of depression through FB --- correlating Big Five personality categories with Facebook users' text, giving new insight into the Big Five test | 2015 | link, link, link, link
(analytics) religion, understanding the indescribable | understanding mystical experiences | 2015 | link
improving text mining techniques | unsupervised sentiment corpora from Twitter, avoiding ambiguity --- using MTers to predict user attributes from tweets to improve MT workflows | 2015 | link, link
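Much of this kind of analysis rests on scoring a user's language against weighted word lists (lexica). A minimal sketch of that general form, assuming a toy lexicon with invented weights: a user's score is an intercept plus the sum, over lexicon words, of each word's weight times its relative frequency in that user's messages. Real lexica are typically learned from large labelled datasets and contain many thousands of terms; nothing here reproduces WWBP's models.

```python
import re
from collections import Counter

# Invented weights for illustration; these are NOT published WWBP values.
LEXICON = {"grateful": 1.5, "friends": 0.8, "tired": -0.7, "alone": -1.2}
INTERCEPT = 0.0


def lexicon_score(messages, lexicon=LEXICON, intercept=INTERCEPT):
    """Weighted-lexicon score over all of one user's messages.

    score = intercept + sum over lexicon words of weight * (count / total tokens)
    """
    tokens = [t for m in messages for t in re.findall(r"[a-z']+", m.lower())]
    if not tokens:
        return intercept
    counts = Counter(tokens)
    total = len(tokens)
    return intercept + sum(weight * counts[w] / total for w, weight in lexicon.items())


print(round(lexicon_score(["so grateful for my friends", "tired today"]), 3))
```

Dividing by the total token count makes prolific and quiet users comparable; without it, users who post more would get more extreme scores regardless of what they write.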

notes

A requirement common to both data and text mining is that the information extracted should be 
potentially useful. In one sense, this means actionable—capable of providing a basis for 
actions to be taken automatically. (Witten 2011, p. 386)
Automation is especially welcome in situations involving continuous monitoring, 
a job that is time consuming and exceptionally tedious for humans. (Witten ed., 2011)
Statistical tests are used to validate machine learning models 
and to evaluate machine learning algorithms. (Witten ed., 2011)
If you do come up with conclusions (e.g., red car owners being greater credit risks),
you need to attach caveats to them and back them up with arguments other than
purely statistical ones. The point is that data mining is just a tool in the whole
process. It is people who take the results, along with other knowledge, and decide
what action to apply. (Witten ed., 2011)
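The note on statistical validation can be made concrete with k-fold cross-validation, one standard procedure for estimating how well a learned model generalizes to unseen data. A minimal sketch, assuming a trivial majority-class learner (`train_majority`, hypothetical) and contiguous, unshuffled folds; the spread of fold accuracies is reported alongside the mean.

```python
import statistics
from collections import Counter


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_majority(xs, ys):
    """Baseline learner: always predict the most frequent training label."""
    label = Counter(ys).most_common(1)[0][0]
    return lambda _x: label


def cross_validate(xs, ys, k, train):
    """Train on k-1 folds, test on the held-out fold, repeat k times."""
    accuracies = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_xs = [x for i, x in enumerate(xs) if i not in held_out]
        train_ys = [y for i, y in enumerate(ys) if i not in held_out]
        predict = train(train_xs, train_ys)
        correct = sum(predict(xs[i]) == ys[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies, statistics.mean(accuracies), statistics.stdev(accuracies)


xs = list(range(10))
ys = ["a"] * 7 + ["b"] * 3
accs, mean, sd = cross_validate(xs, ys, k=5, train=train_majority)
print(accs, round(mean, 2), round(sd, 2))
```

Because the toy labels are sorted, the last folds contain only the minority class and score badly; in practice the data are shuffled (and often stratified by class) before splitting, which is exactly the kind of caveat the quoted notes ask for.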


references