7. Data Mining
All Past Paper Questions: https://docs.google.com/document/d/1xs8OARJz5f_FvsmwGeYaKqG2dCGurXeLRbaVD_ZEGU4/edit?usp=sharing
Meaning
- process of analysing large quantities of data
- to show trends
- to show patterns
Stages
They are:
- business understanding
- data understanding
- data preparation
- data modelling
- evaluation
Business Understanding
- description
- what is required?
- is it worth the cost/risk?
- success criteria
- tasks (described)
- identify business goals and impact
- assessing the situation
- problem to be solved by data mining
- producing a project plan
Data Understanding
- description
- identifies data sets to be used
- ensure enough data is available
- ensure we have enough time and resources
- create a report about data sets to be used
Data Preparation
- description
- selects data to use
- according to relevance (of requirements)
- "cleaning"
- remove redundant + irrelevant data
- create combinations, by merging data with common characteristics
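The preparation steps above (select, clean, merge) can be sketched in Python. This is a minimal illustration only, assuming two hypothetical data sets (`sales` and `customers`) held as lists of dictionaries; every field name and value is invented for the example.

```python
# Hypothetical raw data: field names and values are invented for illustration.
sales = [
    {"customer_id": 1, "item": "tea", "amount": 4.50, "till_number": 3},
    {"customer_id": 1, "item": "tea", "amount": 4.50, "till_number": 3},  # duplicate row
    {"customer_id": 2, "item": "coffee", "amount": 6.00, "till_number": 1},
    {"customer_id": 3, "item": "milk", "amount": None, "till_number": 2},  # missing amount
]
customers = [
    {"customer_id": 1, "age_group": "18-25"},
    {"customer_id": 2, "age_group": "26-40"},
]

RELEVANT_FIELDS = ("customer_id", "item", "amount")  # assumed requirements


def prepare(sales_rows, customer_rows):
    """Select relevant fields, clean the rows, then merge on the common key."""
    # Select: keep only the fields relevant to the requirements.
    selected = [{f: row[f] for f in RELEVANT_FIELDS} for row in sales_rows]

    # Clean: drop rows with missing values and exact duplicates (redundant data).
    cleaned, seen = [], set()
    for row in selected:
        key = tuple(row[f] for f in RELEVANT_FIELDS)
        if None in key or key in seen:
            continue
        seen.add(key)
        cleaned.append(row)

    # Merge: combine with customer data through the shared customer_id.
    by_id = {c["customer_id"]: c for c in customer_rows}
    return [{**row, **by_id[row["customer_id"]]}
            for row in cleaned if row["customer_id"] in by_id]


prepared = prepare(sales, customers)
```

In real data an identical row may be a genuine repeat purchase, so a transaction ID would normally be needed before treating it as redundant.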
Data Modelling
- description
- carry out the mining to discover relationships
- create new relationships to analyze patterns
- document the mining
- allows repetition
- and evaluation of results
- tasks (described)
- gather data required for the process
- Documenting/describing the data, eg:
- location of source
- how it was acquired
- Listing the source of data
- Populate the analysis tool
- Reviewing data to check
- completeness
- anomalies
- outliers
- Visually checking the data for patterns
- Verifying the quality of the data gathered
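The review for completeness and outliers can be sketched as below. This is an illustration only, assuming a hypothetical list of readings where `None` marks a missing value, and using the common 1.5 x IQR rule as one possible definition of an outlier (these notes do not name a specific method).

```python
import statistics

# Hypothetical readings; None marks a missing value.
readings = [12.0, 11.5, None, 12.4, 11.9, 48.0, 12.1, None, 11.8, 12.3]


def review(values):
    """Report completeness and flag outliers with the 1.5 x IQR rule."""
    present = [v for v in values if v is not None]
    completeness = len(present) / len(values)

    # Quartiles of the values that are present.
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    outliers = [v for v in present if v < low or v > high]
    return completeness, outliers


completeness, outliers = review(readings)
print(f"complete: {completeness:.0%}, outliers: {outliers}")
```

A flagged value is only a candidate for investigation; whether it is an error or a genuine extreme still has to be judged by someone who knows the data.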
Evaluation
- description
- results of the process are checked against the success criteria
- if it is met
- results shown to assess
- else
- model is amended,
- and re-run
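The check, amend and re-run cycle can be sketched as a loop. This is a toy illustration, not a real mining model: the "model" is a hypothetical spending threshold that predicts whether a customer responds to an offer, the labelled data are invented, and the success criterion of 80% accuracy is an assumption.

```python
# Invented labelled data: (monthly spend, responded to offer?)
history = [(10, False), (25, False), (40, False), (55, True),
           (60, True), (75, True), (35, False), (90, True)]

SUCCESS_CRITERION = 0.8  # assumed: at least 80% of predictions correct


def accuracy(threshold, rows):
    """Share of rows where 'spend >= threshold' matches the real outcome."""
    correct = sum((spend >= threshold) == responded for spend, responded in rows)
    return correct / len(rows)


def evaluate_and_amend(rows, candidate_thresholds):
    """Run the model, check the success criterion, amend and re-run if not met."""
    for threshold in candidate_thresholds:
        score = accuracy(threshold, rows)
        if score >= SUCCESS_CRITERION:
            return threshold, score  # criterion met: results can be presented
        # criterion not met: amend the model (next threshold) and re-run
    return None, None  # no candidate met the criterion


threshold, score = evaluate_and_amend(history, [20, 30, 50])
```

Here the first two candidate thresholds fall short of the criterion, so the model is amended twice before the third run is accepted.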
?? Deployment ??
- tasks (described)
- Plan how data mining results will be used
- Creating a plan to maintain the model
- ensure it remains valid
- apply the process to new data to generate the required predictions/trends
- Reporting the final results of the data mining process
- to check for errors
- and how to correct them
Uses
by Businesses
- advantages
- analyses large amounts of data from many sources
- to discover trends
- (that are not immediately obvious)
- Retail businesses
- analyse historical data
- to predict who may respond to targeted advertising
- to increase sales
- who may respond to which advertising techniques
- using social media
- emails
- direct marketing
- discounts
- vouchers
- who may buy related goods
- Financial institutions
- analyse customer data
- to detect fraudulent credit card transactions
- to protect user
- determine who is interested in loans
- Manufacturers
- analyse production data
- to detect faulty equipment
- + determine optimal control parameters
- to increase quality
- + reduce errors
- disadvantages
- privacy issues of extracting data from internet
- so people reluctant to allow their data to be used
- businesses cannot rely on the results of data mining
- to make informed decisions
- may lose control of their customer data
- will be responsible for data breaches
- data might not be anonymised
- data might be traced back
- so people don't like to provide personal info
- legal liability if data leaked/lost
- Data can be misused
- to take advantage of vulnerable people
by Retail Businesses
- how
- Divide customers into groups according to purchasing habits
- e.g. recency/frequency/monetary (RFM) groups
- different groups are targeted by different marketing campaigns
- recent buyers sent money-off coupons with time limit
- frequent customers sent coupons off regular purchases
- big spenders dealt with differently from those who spend little at a time
- help decide when to put items on sale
- Can target customers
- from purchasing habits
- from loyalty card schemes
- decide whether advertising campaigns worked
- decide which items sold well (to different demographics)
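The RFM grouping described above can be sketched as follows. This is a minimal illustration under stated assumptions: the purchase records, the cut-off values (30 days, 3 purchases, 200 in total spend) and the group names are all invented for the example, and real schemes often score each dimension on a scale rather than with a single cut-off.

```python
from datetime import date

# Invented purchase history: (customer, purchase date, amount spent)
purchases = [
    ("ana", date(2024, 5, 28), 20.0),
    ("ana", date(2024, 5, 10), 15.0),
    ("ana", date(2024, 4, 2), 30.0),
    ("ben", date(2024, 1, 15), 250.0),
    ("cal", date(2024, 5, 30), 12.0),
]
TODAY = date(2024, 6, 1)  # fixed so the example is reproducible


def rfm_groups(rows, today):
    """Summarise recency, frequency and monetary value, then label each customer."""
    summary = {}
    for customer, when, amount in rows:
        s = summary.setdefault(customer, {"last": when, "count": 0, "total": 0.0})
        s["last"] = max(s["last"], when)
        s["count"] += 1
        s["total"] += amount

    groups = {}
    for customer, s in summary.items():
        labels = []
        if (today - s["last"]).days <= 30:   # assumed cut-off for "recent"
            labels.append("recent")
        if s["count"] >= 3:                  # assumed cut-off for "frequent"
            labels.append("frequent")
        if s["total"] >= 200:                # assumed cut-off for "big spender"
            labels.append("big spender")
        groups[customer] = labels or ["lapsed"]  # "lapsed" is a hypothetical fallback label
    return groups


groups = rfm_groups(purchases, TODAY)
```

Each group could then be matched to a campaign, for example time-limited money-off coupons for the `recent` group.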
by Health Care Organizations
- why
- Identify patterns in large sets of data
- Data patterns to predict trends in information
- Compare symptoms to analyse disease causes
- Determine the effectiveness of treatments
- most effective course of treatment
- Repeated analysis to standardise treatment of specific diseases
- to speed up diagnosis
- Determine patterns of medical claims by patients
- Determine abnormal patient outcomes from treatments
- Determine abnormal patterns of medical claims by patients (to identify fraudulent claims)
for Analyzing Social Trends
- advantages
- Useful for
- predicting future trends
- keeping track of customer habits
- decision making
- Speeds up the data analysis
- people can collect information about marketed products
- can find out fake products (with marketing analysis)
- can create unexpected trends
- e.g. 'me-too' campaigns
- can cause good influences
- eg: anti-plastic movements
- ?? Can influence the rebellion against plastic wrapped goods being identified by data mining of social trends ??
- disadvantages
- Violates user privacy
- collects irrelevant information
- needs a skilled person to analyze & understand the results
- security issues as data can be misused (to harm others)
- Results can be sold to others
- without being anonymised
- revealing personal details
- Data patterns can be misused against
- different social groups
- Accuracy of data can be in doubt
for Analyzing Economic Trends
- tasks
- Identify anomalies in economic data
- Searching for relationships between variables
- Clustering 'similar' groups and structures
- generalizing known structures (to apply to new data)
- Finding functions that model the data with the least error
- Summarising the data
- Producing reports
- charts/graphs to show trends in the data.
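"Finding functions that model the data with the least error" is commonly done with least-squares regression. The sketch below fits a straight line to invented yearly figures; the data and variable names are assumptions for illustration, and a straight line is only one of many possible model functions.

```python
# Invented economic series: (year index, index value)
data = [(0, 100.0), (1, 102.0), (2, 104.5), (3, 106.0), (4, 108.5)]


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(data)

# Residuals show how far each point is from the fitted trend;
# unusually large ones are candidates for anomalies.
residuals = [y - (slope * x + intercept) for x, y in data]
forecast = slope * 5 + intercept  # extend the trend one period ahead
```

The same fitted line supports two of the listed tasks: large residuals point to possible anomalies, and extending the line gives a simple trend forecast.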
Other / Unknown
- document analysis
- use
- advantages
- a systematic process of reviewing printed and computer-based documents
- provides a large amount of information
- Data in documents is analysed to gain knowledge
- requires the analyst to have knowledge (in that field)
- Analyst should combine document research with other forms of analysis
- Analysts should be highly skilled
- uses data that has already been collected
- only data selection is needed
- requires less time than other methods
- costs only include the method of analysing
- no costs for collecting data
- cheap
- Many documents are freely available
- Document analysis uses non-invasive research methods
- does not affect the subject being researched
- Historical documents are not altered by being researched
- so can be re-examined by analysts
- Documents contain references that can be cross-referenced to other sources
- making the research more reliable
- disadvantages
- documents are produced only as a record of an event
- may lack sufficient detail to be useful in the research
- Data may be inaccurately recorded
- may not reflect actual events
- physical documents, hard to search through
- research can be prolonged
- might miss important data
- Researcher may be biased when selecting documents
- data may be incomplete.