Data mining for business development

By | 17.09.2018



Data mining is a powerful new technology with great potential to help companies focus on the most important information in the data they have collected about the behavior of their customers and potential customers. It discovers information within the data that queries and reports can't effectively reveal. This paper explores many aspects of data mining in the following areas:

Data Rich, Information Poor

The amount of raw data stored in corporate databases is exploding. From trillions of point-of-sale transactions and credit card purchases to pixel-by-pixel images of galaxies, databases are now measured in gigabytes and terabytes. (One terabyte = one trillion bytes. A terabyte is equivalent to about 2 million books!) For instance, every day, Wal-Mart uploads 20 million point-of-sale transactions to an AT&T massively parallel system with 483 processors running a centralized database. Raw data by itself, however, does not provide much information. In today's fiercely competitive business environment, companies need to rapidly turn these terabytes of raw data into significant insights into their customers and markets to guide their marketing, investment, and management strategies.

Data Warehouses

The drop in price of data storage has given companies willing to make the investment a tremendous resource: data about their customers and potential customers stored in "Data Warehouses." Data warehouses are becoming part of the technology. Data warehouses are used to consolidate data located in disparate databases. A data warehouse stores large quantities of data by specific categories so it can be more easily retrieved, interpreted, and sorted by users. Warehouses enable executives and managers to work with vast stores of transactional or other data to respond faster to markets and make more informed business decisions. It has been predicted that every business will have a data warehouse within ten years. But merely storing data in a data warehouse does a company little good. Companies will want to learn more about that data to improve knowledge of customers and markets. The company benefits when meaningful trends and patterns are extracted from the data.

What is Data Mining?

Data mining, or knowledge discovery, is the computer-assisted process of digging through and analyzing enormous sets of data and then extracting the meaning of the data. Data mining tools predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. Data mining tools can answer business questions that traditionally were too time consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.

Data mining derives its name from the similarities between searching for valuable information in a large database and mining a mountain for a vein of valuable ore. Both processes require either sifting through an immense amount of material, or intelligently probing it to find where the value resides.

What Can Data Mining Do?

Although data mining is still in its infancy, companies in a wide range of industries - including retail, finance, health care, manufacturing, transportation, and aerospace - are already using data mining tools and techniques to take advantage of historical data. By using pattern recognition technologies and statistical and mathematical techniques to sift through warehoused information, data mining helps analysts recognize significant facts, relationships, trends, patterns, exceptions and anomalies that might otherwise go unnoticed.

For businesses, data mining is used to discover patterns and relationships in the data in order to help make better business decisions. Data mining can help spot sales trends, develop smarter marketing campaigns, and accurately predict customer loyalty. Specific uses of data mining include:

  • Market segmentation - Identify the common characteristics of customers who buy the same products from your company.
  • Customer churn - Predict which customers are likely to leave your company and go to a competitor.
  • Fraud detection - Identify which transactions are most likely to be fraudulent.
  • Direct marketing - Identify which prospects should be included in a mailing list to obtain the highest response rate.
  • Interactive marketing - Predict what each individual accessing a Web site is most likely interested in seeing.
  • Market basket analysis - Understand what products or services are commonly purchased together; e.g., beer and diapers.
  • Trend analysis - Reveal the difference between a typical customer this month and last.

Data mining technology can generate new business opportunities by:

Automated prediction of trends and behaviors: Data mining automates the process of finding predictive information in a large database. Questions that traditionally required extensive hands-on analysis can now be directly answered from the data. A typical example of a predictive problem is targeted marketing. Data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings. Other predictive problems include forecasting bankruptcy and other forms of default, and identifying segments of a population likely to respond similarly to given events.

Automated discovery of previously unknown patterns: Data mining tools sweep through databases and identify previously hidden patterns. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting fraudulent credit card transactions and identifying anomalous data that could represent data entry keying errors.

Using massively parallel computers, companies dig through volumes of data to discover patterns about their customers and products. For example, grocery chains have found that when men go to a supermarket to buy diapers, they sometimes walk out with a six-pack of beer as well. Using that information, it's possible to lay out a store so that these items are closer.
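The beer-and-diapers finding comes down to counting how often items share a basket. The following is a minimal sketch of that counting, not a description of any retailer's system; the baskets and the helper names `support` and `confidence` are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each set is one shopper's basket.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"milk", "bread"},
    {"diapers", "bread", "beer"},
    {"milk", "diapers"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Sort so that each pair of items has one canonical ordering.
    pair_counts.update(combinations(sorted(basket), 2))


def support(item_a, item_b):
    """Fraction of all baskets that contain both items."""
    return pair_counts[tuple(sorted((item_a, item_b)))] / len(baskets)


def confidence(antecedent, consequent):
    """Of the baskets containing `antecedent`, the share that also contain `consequent`."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]


print(support("diapers", "beer"))
print(confidence("diapers", "beer"))
```

In this toy data, three of the four baskets with diapers also contain beer. A real analysis would also compare that share against how common beer is overall before rearranging a store.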

AT&T, A.C. Nielsen, and American Express are among the growing ranks of companies implementing data mining techniques for sales and marketing. These systems are crunching through terabytes of point-of-sale data to aid analysts in understanding consumer behavior and promotional strategies. Why? To gain a competitive advantage and increase profitability!

Similarly, financial analysts are plowing through vast sets of financial records, data feeds, and other information sources in order to make investment decisions. Health-care organizations are examining medical records to understand trends of the past so they can reduce costs in the future.

The Evolution of Data Mining

Data mining is a natural development of the increased use of computerized databases to store data and provide answers to business analysts.

Evolutionary Step | Business Question | Enabling Technology

Data Collection (1960s) | "What was my total revenue in the last five years?" | computers, tapes, disks

Data Access (1980s) | "What were unit sales in New England last March?" | faster and cheaper computers with more storage, relational databases

Data Warehousing and Decision Support | "What were unit sales in New England last March? Drill down to Boston." | faster and cheaper computers with more storage, On-line analytical processing (OLAP), multidimensional databases, data warehouses

Data Mining | "What's likely to happen to Boston unit sales next month? Why?" | faster and cheaper computers with more storage, advanced computer algorithms

Traditional query and report tools have been used to describe and extract what is in a database. The user forms a hypothesis about a relationship and verifies it or discounts it with a series of queries against the data. For example, an analyst might hypothesize that people with low income and high debt are bad credit risks and query the database to verify or disprove this assumption. Data mining can be used to generate an hypothesis. For example, an analyst might use a neural net to discover a pattern that analysts did not think to try - for example, that people over 30 years old with low incomes and high debt but who own their own homes and have children are good credit risks.

How Data Mining Works

How is data mining able to tell you important things that you didn't know or what is going to happen next? The technique that is used to perform these feats is called modeling. Modeling is simply the act of building a model (a set of examples or a mathematical relationship) based on data from situations where the answer is known and then applying the model to other situations where the answers aren't known. Modeling techniques have been around for centuries, of course, but it is only recently that the data storage and communication capabilities required to collect and store huge amounts of data, and the computational power to automate modeling techniques to work directly on the data, have been available.

As a simple example of building a model, consider the director of marketing for a telecommunications company. He would like to focus his marketing and sales efforts on segments of the population most likely to become big users of long distance services. He knows a lot about his customers, but it is impossible to discern the common characteristics of his best customers because there are so many variables. From his existing database of customers, which contains information such as age, sex, credit history, income, zip code, occupation, etc., he can use data mining tools, such as neural networks, to identify the characteristics of those customers who make lots of long distance calls. For instance, he might learn that his best customers are unmarried females between the ages of 34 and 42 who make in excess of $60,000 per year. This, then, is his model for high value customers, and he would budget his marketing efforts accordingly.
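The idea of building a model from cases where the answer is known, then applying it where the answer is not, can be sketched in a few lines. This is a deliberately simple stand-in for the neural network mentioned above: it just learns the share of heavy long-distance users per segment. The customer records, the segment definition, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical labelled records: (age, marital_status, is_heavy_user).
known_customers = [
    (36, "single", True),
    (40, "single", True),
    (38, "married", False),
    (25, "single", False),
    (41, "single", True),
    (55, "married", False),
    (35, "single", False),
]


def segment(age, marital_status):
    """Map a customer to a coarse segment: (age decade, marital status)."""
    return (age // 10 * 10, marital_status)


def build_model(records):
    """Learn the share of heavy users in each segment from labelled records."""
    totals = defaultdict(int)
    heavy = defaultdict(int)
    for age, status, is_heavy in records:
        key = segment(age, status)
        totals[key] += 1
        heavy[key] += is_heavy  # True counts as 1, False as 0.
    return {key: heavy[key] / totals[key] for key in totals}


def score(model, age, marital_status, default=0.0):
    """Apply the model to a prospect whose usage is not yet known."""
    return model.get(segment(age, marital_status), default)


model = build_model(known_customers)
print(score(model, 43, "single"))
```

A prospect in a segment the model has never seen falls back to `default`, which is one of the design choices a real modeling tool would handle more carefully.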

Data Mining Technologies

The analytical techniques used in data mining are often well-known mathematical algorithms and techniques. What is new is the application of those techniques to general business problems made possible by the increased availability of data and inexpensive storage and processing power. Also, the use of graphical interfaces has led to tools becoming available that business experts can easily use.

Some of the tools used for data mining are:

Artificial neural networks - Non-linear predictive models that learn through training and resemble biological neural networks in structure.

Decision trees - Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset.

Rule induction - The extraction of useful if-then rules from data based on statistical significance.

Genetic algorithms - Optimization techniques based on the concepts of genetic combination, mutation, and natural selection.

Nearest neighbor - A classification technique that classifies each record based on the records most similar to it in an historical database.
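Of the tools listed above, nearest neighbor is the easiest to show in full. The sketch below labels a new record by majority vote among the `k` most similar historical records; the records, the two features, and the labels are invented for the example.

```python
import math
from collections import Counter

# Hypothetical historical records: ((age, income in $1000s), label).
history = [
    ((25, 30), "low"),
    ((30, 35), "low"),
    ((28, 40), "low"),
    ((45, 80), "high"),
    ((50, 90), "high"),
    ((48, 75), "high"),
]


def classify(record, history, k=3):
    """Label `record` by majority vote among its k closest historical records."""
    nearest = sorted(history, key=lambda item: math.dist(record, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(classify((47, 85), history))
```

Plain Euclidean distance is used here for brevity. With features on very different scales, it is usual to rescale them first so that no single feature dominates the distance.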

Real-World Examples

Details about who calls whom, how long they are on the phone, and whether a line is used for fax as well as voice can be invaluable in targeting sales of services and equipment to specific customers. But these tidbits are buried in masses of numbers in the database. By delving into its extensive customer-call database to manage its communications network, a regional telephone company identified new types of unmet customer needs. Using its data mining system, it discovered how to pinpoint prospects for additional services by measuring daily household usage for selected periods. For example, households that make many lengthy calls between 3 p.m. and 6 p.m. are likely to include teenagers who are prime candidates for their own phones and lines. When the company used target marketing that emphasized convenience and value for adults - "Is the phone always tied up?" - hidden demand surfaced. Extensive telephone use between 9 a.m. and 5 p.m. characterized by patterns related to voice, fax, and modem usage suggests a customer has business activity. Target marketing offering those customers "business communications capabilities for small budgets" resulted in sales of additional lines, functions, and equipment.

The ability to accurately gauge customer response to changes in business rules is a powerful competitive advantage. A bank searching for new ways to increase revenues from its credit card operations tested a nonintuitive possibility: Would credit card usage and interest earned increase significantly if the bank halved its minimum required payment? With hundreds of gigabytes of data representing two years of average credit card balances, payment amounts, payment timeliness, credit limit usage, and other key parameters, the bank used a powerful data mining system to model the impact of the proposed policy change on specific customer categories, such as customers consistently near or at their credit limits who make timely minimum or small payments. The bank discovered that cutting minimum payment requirements for small, targeted customer categories could increase average balances and extend indebtedness periods, generating more than $25 million in additional interest earned.

Merck-Medco Managed Care is a mail-order business which sells drugs to the country's largest health care providers: Blue Cross and Blue Shield state organizations, large HMOs, U.S. corporations, state governments, etc. Merck-Medco is mining its one terabyte data warehouse to uncover hidden links between illnesses and known drug treatments, and spot trends that help pinpoint which drugs are the most effective for what types of patients. The results are more effective treatments that are also less costly. Merck-Medco's data mining project has helped customers save an average of 10-15% on prescription costs.

The Future of Data Mining

In the short-term, the results of data mining will be in profitable, if mundane, business related areas. Micro-marketing campaigns will explore new niches. Advertising will target potential customers with new precision.

In the medium term, data mining may be as common and easy to use as e-mail. We may use these tools to find the best airfare to New York, root out a phone number of a long-lost classmate, or find the best prices on lawn mowers.

The long-term prospects are truly exciting. Imagine intelligent agents turned loose on medical research data or on sub-atomic particle data. Computers may reveal new treatments for diseases or new insights into the nature of the universe. There are potential dangers, though, as discussed below.

Privacy Concerns

What if every telephone call you make, every credit card purchase you make, every flight you take, every visit to the doctor you make, every warranty card you send in, every employment application you fill out, every school record you have, your credit record, every web page you visit... was all collected together? A lot would be known about you! This is an all-too-real possibility. Much of this kind of information is already stored in a database. Remember that phone interview you gave to a marketing company last week? Your replies went into a database. Remember that loan application you filled out? In a database. Too much information about too many people for anybody to make sense of? Not with data mining tools running on massively parallel processing computers! Would you feel comfortable about someone (or lots of someones) having access to all this data about you? And remember, all this data does not have to reside in one physical location; as the net grows, information of this type becomes more available to more people.

Data mining vendors and products:

American Heuristics (Profiler)

Angoss software (Knowledge Seeker)

Attar Software (XpertRule Profiler)

Business Objects (BusinessMiner)

DataMind (DataMind Professional)

HNC Software (DataMarksman, Falcon)

HyperParallel (Discovery)

Information Discovery Inc. (Information Discovery System)

Integral Solutions (Clementine)

IBM (Intelligent Data Miner)

Lucent Technologies (Interactive Data Visualization)

NCR (Knowledge Discovery Benchmark)

NeoVista Solutions (Decision Series)

Nestor (Prism)

Pilot Software (Pilot Discovery Server)

Seagate Software Systems (Holos 5.0)


Thinking Machines (Darwin)









Data mining techniques


Martin Brown
Published on December 11, 2012

Data mining as a process

Fundamentally, data mining is about processing data and identifying patterns and trends in that information so that you can make decisions or judgments. Data mining principles have been around for many years, but with the advent of big data they are even more prevalent.

Big data caused an explosion in the use of more extensive data mining techniques, partially because the size of the information is much larger and because the information tends to be more varied and extensive in its very nature and content. With large data sets, it is no longer enough to get relatively simple and straightforward statistics out of the system. With 30 or 40 million records of detailed customer information, knowing that two million of them live in one location is not enough. You want to know whether those two million are a particular age group and their average earnings so that you can target your customer needs better.

These business-driven needs changed simple data retrieval and statistics into more complex data mining. The business problem drives an examination of the data that helps to build a model to describe the information that ultimately leads to the creation of the resulting report. Figure 1 outlines the process.

Figure 1. Outline of the process

The process of data analysis, discovery, and model-building is often iterative as you target and identify the different information that you can extract. You must also understand how to relate, map, associate, and cluster it with other data to produce the result. Identifying the source data and formats, and then mapping that information to our given result can change after you discover different elements and aspects of the data.

Data mining tools

Data mining is not all about the tools or database software that you are using. You can perform data mining with comparatively modest database systems and simple tools, including creating and writing your own, or using off the shelf software packages. Complex data mining benefits from the past experience and algorithms defined with existing software and packages, with certain tools gaining a greater affinity or reputation with different techniques.

For example, IBM SPSS®, which has its roots in statistical and survey analysis, can build effective predictive models by looking at past trends and building accurate forecasts. IBM InfoSphere® Warehouse provides data sourcing, preprocessing, mining, and analysis information in a single package, which allows you to take information from the source database straight to the final report output.

Only recently have very large data sets and cluster-based, large-scale data processing allowed data mining to collate and report on more complicated groups and correlations of data. An entirely new range of tools and systems is now available, including combined data storage and processing systems.

You can mine data from a variety of sources, including traditional SQL databases, raw text data, key/value stores, and document databases. Clustered databases, such as Hadoop, Cassandra, CouchDB, and Couchbase Server, store and provide access to data in a way that does not match the traditional table structure.

In particular, the more flexible storage format of the document database causes a different focus and complexity in terms of processing the information. SQL databases impose strict structures and rigidity on the schema, which makes querying them and analyzing the data straightforward because the format and structure of the information is known.

Document databases that have a standard such as JSON enforcing structure, or files that have some machine-readable structure, are also easier to process, although they might add complexities because of the differing and variable structure. For example, with Hadoop's entirely raw data processing it can be complex to identify and extract the content before you start to process and correlate it.

Key techniques

Several core techniques that are used in data mining describe the type of mining and data recovery operation. Unfortunately, the different companies and solutions do not always share terms, which can add to the confusion and apparent complexity.

Let's look at some key techniques and examples of how to use different tools to build the data mining.


Association

Association (or relation) is probably the best-known, most familiar, and most straightforward data mining technique. Here, you make a simple correlation between two or more items, often of the same type, to identify patterns. For example, when tracking people's buying habits, you might identify that a customer always buys cream when they buy strawberries, and therefore suggest that the next time that they buy strawberries they might also want to buy cream.

Building association or relation-based data mining tools can be achieved simply with different tools. For example, within InfoSphere Warehouse a wizard provides configurations of an information flow that is used in association by examining your database input source, decision basis, and output information. Figure 2 shows an example from the sample database.

Figure 2. Information flow that is used in association


Classification

You can use classification to build up an idea of the type of customer, item, or object by describing multiple attributes to identify a particular class. For example, you can easily classify cars into different types (sedan, 4x4, convertible) by identifying different attributes (number of seats, car shape, driven wheels). Given a new car, you might assign it to a particular class by comparing its attributes with the known definitions. You can apply the same principles to customers, for example by classifying them by age and social group.

Additionally, you can use classification as a feeder to, or the result of, other techniques. For example, you can use decision trees to determine a classification. Clustering allows you to use common attributes in different classifications to identify clusters.
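The car example can be sketched as a comparison of a new item's attributes against a set of known class definitions. The attribute values and the `classify_car` helper below are illustrative assumptions, not definitions taken from any tool.

```python
# Hypothetical class definitions: the attribute values expected for each class.
CLASS_DEFINITIONS = {
    "sedan": {"seats": 5, "shape": "three-box", "driven_wheels": 2},
    "4x4": {"seats": 5, "shape": "two-box", "driven_wheels": 4},
    "convertible": {"seats": 2, "shape": "open-top", "driven_wheels": 2},
}


def classify_car(car):
    """Return the class whose definition matches the most attributes of `car`."""

    def matches(definition):
        return sum(car.get(name) == value for name, value in definition.items())

    # On a tie, max() keeps the first class in definition order.
    return max(CLASS_DEFINITIONS, key=lambda name: matches(CLASS_DEFINITIONS[name]))


new_car = {"seats": 5, "shape": "two-box", "driven_wheels": 4}
print(classify_car(new_car))
```

Counting exact matches is the crudest possible comparison. It is enough to show the principle, and the same structure applies if you swap the cars for customers described by age and social group.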


Clustering

By examining one or more attributes or classes, you can group individual pieces of data together to form a structured opinion. At a simple level, clustering is using one or more attributes as your basis for identifying a cluster of correlating results. Clustering is useful to identify different information because it correlates with other examples so you can see where the similarities and ranges agree.

Clustering can work both ways. You can assume that there is a cluster at a certain point and then use our identification criteria to see if you are correct. The graph in Figure 3 shows a good example. In this example, a sample of sales data compares the age of the customer to the size of the sale. It is not unreasonable to expect that people in their twenties (before marriage and kids), fifties, and sixties (when the children have left home), have more disposable income.



In the example, we can identify two clusters, one around the US$2,000/20-30 age group, and another at the US$7,000-8,000/50-65 age group. In this case, we've both hypothesized and proved our hypothesis with a simple graph that we can create using any suitable graphing software for a quick manual view. More complex determinations require a full analytical package, especially if you want to automatically base decisions on nearest neighbor information.

Plotting clustering in this way is a simplified example of so-called nearest neighbor identity. You can identify individual customers by their literal proximity to each other on the graph. It's highly likely that customers in the same cluster also share other attributes, and you can use that expectation to help drive, classify, and otherwise analyze other people from your data set.

You can also apply clustering from the opposite perspective; given certain input attributes, you can identify different artifacts. For example, a recent study of 4-digit PIN numbers found clusters between the digits in ranges 1-12 and 1-31 for the first and second pairs. By plotting these pairs, you can identify and determine clusters to relate to dates (birthdays, anniversaries).
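The age-versus-sale-size clusters described earlier in this section can also be found automatically rather than read off a graph. The sketch below uses plain k-means, which is one common clustering method and not necessarily what any particular package uses; the data points and the starting centroids are invented.

```python
import math

# Hypothetical (customer age, sale size in US$) observations.
sales = [
    (22, 1900), (25, 2100), (28, 2000), (24, 2200),
    (52, 7200), (58, 7800), (63, 7500), (55, 8000),
]


def mean_point(points):
    """Component-wise mean of a non-empty list of 2-D points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def k_means(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, recompute centroids, repeat."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Start from one young and one older customer as the initial guesses.
centroids, clusters = k_means(sales, centroids=[sales[0], sales[-1]])
print(centroids)
```

Because the dollar amounts are much larger than the ages, they dominate the distance here. Rescaling both attributes to comparable ranges is the usual remedy when both should carry weight.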


Prediction

Prediction is a wide topic that runs from predicting the failure of components or machinery, to identifying fraud, and even to predicting company profits. Used in combination with the other data mining techniques, prediction involves analyzing trends, classification, pattern matching, and relation. By analyzing past events or instances, you can make a prediction about an event.

Using credit card authorization as an example, you might combine decision tree analysis of individual past transactions with classification and historical pattern matches to identify whether a transaction is fraudulent. If you can match the purchase of flights to the US with later transactions in the US, it is likely that those transactions are valid.

Sequential patterns

Often used over longer-term data, sequential patterns are a useful method for identifying trends, or regular occurrences of similar events. For example, with customer data you can identify that customers buy a particular collection of products together at different times of the year. In a shopping basket application, you can use this information to automatically suggest that certain items be added to a basket based on their frequency and past purchasing history.
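A minimal sketch of the basket-suggestion idea follows: count how many customers bought one product after another, then suggest the most common followers. The purchase histories and the `min_customers` threshold are assumptions for the example.

```python
from collections import Counter

# Hypothetical purchase histories: customer -> products in purchase order.
histories = {
    "c1": ["tent", "sleeping bag", "stove"],
    "c2": ["tent", "stove"],
    "c3": ["sleeping bag", "tent", "stove"],
    "c4": ["tent", "sleeping bag"],
}


def sequence_counts(histories):
    """Count how many customers bought `later` at some point after `earlier`."""
    counts = Counter()
    for purchases in histories.values():
        seen_pairs = set()
        for i, earlier in enumerate(purchases):
            for later in purchases[i + 1:]:
                if earlier != later:
                    seen_pairs.add((earlier, later))
        # Each customer contributes at most once per ordered pair.
        counts.update(seen_pairs)
    return counts


def suggest(counts, product, min_customers=2):
    """Products bought after `product` by at least `min_customers` customers."""
    followers = [(n, later) for (earlier, later), n in counts.items()
                 if earlier == product and n >= min_customers]
    return [later for n, later in sorted(followers, reverse=True)]


counts = sequence_counts(histories)
print(suggest(counts, "tent"))
```

This ignores how much time passes between purchases; a seasonal analysis would keep the dates and group by time of year as well.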

Decision trees

Related to most of the other techniques (primarily classification and prediction), the decision tree can be used either as a part of the selection criteria, or to support the use and selection of specific data within the overall structure. Within the decision tree, you start with a simple question that has two (or sometimes more) answers. Each answer leads to a further question to help classify or identify the data so that it can be categorized, or so that a prediction can be made based on each answer.

Figure 4 shows an example where you can classify an incoming error condition.

Figure 4. Decision tree

Decision trees are often used with classification systems to attribute type information, and with predictive systems, where different predictions might be based on past historical experience that helps drive the structure of the decision tree and the output.
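The question-and-answer structure of a decision tree maps directly onto nested data. The tree below is a hypothetical one for classifying an error record; the questions, field names, and outcomes are invented for illustration and are not the content of Figure 4.

```python
# A hypothetical tree: each internal node asks a yes/no question about the
# error record; each leaf (a plain string) is the resulting classification.
TREE = {
    "question": lambda e: e["status"] >= 500,
    "yes": {
        "question": lambda e: e["retries"] > 3,
        "yes": "escalate",
        "no": "retry",
    },
    "no": {
        "question": lambda e: e["status"] == 404,
        "yes": "ignore",
        "no": "log",
    },
}


def classify(error, node=TREE):
    """Walk from the root to a leaf, following the answer to each question."""
    while isinstance(node, dict):
        node = node["yes"] if node["question"](error) else node["no"]
    return node


print(classify({"status": 503, "retries": 5}))
```

Here the tree is written by hand. Data mining tools typically derive the questions and their order from historical data instead.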


Combinations

In practice, it's very rare that you would use one of these exclusively. Classification and clustering are similar techniques. By using clustering to identify nearest neighbors, you can further refine your classifications. Often, we use decision trees to help build and identify classifications that we can track for a longer period to identify sequences and patterns.

Long-term (memory) processing

Within all of the core methods, there is often reason to record and learn from the information. In some techniques, it is entirely obvious. For example, with sequential patterns and predictive learning you look back at data from multiple sources and instances of information to build a pattern.

In others, the process might be more explicit. Decision trees are rarely built once and then left untouched. As new information, events, and data points are identified, it might be necessary to build more branches, or even entirely new trees, to cope with the additional information.

You can automate some of this process. For example, building a predictive model for identifying credit card fraud is about building probabilities that you can use for the current transaction, and then updating that model with the new (approved) transaction. This information is then recorded so that the decision can be made quickly the next time.
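The update-after-each-decision loop might look like the sketch below. It is a toy stand-in for a fraud model: it only tracks what share of a cardholder's approved transactions happened in each country. The class name, the `prior` value, and the sample data are assumptions.

```python
from collections import Counter, defaultdict


class CountryHistoryModel:
    """Tracks where each cardholder's approved transactions took place."""

    def __init__(self, prior=0.5):
        # Score returned when a cardholder has no recorded history yet.
        self.prior = prior
        self.counts = defaultdict(Counter)

    def familiarity(self, cardholder, country):
        """Share of the cardholder's past approved transactions in `country`."""
        history = self.counts[cardholder]
        total = sum(history.values())
        if total == 0:
            return self.prior
        return history[country] / total

    def update(self, cardholder, country):
        """Record an approved transaction so that later scores reflect it."""
        self.counts[cardholder][country] += 1


model = CountryHistoryModel()
for country in ["UK", "UK", "UK", "US"]:
    model.update("alice", country)
print(model.familiarity("alice", "US"))
```

A low score alone does not indicate fraud. In line with the text above, it would be one input among several, combined with classification and pattern matching.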

Data implementations and preparation

Data mining itself relies upon building a suitable data model and structure that can be used to process, identify, and build the information that you need. Regardless of the source data form and structure, structure and organize the information in a format that allows the data mining to take place in as efficient a model as possible.

Consider the combination of the business requirements for the data mining, the identification of the existing variables (customer, values, country) and the requirement to create new variables that you might use to analyze the data in the preparation step.

You might compose the analytical variables of data from many different sources to a single identifiable structure (for example, you might create a class of a particular grade and age of customer, or a particular error type).
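As a rough sketch of composing such a variable, the Python below derives a combined customer class from age and spending. The field names, age bands, and spending thresholds are hypothetical and would come from your own business requirements.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: int
    age: int
    country: str
    total_spend: float


def age_band(age):
    """Bucket an exact age into a coarse band (hypothetical boundaries)."""
    if age < 25:
        return "under-25"
    if age < 45:
        return "25-44"
    if age < 65:
        return "45-64"
    return "65-plus"


def grade(total_spend):
    """Assign a customer grade from total spend (hypothetical thresholds)."""
    if total_spend >= 1000:
        return "gold"
    if total_spend >= 250:
        return "silver"
    return "bronze"


def customer_class(customer):
    """Compose the two derived variables into one analytical variable."""
    return f"{grade(customer.total_spend)}/{age_band(customer.age)}"


customers = [
    Customer(1, 23, "UK", 120.0),
    Customer(2, 37, "US", 1450.0),
    Customer(3, 68, "DE", 300.0),
]
classes = {c.customer_id: customer_class(c) for c in customers}
```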

Depending on your data source, how you build and translate this information is an important step, regardless of the technique you use to finally analyze the data. This step also leads to a more complex process of identifying, aggregating, simplifying, or expanding the information to suit your input data (see Figure 5).

Figure 5. Data preparation

Your source data, location, and database affect how you process and aggregate that information.

Building on SQL

Building on an SQL database is often the easiest of all the approaches. SQL (and the underlying table structure it implies) is well understood, but you cannot completely ignore the structure and format of the information. For example, when you examine user behavior in sales data, there are two primary formats within the SQL data model (and data mining in general) that you can use: transactional and behavioral-demographic.

When you use InfoSphere Warehouse, creating a behavioral-demographic model to mine customer data for buying and purchasing patterns involves taking your source SQL data, based upon the transaction information and known parameters of your customers, and rebuilding that information into a predefined table structure. InfoSphere Warehouse can then use this information for clustering and classification to get the information you need. Customer demographic data and sales transaction data can be combined and then reconstituted into a format that allows for specific data analysis, as shown in Figure 6.

Figure 6. Format for specific data analysis

For example, with sales data you might want to identify the sales trends of particular items. You can convert the raw sales data of the individual items into transactional information that maps the customer ID, transaction data, and product ID. By using this information, it is easy to identify sequences and relationships for individual products by individual customers over time. That enables InfoSphere Warehouse to calculate sequential information, such as when a customer is likely to buy the same product again.
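The Python sketch below shows one way to derive that kind of sequential information from transactional rows. It assumes each row carries a customer ID, a transaction date, and a product ID, and it estimates the repurchase interval as the average gap in days between purchases of the same product by the same customer. The sample rows are made up, and this is not how InfoSphere Warehouse computes its results.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical transactional rows: (customer_id, transaction_date, product_id)
transactions = [
    (1, date(2018, 1, 3), 101),
    (1, date(2018, 1, 17), 101),
    (2, date(2018, 1, 5), 110),
    (1, date(2018, 2, 2), 101),
    (2, date(2018, 1, 12), 110),
    (1, date(2018, 1, 20), 110),
]

# Group purchase dates by (customer, product).
purchase_dates = defaultdict(list)
for customer_id, when, product_id in transactions:
    purchase_dates[(customer_id, product_id)].append(when)


def average_gap_days(dates):
    """Average number of days between consecutive purchases, or None."""
    ordered = sorted(dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return mean(gaps) if gaps else None


repurchase_interval = {
    key: average_gap_days(dates) for key, dates in purchase_dates.items()
}
```

A pair with only one purchase has no gap to measure, so it maps to `None` rather than to a guessed interval.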

You can build new data analysis points from the source data. For example, you might want to expand (or refine) your product information by collating or classifying individual products into wider groups, and then analyzing the data based on these groups in place of individual products.

For example, Table 1 shows how to expand the information in new ways.

Table 1. A table of products expanded

Product ID | Product name        | Product group | Product type
101        | strawberries, loose | strawberries  | fruit
102        | strawberries, box   | strawberries  | fruit
110        | bananas, loose      | bananas       | fruit
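To illustrate analyzing by group instead of by individual product, the Python sketch below rolls sales quantities up to the group and type levels using the three products from Table 1. The sales quantities and the dictionary keys `name`, `group`, and `type` are assumptions made for the example.

```python
from collections import defaultdict

# Product lookup built from the rows in Table 1.
products = {
    101: {"name": "strawberries, loose", "group": "strawberries", "type": "fruit"},
    102: {"name": "strawberries, box", "group": "strawberries", "type": "fruit"},
    110: {"name": "bananas, loose", "group": "bananas", "type": "fruit"},
}

# Hypothetical sales rows: (product_id, quantity)
sales = [(101, 3), (102, 1), (110, 6), (101, 2)]


def totals_by(level):
    """Sum sold quantities at the requested level ('name', 'group', or 'type')."""
    totals = defaultdict(int)
    for product_id, quantity in sales:
        totals[products[product_id][level]] += quantity
    return dict(totals)


by_group = totals_by("group")
by_type = totals_by("type")
```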

Document databases and MapReduce

The MapReduce processing offered by many modern document and NoSQL databases, and by frameworks such as Hadoop, is designed to cope with very large data sets and with information that does not always follow a tabular format. When you work with data mining software, this notion can be both a benefit and a problem.

The main issue with document-based data is that the unstructured format might require more processing than you expect to get the information you need. Many different records can hold similar data. Collecting and harmonizing this information to process it more easily relies upon the preparation and MapReduce stages.

Within a MapReduce-based system, it is the role of the map step to take the source data and normalize that information into a standard form of output. This step can be a relatively simple process (identifying key fields or data points), or a more complex one (parsing and processing the information to produce the sample data). The mapping process produces the standardized format that you can use as your base.

Reduction is about summarizing or quantifying the information and then outputting that information in a standardized structure that is based upon the totals, sums, statistics, or other analysis that you selected for output.
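The two steps can be sketched in plain Python, without any particular MapReduce framework. The map function normalizes documents that use inconsistent field names into a standard key and value, and the reduce function sums the values for each key. The documents and field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical documents that hold similar data under different field names.
documents = [
    {"customer": "A17", "item": "bananas", "qty": 2},
    {"cust_id": "A17", "product": "Bananas ", "quantity": "3"},
    {"customer": "B02", "item": "strawberries", "qty": 1},
]


def map_record(doc):
    """Map step: normalize one document into a ((customer, product), quantity) pair."""
    customer = doc.get("customer", doc.get("cust_id"))
    product = doc.get("item", doc.get("product")).strip().lower()
    quantity = int(doc.get("qty", doc.get("quantity")))
    return (customer, product), quantity


def reduce_pairs(pairs):
    """Reduce step: total the quantities for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


summary = reduce_pairs(map_record(doc) for doc in documents)
```

The reduced output is already close to tabular: each key and total can be written out as one row for the rest of the data mining process.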

Querying this data is often complex, even when you use tools designed to do so. Within a data mining exercise, the ideal approach is to use the MapReduce phase of the data mining as part of your data preparation exercise.

For example, if you are building a data mining exercise for association or clustering, the best first stage is to build a suitable statistical model that you can use to identify and extract the necessary information. Use the MapReduce phase to extract and calculate that statistical information, and then input it to the rest of the data mining process, leading to a structure such as the one shown in Figure 7.

Figure 7. MapReduce structure

In the previous example, we've taken the processing (in this case MapReduce) of the source data in a document database and translated it into a tabular format in an SQL database for the purposes of data mining.

Working with complex, even unformatted, information can require more involved preparation and processing. Some complex data types and structures cannot be processed and prepared into the output that you need in one step. In that case, you can chain the output of your MapReduce, either sequentially so that each stage maps and produces the next data structure, as in Figure 8, or individually to produce multiple output tables of data.

Figure 8. Chaining output of your MapReduce sequentially

For example, taking raw logging information from a document database and running MapReduce to produce a summarized view of the information by date can be done in a single pass. Regenerating the information, combining that output with a decision matrix (encoded in the second MapReduce phase), and then further simplifying it to a sequential structure is a good example of the chaining process. Each MapReduce phase needs the whole data set to support the data for the individual steps.
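A chained pipeline of this kind might look like the Python sketch below. The first phase summarizes raw log records by date, the second applies a decision matrix to each daily summary, and a final step flattens the result into a date-ordered sequence. The log records, the `error` level name, and the thresholds in `classify` are hypothetical, and plain functions stand in for real MapReduce jobs.

```python
from collections import Counter

# Hypothetical raw log records from a document database.
log_records = [
    {"date": "2018-09-15", "level": "error"},
    {"date": "2018-09-15", "level": "info"},
    {"date": "2018-09-15", "level": "error"},
    {"date": "2018-09-16", "level": "info"},
    {"date": "2018-09-17", "level": "error"},
    {"date": "2018-09-17", "level": "error"},
    {"date": "2018-09-17", "level": "error"},
]


def phase_one(records):
    """First pass: map to (date, level) keys and reduce to counts per date."""
    counts = Counter((r["date"], r["level"]) for r in records)
    by_date = {}
    for (day, level), n in counts.items():
        by_date.setdefault(day, {})[level] = n
    return by_date


def classify(errors):
    """Decision matrix with hypothetical thresholds on the daily error count."""
    if errors >= 3:
        return "critical"
    if errors >= 1:
        return "degraded"
    return "healthy"


def phase_two(by_date):
    """Second pass: consume phase one's output and label each date."""
    return {day: classify(levels.get("error", 0)) for day, levels in by_date.items()}


def to_sequence(statuses):
    """Final step: simplify to a date-ordered sequence of labels."""
    return [statuses[day] for day in sorted(statuses)]


daily_counts = phase_one(log_records)
daily_status = phase_two(daily_counts)
status_sequence = to_sequence(daily_status)
```

Each phase reads only the previous phase's output, which is what makes it possible to run the stages one after another or to save an intermediate result as its own output table.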

Regardless of your source data, many tools can use flat file, CSV, or other data sources. InfoSphere Warehouse, for example, can parse flat files in addition to a direct link to a DB2 data warehouse.



Data mining is more than running some complex queries on the data you stored in your database. You must work with your data, reformat it, or restructure it, regardless of whether you are using SQL, document-based databases, a platform such as Hadoop, or simple flat files. The format of the information that you need depends on the technique and the analysis that you want to do. After you have the information in the format you need, you can apply the different techniques (individually or together) regardless of the underlying data structure or data set.
