About data mining and warehousing

By | 09.01.2018

Data Mining and Warehousing, Ali Radhi Al Essa, School of Engineering (ASEE 2014 Zone I Conference, April 3-5, 2014, University of Bridgeport, Bridgeport, CT, USA). Data warehousing and data mining emerged in the 1990s, followed by global/integrated information systems in the 2000s. Put simply, data warehousing is the process of compiling and organizing data into one common database, and data mining is the process of extracting meaningful information from that data.
Collections of databases that work together are called data warehouses; they make it possible to integrate data from multiple databases.

Warehousing Data: The Data Warehouse, Data Mining, and OLAP

Warehousing data is based on the premise that the quality of a manager's decisions depends, at least in part, on the quality of his or her information. The goal of storing data in a central system is thus to provide managers with the right building blocks for sound information and knowledge. Data warehouses contain information ranging from measurements of performance to competitive intelligence (Tanler 1997).

Data mining tools and techniques can be used to search stored data for patterns that might lead to new insights. Furthermore, the data warehouse is usually the driver of data-driven decision support systems (DSS), discussed in the following subsection.

Thierauf (1999) describes the process of data warehousing, extraction, and distribution. First, operational production data is extracted and passed on to the warehouse database. A server hosts both the data warehouse and the DSS. The server passes the extracted data on to the warehouse database, which users then query through some form of software.

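Thierauf's model for data warehousing is as follows:

Operational production data -> extraction -> warehouse database (on the server that also hosts the DSS) -> users' query software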
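The extraction-and-distribution flow can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under assumed names: the `orders` operational table, the `fact_sales` warehouse table and the `extract`/`load`/`query_warehouse` helpers are all hypothetical, and both databases live in memory rather than on a real server.

```python
import sqlite3


def make_operational_db() -> sqlite3.Connection:
    """Hypothetical operational (production) database holding raw order rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, product TEXT, qty INTEGER, unit_price REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [(1, "widget", 3, 2.50), (2, "gadget", 1, 10.00), (3, "widget", 2, 2.50)],
    )
    return conn


def extract(source: sqlite3.Connection) -> list[tuple[str, float]]:
    """Step 1: pull operational data, computing the revenue of each order."""
    return source.execute("SELECT product, qty * unit_price FROM orders").fetchall()


def load(warehouse: sqlite3.Connection, rows: list[tuple[str, float]]) -> None:
    """Step 2: pass the extracted rows on to the warehouse database."""
    warehouse.execute("CREATE TABLE IF NOT EXISTS fact_sales (product TEXT, revenue REAL)")
    warehouse.executemany("INSERT INTO fact_sales VALUES (?, ?)", rows)
    warehouse.commit()


def query_warehouse(warehouse: sqlite3.Connection) -> dict[str, float]:
    """Step 3: users read summarized data from the warehouse, not from production."""
    rows = warehouse.execute(
        "SELECT product, SUM(revenue) FROM fact_sales GROUP BY product ORDER BY product"
    ).fetchall()
    return dict(rows)


operational = make_operational_db()
warehouse = sqlite3.connect(":memory:")
load(warehouse, extract(operational))
print(query_warehouse(warehouse))  # {'gadget': 10.0, 'widget': 12.5}
```

In practice the extraction step usually also cleans and reshapes the data; here it only computes revenue per order, so that users never have to touch the production database.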


Warehousing Data: Design and Implementation

Tanler (1997) identifies three stages in the design and implementation of the data warehouse. The first stage is largely concerned with identifying the critical success factors of the enterprise, so as to determine the focus of the systems applied to the warehouse. The next step is to identify the information needs of the decision makers. This involves an analysis of current information gaps and of the stages of the decision-making process (i.e. the time taken to analyze data and arrive at a decision). Finally, the data warehouse should be implemented in a way that ensures users see its benefits early on. The size of the database and the complexity of the analytical requirements must be determined. Deployment issues, such as how users will receive the information, how routine decisions can be automated, and how users with varying technical skills can access the data, must also be addressed.

According to Frank (2002), the success of the implementation of the data warehouse depends on:

  • Accurately specifying user information needs
  • Implementing metadata: Metadata is essentially data about data. This is regarded as a particularly crucial step. Parankusham & Madupu (2006) outline the different roles of metadata as including: data characterization and indexing, the facilitation or restriction of data access, and the determination of the source and currency of data. They further identify the lifecycle of metadata as:
    • Collection: Identification and capture
    • Maintenance: Updating of metadata to match changes in data architecture
    • Deployment: Users access the relevant metadata, based on their needs.

To this, we can add the five criteria presented on the www.syntelinc.com website:

  • Recognize that the job is probably harder than you expect: A large portion of the data in data warehouses is incorrect, missing, or entered in a way that makes it unusable (e.g. historical databases that have not been updated to modern schemas).
  • Understand the data in your existing systems: Analyze existing databases. Identify relationships between existing data systems so as to avoid inconsistencies when these are moved to the warehouse.
  • Be sure to recognize equivalent entities: Identify equivalent entities in heterogeneous systems, where they may appear under different names.
  • Emphasize early wins to build support throughout the organization
  • Consider outsourcing your data warehouse development and maintenance: Implementing a data warehouse can be a huge task that can often be better handled by experts. Many data warehousing applications are suited for outsourcing.
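The metadata roles and lifecycle described above (characterizing data, recording its source and currency, and serving it to users) can be illustrated with a small in-memory catalog. The sketch is hypothetical: `MetadataEntry`, `MetadataCatalog` and their fields are illustrative assumptions, not a standard API. The `aliases` field shows one possible way to record equivalent entities that appear under different names in heterogeneous source systems.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class MetadataEntry:
    """Data about data: describes one warehouse field (hypothetical structure)."""
    name: str
    description: str
    source_system: str                              # where the data comes from
    last_updated: date                              # currency of the data
    aliases: set[str] = field(default_factory=set)  # equivalent names in other systems


class MetadataCatalog:
    def __init__(self) -> None:
        self._entries: dict[str, MetadataEntry] = {}

    def collect(self, entry: MetadataEntry) -> None:
        """Collection: identify and capture metadata for a field."""
        self._entries[entry.name] = entry

    def maintain(self, name: str, last_updated: date, source_system: str | None = None) -> None:
        """Maintenance: update metadata to match changes in the data architecture."""
        entry = self._entries[name]
        entry.last_updated = last_updated
        if source_system is not None:
            entry.source_system = source_system

    def deploy(self, name_or_alias: str) -> MetadataEntry | None:
        """Deployment: users look a field up by its warehouse name or any alias."""
        if name_or_alias in self._entries:
            return self._entries[name_or_alias]
        for entry in self._entries.values():
            if name_or_alias in entry.aliases:
                return entry
        return None


catalog = MetadataCatalog()
catalog.collect(MetadataEntry(
    name="customer_id",
    description="Unique customer identifier",
    source_system="CRM",
    last_updated=date(2015, 1, 1),
    aliases={"cust_no", "client_ref"},  # same entity, different names in legacy systems
))
catalog.maintain("customer_id", last_updated=date(2015, 6, 30))
print(catalog.deploy("cust_no").name)  # customer_id
```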

The goal of a properly designed and implemented data warehouse is to drastically reduce the time required for decision making. To do so, it employs three tools, namely Online Analytical Processing (OLAP), data mining, and data visualization (Parankusham & Madupu 2006).


OLAP

OLAP supports three functions:

  • Query and reporting: The ability to formulate queries without having to use the database's query language (e.g. SQL).
  • Multidimensional analysis: The ability to carry out analyses from multiple perspectives. Tanler (1997) provides an example of a product analysis that can then be repeated for each market segment. This allows for quick comparison of data relationships from different areas (e.g. by location, time, etc.). The analysis can cover customers, markets, products, and so on.
  • Statistical analysis: This function attempts to reduce the large quantities of data into formulas that capture the answer to the query.
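Multidimensional analysis, as in Tanler's example of repeating a product analysis for each market segment, amounts to aggregating the same measure along different dimensions. The plain-Python sketch below uses a hypothetical list of sales facts; the `roll_up` helper and the dimension names are illustrative assumptions. Production OLAP tools often precompute many such aggregates (for example in data cubes) instead of scanning rows for every query.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, segment, region, quarter, revenue)
SALES = [
    ("widget", "retail",    "east", "Q1", 100.0),
    ("widget", "wholesale", "east", "Q1", 250.0),
    ("gadget", "retail",    "west", "Q1",  80.0),
    ("widget", "retail",    "west", "Q2", 120.0),
    ("gadget", "wholesale", "east", "Q2", 300.0),
]
DIMENSIONS = ("product", "segment", "region", "quarter")


def roll_up(facts, *dims):
    """Sum revenue grouped by the chosen dimensions (a simple OLAP-style aggregate)."""
    positions = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]  # revenue is the last column
    return dict(totals)


# The same product analysis, repeated for each market segment...
by_product_segment = roll_up(SALES, "product", "segment")
# ...and the same data viewed from other perspectives, without writing SQL.
by_region = roll_up(SALES, "region")
by_quarter = roll_up(SALES, "quarter")
print(by_product_segment)
print(by_region)
```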

OLAP is responsible for telling the user what happened to the organization (Thierauf 1999). It thus enhances understanding reactively, by summarizing data and information.


What is Data Mining?

Data mining is another process used to create usable knowledge or information from warehoused data. Unlike statistical analysis, data mining does not start with a preconceived hypothesis about the data, and the technique is better suited to heterogeneous databases and data sets (Bali et al 2009). Karahoca and Ponce (2009) describe data mining as "an important tool for the mission critical applications to minimize, filter, extract or transform large databases or datasets into summarized information and exploring hidden patterns in knowledge discovery (KD)." The knowledge discovery aspect is emphasized by Bali et al (2009), since the management of this new knowledge falls within the knowledge management (KM) discipline.

It is beyond the scope of this site to offer an in-depth look at the data mining process. Instead, I will present a very brief overview and point readers who are interested in the technical aspects towards free sources of information.

Very briefly, data mining employs a wide range of tools and systems, including symbolic methods and statistical analysis. According to Botha et al (2008), symbolic methods look for pattern primitives, using pattern description languages to find structure. Statistical methods, on the other hand, measure and plot important characteristics, which are then divided into classes and clusters.
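The statistical approach (measure characteristics, then divide the observations into clusters) can be illustrated with k-means, one common clustering algorithm. Botha et al do not name a specific algorithm here, so k-means is used only as an example, and the customer measurements are made-up data.

```python
import math


def assign(points, centroids):
    """Assignment step: put each point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, initial_centroids, max_iter=100):
    """Plain k-means with caller-supplied starting centroids (deterministic)."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        # Update step: move each centroid to the mean of its points;
        # an empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
        if updated == centroids:  # no centroid moved: converged
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical customer measurements: (store visits per year, average spend)
customers = [(2, 20.0), (3, 25.0), (2, 22.0), (20, 200.0), (22, 210.0), (19, 190.0)]
centroids, clusters = kmeans(customers, initial_centroids=[customers[0], customers[3]])
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

In practice the features are usually scaled before clustering; in this toy data the spend dimension dominates the distance.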

Data mining is a complex process, and several process models exist for it. One is the CRoss-Industry Standard Process for Data Mining (CRISP-DM), which involves six steps (Maraban et al, in Karahoca & Ponce 2009):

Business understanding -> data understanding -> data preparation -> modeling -> evaluation -> deployment
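The six phases can be modelled as an ordered sequence. The sketch below is hypothetical (`Phase`, `next_phase` and `run` are illustrative names) and simplifies CRISP-DM, which in practice allows moving back and forth between phases; only one feedback loop is modelled, from a failed evaluation back to business understanding.

```python
from __future__ import annotations

from enum import Enum


class Phase(Enum):
    """The six CRISP-DM phases in their nominal order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def next_phase(phase: Phase, evaluation_passed: bool = True) -> Phase | None:
    """Return the phase after `phase`, or None once deployment is done.

    Simplification: a failed evaluation sends the project back to
    business understanding; other feedback loops are not modelled.
    """
    if phase is Phase.EVALUATION and not evaluation_passed:
        return Phase.BUSINESS_UNDERSTANDING
    if phase is Phase.DEPLOYMENT:
        return None
    return Phase(phase.value + 1)


def run(evaluation_results: list[bool]) -> list[str]:
    """Walk through the phases, consuming one result per evaluation step."""
    results = iter(evaluation_results)
    trace: list[str] = []
    phase: Phase | None = Phase.BUSINESS_UNDERSTANDING
    while phase is not None:
        trace.append(phase.name)
        passed = next(results) if phase is Phase.EVALUATION else True
        phase = next_phase(phase, passed)
    return trace


# The first model fails evaluation; the second passes and is deployed.
print(" -> ".join(run([False, True])))
```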

For more on data mining, see the book "Data Mining and Knowledge Discovery in Real Life Applications", edited by Ponce & Karahoca (2009), available for free from intechopen.com, where numerous other potentially relevant resources can also be downloaded.


Data Visualization

This process involves representing data and information graphically so as to better communicate its content to the user. It makes data patterns more visible, more accessible, easier to compare, and easier to communicate. Data visualization includes graphical interfaces, tables, graphs, images, 3D presentations, animation, and so on (Turban & Aaronson in Parankusham & Madupu 2006).
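As a minimal illustration of making a pattern easier to see and compare, the sketch below renders hypothetical regional revenue totals as a text bar chart. A real deployment would more likely use a charting library or BI tool; this plain-Python version only shows the core idea of scaling values to a visual length.

```python
def bar_chart(values: dict[str, float], width: int = 40) -> str:
    """Render each value as a row of '#' characters scaled to the largest value."""
    largest = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in values.items():
        length = round(width * value / largest) if largest else 0
        lines.append(f"{label.ljust(label_width)} | {'#' * length} {value:g}")
    return "\n".join(lines)


revenue_by_region = {"east": 650.0, "west": 200.0, "north": 400.0}  # made-up figures
print(bar_chart(revenue_by_region, width=20))
```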

Decision support systems (DSS) are other tools used in conjunction with warehoused data. These are discussed in the following section.

Frost M.Sc. - Updated 2015


Warehousing Data

Enterprise data is the lifeblood of a corporation, but it is useless if it is left to languish in data silos. Data warehousing and data mining provide the tools to bring that data together and put it to use. Analytical processing and data mining are among the types of application built on a data warehouse.


Data Warehouses

A database consists of one or more files that need to be stored on a computer. In large organizations, databases are typically not stored on the individual computers of employees but in a central system. This central system typically consists of one or more computer servers. A server is a computer system that provides a service over a network. The server is often located in a room with controlled access, so only authorized personnel can get physical access to the server.

In a typical setting, the database files reside on the server, but they can be accessed from many different computers in the organization. As the number and complexity of databases grow, we start referring to them together as a data warehouse.

A data warehouse is a collection of databases that work together. A data warehouse makes it possible to integrate data from multiple databases, which can give new insights into the data. The ultimate goal of a database is not just to store data, but to help businesses make decisions based on that data. A data warehouse supports this goal by providing an architecture and tools to systematically organize and understand data from multiple databases.
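To make the idea of integrating data from multiple databases concrete, the sketch below uses SQLite's `ATTACH DATABASE` to query two separate databases in one statement. The `crm` and `sales` databases and their tables are hypothetical, and a real warehouse would normally copy and reconcile the data into its own schema rather than joining the sources live.

```python
import sqlite3

# The main connection plays the warehouse; two separate in-memory databases
# stand in for independent operational systems (hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS sales")

conn.execute("CREATE TABLE crm.customers (customer_id INTEGER PRIMARY KEY, region TEXT)")
conn.execute(
    "CREATE TABLE sales.orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)", [(1, "east"), (2, "west"), (3, "east")])
conn.executemany(
    "INSERT INTO sales.orders VALUES (?, ?, ?)",
    [(10, 1, 50.0), (11, 2, 20.0), (12, 3, 30.0), (13, 1, 25.0)],
)

# Neither database alone can answer "revenue by region"; combined, they can.
revenue_by_region = dict(conn.execute(
    """
    SELECT c.region, SUM(o.amount)
    FROM sales.orders AS o
    JOIN crm.customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.region
    ORDER BY c.region
    """
).fetchall())
print(revenue_by_region)  # {'east': 105.0, 'west': 20.0}
```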

Distributed DBMS

As databases get larger, it becomes increasingly difficult to keep the entire database in a single physical location. Not only does storage capacity become an issue, but there are also security and performance considerations. Consider a company with several offices around the world.

It is possible to create one large, single database at the main office and have all other offices connect to this database. However, every single time an employee needs to work with the database, this employee needs to create a connection over thousands of miles, through numerous network nodes. As long as you are moving relatively small amounts of data around, this does not present a major challenge.

But what if the database is huge? It is not very efficient to move large amounts of data back and forth over the network. It may be more efficient to have a distributed database, meaning that the database consists of multiple, interrelated databases stored at different computer network sites.

To a typical user, the distributed database appears as a centralized database. Behind the scenes, however, parts of that database are located in different places. The typical characteristics of a distributed database management system (distributed DBMS) are:

  • Multiple computer network sites are connected by a communication system
  • Data at any site are available to users at other sites
  • Data at each site are under control of the DBMS
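The key property in the list above, that data stored at several sites still appears to the user as one database, can be sketched with a toy coordinator that fragments rows horizontally by office and routes each query to the relevant site. `Site` and `DistributedTable` are hypothetical classes; a real distributed DBMS also handles networking, replication, concurrency control and site failures, none of which is modelled here.

```python
class Site:
    """One network site holding a local fragment of the table."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: list[dict] = []


class DistributedTable:
    """Presents fragments stored at several sites as a single logical table."""

    def __init__(self, sites: dict[str, Site], partition_key: str) -> None:
        self.sites = sites
        self.partition_key = partition_key

    def insert(self, row: dict) -> None:
        # Horizontal fragmentation: each row is stored at the site named by its key.
        self.sites[row[self.partition_key]].rows.append(row)

    def select(self, **conditions) -> list[dict]:
        # If the query fixes the partition key, only that site is contacted;
        # otherwise every site is queried and the results are merged.
        key = conditions.get(self.partition_key)
        targets = [self.sites[key]] if key is not None else list(self.sites.values())
        return [
            row
            for site in targets
            for row in site.rows
            if all(row.get(k) == v for k, v in conditions.items())
        ]


sites = {name: Site(name) for name in ("london", "tokyo", "new_york")}
employees = DistributedTable(sites, partition_key="office")
employees.insert({"name": "Ana", "office": "london"})
employees.insert({"name": "Ken", "office": "tokyo"})
employees.insert({"name": "Bob", "office": "london"})

print(employees.select(office="london"))  # only the London site is read
print(len(employees.select()))            # all sites are scanned: 3
```

The user calls `select` the same way in both cases; where the rows physically live is hidden behind the coordinator.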

You have probably used a distributed database without realizing it. For example, you may be using an e-mail account from one of the major service providers. Where exactly do your e-mails reside? Most likely, the company hosting the e-mail service uses several different locations without you knowing it.

The major advantage of distributed databases is that data access and processing are much faster, because data can be stored close to where it is used. The major disadvantage is that the database is much more complex to manage. Setting up a distributed database is typically the task of a database administrator with very specialized database skills.
