Knowledge mining from databases

Data mining and knowledge discovery in databases (KDD) have been attracting a significant amount of research, industry, and media attention of late. KDD is the process of discovering useful knowledge from large collections of data, driven by the ongoing rapid growth of online data due to the Internet and the widespread use of databases.

Overview

Most people consider a database to be merely a data repository that supports data storage and retrieval.
Actually, a database contains rich, inter-related, multi-typed data and information, forming one or a set of gigantic, interconnected, heterogeneous information networks. Much knowledge can be derived from such information networks if we systematically develop an effective and scalable database-oriented information network analysis technology.

In this talk, we introduce database-oriented information network analysis methods and demonstrate how information networks can be used to improve data quality and consistency, facilitate data integration, and generate interesting knowledge. Moreover, we present interesting case studies on real datasets, including DBLP and Flickr, and show how interesting and organized knowledge can be generated from database-oriented information networks.

Speakers

Jiawei Han

Jiawei Han, Endowed University Professor, Director of the Intelligent Database Systems Research Laboratory, School of Computing Science, Simon Fraser University, Canada. He has been researching data mining, database systems, data warehousing, spatial and multimedia databases, deductive and object-oriented databases, Web databases, bio-medical databases, etc., with over 150 journal and conference publications. He has chaired or served on many program committees of international conferences and workshops, including the 2002 ICDE conference (vice PC chairman), the 2002 and 2001 SIAM Data Mining conferences (PC co-chair), the 2001 ACM SIGKDD conference (best paper award chair), the 2000 ACM SIGMOD conference (demo/exhibit program chair), and KDD’96 (PC co-chair). He has also served on the editorial boards of IEEE Transactions on Knowledge and Data Engineering, Journal of Intelligent Information Systems, and Data Mining and Knowledge Discovery. His book “Data Mining: Concepts and Techniques” (Morgan Kaufmann, 2001) has been widely adopted as a textbook in many universities.

Knowledge Discovery in Databases

KDD is the automatic extraction of non-obvious, hidden knowledge from large volumes of data. It is the process of searching for hidden knowledge in the massive amounts of data that modern databases accumulate, and it is the most well-known branch of data mining.


Knowledge extraction is the creation of knowledge from structured (relational databases, XML) and unstructured (text, documents, images) sources. The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must represent knowledge in a manner that facilitates inferencing. Although it is methodically similar to information extraction (NLP) and ETL (data warehouse), the main criterion is that the extraction result goes beyond the creation of structured information or the transformation into a relational schema. It requires either the reuse of existing formal knowledge (reusing identifiers or ontologies) or the generation of a schema based on the source data.

The RDB2RDF W3C group [1] is currently standardizing a language for extraction of RDF from relational databases. Another popular example for knowledge extraction is the transformation of Wikipedia into structured data and also the mapping to existing knowledge (see DBpedia and Freebase).
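As a sketch of what such a mapping language looks like, the following R2RML fragment maps rows of a hypothetical users table (columns id and name) to foaf:Person resources. The table layout, base IRI, and mapping name are illustrative assumptions, not part of any particular deployment:

```turtle
@prefix rr:   <http://www.w3.org/ns/r2rml#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# Hypothetical mapping: each row of "users" becomes a foaf:Person
<#UsersMapping>
    rr:logicalTable [ rr:tableName "users" ] ;
    rr:subjectMap [
        rr:template "http://example.org/user/{id}" ;
        rr:class foaf:Person
    ] ;
    rr:predicateObjectMap [
        rr:predicate foaf:name ;
        rr:objectMap [ rr:column "name" ]
    ] .
```

An R2RML processor evaluates such a mapping against the database and emits one foaf:Person per row, with foaf:name taken from the name column.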

Overview

After the standardization of knowledge representation languages such as RDF and OWL, much research has been conducted in the area, especially regarding transforming relational databases into RDF, identity resolution, knowledge discovery and ontology learning. The general process uses traditional methods from information extraction and extract, transform, and load (ETL), which transform the data from the sources into structured formats.

The following criteria can be used to categorize approaches in this topic (some of them only account for extraction from relational databases):[2]

Source: Which data sources are covered: text, relational databases, XML, CSV?
Exposition: How is the extracted knowledge made explicit (ontology file, semantic database)? How can you query it?
Synchronization: Is the knowledge extraction process executed once to produce a dump, or is the result synchronized with the source (static vs. dynamic)? Are changes to the result written back (bi-directional)?
Reuse of vocabularies: Is the tool able to reuse existing vocabularies in the extraction? For example, the table column 'firstName' can be mapped to foaf:firstName. Some automatic approaches are not capable of mapping vocabularies.
Automation: The degree to which the extraction is assisted or automated: manual, GUI, semi-automatic, automatic.
Requires a domain ontology: Is a pre-existing ontology needed to map to? Either a mapping is created or a schema is learned from the source (ontology learning).

Examples

Entity linking

  1. DBpedia Spotlight, OpenCalais, Dandelion dataTXT, the Zemanta API, Extractiv and PoolParty Extractor analyze free text via named-entity recognition, then disambiguate candidates via name resolution and link the found entities to the DBpedia knowledge repository[3] (Dandelion dataTXT demo or DBpedia Spotlight web demo or PoolParty Extractor Demo).

President Obama called Wednesday on Congress to extend a tax break for students included in last year's economic stimulus package, arguing that the policy provides more generous assistance.

As President Obama is linked to a DBpedia LinkedData resource, further information can be retrieved automatically, and a semantic reasoner can, for example, infer that the mentioned entity is of the type Person (using FOAF) and of the type Presidents of the United States (using YAGO). Counter-examples: methods that only recognize entities or link to Wikipedia articles and other targets that do not provide further retrieval of structured data and formal knowledge.

Relational databases to RDF

  1. Triplify, D2R Server, Ultrawrap, and Virtuoso RDF Views are tools that transform relational databases to RDF and allow reusing existing vocabularies and ontologies during the conversion. When transforming a typical relational table named users, one column (e.g. name) or an aggregation of columns (e.g. first_name and last_name) has to provide the URI of the created entity. Normally the primary key is used. Every other column can be extracted as a relation with this entity.[4] Then properties with formally defined semantics are used (and reused) to interpret the information. For example, a column in a user table called marriedTo can be defined as a symmetric relation, and a column homepage can be converted to a property from the FOAF vocabulary called foaf:homepage, thus qualifying it as an inverse functional property. Then each entry of the user table can be made an instance of the class foaf:Person (ontology population). Additionally, domain knowledge (in the form of an ontology) could be created from the status_id, either by manually created rules (if status_id is 2, the entry belongs to the class Teacher) or by (semi-)automated methods (ontology learning). Here is an example transformation:
:Peter :marriedTo :Mary .
:marriedTo a owl:SymmetricProperty .
:Peter foaf:homepage <http://example.org/Peters_page> .
:Peter a foaf:Person .
:Peter a :Student .
:Claus a :Teacher .
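The conversion described above can be sketched in a few lines of Python. The table layout, the status_id rule, and the use of the name column for the URI are illustrative assumptions, not how any particular tool works:

```python
# Illustrative sketch of the table-to-RDF conversion described above.
# Table layout ("users") and the status_id -> class rule are assumptions.

def user_to_triples(row):
    """Convert one row of a hypothetical `users` table to RDF triples
    (subject, predicate, object), reusing FOAF terms where possible."""
    subject = f":{row['name']}"            # the name column supplies the URI
    triples = [(subject, "a", "foaf:Person")]
    if row.get("homepage"):
        triples.append((subject, "foaf:homepage", f"<{row['homepage']}>"))
    if row.get("marriedTo"):
        triples.append((subject, ":marriedTo", f":{row['marriedTo']}"))
    # manually created rule: status_id 1 marks students, 2 marks teachers
    status_class = {1: ":Student", 2: ":Teacher"}.get(row.get("status_id"))
    if status_class:
        triples.append((subject, "a", status_class))
    return triples

rows = [
    {"name": "Peter", "homepage": "http://example.org/Peters_page",
     "marriedTo": "Mary", "status_id": 1},
    {"name": "Claus", "status_id": 2},
]
for row in rows:
    for s, p, o in user_to_triples(row):
        print(f"{s} {p} {o} .")
```

Running the sketch over the two sample rows reproduces triples of the same shape as the example transformation above.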


1:1 Mapping from RDB Tables/Views to RDF Entities/Attributes/Values

When building an RDB representation of a problem domain, the starting point is frequently an entity-relationship diagram (ERD). Typically, each entity is represented as a database table, each attribute of the entity becomes a column in that table, and relationships between entities are indicated by foreign keys. Each table typically defines a particular class of entity, and each column one of its attributes. Each row in the table describes an entity instance, uniquely identified by a primary key. The table rows collectively describe an entity set. In an equivalent RDF representation of the same entity set:

  • Each column in the table is an attribute (i.e., predicate)
  • Each column value is an attribute value (i.e., object)
  • Each row key represents an entity ID (i.e., subject)
  • Each row represents an entity instance
  • Each row (entity instance) is represented in RDF by a collection of triples with a common subject (entity ID).

So, to render an equivalent view based on RDF semantics, the basic mapping algorithm would be as follows:

  1. create an RDFS class for each table
  2. convert all primary keys and foreign keys into IRIs
  3. assign a predicate IRI to each column
  4. assign an rdf:type predicate for each row, linking it to an RDFS class IRI corresponding to the table
  5. for each column that is neither part of a primary or foreign key, construct a triple containing the primary key IRI as the subject, the column IRI as the predicate and the column's value as the object.

An early mention of this basic, or direct, mapping can be found in Tim Berners-Lee's comparison of the ER model to the RDF model.[4]
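The five steps of the basic mapping algorithm can be sketched as follows. The base IRI, table name, and literal encoding are illustrative assumptions, and foreign keys are omitted for brevity:

```python
# Sketch of the basic direct-mapping algorithm above; the base IRI and
# sample table are illustrative assumptions (foreign keys omitted).

BASE = "http://example.org/"

def direct_map(table_name, rows, primary_key):
    """Apply steps 1-5: emit (subject, predicate, object) triples for a table."""
    class_iri = f"<{BASE}{table_name}>"                       # step 1: RDFS class per table
    triples = []
    for row in rows:
        subject = f"<{BASE}{table_name}/{row[primary_key]}>"  # step 2: key -> IRI
        triples.append((subject, "rdf:type", class_iri))      # step 4: type link per row
        for column, value in row.items():
            if column == primary_key:
                continue
            predicate = f"<{BASE}{table_name}#{column}>"      # step 3: predicate IRI per column
            triples.append((subject, predicate, repr(value))) # step 5: literal object
    return triples

people = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
for t in direct_map("person", people, "id"):
    print(t)
```

Each non-key column yields one triple per row, with the primary-key IRI as the common subject, exactly as the bullet list above describes.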

Complex mappings of relational databases to RDF

The 1:1 mapping mentioned above exposes the legacy data as RDF in a straightforward way; additional refinements can be employed to improve the usefulness of the RDF output with respect to the given use cases. Normally, information is lost during the transformation of an entity-relationship diagram (ERD) to relational tables (details can be found in object-relational impedance mismatch) and has to be reverse engineered. From a conceptual view, approaches for extraction can come from two directions. The first direction tries to extract or learn an OWL schema from the given database schema. Early approaches used a fixed amount of manually created mapping rules to refine the 1:1 mapping.[5][6][7] More elaborate methods employ heuristics or learning algorithms to induce schematic information (methods overlap with ontology learning). While some approaches try to extract the information from the structure inherent in the SQL schema[8] (analysing e.g. foreign keys), others analyse the content and the values in the tables to create conceptual hierarchies[9] (e.g. columns with few values are candidates for becoming categories). The second direction tries to map the schema and its contents to a pre-existing domain ontology (see also: ontology alignment). Often, however, a suitable domain ontology does not exist and has to be created first.
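The content-based heuristic mentioned above (columns with few distinct values as category candidates) can be sketched as follows; the threshold and the sample data are assumptions for illustration only:

```python
# Toy version of the content-based heuristic: columns with few distinct
# values are candidates for becoming categories/classes. The threshold
# and sample data are illustrative assumptions.

def category_candidates(rows, max_distinct=3):
    """Return columns whose distinct-value count is small relative to the data."""
    candidates = {}
    if not rows:
        return candidates
    for column in rows[0]:
        values = {row[column] for row in rows}
        # few distinct values, and fewer values than rows -> likely a category
        if 1 < len(values) <= max_distinct and len(values) < len(rows):
            candidates[column] = sorted(values)
    return candidates

users = [
    {"name": "Peter", "role": "student"},
    {"name": "Mary",  "role": "student"},
    {"name": "Claus", "role": "teacher"},
    {"name": "Jane",  "role": "teacher"},
]
print(category_candidates(users))   # 'role' qualifies; 'name' does not
```

A real ontology-learning tool would combine many such signals; this shows only the value-cardinality idea.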

XML

As XML is structured as a tree, any data can be easily represented in RDF, which is structured as a graph. XML2RDF is one example of an approach that uses RDF blank nodes and transforms XML elements and attributes to RDF properties. The topic, however, is more complex than in the case of relational databases. In a relational table the primary key is an ideal candidate for becoming the subject of the extracted triples. An XML element, however, can be transformed, depending on the context, into a subject, a predicate or an object of a triple. XSLT can be used as a standard transformation language to manually convert XML to RDF.
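A minimal sketch of such a transformation, with a deliberately simplified element handling (real tools such as XML2RDF are considerably more involved): elements become blank nodes, while attributes and text content become properties.

```python
# Minimal sketch of an XML2RDF-style transformation: elements become blank
# nodes (_:bN); attributes and text content become properties. The property
# naming scheme is an illustrative assumption.
import xml.etree.ElementTree as ET
from itertools import count

def xml_to_rdf(xml_text):
    """Walk the XML tree and emit triples rooted in blank nodes."""
    counter = count()
    triples = []

    def visit(element):
        node = f"_:b{next(counter)}"
        for name, value in element.attrib.items():
            triples.append((node, f":{name}", repr(value)))
        text = (element.text or "").strip()
        if text:
            triples.append((node, ":value", repr(text)))
        for child in element:
            triples.append((node, f":{child.tag}", visit(child)))
        return node

    visit(ET.fromstring(xml_text))
    return triples

doc = "<person id='1'><name>Peter</name></person>"
for t in xml_to_rdf(doc):
    print(t)
```

Note how the nested name element ends up as the object of a triple, while its text content hangs off its own blank node: the same XML node plays different triple roles depending on context.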

Survey of methods / tools

Name | Data Source | Data Exposition | Data Synchronisation | Mapping Language | Vocabulary Reuse | Mapping Automat. | Req. Domain Ontology | Uses GUI
A Direct Mapping of Relational Data to RDF | Relational Data | SPARQL/ETL | dynamic | N/A | false | automatic | false | false
CSV2RDF4LOD | CSV | ETL | static | RDF | true | manual | false | false
Convert2RDF | Delimited text file | ETL | static | RDF/DAML | true | manual | false | true
D2R Server | RDB | SPARQL | bi-directional | D2R Map | true | manual | false | false
DartGrid | RDB | own query language | dynamic | Visual Tool | true | manual | false | true
DataMaster | RDB | ETL | static | proprietary | true | manual | true | true
Google Refine's RDF Extension | CSV, XML | ETL | static | none |  | semi-automatic | false | true
Krextor | XML | ETL | static | XSLT | true | manual | true | false
MAPONTO | RDB | ETL | static | proprietary | true | manual | true | false
METAmorphoses | RDB | ETL | static | proprietary XML-based mapping language | true | manual | false | true
MappingMaster | CSV | ETL | static | MappingMaster | true | GUI | false | true
ODEMapster | RDB | ETL | static | proprietary | true | manual | true | true
OntoWiki CSV Importer Plug-in - DataCube & Tabular | CSV | ETL | static | The RDF Data Cube Vocabulary | true | semi-automatic | false | true
PoolParty Extraktor (PPX) | XML, Text | LinkedData | dynamic | RDF (SKOS) | true | semi-automatic | true | false
RDBToOnto | RDB | ETL | static | none | false | automatic (the user can fine-tune results) | false | true
RDF 123 | CSV | ETL | static | false | false | manual | false | true
RDOTE | RDB | ETL | static | SQL | true | manual | true | true
Relational.OWL | RDB | ETL | static | none | false | automatic | false | false
T2LD | CSV | ETL | static | false | false | automatic | false | false
The RDF Data Cube Vocabulary | Multidimensional statistical data in spreadsheets |  |  | Data Cube Vocabulary | true | manual | false | 
TopBraid Composer | CSV | ETL | static | SKOS | false | semi-automatic | false | true
Triplify | RDB | LinkedData | dynamic | SQL | true | manual | false | false
Ultrawrap | RDB | SPARQL/ETL | dynamic | R2RML | true | semi-automatic | false | true
Virtuoso RDF Views | RDB | SPARQL | dynamic | Meta Schema Language | true | semi-automatic | false | true
Virtuoso Sponger | structured and semi-structured data sources | SPARQL | dynamic | Virtuoso PL & XSLT | true | semi-automatic | false | false
VisAVis | RDB | RDQL | dynamic | SQL | true | manual | true | true
XLWrap: Spreadsheet to RDF | CSV | ETL | static | TriG Syntax | true | manual | false | false
XML to RDF | XML | ETL | static | false | false | automatic | false | false

Extraction from natural language sources

The largest portion of information contained in business documents (about 80%[10]) is encoded in natural language and therefore unstructured. Because unstructured data poses a challenge for knowledge extraction, more sophisticated methods are required, which generally tend to deliver worse results than on structured data. The potential for a massive acquisition of extracted knowledge, however, should compensate for the increased complexity and decreased quality of extraction. In the following, natural language sources are understood as sources of information where the data is given as unstructured plain text. If the text is additionally embedded in a markup document (e.g. an HTML document), the systems mentioned normally remove the markup elements automatically.

Traditional information extraction (IE)

Traditional information extraction[11] is a technology of natural language processing, which extracts information from typically natural language texts and structures it in a suitable manner. The kinds of information to be identified must be specified in a model before the process begins, which is why the whole process of traditional information extraction is domain dependent. IE is split into the following five subtasks.

The task of named entity recognition (NER) is to recognize and categorize all named entities contained in a text (assignment of a named entity to a predefined category). This works by applying grammar-based methods or statistical models.
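A toy gazetteer-based recognizer illustrates the idea; the lexicon and its categories are invented for the example and are far simpler than real grammar-based or statistical systems:

```python
# Toy named-entity recognizer in the gazetteer/grammar style mentioned above.
# The lexicon and categories are illustrative assumptions, not a real system.
import re

GAZETTEER = {
    "IBM": "Organization",
    "Obama": "Person",
    "Congress": "Organization",
}

def recognize_entities(text):
    """Find capitalized tokens and categorize the ones listed in the gazetteer."""
    entities = []
    for match in re.finditer(r"\b[A-Z][a-zA-Z]*\b", text):
        token = match.group()
        if token in GAZETTEER:
            entities.append((token, GAZETTEER[token]))
    return entities

print(recognize_entities("Obama called on Congress on Wednesday."))
```

Statistical NER systems replace the fixed lexicon with a model trained on annotated text, but the output shape (entity span plus category) is the same.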

Coreference resolution (CO) identifies equivalent entities that were recognized by NER within a text. There are two relevant kinds of equivalence relationship: the relationship between two differently represented entities (e.g. IBM Europe and IBM) and the relationship between an entity and its anaphoric references (e.g. it and IBM). Both kinds can be recognized by coreference resolution.

During template element (TE) construction, the IE system identifies descriptive properties of entities recognized by NER and CO. These properties correspond to ordinary qualities like red or big.

Template relation (TR) construction identifies relations that exist between the template elements. These relations can be of several kinds, such as works-for or located-in, with the restriction that both domain and range correspond to entities.

During template scenario (ST) production, events described in the text are identified and structured with respect to the entities recognized by NER and CO and the relations identified by TR.

Ontology-based information extraction (OBIE)

Ontology-based information extraction (OBIE)[10] is a subfield of information extraction in which at least one ontology is used to guide the extraction of information from natural language text. An OBIE system uses methods of traditional information extraction to identify the concepts, instances and relations of the used ontologies in the text, which are structured into an ontology after the process. Thus, the input ontologies constitute the model of the information to be extracted.

Ontology learning (OL)

Main article: Ontology learning

Ontology learning is the automatic or semi-automatic creation of ontologies, including extracting the corresponding domain's terms from natural language text. As building ontologies manually is extremely labor-intensive and time-consuming, there is great motivation to automate the process.

Semantic annotation (SA)

During semantic annotation,[12] natural language text is augmented with metadata (often represented in RDFa), which should make the semantics of the contained terms machine-understandable. In this generally semi-automatic process, knowledge is extracted in the sense that a link is established between lexical terms and, for example, concepts from ontologies. Thus it becomes known which meaning of a term was intended in the processed context, and therefore the meaning of the text is grounded in machine-readable data with the ability to draw inferences. Semantic annotation is typically split into the following two subtasks.

  1. Terminology extraction
  2. Entity linking
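A sketch of what annotated output might look like in RDFa; the choice of FOAF for typing and the exact markup are illustrative assumptions:

```html
<!-- Illustrative sketch: "Obama" is linked to a DBpedia resource and typed
     via the FOAF vocabulary; attribute choices follow RDFa conventions. -->
<p vocab="http://xmlns.com/foaf/0.1/">
  <span about="http://dbpedia.org/resource/Barack_Obama"
        typeof="Person"
        property="name">Obama</span>
  called on Congress to extend a tax break for students.
</p>
```

An RDFa-aware consumer can extract from this markup the triple that the DBpedia resource has the name "Obama" and is of type Person.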

At the terminology extraction level, lexical terms are extracted from the text. For this purpose, a tokenizer first determines the word boundaries and resolves abbreviations. Afterwards, terms that correspond to a concept are extracted from the text with the help of a domain-specific lexicon, to be linked during entity linking.
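A minimal sketch of this step, with an invented domain lexicon standing in for a real one:

```python
# Sketch of the terminology-extraction step: tokenize, then keep tokens that
# a (hypothetical) domain lexicon maps to concepts.
import re

LEXICON = {
    "tax": "dbpedia:Tax",
    "students": "dbpedia:Student",
    "stimulus": "dbpedia:Stimulus_(economics)",
}

def extract_terms(text):
    """Tokenize on word boundaries and look each token up in the lexicon."""
    tokens = re.findall(r"[A-Za-z]+", text.lower())
    return [(token, LEXICON[token]) for token in tokens if token in LEXICON]

print(extract_terms("a tax break for students"))
```

A production tokenizer would also handle abbreviations and multi-word terms; the lookup step stays conceptually the same.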

In entity linking,[13] a link is established between the extracted lexical terms from the source text and concepts from an ontology or knowledge base such as DBpedia. For this, candidate concepts are detected for the possible meanings of a term with the help of a lexicon. Finally, the context of the terms is analyzed to determine the most appropriate disambiguation and to assign each term to the correct concept.
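The context-based disambiguation can be sketched as a word-overlap score; the candidate concepts and their context words are invented for the example, and real linkers use much richer statistical features:

```python
# Toy disambiguation for entity linking: choose the candidate concept whose
# context words overlap most with the term's surrounding text. Candidate
# lists and context words are illustrative assumptions.

CANDIDATES = {
    "jaguar": {
        "dbpedia:Jaguar_Cars": {"car", "engine", "drive"},
        "dbpedia:Jaguar": {"animal", "jungle", "prey"},
    }
}

def link_entity(term, context_words):
    """Pick the candidate concept with the largest context-word overlap."""
    scored = {
        concept: len(words & set(context_words))
        for concept, words in CANDIDATES[term].items()
    }
    return max(scored, key=scored.get)

print(link_entity("jaguar", ["the", "jaguar", "prowled", "the", "jungle"]))
```

The same term links to different concepts depending on its surroundings, which is exactly the behavior the paragraph above describes.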

Tools

The following criteria can be used to categorize tools that extract knowledge from natural language text.

Source: Which input formats can be processed by the tool (e.g. plain text, HTML or PDF)?
Access Paradigm: Can the tool query the data source, or does it require a whole dump for the extraction process?
Data Synchronization: Is the result of the extraction process synchronized with the source?
Uses Output Ontology: Does the tool link the result with an ontology?
Mapping Automation: How automated is the extraction process (manual, semi-automatic or automatic)?
Requires Ontology: Does the tool need an ontology for the extraction?
Uses GUI: Does the tool offer a graphical user interface?
Approach: Which approach (IE, OBIE, OL or SA) is used by the tool?
Extracted Entities: Which types of entities (e.g. named entities, concepts or relationships) can be extracted by the tool?
Applied Techniques: Which techniques are applied (e.g. NLP, statistical methods, clustering or machine learning)?
Output Model: Which model is used to represent the result of the tool (e.g. RDF or OWL)?
Supported Domains: Which domains are supported (e.g. economy or biology)?
Supported Languages: Which languages can be processed (e.g. English or German)?

The following table characterizes some tools for Knowledge Extraction from natural language sources.

Name | Source | Access Paradigm | Data Synchronization | Uses Output Ontology | Mapping Automation | Requires Ontology | Uses GUI | Approach | Extracted Entities | Applied Techniques | Output Model | Supported Domains | Supported Languages
AeroText[14] | plain text, HTML, XML, SGML | dump | no | yes | automatic | yes | yes | IE | named entities, relationships, events | linguistic rules | proprietary | domain-independent | English, Spanish, Arabic, Chinese, Indonesian
AlchemyAPI[15] | plain text, HTML |  |  |  | automatic |  | yes | SA |  |  |  |  | multilingual
ANNIE[16] | plain text | dump |  | yes |  |  | yes | IE |  | finite state algorithms |  |  | multilingual
ASIUM[17] | plain text | dump |  |  | semi-automatic |  | yes | OL | concepts, concept hierarchy | NLP, clustering |  |  | 
Attensity Exhaustive Extraction[18] |  |  |  |  | automatic |  |  | IE | named entities, relationships, events | NLP |  |  | 
Dandelion API | plain text, HTML, URL | REST | no | no | automatic | no | yes | SA | named entities, concepts | statistical methods | JSON | domain-independent | multilingual
DBpedia Spotlight[19] | plain text, HTML | dump, SPARQL | yes | yes | automatic | no | yes | SA | annotation to each word, annotation to non-stopwords | NLP, statistical methods, machine learning | RDFa | domain-independent | English
EntityClassifier.eu | plain text, HTML | dump | yes | yes | automatic | no | yes | IE, OL, SA | annotation to each word, annotation to non-stopwords | rule-based grammar | XML | domain-independent | English, German, Dutch
K-Extractor[20][21] | plain text, HTML, XML, PDF, MS Office, e-mail | dump, SPARQL | yes | yes | automatic | no | yes | IE, OL, SA | concepts, named entities, instances, concept hierarchy, generic relationships, user-defined relationships, events, modality, tense, entity linking, event linking, sentiment | NLP, machine learning, heuristic rules | RDF, OWL, proprietary XML | domain-independent | English, Spanish
iDocument[22] | HTML, PDF, DOC | SPARQL |  | yes |  | yes |  | OBIE | instances, property values | NLP |  | personal, business | 
NetOwl Extractor[23] | plain text, HTML, XML, SGML, PDF, MS Office | dump | no | yes | automatic | yes | yes | IE | named entities, relationships, events | NLP | XML, JSON, RDF-OWL, others | multiple domains | English, Arabic, Chinese (Simplified and Traditional), French, Korean, Persian (Farsi and Dari), Russian, Spanish
OntoGen[24] |  |  |  |  | semi-automatic |  | yes | OL | concepts, concept hierarchy, non-taxonomic relations, instances | NLP, machine learning, clustering |  |  | 
OntoLearn[25] | plain text, HTML | dump | no | yes | automatic | yes | no | OL | concepts, concept hierarchy, instances | NLP, statistical methods | proprietary | domain-independent | English
OntoLearn Reloaded | plain text, HTML | dump | no | yes | automatic | yes | no | OL | concepts, concept hierarchy, instances | NLP, statistical methods | proprietary | domain-independent | English
OntoSyphon[26] | HTML, PDF, DOC | dump, search engine queries | no | yes | automatic | yes | no | OBIE | concepts, relations, instances | NLP, statistical methods | RDF | domain-independent | English
ontoX[27] | plain text | dump | no | yes | semi-automatic | yes | no | OBIE | instances, datatype property values | heuristic-based methods | proprietary | domain-independent | language-independent
OpenCalais | plain text, HTML, XML | dump | no | yes | automatic | yes | no | SA | annotation to entities, annotation to events, annotation to facts | NLP, machine learning | RDF | domain-independent | English, French, Spanish
PoolParty Extractor[28] | plain text, HTML, DOC, ODT | dump | no | yes | automatic | yes | yes | OBIE | named entities, concepts, relations, concepts that categorize the text, enrichments | NLP, machine learning, statistical methods | RDF, OWL | domain-independent | English, German, Spanish, French
Rosoka | plain text, HTML, XML, SGML, PDF, MS Office | dump | yes | yes | automatic | no | yes | IE | named entity extraction, entity resolution, relationship extraction, attributes, concepts, multi-vector sentiment analysis, geotagging, language identification, machine learning | NLP | XML, JSON, POJO | multiple domains | multilingual (200+ languages)
SCOOBIE | plain text, HTML | dump | no | yes | automatic | no | no | OBIE | instances, property values, RDFS types | NLP, machine learning | RDF, RDFa | domain-independent | English, German
SemTag[29][30] | HTML | dump | no | yes | automatic | yes | no | SA |  | machine learning | database record | domain-independent | language-independent
smart FIX | plain text, HTML, PDF, DOC, e-mail | dump | yes | no | automatic | no | yes | OBIE | named entities | NLP, machine learning | proprietary | domain-independent | English, German, French, Dutch, Polish
Text2Onto[31] | plain text, HTML, PDF | dump | yes | no | semi-automatic | yes | yes | OL | concepts, concept hierarchy, non-taxonomic relations, instances, axioms | NLP, statistical methods, machine learning, rule-based methods | OWL | domain-independent | English, German, Spanish
Text-To-Onto[32] | plain text, HTML, PDF, PostScript | dump |  |  | semi-automatic | yes | yes | OL | concepts, concept hierarchy, non-taxonomic relations, lexical entities referring to concepts, lexical entities referring to relations | NLP, machine learning, clustering, statistical methods |  |  | German
ThatNeedle | plain text | dump |  |  | automatic | no |  |  | concepts, relations, hierarchy | NLP, proprietary | JSON | multiple domains | English
The Wiki Machine[33] | plain text, HTML, PDF, DOC | dump | no | yes | automatic | yes | yes | SA | annotation to proper nouns, annotation to common nouns | machine learning | RDFa | domain-independent | English, German, Spanish, French, Portuguese, Italian, Russian
ThingFinder[34] |  |  |  |  |  |  |  | IE | named entities, relationships, events |  |  |  | multilingual

Knowledge discovery

Knowledge discovery describes the process of automatically searching large volumes of data for patterns that can be considered knowledge about the data.[35] It is often described as deriving knowledge from the input data. Knowledge discovery developed out of the data mining domain, and is closely related to it both in terms of methodology and terminology.[36]

The most well-known branch of data mining is knowledge discovery, also known as knowledge discovery in databases (KDD). Like many other forms of knowledge discovery, it creates abstractions of the input data. The knowledge obtained through the process may become additional data that can be used for further discovery. Often the outcomes of knowledge discovery are not actionable; actionable knowledge discovery, also known as domain-driven data mining,[37] aims to discover and deliver actionable knowledge and insights.

Another promising application of knowledge discovery is in the area of software modernization, weakness discovery and compliance, which involves understanding existing software artifacts. This process is related to the concept of reverse engineering. Usually the knowledge obtained from existing software is presented in the form of models to which specific queries can be made when necessary. An entity-relationship model is a frequent format for representing knowledge obtained from existing software. The Object Management Group (OMG) developed the Knowledge Discovery Metamodel (KDM) specification, which defines an ontology for software assets and their relationships for the purpose of performing knowledge discovery on existing code. Knowledge discovery from existing software systems, also known as software mining, is closely related to data mining, since existing software artifacts contain enormous value for risk management and business value, and are key to the evaluation and evolution of software systems. Instead of mining individual data sets, software mining focuses on metadata, such as process flows (e.g. data flows, control flows and call maps), architecture, database schemas, and business rules/terms/processes.


References

  1. RDB2RDF Working Group, website: http://www.w3.org/2001/sw/rdb2rdf/, charter: http://www.w3.org/2009/08/rdb2rdf-charter, R2RML: RDB to RDF Mapping Language: http://www.w3.org/TR/r2rml/
  2. LOD2 EU Deliverable 3.1.1, Knowledge Extraction from Structured Sources, http://static.lod2.eu/Deliverables/deliverable-3.1.1.pdf
  3. "Life in the Linked Data Cloud". www.opencalais.com. Archived from the original on 2009-11-24. Retrieved 2009-11-10.
  4. Tim Berners-Lee (1998), "Relational Databases on the Semantic Web". Retrieved: February 20, 2011.
  5. Hu et al. (2007), "Discovering Simple Mappings Between Relational Database Schemas and Ontologies", in Proc. of the 6th International Semantic Web Conference (ISWC 2007), 2nd Asian Semantic Web Conference (ASWC 2007), LNCS 4825, pages 225-238, Busan, Korea, 11-15 November 2007. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.97.6934&rep=rep1&type=pdf
  6. R. Ghawi and N. Cullot (2007), "Database-to-Ontology Mapping Generation for Semantic Interoperability". In Third International Workshop on Database Interoperability (InterDB 2007). http://le2i.cnrs.fr/IMG/publications/InterDB07-Ghawi.pdf
  7. Li et al. (2005), "A Semi-automatic Ontology Acquisition Method for the Semantic Web", WAIM, volume 3739 of Lecture Notes in Computer Science, pages 209-220. Springer. doi:10.1007/11563952_19
  8. Tirmizi et al. (2008), "Translating SQL Applications to the Semantic Web", Lecture Notes in Computer Science, Volume 5181/2008 (Database and Expert Systems Applications). http://citeseer.ist.psu.edu/viewdoc/download;jsessionid=15E8AB2A37BD06DAE59255A1AC3095F0?doi=10.1.1.140.3169&rep=rep1&type=pdf
  9. Farid Cerbah (2008), "Learning Highly Structured Semantic Repositories from Relational Databases", The Semantic Web: Research and Applications, volume 5021 of Lecture Notes in Computer Science, Springer, Berlin/Heidelberg. http://www.tao-project.eu/resources/publications/cerbah-learning-highly-structured-semantic-repositories-from-relational-databases.pdf
  10. Wimalasuriya, Daya C.; Dou, Dejing (2010). "Ontology-based information extraction: An introduction and a survey of current approaches", Journal of Information Science, 36(3), pp. 306-323, http://ix.cs.uoregon.edu/~dou/research/papers/jis09.pdf (retrieved: 18.06.2012).
  11. Cunningham, Hamish (2005). "Information Extraction, Automatic", Encyclopedia of Language and Linguistics, 2, pp. 665-677, http://gate.ac.uk/sale/ell2/ie/main.pdf (retrieved: 18.06.2012).
  12. Erdmann, M.; Maedche, Alexander; Schnurr, H.-P.; Staab, Steffen (2000). "From Manual to Semi-automatic Semantic Annotation: About Ontology-based Text Annotation Tools", Proceedings of COLING, http://www.ida.liu.se/ext/epa/cis/2001/002/paper.pdf (retrieved: 18.06.2012).
  13. Rao, Delip; McNamee, Paul; Dredze, Mark (2011). "Entity Linking: Finding Extracted Entities in a Knowledge Base", Multi-source, Multi-lingual Information Extraction and Summarization, http://www.cs.jhu.edu/~delip/entity-linking.pdf (retrieved: 18.06.2012).
  14. Rocket Software, Inc. (2012). "technology for extracting intelligence from text", http://www.rocketsoftware.com/products/aerotext (retrieved: 18.06.2012).
  15. Orchestr8 (2012): "AlchemyAPI Overview", http://www.alchemyapi.com/api (retrieved: 18.06.2012).
  16. The University of Sheffield (2011). "ANNIE: a Nearly-New Information Extraction System", http://gate.ac.uk/sale/tao/splitch6.html#chap:annie (retrieved: 18.06.2012).
  17. ILP Network of Excellence. "ASIUM (LRI)", http://www-ai.ijs.si/~ilpnet2/systems/asium.html (retrieved: 18.06.2012).
  18. Attensity (2012). "Exhaustive Extraction", http://www.attensity.com/products/technology/semantic-server/exhaustive-extraction/ (retrieved: 18.06.2012).
  19. Mendes, Pablo N.; Jakob, Max; Garcia-Sílva, Andrés; Bizer, Christian (2011). "DBpedia Spotlight: Shedding Light on the Web of Documents", Proceedings of the 7th International Conference on Semantic Systems, pp. 1-8, http://www.wiwiss.fu-berlin.de/en/institute/pwo/bizer/research/publications/Mendes-Jakob-GarciaSilva-Bizer-DBpediaSpotlight-ISEM2011.pdf (retrieved: 18.06.2012).
  20. Balakrishna, Mithun; Moldovan, Dan (2013). "Automatic Building of Semantically Rich Domain Models from Unstructured Data", Proceedings of the Twenty-Sixth International Florida Artificial Intelligence Research Society Conference (FLAIRS), pp. 22-27, http://www.aaai.org/ocs/index.php/FLAIRS/FLAIRS13/paper/view/5909/6036 (retrieved: 11.08.2014).
  21. Moldovan, Dan; Blanco, Eduardo (2012). "Polaris: Lymba's Semantic Parser", Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC), pp. 66-72, http://www.lrec-conf.org/proceedings/lrec2012/pdf/176_Paper.pdf (retrieved: 11.08.2014).
  22. Adrian, Benjamin; Maus, Heiko; Dengel, Andreas (2009). "iDocument: Using Ontologies for Extracting Information from Text", http://www.dfki.uni-kl.de/~maus/dok/AdrianMausDengel09.pdf (retrieved: 18.06.2012).
  23. SRA International, Inc. (2012). "NetOwl Extractor", http://www.sra.com/netowl/entity-extraction/ (retrieved: 18.06.2012).
  24. Fortuna, Blaz; Grobelnik, Marko; Mladenic, Dunja (2007). "OntoGen: Semi-automatic Ontology Editor", Proceedings of the 2007 Conference on Human Interface, Part 2, pp. 309-318, http://analytics.ijs.si/~blazf/papers/OntoGen2_HCII2007.pdf (retrieved: 18.06.2012).
  25. ^Missikoff, Michele; Navigli, Roberto; Velardi, Paola (2002). "Integrated Approach to Web Ontology Learning and Engineering", Computer, 35(11), p. 60 - 63, http://wwwusers.di.uniroma1.it/~velardi/IEEE_C.pdf (retrieved: 18.06.2012).
  26. ^McDowell, Luke K.; Cafarella, Michael (2006). "Ontology-driven Information Extraction with OntoSyphon", Proceedings of the 5th international conference on The Semantic Web, p. 428 - 444, http://turing.cs.washington.edu/papers/iswc2006McDowell-final.pdf (retrieved: 18.06.2012).
  27. ^Yildiz, Burcu; Miksch, Silvia (2007). "ontoX - A Method for Ontology-Driven Information Extraction", Proceedings of the 2007 international conference on Computational science and its applications, 3, p. 660 - 673, http://publik.tuwien.ac.at/files/pub-inf_4769.pdf (retrieved: 18.06.2012).
  28. ^semanticweb.org (2011). "PoolParty Extractor", http://semanticweb.org/wiki/PoolParty_Extractor (retrieved: 18.06.2012).
  29. ^Dill, Stephen; Eiron, Nadav; Gibson, David; Gruhl, Daniel; Guha, R.; Jhingran, Anant; Kanungo, Tapas; Rajagopalan, Sridhar; Tomkins, Andrew; Tomlin, John A.; Zien, Jason Y. (2003). "SemTag and Seeker: Bootstraping the Semantic Web via Automated Semantic Annotation", Proceedings of the 12th international conference on World Wide Web, p. 178 - 186, http://www2003.org/cdrom/papers/refereed/p831/p831-dill.html (retrieved: 18.06.2012).
  30. ^Uren, Victoria; Cimiano, Philipp; Iria, José; Handschuh, Siegfried; Vargas-Vera, Maria; Motta, Enrico; Ciravegna, Fabio (2006). "Semantic annotation for knowledge management: Requirements and a survey of the state of the art", Web Semantics: Science, Services and Agents on the World Wide Web, 4(1), p. 14 - 28, http://staffwww.dcs.shef.ac.uk/people/J.Iria/iria_jws06.pdf, (retrieved: 18.06.2012).
  31. ^Cimiano, Philipp; Völker, Johanna (2005). "Text2Onto - A Framework for Ontology Learning and Data-Driven Change Discovery", Proceedings of the 10th International Conference of Applications of Natural Language to Information Systems, 3513, p. 227 - 238, http://www.cimiano.de/Publications/2005/nldb05/nldb05.pdf (retrieved: 18.06.2012).
  32. ^Maedche, Alexander; Volz, Raphael (2001). "The Ontology Extraction & Maintenance Framework Text-To-Onto", Proceedings of the IEEE International Conference on Data Mining, http://users.csc.calpoly.edu/~fkurfess/Events/DM-KM-01/Volz.pdf (retrieved: 18.06.2012).
  33. ^Machine Linking. "We connect to the Linked Open Data cloud", http://thewikimachine.fbk.eu/html/index.html (retrieved: 18.06.2012).
  34. ^Inxight Federal Systems (2008). "Inxight ThingFinder and ThingFinder Professional", http://inxightfedsys.com/products/sdks/tf/ (retrieved: 18.06.2012).
  35. ^Frawley William. F. et al. (1992), "Knowledge Discovery in Databases: An Overview", AI Magazine (Vol 13, No 3), 57-70 (online full version: http://www.aaai.org/ojs/index.php/aimagazine/article/viewArticle/1011)
  36. ^Fayyad U. et al. (1996), "From Data Mining to Knowledge Discovery in Databases", AI Magazine (Vol 17, No 3), 37-54 (online full version: http://www.aaai.org/ojs/index.php/aimagazine/article/viewArticle/1230
  37. ^Cao, L. (2010). "Domain driven data mining: challenges and prospects". IEEE Trans. on Knowledge and Data Engineering. 22 (6): 755–769. doi:10.1109/tkde.2010.32. 