Computationally Efficient Induction of Classification Rules with the PMCRI and J-PMCRI Framework

This paper has recently been accepted for publication in the journal “Knowledge-Based Systems”. The accepted manuscript is accessible online at:

http://www.sciencedirect.com/science/article/pii/S0950705112001104?v=s5

Authors: Frederic Stahl and Max Bramer

Abstract

In order to gain knowledge from large databases, scalable data mining technologies are needed. Data are captured on a large scale and thus databases are increasing at a fast pace. This leads to the utilisation of parallel computing technologies in order to cope with large amounts of data. In the area of classification rule induction, parallelisation of classification rules has focused on the divide and conquer approach, also known as the Top Down Induction of Decision Trees (TDIDT). An alternative approach to classification rule induction is separate and conquer which has only recently been in the focus of parallelisation. This work introduces and evaluates empirically a framework for the parallel induction of classification rules, generated by members of the Prism family of algorithms. All members of the Prism family of algorithms follow the separate and conquer approach.
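To make the separate and conquer idea concrete, here is a minimal single-machine sketch of Prism-style rule induction in Python. It is not the PMCRI or J-PMCRI framework and not the authors' code: it assumes purely categorical attributes, and the names `induce_prism_rules`, `matches`, `classify` and the toy `weather` data are hypothetical, chosen only for illustration.

```python
def matches(terms, row):
    """True if the row satisfies every attribute=value term of a rule."""
    return all(row[attr] == val for attr, val in terms.items())


def induce_prism_rules(rows, target):
    """Induce rules as (terms, class) pairs, one target class at a time."""
    rules = []
    attributes = [a for a in rows[0] if a != target]
    for cls in sorted({r[target] for r in rows}):
        remaining = list(rows)  # each class starts from the full training set
        while any(r[target] == cls for r in remaining):
            covered, terms = remaining, {}
            # Conquer: add terms until the rule covers only the target class.
            while any(r[target] != cls for r in covered):
                best = None
                for attr in attributes:
                    if attr in terms:
                        continue
                    for val in sorted({r[attr] for r in covered}):
                        subset = [r for r in covered if r[attr] == val]
                        hits = sum(r[target] == cls for r in subset)
                        # Highest p(class | attr=val); ties go to more coverage.
                        score = (hits / len(subset), hits)
                        if best is None or score > best[0]:
                            best = (score, attr, val)
                if best is None:  # no attributes left: accept an impure rule
                    break
                _, attr, val = best
                terms[attr] = val
                covered = [r for r in covered if r[attr] == val]
            rules.append((terms, cls))
            # Separate: remove the instances the new rule covers.
            remaining = [r for r in remaining if not matches(terms, r)]
    return rules


def classify(rules, row):
    """Return the class of the first matching rule, or None if none fires."""
    for terms, cls in rules:
        if matches(terms, row):
            return cls
    return None


# Hypothetical toy data, for illustration only.
weather = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
]
```

Calling `induce_prism_rules(weather, "play")` yields one rule per class here, each testing only `windy`. The rules are modular: each one stands alone rather than being a path through a shared tree.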

Keywords: Parallel Computing; Parallel Rule Induction; Modular Classification Rule Induction; PMCRI; J-PMCRI; Prism

My Google Scholar Profile is now activated!

http://scholar.google.co.uk/citations?user=gD96Zu8AAAAJ&hl=en

BCS SGAI: Free one-day event for AI Research Students

FAIRS’11 – Forum for AI Research Students 2011

Call for Participation

A full day of workshops and presentations FREE TO STUDENTS
December 12th 2011, Cambridge, UK
Deadline for applications: November 28th 2011

http://www.ai-research.org.uk/FAIRS2011/
http://www.bcs-sgai.org/ai2011
***********************************************

AI-2011 is the latest in the leading series of UK-based international conferences on Artificial Intelligence run by the BCS Specialist Group on Artificial Intelligence (SGAI). In 2007 we launched the Forum for AI Research Students to support student members of the AI community, and repeated the event in the following three years. SGAI is again offering this event at AI-2011. This event offers a full day of workshops and presentations and is FREE TO STUDENTS.

A one day event, FREE OF CHARGE to students, offering PhD and MRes/MPhil students in AI fields:
*   Guidance on conducting research and writing their thesis
*   Advice on undergoing the viva
*   Advice and discussion on careers after a PhD
*   The chance to meet other research students and develop networks
*   An opportunity to discuss their work with senior researchers and practitioners

This year we will also be holding the elections for a new student representative to the SGAI committee.

Overview
FAIRS’11 is intended to allow PhD and MRes/MPhil students working on AI related projects to discuss their research and issues around doing a PhD with senior scholars in the field and with other Forum attendees. The forum is aimed at those who have either completed a Master’s degree in an AI field and are interested in progressing to a PhD, or are current doctoral or research Master’s students. Students will be expected to contribute to small workshop discussions on issues related to the conduct of the PhD. It will be a stand-alone programme which will take place on December 12th, the day preceding the main AI-2011 conference workshops and tutorials.

Programme
The Forum will include:

*   Getting a PhD – Workshop on conducting research and developing and writing a thesis.
*   Student discussion groups
    *   Parallel streams, for students at different stages of the PhD.
    *   Discussion of issues and problems experienced by PhD students. All encouraged to share and get involved!
*   Panel sessions – Question and answer sessions with a panel comprising recent PhD graduates, AI academics, industrial representatives and PhD employers. Focus on:
    *   doing a PhD
    *   careers and employment
*   How To Survive your Viva! – A lighthearted presentation on how to (and how not to) prepare for and perform at a Viva.
*   New this year – hustings and voting for the student representative to the SGAI committee.

Further Information
Full details of the Forum, including the application process, are at
http://www.ai-research.org.uk/FAIRS2011/. As part of SGAI’s aim to
support student members of the AI community, this event will be free
to student attendees. Please encourage research students to attend
this event. There is no need for attendees to register for AI-2011.
However, a special student rate has been arranged – full details at
http://www.bcs-sgai.org/ai2011.

Best Paper Prize will be awarded at the Thirty-first SGAI International Conference, Cambridge, UK, 13th-15th December 2011

Title: Random Prism: An Alternative to Random Forests

Authors: Frederic Stahl and Max Bramer

Abstract:

Ensemble learning techniques generate multiple classifiers, so-called base classifiers, whose combined classification results are used in order to increase the overall classification accuracy. In most ensemble classifiers the base classifiers are based on the Top Down Induction of Decision Trees (TDIDT) approach. However, an alternative approach for the induction of rule based classifiers is the Prism family of algorithms. Prism algorithms produce modular classification rules that do not necessarily fit into a decision tree structure. Prism classification rulesets achieve a comparable and sometimes higher classification accuracy compared with decision tree classifiers, if the data is noisy and large. Yet Prism still suffers from overfitting on noisy and large datasets. In practice, ensemble techniques tend to reduce overfitting; however, no ensemble learner exists for modular classification rule inducers such as the Prism family of algorithms. This article describes the first development of an ensemble learner based on the Prism family of algorithms in order to enhance Prism’s classification accuracy by reducing overfitting.
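The general ensemble idea in the abstract (several base classifiers whose predictions are combined) can be sketched in a few lines of Python. This is a generic illustration using bootstrap sampling and majority voting, not the Random Prism algorithm itself. The function names are hypothetical, and base classifiers are assumed to be plain callables that map an instance to a class label.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) instances with replacement (one sample per base classifier)."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(predictions):
    """Return the most frequent label among the base classifiers' predictions."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(base_classifiers, instance):
    """Combine the base classifiers' individual predictions by majority vote."""
    return majority_vote([clf(instance) for clf in base_classifiers])


# Stand-in base classifiers (assumption): in practice each would be a rule set
# or tree induced from its own bootstrap sample of the training data.
stubs = [lambda x: "yes", lambda x: "yes", lambda x: "no"]
sample = bootstrap_sample(["a", "b", "c", "d"], random.Random(0))
```

Because each base classifier sees a slightly different sample, their individual errors tend to differ, and the vote smooths them out. That is the usual intuition for why ensembles reduce overfitting.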

Jmax-pruning: A Facility for the Information Theoretic Pruning of Modular Classification Rules

This paper has recently been accepted for publication in the journal “Knowledge-Based Systems”. The accepted manuscript is accessible online at:

http://dx.doi.org/10.1016/j.knosys.2011.06.016

Authors: Frederic Stahl and Max Bramer

Abstract

The Prism family of algorithms induces modular classification rules in contrast to the Top Down Induction of Decision Trees (TDIDT) approach, which induces classification rules in the intermediate form of a tree structure. Both approaches achieve a comparable classification accuracy. However, in some cases Prism outperforms TDIDT. For both approaches pre-pruning facilities have been developed in order to prevent the induced classifiers from overfitting on noisy datasets, by cutting rule terms or whole rules or by truncating decision trees according to certain metrics. There have been many pre-pruning mechanisms developed for the TDIDT approach, but for the Prism family the only existing pre-pruning facility is J-pruning. J-pruning not only works on Prism algorithms but also on TDIDT. Although it has been shown that J-pruning produces good results, this work points out that J-pruning does not use its full potential. The original J-pruning facility is examined and the use of a new pre-pruning facility, called Jmax-pruning, is proposed and evaluated empirically. A possible pre-pruning facility for TDIDT based on Jmax-pruning is also discussed.

Highlights

► We improve a rule pruning method for modular classification rules.
► We examine the information theoretical shortcoming of the J-pruning approach.
► Our Jmax-Pruning is based on the rule’s maximum theoretical information content.
► Empirical results show a significant improvement of Jmax-Pruning to J-pruning.

Keywords: J-pruning; Jmax-pruning; Modular Classification Rule Induction; Pre-pruning
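For readers unfamiliar with the measure behind these pruning facilities, the sketch below computes the J-measure of a rule "IF X=x THEN Y=y" and its theoretical upper bound, as they are commonly defined in the rule induction literature. It is an illustration in Python, not code from the paper. The function names and the choice of probabilities as inputs are assumptions, and the class prior `p_y` is assumed to lie strictly between 0 and 1.

```python
import math


def j_measure(p_x, p_y, p_y_given_x):
    """Average information content of a rule IF X=x THEN Y=y, in bits.

    p_x: probability that the rule's left-hand side fires
    p_y: prior probability of the predicted class (0 < p_y < 1)
    p_y_given_x: probability of the class given that the rule fires
    """
    def term(posterior, prior):
        # By convention 0 * log(0) is taken as 0.
        return 0.0 if posterior == 0 else posterior * math.log2(posterior / prior)

    return p_x * (term(p_y_given_x, p_y) + term(1 - p_y_given_x, 1 - p_y))


def j_max(p_x, p_y, p_y_given_x):
    """Upper bound on the J value reachable by further specialising the rule."""
    return p_x * max(
        p_y_given_x * math.log2(1 / p_y),
        (1 - p_y_given_x) * math.log2(1 / (1 - p_y)),
    )
```

A rule that fires on half the data and is always right about a class with prior 0.5 has a J value of 0.5 bits, which equals its bound. A rule whose posterior equals the prior carries no information (J = 0) although its bound is still positive. How these values drive the pruning decision is described in the paper and is not reproduced here.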

Cite this Article: This article can already be cited using the year of online availability and the DOI as follows: Author(s), Article Title, Journal (Year), DOI.

Related Work: Please find an earlier publication about Jmax-Pruning here:

Stahl, F., Bramer, M. (2011). Induction of Modular Classification Rules: Using Jmax-pruning. In Thirtieth SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence, Cambridge: Springer.

Upcoming Talk at the UK Symposium on Knowledge Discovery and Data Mining

I have been invited to give a talk about the Pocket Data Mining system. More information about the event can be found here: http://ukkdd.org.uk/index.php?section=home.

Foundations of Pocket Data Mining

Abstract: Distributed and collaborative data stream mining in a mobile computing environment is referred to as Pocket Data Mining (PDM). The large number of available data streams that smart phones can subscribe to or sense, coupled with the increasing computational power of handheld devices, motivates the development of PDM as a decision making system.

This emerging area of study has been shown to be feasible in an earlier study using the technological enablers of mobile software agents and stream mining techniques. A typical PDM process would start by having mobile agents roam the network to discover relevant data streams and resources. Then other (mobile) agents encapsulating stream mining techniques visit the relevant nodes in the network in order to build evolving data mining models. Finally, a third type of mobile agent roams the network consulting the mining agents for a final collaborative decision, when required by one or more users.

We propose the use of distributed data stream classification techniques in the PDM framework over vertically partitioned data streams.

Our First Pocket Data Mining Prototype is ready!

Pocket Data Mining (PDM) is our new term describing collaborative mining of streaming data in mobile and distributed computing environments. With large numbers of data streams now available for subscription on our smart mobile phones, using this data for decision making with data stream mining techniques has become achievable owing to the increasing power of these handheld devices. Wireless communication among these devices using Bluetooth and WiFi technologies has opened the door wide for collaborative mining among mobile devices within the same range that are running data mining techniques targeting the same application.

Related publications:

Stahl F., Gaber M. M., Bramer M., and Yu P. S., Distributed Hoeffding Trees for Pocket Data Mining, Proceedings of the 2011 International Conference on High Performance Computing & Simulation (HPCS 2011), Special Session on High Performance Parallel and Distributed Data Mining (HPPD-DM 2011), Istanbul, Turkey, 4-8 July, 2011, IEEE Press.
http://eprints.port.ac.uk/id/eprint/3523

Stahl F., Gaber M. M., Bramer M., Liu H., and Yu P. S., Distributed Classification for Pocket Data Mining, Proceedings of the 19th International Symposium on Methodologies for Intelligent Systems (ISMIS 2011), Warsaw, Poland, 28-30 June, 2011, Lecture Notes in Artificial Intelligence LNAI, Springer Verlag.
http://eprints.port.ac.uk/3524/

Stahl F., Gaber M. M., Bramer M., and Yu P. S., Pocket Data Mining: Towards Collaborative Data Mining in Mobile Computing Environments, Proceedings of the IEEE 22nd International Conference on Tools with Artificial Intelligence (ICTAI 2010), Arras, France, 27-29 October, 2010.
http://eprints.port.ac.uk/3248/
