
A Statistical Learning Method to Fast Generalised Rule Induction Directly from Raw Measurements

Authors: Thien Le, Frederic Stahl, Chris Wrench and Mohamed Gaber

Abstract:

Induction of descriptive models is one of the most important technologies in data mining. The expressiveness of descriptive models is of paramount importance in applications that examine the causality of relationships between variables. Most of the work on descriptive models has concentrated on less expressive approaches such as clustering algorithms, or on rule-based approaches that are limited to a particular type of data, such as association rule mining for binary data. However, in many applications it is important to understand the structure of the produced model for further human evaluation. In this research we present a novel generalised rule induction method that allows the induction of descriptive and expressive rules directly from both categorical and numerical features.

Please request a copy here.


New PhD project: Developing a Data Mining Method for Data Diffusion Detection

Big Data Analytics refers to the analytics/data mining of large and complex datasets. Special efficient algorithms are needed to process and analyse Big Data. This project is mainly concerned with the development of an analytics methodology for diffusion detection of spatial-temporal data. Loosely speaking, it is about the development of a method that enables the detection of the diffusion of events (the directions in which events spread) over time. A possible case study for this is detecting how crime or certain patterns of crime spread geographically over a certain period of time. However, alternative case studies may be proposed by the applicant and will be considered.

For enquiries please contact me here.

You can find relevant publications for the project here.

Find out how to apply here.


A rule dynamics approach to event detection in Twitter with its application to sports and politics

This paper has been published in Elsevier’s Expert Systems with Applications. The manuscript is accessible online at:

https://www.researchgate.net/profile/Mohamed_Gaber16/publication/295217245_A_Rule_Dynamics_Approach_to_Event_Detection_in_Twitter_with_Its_Application_to_Sports_and_Politics/links/56db2e8408aebe4638beebda.pdf?inViewer=0&pdfJsDownload=0&origin=publication_detail

Authors: Mariam Adedoyin-Olowe, Mohamed Medhat Gaber, Carlos Martin Dancausa, Frederic Stahl and João Bartolo Gomes

Abstract

The increasing popularity of Twitter as a social network tool for opinion expression as well as information retrieval has resulted in the need to derive computational means to detect and track relevant topics/events in the network. The application of topic detection and tracking methods to tweets enables users to extract newsworthy content from the vast and somewhat chaotic Twitter stream. In this paper, we apply our technique named Transaction-based Rule Change Mining to extract newsworthy hashtag keywords present in tweets from two different domains, namely sports (The English FA Cup 2012) and politics (US Presidential Elections 2012 and Super Tuesday 2012). Noting the peculiar nature of event dynamics in these two domains, we apply different time-windows and update rates to each of the datasets in order to study their impact on performance. The performance effectiveness results reveal that our approach is able to accurately detect and track newsworthy content. In addition, the results show that the adaptation of the time-window exhibits better performance especially on the sports dataset, which can be attributed to the usually shorter duration of football events.


PhD Project in “Real-time Parallel Big Data Analysis”

Description

The Real-time Parallel Big Data Analysis project is concerned with the development of scalable Big Data Analytics techniques for fast data streams. There is significant activity going on in Big Data Analytics throughout the University, as Big Data techniques underpin large areas of the University’s research activity in much the same way that the research platforms underpin experimental research.
  • Applications are accepted all year round.
  • Self-funded PhD students only.

Get in touch here if you have any questions.


PhD Project in “Automatic Classification of Data Streams with Sparse Class Labels”

Description

Data Stream Mining has become a hot topic and is concerned with the analytics of data that arrives in real time and at high speed. Two general challenges in Data Stream Mining are (1) the data stream is infinite, so storing the data and learning offline is not possible, and (2) the pattern in the data may change over time (known as concept drift). Challenge (1) is typically met through algorithms that need only one pass through the data; challenge (2) is typically met through frequent feedback about the pattern, and thus about changes in the pattern, encoded in the stream.
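
As a rough illustration of these two challenges, here is a minimal Python sketch. It is not the project's method. It assumes a stream of single numeric values, keeps running statistics in one pass (Welford's update), and flags a possible drift when the mean of a short recent window moves away from the running mean. The names RunningStats and drift_points, and the window and threshold values, are hypothetical and chosen only for this example. Real concept drift concerns changes in the relationship between features and class labels, which this simplification does not capture.

```python
from collections import deque


class RunningStats:
    """One-pass mean and variance (Welford's update); no past values are stored."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Population variance of everything seen so far.
        return self._m2 / self.n if self.n else 0.0


def drift_points(stream, window=5, threshold=3.0):
    """Yield indices where the recent-window mean departs from the running mean.

    Hypothetical helper for illustration: `window` and `threshold` are
    assumed tuning values, not taken from any published method.
    """
    stats = RunningStats()
    recent = deque(maxlen=window)  # only the last `window` values are kept
    for i, x in enumerate(stream):
        recent.append(x)
        if len(recent) == window and stats.n >= window:
            if abs(sum(recent) / window - stats.mean) > threshold:
                yield i
                stats = RunningStats()  # forget the old pattern after a drift
        stats.update(x)


if __name__ == "__main__":
    stream = [1.0] * 10 + [10.0] * 10  # the level changes at index 10
    print(list(drift_points(stream)))
```

Memory use stays constant regardless of stream length, because only the running statistics and the short recent window are retained.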
  • Applications are accepted all year round.
  • Self-funded PhD students only.

 

Get in touch here if you have any questions.

 


A method of rule induction for predicting and describing future alarms in a telecommunication network

Authors: Chris Wrench, Frederic Stahl, Thien Le, Giuseppe Di Fatta, Vidhyalakshmi Karthikeyan, Detlef Nauck

Abstract:

In order to gain insights into events and issues that may cause alarms in parts of IP networks, intelligent methods that capture and express causal relationships are needed. Methods that are both predictive and descriptive are rare, and those that do predict are often limited to using a single feature from a vast data set. This paper follows the progression of a Rule Induction Algorithm that produces rules with strong causal links that are both descriptive and predict events ahead of time. The algorithm is based on an information theoretic approach to extract rules comprising a conjunction of network events that are significant prior to network alarms. An empirical evaluation of the algorithm is provided.

Please request a copy here: https://fredericstahl.wordpress.com/publications/request-copy-of-a-publication/


Towards Expressive Modular Rule Induction for Numerical Attributes

Authors: Manal Almutairi, Frederic Stahl, Matthew Jennings, Thien Le and Max Bramer

Abstract

The Prism family is an alternative set of predictive data mining algorithms to the more established decision tree data mining algorithms. Prism classifiers are more expressive and user-friendly than decision trees, achieve a similar accuracy, and even outperform decision trees in some cases. This is especially the case where there are noise and clashes in the training data. However, Prism algorithms still tend to overfit on noisy data; this has led to the development of pruning methods which have allowed the Prism algorithms to generalise better over the dataset. The work presented in this paper aims to address the problem of overfitting at the rule induction stage for numerical attributes by proposing a new numerical rule term structure based on the Gauss Probability Density Distribution. This new rule term structure is not only expected to lead to a more robust classifier, but also lowers the computational requirements as it needs to induce fewer rule terms.
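
To give a feel for what a Gaussian-based rule term might look like, here is a minimal Python sketch. It is an assumption-laden illustration, not the algorithm from the paper: it fits a normal distribution to the values of one numerical attribute for a target class and forms a single rule term of the form lower < x <= upper around the class mean. The function names gaussian_rule_term and covers, and the width parameter k, are hypothetical.

```python
from statistics import mean, pstdev


def gaussian_rule_term(values, labels, target_class, k=1.0):
    """Return (lower, upper) bounds for one numerical attribute.

    Assumption for illustration: the term spans k standard deviations
    either side of the mean of the attribute values seen for target_class.
    The published method may choose its bounds differently.
    """
    class_values = [v for v, y in zip(values, labels) if y == target_class]
    mu = mean(class_values)
    sigma = pstdev(class_values)  # 0.0 if all class values are identical
    return mu - k * sigma, mu + k * sigma


def covers(term, x):
    """True if x satisfies the rule term lower < x <= upper."""
    lower, upper = term
    return lower < x <= upper


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    labels = ["a", "a", "a", "b", "b", "b"]
    term = gaussian_rule_term(values, labels, "a")
    print(term, covers(term, 2.0), covers(term, 10.0))
```

A single bounded term of this kind stands in for what would otherwise take two separate threshold terms (x > lower and x <= upper), which is one way to read the abstract's point about inducing fewer rule terms.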

To request a copy of the paper, just drop a message through this form:

https://fredericstahl.wordpress.com/publications/request-copy-of-a-publication/