PhD Project in “Real-time Parallel Big Data Analysis”

Description

The Real-time Parallel Big Data Analysis project is concerned with the development of scalable Big Data Analytics techniques for fast data streams. Big Data Analytics is an area of significant activity throughout the University, as Big Data techniques underpin large areas of the University’s research activity in much the same way that the research platforms underpin experimental research.
  • Applications are accepted all year round.
  • Self-funded PhD students only.

Get in touch here if you have any questions.


PhD Project in “Automatic Classification of Data Streams with Sparse Class Labels”

Description

Data Stream Mining has become a hot topic and is concerned with the analytics of data that arrives in real time and at high speed. Two general challenges in Data Stream Mining are (1) the data stream is infinite, so storing the data and learning offline is not possible, and (2) the pattern in the data may change over time (known as concept drift). Challenge (1) is typically met through algorithms that need only one pass through the data; challenge (2) is typically met through frequent feedback about the pattern, and thus about changes of the pattern encoded in the stream.
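
To make challenge (1) concrete, here is a minimal sketch of a one-pass computation: Welford's method for a running mean and variance. It is only an illustration of the single-pass idea and is not taken from the project itself; the names `RunningStats` and `stream_stats` are made up for this example.

```python
class RunningStats:
    """One-pass (Welford) mean and variance; no past instances are stored."""

    def __init__(self):
        self.n = 0          # number of instances seen so far
        self.mean = 0.0     # running mean
        self._m2 = 0.0      # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold one new instance into the statistics, then forget it."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance of everything seen so far (0.0 if fewer than 2)."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def stream_stats(stream):
    """Consume an iterable exactly once and return its running statistics."""
    stats = RunningStats()
    for x in stream:
        stats.update(x)
    return stats
```

Because each instance is touched once and then discarded, memory use stays constant no matter how long the stream runs, for example `stream_stats(iter(sensor_readings))` with `sensor_readings` being any iterable of numbers.
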
  • Applications are accepted all year round.
  • Self-funded PhD students only.

Get in touch here if you have any questions.


A method of rule induction for predicting and describing future alarms in a telecommunication network

Authors: Chris Wrench, Frederic Stahl, Thien Le, Giuseppe Di Fatta, Vidhyalakshmi Karthikeyan, Detlef Nauck

Abstract:

In order to gain insights into events and issues that may cause alarms in parts of IP networks, intelligent methods that capture and express causal relationships are needed. Methods that are both predictive and descriptive are rare, and those that do predict are often limited to using a single feature from a vast data set. This paper follows the progression of a Rule Induction Algorithm that produces rules with strong causal links that both describe and predict events ahead of time. The algorithm is based on an information theoretic approach to extract rules comprising a conjunction of network events that are significant prior to network alarms. An empirical evaluation of the algorithm is provided.

Please request a copy here: https://fredericstahl.wordpress.com/publications/request-copy-of-a-publication/


Towards Expressive Modular Rule Induction for Numerical Attributes

Authors: Manal Almutairi, Frederic Stahl, Matthew Jennings, Thien Le and Max Bramer

Abstract

The Prism family is an alternative set of predictive data mining algorithms to the more established decision tree data mining algorithms. Prism classifiers are more expressive and user-friendly than decision trees, achieve a similar accuracy and even outperform decision trees in some cases. This is especially the case where there is noise and clashes in the training data. However, Prism algorithms still tend to overfit on noisy data; this has led to the development of pruning methods, which have allowed the Prism algorithms to generalise better over the dataset. The work presented in this paper aims to address the problem of overfitting at the rule induction stage for numerical attributes by proposing a new numerical rule term structure based on the Gauss Probability Density Distribution. This new rule term structure is expected not only to lead to a more robust classifier but also to lower the computational requirements, as fewer rule terms need to be induced.
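
As a rough illustration of what a Gaussian-based numerical rule term could look like, the sketch below fits a normal distribution to one attribute's values for a target class and turns it into a single interval term. This is not the algorithm from the paper: the function names and the choice of bounds at the mean plus or minus `k` standard deviations are assumptions made purely for this example.

```python
import math
from statistics import mean, stdev


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def gaussian_rule_term(values, k: float = 1.0):
    """Build one interval term (lower, upper) from a class's attribute values.

    ASSUMPTION: bounds are mu - k*sigma and mu + k*sigma; the factor k is a
    hypothetical parameter and is not taken from the paper.
    """
    mu = mean(values)
    sigma = stdev(values)   # sample standard deviation; needs >= 2 values
    return mu - k * sigma, mu + k * sigma


def covers(term, x: float) -> bool:
    """True if the rule term 'lower < x <= upper' covers the value x."""
    lower, upper = term
    return lower < x <= upper
```

A single term of the form `lower < x <= upper` stands in for what would otherwise be two separate threshold terms on the same attribute, which is one way to read the abstract's point about inducing fewer rule terms.
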

To request a copy of the paper, just drop a message through this form:

https://fredericstahl.wordpress.com/publications/request-copy-of-a-publication/


Towards Online Concept Drift Detection with Feature Selection for Data Stream Classification

Authors: Mahmood Hammoodi, Frederic Stahl and Mark Tennant

Abstract

Data Streams are unbounded, sequential data instances that are generated very rapidly. The storage, querying and mining of such rapid flows of data is computationally very challenging. Data Stream Mining (DSM) is concerned with the mining of such data streams in real time using techniques that require only one pass through the data. DSM techniques need to be adaptive to reflect changes of the pattern encoded in the stream (concept drift). The relevance of features for a DSM classification task may change due to concept drift, and this paper describes the first step towards a concept drift detection method with online feature tracking capabilities.
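
For readers new to drift detection, the following is a small sketch of the Page-Hinkley test, a standard change detector that processes a stream in one pass. It is a different, generic technique and not the method described in the paper; the parameter defaults and the helper `first_alarm` are assumptions chosen for illustration.

```python
class PageHinkley:
    """Page-Hinkley test for detecting an upward shift in a stream's mean."""

    def __init__(self, delta: float = 0.005, threshold: float = 5.0):
        self.delta = delta          # tolerated magnitude of change (assumed default)
        self.threshold = threshold  # alarm threshold, often called lambda (assumed default)
        self.n = 0
        self.mean = 0.0             # running mean of the monitored values
        self.cum = 0.0              # cumulative deviation from the running mean
        self.min_cum = 0.0          # smallest cumulative deviation seen so far

    def update(self, x: float) -> bool:
        """Process one value; return True if a drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold


def first_alarm(stream, **kwargs):
    """Return the index of the first drift alarm in the stream, or None."""
    detector = PageHinkley(**kwargs)
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None
```

The monitored value could be, for instance, a classifier's 0/1 error per instance or the value of a single feature; running one detector per feature is one simple way to see which features are changing.
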

The manuscript is accessible online at:

http://centaur.reading.ac.uk/68360/1/asPrintedOpenAccess.pdf


BCS SGAI Workshop on Data Stream Mining Techniques and Applications

Introduction
============
The four main dimensions of Big Data are known as Volume, referring to the size of the data; Velocity, referring to the speed at which data is generated; Veracity, referring to uncertainty in the data; and Variety, referring to data from different kinds of sources such as text, structured and video data. This workshop’s focus is on the Velocity dimension of Big Data. The analytics of high velocity data has many applications, such as topic detection in Twitter, traffic control and network intrusion detection. Unlike data that is stored on a disk, real-time data may change its characteristics over time. Decision support applications, however, rely on the recency of their supporting data; hence, data generated at a high velocity needs to be processed ‘on the fly’. Other applications are more interested in the actual change of the data, e.g. intrusion detection and network fault detection. Hence there is a need for computationally efficient real-time techniques that take changes of the data into consideration.
This workshop welcomes not only papers on data stream mining of high velocity data but also applications from various domains, such as science, engineering, finance and the web. The workshop’s aim is to bring together researchers in this field to present their latest work and to discuss challenges and future directions of research in Data Stream Mining.
Submitted extended abstracts (2 pages) will be reviewed. The authors of the best abstracts will be invited to submit full workshop papers, which will be further reviewed.
Publication of Workshop Papers
==============================
Accepted papers will be published in a special issue of the BCS SGAI publication Expert Update: http://expertupdate.org/
Workshop Website
===============
Topics of interest
==================
* High Velocity Data Stream mining algorithms and techniques
* Big Data Streams
* Concept Drift Detection
* Real-time data mining applications
* Real-time event detection from streaming data
Important dates
===============
* Extended Abstract Submission (2 pages any format): extended until 12th of August 2016
* Invitation to submit full papers (8 pages): 19th August
* Submission deadline for full paper: 9th September
* Notification of acceptance: 3rd October 2016
* Camera ready papers and workshop registration: 14th October 2016
* Workshop: 13 December 2016
Workshop chair
==============
* Frederic Stahl, University of Reading, UK
Programme committee
===================
* Frederic Stahl (University of Reading, UK)
* Max Bramer (University of Portsmouth, UK)
* Mohamed Medhat Gaber (Robert Gordon University, UK)
* Joao Gomes (DataRobot, Singapore)
* Thien Le (University of Reading, UK)
Paper submission
================
Extended abstracts can be sent directly to Dr Frederic Stahl (F.T.Stahl@reading.ac.uk).
Workshop Registration
=====================
One author per paper must present their work at the workshop and be registered for the workshop day of the AI2016 conference: http://www.bcs-sgai.org/ai2016/
Regular Rate £120
Student Rate £75
VAT is charged at 20%

Scaling Up Classification Rule Induction Through Parallel Processing

This paper has been published in Cambridge University Press’s Knowledge Engineering Review. The manuscript is accessible online at:

https://fredericstahl.files.wordpress.com/2012/02/paper13.pdf

Authors: Frederic Stahl and Max Bramer

Abstract

The fast increase in the size and number of databases demands data mining approaches that are scalable to large amounts of data. This has led to the exploration of parallel computing technologies in order to perform data mining tasks concurrently using several processors. Parallelisation seems to be a natural and cost-effective way to scale up data mining technologies. One of the most important of these data mining technologies is the classification of newly recorded data. This paper surveys advances in parallelisation in the field of classification rule induction.