
38th SGAI International Conference on Artificial Intelligence. CAMBRIDGE, ENGLAND 11-13 DECEMBER 2018

Going to attend this event on Artificial Intelligence, mainly because of the excellent tracks related to data science. This year there is also a walking tour through Cambridge, including King's College.

http://www.bcs-sgai.org/ai2018/?section=home

 

 


AI-2017 Thirty-seventh SGAI International Conference on Artificial Intelligence. CAMBRIDGE, ENGLAND 12-14 DECEMBER 2017

Will be attending this long-standing conference on AI. Will I see you there?

 

http://www.bcs-sgai.org/ai2017/

 


Towards Real-Time Feature Tracking Technique using Adaptive Micro-Clusters

Authors: Mahmood Shakir Hammoodi, Frederic Stahl, Mark Tennant, Atta Badii

Abstract:

Data streams are unbounded, sequential data instances that are generated with high velocity. Classifying sequential data instances is a very challenging problem in machine learning with applications in network intrusion detection, financial markets and sensor networks. Data stream classification is concerned with the automatic labelling of unseen instances from the stream in real-time. For this, the classifier needs to adapt to concept drifts and, if the stream is fast, can only make a single pass through the data. This research paper presents our work on a real-time pre-processing technique, in particular a feature tracking technique that takes concept drift into consideration. The feature tracking technique is designed to improve Data Stream Mining (DSM) classification algorithms by enabling real-time feature selection. The technique is based on adaptive summaries of the data and class distributions, known as Micro-Clusters. Currently the technique is able to detect concept drift and identify which features are involved.

Please request a copy here.


On expressiveness and uncertainty awareness in rule-based classification for data streams

Authors: Thien Le, Frederic Stahl, Mohamed Medhat Gaber, João Bártolo Gomes, Giuseppe Di Fatta

Abstract:

Mining data streams is a core element of Big Data Analytics. It represents the velocity of large datasets, which is one of the four aspects of Big Data, the other three being volume, variety and veracity. As data streams in, models are constructed using data mining techniques tailored towards continuous and fast model update. The Hoeffding Inequality has been among the most successful approaches in learning theory for data streams. In this context, it is typically used to provide a statistical bound for the number of examples needed in each step of an incremental learning process. It has been applied to both classification and clustering problems. Despite the success of the Hoeffding Tree classifier and other data stream mining methods, such models fall short of explaining how their results (i.e., classifications) are reached (black boxing). The expressiveness of decision models in data streams is an area of research that has attracted less attention, despite its paramount practical importance. In this paper, we address this issue, adopting the Hoeffding Inequality as an upper bound to build decision rules which can help decision makers with informed predictions (white boxing). We termed our novel method Hoeffding Rules in reference to its use of the Hoeffding Inequality for estimating whether a rule induced from a smaller sample would be of the same quality as a rule induced from a larger sample. The new method brings in a number of novel contributions, including handling uncertainty through abstaining, dealing with continuous data through Gaussian statistical modelling, and an experimentally proven fast algorithm. We conducted a thorough experimental study using benchmark datasets, showing the efficiency and expressiveness of the proposed technique when compared with the state-of-the-art.
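
For readers who have not met the Hoeffding Inequality before, here is a minimal sketch of the bound in its generic form, as commonly used in Hoeffding Tree style learners. It is not the Hoeffding Rules algorithm from the paper; the function names `hoeffding_bound` and `samples_needed` and the parameters `value_range`, `delta` and `epsilon` are my own illustrative choices. For a quantity with range R observed n times, the true mean is at least the observed mean minus epsilon with probability 1 - delta, where epsilon = sqrt(R^2 ln(1/delta) / (2n)).

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon for n independent observations of a bounded quantity.

    With probability at least 1 - delta, the true mean is no smaller than
    the observed mean minus epsilon (one-sided Hoeffding bound).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def samples_needed(value_range: float, delta: float, epsilon: float) -> int:
    """Smallest n for which the bound is no larger than epsilon."""
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2))


if __name__ == "__main__":
    # Hypothetical setting: a quality measure in [0, 1] and 95% confidence.
    for n in (100, 1_000, 10_000):
        print(n, hoeffding_bound(1.0, 0.05, n))
```

The bound shrinks with the square root of n, which is why a learner can commit to a decision after a limited number of stream instances instead of waiting for the whole (unbounded) stream.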

Access a copy here.


Scalable real-time classification of data streams with concept drift

Authors: Mark Tennant, Frederic Stahl, Omer Rana, João Bártolo Gomes

Abstract:

Inducing adaptive predictive models in real-time from high throughput data streams is one of the most challenging areas of Big Data Analytics. The fact that data streams may contain concept drifts (changes of the pattern encoded in the stream over time) and are unbounded imposes unique challenges in comparison with predictive data mining from batch data. Several real-time predictive data stream algorithms exist; however, most approaches are not naturally parallel and are thus limited in their scalability. This paper highlights the Micro-Cluster Nearest Neighbour (MC-NN) data stream classifier. MC-NN is based on statistical summaries of the data stream and a nearest neighbour approach, which makes MC-NN naturally parallel. In its serial version MC-NN is able to handle data streams: the data does not need to reside in memory and is processed incrementally. MC-NN is also able to adapt to concept drifts. This paper provides an empirical study on the serial algorithm’s speed, adaptivity and accuracy. Furthermore, this paper discusses the new parallel implementation of MC-NN and its parallel properties, and provides an empirical scalability study.
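
To give a feel for what "statistical summaries plus nearest neighbour" can look like, here is a heavily simplified sketch. It is not the MC-NN algorithm from the paper: the class names `MicroCluster` and `MicroClusterNN`, the stored statistics (count, linear sum, squared sum) and the update rule are assumptions of mine, and drift handling such as splitting or removing stale clusters is left out entirely.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MicroCluster:
    """Incremental summary of the instances absorbed so far (hypothetical layout)."""
    label: str
    n: int = 0
    linear_sum: List[float] = field(default_factory=list)
    squared_sum: List[float] = field(default_factory=list)

    def absorb(self, x: List[float]) -> None:
        # Update the summary in place; the raw instance is not stored.
        if self.n == 0:
            self.linear_sum = [0.0] * len(x)
            self.squared_sum = [0.0] * len(x)
        for i, value in enumerate(x):
            self.linear_sum[i] += value
            self.squared_sum[i] += value * value
        self.n += 1

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.linear_sum]

    def variance(self) -> List[float]:
        # Per-feature variance recovered from the sums alone.
        return [max(0.0, sq / self.n - (s / self.n) ** 2)
                for s, sq in zip(self.linear_sum, self.squared_sum)]


class MicroClusterNN:
    """Nearest-centroid classifier over micro-clusters (simplified assumption)."""

    def __init__(self) -> None:
        self.clusters: List[MicroCluster] = []

    def _nearest(self, x: List[float]) -> Optional[MicroCluster]:
        return min(self.clusters,
                   key=lambda mc: math.dist(mc.centroid(), x),
                   default=None)

    def predict(self, x: List[float]) -> Optional[str]:
        nearest = self._nearest(x)
        return None if nearest is None else nearest.label

    def learn(self, x: List[float], label: str) -> None:
        # Single pass: absorb into the nearest cluster if its label agrees,
        # otherwise start a new cluster seeded with this instance.
        nearest = self._nearest(x)
        if nearest is not None and nearest.label == label:
            nearest.absorb(x)
        else:
            cluster = MicroCluster(label)
            cluster.absorb(x)
            self.clusters.append(cluster)
```

Because each summary is updated independently of the others, it is easy to see why this style of model lends itself to parallel execution: clusters could be partitioned across workers and only the nearest-centroid lookup needs to be combined.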

Access a copy here.


A Text Mining Framework for Big Data

 

Authors: Niki Pavlopoulou, Aeham Abushwashi, Frederic Stahl and Vittorio Scibetta

Abstract:

Text Mining is the ability to generate knowledge (insight) from text. This is a challenging task, especially when the target text databases are very large. Big Data has attracted much attention lately, both from academia and industry. A number of distributed databases, search engines and frameworks have been developed to handle the memory and time constraints that arise when processing a large amount of data. However, there is no open-source end-to-end framework that can combine near real-time and batch processing of ingested big textual data along with user-defined options and provision of specific, reliable insight from the data. This is important because it makes new unstructured information accessible in near real-time, allows more personalised customer products to be created, and lets novel, unusual patterns be found and acted on quickly. This work focuses on a proprietary complete near real-time automated classification framework for unstructured data with the use of Natural Language Processing and Machine Learning algorithms on Apache Spark. The evaluation of our framework shows that it achieves accuracy comparable to that of some of the best approaches presented in the literature.

Access a copy here.


A Statistical Learning Method to Fast Generalised Rule Induction Directly from Raw Measurements

Authors: Thien Le, Frederic Stahl, Chris Wrench and Mohamed Gaber

Abstract:

Induction of descriptive models is one of the most important technologies in data mining. The expressiveness of descriptive models is of paramount importance in applications that examine the causality of relationships between variables. Most of the work on descriptive models has concentrated on less expressive approaches such as clustering algorithms or rule-based approaches that are limited to a particular type of data, such as association rule mining for binary data. However, in many applications it is important to understand the structure of the produced model for further human evaluation. In this research we present a novel generalised rule induction method that allows the induction of descriptive and expressive rules directly from both categorical and numerical features.

Please request a copy here.
