
Innovating through e-Science


Press release issued by the UK e-Science Programme

Innovating through e-Science: 4th e-Science All Hands meeting
20-22 September 2005, East Midlands Conference Centre, Nottingham

Database of cancer records now available for research

Data on more than 22,000 cancer cases are now available for research by bona fide clinical and medical researchers. This repository is the first major output of the Clinical e-Science Framework (CLEF), an e-Science project funded by the Medical Research Council (MRC). Sophisticated security systems, also developed by CLEF, ensure secure and ethical access to the databank. Dr Catalina Hallett will demonstrate how to query the new database at the e-Science All Hands meeting in Nottingham on 20 September.

Patient records contain a wealth of information that could be very useful to medical research. To make this information accessible to researchers, however, it must be extracted from what is often written text and presented in such a way that it can be compared with data from scientific and other databases. CLEF has developed techniques to capture relevant information from text automatically and enter it into a database. The project has also implemented stringent access control, authentication and secure transmission protocols using sophisticated encryption standards to protect against accidental disclosures.

Professor Alan Rector, CLEF's director, said: "The CLEF repository is optimised to treat electronic healthcare records as an interactive knowledge source for academic researchers and clinicians to help them access the latest medical information. Once fully deployed, it will lead to previously unthinkable, rapid advances in healthcare research by enabling researchers to analyse data stored in a wide range of geographically-spread databases, on-line."

Professor David Ingram's team at University College London built the repository using a new method for importing and structuring data so that users can run population queries over longitudinal data sets. The CLEF repository supports the large-scale analysis of patient records in a Grid environment. It can handle complex queries whilst retaining the critical semantic, structural and medico-legal integrity of the data.

The process, developed in part by Professor Alan Rector's team at the University of Manchester, structures the source data in multiple steps, enabling users to put complex clinical questions to the repository. First, data are structured in a longitudinal format, then by clinical context and finally by the actual type of data. Previously, the retrieval of similarly complex data would have required time-consuming manual search and data analysis. Using the work of Professor Rob Gaizauskas' team from the University of Sheffield, the CLEF system is able to extract key medical information from clinical records that are in a narrative format, such as medical letters, discharge summaries and radiology reports.
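
For technically minded readers, the three-step structuring described above can be pictured with a small Python sketch. It is purely illustrative and is not CLEF's implementation: the record fields, the sample values and the function names `structure` and `patients_with` are all hypothetical, chosen only to show how ordering by time, then grouping by clinical context and by type of data, makes a population query straightforward.

```python
from collections import defaultdict
from datetime import date

# Hypothetical flat records (not CLEF data):
# (patient_id, event_date, clinical_context, data_type, value)
records = [
    ("p1", date(2004, 3, 1), "diagnosis", "histology", "adenocarcinoma"),
    ("p1", date(2004, 1, 15), "investigation", "radiology", "mass in left lung"),
    ("p2", date(2005, 2, 10), "treatment", "drug", "tamoxifen"),
    ("p1", date(2004, 3, 1), "diagnosis", "staging", "T2N0M0"),
]


def structure(records):
    """Nest records as patient -> date -> clinical context -> data type -> values."""
    repo = defaultdict(
        lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    )
    # Sorting by patient and date gives each patient a chronological
    # (longitudinal) history before the finer groupings are applied.
    for patient, when, context, kind, value in sorted(
        records, key=lambda r: (r[0], r[1])
    ):
        repo[patient][when][context][kind].append(value)
    return repo


def patients_with(repo, context, kind, value):
    """Population query: patients with the given value recorded at any time."""
    return sorted(
        patient
        for patient, timeline in repo.items()
        if any(
            value in by_context.get(context, {}).get(kind, [])
            for by_context in timeline.values()
        )
    )


repo = structure(records)
print(patients_with(repo, "diagnosis", "histology", "adenocarcinoma"))
```

In this toy layout, a question such as "which patients have a histology result of adenocarcinoma?" becomes a single lookup across every patient's timeline rather than a manual search through individual records.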

A new, generic WYSIWYM ("What you see is what you mean") interface that was developed by Professor Donia Scott's team at The Open University enables users to pose complex clinical queries in natural language and receive answers in plain English text or simple tables and graphs. Users no longer need to learn "computer-speak" to communicate with an electronic database.

CLEF's future work includes extending its database and refining its use of knowledge resources to help both patients and professionals to access the right information and interpret scientific data. The project's aim is to provide user-friendly and secure tools to improve clinical and research practices, teaching methods and care management processes.

Conference website


Professor Alan Rector, Director of CLEF Project, Department of Computer Science, University of Manchester tel. +44 (0) 161 275 6188/6149 e-mail:

Dr Aniko Zagon, CLEF Industry Liaison, JEZZ Remedies Ltd. tel. 07970 130 681 e-mail:

Judy Redfearn, e-Science/Research communications officer, JISC/e-Science Core Programme tel. 07768 356309 e-mail:


CLEF website

MRC website

UK e-Science Programme

Notes for editors

  1. e-Science is the very large scale science that can be carried out by pooling access to very large digital data collections, very large scale computing resources and high performance visualisation held at different sites.
  2. A computing grid refers to geographically dispersed computing resources that are linked together by software known as middleware so that the resources can be shared. The vision is to provide computing resources to the consumer in a similar way to the electric power grid. The consumer can access electric or computing power without knowing which power station or computer it is coming from.
  3. The UK e-Science Programme is a coordinated £230M initiative involving all the Research Councils and the Department of Trade and Industry. It has also leveraged industrial investment of £30M. The Engineering and Physical Sciences Research Council manages the e-Science Core Programme, which is developing generic technologies, on behalf of all the Research Councils.

The UK e-Science Programme as a whole is fostering the development of IT and grid technologies to enable new ways of doing faster, better or different research, with the aim of establishing a sustainable, national e-infrastructure for research and innovation. Further information at