Interoperability and adaptability of text mining tools
A mini-symposium at the UK e-Science All Hands Meeting 2007
10-13 September, 2007, Nottingham, UK
Call for papers
Text mining goes beyond information retrieval by analyzing texts beyond the ‘bag of words’ representation with which search engines normally index documents. Applying text mining in the e-science community involves choosing from an increasing variety of ways in which to analyze documents, annotate them with metadata, index them and mine the indexes for associations between annotated text elements and higher-level constructs. To benefit from community accomplishments in text mining technology, two key features of analysis tools are interoperability and adaptability.
Interoperability allows tools from one source to be composed with tools from other sources. This permits substitution of tools to find the fittest alternative for a given level of analysis within a pipeline, and allows higher-level analyses to build on established analysis tools without reinventing the wheel. Several (sometimes competing) software frameworks are in place to enable interoperability. Their key components are a workflow management system, in which software modules’ capabilities are described and composed via a common invocation API, and a common document representation framework realized as a repository of annotations. The latter is a basic requirement, but because the metalanguage of annotations is usually pitched at a level of generality that makes almost no ontological commitments, it remains a challenge to integrate tools written within different analysis paradigms to best advantage. The use of third-party tools integrated into frameworks often remains at a low level of analysis, and fails to exploit the full potential of richer analysis tools.
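The common-invocation-API-plus-annotation-repository pattern described above can be sketched in a few lines. The class and method names below are illustrative only, not those of any particular framework such as UIMA; the sketch simply shows two components from different "sources" (a tokenizer and a lexicon-based tagger) interoperating through a shared standoff-annotation store and a single process() interface.

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    """A standoff annotation: a labelled span over the source text."""
    start: int
    end: int
    label: str

@dataclass
class Document:
    """Shared representation: raw text plus a repository of annotations."""
    text: str
    annotations: list = field(default_factory=list)

class Tokenizer:
    """A low-level component that writes Token annotations."""
    def process(self, doc):
        pos = 0
        for word in doc.text.split():
            start = doc.text.index(word, pos)
            doc.annotations.append(Annotation(start, start + len(word), "Token"))
            pos = start + len(word)

class GeneNameTagger:
    """A higher-level component that builds on Token annotations
    rather than re-tokenizing: interoperability via the shared store."""
    LEXICON = {"p53", "BRCA1"}  # toy lexicon for illustration
    def process(self, doc):
        for tok in [a for a in doc.annotations if a.label == "Token"]:
            if doc.text[tok.start:tok.end] in self.LEXICON:
                doc.annotations.append(Annotation(tok.start, tok.end, "Gene"))

def run_pipeline(components, doc):
    """Compose components through the single shared process() API."""
    for c in components:
        c.process(doc)
    return doc

doc = run_pipeline([Tokenizer(), GeneNameTagger()],
                   Document("BRCA1 mutations alter p53"))
```

Because both components speak only to the annotation repository, either could be substituted for an alternative implementation without changing the rest of the pipeline, which is the substitution property the paragraph above describes.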
Adaptability in a tool allows it to be readily deployed in ways different from its original genesis, most typically in a different domain of application, or perhaps in a different language. Individual components of text mining analysis pipelines are often constructed by adaptive techniques such as supervised and unsupervised learning, but others are crafted intellectually using linguistic and domain knowledge. In both cases, obtaining the empirical evidence or data for development is a significant challenge, and standard approaches to resource construction involve considerable effort in text annotation for no purpose other than resource development. In other words, it requires the scientist to accept considerable deferred gratification for efforts that do not in themselves lead to a ‘result’. The adaptability challenge in text mining is to develop modes of analysis that generate resources for further analysis as a by-product of their deployment.
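The "resources as a by-product of deployment" idea can be illustrated with a toy component that annotates abbreviation definitions in running text and, as a side effect, accumulates an abbreviation lexicon for later reuse. The long-form recovery heuristic is a heavily simplified variant of the Schwartz-Hearst abbreviation algorithm, and all names here are illustrative assumptions, not an existing API.

```python
import re

# Parenthesized all-caps sequences of length >= 2, e.g. "(TM)".
ABBR_PATTERN = re.compile(r"\(([A-Z]{2,})\)")

class AbbreviationHarvester:
    """Annotates abbreviation definitions; as a by-product of ordinary
    deployment, grows a lexicon mapping abbreviations to long forms."""
    def __init__(self):
        self.lexicon = {}

    def process(self, text):
        found = []
        for m in ABBR_PATTERN.finditer(text):
            abbr = m.group(1)
            # Simplified Schwartz-Hearst heuristic: take as many preceding
            # words as the abbreviation has letters.
            words = text[:m.start()].split()
            long_form = " ".join(words[-len(abbr):])
            self.lexicon[abbr] = long_form  # the resource grows during analysis
            found.append((abbr, long_form))
        return found

harvester = AbbreviationHarvester()
harvester.process("Our work on text mining (TM) uses "
                  "natural language processing (NLP).")
```

Each run of the component does useful annotation work for the scientist, while the growing lexicon is exactly the kind of reusable resource whose construction would otherwise demand dedicated annotation effort.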
Original papers addressing these two themes, within the following list of topics, are solicited:
1.1 Topics
- Experience of integrating high-level NLP tools into frameworks such as UIMA.
- Deployment of text mining analysis components as Web services.
- Exploiting upper-model and/or domain ontologies in text mining.
- Integrating domain ontology and lexical/terminological resources for conceptual text analysis.
- Exploitation of domain-independent grammatical and lexical resources in robust text mining for specific domains.
- Composition of text mining pipelines from mixed machine learning and rule/knowledge-based components.
- Text analysis tasks and tools that can be developed with unsupervised learning for specific domains, e.g. terminology and ontology learning, summarization.
- Extension of linguistic and domain resources by learning during analysis.
- End user facilities based on output of text analysis: tools for search, association mining, researcher workflows.
- Facilities for e-science users to discover analysis components and design custom pipelines.
- Any of the above in particular e-science contexts, e.g. natural sciences, social sciences or humanities.
1.2 Format of the Workshop
The workshop will take the form of a mini-symposium with one invited keynote talk and up to 6 submitted talks. Presentations should be planned on the assumption that a slot will be 15 minutes plus 5 minutes for questions and discussion, although the organizers may allocate a shorter or longer slot to accepted papers.
Proceedings will be published as part of the AHM proceedings CD with ISBN, and will additionally be published on the Web site of the National Centre for Text Mining. As with the AHM conference, we propose to aim for a journal special issue with extended versions of the best submissions, subject to a further refereeing stage.
1.3 How to submit your paper
All submissions should be made electronically using the online form at the AHM 2007 web site (http://www.allhands.org.uk) in the same way as for conference paper submissions.
1.4 Format of submissions
Submitted papers should be up to eight pages in length, in 10 point text. A two-column format is preferred for the workshop proceedings. Formatting guidelines are available at the AHM 2007 Web site.
1.5 Refereeing of submissions
Papers will be refereed by a minimum of two referees, who will normally be members of the Workshop program committee and/or of the AHM program committee.
1.6 Key Dates
Paper submission deadline: 16th April, 2007
Notice of acceptance or otherwise: 14th May, 2007
Final papers due: TBA
Presentations (PowerPoint or PDF) due to local organizers: TBA
1.7 Program Committee
Dr S. Ananiadou, NaCTeM, University of Manchester
Mr W. Black, NaCTeM, University of Manchester
Dr A. Copestake, Computer Laboratory, University of Cambridge
Prof E. Klein, School of Informatics, University of Edinburgh
Mr J. McNaught, NaCTeM, University of Manchester
Dr G. Nenadic, NaCTeM, University of Manchester
Prof J. Tsujii, NaCTeM, University of Manchester and University of Tokyo
Mr William J Black
Senior Lecturer in Computer Science
NaCTeM, the National Centre for Text Mining
Manchester Interdisciplinary Biocentre
University of Manchester
0161 306 3096