
Unstructured Information Management Architecture
...

What is Unstructured Information Management Architecture (UIMA)?

Unstructured Information Management applications are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. UIMA is a framework and SDK for developing such applications. An example UIM application might ingest plain text and identify entities, such as persons, places, and organizations, or relations, such as works-for or located-at. UIMA enables such an application to be decomposed into a sequence of components, for example "language identification" > "language-specific segmentation" > "sentence boundary detection" > "entity detection (person/place names, etc.)". Each component must implement interfaces defined by the framework and must provide self-describing metadata through XML descriptor files.

The framework manages these components and the data flow between them. Components are written in Java™ or C++; the data that flows between components is designed for efficient mapping between these languages. In addition, UIMA provides capabilities for wrapping components as network services, and it can scale to large volumes by replicating processing pipelines over a cluster of networked nodes.
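
The decomposition described above can be pictured with a small, self-contained Java sketch. It does not use the UIMA SDK: the names PipelineSketch, Document, Annotation, Component, SentenceDetector and PersonDetector are hypothetical and chosen only for illustration, and the detection logic is deliberately naive. What it shows is the general shape: every component implements one shared interface, and a driver passes the same analysis data through the components in order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: none of these types belong to the UIMA SDK.
class PipelineSketch {

    /** A typed span over the document text, e.g. a "Sentence" from offset 0 to 25. */
    static final class Annotation {
        final String type;
        final int begin;
        final int end;

        Annotation(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Shared analysis data handed from one component to the next. */
    static final class Document {
        final String text;
        final List<Annotation> annotations = new ArrayList<>();

        Document(String text) {
            this.text = text;
        }

        /** Returns the covered text of every annotation of the given type. */
        List<String> textsOf(String type) {
            List<String> result = new ArrayList<>();
            for (Annotation a : annotations) {
                if (a.type.equals(type)) {
                    result.add(text.substring(a.begin, a.end));
                }
            }
            return result;
        }
    }

    /** The single interface every component in this sketch implements. */
    interface Component {
        void process(Document doc);
    }

    /** Naive sentence detection: a sentence ends at '.', '!' or '?'.
     *  Text after the last terminator is ignored. */
    static final class SentenceDetector implements Component {
        @Override
        public void process(Document doc) {
            int start = 0;
            for (int i = 0; i < doc.text.length(); i++) {
                char c = doc.text.charAt(i);
                if (c == '.' || c == '!' || c == '?') {
                    doc.annotations.add(new Annotation("Sentence", start, i + 1));
                    start = i + 1;
                    // Skip whitespace before the next sentence starts.
                    while (start < doc.text.length()
                            && Character.isWhitespace(doc.text.charAt(start))) {
                        start++;
                    }
                }
            }
        }
    }

    /** Naive entity detection: exact lookup of names from a fixed list. */
    static final class PersonDetector implements Component {
        private final List<String> knownNames;

        PersonDetector(List<String> knownNames) {
            this.knownNames = knownNames;
        }

        @Override
        public void process(Document doc) {
            for (String name : knownNames) {
                if (name.isEmpty()) {
                    continue; // an empty name would match at every offset
                }
                int from = doc.text.indexOf(name);
                while (from >= 0) {
                    doc.annotations.add(new Annotation("Person", from, from + name.length()));
                    from = doc.text.indexOf(name, from + name.length());
                }
            }
        }
    }

    /** Runs the components in order over one document. */
    static Document run(String text, List<Component> pipeline) {
        Document doc = new Document(text);
        for (Component component : pipeline) {
            component.process(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        List<Component> pipeline =
                List.of(new SentenceDetector(), new PersonDetector(List.of("Donald Knuth")));
        Document doc = run("Donald Knuth wrote TAOCP. He works at Stanford.", pipeline);
        System.out.println(doc.textsOf("Sentence"));
        System.out.println(doc.textsOf("Person"));
    }
}
```

Because each stage only reads and adds annotations on the shared document, a stage can be swapped or reordered without touching the others. In UIMA itself, the interfaces and the XML descriptors are defined by the framework rather than by application code as in this sketch.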

How does it work?

The UIMA SDK was originally developed by IBM® and made available here at alphaWorks®. In October 2006, IBM donated the UIMA SDK to Apache; ongoing development will be done in the open-source style by the Apache UIMA community. For further details about Apache UIMA and its development process, please refer to the Apache UIMA Web site.

There are still some IBM products in the field that use older IBM UIMA releases instead of the new Apache UIMA releases. If you need an older IBM UIMA release, please check the IBM product page for UIMA on developerWorks® to get the product-aligned version of IBM UIMA. The Java source code for some of the older IBM UIMA releases is available at SourceForge.

IBM technology related to Apache UIMA

The alphaWorks UIMA pages contain some additional components and technologies that work with Apache UIMA and enrich its functionality. Currently, the available components are as follows:

SemanticSearch 2.1: The SemanticSearch package is an add-on to Apache UIMA that provides a full-featured semantic search engine. The package includes a CAS consumer that populates a search engine index with the document content together with the semantic annotations added by the analysis pipeline. The index can then be queried with XML Fragments, which are small, well-balanced pieces of XML text with annotations. For example, if your text document contains the person name "Donald Knuth", and this name is identified by an annotator as being the author of the document (and is indexed as an annotation called "author"), you can retrieve this information with the query <author>Donald Knuth</author>. The index is accessed through an API, and the package includes an example semantic search application written in Java along with the full API documentation.


IBM UIMA wrapper: The IBM UIMA wrapper package enables you to run IBM UIMA components inside Apache UIMA 2.2 or above. This package is designed for projects and products that migrate to Apache UIMA but that must still be able to run older IBM UIMA components.
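
To make the <author>Donald Knuth</author> query from the SemanticSearch description more concrete, here is a minimal Java sketch of the idea behind it: a query names an annotation type and the text it covers, so it matches only documents where that text carries that annotation. This is not the SemanticSearch API and does not reflect how its index is implemented. FragmentQuerySketch, IndexedAnnotation and query are hypothetical names, the "index" is a plain in-memory list, and only a single-element fragment without attributes is supported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: this is not the SemanticSearch API.
class FragmentQuerySketch {

    /** One indexed annotation: which document, which type, which covered text. */
    static final class IndexedAnnotation {
        final String documentId;
        final String type;
        final String coveredText;

        IndexedAnnotation(String documentId, String type, String coveredText) {
            this.documentId = documentId;
            this.type = type;
            this.coveredText = coveredText;
        }
    }

    // Accepts a single element such as <author>Donald Knuth</author>;
    // the closing tag must repeat the opening tag name.
    private static final Pattern FRAGMENT = Pattern.compile("<(\\w+)>([^<]*)</\\1>");

    /** Returns the ids of documents holding an annotation of the queried type and text. */
    static List<String> query(List<IndexedAnnotation> index, String fragment) {
        Matcher m = FRAGMENT.matcher(fragment.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported fragment: " + fragment);
        }
        String type = m.group(1);
        String text = m.group(2).trim();

        List<String> hits = new ArrayList<>();
        for (IndexedAnnotation a : index) {
            boolean sameType = a.type.equals(type);
            boolean sameText = a.coveredText.equalsIgnoreCase(text);
            if (sameType && sameText && !hits.contains(a.documentId)) {
                hits.add(a.documentId);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<IndexedAnnotation> index = List.of(
                new IndexedAnnotation("doc1", "author", "Donald Knuth"),
                new IndexedAnnotation("doc2", "person", "Donald Knuth"));
        // Only doc1 has "Donald Knuth" annotated as an author.
        System.out.println(query(index, "<author>Donald Knuth</author>"));
    }
}
```

A plain keyword search for "Donald Knuth" would return both documents in this example; the typed query returns only the one in which the name was annotated as the author.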

...

UIMA Component Repository
...

About this Site
Our goal in creating this site is to provide the basis for a thriving community of UIMA developers who can announce, discuss, design, share, and critique UIMA-compliant components, resources and solutions.

The Unstructured Information Management Architecture (UIMA) is a software framework that supports rapid development and deployment of multimodal analytics: applications that provide value by processing human-readable text, audio and/or video in order to extract information, answer questions, summarize documents, etc.

At Carnegie Mellon, we are currently using UIMA for both research and education. UIMA is used as a framework for corpus annotation on the JAVELIN, RADAR and ROSETTA projects. UIMA has also been used as a framework for student homework assignments and externally sponsored projects in the Software Engineering


...

