NooJ is both a corpus processing tool and a linguistic development environment. It allows linguists to formalize several levels of linguistic phenomena: orthography and spelling; lexicons for simple words, multiword units and frozen expressions; inflectional, derivational and productive morphology; local and structural syntax; and transformational syntax. For each of these levels, NooJ provides one or more formal tools specifically designed to facilitate the description of the corresponding phenomena, as well as parsing tools designed to be as computationally efficient as possible. This approach distinguishes NooJ from most computational linguistic tools, which provide a single formalism intended to describe everything. As a corpus processing tool, NooJ allows users to apply sophisticated linguistic queries to large corpora in order to build indices and concordances, annotate texts automatically, perform statistical analyses, etc.

NooJ is freely available, and linguistic modules can already be downloaded free of charge for Acadian, Arabic, Armenian, Bulgarian, Catalan, Chinese, Croatian, French, English, German, Hebrew, Greek, Hungarian, Italian, Polish, Portuguese, Spanish and Turkish. A dozen other modules are under construction.

NooJ's most distinctive characteristics are:

  • NooJ can process texts and corpora in more than 100 file formats, including HTML, PDF, MS Office, all variants of Unicode, ASCII, etc. It can import information from XML documents and export its annotations back to them.
  • NooJ's linguistic engine uses an annotation system that allows all levels of grammars to be applied to texts without modifying them; this allows linguists to formalize various phenomena independently, and to apply the corresponding grammars in cascade. For instance, by combining inflection, derivation and syntactic data, NooJ can perform Harris-type transformations.
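
The idea of annotating a text without modifying it, and of applying grammars in cascade, can be illustrated with a minimal Python sketch. This is not NooJ's implementation or API: the Annotation class, the two passes and the toy lexicon are hypothetical names invented for this illustration only. Each pass reads the unchanged text together with the annotations produced so far, and returns an enlarged annotation list.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A stand-off annotation: character offsets into the text plus a label."""
    start: int
    end: int
    label: str


def lexical_pass(text, annotations):
    """Tag every word found in a toy lexicon (hypothetical data)."""
    lexicon = {"the": "DET", "cat": "N", "sleeps": "V"}
    found = []
    for match in re.finditer(r"\w+", text):
        tag = lexicon.get(match.group().lower())
        if tag is not None:
            found.append(Annotation(match.start(), match.end(), tag))
    return annotations + found


def noun_phrase_pass(text, annotations):
    """Build NP annotations from DET and N annotations left by earlier passes."""
    nouns_by_start = {a.start: a for a in annotations if a.label == "N"}
    found = []
    for a in annotations:
        if a.label == "DET":
            # Assumes exactly one space between the determiner and the noun.
            noun = nouns_by_start.get(a.end + 1)
            if noun is not None:
                found.append(Annotation(a.start, noun.end, "NP"))
    return annotations + found


def apply_cascade(text, passes):
    """Apply each pass in order; the text itself is never rewritten."""
    annotations = []
    for grammar_pass in passes:
        annotations = grammar_pass(text, annotations)
    return annotations


text = "The cat sleeps"
result = apply_cascade(text, [lexical_pass, noun_phrase_pass])
for a in result:
    print(a.label, repr(text[a.start:a.end]))
```

Because every annotation only stores offsets into the original text, the two passes can be written and tested independently, and the second pass can rely on the output of the first without either of them altering the text.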

NooJ is used as:

  • a linguistic engineering development platform,
  • a corpus processor,
  • an information extraction system,
  • a terminological extractor,
  • a Machine Translation development tool,
  • a tool to teach linguistics and computational linguistics.

To learn more about NooJ and to download the software, linguistic resources, manual, tutorials and reference papers, visit www.nooj4nlp.net.
