RSS 2.0
Sign In
# Sunday, March 8, 2009

Recently, we have started looking into a task of creating an interactive parser. A generic one.

Yes, we know there are plenty of them all around, however the goals we have defined made us to construct the new implementation.

The goals:

  • Parser must be incremental.
    You should direct what to parse, and when to stop.
    This virtually demands rather "pull" than conventional "push" implementation.
  • Parser must be able to synchronize a tree with text.
    Whenever the underlying text is changed, a limited part of a tree should to be updated.
  • Parser should be able to recover from errors, and continue parsing.
  • Parser should be manageable.
    This is a goal of every program, really.
  • Parser must be fast.
  • A low memory footprint is desired.

What's implemented (VS2008, C#) and put at SourceForge, is called an Incremental Parser.

These are parser's insights:

  • Bookmarks are objects to track text points. We use a binary tree (see Bare binary tree algorithms) to adjust positions of bookmarks when text is changed.
  • Ranges define parsed tree elements. Each range is defined by two bookmarks, and a grammar annotation.
  • There are grammar primitives, which can be composed into a grammar graph.
  • A grammar graph along with ranges form a state machine.
  • Grammar chains are cached, turning parsing into a series of probes of literal tokens and transitions between grammar chains. This caching is done on demand, which results in warming-up effect.
  • Parser itself includes a random access tokenizer, and a queue of ranges pending to be parsed.
  • Parsing is conducted as a cycle of pulling and parsing of pending ranges.
  • Whenever text is changed a closest range is queued for the reparsing.
  • A balance between amount of parsing and memory consumption can be achieved through a detalization of grammar annotation for a range. An active text fragment can be fully annotated, while for other text parts a coarse range can be stored.

We have defined xpath like grammar to test our ideas. See printed parsed trees to get understanding of what information can be seen from ranges.

Sunday, March 8, 2009 9:00:38 PM UTC  #    Comments [0] -
Announce | Incremental Parser | xslt
All comments require the approval of the site owner before being displayed.
Home page

Comment (Some html is allowed: a@href@title, b, blockquote@cite, em, i, strike, strong, sub, super, u) where the @ means "attribute." For example, you can use <a href="" title=""> or <blockquote cite="Scott">.  

[Captcha]Enter the code shown (prevents robots):

Live Comment Preview
<March 2009>
Total Posts: 387
This Year: 3
This Month: 0
This Week: 0
Comments: 1361
Locations of visitors to this page
The opinions expressed herein are our own personal opinions and do not represent our employer's view in anyway.

© 2024, Nesterovsky bros
All Content © 2024, Nesterovsky bros
DasBlog theme 'Business' created by Christoph De Baene (delarou)