Releases: digitalheir/java-probabilistic-earley-parser

v0.10.0

30 Dec 17:11
Update readme

v0.9.12

08 Feb 21:47
  • Added lenient scanning options -scanmode drop and -scanmode wildcard for handling tokens whose terminal type cannot be found in the grammar. See #7

You can use this project as a library in your Java application or as a standalone command-line app.

Using the app from the command line

We define a grammar in a .cfg file.

By default, the parser assumes that non-terminals are capitalized and terminals are not. You can use a custom category handler if you call the API from Java code.

# grammar.cfg

S  -> NP VP  (1.0)    # append a probability between 0 and 1 in parentheses
NP -> D N             # probability defaults to 1.0
VP →  V NP            # Use '->' or '→'
D  →  the
N  →  noses   (0.7)
V  →  noses   (0.3)
V  →  sniff   (0.9)
N  →  sniff   (0.1)
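
To make the rule syntax above concrete, here is a minimal sketch of how one line of such a file could be read: a left-hand side, an arrow ('->' or '→'), the right-hand symbols, and an optional probability in parentheses that defaults to 1.0. The class RuleLineSketch and its parse method are hypothetical names used for illustration; this is not the parser's own grammar reader and may differ from it in edge cases.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the rule syntax; not the library's own parser.
public class RuleLineSketch {
    public final String left;
    public final List<String> right;
    public final double probability;

    // Left-hand side, arrow ("->" or the Unicode arrow U+2192),
    // right-hand symbols, then an optional "(probability)".
    private static final Pattern RULE = Pattern.compile(
            "^(\\S+)\\s*(?:->|\u2192)\\s*(.*?)\\s*(?:\\((\\d*\\.?\\d+)\\))?$");

    RuleLineSketch(String left, List<String> right, double probability) {
        this.left = left;
        this.right = right;
        this.probability = probability;
    }

    // Returns null for blank lines, comment-only lines, and lines that are not rules.
    public static RuleLineSketch parse(String line) {
        int hash = line.indexOf('#');                      // '#' starts a comment
        String text = (hash >= 0 ? line.substring(0, hash) : line).trim();
        Matcher m = RULE.matcher(text);
        if (text.isEmpty() || !m.matches()) {
            return null;
        }
        // A missing probability defaults to 1.0, as in the grammar comments.
        double p = m.group(3) == null ? 1.0 : Double.parseDouble(m.group(3));
        List<String> symbols = Arrays.asList(m.group(2).split("\\s+"));
        return new RuleLineSketch(m.group(1), symbols, p);
    }

    public static void main(String[] args) {
        RuleLineSketch rule = parse("N  \u2192  noses   (0.7)");
        System.out.println(rule.left + " " + rule.right + " " + rule.probability);
    }
}
```

The sketch strips everything after '#' before matching, so the trailing comments in the grammar above do not interfere with the probability.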

Run the executable jar from the terminal:

java -jar probabilistic-earley-parser-0.9.12-jar-with-dependencies.jar -i grammar.cfg -goal S the noses sniff the noses

This prints the Viterbi parse of the sentence "the noses sniff the noses":

0.44099999999999995 (= 0.7 * 0.7 * 0.9)
└── <start>
    └── S
        ├── NP
        │   ├── D
        │   │   └── the (the)
        │   └── N
        │       └── noses (noses)
        └── VP
            ├── V
            │   └── sniff (sniff)
            └── NP
                ├── D
                │   └── the (the)
                └── N
                    └── noses (noses)
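
The score at the top of the tree is the product of the probabilities of every rule used in the parse; rules without an explicit probability contribute 1.0, which is why only 0.7, 0.7 and 0.9 appear in the annotation. The following is a minimal sketch of that arithmetic only. The class ViterbiScoreSketch and its score method are hypothetical and not part of the library's API.

```java
import java.util.List;

// Hypothetical helper for illustration only; not part of the parser's API.
public class ViterbiScoreSketch {
    // Multiplies the probabilities of all rules used in a single parse tree.
    public static double score(List<Double> ruleProbabilities) {
        double product = 1.0;
        for (double p : ruleProbabilities) {
            product *= p;
        }
        return product;
    }

    public static void main(String[] args) {
        // Rules used for "the noses sniff the noses", in tree order:
        // S -> NP VP (1.0), NP -> D N (1.0), D -> the (1.0), N -> noses (0.7),
        // VP -> V NP (1.0), V -> sniff (0.9),
        // NP -> D N (1.0), D -> the (1.0), N -> noses (0.7)
        List<Double> used = List.of(1.0, 1.0, 1.0, 0.7, 1.0, 0.9, 1.0, 1.0, 0.7);
        System.out.println(score(used));
    }
}
```

Because the factors are floating-point numbers, the printed value is only approximately 0.441, which matches the rounding visible in the parser's own output.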

v0.9.11

02 Feb 21:08
  • Added command-line functionality; the release includes a runnable jar with dependencies bundled
  • Fixed parsing of rule probability from .cfg files

You can use this project as a library in your Java application or as a standalone command-line app.

By default, the parser assumes that non-terminals are capitalized and terminals are not. You can also add a custom category handler if you call the API from Java code.

Create a UTF-8-encoded .cfg file that contains your grammar, such as the following:

# grammar.cfg

S  -> NP VP  (1.0)    # append a probability between 0 and 1 in parentheses
NP -> D N             # probability defaults to 1.0
VP →  V NP            # Use '->' or '→'
D  →  the
N  →  noses   (0.7)
V  →  noses   (0.3)
V  →  sniff   (0.9)
N  →  sniff   (0.1)

Run the executable jar from the terminal:

java -jar probabilistic-earley-parser-0.9.11-jar-with-dependencies.jar -i grammar.cfg -goal S the noses sniff the noses

This prints the Viterbi parse of the sentence "the noses sniff the noses":

0.44099999999999995 (= 0.7 * 0.7 * 0.9)
└── <start>
    └── S
        ├── NP
        │   ├── D
        │   │   └── the (the)
        │   └── N
        │       └── noses (noses)
        └── VP
            ├── V
            │   └── sniff (sniff)
            └── NP
                ├── D
                │   └── the (the)
                └── N
                    └── noses (noses)