DEV_NOTE2.md

LPhy Developer Guide 102 (LPhy in Java)

This tutorial focuses on how to implement LPhy components using Java classes.

LPhy terms

Please read the following articles before you start to write the code:

It is essential to have a thorough understanding of the following concepts:

In the Java implementation, Value and Generator are both subtypes of GraphicalModelNode. See also https://linguaphylo.github.io/programming/2020/09/22/linguaphylo-for-developers.html

LPhy data type

LPhy is a dynamically typed language. Therefore, as a developer, you need to understand how data types are handled. For example,

  • All actual values are wrapped in the Value class. A few classes inherit from it, such as RandomVariable.

You need to use the method .value() to retrieve the actual value, and .getType() to get its data type.
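
As a rough illustration of this wrapping, the sketch below uses a hypothetical stand-in class, SketchValue, rather than the real LPhy Value class. Only the method names value() and getType() come from this guide; the constructor, the id field, and the helper plusOne are assumptions made so the example compiles on its own.

```java
// Simplified stand-in for LPhy's Value class (hypothetical; the real class has more members).
class SketchValue<T> {
    private final String id;
    private final T value;

    SketchValue(String id, T value) {
        this.id = id;
        this.value = value;
    }

    // Returns the wrapped Java object.
    T value() {
        return value;
    }

    // Returns the runtime class of the wrapped object.
    Class<?> getType() {
        return value.getClass();
    }

    String getId() {
        return id;
    }
}

class ValueDemo {
    // Accepts any numeric value, whatever its concrete type, and unwraps it with value().
    static double plusOne(SketchValue<? extends Number> v) {
        return v.value().doubleValue() + 1.0;
    }

    public static void main(String[] args) {
        SketchValue<Integer> n = new SketchValue<Integer>("n", 3);
        SketchValue<Double> x = new SketchValue<Double>("x", 2.5);
        System.out.println(n.getId() + ": " + n.getType().getSimpleName() + ", plusOne = " + plusOne(n));
        System.out.println(x.getId() + ": " + x.getType().getSimpleName() + ", plusOne = " + plusOne(x));
    }
}
```

Because the script language is dynamically typed, code that consumes a value typically inspects getType() or works against a broad type such as Number, as plusOne does here.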

Although we have already implemented some commonly used data types in LPhy, developers may still need to implement new LPhy data types for certain new generators.

LPhy data type is not sequence type

You may encounter many different "data types" in LPhy or BEAST. Please do not confuse these with sequence types. In LPhy, data types are specifically defined for the LPhy language. For example, they can be Double, Integer, Taxa, Alignment, or TimeTree.

However, any "data type" classes that inherit from JEBL SequenceType do not fall under this concept. These classes define the type of sequences.

Write your LPhy object in Java

Generative distribution

GenerativeDistribution is a Java interface that represents all types of generative distributions, such as probability distributions, tree generative distributions (e.g. Birth-death, Coalescent), and PhyloCTMC generative distributions.

To write your own generative distribution, you need to follow these steps:

  1. Design your LPhy script first, for example, Θ ~ LogNormal(meanlog=3.0, sdlog=1.0);.

  2. Create a Java class (e.g. LogNormal.java) to implement GenerativeDistribution.

Look at the example LogNormal.java. A few things are required:

  • Define its LPhy name with the annotation @GeneratorInfo on the overridden method RandomVariable<Double> sample().

    name = "LogNormal" allows the parser to map this name in an LPhy script to this Java class.

  • Define the arguments for this distribution using the annotation @ParameterInfo on the constructor parameters.

    name = "meanlog" declares one of the arguments as "meanlog". This is also referred to as a named argument. Following the annotation, you need to declare the Java argument for this constructor, which must be a Value, such as Value<Number> M. We use Number so that this input can accept integer values. To make an argument optional, simply add optional = true.

  • Define the data type, e.g. LogNormal extends ParametricDistribution<Double> implements GenerativeDistribution1D<Double>, where Double replaces T and must be consistent with the returned type RandomVariable<Double> sample().

  • Implement the method RandomVariable<...> sample() which should sample a value from this distribution and then wrap it into RandomVariable.

  • Correctly implement both methods Map<String, Value> getParams() and setParam(String paramName, Value value). Otherwise, re-sampling values from the probabilistic graphical model represented by an LPhy script using this distribution will fail.

  3. Register the distribution with SPI.

The SPI registration class for generative distributions is located in a Java package named *.spi, for example, lphy.base.spi.LPhyBaseImpl or phylonco.lphy.spi.PhyloncoImpl.
You can simply add your class to the list returned by the method List<Class<? extends GenerativeDistribution>> declareDistributions(). Here is the example in LPhyBaseImpl.

Please note that the LPhy code will only function properly after the distribution class is registered. Therefore, it is acceptable to commit an incomplete LPhy object during development (to avoid painful merges) without registering it, provided it compiles and is not included in any published unit tests.
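
The class requirements listed above can be sketched as follows. This is a minimal, self-contained approximation and not the real LogNormal.java: the annotations and the Value and RandomVariable classes are simplified stand-ins declared inside the example, the class name LogNormalSketch and the description strings are hypothetical, and the ParametricDistribution parent class is left out.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Stand-ins for the LPhy annotations and value classes (assumed shapes, not the real definitions).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface GeneratorInfo { String name(); String description() default ""; }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface ParameterInfo { String name(); String description() default ""; boolean optional() default false; }

class Value<T> {
    private final T value;
    Value(T value) { this.value = value; }
    T value() { return value; }
}

class RandomVariable<T> extends Value<T> {
    RandomVariable(T value) { super(value); }
}

// Hypothetical distribution class for the script: Θ ~ LogNormal(meanlog=3.0, sdlog=1.0);
class LogNormalSketch {
    private Value<Number> M; // meanlog
    private Value<Number> S; // sdlog
    private final Random random = new Random();

    // Each named argument is declared with @ParameterInfo on a Value-typed constructor parameter.
    LogNormalSketch(
            @ParameterInfo(name = "meanlog", description = "mean on the log scale") Value<Number> M,
            @ParameterInfo(name = "sdlog", description = "standard deviation on the log scale") Value<Number> S) {
        this.M = M;
        this.S = S;
    }

    // The name given here is the one used in the script.
    @GeneratorInfo(name = "LogNormal", description = "The log-normal probability distribution.")
    public RandomVariable<Double> sample() {
        double meanlog = M.value().doubleValue();
        double sdlog = S.value().doubleValue();
        double x = Math.exp(meanlog + sdlog * random.nextGaussian());
        return new RandomVariable<Double>(x);
    }

    // Exposes the current arguments by name so the model can be inspected and re-sampled.
    public Map<String, Value> getParams() {
        Map<String, Value> params = new TreeMap<>();
        params.put("meanlog", M);
        params.put("sdlog", S);
        return params;
    }

    // Replaces one argument by name; sample() picks up the new value on its next call.
    @SuppressWarnings("unchecked")
    public void setParam(String paramName, Value value) {
        switch (paramName) {
            case "meanlog": M = value; break;
            case "sdlog": S = value; break;
            default: throw new IllegalArgumentException("Unknown parameter: " + paramName);
        }
    }
}
```

Note that the keys returned by getParams() and accepted by setParam() match the names declared in @ParameterInfo. Keeping these three places consistent is what lets a parameter be swapped and the distribution re-sampled.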

Deterministic function

DeterministicFunction is an abstract class that extends BasicFunction.

To write your own deterministic function, you need to follow similar steps:

  1. Design your LPhy script first, for example, Q = hky(kappa=κ, freq=π).

  2. Create a Java class (e.g. HKY.java) to extend DeterministicFunction.

Look at the example HKY.java. A few things are required:

  • Define its LPhy name with the annotation @GeneratorInfo on the overridden method Value<Double[][]> apply().

    name = "hky" allows the parser to map this name in an LPhy script to this Java class.

  • Define the arguments for this function using the annotation @ParameterInfo on the constructor parameters.

    name = "kappa" declares one of the arguments as "kappa". This is also referred to as a named argument. Following the annotation, you need to declare the Java argument for this constructor, which must be a Value, such as Value<Number> kappa. We use Number so that this input can accept integer values. To make an argument optional, simply add optional = true.

  • Define the data type, e.g. extends DeterministicFunction<Double[][]>, where the 2D matrix Double[][] replaces T and must be consistent with the returned type Value<Double[][]> apply().

  • Implement the method Value<...> apply() which should return a value deterministically and then wrap it into Value.

  3. Register the function with SPI.

Simply add your class into the list returned by the method List<Class<? extends BasicFunction>> declareFunctions().
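
The function requirements above can be sketched in the same way. This is a self-contained approximation and not the real HKY.java: the annotations and Value are simplified stand-ins, the class name HKYSketch and the description strings are hypothetical, the nucleotide order A, C, G, T is an assumption, and the rate matrix is left unnormalised for brevity.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Stand-ins for the LPhy annotations and Value class (assumed shapes, not the real definitions).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface GeneratorInfo { String name(); String description() default ""; }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface ParameterInfo { String name(); String description() default ""; boolean optional() default false; }

class Value<T> {
    private final T value;
    Value(T value) { this.value = value; }
    T value() { return value; }
}

// Hypothetical function class for the script: Q = hky(kappa=κ, freq=π);
class HKYSketch {
    private final Value<Number> kappa;
    private final Value<Double[]> freq;

    HKYSketch(
            @ParameterInfo(name = "kappa", description = "transition/transversion rate ratio") Value<Number> kappa,
            @ParameterInfo(name = "freq", description = "base frequencies") Value<Double[]> freq) {
        this.kappa = kappa;
        this.freq = freq;
    }

    // Deterministic: the same kappa and freq always yield the same matrix.
    @GeneratorInfo(name = "hky", description = "The HKY instantaneous rate matrix.")
    public Value<Double[][]> apply() {
        double k = kappa.value().doubleValue();
        Double[] pi = freq.value();
        Double[][] q = new Double[4][4];
        for (int i = 0; i < 4; i++) {
            double rowSum = 0.0;
            for (int j = 0; j < 4; j++) {
                if (i == j) continue;
                // With states ordered A, C, G, T, transitions are A<->G and C<->T.
                boolean transition = Math.abs(i - j) == 2;
                double rate = (transition ? k : 1.0) * pi[j];
                q[i][j] = rate;
                rowSum += rate;
            }
            q[i][i] = -rowSum; // each row of a rate matrix sums to zero
        }
        return new Value<Double[][]>(q);
    }
}
```

Unlike sample(), apply() returns a plain Value rather than a RandomVariable, since no randomness is involved.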

Method call

The method call is a special case of deterministic function, but its implementation in Java is somewhat simpler. Here is an example of an LPhy script:

data {
  D = readNexus(file="data/primate.nex");
  taxa = D.taxa();
  ...
}

In this script, the first line imports an alignment D from "primate.nex", and the second line uses the method call D.taxa() to extract the taxa object.

To implement this, simply add a Java method with the same name, taxa(), in the Alignment class. Then, add the @MethodInfo annotation with the necessary information. The script line taxa = D.taxa(); will work as long as D is an Alignment object.

It is important to note that the method must be implemented in the Java class of the LPhy object on which it is called.
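
A minimal sketch of this pattern is given below. The @MethodInfo annotation here is a stand-in with an assumed description attribute, AlignmentSketch is a hypothetical class that returns taxon names as a plain list rather than a Taxa object, and the reflective dispatcher is only one plausible way a parser could resolve D.taxa(); it is not taken from the LPhy source.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for LPhy's @MethodInfo (assumed shape).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface MethodInfo { String description(); }

// Hypothetical alignment class: the script-callable method lives on the object's own class.
class AlignmentSketch {
    private final Map<String, String> sequences = new LinkedHashMap<>();

    void addSequence(String taxon, String sequence) {
        sequences.put(taxon, sequence);
    }

    @MethodInfo(description = "The taxon names of this alignment.")
    public List<String> taxa() {
        return new ArrayList<>(sequences.keySet());
    }

    // Not annotated, so the dispatcher below does not expose it to scripts.
    public int size() {
        return sequences.size();
    }
}

class MethodCallSketch {
    // Resolves target.methodName() by looking for an annotated, zero-argument method of that name.
    static Object call(Object target, String methodName) {
        for (Method m : target.getClass().getDeclaredMethods()) {
            if (m.getName().equals(methodName)
                    && m.getParameterCount() == 0
                    && m.isAnnotationPresent(MethodInfo.class)) {
                try {
                    m.setAccessible(true);
                    return m.invoke(target);
                } catch (IllegalAccessException | InvocationTargetException e) {
                    throw new IllegalStateException("Call to " + methodName + "() failed", e);
                }
            }
        }
        throw new IllegalArgumentException("No script-callable method " + methodName + "()");
    }

    public static void main(String[] args) {
        AlignmentSketch d = new AlignmentSketch();
        d.addSequence("human", "ACGT");
        d.addSequence("chimp", "ACGA");
        System.out.println(call(d, "taxa"));
    }
}
```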

Inheritance

You can use Java inheritance to reuse code. For example, the RateMatrix class is the parent class of most substitution models.
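
The sketch below illustrates the idea with hypothetical classes. RateMatrixSketch, its helper setDiagonal, and the two subclasses are invented for this example and do not reproduce the real RateMatrix API; they only show how a shared step can live in the parent while each model supplies its own off-diagonal rates.

```java
import java.util.Arrays;

// Hypothetical parent class holding logic common to all substitution models.
abstract class RateMatrixSketch {
    // Shared helper: sets each diagonal entry so that its row sums to zero.
    protected static void setDiagonal(double[][] q) {
        for (int i = 0; i < q.length; i++) {
            double sum = 0.0;
            for (int j = 0; j < q.length; j++) {
                if (i != j) sum += q[i][j];
            }
            q[i][i] = -sum;
        }
    }

    // Each model only defines how its own rates are filled in.
    abstract double[][] rates();
}

// Jukes-Cantor: all substitutions occur at the same rate.
class JukesCantorSketch extends RateMatrixSketch {
    @Override
    double[][] rates() {
        double[][] q = new double[4][4];
        for (double[] row : q) Arrays.fill(row, 1.0);
        setDiagonal(q); // overwrites the diagonal entries
        return q;
    }
}

// Kimura two-parameter: transitions are kappa times faster than transversions.
class K80Sketch extends RateMatrixSketch {
    private final double kappa;

    K80Sketch(double kappa) {
        this.kappa = kappa;
    }

    @Override
    double[][] rates() {
        double[][] q = new double[4][4];
        for (int i = 0; i < 4; i++) {
            for (int j = 0; j < 4; j++) {
                // With states ordered A, C, G, T, transitions are A<->G and C<->T.
                if (i != j) q[i][j] = (Math.abs(i - j) == 2) ? kappa : 1.0;
            }
        }
        setDiagonal(q);
        return q;
    }
}
```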

Overload

LPhy allows overloading. For example, the first script is implemented by Bernoulli:

I_siteRates ~ Bernoulli(p=0.5);

The second script is implemented by BernoulliMulti:

I ~ Bernoulli(p=0.5, replicates=dim, minSuccesses=dim-2);
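
One plausible way to choose between two classes that share a script name is to compare the argument names in the script with the names declared by @ParameterInfo. The sketch below shows that idea only; it is an assumption about the mechanism, not the actual LPhy parser logic. The annotation is a stand-in, the two classes are hypothetical placeholders with plain Java parameter types, and marking minSuccesses as optional is assumed for illustration.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Stand-in for LPhy's @ParameterInfo (assumed shape).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface ParameterInfo { String name(); boolean optional() default false; }

// Hypothetical placeholders for two generators registered under the same script name.
class BernoulliSketch {
    BernoulliSketch(@ParameterInfo(name = "p") Double p) { }
}

class BernoulliMultiSketch {
    BernoulliMultiSketch(@ParameterInfo(name = "p") Double p,
                         @ParameterInfo(name = "replicates") Integer replicates,
                         @ParameterInfo(name = "minSuccesses", optional = true) Integer minSuccesses) { }
}

class OverloadSketch {
    // Returns the first candidate with a constructor that accepts exactly these named arguments.
    static Class<?> resolve(Set<String> argNames, List<Class<?>> candidates) {
        for (Class<?> c : candidates) {
            for (Constructor<?> ctor : c.getDeclaredConstructors()) {
                if (matches(ctor, argNames)) return c;
            }
        }
        throw new IllegalArgumentException("No generator accepts arguments " + argNames);
    }

    // A constructor matches if every required name is supplied and no unknown name is supplied.
    static boolean matches(Constructor<?> ctor, Set<String> argNames) {
        Set<String> known = new HashSet<>();
        for (Annotation[] annotations : ctor.getParameterAnnotations()) {
            for (Annotation a : annotations) {
                if (a instanceof ParameterInfo) {
                    ParameterInfo info = (ParameterInfo) a;
                    known.add(info.name());
                    if (!info.optional() && !argNames.contains(info.name())) return false;
                }
            }
        }
        return known.containsAll(argNames);
    }

    public static void main(String[] args) {
        List<Class<?>> candidates = new ArrayList<>();
        candidates.add(BernoulliSketch.class);
        candidates.add(BernoulliMultiSketch.class);
        Set<String> single = new HashSet<>(Arrays.asList("p"));
        Set<String> multi = new HashSet<>(Arrays.asList("p", "replicates", "minSuccesses"));
        System.out.println(resolve(single, candidates).getSimpleName());
        System.out.println(resolve(multi, candidates).getSimpleName());
    }
}
```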

LPhy extension mechanism

After you complete the Java implementation, you need to register it using SPI (Service Provider Interface) so that it can be used in an LPhy script.
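
The registration step can be sketched with the standard java.util.ServiceLoader API. In the example below, only the two method signatures declareDistributions() and declareFunctions() are taken from this guide. The interface name ExtensionSketch, the provider MyExtensionImpl, the empty stand-in types, and the helper generatorNames are hypothetical, and how LPhy itself declares and loads its providers is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Empty stand-ins for the LPhy base types named in this guide.
interface GenerativeDistribution { }
abstract class BasicFunction { }

class LogNormalSketch implements GenerativeDistribution { }
class HKYSketch extends BasicFunction { }

// Hypothetical service interface that every extension implements.
interface ExtensionSketch {
    List<Class<? extends GenerativeDistribution>> declareDistributions();
    List<Class<? extends BasicFunction>> declareFunctions();
}

// Service provider: lists every class that should be visible to the parser.
// In a real project this would be a public class in a *.spi package.
class MyExtensionImpl implements ExtensionSketch {
    @Override
    public List<Class<? extends GenerativeDistribution>> declareDistributions() {
        List<Class<? extends GenerativeDistribution>> list = new ArrayList<>();
        list.add(LogNormalSketch.class);
        return list;
    }

    @Override
    public List<Class<? extends BasicFunction>> declareFunctions() {
        List<Class<? extends BasicFunction>> list = new ArrayList<>();
        list.add(HKYSketch.class);
        return list;
    }
}

class RegistrySketch {
    // Collects the simple names of all classes declared by the given extensions.
    static List<String> generatorNames(Iterable<ExtensionSketch> extensions) {
        List<String> names = new ArrayList<>();
        for (ExtensionSketch ext : extensions) {
            for (Class<?> c : ext.declareDistributions()) names.add(c.getSimpleName());
            for (Class<?> c : ext.declareFunctions()) names.add(c.getSimpleName());
        }
        return names;
    }

    public static void main(String[] args) {
        // ServiceLoader discovers providers declared in module-info.java ("provides ... with ...")
        // or in a META-INF/services file. No provider is declared for this sketch.
        System.out.println(generatorNames(ServiceLoader.load(ExtensionSketch.class)));
    }
}
```

A class that is missing from both lists is never seen by the loader, which is why an unregistered generator cannot be used from a script.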