
get_aligned_BERT_emb

Get the aligned BERT embedding for sequence labeling tasks

Installing as a dependency

pip install aligned-bert-embedder

Installing dependencies

conda env create -f environment.yml

Example of usage from the command line (not recommended):

python -m aligned_bert_embedder embed aligned_bert_embedder/configs/snip.yml aligned_bert_embedder/texts/triple.txt

Example of usage from code (preferred):

from aligned_bert_embedder import AlignedBertEmbedder

embeddings = AlignedBertEmbedder(config).embed(
    (
        ('First', 'sentence', 'or', 'other', 'context', 'chunk'),
        ('Second', 'sentence')
    )
)
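The config argument is not constructed in this snippet. A minimal sketch, assuming it is the parsed contents of one of the bundled YAML configs (the same file passed on the command line above); whether AlignedBertEmbedder accepts the raw parsed dict is an assumption, not documented behaviour:

import yaml

# Load one of the bundled configs; passing the parsed dict straight to
# AlignedBertEmbedder is an assumption, not documented behaviour.
with open('aligned_bert_embedder/configs/snip.yml', 'r', encoding='utf-8') as config_file:
    config = yaml.safe_load(config_file)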

The following is the content of the original README.md from the upstream developer repository.

Why this repo?

In the original script extract_features.py in BERT, tokens may be split into pieces as follows:

orig_tokens = ["John", "Johanson", "'s",  "house"]
bert_tokens = ["[CLS]", "john", "johan", "##son", "'", "s", "house", "[SEP]"]
orig_to_tok_map = [1, 2, 4, 6]
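The map records, for each original token, the index of its first word piece. A minimal sketch of how such a map can be built, using the HuggingFace transformers tokenizer as a stand-in for the original BERT repo's tokenization module (an assumption, not this repo's code):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

orig_tokens = ["John", "Johanson", "'s", "house"]
bert_tokens = ["[CLS]"]
orig_to_tok_map = []
for token in orig_tokens:
    orig_to_tok_map.append(len(bert_tokens))  # index of the first word piece
    bert_tokens.extend(tokenizer.tokenize(token))
bert_tokens.append("[SEP]")

print(bert_tokens)      # ['[CLS]', 'john', 'johan', '##son', "'", 's', 'house', '[SEP]']
print(orig_to_tok_map)  # [1, 2, 4, 6]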

We investigate three alignment strategies (first, mean, and max) to maintain an original-to-tokenized alignment. Take "Johanson -> johan, ##son" as an example (a sketch of all three strategies follows the list):

  • first: take the representation of johan as the whole word Johanson
  • mean: take the element-wise mean (reduce_mean) of the representations of johan and ##son as the whole word Johanson
  • max: take the element-wise max (reduce_max) of the representations of johan and ##son as the whole word Johanson
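A minimal numpy sketch of the three strategies (an illustration, not this repo's code). piece_embs holds one row per word piece; spans gives, for each original token, the [start, end) range of its pieces:

import numpy as np

def align(piece_embs, spans, strategy="mean"):
    out = []
    for start, end in spans:
        pieces = piece_embs[start:end]
        if strategy == "first":   # representation of the first piece
            out.append(pieces[0])
        elif strategy == "mean":  # element-wise mean over the pieces
            out.append(pieces.mean(axis=0))
        elif strategy == "max":   # element-wise max over the pieces
            out.append(pieces.max(axis=0))
    return np.stack(out)

piece_embs = np.random.rand(8, 768)       # [CLS] john johan ##son ' s house [SEP]
spans = [(1, 2), (2, 4), (4, 6), (6, 7)]  # John, Johanson, 's, house
word_embs = align(piece_embs, spans, strategy="max")
print(word_embs.shape)  # (4, 768)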

How to use this repo?

sh run.sh input_file output_file BERT_BASE_DIR
# For example:
sh run.sh your_data your_data.bert path/to/bert/uncased_L-12_H-768_A-12

You can modify layers and align_strategies in run.sh.

How to load the output embeddings?

After the above procedure, you are expected to get an output file of contextual embeddings (e.g., your_data_6_mean). You can then load this file like conventional word embeddings, for example in a Python script:

with open("your_data_6_mean", "r", encoding="utf-8") as bert_f:
    for line in bert_f:
        bert_vec = [[float(value) for value in token.split()] for token in line.strip().split("|||")]
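Building on that snippet, a short sketch that collects every sentence into a per-sentence numpy array (the one-sentence-per-line, "|||"-separated format is from the snippet above; the numpy conversion is an addition):

import numpy as np

sentence_embeddings = []
with open("your_data_6_mean", "r", encoding="utf-8") as bert_f:
    for line in bert_f:
        # one sentence per line; token vectors are separated by "|||"
        bert_vec = [
            [float(value) for value in token.split()]
            for token in line.strip().split("|||")
        ]
        sentence_embeddings.append(np.asarray(bert_vec))  # (num_tokens, dim)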
