
Sparsifying Transformer Models with Trainable Representation Pooling

This repository contains a demonstrative implementation of pooling-based models (e.g., DeepPyramidion) accompanying our paper Sparsifying Transformer Models with Trainable Representation Pooling.

A detailed README describing how to use it is provided with the examples; see fairseq/examples/deep_pyramidion for more.

[arXiv, BibTeX]

General information

The method we propose is inspired by the search for relevant fragments, an important aspect of human cognition when engaged in reading-to-do tasks. We intend to mimic such relevance judgments and hypothesize that it is possible to solve problems involving natural language using only selected passages of the input text.

These passages may be substantially shorter than the original text. One may compare this to a person reading a paper and highlighting it in such a way that a summary can be written using only the highlighted parts.

The end-to-end mechanism we introduce performs such highlighting by scoring the token representations and passing only the selected ones to the next layer of the neural network.
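
To make the idea concrete, below is a minimal PyTorch sketch of such a scoring-and-selection layer. The class name `TopKPooler`, the linear scorer, and the sigmoid gating are illustrative assumptions, not the repository's exact implementation; the paper's pooling is trainable end-to-end, and the real code lives under fairseq/examples/deep_pyramidion.

```python
import torch
import torch.nn as nn


class TopKPooler(nn.Module):
    """Scores token representations and keeps only the top-k of them.

    Illustrative sketch only: a single linear scorer plus hard top-k
    selection, with sigmoid gating so gradients reach the scorer.
    """

    def __init__(self, hidden_dim: int, k: int):
        super().__init__()
        self.scorer = nn.Linear(hidden_dim, 1)  # one relevance score per token
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim)
        scores = self.scorer(x).squeeze(-1)            # (batch, seq_len)
        topk = scores.topk(self.k, dim=-1).indices     # indices of kept tokens
        topk, _ = topk.sort(dim=-1)                    # keep original token order
        idx = topk.unsqueeze(-1).expand(-1, -1, x.size(-1))
        # Gating by the scores makes the selection trainable end-to-end.
        gated = x * scores.sigmoid().unsqueeze(-1)
        return gated.gather(1, idx)                    # (batch, k, hidden_dim)


# Example: pool 512 token representations down to 128 after an encoder layer.
pooler = TopKPooler(hidden_dim=256, k=128)
hidden = torch.randn(2, 512, 256)
print(pooler(hidden).shape)  # torch.Size([2, 128, 256])
```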

Figure. An illustration of sparse attention matrices assuming a three-layer encoder and decoder (separated by the dashed line). The blue color reflects the memory consumption of self-attention (encoder) and cross-attention (decoder). (A) The complete input consumed at once. (B) Memory reduced with blockwise attention and (C) pooling applied after the encoder. (D) Gradual reduction of memory by pooling after every layer.
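
As a back-of-the-envelope illustration of scenario (D), the snippet below compares self-attention memory for a full-length input against gradual pooling. The 4096-token input, three encoder layers, and halving after each layer are assumptions chosen for the example, not figures from the paper.

```python
# Self-attention memory per layer grows with seq_len ** 2.
def attention_cells(seq_lens):
    return sum(n * n for n in seq_lens)

full = attention_cells([4096, 4096, 4096])     # (A) complete input at every layer
gradual = attention_cells([4096, 2048, 1024])  # (D) pooling after every layer

print(f"(A) full input:      {full:,} cells")     # 50,331,648
print(f"(D) gradual pooling: {gradual:,} cells")  # 22,020,096
print(f"reduction: {full / gradual:.1f}x")        # ~2.3x for three layers
```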
