Sequence-to-sequence Package

This package contains test suites for sequence-to-sequence models. To test your model, create in your tests directory a new class that extends pangolinn.seq2seq.PangolinnSeq2SeqModuleWrapper and builds an instance of the model to be tested. Then create a test suite that inherits from the pangolinn tester you want to use, and set its module_wrapper_class attribute to your wrapper class. That is all: you can now run the tests for your model.

Example Usage

This example tests that the PyTorch implementation of the Transformer decoder is autoregressive. First, create a pangolinn.seq2seq.PangolinnSeq2SeqModuleWrapper for it.

import torch
from torch import LongTensor, Tensor, nn

from pangolinn import seq2seq

class TransformerDecoderWrapper(seq2seq.PangolinnSeq2SeqModuleWrapper):
    """
    Wrapper to test a layer that does not look at the future, so it is safe in causal models.
    """
    def build_module(self) -> nn.Module:
        return nn.TransformerDecoderLayer(
            self.num_input_channels, 1, dim_feedforward=8, batch_first=True)

    @property
    def num_input_channels(self) -> int:
        return 4

    @staticmethod
    def generate_attention_mask(sz: int, device: str = "cpu") -> torch.Tensor:
        """ Generate the attention mask for causal decoding """
        mask = (torch.triu(torch.ones(sz, sz)) == 1).transpose(0, 1).float()
        mask.masked_fill_(mask == 0, float("-inf"))
        mask.masked_fill_(mask == 1, float(0.0))
        return mask.to(device)

    def forward(self, x: Tensor, lengths: LongTensor) -> Tensor:
        fake_encoder_out = torch.ones(x.shape[0], 1, self.num_input_channels)
        fake_encoder_out = fake_encoder_out.to(x.device)
        # Build the mask on the same device as the input to avoid device mismatches.
        tgt_mask = self.generate_attention_mask(x.shape[1], device=x.device)
        return self._module(x, memory=fake_encoder_out, tgt_mask=tgt_mask)

Then, create a test suite that extends pangolinn.seq2seq.CausalTestCase and tell it which wrapper class to use.

class CausalDecoderTestCase(seq2seq.CausalTestCase):
    module_wrapper_class = TransformerDecoderWrapper
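
To make the mask built by generate_attention_mask in the wrapper above easier to picture, here is a minimal pure-Python sketch of the same pattern. The helper name causal_mask is hypothetical and not part of pangolinn or PyTorch; it only mirrors the values the PyTorch code produces (0.0 where attention is allowed, -inf where a position lies in the future).

```python
from typing import List


def causal_mask(sz: int) -> List[List[float]]:
    """Additive causal mask: row i is the query position, column j the key position.

    Hypothetical helper for illustration only; it reproduces the values of the
    torch-based generate_attention_mask without depending on torch.
    """
    neg_inf = float("-inf")
    # Position i may attend to positions 0..i (value 0.0) but not to i+1.. (value -inf).
    return [[0.0 if j <= i else neg_inf for j in range(sz)] for i in range(sz)]


mask = causal_mask(3)
for row in mask:
    print(row)
```

Each row allows one more position than the previous one, which is what prevents the decoder layer from attending to future time steps.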

API Reference

class pangolinn.seq2seq.CausalTestCase(methodName='runTest')

This class provides unit tests to enforce that the module to be tested is safe to use in causal models, i.e. that it does not look at future elements. This property should be enforced for, e.g., autoregressive Transformer decoders.

To use it to test your network:

create a PangolinnSeq2SeqModuleWrapper that wraps your module (e.g. MyWrapper);

create a test class that extends CausalTestCase;

in your test class, override the class attribute module_wrapper_class by setting it to the class of your wrapper (e.g., module_wrapper_class = MyWrapper).

setUp() → None: Hook method for setting up the test fixture before exercising it.

test_gradient_not_flowing_from_future(): Checks that the gradient is not backpropagated to future input time steps, which should not be used to compute the output.

test_not_looking_at_the_future(): Tests that the module masks future elements, i.e. that its output at a given time step does not depend on later input time steps.
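
The property checked by these tests can be illustrated with a small pure-Python sketch. This is not pangolinn's implementation: the function looks_at_future and the two toy "modules" are hypothetical, and the sketch only covers the output-perturbation idea (change the inputs after position t and verify that the outputs up to t stay the same), not the gradient-based check.

```python
from typing import Callable, List

Seq = List[float]


def looks_at_future(fn: Callable[[Seq], Seq], x: Seq, t: int) -> bool:
    """Return True if changing the inputs after position t changes the outputs up to t."""
    perturbed = x[: t + 1] + [v + 1.0 for v in x[t + 1:]]
    return fn(x)[: t + 1] != fn(perturbed)[: t + 1]


def running_sum(x: Seq) -> Seq:
    """Causal toy module: output[i] depends only on x[0..i]."""
    out, total = [], 0.0
    for v in x:
        total += v
        out.append(total)
    return out


def centered_mean(x: Seq) -> Seq:
    """Non-causal toy module: output[i] also uses x[i + 1]."""
    padded = [0.0] + x + [0.0]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3 for i in range(len(x))]


x = [1.0, 2.0, 3.0, 4.0]
print(looks_at_future(running_sum, x, t=1))    # False
print(looks_at_future(centered_mean, x, t=1))  # True
```

A module such as centered_mean would be expected to fail a causality test, since its output at position 1 changes when only the inputs at positions 2 and 3 are modified.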

class pangolinn.seq2seq.EncoderPaddingTestCase(methodName='runTest')

This class provides unit tests to enforce that the module to be tested properly handles padding, i.e. returns the same results regardless of the amount of padding present in the batched input sequences.

To use it to test your network:

create a PangolinnSeq2SeqModuleWrapper that wraps your module (e.g. MyWrapper);

create a test class that extends EncoderPaddingTestCase;

in your test class, override the class attribute module_wrapper_class by setting it to the class of your wrapper (e.g., module_wrapper_class = MyWrapper).

setUp() → None: Hook method for setting up the test fixture before exercising it.

test_batch_size_does_not_matter(): Tests that for the same input we get the same output regardless of the amount of padding.

test_padding_area_is_zero(): Tests that the padding area of the output contains only zeros. Although non-zero elements in the padding area are not an issue on their own, subsequent operations (e.g., convolutions) applied to non-zero-padded tensors might cause issues.
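
The two padding properties can be pictured with a pure-Python toy example. This is a sketch of the idea only, not of how pangolinn implements its tests: masked_center, naive_center, pad and check are hypothetical names, and the sequences are plain lists rather than tensors.

```python
from typing import Callable, List, Tuple


def masked_center(x: List[float], length: int) -> List[float]:
    """Subtract the mean of the valid positions and zero the padding area."""
    mean = sum(x[:length]) / length
    return [v - mean if i < length else 0.0 for i, v in enumerate(x)]


def naive_center(x: List[float], length: int) -> List[float]:
    """Ignores `length`: the mean also includes the padding values."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def pad(x: List[float], total_len: int, pad_value: float = 0.0) -> List[float]:
    """Right-pad x with pad_value up to total_len elements."""
    return x + [pad_value] * (total_len - len(x))


def check(fn: Callable[[List[float], int], List[float]]) -> Tuple[bool, bool]:
    """Return (valid output unaffected by padding, padding area is all zeros)."""
    seq = [1.0, 2.0, 3.0]
    unpadded = fn(pad(seq, 3), 3)
    padded = fn(pad(seq, 5), 3)
    return unpadded[:3] == padded[:3], all(v == 0.0 for v in padded[3:])


print(check(masked_center))  # (True, True)
print(check(naive_center))   # (False, False)
```

The naive version breaks both properties because it treats padding as real input, which is the kind of bug these tests are meant to surface.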

class pangolinn.seq2seq.PangolinnSeq2SeqModuleWrapper

Wrapper of your module used in pangolinn tests. To test your network, extend this class by implementing at least:

build_module;

num_input_channels;

forward.

Depending on the behavior of your module, you might need to override other methods. Refer to the documentation of each method to check whether you need to override it.

build_module() → Module

This method is responsible for building the module that you want to test. Override this method so that it creates an instance of the module to be tested. The module does not need to be initialized with specific weights.

Returns: the network you want to test.

forward(x: Tensor, lengths: LongTensor) → Tensor

Processes x with the wrapped module and returns the output.

Parameters:

x – the tensor to be fed to the wrapped module with shape (batch, seq_len, channels)
lengths – tensor of shape (batch,) that contains the number of valid tokens for each of the sequences in the batch.

Returns:

the tensor produced by the module with shape (batch, seq_len, channels)

property input_dtype: dtype

Returns: the dtype of the input tensor expected by the module. Defaults to torch.float.

property max_value_allowed: int

Returns: the maximum value supported as input by the module to be tested. It is used only if input_dtype is set to torch.int or another integer type. Defaults to 10.

property num_input_channels: int

This property has to be overridden and returns the number of channels expected in input by the module to be tested. It is typically the third dimension of the input of sequence-to-sequence modules, after the batch size and the sequence length. For modules that expect a single integer per position (e.g., the token id in text processing), set this to 1; when overriding the forward method, you can squeeze the last dimension before feeding the input to the module.

Returns: the number of channels expected in the input tensor by the module to be tested.

property num_output_channels: int

By default, this property returns the same number of channels as the input. If your module changes the number of channels, override this property accordingly.

Returns: the number of channels produced in output by the module to be tested.

output_sequence_length(input_sequence_len: int) → int

By default, this method returns the input_sequence_len divided by the downsampling factor as the output sequence length. If the output sequence length of your module depends on the input sequence length in a more complicated way, override this method accordingly.

Parameters: input_sequence_len – the length of a sequence to be fed to the module.
Returns: the length of the valid tokens in the output sequence.

property sequence_downsampling_factor: int

If the module to be tested reduces the sequence length (e.g., as strided convolutions do), this property states the downsampling factor.

Returns: the downsampling factor over the sequence length (e.g., in speech processing Transformer-based architectures, this is often 4). By default, this is set to 1, which means no downsampling.
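
As an illustration of when overriding output_sequence_length may be needed, the following pure-Python sketch compares a plain division by the downsampling factor with the output length of two stacked strided 1-D convolutions. The function names are hypothetical, and the use of integer (floor) division for the default is an assumption: the text above only says "divided by the downsampling factor". The convolution formula is the standard one for dilation 1.

```python
def default_output_length(input_sequence_len: int, downsampling_factor: int = 1) -> int:
    """Assumed default behavior: integer division by the downsampling factor."""
    return input_sequence_len // downsampling_factor


def conv1d_output_length(input_sequence_len: int, kernel_size: int,
                         stride: int, padding: int = 0) -> int:
    """Output length of a 1-D convolution with dilation 1."""
    return (input_sequence_len + 2 * padding - kernel_size) // stride + 1


def two_strided_convs_output_length(input_sequence_len: int) -> int:
    """Two convolutions with kernel 3, stride 2, padding 1 (overall factor 4)."""
    n = input_sequence_len
    for _ in range(2):
        n = conv1d_output_length(n, kernel_size=3, stride=2, padding=1)
    return n


for n in (100, 10):
    print(n, default_output_length(n, 4), two_strided_convs_output_length(n))
```

For an input length of 100 both computations give 25, but for an input length of 10 they differ (2 versus 3). Under these assumptions, a wrapper for such a module would override output_sequence_length with the convolution-based computation instead of relying on the default.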