Sequence-to-sequence Package
This package contains test suites for sequence-to-sequence models.
To test your model, create a new class in your tests directory
that extends pangolinn.seq2seq.PangolinnSeq2SeqModuleWrapper
and builds an instance of the model to be tested. Then, create a test suite
that inherits from the pangolinn tester you want to use and set its
attribute module_wrapper_class to the wrapper
class of your module. And you are done! You can now run the tests for your model.
Example Usage
This example tests that the PyTorch implementation of the Transformer
decoder is autoregressive. First, create a
pangolinn.seq2seq.PangolinnSeq2SeqModuleWrapper for it.
import torch
from torch import LongTensor, Tensor, nn

from pangolinn import seq2seq


class TransformerDecoderWrapper(seq2seq.PangolinnSeq2SeqModuleWrapper):
    """
    Wrapper to test a layer that does not look at the future, so it is safe in causal models.
    """
    def build_module(self) -> nn.Module:
        return nn.TransformerDecoderLayer(
            self.num_input_channels, 1, dim_feedforward=8, batch_first=True)

    @property
    def num_input_channels(self) -> int:
        return 4

    @staticmethod
    def generate_attention_mask(sz: int, device: str = "cpu") -> torch.Tensor:
        """ Generate the attention mask for causal decoding """
        mask = (torch.triu(torch.ones(sz, sz)) == 1).transpose(0, 1).float()
        mask.masked_fill_(mask == 0, float("-inf"))
        mask.masked_fill_(mask == 1, float(0.0))
        return mask.to(device)

    def forward(self, x: Tensor, lengths: LongTensor) -> Tensor:
        # Dummy encoder output with a single time step, used as cross-attention memory.
        fake_encoder_out = torch.ones(x.shape[0], 1, self.num_input_channels)
        fake_encoder_out = fake_encoder_out.to(x.device)
        # Build the causal mask on the same device as the input.
        tgt_mask = self.generate_attention_mask(x.shape[1], device=x.device)
        return self._module(x, memory=fake_encoder_out, tgt_mask=tgt_mask)
Then, create a test suite that extends
pangolinn.seq2seq.CausalTestCase and point it to
the wrapper class to be used.
class CausalDecoderTestCase(seq2seq.CausalTestCase):
    module_wrapper_class = TransformerDecoderWrapper
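As a quick sanity check of the mask used by the wrapper in this example, the following standalone sketch rebuilds the same causal mask outside of pangolinn and prints it for a sequence of length 3. The function name causal_mask is only illustrative and is not part of pangolinn.

```python
import torch


def causal_mask(sz: int) -> torch.Tensor:
    """Additive attention mask: 0.0 on and below the diagonal, -inf above it."""
    mask = (torch.triu(torch.ones(sz, sz)) == 1).transpose(0, 1).float()
    mask.masked_fill_(mask == 0, float("-inf"))
    mask.masked_fill_(mask == 1, 0.0)
    return mask


mask = causal_mask(3)
# Row t lists what position t may attend to: 0.0 = allowed, -inf = blocked.
print(mask)
```

Each row has -inf only in the columns to the right of the diagonal, so a position can attend to itself and to earlier positions but not to later ones.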
API Reference
- class pangolinn.seq2seq.CausalTestCase(methodName='runTest')
This class provides unit tests to enforce that the module to be tested is safe to use in causal models, i.e., that it does not look at future elements. This property should be enforced for, e.g., autoregressive Transformer decoders.
To use it to test your network:
create a PangolinnSeq2SeqModuleWrapper that wraps your module (e.g., MyWrapper);
create a test class that extends CausalTestCase;
in your test class, override the class attribute module_wrapper_class by setting it to the class of your wrapper (e.g., module_wrapper_class = MyWrapper).
- test_gradient_not_flowing_from_future()
Checks that the gradient is not backpropagated to future input time steps, which should not be used to compute the output.
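The idea behind this kind of check can be illustrated with plain PyTorch autograd. The sketch below is not pangolinn's implementation; it uses a toy causal operation (a cumulative sum, chosen here only for illustration) and verifies that backpropagating from the output at one time step leaves a zero gradient on all later input time steps.

```python
import torch


def causal_op(x: torch.Tensor) -> torch.Tensor:
    """Toy causal module: the output at step t is the sum of inputs up to t."""
    return torch.cumsum(x, dim=1)


# One sequence of 5 time steps with 2 channels.
x = torch.randn(1, 5, 2, requires_grad=True)
t = 2  # time step whose output we inspect

# Backpropagate only from the output at step t.
causal_op(x)[:, t].sum().backward()

# Gradient magnitude that reaches each input time step.
grad_per_step = x.grad.abs().sum(dim=(0, 2))
past_grad = grad_per_step[: t + 1]
future_grad = grad_per_step[t + 1:]
print("past:", past_grad, "future:", future_grad)
```

For a causal module, future_grad is all zeros, while the steps up to and including t receive a non-zero gradient.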
- test_not_looking_at_the_future()
Tests that the module masks future elements and does not look at them.
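A common way to check this property is to perturb the future part of the input and verify that the earlier outputs do not change. The sketch below shows the idea on a toy left-padded convolution; it is an illustration under that assumption, not the code pangolinn runs, and the helper name causal_conv is hypothetical.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
conv = nn.Conv1d(2, 2, kernel_size=3)


def causal_conv(x: torch.Tensor) -> torch.Tensor:
    """Toy causal module for inputs shaped (batch, seq_len, channels)."""
    x = x.transpose(1, 2)      # (batch, channels, seq_len)
    x = F.pad(x, (2, 0))       # pad kernel_size - 1 steps on the left only
    return conv(x).transpose(1, 2)


x = torch.randn(1, 6, 2)
t = 3
x_perturbed = x.clone()
x_perturbed[:, t + 1:] += 1.0  # change only the steps after t

with torch.no_grad():
    out = causal_conv(x)
    out_perturbed = causal_conv(x_perturbed)

# Outputs up to step t must be identical; later outputs are free to change.
prefix_unchanged = torch.allclose(out[:, : t + 1], out_perturbed[:, : t + 1])
suffix_changed = not torch.allclose(out[:, t + 1:], out_perturbed[:, t + 1:])
print(prefix_unchanged, suffix_changed)
```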
- class pangolinn.seq2seq.EncoderPaddingTestCase(methodName='runTest')
This class provides unit tests to enforce that the module to be tested properly handles padding, i.e., that it returns the same results regardless of the amount of padding present in the batched input sequences.
To use it to test your network:
create a PangolinnSeq2SeqModuleWrapper that wraps your module (e.g., MyWrapper);
create a test class that extends EncoderPaddingTestCase;
in your test class, override the class attribute module_wrapper_class by setting it to the class of your wrapper (e.g., module_wrapper_class = MyWrapper).
- test_batch_size_does_not_matter()
Tests that for the same input we get the same output regardless of the amount of padding.
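To make the property concrete, here is a minimal sketch of a length-aware toy module and a padding-invariance check on it. The function masked_mean_norm is hypothetical and only illustrates the property; it is not part of pangolinn.

```python
import torch


def masked_mean_norm(x: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
    """Subtract the per-sequence mean over valid steps and zero the padding."""
    # valid[b, t] is True where t < lengths[b]
    valid = torch.arange(x.shape[1])[None, :] < lengths[:, None]
    valid = valid.unsqueeze(-1).to(x.dtype)  # (batch, seq_len, 1)
    mean = (x * valid).sum(dim=1, keepdim=True) / lengths[:, None, None]
    return (x - mean) * valid


torch.manual_seed(0)
seq = torch.randn(1, 4, 3)  # one sequence with 4 valid steps
alone = masked_mean_norm(seq, torch.tensor([4]))

# The same sequence followed by 3 padding steps filled with arbitrary values.
padded = torch.cat([seq, torch.randn(1, 3, 3)], dim=1)
with_padding = masked_mean_norm(padded, torch.tensor([4]))

# The valid part of the output must not depend on the padding.
same_valid_output = torch.allclose(alone, with_padding[:, :4], atol=1e-6)
print(same_valid_output)
```

A module that averaged over all time steps instead of using lengths would fail this check, because the padding values would leak into the mean.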
- test_padding_area_is_zero()
Tests that the padding area of the output contains only zeros. Although non-zero elements in the padding area are not an issue on their own, subsequent operations (e.g., convolutions) applied on top of tensors with non-zero padding might cause issues.
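The following sketch illustrates why non-zero padding matters when a convolution follows. It is a toy setup with hypothetical names (run, clean, dirty), not pangolinn code: a convolution with kernel size 3 reads one step into the padding area, so garbage there changes the last valid output unless the padding is zeroed first.

```python
import torch
from torch import nn

torch.manual_seed(0)
conv = nn.Conv1d(1, 1, kernel_size=3, padding=1, bias=False)


def run(x: torch.Tensor, lengths: torch.Tensor, zero_padding: bool) -> torch.Tensor:
    """Apply the convolution to x shaped (batch, seq_len, channels)."""
    if zero_padding:
        valid = torch.arange(x.shape[1])[None, :] < lengths[:, None]
        x = x * valid.unsqueeze(-1)  # zero everything beyond each length
    return conv(x.transpose(1, 2)).transpose(1, 2)


lengths = torch.tensor([3])
clean = torch.tensor([[[1.0], [2.0], [3.0], [0.0], [0.0]]])
dirty = torch.tensor([[[1.0], [2.0], [3.0], [9.0], [9.0]]])  # non-zero padding

with torch.no_grad():
    reference = run(clean, lengths, zero_padding=False)
    leaked = run(dirty, lengths, zero_padding=False)
    fixed = run(dirty, lengths, zero_padding=True)

# Compare only the 3 valid time steps.
valid_part_leaks = not torch.allclose(reference[:, :3], leaked[:, :3])
valid_part_fixed = torch.allclose(reference[:, :3], fixed[:, :3])
print(valid_part_leaks, valid_part_fixed)
```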
- class pangolinn.seq2seq.PangolinnSeq2SeqModuleWrapper
Wrapper of your module used in pangolinn tests. To test your network, extend this class by implementing at least:
build_module;
num_input_channels;
forward.
Depending on the behavior of your module, you might need to override other methods. Please refer to the documentation of each method to check whether you need to override it.
- build_module() -> Module
This method is responsible for building the module that you want to test. Override this method so that it creates an instance of the module to be tested. The module does not need to be initialized with specific weights.
- Returns:
the network you want to test.
- forward(x: Tensor, lengths: LongTensor) -> Tensor
Processes x with the wrapped module and returns the output.
- Parameters:
x – the tensor to be fed to the wrapped module with shape (batch, seq_len, channels)
lengths – tensor of shape (batch, ) that contains the length of the valid tokens for each of the sequences in the batch.
- Returns:
the tensor produced by the module with shape (batch, seq_len, channels)
- property input_dtype: dtype
- Returns:
the dtype of the input tensor expected by the module. Defaults to torch.float.
- property max_value_allowed: int
- Returns:
if input_dtype is set to torch.int or another integer dtype, this determines the maximum value supported as input by the module to be tested. Defaults to 10.
- property num_input_channels: int
This property must be overridden. It returns the number of channels that the module to be tested expects in its input. This is typically the third dimension of the input of sequence-to-sequence modules, after the batch size and the sequence length. For modules that expect a single integer per position (e.g., the token id in text processing), set it to 1 and, when overriding the forward method, squeeze the last dimension before feeding the input to the module.
- Returns:
the number of channels expected in the input tensor by the module to be tested.
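For the integer-input case described above, the squeeze step might look like the sketch below. It runs outside of pangolinn; the embedding size and the way the input is generated are assumptions made for illustration. The embedding table has max_value_allowed + 1 rows so that ids are valid whether or not the maximum itself is included.

```python
import torch
from torch import nn

max_value_allowed = 10  # mirrors the documented default; ids may reach this value
embedding = nn.Embedding(num_embeddings=max_value_allowed + 1, embedding_dim=8)

# Integer input shaped (batch, seq_len, channels) with a single channel.
x = torch.randint(0, max_value_allowed + 1, (2, 5, 1))

# Drop the channel dimension so the embedding receives (batch, seq_len) ids.
out = embedding(x.squeeze(-1))
print(out.shape)
```

In this sketch the module maps 1 input channel to 8 output channels, so a wrapper around it would also need to override num_output_channels.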
- property num_output_channels: int
By default, this property returns the same number of channels as num_input_channels. If your module changes the number of channels, override this property accordingly.
- Returns:
the number of channels produced in output by the module to be tested.
- output_sequence_length(input_sequence_len: int) -> int
By default, this method returns the input_sequence_len divided by the downsampling factor. If your module uses a more complicated function to determine the output sequence length from the input sequence length, override this method accordingly.
- Parameters:
input_sequence_len – the length of a sequence to be fed to the module.
- Returns:
the length of the valid tokens in the output sequence.
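As an example of a case where a plain division may not be enough, consider a stack of two strided 1D convolutions (kernel size 3, stride 2, padding 1), a configuration assumed here only for illustration. The sketch below computes the exact output length with the standard convolution length formula; an overridden output_sequence_length could return the same value. The helper names are hypothetical.

```python
def conv_output_length(input_sequence_len: int, kernel_size: int = 3,
                       stride: int = 2, padding: int = 1) -> int:
    """Output length of one strided 1D convolution (dilation 1)."""
    return (input_sequence_len + 2 * padding - kernel_size) // stride + 1


def stacked_output_length(input_sequence_len: int, num_layers: int = 2) -> int:
    """Output length after applying the same convolution num_layers times."""
    length = input_sequence_len
    for _ in range(num_layers):
        length = conv_output_length(length)
    return length


lengths = [stacked_output_length(n) for n in (7, 8, 10)]
print(lengths)  # [2, 2, 3]
```

The overall downsampling factor of this stack is 4, but dividing 7 or 10 by 4 gives 1.75 and 2.5, neither of which rounds down to the actual lengths 2 and 3.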
- property sequence_downsampling_factor: int
If the module to be tested reduces the sequence length (e.g., as strided convolutions do), this property states the downsampling factor.
- Returns:
the downsampling factor over the sequence length (e.g., in speech processing Transformer-based architectures, this is often 4). By default, this is set to 1, which means no downsampling.