Optimizer filter notes

The DAML implementation uses this filter when constructing the optimizer: Adam(lr=self.meta_lr, params=filter(lambda x: x.requires_grad, self.m.parameters()), weight_decay=1e-5).
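For reference, here is the same pattern as a minimal self-contained sketch (the toy model, the frozen layer, and the meta_lr value are placeholders, not the DAML code):

```python
from torch.nn import Linear, Sequential
from torch.optim import Adam

# Toy model standing in for self.m; freezing one layer simulates
# parameters that the requires_grad filter should exclude.
model = Sequential(Linear(8, 8), Linear(8, 2))
for p in model[0].parameters():
    p.requires_grad = False

meta_lr = 1e-3  # placeholder for self.meta_lr

# Only parameters that still require gradients reach the optimizer,
# so the frozen layer gets no Adam state and no updates.
optimizer = Adam(
    params=filter(lambda x: x.requires_grad, model.parameters()),
    lr=meta_lr,
    weight_decay=1e-5,
)
```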

I'm unsure how important this filter is, but for now I've hard-coded it into the from_params for the optimizer and into the metatrainer constructor (for the meta_optimizer).
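Roughly what that hard-coding looks like (a simplified sketch, not the real allennlp Optimizer.from_params signature; the plain dict here stands in for a Params object):

```python
from torch.optim import Adam

def optimizer_from_params(model, params: dict):
    # Hard-coded requires_grad filter, mirroring the DAML snippet above;
    # the metatrainer constructor applies the same filter when it builds
    # the meta_optimizer.
    return Adam(
        params=filter(lambda x: x.requires_grad, model.parameters()),
        lr=params.get("lr", 1e-3),
        weight_decay=params.get("weight_decay", 1e-5),
    )
```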

I need to get a metatrainer working first, and then figure out how to make this more flexible within the AllenNLP framework.
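One possible direction for the flexibility part (purely a sketch; the param_filter argument is hypothetical and nothing like it exists in allennlp): accept the filter as a constructor argument, with the requires_grad check as the default, so a from_params/config path could override it later.

```python
from typing import Callable
import torch
from torch.optim import Adam

ParamFilter = Callable[[torch.nn.Parameter], bool]

class MetaTrainer:
    def __init__(
        self,
        model: torch.nn.Module,
        meta_lr: float = 1e-3,
        param_filter: ParamFilter = lambda p: p.requires_grad,
    ):
        self.m = model
        self.meta_lr = meta_lr
        # The filter is injectable rather than hard-coded: the default
        # reproduces the current behavior, but a config could swap in
        # something else (e.g. name-based filtering).
        self.meta_optimizer = Adam(
            params=filter(param_filter, self.m.parameters()),
            lr=self.meta_lr,
            weight_decay=1e-5,
        )
```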