
Python / Theano / Lasagne: how to override the gradient of specific layers in Lasagne?


I have a model, and I need to compute the gradient of the model's output with respect to its input. But I would like to apply a custom gradient to some of the layers in the model, and building that from scratch would be tedious. So I tried the idea explained [here][1]. I added the following two classes:

  • A helper class that allows us to replace a nonlinearity with an Op that produces the same output but has a custom gradient (the underlying trick is sketched in isolation right after the class):
import theano

class ModifiedBackprop(object):
  def __init__(self, nonlinearity):
      self.nonlinearity = nonlinearity
      self.ops = {}  # memoizes an OpFromGraph instance per tensor type

  def __call__(self, x):
      # OpFromGraph is oblique to Theano optimizations, so we need to move
      # things to GPU ourselves if needed.
      if theano.sandbox.cuda.cuda_enabled:
          maybe_to_gpu = theano.sandbox.cuda.as_cuda_ndarray_variable
      else:
          maybe_to_gpu = lambda x: x
      # We move the input to GPU if needed.
      x = maybe_to_gpu(x)
      # We note the tensor type of the input variable to the nonlinearity
      # (mainly dimensionality and dtype); we need to create a fitting Op.
      tensor_type = x.type
      # If we did not create a suitable Op yet, this is the time to do so.
      if tensor_type not in self.ops:
          # For the graph, we create an input variable of the correct type:
          inp = tensor_type()
          # We pass it through the nonlinearity (and move to GPU if needed).
          outp = maybe_to_gpu(self.nonlinearity(inp))
          # Then we fix the forward expression...
          op = theano.OpFromGraph([inp], [outp])
          # ...and replace the gradient with our own (defined in a subclass).
          op.grad = self.grad
          # Finally, we memoize the new Op
          self.ops[tensor_type] = op
      # And apply the memoized Op to the input we got.
      return self.ops[tensor_type](x)
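
For reference, the core trick that this helper memoizes can be written out in isolation. The sketch below uses illustrative variable names that are not part of the original code: it freezes a plain ReLU expression into a theano.OpFromGraph Op, overrides grad on that Op instance, and differentiates through it.

import numpy as np
import theano
import theano.tensor as T

inp = T.vector('inp')
outp = T.maximum(inp, 0)                       # plain ReLU forward expression
relu_op = theano.OpFromGraph([inp], [outp])    # freeze the forward pass into an Op

def custom_grad(inputs, out_grads):
    (i,) = inputs
    (g,) = out_grads
    # only let the gradient through where the forward input was positive
    return [g * (i > 0).astype(i.dtype)]

relu_op.grad = custom_grad                     # replace the Op's gradient

x = T.vector('x')
grad_x = theano.grad(relu_op(x).sum(), wrt=x)  # should now route through custom_grad
x_val = np.array([-1.0, 0.5, 2.0], dtype=theano.config.floatX)
print(theano.function([x], grad_x)(x_val))     # expected: [0. 1. 1.] if the override is used
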
  • A subclass that does guided backpropagation through the nonlinearity (a small NumPy illustration of its masking rule follows the class):
class GuidedBackprop(ModifiedBackprop):
    def grad(self, inputs, out_grads):
        (inp,) = inputs
        (grd,) = out_grads
        dtype = inp.dtype
        print('It works')
        return (grd * (inp > 0).astype(dtype) * (grd > 0).astype(dtype),)
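
As a purely numeric illustration (plain NumPy, not part of the original code), the rule above keeps the incoming gradient only where both the forward input and the incoming gradient are positive:

import numpy as np

inp = np.array([-1.0, 0.5, 2.0])      # forward input to the nonlinearity
grd = np.array([ 0.3, -0.2, 1.0])     # gradient arriving from the layer above
guided = grd * (inp > 0) * (grd > 0)  # zero out entries where either sign test fails
print(guided)                         # [ 0. -0.  1.]  -> only the last entry survives
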
  • Then I use them in my code as follows:
import theano
import theano.tensor as T
import numpy as np
import lasagne as nn

model_in = T.tensor3()
# model_in = net['input'].input_var
nn.layers.set_all_param_values(net['l_out'], model['param_values'])

relu = nn.nonlinearities.rectify
relu_layers = [layer for layer in nn.layers.get_all_layers(net['l_out'])
               if getattr(layer, 'nonlinearity', None) is relu]
modded_relu = GuidedBackprop(relu)

for layer in relu_layers:
    layer.nonlinearity = modded_relu   

prop = nn.layers.get_output(
    net['l_out'], model_in, deterministic=True)

for sample in range(ini, batch_len):                                
    model_out = prop[sample, 'z']   # get prop for label 'z'
    gradients = theano.gradient.jacobian(model_out, wrt=model_in) 
    # gradients = theano.grad(model_out, wrt=model_in) 
    get_gradients = theano.function(inputs=[model_in],
                                        outputs=gradients)
    grads = get_gradients(X_batch) # gradient dimension: X_batch == model_in(64, 20, 32) 
    grads = np.array(grads)
    grads = grads[sample]

Now, when I run the code, it works without any errors and the output has the correct shape. But that is only because it falls back to the default theano.grad behaviour instead of the function that is supposed to override it. In other words, the grad() method of the GuidedBackprop class is never called (one way to inspect the graph for this is sketched after the questions below).

  1. I don't understand what the problem is here?
  2. Is there a workaround?
  3. If this is an unresolved issue, what is the simplest way to override the gradient of only some of the layers in a model?
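
One way to check whether the wrapped Op ever made it into the differentiated graph (a diagnostic sketch, assuming the gradients variable from the snippet above) is to dump the gradient expression and look for an OpFromGraph apply node:

from theano import printing
printing.debugprint(gradients)  # an OpFromGraph node should appear here if the replacement took effect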