python-Theano-lasagne: How do I override the gradient of specific layers in Lasagne?
I have a model, and I need to compute the gradient of the model's output with respect to its input. However, I want to apply custom gradients to some of the layers in the model, and building this from scratch would be tedious. So I tried the idea explained [here][1] and added the following two classes:
- A helper class that lets us replace a nonlinearity with an Op that produces the same output but has a custom gradient:
```python
import theano


class ModifiedBackprop(object):

    def __init__(self, nonlinearity):
        self.nonlinearity = nonlinearity
        self.ops = {}  # memoizes an OpFromGraph instance per tensor type

    def __call__(self, x):
        # OpFromGraph is oblique to Theano optimizations, so we need to move
        # things to GPU ourselves if needed.
        if theano.sandbox.cuda.cuda_enabled:
            maybe_to_gpu = theano.sandbox.cuda.as_cuda_ndarray_variable
        else:
            maybe_to_gpu = lambda x: x
        # We move the input to GPU if needed.
        x = maybe_to_gpu(x)
        # We note the tensor type of the input variable to the nonlinearity
        # (mainly dimensionality and dtype); we need to create a fitting Op.
        tensor_type = x.type
        # If we did not create a suitable Op yet, this is the time to do so.
        if tensor_type not in self.ops:
            # For the graph, we create an input variable of the correct type:
            inp = tensor_type()
            # We pass it through the nonlinearity (and move to GPU if needed).
            outp = maybe_to_gpu(self.nonlinearity(inp))
            # Then we fix the forward expression...
            op = theano.OpFromGraph([inp], [outp])
            # ...and replace the gradient with our own (defined in a subclass).
            op.grad = self.grad
            # Finally, we memoize the new Op
            self.ops[tensor_type] = op
        # And apply the memoized Op to the input we got.
        return self.ops[tensor_type](x)
```
- A subclass that performs guided backpropagation through the nonlinearity (a small NumPy illustration of the gradient rule it implements is shown after the code listings):
```python
class GuidedBackprop(ModifiedBackprop):
    def grad(self, inputs, out_grads):
        (inp,) = inputs
        (grd,) = out_grads
        dtype = inp.dtype
        print('It works')
        return (grd * (inp > 0).astype(dtype) * (grd > 0).astype(dtype),)
```
- Then I use them in my code as follows:
```python
import numpy as np
import theano
import theano.tensor as T
import lasagne as nn

model_in = T.tensor3()
# model_in = net['input'].input_var
nn.layers.set_all_param_values(net['l_out'], model['param_values'])

relu = nn.nonlinearities.rectify
relu_layers = [layer for layer in nn.layers.get_all_layers(net['l_out'])
               if getattr(layer, 'nonlinearity', None) is relu]
modded_relu = GuidedBackprop(relu)
for layer in relu_layers:
    layer.nonlinearity = modded_relu

prop = nn.layers.get_output(net['l_out'], model_in, deterministic=True)

for sample in range(ini, batch_len):
    model_out = prop[sample, 'z']  # get prop for label 'z'
    gradients = theano.gradient.jacobian(model_out, wrt=model_in)
    # gradients = theano.grad(model_out, wrt=model_in)
    get_gradients = theano.function(inputs=[model_in], outputs=gradients)
    grads = get_gradients(X_batch)  # gradient dimension: X_batch == model_in (64, 20, 32)
    grads = np.array(grads)
    grads = grads[sample]
```
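For reference, this is what the gradient rule in `GuidedBackprop.grad` is supposed to do, shown on plain NumPy arrays (a small illustration of mine, not part of the model code): the incoming gradient is kept only where both the forward input and the incoming gradient are positive.

```python
import numpy as np

inp = np.array([-1.5, 0.3, 2.0, 0.7], dtype='float32')   # input that went into the ReLU
grd = np.array([ 0.4, -0.2, 0.5, 1.0], dtype='float32')  # gradient arriving from above

# Guided backprop: zero the gradient wherever the input or the incoming
# gradient is non-positive.
guided = grd * (inp > 0).astype(inp.dtype) * (grd > 0).astype(inp.dtype)
print(guided)  # [0.  0.  0.5 1. ]
```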
Now, when I run the code, it runs fine without any errors and the output has the correct shape. But that is only because it executes the default theano.grad behaviour rather than the function that is supposed to override it. In other words, the grad() method of the GuidedBackprop class is never called.
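A minimal way to check whether the override is reached at all, independent of my model, could look like this (a standalone sketch, assuming the two classes above are defined and work in the current Theano setup):

```python
import numpy as np
import theano
import theano.tensor as T
import lasagne as nn

# Wrap the rectifier exactly as in the model code above.
modded_relu = GuidedBackprop(nn.nonlinearities.rectify)

x = T.vector('x')
# Negate the sum so that the gradient arriving at the ReLU is -1 everywhere.
y = -modded_relu(x).sum()
g = theano.grad(y, wrt=x)          # should go through GuidedBackprop.grad
f = theano.function([x], g)

print(f(np.array([-1.0, 2.0], dtype=theano.config.floatX)))
# Plain ReLU backprop gives [ 0. -1.]; guided backprop masks the negative
# incoming gradient and would give [ 0.  0.] (and print 'It works' while the
# gradient graph is being built).
```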
- I don't understand what the problem is.
- Is there a way to fix this?
- If this is an open issue, what is the simplest way to override the gradients of only some of the layers in a model?