The problem
Training works fine on the CPU when the GPU is disabled with config = tf.ConfigProto(device_count = {'GPU': 0}), but fails with the error CUDA_ERROR_ILLEGAL_ADDRESS when the GPU is enabled.
The operating system is Windows.
> I c:\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.582
pciBusID 0000:01:00.0
Total memory: 10.91GiB
Free memory: 10.49GiB
I c:\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0: Y
I c:\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 11GB, pci bus id: 0000:01:00.0)
Training...
E c:\tensorflow\stream_executor\cuda\cuda_event.cc:49] Error polling for event status: failed to query event: **CUDA_ERROR_ILLEGAL_ADDRESS**
F c:\tensorflow\core\common_runtime\gpu\gpu_event_mgr.cc:198] Unexpected Event status: 1
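The CPU-only configuration mentioned in the problem description can be passed to a session roughly as follows. This is a minimal sketch assuming the TF1-style graph/session API (exposed as tf.compat.v1 on TensorFlow 2.x); the tiny constant graph is a hypothetical stand-in for the real training graph.

```python
import tensorflow as tf

# Use the TF1-style API: on TensorFlow 2.x it lives under tf.compat.v1,
# while very old releases expose it directly on tf (assumption about your install).
tf1 = tf.compat.v1 if hasattr(getattr(tf, "compat", None), "v1") else tf

# Hide every GPU from the session so all ops are placed on the CPU.
config = tf1.ConfigProto(device_count={"GPU": 0})

graph = tf.Graph()
with graph.as_default():
    # Hypothetical stand-in for the real training graph.
    x = tf1.constant([1.0, 2.0, 3.0])
    total = tf1.reduce_sum(x)

with tf1.Session(graph=graph, config=config) as sess:
    result = sess.run(total)

print(config.device_count["GPU"], result)
```

If this runs cleanly while the same graph fails with the GPU enabled, the problem is on the GPU code path rather than in the model itself.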
Solution 1: Upgrade to 0.12 or higher
If you see this error, first try upgrading TensorFlow to a version newer than 0.12. We can confirm that the issue is fixed in TensorFlow 1.0.0.
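Before reaching for the workaround, it may help to check which release is installed. The helper below is hypothetical (parse_version and is_confirmed_fixed are illustrative names, not TensorFlow APIs); it compares a version string against 1.0.0, the release in which the fix is confirmed, and simply ignores pre-release suffixes such as rc1.

```python
import re
from typing import Tuple

# Release in which the fix for CUDA_ERROR_ILLEGAL_ADDRESS is confirmed.
CONFIRMED_FIXED = (1, 0, 0)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Turn strings such as '0.11.0' or '1.0.0-rc1' into (major, minor, patch)."""
    parts = []
    for piece in version.split(".")[:3]:
        # Keep only the leading digits, so suffixes like '-rc1' are ignored.
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return (parts[0], parts[1], parts[2])


def is_confirmed_fixed(version: str) -> bool:
    """True if the version is at least the release confirmed to contain the fix."""
    return parse_version(version) >= CONFIRMED_FIXED
```

In practice you would call is_confirmed_fixed(tf.__version__) after importing TensorFlow.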
Solution 2: Monkeypatch the target_one_hot computation
You can build target_one_hot with the following wrapper, in place of the usual tf.one_hot call, to avoid the error:
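```python
import tensorflow as tf


def one_hot_patch(x, depth):
    # workaround by name-name
    # Flatten the labels (assumed int32) into an [N, 1] column.
    sparse_labels = tf.reshape(x, [-1, 1])
    derived_size = tf.shape(sparse_labels)[0]
    indices = tf.reshape(tf.range(0, derived_size, 1), [-1, 1])  # row ids 0..N-1
    # [N, 2] (row, label) pairs; pre-1.0 argument order tf.concat(axis, values).
    concated = tf.concat(1, [indices, sparse_labels])
    outshape = tf.concat(0, [tf.reshape(derived_size, [1]), tf.reshape(depth, [1])])  # [N, depth]
    # 1.0 at each (row, label) pair, 0.0 everywhere else.
    return tf.sparse_to_dense(concated, outshape, 1.0, 0.0)


target_one_hot = one_hot_patch(targets, vocab_size_targets)
```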
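To see what one_hot_patch builds without a GPU or TensorFlow at hand, here is a pure-Python mirror of the same sparse-to-dense construction. It is an illustrative sketch (one_hot_via_sparse_to_dense is a hypothetical name) that takes an already flat list of labels. Note that because the patch reshapes x to [-1, 1], its output is always 2-D ([N, depth]), whereas tf.one_hot on a [batch, time] tensor returns [batch, time, depth]. Rejecting out-of-range labels is a choice made by this sketch only.

```python
from typing import List, Sequence


def one_hot_via_sparse_to_dense(labels: Sequence[int], depth: int) -> List[List[float]]:
    """Scatter 1.0 at each (row, label) pair of an N x depth grid of 0.0."""
    # Pair each row index with its label, like concated in one_hot_patch.
    indices = [(row, label) for row, label in enumerate(labels)]
    # Dense output of shape [N, depth] filled with the default value 0.0.
    out = [[0.0] * depth for _ in range(len(labels))]
    for row, label in indices:
        # Assumption of this sketch: labels must lie in [0, depth).
        if not 0 <= label < depth:
            raise ValueError(f"label {label} at row {row} is outside [0, {depth})")
        out[row][label] = 1.0
    return out


print(one_hot_via_sparse_to_dense([2, 0, 1], 3))
# → [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
```

Each row has exactly one 1.0, at the column given by its label, which is the same result tf.one_hot produces for in-range flat labels.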