• NMT: Neural Machine Translation
• SMT: Statistical Machine Translation
• BLEU: Bilingual Evaluation Understudy
• WMT-14 dataset: the WMT'14 English-to-French machine translation task, used for training and evaluation in the paper
• beam search: an improvement over the greedy decoding strategy. At each time step, instead of keeping only the single highest-scoring output, the num_beams highest-scoring candidates are kept; when num_beams = 1, beam search reduces to greedy search.
• <EOS>: end-of-sentence symbol

## Sequence to Sequence Learning with Neural Networks

Sequential problems: sequence lengths are not known a priori.

The approach uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of fixed dimensionality, and then another deep LSTM to decode the target sequence from that vector.

### BLEU

$$\mathrm{BLEU} = \mathrm{BP} \times \exp\Big(\sum_{n=1}^{N}w_n\log{P_n}\Big)$$

where $\mathrm{BP}$ is the brevity penalty, $w_n$ are the n-gram weights, and $P_n$ is the modified n-gram precision.
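A minimal sentence-level sketch of this formula in pure Python (single reference, uniform weights $w_n = 1/N$, no smoothing; `bleu` is an illustrative helper name, not a standard API — real implementations also aggregate corpus-level statistics):

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Sketch of BLEU = BP * exp(sum_n w_n * log(P_n)) for token lists."""
    weights = [1.0 / max_n] * max_n
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(candidate[i:i + n])
                              for i in range(len(candidate) - n + 1))
        ref_ngrams = Counter(tuple(reference[i:i + n])
                             for i in range(len(reference) - n + 1))
        # Modified precision: clip each n-gram count by its reference count.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        if overlap == 0:
            # Unsmoothed sketch: any zero precision makes the score zero.
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c >= r else math.exp(1 - r / c)
    return bp * math.exp(sum(w * lp for w, lp in zip(weights, log_precisions)))
```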

### Recurrent Neural Network

Given a sequence of inputs $(x_1, \dots, x_T)$, a standard RNN computes a sequence of outputs $(y_1, \dots, y_T)$ by iterating the following equations:

$$h_t = \mathrm{sigm}(W^{hx}x_t+W^{hh}h_{t-1})$$

$$y_t = W^{yh}h_t$$
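One forward step of these two equations can be sketched in pure Python (weights as nested lists standing in for the matrices $W^{hx}$, $W^{hh}$, $W^{yh}$; `rnn_step` is an illustrative name, not an API):

```python
import math

def sigm(x):
    """Elementwise logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

def rnn_step(x_t, h_prev, W_hx, W_hh, W_yh):
    # h_t = sigm(W_hx @ x_t + W_hh @ h_prev)
    h_t = [sigm(sum(W_hx[i][j] * x_t[j] for j in range(len(x_t)))
                + sum(W_hh[i][j] * h_prev[j] for j in range(len(h_prev))))
           for i in range(len(h_prev))]
    # y_t = W_yh @ h_t  (no nonlinearity on the output)
    y_t = [sum(W_yh[i][j] * h_t[j] for j in range(len(h_t)))
           for i in range(len(W_yh))]
    return h_t, y_t
```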

However, it is not clear how to apply an RNN to problems whose input and output sequences have different lengths and a complicated, non-monotonic relationship.

The LSTM is known to learn problems with long-range temporal dependencies; the goal here is to estimate the conditional probability $p(y_1, \dots, y_{T'}|x_1, \dots, x_T)$, where the output length $T'$ may differ from the input length $T$.

• Two separate LSTMs are used: one to process the input sequence and one to generate the output sequence;
• deeper LSTMs outperform shallow ones, so the paper uses four layers;
• reversing the word order of the input sequence improves results.
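The input reversal and the `<EOS>` termination symbol are simple preprocessing steps; a sketch, with `prepare_pair` as a hypothetical helper name:

```python
def prepare_pair(source_tokens, target_tokens, eos="<EOS>"):
    """Build one (encoder input, decoder target) training pair.

    The source sentence is reversed (the paper's input-reversal trick),
    and <EOS> is appended to the target so the decoder learns to stop.
    """
    encoder_input = list(reversed(source_tokens))
    decoder_target = list(target_tokens) + [eos]
    return encoder_input, decoder_target
```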

### Decoding and Rescoring

The model is trained by maximizing the average log probability of a correct translation $T$ given the source sentence $S$:

$$\frac{1}{|\mathcal{S}|} \sum_{(T,S)\in \mathcal{S}} \log p(T|S)$$

where $\mathcal{S}$ is the training set.
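Decoding then searches for the most likely translation left to right with beam search, as described in the glossary. A minimal sketch, where `step_log_probs` is a hypothetical callback standing in for the trained decoder LSTM:

```python
import math

def beam_search(step_log_probs, num_beams, eos, max_len):
    """Keep the num_beams highest-scoring partial translations per step.

    step_log_probs(prefix) -> {token: log_prob} for the next token
    given the decoded prefix (a stand-in for the decoder LSTM).
    """
    beams = [([], 0.0)]  # (token sequence, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq and seq[-1] == eos:
                candidates.append((seq, score))  # finished hypothesis
                continue
            for tok, lp in step_log_probs(seq).items():
                candidates.append((seq + [tok], score + lp))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
    return beams[0]
```

With `num_beams=1` each step keeps only the single best candidate, so the procedure reduces to greedy search, exactly as noted in the glossary.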