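I. Introduction
A document image contains multiple text Segments (or Words, or Regions). Document intelligence has two core problems to solve:
Predict the category of each Segment (Word, Region): in the figure below, the left panel shows Segment categories, for example "Date" in green.
Predict the key-value pairing relations between them: in the figure below, the right panel shows pairings, for example "From" is paired with "Kevin Narko".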
[Figure: example document image. Left: Segment categories. Right: key-value pairings.]
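Learn a good Embedding representation for each Segment (Word, Region);
Classify on the learned Embedding to predict the category;
Compute similarity between the learned Embeddings to predict pairing relations, since paired Segments (Words, Regions) have high similarity.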
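The three steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the 2-dimensional embeddings, the linear scoring in `classify`, the use of cosine similarity, the `0.9` threshold, and the names `class_weights` and `predict_pairs` are all hypothetical and not taken from any of the models discussed here. Real models learn the embeddings and usually score pairs with learned projections rather than raw cosine similarity.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def classify(embedding, class_weights):
    """Return the label whose weight vector has the largest dot product
    with the embedding (a stand-in for a learned linear classifier)."""
    scores = {
        label: sum(w * x for w, x in zip(weights, embedding))
        for label, weights in class_weights.items()
    }
    return max(scores, key=scores.get)

def predict_pairs(embeddings, threshold=0.9):
    """Return every pair of Segments whose cosine similarity reaches the threshold."""
    names = list(embeddings)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine_similarity(embeddings[a], embeddings[b]) >= threshold:
                pairs.append((a, b))
    return pairs

# Toy, hand-written embeddings (assumed values, not model output).
embeddings = {
    "From": [1.0, 0.1],
    "Kevin Narko": [0.9, 0.2],
    "Date": [0.0, 1.0],
}
# Hypothetical classifier weights for two labels.
class_weights = {"Other": [1.0, 0.0], "Date": [0.0, 1.0]}

label = classify(embeddings["Date"], class_weights)
pairs = predict_pairs(embeddings)
print(label, pairs)
```

With these toy values, "From" and "Kevin Narko" point in nearly the same direction and are returned as a pair, while "Date" is far from both.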
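StrucText, LayoutLMv3, GraphDoc
Input features, feature fusion, self-supervised task design
Transformer based on multi-head attention; graph convolutional network (GCN) based on graph theory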
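To reduce computation and avoid overfitting, each node in the adjacency matrix keeps only its k highest-valued edges;
So that every node can learn global features, a global node G is added explicitly, with an edge between G and every other node.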
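The two adjustments to the adjacency matrix can be sketched as follows. This is a minimal illustration, not the original implementation: the helper names `sparsify_top_k` and `add_global_node`, the exclusion of self-loops, the edge weight of `1.0` for G, and the choice to add G after pruning (so its edges are never dropped) are all assumptions.

```python
def sparsify_top_k(adj, k):
    """Keep, in each row, only the k largest off-diagonal weights; zero the rest.
    Ties are resolved in favour of the lower column index."""
    n = len(adj)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbours = [j for j in range(n) if j != i]
        neighbours.sort(key=lambda j: adj[i][j], reverse=True)
        for j in neighbours[:k]:
            out[i][j] = adj[i][j]
    return out

def add_global_node(adj, weight=1.0):
    """Append a global node G that has an edge to every existing node."""
    n = len(adj)
    out = [row + [weight] for row in adj]   # edge from each node to G
    out.append([weight] * n + [0.0])        # edges from G to each node
    return out

# Toy 4-node adjacency matrix (assumed values).
adj = [
    [0.0, 0.9, 0.2, 0.5],
    [0.9, 0.0, 0.7, 0.1],
    [0.2, 0.7, 0.0, 0.4],
    [0.5, 0.1, 0.4, 0.0],
]

sparse = sparsify_top_k(adj, k=2)
full = add_global_node(sparse)
for row in full:
    print(row)
```

Note that row-wise pruning can leave the matrix asymmetric (node i may keep its edge to j while j drops its edge to i); whether to symmetrize afterwards is a separate design decision that the text does not specify.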