I. Introduction
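A document image contains multiple text segments (Segment), words (Word), or regions (Region). Document intelligence has two core problems to solve:
Predict the category of each Segment (Word, Region): in the left image below, for example, the Segment highlighted in green has the category "Date".
Predict the Key-Value pairing relations between them: in the right image below, for example, "From" and "Kevin Narko" form a pair.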
![Image](https://bj-res.laiye.com/LaiYeProduction/UEditor/20220519/0317feecea1a44089a80479355b6ec3d.png)
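Learn a good Embedding representation for each Segment (Word, Region). Classify the learned Embeddings to predict categories. Compute the similarity between learned Embeddings to predict pairing relations: paired Segments (Words, Regions) have very high similarity.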
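The two prediction steps above can be sketched as follows. This is a minimal illustration, not the method of any particular model: it assumes the Embeddings are already available as a NumPy array, uses a plain linear layer as the classifier, and uses cosine similarity with a fixed threshold for pairing. The names `classify`, `predict_pairs`, and `threshold` are hypothetical.

```python
import numpy as np

def classify(embeddings, weight, bias):
    """Predict a category index per segment with a linear head (an assumption)."""
    logits = embeddings @ weight + bias
    return logits.argmax(axis=1)

def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between all segment embeddings."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T

def predict_pairs(embeddings, threshold=0.9):
    """Return index pairs (i, j), i < j, whose similarity reaches the threshold."""
    sim = cosine_similarity_matrix(embeddings)
    n = len(embeddings)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if sim[i, j] >= threshold]

# Toy data: segments 0 and 1 point in nearly the same direction, segment 2 does not.
toy_embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
print(classify(toy_embeddings, np.eye(2), np.zeros(2)))
print(predict_pairs(toy_embeddings))
```

In a trained model, the Embeddings and the classifier weights would be learned jointly, and the pairing decision is often made by a learned scoring head rather than a fixed threshold.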
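StrucText, LayoutLMv3, GraphDoc
Input features, feature fusion, self-supervised task design
Transformer based on multi-head attention; graph convolution based on graph theory (GCN)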
![Image](https://bj-res.laiye.com/LaiYeProduction/UEditor/20220519/2f32ab38c8a44a9f93be2d3545d1f8b4.png)
![Image](https://bj-res.laiye.com/LaiYeProduction/UEditor/20220519/4cc9a2fdb4314834ab1cafbb4ae1212c.png)
![Image](https://bj-res.laiye.com/LaiYeProduction/UEditor/20220519/aaff1614c6ac455b95fc78aa1c772646.png)
![Image](https://bj-res.laiye.com/LaiYeProduction/UEditor/20220519/5bc2f0def4604b4d8c75913864590ac0.png)
![Image](https://bj-res.laiye.com/LaiYeProduction/UEditor/20220519/b644747b59f94c3c828d4a0b4df016ce.png)
![Image](https://bj-res.laiye.com/LaiYeProduction/UEditor/20220519/cbc85c6e9412480a8ca86f869f54da9b.png)
![Image](https://bj-res.laiye.com/LaiYeProduction/UEditor/20220519/2730f2eef17b4b3a9a97b5467e90952b.png)
![Image](https://bj-res.laiye.com/LaiYeProduction/UEditor/20220519/97d51aa5629f4fddbaaa34e31d2f1c38.png)
![Image](https://bj-res.laiye.com/LaiYeProduction/UEditor/20220519/2d9f31b29732439aa6cdb7e0ecaccc7d.png)
![Image](https://bj-res.laiye.com/LaiYeProduction/UEditor/20220519/65397d53f4fe4d1e88afa75cabcc908d.png)
![Image](https://bj-res.laiye.com/LaiYeProduction/UEditor/20220519/6e5c9a67a38a4349b4199fe83c7f207c.png)
![Image](https://bj-res.laiye.com/LaiYeProduction/UEditor/20220519/0ba438bc8a1d45cf821536b668dccd89.png)
![Image](https://bj-res.laiye.com/LaiYeProduction/UEditor/20220519/3e32bd462c124fc88efb4e76ee42af08.png)
![Image](https://bj-res.laiye.com/LaiYeProduction/UEditor/20220519/12cd8955b1544e5bb4410b57c73e9a3c.png)
![Image](https://bj-res.laiye.com/LaiYeProduction/UEditor/20220519/ff04660936b64e58bd2fbac8ac554f8b.png)
![Image](https://bj-res.laiye.com/LaiYeProduction/UEditor/20220519/c221a259c7fe4fb08517c5ddbd4f44c1.png)
![Image](https://bj-res.laiye.com/LaiYeProduction/UEditor/20220519/2ffcdb1896db4355aa6f4786ed6b72b8.png)
![Image](https://bj-res.laiye.com/LaiYeProduction/UEditor/20220519/d9efe27aeac84a79baf16df8e3f71f81.png)
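To reduce computation and avoid overfitting, each node in the adjacency matrix keeps only its k edges with the largest values. To let every node learn global features, a global node G is added explicitly, with an edge between G and every other node.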
![Image](https://bj-res.laiye.com/LaiYeProduction/UEditor/20220519/3fd42938688142d4871a3fc80b84af8a.png)
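The top-k edge pruning and the global node can be sketched on a dense adjacency matrix as follows. This is an illustration under stated assumptions rather than the original implementation: edge values are taken to be larger-is-stronger weights, pruning is done per row, the global node's edge weight defaults to 1.0, and G gets no self-loop. The function names `sparsify_top_k` and `add_global_node` are hypothetical.

```python
import numpy as np

def sparsify_top_k(adj, k):
    """Keep, for every node (row), only the k edges with the largest values."""
    n = adj.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of nodes")
    # Column indices of the k largest entries in each row.
    top_idx = np.argsort(adj, axis=1)[:, -k:]
    pruned = np.zeros_like(adj)
    rows = np.arange(n)[:, None]
    pruned[rows, top_idx] = adj[rows, top_idx]
    return pruned

def add_global_node(adj, weight=1.0):
    """Append a global node G connected to every existing node."""
    n = adj.shape[0]
    out = np.zeros((n + 1, n + 1), dtype=adj.dtype)
    out[:n, :n] = adj
    out[n, :n] = weight  # edges from G to all other nodes
    out[:n, n] = weight  # edges from all other nodes to G
    return out

adjacency = np.array([
    [0.0, 0.9, 0.1, 0.5],
    [0.9, 0.0, 0.3, 0.2],
    [0.1, 0.3, 0.0, 0.8],
    [0.5, 0.2, 0.8, 0.0],
])
graph = add_global_node(sparsify_top_k(adjacency, k=2))
print(graph)
```

Because each row is pruned independently, the pruned matrix is not guaranteed to be symmetric. Adding G after pruning ensures the global edges are never dropped by the top-k step.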