[CV] Text Analysis Resource Roundup

6 minute read

Optical character recognition (OCR); the field seems to have exploded in 2017.
Related resources: object detection resources

1 Surveys

Feature extraction methods for character recognition-A survey
1995-07-19 paper
A Survey Paper on Scene Text Detection Methods
2012 paper
An Overview of Feature Extraction Techniques in OCR for Indian Scripts Focused on Offline Handwriting
2013 paper
Deep Learning for Text Spotting
2014 paper
Graduate research project (Stanford)
Scene Text Detection and Recognition: Recent Advances and Future Trends
2014 paper
Text Detection and Recognition in Imagery: A Survey
2015-07 paper
Context Modeling for Semantic Text Matching and Scene Text Detection
2016 paper
Graduate research project.
A Survey on Scene Text Detection and Text Recognition
2016-03 paper
Text Detection and Recognition in Images: A Survey
2018-03-20 paper
Written by a student; still incomplete.
Scene Text Detection and Recognition: The Deep Learning Era
2018-11-10 paper | github

2 Theory

3 Other

Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition
ICDAR 2017 2017-10-28 paper | code4downloads
ICDAR 2019 Robust Reading Challenge on Multi-lingual Scene Text Detection and Recognition – RRC-MLT-2019
ICDAR 2019 2019-07-01 paper | competition

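4 Text Detection

Locate the text regions in an image.

4.1 Character-level

These methods need two separate models, one that detects characters and one that merges them. Besides being slower, they accumulate errors from stage to stage and cannot be trained end to end.
They all feel rather complicated.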
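To make the "detect, then merge" split concrete, here is a minimal sketch of the merging stage only. It assumes axis-aligned character boxes are already available from some detector; the function name `group_chars_into_lines` and both thresholds are illustrative choices, not taken from any of the papers below.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def group_chars_into_lines(chars: List[Box],
                           max_gap_ratio: float = 1.0,
                           min_overlap_ratio: float = 0.5) -> List[Box]:
    """Greedily merge character boxes into horizontal text-line boxes."""
    lines: List[List[float]] = []
    for x0, y0, x1, y1 in sorted(chars):  # scan characters left to right
        height = y1 - y0
        for line in lines:
            # Vertical overlap between the character and the line so far.
            overlap = min(y1, line[3]) - max(y0, line[1])
            # Horizontal gap between the line's right edge and the character.
            gap = x0 - line[2]
            same_row = overlap >= min_overlap_ratio * min(height, line[3] - line[1])
            if same_row and gap <= max_gap_ratio * height:
                line[0] = min(line[0], x0)
                line[1] = min(line[1], y0)
                line[2] = max(line[2], x1)
                line[3] = max(line[3], y1)
                break
        else:
            # No existing line accepts this character: start a new line.
            lines.append([x0, y0, x1, y1])
    return [tuple(line) for line in lines]
```

Any character the first stage misses or misplaces changes what this stage produces, which is the error accumulation mentioned above.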

Text-Attentional Convolutional Neural Network for Scene Text Detection
2015-10-12 paper
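Detects characters with CE-MSERs, supplemented by other methods to recall as many character candidates as possible, then filters out non-character regions; text-line construction follows.
Text Flow: A Unified Text Detection System in Natural Scene Images
ICCV 2015 2016-04-23 paper
Detects character regions with AdaBoost, then merges them using a graph.
Detecting oriented text in natural images by linking segments
CVPR 2017 2017-03-19 paper | tensorflow
SegLink: proposes text-line detection as an improvement on SSD; detects segments first, then links them together.
WordSup: Exploiting Word Annotations for Character based Text Detection
ICCV 2017 2017-08-22 paper
Uses a pretrained character detection network, clusters its outputs into text blocks, and follows with a recognition module.
WeText: Scene Text Detection under Weak Supervision
ICCV 2017 2017-10-13 paper
$\bullet \bullet$
Builds on SSD, using weak and semi-supervised learning to expand the training data; however, it is slow and handles only horizontal characters.

The highlight of the paper is how the SSD is trained, yet it gives no concrete training details.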

4.2 Text-line level

4.2.1 General

Reading Text in the Wild with Convolutional Neural Networks
2014-12-04 paper
Deep Convolutional Neural Networks for Text Spotting in Natural Images
2015 paper
DeepText:A Unified Framework for Text Proposal Generation and Text Detection in Natural Images
2016-05-24 paper
Improves Faster R-CNN and proposes the Inception-RPN.
TextBoxes: A Fast Text Detector with a Single Deep Neural Network
AAAI 2017 2016-11-21 paper | caffe-official
Modifies SSD to suit text detection.
Improving Text Proposal for Scene Images with Fully Convolutional Networks
ICPR 2016 2017-02-16 paper | caffe-other
Self-organized Text Detection with Minimal Post-processing via Border Learning
ICCV 2017 paper
$\bullet \bullet$
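SODT: to avoid the complex post-processing of character-based pipelines, proposes a border-based method that simplifies decoding.
Also analyzes the limitations of treating detection as a binary classification problem, and releases a PPT text detection dataset.
Works well on horizontal and slanted text, but poorly on distorted and especially vertical text.

Methods such as TextField, TextMountain, PixelLink, and PSENet all essentially segment the text region and then calibrate its border to improve performance, i.e. detection + segmentation + regression. They also show that border design matters most when the goal is to detect more complex text.
PixelLink: Detecting Scene Text via Instance Segmentation
AAAI 2018 2018-01-04 paper | tensorflow-official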
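The segmentation-based idea can be sketched as grouping text pixels into instances. The version below is a simplification under stated assumptions: it takes a binary text mask and treats every pair of 4-adjacent text pixels as linked, whereas a method like PixelLink predicts the links themselves. The function name `label_text_instances` is hypothetical.

```python
from typing import Dict, List, Tuple


def label_text_instances(mask: List[List[int]]) -> List[Tuple[int, int, int, int]]:
    """Return one box (x_min, y_min, x_max, y_max) per 4-connected text region."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    parent = list(range(h * w))  # union-find forest over pixel indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Link each text pixel with its right and bottom text neighbours.
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            if x + 1 < w and mask[y][x + 1]:
                union(y * w + x, y * w + x + 1)
            if y + 1 < h and mask[y + 1][x]:
                union(y * w + x, (y + 1) * w + x)

    # Accumulate a bounding box per connected component.
    boxes: Dict[int, List[int]] = {}
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                box = boxes.setdefault(find(y * w + x), [x, y, x, y])
                box[0], box[1] = min(box[0], x), min(box[1], y)
                box[2], box[3] = max(box[2], x), max(box[3], y)
    return sorted(tuple(b) for b in boxes.values())
```

Because adjacent text lines would fuse into one component under this rule, the border or link design is what separates instances in practice.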

4.2.2 Multi-oriented

Accurate Text Localization in Natural Image with Cascaded Convolutional TextNetwork
2016-03-31 paper
Arbitrary-Oriented Scene Text Detection via Rotation Proposals
2017-03-03 paper | caffe-official | pytorch
RRPN
Rotated RoI pooling.
Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection
CVPR 2017 2017-03-04 paper
Detects text with quadrilaterals.
Deep Direct Regression for Multi-Oriented Scene Text Detection
2017-03-24 paper
Cascaded Segmentation-Detection Networks for Word-Level Text Spotting
2017-04-03 paper
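Based on YOLO: first localizes text regions coarsely by segmentation, then detects the exact text positions; a 1000×5600 image takes 450 ms on a 1080Ti.
EAST: An Efficient and Accurate Scene Text Detector
CVPR 2017 2017-04-11 paper | tensorflow | pytorch
Inspired by DenseBox; uses an FCN with SSD-style multi-layer outputs; handles multiple orientations with rotated boxes and quadrilaterals; finishes with locality-aware NMS.
Acceptable on the ICDAR datasets, but quite poor on other datasets.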
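Rotated boxes and quadrilaterals are two encodings of the same kind of region, and converting between them is simple geometry. The sketch below assumes a (center, width, height, angle) parameterization, which is an illustrative choice and not EAST's exact output encoding; `rbox_to_quad` is a hypothetical helper name.

```python
import math
from typing import List, Tuple


def rbox_to_quad(cx: float, cy: float, w: float, h: float,
                 angle_deg: float) -> List[Tuple[float, float]]:
    """Return the four corners of a w-by-h box rotated about its center."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    # Corners of the unrotated box, relative to the center.
    corners = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Standard 2D rotation, then translation back to the center.
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in corners]
```

In image coordinates, where y points down, a positive angle here appears as a clockwise rotation.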
R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection
2017-06-29 paper | tensorflow | caffe | some improved code
Extends Faster R-CNN: adjusts the RoIPooling sizes, introduces inclined detection boxes, and adapts NMS to the inclined case.
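NMS for inclined boxes differs from the usual version mainly in the overlap measure: IoU has to be computed between polygons rather than axis-aligned rectangles. The following is a generic sketch, not R2CNN's implementation. It assumes convex quadrilaterals given as corner lists and uses Sutherland-Hodgman clipping; all function names are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def signed_area(poly: Sequence[Point]) -> float:
    """Shoelace formula; positive when vertices run counter-clockwise."""
    total = 0.0
    for i, (x0, y0) in enumerate(poly):
        x1, y1 = poly[(i + 1) % len(poly)]
        total += x0 * y1 - x1 * y0
    return total / 2.0


def _cross(a: Point, b: Point, p: Point) -> float:
    """Cross product of (b - a) and (p - a); >= 0 means p is left of a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _line_hit(p: Point, q: Point, a: Point, b: Point) -> Point:
    """Point where segment p->q crosses the infinite line through a and b."""
    dx1, dy1 = q[0] - p[0], q[1] - p[1]
    dx2, dy2 = b[0] - a[0], b[1] - a[1]
    t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / (dx1 * dy2 - dy1 * dx2)
    return (p[0] + t * dx1, p[1] + t * dy1)


def convex_intersection(subject: Sequence[Point],
                        clipper: Sequence[Point]) -> List[Point]:
    """Sutherland-Hodgman clipping of one convex polygon by another."""
    out = list(subject)
    clip = list(clipper)
    if signed_area(clip) < 0:  # the inside test assumes counter-clockwise order
        clip.reverse()
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        src, out = out, []
        for j, cur in enumerate(src):
            prev = src[j - 1]
            cur_in = _cross(a, b, cur) >= 0
            prev_in = _cross(a, b, prev) >= 0
            if cur_in != prev_in:  # the edge prev->cur crosses the clip line
                out.append(_line_hit(prev, cur, a, b))
            if cur_in:
                out.append(cur)
    return out


def polygon_iou(p: Sequence[Point], q: Sequence[Point]) -> float:
    """Intersection over union of two convex polygons."""
    inter_poly = convex_intersection(p, q)
    inter = abs(signed_area(inter_poly)) if len(inter_poly) >= 3 else 0.0
    union = abs(signed_area(p)) + abs(signed_area(q)) - inter
    return inter / union if union > 0 else 0.0


def skew_nms(polys: Sequence[Sequence[Point]], scores: Sequence[float],
             iou_threshold: float = 0.5) -> List[int]:
    """Keep the highest-scoring polygons, dropping ones that overlap a kept one."""
    order = sorted(range(len(polys)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(polygon_iou(polys[i], polys[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

The clipping step is only valid for convex polygons, which holds for rotated rectangles; concave text outlines would need a general polygon library instead.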
Single Shot Text Detector with Regional Attention
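ICCV 2017 2017-09-01 paper | caffe-official | pytorch
SSD + attention: improves detection by modifying the network architecture; the so-called attention is really segmentation folded in to guide training.

The scheme itself complicates engineering practice; of course, if plenty of segmentation annotations are already available, this method is usable.
TextBoxes++: A Single-Shot Oriented Scene Text Detector
2018-01-09 paper | torch + caffe
Tilts the anchor boxes and increases their number.
Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation
CVPR 2018 2018-02-25 Xiang Bai's group paper | pytorch-official
Targets multi-oriented text; detects the four corner points of each text region to make detection better suited to text, while reducing post-processing.

Why does this feel so much like CornerNet?

4.2.3 Irregular text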

On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention
2019-10-10 paper

5 Text Recognition

Recognize the content of the text; in effect, converting images to text.

5.1 Regular

5.1.1 CNN

5.1.2 CRNN

Reading Scene Text in Deep Convolutional Sequences
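AAAI 2016 2015-06-14 Xiaoou Tang's group paper
Uses an RNN for character recognition; copes with heavily distorted and ambiguous strings and can recognize novel character combinations. Very similar to CRNN from Xiang Bai's group.
The paper also compares Maxout and ReLU.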
An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
2015-07-21 paper
CRNN:
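CRNN stacks convolutional and recurrent layers and transcribes the per-frame outputs with CTC. As a rough illustration of that last step only, here is a greedy (best-path) CTC decoder. It assumes class 0 is the blank and that class k maps to `alphabet[k - 1]`; that indexing convention and the function name are assumptions for this sketch.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      alphabet: str, blank: int = 0) -> str:
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    best: List[int] = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars: List[str] = []
    prev = None
    for idx in best:
        # Emit a character only when the class changes and is not the blank.
        if idx != prev and idx != blank:
            chars.append(alphabet[idx - 1])
        prev = idx
    return "".join(chars)
```

The blank is what allows genuine double letters: "a, blank, a" decodes to "aa", while "a, a" collapses to a single "a".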

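5.1.3 Other

Focusing Attention: Towards Accurate Text Recognition in Natural Images
ICCV 2017 2017-09-07 Hikvision, Fudan University, and Shanghai Jiao Tong University paper
$\bullet \bullet$
FAN: Focusing Attention Network. The authors find that attention fails on low-quality images, a problem they call attention drift: the attention cannot accurately associate feature vectors with the corresponding target regions of the input image.

How did they notice the drift in the first place?

5.2 Irregular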

5.2.1 CNN

Scene Text Recognition from Two-Dimensional Perspective
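AAAI 2019 2018-09-18 Xiang Bai's group paper
Adds segmentation to localize each character, then classifies individual characters, replacing sequence recognition with classification; also uses deformable convolution to extract features from text regions of different shapes.

5.2.2 Other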

Robust Scene Text Recognition with Automatic Rectification
CVPR 2016 2016-03-12 paper
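Uses a spatial transformer network (STN) to rectify irregular text with a spatial transformation, then feeds the corrected image to the recognition network.
SqueezedText: A Real-time Scene Text Recognition by Binary Convolutional Encoder-decoder Network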
AAAI 2018 2018 paper
Handwriting Recognition in Low-resource Scripts using Adversarial Learning
CVPR 2019 2018-11-04 paper
ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification
CVPR 2019 2018-12-14 paper
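ESIR: designs a transformation network for rectification.

5.3 Other

6 End-to-end Text Recognition

Detection and recognition run as one uninterrupted pipeline.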

6.1 LSTM

Unambiguous Text Localization and Retrieval for Cluttered Scenes
CVPR 2017 paper

6.2 Other

Towards End-to-end Text Spotting with Convolutional Recurrent Neural Networks
ICCV 2017 2017-07-13 paper
Faster R-CNN followed by an LSTM; performs poorly on complex tasks because the detection result is decisive for the whole task.
Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework
ICCV 2017 2017 paper
The detection stage predicts a rotation angle in addition to the box coordinates, but the method is still limited by the detection results.
Attention-based Extraction of Structured Information from Street View Imagery
ICDAR 2017 2017-04-11 Google paper
$\bullet \bullet$
OCR on multi-view street-view imagery; uses spatial attention in place of detection.
STN-OCR: A single Neural Network for Text Detection and Text Recognition
2017-07-27 paper
Detection and Recognition of Text Embedding in Online Images via Neural Context Models
AAAI 2017 paper
CTSN:
SEE: Towards Semi-Supervised End-to-End Scene Text Recognition
AAAI 2018 2017-12-14 paper | tensorflow
SEE:
FOTS: Fast Oriented Text Spotting with a Unified Network
CVPR 2018 2018-01-05 paper | pytorch
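FOTS: FPN plus angle prediction; features are interpolated and fed to a bidirectional LSTM; the detector is single-stage, so it runs somewhat faster.
An end-to-end TextSpotter with Explicit Alignment and Attention
CVPR 2018 2018-03-09 paper | caffe-official
Similar to TextSpotter: adds angle information, then applies RoIPooling, and feeds the result to an RNN + Seq2Seq + attention recognizer.
Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes
ECCV 2018 2018-07-06 Megvii paper
Handles irregular text, borrowing from Mask R-CNN.
Towards End-to-End License Plate Detection and Recognition: A Large Dataset and Baseline
ECCV 2018 2018 paper
RPnet: releases CCPD, a Chinese license plate dataset of 250k images; each annotation contains one bounding box, four corner points, and the ground-truth text.
Applies RoIPooling to feature maps at several scales and concatenates the results for recognition; only a baseline solution.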

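7 Datasets

7.1 Datasets

MIDV-2019: Challenges of the modern mobile-based document OCR
2019-10-09 paper | dataset
Releases a dataset of identity documents photographed in natural scenes.

7.2 Data generation

A Method to Generate Synthetically Warped Document Image
2019-10-15 paper
Warps document images with an affine-style transformation.

A single affine transformation feels like it would be enough.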
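For reference, a single affine transform maps each point (x, y) to (a·x + b·y + tx, c·x + d·y + ty). The sketch below applies such a matrix to the corners of a page. It is an illustration of the remark above, not the method from the paper, and `apply_affine` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Affine = Tuple[Tuple[float, float, float], Tuple[float, float, float]]


def apply_affine(matrix: Affine, points: Sequence[Point]) -> List[Point]:
    """Apply a 2x3 affine matrix ((a, b, tx), (c, d, ty)) to 2D points."""
    (a, b, tx), (c, d, ty) = matrix
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


# Shear a 100x50 page horizontally and shift it by (10, 20).
page = [(0, 0), (100, 0), (100, 50), (0, 50)]
warped = apply_affine(((1, 0.5, 10), (0, 1, 20)), page)
```

An affine map keeps parallel lines parallel, so it cannot reproduce page curl or perspective; whether that is "enough" depends on how distorted the target documents are.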
TextRecognitionDataGenerator
awesome-SynthText
Synthetic Data for Text Localisation in Natural Images
CVPR 2016 2016-04-22 paper | github | home


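Appendix

A Researchers

Xiang Bai's group (白翔)
Minghui Liao (廖明辉), Baoguang Shi (石葆光), Xiang Bai (白翔), Xinggang Wang (王兴刚), Wenyu Liu (刘文予)
Xiaoou Tang's group (汤晓鸥)
Weilin Huang (黄伟林), Yu Qiao (乔宇)

B References

Chinese subtitle analysis
Detection with SSD, then adaptive-threshold segmentation, and finally recognition.
awesome-ocr
paper with code
Paper resource roundup on scene text localization and recognition
52CV - text recognition
52CV - text detection
Scene Text Recognition Resources
deep-text-recognition-benchmark
A comparison of existing methods.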

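C Open-source code

a Libraries

OCRE(OCR Easy)
Clara OCR
A project started in 1999, written in C, supporting Windows and Linux; provides GUI and web interfaces.
OCRAD
Released 2019-01-11; analyzes text by feature extraction and can split text by columns or by whitespace.
JOCR
Last updated 2013-04-15.
OCR-CHIE
2001-03.
TESSERACT-OCR
Open-sourced by HP in 2005 and maintained by Google since 2006; version 4.0 added an LSTM engine; currently recognizes text in more than 100 languages; supported document formats include plain text, HTML, and PDF; provides C and C++ APIs.

b Projects

textDetectionWithScriptID

D Datasets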

Name	Language	Features	Char detection	Word detection	Text-line detection	Char recognition	Word recognition	End-to-end	Count	Train	Test	Size (GB)
MNIST	Digits	Horizontal handwritten					$\checkmark$
USPS	Digits	Horizontal handwritten					$\checkmark$
UCI 1991	English				$\checkmark$				20000
ICDAR 2011 Web	English			$\checkmark$			$\checkmark$	$\checkmark$
ICDAR 2011 Scene
ICDAR 2013 Web	English		$\checkmark$	$\checkmark$		$\checkmark$	$\checkmark$	$\checkmark$
ICDAR 2013 Scene
ICDAR 2013 videos
SVHN 2011	Digits	Natural scenes	$\checkmark$			$\checkmark$				33,402 (73,257)	13,068 (26,032)
COCO-Text 2016	English characters?	Printed, handwritten, horizontal		$\checkmark$			$\checkmark$	$\checkmark$		43,686	20,000
SynthText 2016	English	Natural scenes	$\checkmark$	$\checkmark$	$\checkmark$				800k (8M)			40
Chars74K

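Collected materials on text detection and recognition (datasets, code, blogs) [continuously updated]
PubTabNet
Table recognition; github | competition

E Reports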

OCR and Text Spotting
