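[CV] Text Analysis Resource Roundup
Optical character recognition (OCR); the field seems to have exploded in 2017.
Related resources: object detection resources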
1 Surveys
- Feature extraction methods for character recognition-A survey
1995-07-19 paper
- An Overview of Feature Extraction Techniques in OCR for Indian Scripts Focused on Offline Handwriting
2013 paper
- Deep Learning for Text Spotting
2014 paper
Graduate thesis (Stanford).
- Scene Text Detection and Recognition: Recent Advances and Future Trends
2014 paper
- Text Detection and Recognition in Imagery: A Survey
2015-07 paper
- Context Modeling for Semantic Text Matching and Scene Text Detection
2016 paper
Graduate thesis.
- A Survey on Scene Text Detection and Text Recognition
2016-03 paper
- Text Detection and Recognition in Images: A Survey
2018-03-20 paper
Written by a student; still incomplete.
- Scene Text Detection and Recognition: The Deep Learning Era
2018-11-10 paper | github
2 Theory
3 Other
- Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition
ICDAR 2017 2017-10-28 paper | code4downloads
- ICDAR 2019 Robust Reading Challenge on Multi-lingual Scene Text Detection and Recognition – RRC-MLT-2019
ICDAR 2019 2019-07-01 paper | competition
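4 Text Detection
Locating the regions of an image that contain text.
4.1 Character-based
These methods need two separate models, one to detect characters and one to merge them. Besides being slower, errors accumulate across the stages, and the pipeline cannot be trained end to end.
They all feel rather complicated.
- Text-Attentional Convolutional Neural Network for Scene Text Detection
2015-10-12 paper
Detects characters with CE-MSERs, adds other methods to recall as many characters as possible, then filters out non-character regions; text lines are assembled afterwards.
- Text Flow: A Unified Text Detection System in Natural Scene Images
ICCV 2015 2016-04-23 paper
Detects character regions with AdaBoost, then merges them using a graph.
- Detecting oriented text in natural images by linking segments
CVPR 2017 2017-03-19 paper | tensorflow
SegLink: proposes text-line detection as an improvement on SSD; detects segments (character-like pieces) first, then links them together.
- WordSup: Exploiting Word Annotations for Character based Text Detection
ICCV 2017 2017-08-22 paper
Uses a pretrained character detection network, clusters its outputs into text blocks, and follows with a recognition module.
- WeText: Scene Text Detection under Weak Supervision
ICCV 2017 2017-10-13 paper
$\bullet \bullet$
Builds on SSD, using weak and semi-supervision to expand the training data; however, it is slow and only handles horizontal characters.
The paper's main contribution is how the SSD is trained, yet it gives no concrete training details.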
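The detect-then-merge pipeline described at the top of this subsection can be illustrated with a toy merging step. The sketch below is a minimal heuristic, not the method of any paper listed here: it assumes axis-aligned character boxes `(x1, y1, x2, y2)` and two hypothetical thresholds, `max_gap` and `min_overlap`, and greedily chains characters from left to right.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned


def vertical_overlap(a: Box, b: Box) -> float:
    """Overlap of the two boxes' y-ranges, as a fraction of the shorter box."""
    inter = min(a[3], b[3]) - max(a[1], b[1])
    shorter = min(a[3] - a[1], b[3] - b[1])
    return max(inter, 0.0) / shorter if shorter > 0 else 0.0


def merge_characters(chars: List[Box], max_gap: float = 10.0,
                     min_overlap: float = 0.5) -> List[Box]:
    """Greedily chain character boxes into horizontal text-line boxes.

    max_gap and min_overlap are illustrative thresholds, not values
    taken from any paper.
    """
    lines: List[Box] = []
    for box in sorted(chars, key=lambda b: b[0]):  # left to right
        for i, line in enumerate(lines):
            gap = box[0] - line[2]  # distance to the line's right edge
            if gap <= max_gap and vertical_overlap(line, box) >= min_overlap:
                # Grow the line box so that it covers the new character.
                lines[i] = (min(line[0], box[0]), min(line[1], box[1]),
                            max(line[2], box[2]), max(line[3], box[3]))
                break
        else:
            lines.append(box)  # no compatible line: start a new one
    return lines
```

Even this toy version shows why the two-stage design is fragile: a single missed or spurious character box changes which lines get merged, and the thresholds cannot be learned jointly with the detector.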
4.2 Text-line-based
4.2.1 Regular
- Reading Text in the Wild with Convolutional Neural Networks
2014-12-04 paper
- Deep Convolutional Neural Networks for Text Spotting in Natural Images
2015 paper
- DeepText: A Unified Framework for Text Proposal Generation and Text Detection in Natural Images
2016-05-24 paper
Improves Faster R-CNN and proposes Inception-RPN.
- TextBoxes: A Fast Text Detector with a Single Deep Neural Network
AAAI 2017 2016-11-21 paper | caffe-official
Modifies SSD to suit text detection.
- Improving Text Proposal for Scene Images with Fully Convolutional Networks
ICPR 2016 2017-02-16 paper | caffe-other
- Self-organized Text Detection with Minimal Post-processing via Border Learning
ICCV 2017 paper
$\bullet \bullet$
SODT: to avoid the complex post-processing of character-based pipelines, proposes a border-based method that simplifies decoding.
Also analyzes the limitations of framing detection as binary classification, and releases a dataset for text detection in PPT slides.
Works well on horizontal and slanted text, but poorly on curved and especially vertical text. TextField, TextMountain, PixelLink, PSENet and similar methods all segment text regions and then refine the borders to improve performance, i.e. detection + segmentation + regression. This also shows that border design matters a great deal when detecting more complex text.
- PixelLink: Detecting Scene Text via Instance Segmentation
AAAI 2018 2018-01-04 paper | tensorflow-official
4.2.2 Multi-oriented
- Accurate Text Localization in Natural Image with Cascaded Convolutional Text Network
2016-03-31 paper
- Arbitrary-Oriented Scene Text Detection via Rotation Proposals
2017-03-03 paper | caffe-official | pytorch
RRPN: rotated RoI pooling.
- Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection
CVPR 2017 2017-03-04 paper
Detects text with quadrilaterals.
- Deep Direct Regression for Multi-Oriented Scene Text Detection
2017-03-24 paper
- Cascaded Segmentation-Detection Networks for Word-Level Text Spotting
2017-04-03 paper
Based on YOLO: first coarsely localizes text regions by segmentation, then detects the precise text positions. A 1000×5600 image takes 450 ms on a 1080Ti.
- EAST: An Efficient and Accurate Scene Text Detector
CVPR 2017 2017-04-11 paper | tensorflow | pytorch
Inspired by DenseBox; uses an FCN together with SSD-style multi-layer outputs; handles multiple orientations with rotated boxes and quadrilaterals; finishes with locality-aware NMS.
Does reasonably well on the ICDAR datasets but performs poorly on other datasets.
- R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection
2017-06-29 paper | tensorflow | caffe | some improved code
Extends Faster R-CNN: adjusts the RoI pooling sizes, introduces inclined detection boxes, and adapts NMS to the inclined case.
- Single Shot Text Detector with Regional Attention
ICCV 2017 2017-09-01 paper | caffe-official | pytorch
SSD + attention: improves detection by modifying the network structure. The so-called attention amounts to adding segmentation to guide training. The approach makes engineering more complicated, though it is usable if plenty of segmentation annotations are already available.
- TextBoxes++: A Single-Shot Oriented Scene Text Detector
2018-01-09 paper | torch + caffe
Makes the anchor boxes inclined and increases their number.
- Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation
CVPR 2018 2018-02-25 Xiang Bai's group paper | pytorch-official
Targets multi-oriented text; detects the four corner points of each text region to make detection better suited to text, while also reducing post-processing. It feels a lot like CornerNet.
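Several entries above regress rotated boxes or quadrilaterals instead of axis-aligned rectangles. As a small illustration of how the two representations relate, the sketch below converts a rotated box to its four corner points and computes a polygon's area with the shoelace formula. The `(cx, cy, w, h, angle)` parameterization with the angle in degrees is an assumption made for illustration; each paper defines its own convention.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rotated_box_to_quad(cx: float, cy: float, w: float, h: float,
                        angle_deg: float) -> List[Point]:
    """Corners of a w-by-h box centred at (cx, cy), rotated by angle_deg.

    Uses the standard 2D rotation matrix; in image coordinates, where y
    points down, a positive angle therefore appears clockwise.
    """
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    # Corners of the unrotated box, relative to its centre.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
            for dx, dy in offsets]


def polygon_area(points: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2
```

Rotation preserves area, so `polygon_area(rotated_box_to_quad(cx, cy, w, h, a))` should stay at `w * h` for any angle, which makes a handy sanity check when converting annotations between formats.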
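4.2.3 Irregular text
5 Text Recognition
Recognizing the content of the text; in effect, image-to-text conversion.
5.1 Regular
5.1.1 CNN
5.1.2 CRNN
- Reading Scene Text in Deep Convolutional Sequences
AAAI 2016 2015-06-14 Xiaoou Tang's group paper
Uses an RNN for character recognition; copes with heavily distorted and ambiguous strings, and can recognize previously unseen character combinations. Very similar to the CRNN from Xiang Bai's group.
The paper also compares Maxout and ReLU.
- An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
2015-07-21 paper
CRNN: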
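CRNN pairs its convolutional and recurrent layers with a CTC (connectionist temporal classification) transcription layer, so reading out a prediction means collapsing per-frame labels into a string. The sketch below shows greedy (best-path) CTC decoding only; the alphabet, the blank index 0, and the toy scores are assumptions for illustration, not the authors' code.

```python
from typing import Sequence

BLANK = 0  # assumed index of the CTC blank class


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      alphabet: str) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    frame_scores[t][k] is the score of class k at time step t. Class 0 is
    the blank; class k > 0 maps to alphabet[k - 1].
    """
    # Pick the highest-scoring class for every frame.
    best_path = [max(range(len(scores)), key=lambda k: scores[k])
                 for scores in frame_scores]
    decoded = []
    previous = BLANK
    for label in best_path:
        # Emit a character only when it is not blank and not a repeat.
        if label != BLANK and label != previous:
            decoded.append(alphabet[label - 1])
        previous = label
    return "".join(decoded)
```

A blank between two identical labels is what lets the decoder output doubled letters: the path `a a - a b b -` collapses to `aab`, whereas `a a a b b` would collapse to `ab`.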
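5.1.3 Other
- Focusing Attention: Towards Accurate Text Recognition in Natural Images
ICCV 2017 2017-09-07 Hikvision, Fudan University and Shanghai Jiao Tong University paper
$\bullet \bullet$
FAN: Focusing Attention Network. The authors observe that attention fails on low-quality images, a phenomenon they call attention drift: the attention cannot accurately associate feature vectors with the corresponding target regions of the input image. How did they notice the drift in the first place?
5.2 Irregular
5.2.1 CNN
- Scene Text Recognition from Two-Dimensional Perspective
AAAI 2019 2018-09-18 Xiang Bai's group paper
Adds segmentation to localize each character, then classifies the individual characters, replacing sequence recognition with classification. Also uses deformable convolution to extract features from text regions of varying shape.
5.2.2 Other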
- Robust Scene Text Recognition with Automatic Rectification
CVPR 2016 2016-03-12 paper
Uses a spatial transformer network (STN) to rectify irregular text with a spatial transformation, then feeds the rectified image to the recognition network.
- SqueezedText: A Real-time Scene Text Recognition by Binary Convolutional Encoder-decoder Network
AAAI 2018 2018 paper
- Handwriting Recognition in Low-resource Scripts using Adversarial Learning
CVPR 2019 2018-11-04 paper
- ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification
CVPR 2019 2018-12-14 paper
ESIR: designs a rectification network.
5.3 Other
- Study on Feature Extraction Methods for Character Recognition of Balinese Script on Palm Leaf Manuscript Images
2016 paper
- Full-Page Text Recognition: Learning Where to Start and When to Stop
2017-04-27 paper
- Reading Text in the Wild with Convolutional Neural Networks
IJCV 2016
- Recursive Recurrent Nets with Attention Modeling for OCR in the Wild
CVPR 2016
- Robust Scene Text Recognition with Automatic Rectification
CVPR 2016
- Generative Shape Models: Joint Text Recognition and Segmentation with Very Little Training Data
NIPS
- Automatic Script Identification in the Wild
ICDAR 2015
- Deep structured output learning for unconstrained text recognition
ICLR 2015
- Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition
NIPS 2014
- A Unified Framework for Multi-Oriented Text Detection and Recognition
TIP
- End-to-End Text Recognition with Convolutional Neural Networks
ICPR 2012
6 End-to-End Text Recognition
Detection + recognition in one uninterrupted pipeline.
6.1 LSTM
6.2 Other
- Towards End-to-end Text Spotting with Convolutional Recurrent Neural Networks
ICCV 2017 2017-07-13 paper
Faster R-CNN followed by an LSTM. Does not work well on complex tasks, because the detection result is decisive for the whole pipeline.
- Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework
ICCV 2017 2017 paper
In detection, adds a rotation-angle parameter in addition to the box coordinates; still limited by detection quality.
- Attention-based Extraction of Structured Information from Street View Imagery
ICDAR 2017 2017-04-11 Google paper
$\bullet \bullet$
OCR on street-view data captured from multiple views; uses spatial attention in place of detection.
- STN-OCR: A single Neural Network for Text Detection and Text Recognition
2017-07-27 paper
- Detection and Recognition of Text Embedded in Online Images via Neural Context Models
AAAI 2017 paper
CTSN:
- SEE: Towards Semi-Supervised End-to-End Scene Text Recognition
AAAI 2018 2017-12-14 paper | tensorflow
SEE:
- FOTS: Fast Oriented Text Spotting with a Unified Network
CVPR 2018 2018-01-05 paper | pytorch
FOTS: FPN + angle; features are interpolated and fed to a bidirectional LSTM; the detector is single-stage, so it is somewhat faster.
- An end-to-end TextSpotter with Explicit Alignment and Attention
CVPR 2018 2018-03-09 paper | caffe-official
Similar to TextSpotter: adds angle information, then applies RoI pooling and feeds the result to RNN + Seq2Seq + attention.
- Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes
ECCV 2018 2018-07-06 Megvii paper
Handles irregular text, borrowing from Mask R-CNN.
- Towards End-to-End License Plate Detection and Recognition: A Large Dataset and Baseline
ECCV 2018 2018 paper
RPnet: releases CCPD, a Chinese license plate dataset with 250k images; each annotation contains one box, four corner points, and the ground-truth text.
Applies RoI pooling to feature maps at several scales and concatenates them for recognition; only a baseline approach.
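7 Datasets
7.1 Datasets
- MIDV-2019: Challenges of the modern mobile-based document OCR
2019-10-09 paper | dataset
Releases a dataset of identity documents photographed in natural scenes.
7.2 Data generation
- A Method to Generate Synthetically Warped Document Image
2019-10-15 paper
Affine warping of document images; an affine transform alone seems like it would be enough.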
- Synthetic Data for Text Localisation in Natural Images
CVPR 2016 2016-04-22 paper | github | home
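The remark about affine warping can be made concrete with a few lines of arithmetic. The sketch below builds a 2x3 affine matrix from rotation, scale, shear and translation, then applies it to the corner points of a page. The parameter names and the composition order are assumptions made for illustration and are not taken from either paper above.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Affine = Tuple[Tuple[float, float, float], Tuple[float, float, float]]


def make_affine(angle_deg: float = 0.0, scale: float = 1.0, shear: float = 0.0,
                tx: float = 0.0, ty: float = 0.0) -> Affine:
    """2x3 matrix for: shear along x, then rotate and scale, then translate."""
    t = math.radians(angle_deg)
    a, b = scale * math.cos(t), -scale * math.sin(t)
    c, d = scale * math.sin(t), scale * math.cos(t)
    # [[a, b], [c, d]] @ [[1, shear], [0, 1]] = [[a, a*shear + b], [c, c*shear + d]]
    return ((a, a * shear + b, tx), (c, c * shear + d, ty))


def apply_affine(m: Affine, points: List[Point]) -> List[Point]:
    """Map each (x, y) to (m00*x + m01*y + m02, m10*x + m11*y + m12)."""
    return [(m[0][0] * x + m[0][1] * y + m[0][2],
             m[1][0] * x + m[1][1] * y + m[1][2]) for x, y in points]


# Corners of a hypothetical 100x140 page, warped with arbitrary parameters.
page = [(0.0, 0.0), (100.0, 0.0), (100.0, 140.0), (0.0, 140.0)]
warped = apply_affine(make_affine(angle_deg=5, scale=0.9, shear=0.1, tx=12, ty=8), page)
```

An affine map keeps parallel lines parallel, so it can model rotation, scaling and slant but not perspective foreshortening or page curl; whether that is enough depends on how the target documents are captured.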
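Appendix
A Researchers
B References
- Chinese subtitle analysis
Detection with SSD, then adaptive-threshold segmentation, and finally recognition.
- awesome-ocr
- paper with code
- Paper roundup on scene text localization and recognition
- 52CV: text recognition
- 52CV: text detection
- Scene Text Recognition Resources
- deep-text-recognition-benchmark
A comparison of existing methods.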
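C Open-source code
a Libraries
- Clara OCR
A project started in 1999, written in C, supporting Windows and Linux; provides a GUI and a web interface.
- OCRAD
Released 2019-01-11; analyzes text via feature extraction and can segment text by column or by whitespace.
- JOCR
Last updated 2013-04-15.
- OCR-CHIE
2001-03.
- TESSERACT-OCR
Open-sourced by HP in 2005 and maintained by Google since 2006; version 4.0 added an LSTM engine; currently recognizes text in over 100 languages; supported output formats include plain text, HTML (hOCR) and PDF; provides C and C++ APIs.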
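Tesseract is usually driven through its command-line tool, whose basic form is `tesseract imagename outputbase [-l lang] [configfile...]`. The helper below is a small sketch that only assembles that command. The function names and file paths are hypothetical, the `txt`, `hocr` and `pdf` config names are those shipped with Tesseract 4, and actually running it assumes a `tesseract` binary with the requested language data is installed.

```python
import subprocess
from typing import List

# Config files shipped with Tesseract that select the output format.
OUTPUT_CONFIGS = {"text": "txt", "html": "hocr", "pdf": "pdf"}


def build_tesseract_command(image_path: str, output_base: str,
                            lang: str = "eng", output: str = "text") -> List[str]:
    """Assemble `tesseract imagename outputbase -l lang configfile`."""
    if output not in OUTPUT_CONFIGS:
        raise ValueError(f"unsupported output format: {output!r}")
    return ["tesseract", image_path, output_base, "-l", lang, OUTPUT_CONFIGS[output]]


def run_tesseract(image_path: str, output_base: str,
                  lang: str = "eng", output: str = "text") -> None:
    """Run the command; raises CalledProcessError if tesseract fails."""
    subprocess.run(build_tesseract_command(image_path, output_base, lang, output),
                   check=True)
```

For example, `build_tesseract_command("page.png", "page", lang="chi_sim", output="pdf")` yields the argument list for writing a searchable `page.pdf` from a Simplified Chinese scan.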
b Projects
D Datasets
Name | Language | Characteristics | Character detection | Word detection | Text-line detection | Character recognition | Word recognition | End-to-end | Count | Train | Test | Size (GB) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
MNIST | digits | horizontal, handwritten | $\checkmark$ | | | | | | | | | |
USPS | digits | horizontal, handwritten | $\checkmark$ | | | | | | | | | |
UCI 1991 | English | $\checkmark$ | 20000 | | | | | | | | | |
ICDAR 2011 Web | English | $\checkmark$ | $\checkmark$ | $\checkmark$ | | | | | | | | |
ICDAR 2011 Scene | | | | | | | | | | | | |
ICDAR 2013 Web | English | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ | | | | | | |
ICDAR 2013 Scene | | | | | | | | | | | | |
ICDAR 2013 videos | | | | | | | | | | | | |
SVHN 2011 | digits | natural scenes | $\checkmark$ | $\checkmark$ | 33,402 (73,257) | 13,068 (26,032) | | | | | | |
COCO-Text 2016 | English, characters? | printed, handwritten, horizontal | $\checkmark$ | $\checkmark$ | $\checkmark$ | 43,686 | 20,000 | | | | | |
SynthText 2016 | English | natural scenes | $\checkmark$ | $\checkmark$ | $\checkmark$ | 800k (8M) | 40 | | | | | |
Chars74K | | | | | | | | | | | | |