
Optical character recognition (OCR); the field seems to have exploded in 2017.
Related resources: object detection resources

1 Surveys

  1. Feature extraction methods for character recognition-A survey
    1995-07-19 paper

  2. A Survey Paper on Scene Text Detection Methods
    2012 paper

  3. An Overview of Feature Extraction Techniques in OCR for Indian Scripts Focused on Offline Handwriting
    2013 paper

  4. Deep Learning for Text Spotting
    2014 paper
    Graduate research project (Stanford)

  5. Scene Text Detection and Recognition: Recent Advances and Future Trends
    2014 paper

  6. Text Detection and Recognition in Imagery: A Survey
    2015-07 paper

  7. Context Modeling for Semantic Text Matching and Scene Text Detection
    2016 paper
    Graduate research project

  8. A Survey on Scene Text Detection and Text Recognition
    2016-03 paper

  9. Text Detection and Recognition in Images: A Survey
    2018-03-20 paper
    Written by a student; still incomplete

  10. Scene Text Detection and Recognition: The Deep Learning Era
    2018-11-10 paper | github

2 Theory

3 Other

  1. Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition
    ICDAR 2017 2017-10-28 paper | code4downloads

  2. ICDAR 2019 Robust Reading Challenge on Multi-lingual Scene Text Detection and Recognition – RRC-MLT-2019
    ICDAR 2019 2019-07-01 paper | competition

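4 Text Detection

Locate the text regions in an image.

4.1 Character-level

These methods need two separate models: one detects characters and the other merges them. Besides being slower, errors accumulate across the stages and the pipeline cannot be trained end to end.
They all feel rather complicated.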
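
The two-stage idea described above (detect characters, then merge them) can be illustrated with a minimal sketch of the merging stage. It assumes axis-aligned character boxes and a simple greedy rule; the function name `merge_chars_into_lines` and the thresholds `max_gap` and `min_v_overlap` are hypothetical and are not taken from any of the papers listed here, which use learned or graph-based grouping instead.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def merge_chars_into_lines(
    chars: List[Box],
    max_gap: float = 10.0,        # assumed: largest horizontal gap inside a line
    min_v_overlap: float = 0.5,   # assumed: required vertical overlap ratio
) -> List[Box]:
    """Greedily group character boxes into horizontal text-line boxes."""
    lines: List[Box] = []
    # Visit characters from left to right so lines grow toward the right.
    for box in sorted(chars, key=lambda b: b[0]):
        for i, line in enumerate(lines):
            gap = box[0] - line[2]
            overlap = min(box[3], line[3]) - max(box[1], line[1])
            min_h = min(box[3] - box[1], line[3] - line[1])
            if gap <= max_gap and min_h > 0 and overlap / min_h >= min_v_overlap:
                # Extend the line to the union of both boxes.
                lines[i] = (
                    min(line[0], box[0]),
                    min(line[1], box[1]),
                    max(line[2], box[2]),
                    max(line[3], box[3]),
                )
                break
        else:
            # No compatible line found: start a new one.
            lines.append(box)
    return lines
```

Any character missed or misplaced by the first stage changes what this stage produces, which is the error accumulation mentioned above.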

  1. Text-Attentional Convolutional Neural Network for Scene Text Detection
    2015-10-12 paper
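    Detects characters with CE-MSERs, plus other methods to recall as many character candidates as possible, then filters out non-character regions; text lines are assembled afterwards.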

  2. Text Flow: A Unified Text Detection System in Natural Scene Images
    ICCV 2015 2016-04-23 paper
    Detects character regions with AdaBoost, then merges them using a graph.

  3. Detecting oriented text in natural images by linking segments
    CVPR 2017 2017-03-19 paper | tensorflow
    SegLink: text-line detection built on SSD; detects character-level pieces (segments) first, then links them together.

  4. WordSup: Exploiting Word Annotations for Character based Text Detection
    ICCV 2017 2017-08-22 paper
    Uses a pretrained character detection network, clusters its outputs into text blocks, and follows with a recognition module.

  5. WeText: Scene Text Detection under Weak Supervision
    ICCV 2017 2017-10-13 paper
    $\bullet \bullet$
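    Builds on SSD, using weak and semi-supervised learning to expand the training data; however, it is slow and only handles horizontal characters.

The paper's main contribution lies in how the SSD is trained, yet it gives no concrete training details.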

4.2 Text-line level

4.2.1 Regular

  1. Reading Text in the Wild with Convolutional Neural Networks
    2014-12-04 paper

  2. Deep Convolutional Neural Networks for Text Spotting in Natural Images
    2015 paper

  3. DeepText: A Unified Framework for Text Proposal Generation and Text Detection in Natural Images
    2016-05-24 paper
    Improves Faster R-CNN and proposes Inception-RPN.

  4. TextBoxes: A Fast Text Detector with a Single Deep Neural Network
    AAAI 2017 2016-11-21 paper | caffe-official
    Modifies SSD to suit text detection.

  5. Improving Text Proposal for Scene Images with Fully Convolutional Networks
    ICPR 2016 2017-02-16 paper | caffe-other

  6. Self-organized Text Detection with Minimal Post-processing via Border Learning
    ICCV 2017 paper
    $\bullet \bullet$
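    SODT: to avoid the complex post-processing of character-based pipelines, proposes a border-based method that simplifies decoding.
    It also analyzes the limitations of treating detection as a binary classification problem, and releases a PPT text detection dataset.
    Works well on horizontal and slanted text, but poorly on warped and especially vertical text.

    Methods such as TextField, TextMountain, PixelLink, and PSENet are in effect all built on segmenting text regions and then refining the boundaries to improve performance, i.e. detection + segmentation + regression. This also shows that boundary design is especially important for detecting more complex text.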

  7. PixelLink: Detecting Scene Text via Instance Segmentation
    AAAI 2018 2018-01-04 paper | tensorflow-official

4.2.2 Multi-oriented

  1. Accurate Text Localization in Natural Image with Cascaded Convolutional Text Network
    2016-03-31 paper

  2. Arbitrary-Oriented Scene Text Detection via Rotation Proposals
    2017-03-03 paper | caffe-official | pytorch
    RRPN
    Rotated pooling.

  3. Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection
    CVPR 2017 2017-03-04 paper
    Detects text with quadrilaterals.

  4. Deep Direct Regression for Multi-Oriented Scene Text Detection
    2017-03-24 paper

  5. Cascaded Segmentation-Detection Networks for Word-Level Text Spotting
    2017-04-03 paper
    Based on YOLO: first coarsely localizes text regions by segmentation, then detects the exact text positions. A 1000×5600 image takes 450 ms on a 1080 Ti.

  6. EAST: An Efficient and Accurate Scene Text Detector
    CVPR 2017 2017-04-11 paper | tensorflow | pytorch
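    Inspired by DenseBox; uses an FCN with SSD-style multi-layer outputs. For multi-oriented text it predicts rotated boxes and quadrilaterals, and finishes with locality-aware NMS.
    Performs reasonably on the ICDAR datasets, but quite poorly on other datasets.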

  7. R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection
    2017-06-29 paper | tensorflow | caffe | some improved code
    Extends Faster R-CNN: adjusts the RoI pooling sizes, introduces inclined detection boxes, and adapts NMS to the inclined case accordingly.

  8. Single Shot Text Detector with Regional Attention
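    ICCV 2017 2017-09-01 paper | caffe-official | pytorch
    SSD + attention; improves detection by modifying the network architecture. The so-called attention is really a segmentation branch that guides model training.

    The approach itself makes engineering practice more complicated; that said, if you already have plenty of segmentation annotations, it is an option.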

  9. TextBoxes++: A Single-Shot Oriented Scene Text Detector
    2018-01-09 paper | torch + caffe
    Tilts the anchor boxes and adds more of them.

  10. Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation
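    CVPR 2018 2018-02-25 Xiang Bai's group paper | pytorch-official
    Targets multi-oriented text: detects the four corner points of each text region to make the detector better suited to text, while also reducing post-processing.

    It feels a lot like CornerNet.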
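
Several entries in this list change the NMS step (EAST uses locality-aware NMS, R2CNN adapts NMS to inclined boxes). For reference, here is a minimal sketch of the plain greedy NMS on axis-aligned boxes that those variants start from. It is not the locality-aware or inclined variant itself; the function names and the default `iou_threshold` are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Return indices of the boxes kept by greedy NMS, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap a better box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

Handling rotated boxes or quadrilaterals means replacing `iou` with a polygon intersection, which is where the inclined variants differ.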

4.2.3 Irregular text

  1. On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention
    2019-10-10 paper

5 Text Recognition

Recognize the content of the text; in other words, turn an image into text.

5.1 Regular

5.1.1 CNN

5.1.2 CRNN

  1. Reading Scene Text in Deep Convolutional Sequences
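    AAAI 2016 2015-06-14 Xiaoou Tang's group paper
    Uses an RNN for character recognition; it copes with heavily deformed and ambiguous strings and can recognize previously unseen character combinations. Very similar to the CRNN from Xiang Bai's group.
    The paper also compares Maxout and ReLU.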

  2. An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
    2015-07-21 paper
    CRNN:
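
CRNN feeds convolutional features through recurrent layers and transcribes the per-frame predictions with CTC. A minimal sketch of the simplest transcription rule, best-path (greedy) CTC decoding, is shown below. The function name, the choice of index 0 as the blank, and the `charset` layout are assumptions made for this example, not details taken from the paper's code.

```python
from typing import Sequence


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      charset: str, blank: int = 0) -> str:
    """Best-path CTC decoding.

    frame_scores: one score vector per time step (any monotonic score works).
    charset: charset[i] is the character for class i; charset[blank] is a
             placeholder that is never emitted (assumed layout).
    """
    # 1. Pick the most likely class at every time step.
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    # 2. Collapse consecutive repeats, then 3. drop blanks.
    out = []
    prev = blank
    for idx in best:
        if idx != blank and idx != prev:
            out.append(charset[idx])
        prev = idx
    return "".join(out)
```

For example, the per-frame argmax sequence `a a - a b b -` (with `-` as the blank) decodes to `aab`: the blank between the second and third `a` is what allows a doubled letter to survive the collapse step.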

5.1.3 Other

  1. Focusing Attention: Towards Accurate Text Recognition in Natural Images
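    ICCV 2017 2017-09-07 Hikvision, Fudan University, and Shanghai Jiao Tong University paper
    $\bullet \bullet$
    FAN (Focusing Attention Network): the authors find that attention breaks down on low-quality images, a problem they call attention drift: the attention cannot accurately align feature vectors with the corresponding target regions of the input image.

    How did they discover the drift in the first place?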

5.2 Irregular

5.2.1 CNN

  1. Scene Text Recognition from Two-Dimensional Perspective
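    AAAI 2019 2018-09-18 Xiang Bai's group paper
    Adds segmentation at detection time to localize every character, then classifies each character individually, replacing sequence recognition with classification. It also uses deformable convolutions to extract features for text regions of different shapes.

5.2.2 Other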

  1. Robust Scene Text Recognition with Automatic Rectification
    CVPR 2016 2016-03-12 paper
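    Uses an STN (Spatial Transformer Network) to apply an affine transformation to irregular text; the rectified image is then fed to the recognition network.

  2. SqueezedText: A Real-time Scene Text Recognition by Binary Convolutional Encoder-decoder Network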
    AAAI 2018 2018 paper

  3. Handwriting Recognition in Low-resource Scripts using Adversarial Learning
    CVPR 2019 2018-11-04 paper

  4. ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification
    CVPR 2019 2018-12-14 paper
    ESIR: designs an affine rectification network.
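
The entries in this list describe rectification as a spatial transformation applied to the image before recognition. A minimal sketch of the affine case follows. It assumes pixel coordinates and nearest-neighbor sampling for brevity, whereas an STN normally works in normalized coordinates with bilinear sampling; `affine_warp` and `theta` are hypothetical names, and in a real rectification network `theta` would be predicted by a small localization network rather than supplied by hand.

```python
from typing import List, Sequence


def affine_warp(image: Sequence[Sequence[float]],
                theta: Sequence[Sequence[float]],
                out_h: int, out_w: int, fill: float = 0) -> List[List[float]]:
    """Warp a 2D image with a 2x3 affine matrix.

    theta maps each OUTPUT pixel (x, y) to a SOURCE location:
        sx = theta[0][0]*x + theta[0][1]*y + theta[0][2]
        sy = theta[1][0]*x + theta[1][1]*y + theta[1][2]
    Sampling from the source this way leaves no holes in the output.
    """
    in_h, in_w = len(image), len(image[0])
    out = [[fill] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            sx = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            sy = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            ix, iy = round(sx), round(sy)  # nearest-neighbor sampling
            if 0 <= ix < in_w and 0 <= iy < in_h:
                out[y][x] = image[iy][ix]
    return out
```

With `theta = [[1, 0, 0], [0, 1, 0]]` the image is returned unchanged; changing the last column translates it, and the 2x2 part on the left encodes rotation, scale, and shear.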

5.3 Other

  1. Study on Feature Extraction Methods for Character Recognition of Balinese Script on Palm Leaf Manuscript Images
    2016 paper

  2. Full-Page Text Recognition: Learning Where to Start and When to Stop
    2017-04-27 paper

  3. Reading Text in the Wild with Convolutional Neural Networks
    IJCV 2016

  4. Recursive Recurrent Nets with Attention Modeling for OCR in the Wild
    CVPR 2016

  5. Robust Scene Text Recognition with Automatic Rectification
    CVPR 2016

  6. Generative Shape Models: Joint Text Recognition and Segmentation with Very Little Training Data
    NIPS

  7. An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
    CoRR 2015

  8. Automatic Script Identification in the Wild
    ICDAR 2015

  9. Deep structured output learning for unconstrained text recognition
    ICLR 2015

  10. Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition
    NIPS 2014

  11. A Unified Framework for Multi-Oriented Text Detection and Recognition
    TIP

  12. End-to-End Text Recognition with Convolutional Neural Networks
    ICPR 2012

6 End-to-End Text Recognition

Detection + recognition in one uninterrupted pipeline.

6.1 LSTM

  1. Unambiguous Text Localization and Retrieval for Cluttered Scenes
    CVPR 2017 paper

6.2 Other

  1. Towards End-to-end Text Spotting with Convolutional Recurrent Neural Networks
    ICCV 2017 2017-07-13 paper
    Faster R-CNN followed by an LSTM. It does not do well on complex tasks, because the detection result is decisive for the whole pipeline.

  2. Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework
    ICCV 2017 2017 paper
    Besides box coordinates, the detection stage also predicts a rotation angle; it is still limited by the detection result.

  3. Attention-based Extraction of Structured Information from Street View Imagery
    ICDAR 2017 2017-04-11 Google paper
    $\bullet \bullet$
    OCR on street-view data captured from multiple views; uses spatial attention in place of detection.

  4. STN-OCR: A single Neural Network for Text Detection and Text Recognition
    2017-07-27 paper

  5. Detection and Recognition of Text Embedded in Online Images via Neural Context Models
    AAAI 2017 paper
    CTSN:

  6. SEE: Towards Semi-Supervised End-to-End Scene Text Recognition
    AAAI 2018 2017-12-14 paper | tensorflow
    SEE:

  7. FOTS: Fast Oriented Text Spotting with a Unified Network
    CVPR 2018 2018-01-05 paper | pytorch
    FOTS: FPN + angle; features are interpolated and fed to a bidirectional LSTM. The detector is single-stage, so it is somewhat faster.

  8. An end-to-end TextSpotter with Explicit Alignment and Attention
    CVPR 2018 2018-03-09 paper | caffe-official
    Similar to TextSpotter: adds angle information, then applies RoI pooling and feeds the result to an RNN + Seq2Seq + attention.

  9. Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes
    ECCV 2018 2018-07-06 Megvii paper
    Handles irregular text; borrows from Mask R-CNN.

  10. Towards End-to-End License Plate Detection and Recognition: A Large Dataset and Baseline
    ECCV 2018 2018 paper
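    RPnet: releases CCPD, a Chinese license plate dataset of 250k images; each annotation contains 1 box, 4 corner points, and the ground-truth text.
    Applies RoI pooling to feature maps at different scales and concatenates them for recognition; only a baseline solution.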

7 Datasets

7.1 Datasets

  1. MIDV-2019: Challenges of the modern mobile-based document OCR
    2019-10-09 paper | dataset
    Releases a dataset of identity documents photographed in natural scenes.

7.2 Data generation

  1. A Method to Generate Synthetically Warped Document Image
    2019-10-15 paper
    Affine transformation of document images.

    It seems a single affine transformation would be enough.

  2. TextRecognitionDataGenerator

  3. awesome-SynthText

  4. Synthetic Data for Text Localisation in Natural Images
    CVPR 2016 2016-04-22 paper | github | home


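Appendix

A Researchers

  1. Xiang Bai's group (白翔)
    Minghui Liao (廖明辉), Baoguang Shi (石葆光), Xiang Bai (白翔), Xinggang Wang (王兴刚), Wenyu Liu (刘文予)
  2. Xiaoou Tang (汤晓鸥)
    Weilin Huang (黄伟林), Yu Qiao (乔宇)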

B References

  1. Chinese subtitle analysis
    Detection with SSD, followed by adaptive-threshold segmentation and then recognition.
  2. awesome-ocr
  3. paper with code
  4. A collection of papers and resources on scene text localization and recognition
  5. 52CV: text recognition
  6. 52CV: text detection
  7. Scene Text Recognition Resources
  8. deep-text-recognition-benchmark
    A comparison of existing methods.

C Open-source code

a Libraries

  1. OCRE(OCR Easy)

  2. Clara OCR
    A project started in 1999, written in C, supporting Windows and Linux; provides GUI and web interfaces.

  3. OCRAD
    Released 2019-01-11; analyzes text via feature extraction and can segment text by columns or by spaces.

  4. JOCR
    Last updated 2013-04-15.

  5. OCR-CHIE
    2001-03;

  6. TESSERACT-OCR
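    Open-sourced by HP in 2005 and maintained by Google since 2006. Version 4.0 added an LSTM engine; it currently recognizes text in more than 100 languages; supported document formats include plain text, HTML, and PDF; provides C and C++ APIs.

b Projects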

  1. textDetectionWithScriptID

D Datasets

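| Name | Language | Characteristics | Character detection | Word detection | Text-line detection | Character recognition | Word recognition | End-to-end | Count | Train | Test | Size (GB) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MNIST | digits | horizontal, handwritten | | | | | $\checkmark$ | | | | | |
| USPS | digits | horizontal, handwritten | | | | | $\checkmark$ | | | | | |
| UCI (1991) | English | | | | $\checkmark$ | | | | 20000 | | | |
| ICDAR 2011 Web | English | | | $\checkmark$ | | | $\checkmark$ | $\checkmark$ | | | | |
| ICDAR 2011 Scene | | | | | | | | | | | | |
| ICDAR 2013 Web | English | | $\checkmark$ | $\checkmark$ | | $\checkmark$ | $\checkmark$ | $\checkmark$ | | | | |
| ICDAR 2013 Scene | | | | | | | | | | | | |
| ICDAR 2013 videos | | | | | | | | | | | | |
| SVHN (2011) | digits | natural scene | $\checkmark$ | | | $\checkmark$ | | | | 33,402 (73,257) | 13,068 (26,032) | |
| COCO-Text (2016) | English, characters? | printed, handwritten, horizontal | | $\checkmark$ | | | $\checkmark$ | $\checkmark$ | | 43,686 | 20,000 | |
| SynthText (2016) | English | natural scene | $\checkmark$ | $\checkmark$ | $\checkmark$ | | | | 800k (8M) | | | 40 |
| Chars74K | | | | | | | | | | | | |
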
  1. A roundup of text detection and recognition resources (datasets, code, blogs) [continuously updated]

  2. PubTabNet
    Table recognition; github | competition

E Reports

  1. OCR and Text Spotting
