2 minute read

计算两个序列的相似度,可找到两者公共的部分,可带容错;
字符串匹配

序列匹配算法 · 序列匹配
sequence alignment · significant alignment · global alignment · local alignment
不同关键字在搜索引擎上搜索结果不同

1 综述

2 理论

3 全局

3.1 串模式匹配

4 局部

4.1 最长公共子序列

4.2 沃特曼

  1. Identification of Common Molecular Subsequences
    1981 paper
    $\bullet \bullet$ Waterman

  2. An Improved Algorithm For Matching Biological Sequences
    1982 paper
    $\bullet \bullet$ Waterman

  3. Optimal Sequence Alignment Using Affine Gap Costs
    1986

  4. Optimal alignments in linear space
    1988 paper
    $\bullet \bullet$ Waterman

5 启发式

速度更快;

  1. Basic Local Alignment Search Tool
    1990 paper
    BLAST

6 其他

  1. The similarity metric
    2001-11-20 paper

  2. Approximate word matches between two random sequences
    2008-01-21 paper

  3. Empirical distribution of k-word matches in biological sequences
    2008-03-14 paper

  4. An Algorithm for Alignment-free Sequence Comparison using Logical Match
    ICCAE 2010 2014-07-06 paper

  5. An Alignment Algorithm for Sequences
    2012-10-31 paper
    $\bullet \bullet$ Sequence Alignment

  6. Ultra-fast Multiple Genome Sequence Matching Using GPU
    2013-03-15 paper
    $\bullet \bullet$

  7. SlopMap: a software application tool for quick and flexible identification of similar sequences using exact k-mer matching
    2013-07-31 paper

  8. Geometry-based Adaptive Symbolic Approximation for Fast Sequence Matching on Manifolds
    2014-03-04 paper

  9. Measuring Congruence on High Dimensional Time Series
    2018-05-27 paper
    针对高维序列度量;

  10. Efficient Measuring of Congruence on High Dimensional Time Series
    2018-11-27 paper
    比 DTW 更稳健;

7 应用

7.1 生物序列

  1. Measuring the similarity of protein structures by means of the universal similarity metric
    2004 paper
    蛋白质序列相似度计算;

  2. A Comparative Study on String Matching Algorithm of Biological Sequences
    2014-01-29 paper
    $\bullet \bullet$ string match study

  3. An optimized Parallel Failure-less Aho-Corasick algorithm for DNA sequence matching
    ICIAfS 2016 2018-11-26 paper

  4. FindeR: Accelerating FM-Index-based Exact Pattern Matching in Genomic Sequences through ReRAM technology
    2019-07-11 paper

7.2 购买意向

  1. Personalized Purchase Prediction of Market Baskets with Wasserstein-Based Sequence Matching
    KDD 2019 oral 2019-03-24 paper
    购买意向匹配;

7.3 股票预测

7.4 视频序列匹配

7.5 手势识别


TOP

附录

A 讲义

  1. Linear-Space Alignment
  2. Space-Efficient Alignment
  3. Alignment in linear space
  4. Sequence Alignment

B 参考资料

  1. wikipedia. Sequence alignment[EB/OL]. https://en.wikipedia.org/wiki/Sequence_alignment. -/2019-11-05.
  2. 维基百科. 序列比对[EB/OL]. https://zh.wikipedia.org/wiki/%E5%BA%8F%E5%88%97%E6%AF%94%E5%B0%8D. -/2019-11-05.
  3. 维基百科. 尼德曼-翁施算法[EB/OL]. https://zh.wikipedia.org/wiki/%E5%B0%BC%E5%BE%B7%E6%9B%BC-%E7%BF%81%E6%96%BD%E7%AE%97%E6%B3%95. -/2019-11-05.
    Saul B. Needleman; and Christian D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology. 1970, 48: 443–453. doi:10.1016/0022-2836(70)90057-4;
    启发式的同源算法;
  4. 维基百科. 史密斯-沃特曼算法[EB/OL]. https://zh.wikipedia.org/wiki/%E5%8F%B2%E5%AF%86%E6%96%AF-%E6%B2%83%E7%89%B9%E6%9B%BC%E7%AE%97%E6%B3%95. -/2019-11-05.
    M.S Waterman; T.F Smith; and W.A Beyer. Some biological sequence metrics. Advances in Mathematics. 1976, 20: 367–387. doi:10.1016/0001-8708(76)90202-4;
    在 Peter H. Sellers. On the Theory and Computation of Evolutionary Distances. Journal of Applied Mathematics. 1974, 26: 787–793. doi:10.1137/0126070 基础上加了插入和删除;
  5. 维基百科. 莱文斯坦距离[EB/OL]. https://zh.wikipedia.org/wiki/%E8%90%8A%E6%96%87%E6%96%AF%E5%9D%A6%E8%B7%9D%E9%9B%A2. -/2019-11-05.
  6. 小狗贤. 从零开始生物信息学(2):序列比对-Needleman-Wunsch算法[EB/OL]. https://zhuanlan.zhihu.com/p/54142276. 2019-01-17/2019-11-05.
  7. 小狗贤. 从零开始生物信息学(3):序列比对-Smith–Waterman算法[EB/OL]. https://zhuanlan.zhihu.com/p/54194091. 2019-01-17/2019-11-05.
  8. wikipedia. FASTA[EB/OL]. https://en.wikipedia.org/wiki/FASTA. -/2019-11-05.
  9. wikipedia. BLAST[EB/OL]. https://en.wikipedia.org/wiki/BLAST_(biotechnology). -/2019-11-05.
  10. Paul Reiners. 动态编程和基因序列比对[EB/OL]. https://www.ibm.com/developerworks/cn/java/j-seqalign/index.html. 2008-04-17/2019-11-05.
  11. 相似性和距离度量

Comments