intTypePromotion=1
zunia.vn Tuyển sinh 2024 dành cho Gen-Z zunia.vn zunia.vn
ADSENSE

Lecture Algorithm design - Chapter 6: Dynamic programming II

Chia sẻ: You Can | Ngày: | Loại File: PDF | Số trang:50

37
lượt xem
3
download
 
  Download Vui lòng tải xuống để xem tài liệu đầy đủ

Lecture Algorithm design - Chapter 6: Dynamic programming II include all of the following: Sequence alignment, hirschberg's algorithm, Bellman-Ford algorithm, distance vector protocols, negative cycles in a digraph.

Chủ đề:
Lưu

Nội dung Text: Lecture Algorithm design - Chapter 6: Dynamic programming II

  1. 6. D YNAMIC P ROGRAMMING II ‣ sequence alignment ‣ Hirschberg's algorithm ‣ Bellman-Ford algorithm ‣ distance vector protocols ‣ negative cycles in a digraph Lecture slides by Kevin Wayne Copyright © 2005 Pearson-Addison Wesley Copyright © 2013 Kevin Wayne http://www.cs.princeton.edu/~wayne/kleinberg-tardos Last updated on Oct 11, 2014, 9:26 AM
  2. 6. D YNAMIC P ROGRAMMING II ‣ sequence alignment ‣ Hirschberg's algorithm ‣ Bellman-Ford algorithm ‣ distance vector protocols ‣ negative cycles in a digraph SECTION 6.6
  3. String similarity Q. How similar are two strings? Ex. ocurrance and occurrence. o c u r r a n c e – o c – u r r a n c e o c c u r r e n c e o c c u r r e n c e 6 mismatches, 1 gap 1 mismatch, 1 gap o c – u r r – a n c e o c c u r r e – n c e 0 mismatches, 3 gaps 3
  4. Edit distance Edit distance. [Levenshtein 1966, Needleman-Wunsch 1970] ・Gap penalty δ; mismatch penalty αpq. ・Cost = sum of gap and mismatch penalties. C T – G A C C T A C G C T G G A C G A A C G cost = δ + αCG + αTA Applications. Unix diff, speech recognition, computational biology, ... 4
  5. Sequence alignment Goal. Given two strings x1 x2 ... xm and y1 y2 ... yn find min cost alignment. Def. An alignment M is a set of ordered pairs xi – yj such that each item occurs in at most one pair and no crossings. xi – yj and xi' – yj' cross if i < i ', but j > j ' Def. The cost of an alignment M is: cost(M ) = ∑ α xi y j + ∑ δ+ ∑ δ (x , y j ) ∈ M i : x unmatched j : y unmatched !i## "## $ !#i### #"#j#### $ mismatch gap € x1 x2 x3 x4 x5 x6 C T A C C – G – T A C A T G y1 y2 y3 y4 y5 y6 an alignment of CTACCG and TACATG: M = { x2–y1, x3–y2, x4–y3, x5–y4, x6–y6 } 5
  6. Sequence alignment: problem structure Def. OPT(i, j) = min cost of aligning prefix strings x1 x2 ... xi and y1 y2 ... yj. Case 1. OPT matches xi – yj. Pay mismatch for xi – yj + min cost of aligning x1 x2 ... xi–1 and y1 y2 ... yj–1. Case 2a. OPT leaves xi unmatched. Pay gap for xi + min cost of aligning x1 x2 ... xi–1 and y1 y2 ... yj. optimal substructure property Case 2b. OPT leaves yj unmatched. (proof via exchange argument) Pay gap for yj + min cost of aligning x1 x2 ... xi and y1 y2 ... yj–1. ⎧ jδ if i = 0 ⎪ ⎧ α x i y j + OPT (i −1, j −1) ⎪⎪ ⎪ OPT (i, j) = ⎨ min ⎨ δ + OPT (i −1, j) otherwise ⎪ ⎪ δ + OPT (i, j −1) ⎪ ⎩ ⎪⎩ iδ if j = 0 6
  7. Sequence alignment: algorithm SEQUENCE-ALIGNMENT (m, n, x1, …, xm, y1, …, yn, δ, α) ________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________ FOR i = 0 TO m M [i, 0] ← i δ. FOR j = 0 TO n M [0, j] ← j δ. FOR i = 1 TO m FOR j = 1 TO n M [i, j] ← min { α[xi, yj] + M [i – 1, j – 1], δ + M [i – 1, j], δ + M [i, j – 1]). RETURN M [m, n]. ________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________ 7
  8. Sequence alignment: analysis Theorem. The dynamic programming algorithm computes the edit distance (and optimal alignment) of two strings of length m and n in Θ(mn) time and Θ(mn) space. Pf. ・Algorithm computes edit distance. ・Can trace back to extract optimal alignment itself. ▪ Q. Can we avoid using quadratic space? A. Easy to compute optimal value in O(mn) time and O(m + n) space. ・Compute OPT(i, •) from OPT(i – 1, •). ・But, no longer easy to recover optimal alignment itself. 8
  9. 6. D YNAMIC P ROGRAMMING II ‣ sequence alignment ‣ Hirschberg's algorithm ‣ Bellman-Ford algorithm ‣ distance vector protocols ‣ negative cycles in a digraph SECTION 6.7
  10. Sequence alignment in linear space Theorem. There exist an algorithm to find an optimal alignment in O(mn) time and O(m + n) space. ・Clever combination of divide-and-conquer and dynamic programming. ・Inspired by idea of Savitch from complexity theory. A = a x a 2 . . . a m if and only if there is a mapping F: {1, 2, . . . , p} ~ {1, 2, . . . , m} such that f(i) = k only if c~ is ak and F is a m o n o t o n e strictly increasing func- tion (i.e. F(i) = u, F ( j ) = v, and i < j imply that u
  11. Hirschberg's algorithm Edit distance graph. ・Let f (i, j) be shortest path from (0,0) to (i, j). ・Lemma: f (i, j) = OPT(i, j) for all i and j. ε y1 y2 y3 y4 y5 y6 ε 0-0 x1 α xi y j δ x2 i-j € δ x3 m-n 11
  12. Hirschberg's algorithm Edit distance graph. ・Let f (i, j) be shortest path from (0,0) to (i, j). ・Lemma: f (i, j) = OPT(i, j) for all i and j. Pf of Lemma. [ by strong induction on i + j ] ・Base case: f (0, 0) = OPT (0, 0) = 0. ・Inductive hypothesis: assume true for all (i', j') with i' + j' < i + j. ・Last edge on shortest path to (i, j) is from (i – 1, j – 1), (i – 1, j), or (i, j – 1). ・Thus, ff (i, (i, j) j) = = min{ min{ xii y x y xi yjjj + ff (i + (i 1, jj 1, 1), 1), + ff (i + (i 1, j), 1, j), + ff (i, + (i, jj 1)} 1)} = = min{ min{ xii y x y + OP + OP T T (i (i 1, jj 1, 1), 1), + OP + OP T T (i (i 1, j), 1, j), + OP + OP T T (i, (i, jj 1)} 1)} xi yjjj = = OP T OP T (i, (i, j) j) ▪ α xi y j δ i-j € δ 12
  13. Hirschberg's algorithm Edit distance graph. ・Let f (i, j) be shortest path from (0,0) to (i, j). ・Lemma: f (i, j) = OPT(i, j) for all i and j. ・Can compute f (•, j) for any j in O(mn) time and O(m + n) space. j ε y1 y2 y3 y4 y5 y6 ε 0-0 x1 x2 i-j x3 m-n 13
  14. Hirschberg's algorithm Edit distance graph. ・Let g (i, j) be shortest path from (i, j) to (m, n). ・Can compute by reversing the edge orientations and inverting the roles of (0, 0) and (m, n). ε y1 y2 y3 y4 y5 y6 ε 0-0 δ x1 i-j δ xi+1 yj+1 x2 x3 m-n 14
  15. Hirschberg's algorithm Edit distance graph. ・Let g (i, j) be shortest path from (i, j) to (m, n). ・Can compute g(•, j) for any j in O(mn) time and O(m + n) space. j ε y1 y2 y3 y4 y5 y6 ε 0-0 x1 i-j x2 x3 m-n 15
  16. Hirschberg's algorithm Observation 1. The cost of the shortest path that uses (i, j) is f (i, j) + g(i, j). ε y1 y2 y3 y4 y5 y6 ε 0-0 x1 i-j x2 x3 m-n 16
  17. Hirschberg's algorithm Observation 2. let q be an index that minimizes f (q, n/2) + g (q, n/2). Then, there exists a shortest path from (0, 0) to (m, n) uses (q, n/2). n/2 ε y1 y2 y3 y4 y5 y6 ε 0-0 x1 i-j q x2 x3 m-n 17
  18. Hirschberg's algorithm Divide. Find index q that minimizes f (q, n/2) + g(q, n/2); align xq and yn / 2. Conquer. Recursively compute optimal alignment in each piece. n/2 ε y1 y2 y3 y4 y5 y6 ε 0-0 x1 i-j q x2 x3 m-n 18
  19. Hirschberg's algorithm: running time analysis warmup Theorem. Let T(m, n) = max running time of Hirschberg's algorithm on strings of length at most m and n. Then, T(m, n) = O(m n log n). Pf. T(m, n) ≤ 2 T(m, n / 2) + O(m n) ⇒ T(m, n) = O(m n log n). Remark. Analysis is not tight because two subproblems are of size (q, n/2) and (m – q, n / 2). In next slide, we save log n factor. 19
  20. Hirschberg's algorithm: running time analysis Theorem. Let T(m, n) = max running time of Hirschberg's algorithm on strings of length at most m and n. Then, T(m, n) = O(m n). Pf. [ by induction on n ] ・O(m n) time to compute f ( •, n / 2) and g ( •, n / 2) and find index q. ・T(q, n / 2) + T(m – q, n / 2) time for two recursive calls. ・Choose constant c so that: T(m, 2) ≤ c m T(2, n) ≤ c n T(m, n) ≤ c m n + T(q, n / 2) + T(m – q, n / 2) ・Claim. T(m, n) ≤ 2 c m n. ・Base cases: m = 2 or n = 2. ・Inductive hypothesis: T(m, n) ≤ 2 c m n for all (m', n') with m' + n' < m + n. T(m, n) ≤ T(q, n / 2) + T(m – q, n / 2) + c m n ≤ 2 c q n / 2 + 2 c (m – q) n / 2 + c m n = cq n + cmn – cqn + cmn = 2 cmn ▪ 20
ADSENSE

CÓ THỂ BẠN MUỐN DOWNLOAD

 

Đồng bộ tài khoản
2=>2