[ACCEPTED]-Java library for free-text diff-diff
Depending on your exact requirements, the 1 StringUtils
class of the Apache Commons Lang component might be helpful, e.g.:
- StringUtils#difference: Compares two Strings, and returns the portion where they differ
- StringUtils#getLevenshteinDistance: Find the Levenshtein distance between two Strings
Here's a (lightly-tested) version of code 15 that does what you asked. You can easily 14 traverse the result in parallel with the 13 inputs to locate insertions and deletions.
public class StringDiff {
private static int length(String s) { return s == null ? 0 : s.length(); }
private static char[] chars(String s) { return s == null ? new char[0] : s.toCharArray(); }
private final String left;
private final String right;
private final char[] lccs;
private final String lcs;
public StringDiff(String left, String right) {
this.left = left;
this.right = right;
lccs = init();
lcs = new String(lccs);
}
public String getLcs() { return lcs; }
public char[] getLccs() { return lccs.clone(); }
private char[] init() {
int lLength = length(left);
int rLength = length(right);
char[] lChars = chars(left);
char[] rChars = chars(right);
int [][] t = new int [lLength + 1][rLength + 1];
for (int i = lLength - 1; i >= 0; --i) {
for (int j = rLength - 1; j >= 0; --j) {
if (lChars[i] == rChars[j]) {
t[i][j] = t[i + 1][j + 1] + 1;
} else {
t[i][j] = Math.max(t[i + 1][j], t[i][j + 1]);
}
}
}
char[] result = new char[t[0][0]];
int l = 0, r = 0, p = 0;
while (l < lLength && r < rLength) {
if (lChars[l] == rChars[r]) {
result[p++] = lChars[l++];
r++;
} else {
if (t[l + 1][r] > t[l][r + 1]) {
++l;
} else {
++r;
}
}
}
return result;
}
}
According 12 to it, the actual longest subsequence of 11 your original inputs:
The quick brown fox jumped over the lazy dog.
The quick yellow fox jumped over the well-bred dog.
is:
The quick ow fox jumped over the l dog.
(because "brown" and 10 "yellow" have "ow" in common, etc.)
It's 9 relatively straightforward to modify the 8 above to split on whitespace (instead of 7 into char arrays) and substitute String#equals 6 for == to get a version that finds the longest 5 common subsequence of words instead of characters. For 4 your example above that change would produce 3 the obvious result:
found 7 words
'The'
'quick'
'fox'
'jumped'
'over'
'the'
'dog.'
(Your question implied 2 character comparisons, as you matched the 1 spaces between words.)
If you're example is really what you want 5 to do - ie subsequences only match if they 4 start at the same index (which is different 3 from how diffs normally operate) - this 2 is all you need to do:
import java.util.*;
class StringDiff {
public static List<int[]> from(String s1, String s2) {
int start = -1;
int pos = 0;
LinkedList<int[]> list = new LinkedList<int[]>();
for(; pos < s1.length() && pos < s2.length(); ++pos) {
if(s1.charAt(pos) == s2.charAt(pos)) {
if(start < 0) start = pos;
}
else {
if(start >= 0) list.add(new int[] { start, pos });
start = -1;
}
}
if(start >= 0) list.add(new int[] { start, pos });
return list;
}
public static void main(String[] args) {
for(int[] idx : from(args[0], args[1]))
System.out.println(args[0].substring(idx[0], idx[1]));
}
}
An actual diff implementation 1 will be far more sophisticated.
More Related questions
We use cookies to improve the performance of the site. By staying on our site, you agree to the terms of use of cookies.