public class StringDistances extends Object
Constructor and Description |
---|
StringDistances() |
Modifier and Type | Method and Description |
---|---|
static double | equalDistance(String s, String t) |
static double | hammingDistance(String s, String t) |
static boolean | isAlpha(char c) |
static boolean | isAlphaCap(char c) |
static boolean | isAlphaNum(char c) |
static boolean | isAlphaSmall(char c) |
static boolean | isNum(char c) |
static double | jaroMeasure(String s, String t) |
static double | jaroWinklerMeasure(String s, String t) |
static double | levenshteinDistance(String s, String t) |
static double | needlemanWunsch2Distance(String s, String t) |
static double | needlemanWunschDistance(String s, String t, int gap) |
static double | ngramDistance(String s, String t) |
static double | smoaDistance(String s1, String s2) |
static String | stripQuotations(String s) |
static double | subStringDistance(String s1, String s2) |
static Vector<String> | tokenize(String s) The tokenizer first looks for non-alphanumeric characters in the string. If there are any, they are taken as the only delimiters. Otherwise the standard naming convention is assumed: words start with a capital letter, and a run of capital letters is kept as a single token if it is a suffix of the string; if it is not a suffix, its last letter is taken as the start of the next token. Notes (JE): this should return a BagOfWords; it would be useful to parameterise it with stop words as well. |
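
The tokenizing rules described for tokenize(String s) can be illustrated with a small standalone sketch. This is not the library's implementation: the class name `TokenizeSketch` is hypothetical, it returns a `List<String>` rather than a `Vector<String>`, and it assumes that "alphanumeric" corresponds to `Character.isLetterOrDigit`, which may differ from what `isAlphaNum` accepts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the rules in the tokenize description;
// it is not the code of StringDistances.tokenize.
class TokenizeSketch {

    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();

        // Rule 1: look for any non-alphanumeric character
        // (assumption: alphanumeric means Character.isLetterOrDigit).
        boolean hasDelimiter = false;
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isLetterOrDigit(s.charAt(i))) {
                hasDelimiter = true;
                break;
            }
        }

        int start = 0;
        if (hasDelimiter) {
            // Rule 2: non-alphanumeric characters are the only delimiters,
            // so capital letters do not split tokens in this mode.
            for (int i = 0; i <= s.length(); i++) {
                if (i == s.length() || !Character.isLetterOrDigit(s.charAt(i))) {
                    if (i > start) {
                        tokens.add(s.substring(start, i));
                    }
                    start = i + 1;
                }
            }
            return tokens;
        }

        // Rule 3: naming convention. A capital letter starts a new token when it
        // follows a non-capital, or when it ends a run of capitals that is not a
        // suffix (that is, the next character is a small letter).
        for (int i = 1; i < s.length(); i++) {
            boolean afterNonCapital = !Character.isUpperCase(s.charAt(i - 1));
            boolean beforeSmall = i + 1 < s.length()
                    && Character.isLowerCase(s.charAt(i + 1));
            if (Character.isUpperCase(s.charAt(i)) && (afterNonCapital || beforeSmall)) {
                tokens.add(s.substring(start, i));
                start = i;
            }
        }
        if (start < s.length()) {
            tokens.add(s.substring(start));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("parseXMLFile")); // [parse, XML, File]
        System.out.println(tokenize("parseXML"));     // [parse, XML]
        System.out.println(tokenize("has_partOf"));   // [has, partOf]
    }
}
```

In the last call the underscore is present, so it is the only delimiter and `partOf` is not split further, which matches the "only delimiters" wording of the description.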
public static String stripQuotations(String s)
s - a String
public static Vector<String> tokenize(String s)
public static boolean isAlphaNum(char c)
public static boolean isAlpha(char c)
public static boolean isAlphaCap(char c)
public static boolean isAlphaSmall(char c)
public static boolean isNum(char c)
(C) INRIA, UPMF & friends, 2008-2015