public class StringDistances
extends java.lang.Object
Constructor and Description |
---|
StringDistances() |
Modifier and Type | Method and Description |
---|---|
static double |
equalDistance(java.lang.String s,
java.lang.String t)
equalDistance
|
static double |
hammingDistance(java.lang.String s,
java.lang.String t)
hammingDistance
|
static boolean |
isAlpha(char c) |
static boolean |
isAlphaCap(char c) |
static boolean |
isAlphaNum(char c) |
static boolean |
isAlphaSmall(char c) |
static boolean |
isNum(char c) |
static double |
jaroMeasure(java.lang.String s,
java.lang.String t)
jaroMeasure as a dissimilarity (identical strings have distance 0).
return:
Original algorithm by Jérôme Euzenat.
|
static double |
jaroWinklerMeasure(java.lang.String s,
java.lang.String t)
jaroWinklerMeasure
|
static double |
levenshteinDistance(java.lang.String s,
java.lang.String t) |
static double |
needlemanWunsch2Distance(java.lang.String s,
java.lang.String t) |
static double |
needlemanWunschDistance(java.lang.String s,
java.lang.String t,
int gap)
Needleman-Wunsch: a generalisation of edit distances.
Pointer provided by Todd Hugues (Lockheed).
Taken from http://www.merriampark.com/ldjava.htm
Initial algorithm by Michael Gilleland.
Integrated in Apache Jakarta Commons.
Improved by Chas Emerick.
This algorithm should be taken out of this file and placed in a
properly named package with acceptable license terms.
|
static double |
ngramDistance(java.lang.String s,
java.lang.String t)
ngramDistance
In fact, a 3-gram distance.
|
static double |
ngramOccurDistance(java.lang.String s,
java.lang.String t)
ngramOccurDistance
In fact, a 3-gram distance.
|
static double |
smoaDistance(java.lang.String s1,
java.lang.String s2)
smoaDistance = commonality - dissimilarity + winklerImprovement;
A specialized distance for identifiers in ontology matching.
Calls the string matching method proposed in the paper
"A String Metric For Ontology Alignment", published at ISWC 2005.
It is implemented in a separate class provided by the authors and
available with this package.
|
static java.lang.String |
stripQuotations(java.lang.String s)
Strips quotation marks from a string.
|
static double |
subStringDistance(java.lang.String s1,
java.lang.String s2)
subStringDistance
Computes the substring distance:
= 1 - (2 * |longest common substring| / (|s1| + |s2|))
return: a value between 0 (both strings are equal) and 1 (no common substring)
|
static java.util.Vector<java.lang.String> |
tokenize(java.lang.String s)
JE//: This should return a BagOfWords
The new tokenizer first looks for non-alphanumeric characters in the string.
If there are any, they are taken as the only delimiters.
Otherwise the standard naming convention is assumed:
words start with a capital letter;
a substring of capital letters is treated as a whole token
if it is a suffix;
otherwise its last letter is taken as the start of the new token.
It would be useful to parameterise this with stop words as well.
|
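The tokenization rules described for tokenize can be sketched as follows. This is a minimal illustration written from the description above, not the library's implementation: the class name TokenizerSketch is hypothetical, it returns a java.util.List rather than a java.util.Vector, and it assumes that "alphanumeric" means Character.isLetterOrDigit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the tokenization rules; not the StringDistances code.
final class TokenizerSketch {

    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();

        // Step 1: look for any non-alphanumeric character.
        boolean hasDelimiter = false;
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isLetterOrDigit(s.charAt(i))) {
                hasDelimiter = true;
                break;
            }
        }

        if (hasDelimiter) {
            // Step 2: non-alphanumeric characters are the only delimiters.
            StringBuilder current = new StringBuilder();
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                if (Character.isLetterOrDigit(c)) {
                    current.append(c);
                } else if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            }
            if (current.length() > 0) {
                tokens.add(current.toString());
            }
            return tokens;
        }

        // Step 3: no delimiter, so fall back to the capital-letter convention.
        int start = 0;
        for (int i = 1; i < s.length(); i++) {
            char prev = s.charAt(i - 1);
            char cur = s.charAt(i);
            int cut = -1;
            if (Character.isUpperCase(cur) && !Character.isUpperCase(prev)) {
                // "fooBar": a capital after a non-capital starts a new token.
                cut = i;
            } else if (Character.isLowerCase(cur) && Character.isUpperCase(prev)
                    && i - 1 > start) {
                // "XMLParser": the last capital of a run starts the new token.
                cut = i - 1;
            }
            if (cut > start) {
                tokens.add(s.substring(start, cut));
                start = cut;
            }
        }
        if (start < s.length()) {
            // A trailing run of capitals ("parseXML") stays in one token.
            tokens.add(s.substring(start));
        }
        return tokens;
    }
}
```

Under these assumptions, TokenizerSketch.tokenize("XMLParser") yields [XML, Parser], tokenize("parseXML") yields [parse, XML], and tokenize("hasPart_of") yields [hasPart, of], because the underscore is then the only delimiter considered.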
public static double subStringDistance(java.lang.String s1, java.lang.String s2)
s1, s2 - two strings on which the substring distance is computed
public static double equalDistance(java.lang.String s, java.lang.String t)
s, t - two strings on which the equality distance is computed
public static double hammingDistance(java.lang.String s, java.lang.String t)
s, t - two strings on which the Hamming distance is computed
public static double jaroMeasure(java.lang.String s, java.lang.String t)
s, t - two strings on which the Jaro measure is computed
public static double jaroWinklerMeasure(java.lang.String s, java.lang.String t)
s, t - two strings on which the Jaro-Winkler measure is computed
public static double ngramDistance(java.lang.String s, java.lang.String t)
s, t - two strings on which the n-gram distance is computed
public static double ngramOccurDistance(java.lang.String s, java.lang.String t)
s, t - two strings on which the n-gram distance is computed
public static double levenshteinDistance(java.lang.String s, java.lang.String t)
public static double needlemanWunsch2Distance(java.lang.String s, java.lang.String t)
public static double needlemanWunschDistance(java.lang.String s, java.lang.String t, int gap)
s, t - two strings on which the Needleman-Wunsch distance is computed
gap - see the definition
public static double smoaDistance(java.lang.String s1, java.lang.String s2)
s1, s2 - two strings on which the SMOA distance is computed
public static java.lang.String stripQuotations(java.lang.String s)
s - a String
public static java.util.Vector<java.lang.String> tokenize(java.lang.String s)
s - a string to tokenize
public static boolean isAlphaNum(char c)
public static boolean isAlpha(char c)
public static boolean isAlphaCap(char c)
public static boolean isAlphaSmall(char c)
public static boolean isNum(char c)
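The formula given for subStringDistance, 1 minus twice the length of the longest common substring divided by the summed lengths of both strings, can be illustrated with a short sketch. The class and method names below are hypothetical and this is not the library's code. It assumes a case-sensitive comparison and treats two empty strings as equal (distance 0), neither of which is stated in this documentation.

```java
// Hypothetical sketch of the substring distance formula; not the StringDistances code.
final class SubstringDistanceSketch {

    // Length of the longest common contiguous substring, by dynamic programming.
    // cur[j] holds the length of the common run ending at s1[i-1] and s2[j-1].
    static int longestCommonSubstringLength(String s1, String s2) {
        int best = 0;
        int[] prev = new int[s2.length() + 1];
        for (int i = 1; i <= s1.length(); i++) {
            int[] cur = new int[s2.length() + 1];
            for (int j = 1; j <= s2.length(); j++) {
                if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                    cur[j] = prev[j - 1] + 1;
                    best = Math.max(best, cur[j]);
                }
            }
            prev = cur;
        }
        return best;
    }

    // 1 - (2 * |longest common substring| / (|s1| + |s2|))
    static double substringDistanceSketch(String s1, String s2) {
        int total = s1.length() + s2.length();
        if (total == 0) {
            return 0.0; // assumption: two empty strings count as equal
        }
        return 1.0 - (2.0 * longestCommonSubstringLength(s1, s2)) / total;
    }
}
```

With this sketch, two identical non-empty strings give 0.0 and two strings with no character in common give 1.0. Intermediate cases fall in between, for example "abcd" and "bc" share a substring of length 2, giving 1 - 4/6.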
(C) INRIA, Univ. Grenoble Alpes & friends, 2008-2017