JE//: This is independent of WordNet and should move to StringDistances
JE//: This should return a BagOfWords
The new tokenizer
first looks for non-alphanumeric characters in the string;
if there are any, they are taken as the only delimiters.
Otherwise the standard naming convention is assumed:
words start with a capital letter;
a run of consecutive capital letters is treated as a single token
if it is a suffix of the string,
otherwise the last capital letter of the run is taken as the start
of the next token.
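
The tokenization rules above can be sketched as follows. This is only an illustrative sketch, not the actual implementation: the class name `NameTokenizer` and the method `tokenize` are hypothetical, digits are assumed to stay inside the current token, and the capitals that precede the last letter of a non-suffix run are assumed to form a token of their own (so `XMLParser` gives `XML` and `Parser`).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the rules described above.
final class NameTokenizer {

    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int n = s.length();

        // Step 1: look for any non-alphanumeric character.
        boolean hasDelimiter = false;
        for (int k = 0; k < n; k++) {
            if (!Character.isLetterOrDigit(s.charAt(k))) {
                hasDelimiter = true;
                break;
            }
        }

        // Step 2: if there is one, non-alphanumeric characters are the only delimiters.
        if (hasDelimiter) {
            StringBuilder current = new StringBuilder();
            for (int k = 0; k < n; k++) {
                char c = s.charAt(k);
                if (Character.isLetterOrDigit(c)) {
                    current.append(c);
                } else if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            }
            if (current.length() > 0) {
                tokens.add(current.toString());
            }
            return tokens;
        }

        // Step 3: otherwise assume the naming convention (words start with a capital).
        int start = 0; // start of the token being built
        int i = 0;
        while (i < n) {
            if (!Character.isUpperCase(s.charAt(i))) {
                i++;
                continue;
            }
            // Find the end (exclusive) of the run of capitals starting at i.
            int j = i;
            while (j < n && Character.isUpperCase(s.charAt(j))) {
                j++;
            }
            if (i > start) {
                tokens.add(s.substring(start, i)); // close the token before the run
            }
            if (j == n) {
                start = i; // the run is a suffix: keep it whole
            } else {
                if (j - 1 > i) {
                    // Assumption: capitals before the last one form their own token.
                    tokens.add(s.substring(i, j - 1));
                }
                start = j - 1; // the last capital starts the next token
            }
            i = j;
        }
        if (start < n) {
            tokens.add(s.substring(start));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("has_part-of"));
        System.out.println(tokenize("parseXMLFile"));
        System.out.println(tokenize("getURL"));
    }
}
```

Returning a `List<String>` keeps the sketch small; a bag-of-words structure, as suggested in the note above, could be built from the same token stream.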
It would be useful to parameterise the tokenizer with stop words as well
This is a depth-first search traversal
From an ontology (current) and a set of entities still to preserve
it follows the alignments after having transformed the entities and
recorded those it failed to transform
it will return w