deeperlib.entity_resolution package

Submodules

deeperlib.entity_resolution.simjoin module

class deeperlib.entity_resolution.simjoin.InvertedIndex
    get(*words)
    insert(word, docid)

class deeperlib.entity_resolution.simjoin.SimJoin(k_o_list)

    Performs a similarity join using the Jaccard coefficient and TF-IDF weighting.

    join(other_k_o_list, threshold, weight_on=False)
    selfjoin(threshold, weight_on=False)
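
The general technique named in the SimJoin description can be illustrated with a small standalone sketch. It does not use deeperlib and is not the library's implementation. It assumes records arrive as (key, text) pairs, which is only a guess at what k_o_list holds, and it assumes weight_on switches from plain Jaccard to an IDF-weighted Jaccard. The names tokens, idf_weights, weighted_jaccard, and self_join are hypothetical.

```python
import math
from collections import defaultdict


def tokens(text):
    """Lower-case the text and split it into a set of alphanumeric words."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return set(cleaned.split())


def idf_weights(token_sets):
    """Map each token to log(1 + N / df), so rarer tokens weigh more."""
    n = len(token_sets)
    df = defaultdict(int)
    for ts in token_sets:
        for tok in ts:
            df[tok] += 1
    return {tok: math.log(1 + n / count) for tok, count in df.items()}


def weighted_jaccard(a, b, weights=None):
    """Jaccard coefficient of two token sets; weights=None counts every token as 1."""
    if weights is None:
        inter, union = float(len(a & b)), float(len(a | b))
    else:
        inter = sum(weights.get(t, 0.0) for t in a & b)
        union = sum(weights.get(t, 0.0) for t in a | b)
    return inter / union if union else 0.0


def self_join(records, threshold, weight_on=False):
    """Return (key1, key2, similarity) for record pairs with similarity >= threshold."""
    keyed = [(key, tokens(text)) for key, text in records]
    weights = idf_weights([ts for _, ts in keyed]) if weight_on else None

    # Inverted index: token -> positions of earlier records containing it.
    # Only records that share at least one token are ever compared.
    index = defaultdict(list)
    results = []
    for pos, (key, ts) in enumerate(keyed):
        candidates = set()
        for tok in ts:
            candidates.update(index[tok])
            index[tok].append(pos)
        for other in sorted(candidates):
            other_key, other_ts = keyed[other]
            sim = weighted_jaccard(other_ts, ts, weights)
            if sim >= threshold:
                results.append((other_key, key, sim))
    return results


records = [(1, "Apple iPhone 6"), (2, "apple iphone 6 plus"), (3, "Samsung Galaxy")]
pairs = self_join(records, threshold=0.5)  # [(1, 2, 0.75)]
```

In this sketch the inverted index serves only to generate candidates: a pair with no shared token has Jaccard similarity 0, so it can be skipped without being scored.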
deeperlib.entity_resolution.simjoin.alphnum(s)
deeperlib.entity_resolution.simjoin.editsim(s, t)
deeperlib.entity_resolution.simjoin.gramset(s, gram_size, lower_case=True, alphanum_only=True)
deeperlib.entity_resolution.simjoin.jaccard(s, t)
deeperlib.entity_resolution.simjoin.jaccard_g(s, t, gram_size, lower_case=True, alphanum_only=True)
deeperlib.entity_resolution.simjoin.jaccard_w(s, t, lower_case=True, alphanum_only=True)
deeperlib.entity_resolution.simjoin.wordset(s, lower_case=True, alphanum_only=True)
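
The function names above suggest word sets, character n-gram sets, Jaccard similarity, and edit similarity, but the page does not document their behavior. The following is a standalone sketch of those standard techniques, not deeperlib's code. All names (keep_alphanumeric, word_set, gram_set, jaccard_sim, edit_distance, edit_sim) are hypothetical, and two choices are assumptions: gram_set drops non-alphanumeric characters entirely, and edit similarity is taken to be 1 minus the Levenshtein distance divided by the longer string's length.

```python
def keep_alphanumeric(s):
    """Replace every non-alphanumeric character with a space."""
    return "".join(c if c.isalnum() else " " for c in s)


def word_set(s, lower_case=True, alphanum_only=True):
    """Split a string into a set of whitespace-separated words."""
    if lower_case:
        s = s.lower()
    if alphanum_only:
        s = keep_alphanumeric(s)
    return set(s.split())


def gram_set(s, gram_size, lower_case=True, alphanum_only=True):
    """Return the set of character n-grams of length gram_size."""
    if lower_case:
        s = s.lower()
    if alphanum_only:
        # Assumption: punctuation and spaces are removed, not replaced.
        s = "".join(c for c in s if c.isalnum())
    if len(s) < gram_size:
        return {s} if s else set()
    return {s[i:i + gram_size] for i in range(len(s) - gram_size + 1)}


def jaccard_sim(a, b):
    """Jaccard coefficient |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def edit_distance(s, t):
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_sim(s, t):
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - edit_distance(s, t) / longest


# Word-level and 2-gram Jaccard on the same pair of strings.
word_score = jaccard_sim(word_set("Data Cleaning"), word_set("data-cleaning tools"))
gram_score = jaccard_sim(gram_set("night", 2), gram_set("nacht", 2))
```

Word-level sets tolerate reordering and punctuation differences, while n-gram sets and edit similarity are more forgiving of typos inside a single word.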

Module contents