deeperlib.core.smartcrawl.smartCrawl(top_k, count, pool_thre, jaccard_thre, threads, budget, api, sampledata, localdata, hiddendata)
Given a budget of b queries, SmartCrawl first constructs a query pool from the local database and then iteratively issues b queries to the hidden database so that the union of the query results covers as many records of the local database as possible. Finally, it performs entity resolution between the local database and the crawled records. (DeepER: Deep Entity Resolution)
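The query-selection loop described above can be sketched as a greedy maximum-coverage procedure. This is a simplified illustration, not deeperlib's implementation. The names `greedy_crawl`, `pool`, and `search` are hypothetical. The benefit of a query is taken to be the number of not-yet-covered local records in q(D), rather than an estimate. `search` stands in for the hidden-database API and is assumed to return the ids of local records matched by the query's results.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple

Query = Tuple[str, ...]

def greedy_crawl(pool: Dict[Query, Set[Hashable]],
                 budget: int,
                 search: Callable[[Query], Set[Hashable]]
                 ) -> Tuple[List[Query], Set[Hashable]]:
    """Hypothetical sketch of budgeted greedy query selection.

    pool maps each candidate query q to q(D), the ids of local records it
    matches; search(q) stands in for the hidden-database API and returns the
    ids of local records matched by the results of q (an assumption).
    """
    covered: Set[Hashable] = set()
    issued: List[Query] = []
    remaining = dict(pool)
    for _ in range(budget):  # budget plays the role of b
        if not remaining:
            break
        # Benefit of q = number of still-uncovered local records in q(D).
        best = max(remaining, key=lambda q: len(remaining[q] - covered))
        if not remaining[best] - covered:
            break  # no remaining query can cover a new local record
        del remaining[best]
        issued.append(best)
        covered |= search(best)
    return issued, covered
```

The documented signature also takes `sampledata`, and the utilities below estimate query benefit. This sketch skips that estimation step and uses the exact local overlap instead.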
deeperlib.core.utils.add_naiveIndex(queries, data, index)
To make index construction more efficient, naive queries are added to the query pool and the inverted index only after the queries whose frequencies exceed the threshold have been processed.
Returns: the query pool and the inverted index, with the naive queries added.
deeperlib.core.utils.forwardIndex(D1index)
A forward index maps a local record to all the queries that the record satisfies; such a list is called a forward list. To build the index, we initialize a hash map F and let F(d) denote the forward list for d. Then, for each query q in the inverted index and each record d in its inverted list q(D), we append q to F(d).
Parameters: D1index – the inverted index of the local database.
Returns: a dict representing the forward index.
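A minimal sketch of building F from the inverted index. It assumes, hypothetically, that the inverted index is a dict from query tuples to sets of record ids, matching the {query: set(uniqueid)} shape documented for invertedIndex. The function name is illustrative.

```python
from typing import Dict, Hashable, List, Set, Tuple

Query = Tuple[str, ...]

def forward_index(inverted: Dict[Query, Set[Hashable]]) -> Dict[Hashable, List[Query]]:
    """Invert {query: set(record_id)} into {record_id: [query, ...]}."""
    forward: Dict[Hashable, List[Query]] = {}
    for q, record_ids in inverted.items():
        for d in record_ids:
            # F(d) collects every query that record d satisfies.
            forward.setdefault(d, []).append(q)
    return forward
```

Each record's forward list preserves the order in which queries appear in the inverted index.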
deeperlib.core.utils.initScore_biased(sampleindex, k, sr, Dratio, queries)
Biased benefit estimation.
Returns: the query pool with biased benefit estimates.
deeperlib.core.utils.initScore_unbiased(sampleindex, D1index, k, sr, queries)
Unbiased benefit estimation.
Returns: the query pool with unbiased benefit estimates.
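The following illustrates sample-based benefit estimation in general and is not taken from deeperlib. It rests on several assumptions. `sr` is the sampling ratio |Hs|/|H| of a uniform sample Hs of the hidden database. Sample record ids have already been matched to local record ids. A query whose estimated result size |q(Hs)|/sr exceeds k returns only its top k results. Under uniform sampling, |q(D) ∩ q(Hs)|/sr is an unbiased estimate of |q(D) ∩ q(H)|; the top-k correction for overflowing queries is a further approximation. The function and argument names are hypothetical.

```python
from typing import Dict, Hashable, Set, Tuple

Query = Tuple[str, ...]

def unbiased_benefit(sample_index: Dict[Query, Set[Hashable]],
                     local_index: Dict[Query, Set[Hashable]],
                     k: int, sr: float) -> Dict[Query, float]:
    """Hypothetical sketch, not deeperlib's code.

    Assumes sr = |Hs| / |H| for a uniform sample Hs, and that sample ids are
    already matched to local ids so the two sets can be intersected directly.
    """
    scores: Dict[Query, float] = {}
    for q, local_ids in local_index.items():
        sample_ids = sample_index.get(q, set())
        overlap = len(local_ids & sample_ids)   # |q(D) ∩ q(Hs)|
        est_hidden = len(sample_ids) / sr       # estimate of |q(H)|
        if est_hidden <= k:
            # Predicted solid query: every matching hidden record is returned.
            scores[q] = overlap / sr
        else:
            # Predicted overflowing query: only k of about est_hidden results
            # come back, so scale the estimate by k / est_hidden.
            scores[q] = overlap / sr * k / est_hidden
    return scores
```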
deeperlib.core.utils.invertedIndex(queries, data)
An inverted index maps each keyword to the list of local records that contain it; such a list is called an inverted list. To build the index, we initialize a hash map I and let I(w) denote the inverted list of keyword w. For each local record d ∈ D, we enumerate each keyword w in document(d) and add d to I(w). Given a query q, we then compute q(D) by intersecting the inverted lists of all keywords in q.
Returns: an inverted index of the form {query: set(uniqueid)}.
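A minimal sketch of this construction. It assumes, hypothetically, that `data` maps each record's unique id to its list of keywords and that each query is a tuple of keywords. It builds the keyword-level lists I(w) first and then computes q(D) by intersection, producing the {query: set(uniqueid)} shape listed under Returns.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Query = Tuple[str, ...]

def inverted_index(queries: Iterable[Query],
                   data: Dict[Hashable, List[str]]) -> Dict[Query, Set[Hashable]]:
    # I(w): ids of the local records whose keywords include w.
    keyword_lists: Dict[str, Set[Hashable]] = defaultdict(set)
    for uid, keywords in data.items():
        for w in keywords:
            keyword_lists[w].add(uid)

    index: Dict[Query, Set[Hashable]] = {}
    for q in queries:
        # q(D) is the intersection of I(w) over the keywords w of q;
        # .get avoids inserting empty lists for unseen keywords.
        lists = [keyword_lists.get(w, set()) for w in q]
        index[q] = set.intersection(*lists) if lists else set()
    return index
```

Intersecting precomputed keyword lists avoids rescanning every record for every query.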
deeperlib.core.utils.queryGene(D1, thre)
Uses FP-growth to generate a finite query pool from the local database.
Returns: the closed frequent itemsets of the local database.
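The sketch below illustrates what a closed frequent itemset is in this setting. For brevity it uses a naive level-wise (Apriori-style) search instead of FP-growth. It yields the same closed frequent itemsets that FP-growth followed by a closedness filter would, but FP-growth avoids explicit candidate generation and scales better. It assumes `min_support` is an absolute record count, which may differ from how `thre` is interpreted. The function name is hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List

def closed_frequent_itemsets(records: Iterable[Iterable[str]],
                             min_support: int) -> Dict[FrozenSet[str], int]:
    """Return {itemset: support} for every closed frequent itemset.

    Naive level-wise (Apriori-style) search, used here instead of FP-growth.
    """
    transactions: List[FrozenSet[str]] = [frozenset(r) for r in records]

    # Level 1: frequent single keywords.
    counts = Counter(item for t in transactions for item in t)
    support = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
    current = set(support)

    k = 1
    while current:
        # Join pairs of frequent k-itemsets into (k+1)-candidates, keeping only
        # those whose k-subsets are all frequent (Apriori pruning).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in current for sub in combinations(union, k)
            ):
                candidates.add(union)

        # Count each candidate's support by scanning the records.
        next_level = set()
        for c in candidates:
            sup = sum(1 for t in transactions if c <= t)
            if sup >= min_support:
                support[c] = sup
                next_level.add(c)
        current = next_level
        k += 1

    # Closed: no proper superset has the same support.
    return {s: sup for s, sup in support.items()
            if not any(s < t and support[t] == sup for t in support)}
```

Each returned itemset is a candidate keyword query for the pool. In the test data, {b} is frequent but not closed, because its superset {a, b} has the same support.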