deeperlib.estimator package

Submodules

deeperlib.estimator.aggregation module

deeperlib.estimator.aggregation.sota_estimator(query_pool, api, match_term, uniqueid, query_num)

Efficiently estimates an aggregate over a search engine’s corpus, following the paper “Efficient search engine measurements”.

Parameters:

query_pool – A dict mapping each query to its benefit, e.g. {set(['yong','jun']):5}
api – A simapi implementation for the specific API.
match_term – The fields used to match queries against returned documents.
uniqueid – The unique-ID field of returned messages.
query_num – The number of queries to use for the estimate.

Returns:

The estimated count(*) of the search engine’s corpus.
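The idea of estimating count(*) from a query pool can be illustrated with a small importance-sampling sketch. This is not deeperlib's implementation: the in-memory corpus, the matching rule, and the function name `pool_size_estimate` are all hypothetical, and the pool uses frozenset keys on the assumption that a hashable set type is needed for dict keys. It estimates only the number of documents that at least one pool query can reach.

```python
import random

# Toy in-memory "search engine": a query matches a document when every
# query term appears in the document. All names here are hypothetical.
corpus = [
    {"id": 1, "terms": frozenset(["yong", "jun", "wang"])},
    {"id": 2, "terms": frozenset(["yong", "li"])},
    {"id": 3, "terms": frozenset(["jun", "chen"])},
    {"id": 4, "terms": frozenset(["wang", "li"])},
    {"id": 5, "terms": frozenset(["chen"])},
    {"id": 6, "terms": frozenset(["zhao"])},  # matched by no pool query
]

# Assumption: frozenset keys, because a plain set is unhashable.
query_pool = {
    frozenset(["yong"]): 2,
    frozenset(["jun"]): 2,
    frozenset(["yong", "jun"]): 1,
    frozenset(["li"]): 2,
    frozenset(["chen"]): 2,
}


def pool_size_estimate(corpus, query_pool, num_samples, rng):
    """Estimate how many documents are matched by at least one pool query."""
    queries = list(query_pool)
    total = 0.0
    for _ in range(num_samples):
        query = rng.choice(queries)
        results = [d for d in corpus if query <= d["terms"]]
        if not results:
            continue  # an empty result set contributes zero
        doc = rng.choice(results)
        # Degree: number of pool queries that would return this document.
        degree = sum(1 for q in queries if q <= doc["terms"])
        # Weighting by 1/degree counts each document once on average.
        total += len(results) / degree
    return len(queries) * total / num_samples


estimate = pool_size_estimate(corpus, query_pool, 20000, random.Random(0))
print(round(estimate, 2))
```

Each sample is weighted by the result count divided by the document's degree, so a document reachable through several queries is not over-counted. Document 6 is invisible to this pool, which is why the estimate targets five documents rather than six.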

deeperlib.estimator.aggregation.stratified_estimator(query_pool, api, match_term, candidate_rate, query_num, layer=5)

Estimates an aggregate over a search engine’s corpus in a way that is efficient yet unbiased, following the paper “Mining a search engine’s corpus: efficient yet unbiased sampling and aggregate estimation”.

Parameters:

query_pool – A dict mapping each query to its benefit, e.g. {set(['yong','jun']):5}
api – A simapi implementation for the specific API.
match_term – The fields used to match queries against returned documents.
candidate_rate – The proportion of matching queries to keep as candidates.
query_num – The number of queries to use for the estimate.
layer – The number of layers (strata) used for stratification; defaults to 5.

Returns:

The estimated count(*) of the search engine’s corpus.
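As a rough illustration of what stratification over a query pool could look like, the sketch below splits queries into `layer` strata ordered by benefit and combines per-stratum sample means into an estimate of a total. It assumes that `layer` is the number of strata; the helper names `stratify` and `stratified_total` are hypothetical and do not come from deeperlib.

```python
import random


def stratify(query_pool, layer):
    """Split queries into up to `layer` strata of near-equal size, by benefit."""
    ordered = sorted(query_pool, key=lambda q: (query_pool[q], sorted(q)))
    size = max(1, -(-len(ordered) // layer))  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def stratified_total(strata, value, per_stratum, rng):
    """Estimate sum(value(q)) over all queries from a few draws per stratum."""
    total = 0.0
    for stratum in strata:
        sample = rng.sample(stratum, min(per_stratum, len(stratum)))
        mean = sum(value(q) for q in sample) / len(sample)
        total += len(stratum) * mean  # scale the stratum mean to its size
    return total


# Hypothetical pool: ten single-term queries with benefits 1..10.
pool = {frozenset(["q%d" % i]): i for i in range(1, 11)}
strata = stratify(pool, layer=5)
print(stratified_total(strata, pool.get, per_stratum=1, rng=random.Random(0)))
```

Grouping queries with similar benefit keeps the variance inside each stratum small, so a few draws per stratum already give a tight estimate of the total.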

deeperlib.estimator.sampler module

deeperlib.estimator.sampler.sota_sampler(query_pool, api, match_term, top_k, adjustment=1, samplenum=500)

Samples documents from a search engine’s corpus so that each document is drawn with equal probability, following the paper “Random sampling from a search engine’s index”.
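Equal-probability sampling through a query interface can be sketched with rejection sampling. The code below is a toy under stated assumptions, not deeperlib's implementation: the corpus is in memory, a query matches a document when all its terms appear in it, `top_k` is treated as the maximum number of results a query may return, and queries with no results or more than `top_k` results are ignored. The name `uniform_sample` is hypothetical.

```python
import random

# Hypothetical toy corpus and query list.
corpus = [
    {"id": 1, "terms": frozenset(["yong", "jun", "wang"])},
    {"id": 2, "terms": frozenset(["yong", "li"])},
    {"id": 3, "terms": frozenset(["jun", "chen"])},
    {"id": 4, "terms": frozenset(["wang", "li"])},
    {"id": 5, "terms": frozenset(["chen"])},
    {"id": 6, "terms": frozenset(["zhao"])},  # unreachable from the queries
]
queries = [
    frozenset(["yong"]),
    frozenset(["jun"]),
    frozenset(["yong", "jun"]),
    frozenset(["li"]),
    frozenset(["chen"]),
]


def uniform_sample(corpus, queries, top_k, samplenum, rng):
    """Draw document ids uniformly from the documents the queries can reach."""
    results = {q: [d for d in corpus if q <= d["terms"]] for q in queries}
    # Keep only queries whose full result set fits within top_k.
    valid = [q for q in queries if 0 < len(results[q]) <= top_k]
    samples = []
    while len(samples) < samplenum:
        query = rng.choice(valid)
        doc = rng.choice(results[query])
        degree = sum(1 for q in valid if q <= doc["terms"])
        # Accepting with this probability cancels the bias toward documents
        # that sit in small result sets or are matched by many queries.
        if rng.random() < len(results[query]) / (top_k * degree):
            samples.append(doc["id"])
    return samples


ids = uniform_sample(corpus, queries, top_k=2, samplenum=10, rng=random.Random(0))
print(ids)
```

Without the acceptance step, a document would be proposed more often the more queries match it and the fewer results those queries return. The acceptance probability is chosen so that every reachable document ends up equally likely; unreachable documents such as document 6 are never sampled.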