deeperlib.estimator package

Submodules

deeperlib.estimator.aggregation module

deeperlib.estimator.aggregation.sota_estimator(query_pool, api, match_term, uniqueid, query_num)

Efficiently estimates an aggregate over a search engine's corpus. Based on the paper "Efficient search engine measurements".

Parameters:
  • query_pool – A dict mapping each query (a set of terms) to its benefit, e.g. {set(['yong', 'jun']): 5}.
  • api – An implementation of simapi for a specific API.
  • match_term – The fields used to match queries against returned documents.
  • uniqueid – The unique-ID field of the returned messages.
  • query_num – The number of queries to use for the estimate.
Returns:

The estimated count(*) of the search engine's corpus.
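
The idea of estimating count(*) from a sample of pool queries can be illustrated with a small self-contained sketch. This is not deeperlib's implementation: the toy corpus, the search stand-in, and estimate_count are hypothetical names, and the pool keys are frozensets because Python dict keys must be hashable. It assumes no query overflows, so every query returns all of its matches.

```python
import random

# Toy in-memory corpus; each document has a unique id and a set of terms.
corpus = [
    {"id": 1, "terms": frozenset(["yong", "jun"])},
    {"id": 2, "terms": frozenset(["yong", "lee"])},
    {"id": 3, "terms": frozenset(["jun", "kim"])},
    {"id": 4, "terms": frozenset(["kim"])},
]

# Hypothetical query pool: query (frozenset of terms) -> benefit.
query_pool = {
    frozenset(["yong"]): 2,
    frozenset(["jun"]): 2,
    frozenset(["kim"]): 2,
}


def search(query):
    """Stand-in for a search API: return every document containing all terms."""
    return [doc for doc in corpus if query <= doc["terms"]]


def estimate_count(query_pool, search, query_num, seed=0):
    """Estimate how many documents the pool can reach.

    A document d adds 1/deg(d) to each pool query that returns it, where
    deg(d) is the number of pool queries matching d. Over the whole pool
    these weights sum to the number of reachable documents, so
    len(pool) * (mean weight of the sampled queries) estimates that number.
    """
    queries = list(query_pool)
    rng = random.Random(seed)
    sampled = rng.sample(queries, min(query_num, len(queries)))
    total = 0.0
    for query in sampled:
        for doc in search(query):
            # deg(d) is computed locally from the document's own terms.
            deg = sum(1 for p in queries if p <= doc["terms"])
            total += 1.0 / deg
    return len(queries) * total / len(sampled)


# Sampling the whole pool recovers the exact size of the toy corpus.
print(estimate_count(query_pool, search, query_num=3))
```

With fewer sampled queries the result varies with the queries drawn; averaging over more queries reduces that variance.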

deeperlib.estimator.aggregation.stratified_estimator(query_pool, api, match_term, candidate_rate, query_num, layer=5)

Estimates an aggregate over a search engine's corpus efficiently and without bias. Based on the paper "Mining a search engine's corpus: efficient yet unbiased sampling and aggregate estimation".

Parameters:
  • query_pool – A dict mapping each query (a set of terms) to its benefit, e.g. {set(['yong', 'jun']): 5}.
  • api – An implementation of simapi for a specific API.
  • match_term – The fields used to match queries against returned documents.
  • candidate_rate – The proportion of matching queries to use as candidates.
  • query_num – The number of queries to use for the estimate.
  • layer – The number of layers (strata) used by the estimator. Defaults to 5.
Returns:

The estimated count(*) of the search engine's corpus.
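
A generic stratified-sampling sketch may help show what a layer parameter controls. It is an illustration only, not the algorithm from the paper or deeperlib's code: stratified_estimate, contribution, and per_layer are hypothetical names, and splitting the pool into equal-size layers ordered by benefit is an assumption.

```python
import random


def stratified_estimate(query_pool, contribution, layer, per_layer, seed=0):
    """Estimate the sum of contribution(q) over all queries in the pool.

    Queries are ordered by benefit and cut into `layer` strata of roughly
    equal size. Each stratum adds (stratum size) * (mean contribution of
    the queries sampled from it), so only sampled queries are evaluated.
    """
    rng = random.Random(seed)
    ordered = sorted(query_pool, key=query_pool.get)
    size = -(-len(ordered) // layer)  # ceiling division
    estimate = 0.0
    for start in range(0, len(ordered), size):
        stratum = ordered[start:start + size]
        picked = rng.sample(stratum, min(per_layer, len(stratum)))
        mean = sum(contribution(q) for q in picked) / len(picked)
        estimate += len(stratum) * mean
    return estimate


# Hypothetical pool (query -> benefit) and per-query contributions.
pool = {"a": 1, "b": 2, "c": 10, "d": 12, "e": 50, "f": 55}
weights = {"a": 1.0, "b": 1.5, "c": 4.0, "d": 4.5, "e": 20.0, "f": 21.0}

# Sampling every query in every layer gives the exact total.
exact = stratified_estimate(pool, weights.get, layer=3, per_layer=2)
# Sampling one query per layer gives an estimate near the total.
rough = stratified_estimate(pool, weights.get, layer=3, per_layer=1)
print(exact, rough)
```

Grouping queries with similar benefit keeps the contributions within each layer close together, which is what lets a small sample per layer give a tight estimate.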

deeperlib.estimator.sampler module

deeperlib.estimator.sampler.sota_sampler(query_pool, api, match_term, top_k, adjustment=1, samplenum=500)

Samples documents from a search engine's corpus so that each document is drawn with the same probability. Based on the paper "Random sampling from a search engine's index".

Parameters:
  • query_pool – A dict mapping each query (a set of terms) to its benefit, e.g. {set(['yong', 'jun']): 5}.
  • api – An implementation of simapi for a specific API.
  • match_term – The fields used to match queries against returned documents.
  • top_k – The maximum number of documents the API returns for a query.
  • adjustment – A parameter used to increase the probability of accepting a document. Defaults to 1.
  • samplenum – The size of the sample. Defaults to 500.
Returns:

A list of sampled documents returned by the API.
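
Equal-probability sampling through a top_k-limited interface is commonly done with rejection sampling, which the sketch below illustrates on a toy corpus. It is not deeperlib's implementation: uniform_sample and the search stand-in are hypothetical, and treating adjustment as a multiplier on the acceptance probability is an assumption about what that parameter does.

```python
import random
from collections import Counter

# Toy in-memory corpus; each document has a unique id and a set of terms.
corpus = [
    {"id": 1, "terms": frozenset(["yong", "jun"])},
    {"id": 2, "terms": frozenset(["yong", "lee"])},
    {"id": 3, "terms": frozenset(["jun", "kim"])},
    {"id": 4, "terms": frozenset(["kim"])},
]
queries = [frozenset(["yong"]), frozenset(["jun"]), frozenset(["kim"])]


def search(query):
    """Stand-in for a search API: return every document containing all terms."""
    return [doc for doc in corpus if query <= doc["terms"]]


def uniform_sample(queries, search, top_k, samplenum, adjustment=1.0, seed=0):
    """Draw documents with equal probability by rejection sampling.

    Pick a query uniformly, then one of its results uniformly, and accept
    the document with probability len(results) / (top_k * deg), where deg
    is the number of valid queries returning it. The acceptance step
    cancels the bias toward documents matched by many or by small queries.
    """
    rng = random.Random(seed)
    results = {q: search(q) for q in queries}
    # Skip empty queries and queries whose matches exceed top_k.
    valid = [q for q in queries if 0 < len(results[q]) <= top_k]
    sample = []
    while len(sample) < samplenum:
        query = rng.choice(valid)
        doc = rng.choice(results[query])
        deg = sum(1 for p in valid if doc in results[p])
        # Assumed role of adjustment: scale acceptance, capped at 1.
        # Uniformity holds only while the cap is not reached.
        accept = min(1.0, adjustment * len(results[query]) / (top_k * deg))
        if rng.random() < accept:
            sample.append(doc)
    return sample


sample = uniform_sample(queries, search, top_k=2, samplenum=4000)
counts = Counter(doc["id"] for doc in sample)
print(sorted(counts.items()))
```

In this toy setup each of the four documents should appear in roughly a quarter of the sample, even though documents 1 and 3 are reachable through two queries and documents 2 and 4 through only one.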

Module contents