reprowd.crowdcontext module
-
class reprowd.crowdcontext.CrowdContext(endpoint=None, api_key=None, local_db='reprowd.db')
Main entry point for Reprowd functionality. Intuitively, a CrowdContext can be thought of as a fault-tolerant and reproducible environment for doing crowdsourced data processing tasks. Once a CrowdContext is created, it connects to a pybossa server and a local database, and provides APIs for creating crowd operators (e.g., CrowdJoin) and manipulating cached crowddata.
-
__init__(endpoint=None, api_key=None, local_db='reprowd.db')
Create a new CrowdContext. The endpoint and api_key should be set, either through the named parameters here or through the environment variables REPROWD_ENDPOINT and REPROWD_API_KEY.
Parameters:
- endpoint – Pybossa server URL (e.g. http://localhost:7000).
- api_key – An API key for accessing the pybossa server. To get one, create an account on the pybossa server, then click the account name -> “My Profile” at the top right of the page to view the key.
- local_db – The local database name.
Returns: A CrowdContext object
>>> from reprowd.crowdcontext import CrowdContext
>>> CrowdContext("http://localhost:7000", api_key = "test", local_db = "reprowd.test.db")
<reprowd.crowdcontext.CrowdContext instance at 0x...>
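The fallback from named parameters to environment variables described above can be sketched as follows. This is only an illustration, not Reprowd's actual implementation: `resolve_setting` is a hypothetical helper, and the precedence shown (explicit argument first, then the environment variable) is an assumption.

```python
import os


def resolve_setting(value, env_name, environ=None):
    """Return `value` if given, otherwise the environment variable `env_name`.

    Hypothetical helper for illustration; not part of the reprowd API.
    """
    environ = os.environ if environ is None else environ
    if value is not None:
        return value  # an explicit argument wins (assumed precedence)
    if env_name in environ:
        return environ[env_name]
    raise ValueError("%s is not set; pass it explicitly or export it" % env_name)


# A stand-in mapping is used instead of the real environment; values are placeholders.
fake_env = {"REPROWD_ENDPOINT": "http://localhost:7000", "REPROWD_API_KEY": "test"}
endpoint = resolve_setting(None, "REPROWD_ENDPOINT", fake_env)
api_key = resolve_setting("explicit-key", "REPROWD_API_KEY", fake_env)
```

Passing the mapping as a parameter keeps the sketch testable without touching the real process environment.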
-
CrowdData(object_list, table_name)
Return a CrowdData object.
Parameters:
- object_list – A list of objects, where an object can be anything (e.g., int, string, dict).
- table_name – The table used for caching the crowd tasks/results related to the CrowdData object.
>>> # Create a CrowdData object for image labeling
>>> cc.CrowdData(["image1.jpg", "image2.jpg"], "tmp")
<reprowd.operators.crowddata.CrowdData instance at 0x...>
-
CrowdJoin(object_list, table_name)
Return a CrowdJoin object.
Parameters:
- object_list – A list of objects, where an object can be anything (e.g., int, string, dict).
- table_name – The table used for caching the crowd tasks/results related to the CrowdJoin object.
>>> # Create a CrowdJoin object for deduplication
>>> cc.CrowdJoin(["iphone 4", "ipad 2", "ipad two"], "tmp")
<reprowd.operators.crowdjoin.CrowdJoin instance at 0x...>
-
show_tables()
Return the list of the tables cached in the local database.
>>> cc.CrowdData(["image1.jpg", "image2.jpg"], "tmp1")
<reprowd.operators.crowddata.CrowdData instance at 0x...>
>>> cc.CrowdJoin(["iphone 4", "ipad 2", "ipad two"], "tmp2")
<reprowd.operators.crowdjoin.CrowdJoin instance at 0x...>
>>> tables = cc.show_tables()
>>> print ", ".join(tables)
tmp1, tmp2
>>> cc.delete_tmp_tables()
2
-
print_tables()
Print the tables cached in the local database, sorted alphabetically.
>>> cc.CrowdData(["image1.jpg", "image2.jpg"], "tmp2")
<reprowd.operators.crowddata.CrowdData instance at 0x...>
>>> cc.CrowdJoin(["iphone 4", "ipad 2", "ipad two"], "tmp1")
<reprowd.operators.crowdjoin.CrowdJoin instance at 0x...>
>>> cc.print_tables()
1 tmp1
2 tmp2
>>> cc.delete_tmp_tables()
2
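The numbered, alphabetically sorted listing in the example above could be produced along these lines. `format_tables` is a hypothetical name used only for illustration, and the exact spacing between the index and the table name is an assumption.

```python
def format_tables(table_names):
    """Return one line per table: a 1-based index followed by the name.

    Names are sorted alphabetically first, so creation order does not matter.
    Hypothetical helper; not part of the reprowd API.
    """
    return ["%d %s" % (i, name)
            for i, name in enumerate(sorted(table_names), start=1)]


# Tables created in the order tmp2, tmp1 are still listed as tmp1, tmp2.
lines = format_tables(["tmp2", "tmp1"])
print("\n".join(lines))
```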
-
rename_table(oldname, newname)
Rename a cached table.
>>> cc.CrowdData(["image1.jpg", "image2.jpg"], "tmp1")
<reprowd.operators.crowddata.CrowdData instance at 0x...>
>>> cc.rename_table("tmp1", "tmp2")
True
>>> cc.print_tables()
1 tmp2
>>> cc.delete_tmp_tables()
1
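For intuition, here is what renaming a cached table might look like at the database level, assuming the local database is an SQLite file (the documentation above only says "local database"). The table schema is made up for the example, and nothing here is taken from Reprowd's source.

```python
import sqlite3

# In-memory database as a stand-in for the local cache file.
conn = sqlite3.connect(":memory:")

# Hypothetical schema; the real cached tables may look different.
conn.execute("CREATE TABLE tmp1 (id INTEGER PRIMARY KEY, payload TEXT)")

# SQLite renames a table in place with ALTER TABLE ... RENAME TO.
conn.execute('ALTER TABLE "tmp1" RENAME TO "tmp2"')
conn.commit()

# List the remaining user tables to confirm the rename.
names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
```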
-
delete_table(table_name)
Delete a cached table.
>>> cc.CrowdData(["image1.jpg", "image2.jpg"], "tmp1")
<reprowd.operators.crowddata.CrowdData instance at 0x...>
>>> cc.CrowdJoin(["iphone 4", "ipad 2", "ipad two"], "tmp2")
<reprowd.operators.crowdjoin.CrowdJoin instance at 0x...>
>>> cc.print_tables()
1 tmp1
2 tmp2
>>> cc.delete_table("tmp1")
True
>>> cc.print_tables()
1 tmp2
>>> cc.delete_tmp_tables()
1
-
delete_tmp_tables()
Delete all tables whose names start with “tmp” and return the number of deleted tables.
>>> cc.CrowdData(["image1.jpg", "image2.jpg"], "tmp1")
<reprowd.operators.crowddata.CrowdData instance at 0x...>
>>> cc.CrowdJoin(["iphone 4", "ipad 2", "ipad two"], "not_tmp")
<reprowd.operators.crowdjoin.CrowdJoin instance at 0x...>
>>> cc.delete_tmp_tables()
1
>>> cc.delete_table("not_tmp")
True
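The prefix rule can be sketched against a plain SQLite database. This assumes an SQLite-backed cache and uses a hypothetical function name and schema; it illustrates the behavior described above and is not Reprowd's own code.

```python
import sqlite3


def drop_tmp_tables(conn):
    """Drop every table whose name starts with "tmp"; return how many were dropped.

    Hypothetical stand-in for illustration; not the reprowd implementation.
    """
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    # Filter in Python: str.startswith is case-sensitive, unlike SQLite's default LIKE.
    tmp_names = [n for n in names if n.startswith("tmp")]
    for n in tmp_names:
        # Quote the identifier, doubling any embedded double quotes.
        conn.execute('DROP TABLE "%s"' % n.replace('"', '""'))
    conn.commit()
    return len(tmp_names)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp1 (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE not_tmp (id INTEGER PRIMARY KEY)")
deleted = drop_tmp_tables(conn)
remaining = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
```

As in the doctest above, "not_tmp" survives because only the start of the name is checked.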
-