Scraping/Mining data from 5 Public Websites
Description
This task involves scraping one particular section of each of 5 public websites, all of which permit scraping via robots.txt. Some details:
- The section of each website can be identified by a keyword in the URL.
- The number of documents varies per site, but the average is about 10K.
- Some structured data will need to be extracted from each page, such as the URL, the page title and other information found within HTML tags or identified by CSS selectors. Approximately 9 attributes will be extracted.
- Data should be delivered as a CSV or XML file in a previously agreed-upon format.
- If the task is completed to a high standard, weekly or bi-weekly refreshes can be negotiated at additional cost.
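
For bidders who want to confirm their understanding, the requirements above could be sketched roughly as follows. This is only an illustration in Python using the standard library, not a prescribed implementation: the function names, the two output fields (`url`, `title`) and the sample inputs are assumptions, and the real job would cover about 9 attributes in whatever format is agreed.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, url, agent="*"):
    """Return True if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def in_section(url, keyword):
    """Select pages whose URL contains the section keyword (hypothetical rule)."""
    return keyword in url


class TitleParser(HTMLParser):
    """Collect the text inside the page's <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_record(url, html):
    """Build one output row; only two of the ~9 attributes are shown here."""
    parser = TitleParser()
    parser.feed(html)
    return {"url": url, "title": parser.title.strip()}


def to_csv(records):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["url", "title"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

In practice each candidate URL would first pass `allowed_by_robots` and `in_section`, then the fetched HTML would go through `extract_record`, and the collected rows would be written out with `to_csv`. Fetching, rate limiting and the XML variant are left out of this sketch.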
Please contact me if you have any questions or concerns. I look forward to working with you.
Project Bids
| | Expert | Location | Message | Last login |
|---|---|---|---|---|
| 3 | Xing910 | Beijing, China | | |
| 2 | Itgenes (Win Bid) | Pakistan | | |
| 2 | Gdinnovative | Pakistan | | |
| 2 | Talentmainly | Iowa, USA | | |
| 2 | Djworth | Pakistan | | |
| 2 | Infomediatech | Pakistan | | |
| 2 | Cmaxo | Georgia, USA | | |


