Crawling 10 websites & download subtitles (funds held in escrow)

Client: Kimberly walker   Contractor: Xercestechnologies   Status: Completed
Project ID: 91273
Budget: more than $100
Development period: 7 days
Skills: Perl
Posted: 2009-11-06

Description

I need a Perl script to crawl 10 subtitle websites and download all their subtitles.



http://www.opensubtitles.org/

http://subscene.com/

http://www.divxsubtitles.net/

http://www.subtitlesource.org/

http://www.podnapisi.net/

http://www.mysubtitles.com/

http://www.allsubs.org/

http://www.subtitleonline.com/

http://www.tvsubtitles.net/

http://www.sub-titles.net/



On some websites the subtitles are plain .SRT, .SUB, .TXT, .SSA, .SMI, or .MPL files, while on others they are packed inside .ZIP files. Your script should handle both cases.
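
To illustrate the two cases, here is a minimal sketch of one possible way to treat a downloaded file. It is written in Python purely for illustration (the deliverable itself is requested in Perl), and the names `SUBTITLE_EXTENSIONS`, `is_subtitle`, and `extract_subtitles` are hypothetical, not part of any existing script. It assumes the file has already been fetched into memory as bytes.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Extensions named in the description (assumed list; edit as needed).
SUBTITLE_EXTENSIONS = {".srt", ".sub", ".txt", ".ssa", ".smi", ".mpl"}


def is_subtitle(name):
    """Return True if the file name ends in one of the subtitle extensions."""
    return PurePosixPath(name).suffix.lower() in SUBTITLE_EXTENSIONS


def extract_subtitles(filename, payload):
    """Return {name: bytes} for every subtitle found in a downloaded payload.

    A plain subtitle file is returned unchanged. A ZIP archive is opened
    in memory and only its subtitle members are kept.
    """
    if is_subtitle(filename):
        return {filename: payload}
    if zipfile.is_zipfile(io.BytesIO(payload)):
        found = {}
        with zipfile.ZipFile(io.BytesIO(payload)) as archive:
            for member in archive.namelist():
                if is_subtitle(member):
                    # Keep only the base name so archive paths
                    # cannot point outside the target folder.
                    found[PurePosixPath(member).name] = archive.read(member)
        return found
    # Neither a subtitle nor a ZIP archive: nothing to keep.
    return {}
```

Detecting the archive by its content rather than by its file name is a design choice in this sketch: it also covers downloads whose URL or name does not end in .zip.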



The downloads should go into 10 different folders (one folder per website, named accordingly).
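
One possible reading of "named accordingly" is to name each folder after the site's host name. The sketch below assumes that reading, and also assumes that a leading "www." should be dropped. It is in Python for illustration only (the requested script is Perl), and `site_folder` and the `downloads` base directory are hypothetical names.

```python
from pathlib import Path
from urllib.parse import urlparse


def site_folder(base_dir, site_url):
    """Map a site URL to its download folder, named after the host."""
    host = urlparse(site_url).hostname or "unknown-site"
    # Assumption: drop a leading "www." so folder names stay short.
    if host.startswith("www."):
        host = host[len("www."):]
    return Path(base_dir) / host


if __name__ == "__main__":
    folder = site_folder("downloads", "http://www.opensubtitles.org/")
    # Create the folder (and any missing parents) before saving files.
    folder.mkdir(parents=True, exist_ok=True)
    print(folder)
```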



Your script should avoid downloading unnecessary files (there is no need to download JPGs if all I need is to crawl the HTML in order to identify the SRT files). So please include a list of downloadable / parsable file types that I can modify later on.
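
A minimal sketch of how such editable lists might drive the crawler's decisions is shown here. It is in Python for illustration only (the requested script is Perl). The list contents and the names `PARSABLE_EXTENSIONS`, `DOWNLOADABLE_EXTENSIONS`, and `classify` are assumptions, and the `example.com` URLs in the usage lines are placeholders.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Pages to fetch and scan for links ("" covers URLs with no extension).
# Assumed list; edit as needed.
PARSABLE_EXTENSIONS = {"", ".html", ".htm", ".php", ".asp", ".aspx"}

# Files to save to disk. Assumed list; edit as needed.
DOWNLOADABLE_EXTENSIONS = {".srt", ".sub", ".txt", ".ssa", ".smi", ".mpl", ".zip"}


def classify(url):
    """Return 'download', 'parse', or 'skip' based on the URL path extension."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    if suffix in DOWNLOADABLE_EXTENSIONS:
        return "download"
    if suffix in PARSABLE_EXTENSIONS:
        return "parse"
    # Anything else (images, stylesheets, scripts, ...) is never fetched.
    return "skip"


if __name__ == "__main__":
    for url in (
        "http://example.com/",
        "http://example.com/subs/movie.srt?lang=en",
        "http://example.com/poster.JPG",
    ):
        print(url, "->", classify(url))
```

Deciding by extension alone is a simplification: a site may serve subtitles from URLs that carry no file extension, in which case the response headers would also have to be inspected.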



The script should be commented so that I can modify it later on.


