Getting HTML response fails repeatedly after the first failure
I have a program that fetches the HTML source of roughly 500 web pages every 5 minutes.

It runs fine until the first failure (a source that can't be downloaded within 6 seconds).

After that, every thread fails.

If I restart the program, it runs correctly again, until…

If I'm doing something wrong, what can I do better?

This function runs every 5 minutes:
```csharp
foreach (Company company in companies)
{
    string link = company.GetLink();
    Thread t = new Thread(() => F(company, link));
    t.Start();

    if (!t.Join(TimeSpan.FromSeconds(6)))
    {
        Debug.WriteLine(company.Name + " Fails");
        t.Abort();
    }
}
```
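For reference, a minimal sketch of an Abort-free variant of this loop (not the original code; it assumes the same `F` method and `companies` collection shown above). `Task.Wait` with a timeout reports the overrun without tearing the worker down:

```csharp
// Sketch only: Task.Wait with a timeout returns false on overrun
// instead of aborting the thread. Assumes the same F(Company, string)
// and companies collection as above.
foreach (Company company in companies)
{
    string link = company.GetLink();
    Task task = Task.Run(() => F(company, link));

    if (!task.Wait(TimeSpan.FromSeconds(6)))
    {
        Debug.WriteLine(company.Name + " Fails");
        // The download keeps running in the background; give the
        // request its own timeout (see MyWebClient below) so it ends.
    }
}
```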
This function downloads the HTML source:
```csharp
private void F(Company company, string link)
{
    try
    {
        string htmlCode = GetInformationFromWeb.GetHtmlRequest(link);
        company.HtmlCode = htmlCode;
    }
    catch (Exception ex)
    {
        // NOTE: every exception is swallowed here, so failed downloads are invisible
    }
}
```
And this class:
```csharp
public class GetInformationFromWeb
{
    public static string GetHtmlRequest(string url)
    {
        using (MyWebClient client = new MyWebClient())
        {
            client.Encoding = Encoding.UTF8;
            string htmlCode = client.DownloadString(url);
            return htmlCode;
        }
    }
}
```
And the WebClient class:
```csharp
public class MyWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        HttpWebRequest request = base.GetWebRequest(address) as HttpWebRequest;
        request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
        return request;
    }
}
```
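One relevant detail: `HttpWebRequest.Timeout` defaults to 100 seconds, so a request abandoned by `Thread.Abort` can stay pending long after the 6-second `Join` gives up. A minimal sketch of the same override with an explicit request timeout (the 6000 ms values are an assumption chosen to match the `Join` budget above):

```csharp
public class MyWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        HttpWebRequest request = base.GetWebRequest(address) as HttpWebRequest;
        request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
        // Assumed values, mirroring the caller's 6-second Join budget:
        request.Timeout = 6000;          // overall time allowed for the request
        request.ReadWriteTimeout = 6000; // time allowed for reads on the response stream
        return request;
    }
}
```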
I have four basic suggestions:

OT: Is this an academic project?

If your foreach loops over 500+ companies and each one spawns a new thread, your internet bandwidth can become the bottleneck: you will hit the 6-second timeout and fail frequently.

I suggest you try parallelism instead. Note:
```csharp
Parallel.ForEach(companies, new ParallelOptions { MaxDegreeOfParallelism = 10 }, company =>
{
    try
    {
        string htmlCode = GetInformationFromWeb.GetHtmlRequest(company.GetLink());
        company.HtmlCode = htmlCode;
    }
    catch (Exception ex)
    {
        // ignore or process the exception
    }
});
```
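Since the work is I/O-bound, an async variant can avoid blocking threads on the network entirely. This is a sketch, not part of the original answer; it assumes a `Company` with `GetLink()`, `Name`, and `HtmlCode` as in the question, caps concurrency at 10 like the `MaxDegreeOfParallelism` above, and the `CompanyDownloader` class name is hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class CompanyDownloader // hypothetical name
{
    // One shared client; its Timeout mirrors the question's 6-second budget.
    private static readonly HttpClient Http = new HttpClient
    {
        Timeout = TimeSpan.FromSeconds(6)
    };

    public static async Task DownloadAllAsync(IEnumerable<Company> companies)
    {
        // Cap in-flight requests at 10, like MaxDegreeOfParallelism above.
        var throttle = new SemaphoreSlim(10);

        var tasks = companies.Select(async company =>
        {
            await throttle.WaitAsync();
            try
            {
                company.HtmlCode = await Http.GetStringAsync(company.GetLink());
            }
            catch (Exception ex)
            {
                Debug.WriteLine(company.Name + " Fails: " + ex.Message);
            }
            finally
            {
                throttle.Release();
            }
        });

        await Task.WhenAll(tasks);
    }
}
```

A request that exceeds the 6-second `HttpClient.Timeout` faults with a `TaskCanceledException`, which lands in the same `catch` as any other failure, so one slow page no longer holds anything else up.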