Per-request proxies in HttpClient #35992
Comments
What should I do about this? |
same issue here. Think about this case: |
Now how do you solve this problem? |
See discussion here: dotnet/extensions#521 (comment) This is a fundamental limitation of HttpClient's design. We're going to look at a solution for this during 5.0 |
I ran into a similar problem related to contextual application of primary handler properties at runtime. I had a robust typed HttpClient with Polly and delegating handlers chained, but some property of the primary handler needed to be different based on a runtime attribute (i.e. different certificate based on URL, endpoint, user, etc.) and trying to register a half dozen versions of the same typed client pipeline did not make sense, even if I did want to set up all the handler configurations and certs within the composition root. I wrote a library to extend the DefaultHttpClientFactory to resolve this: It takes advantage of the fact that the primary handler management and pooling uses the name of the upstream named/typed client. By wrapping the options and tacking in an IHttpMessageHandlerBuilderFilter, we can parse out the original client name to resolve the client's pipeline and options, but then create/use a more granular primary handler in the handler pool based on some provided context when the typed client is requested. The library has two interfaces that have to be implemented, one to define the typed client and the context that you'll pass in (and translate it to a string), and then a second that can use that translated string to return a well-formed primary handler when a new one is requested by the DefaultHttpClientFactory. Hope this helps -- feedback or comments welcome. |
Tagging subscribers to this area: @dotnet/ncl |
I still think this is important but it doesn't seem like we have the time to address this in 5.0. What do you think @dotnet/ncl? cc @davidfowl |
You can implement a custom |
Using IWebProxy is far from enough. For example, I want to remove unavailable proxies and select better proxies from a spare proxy pool. That requires 'explicit' control over each proxy and flexibility in the strategy developers choose. |
Sorry to see that, but 'later' usually brings 'better'. The design does need careful thought.
|
Thanks for the example, that makes sense. We won't have anything for this in .NET 5, but can consider it for .NET 6. I'd love to see some more upvotes on the top comment to see interest. |
This is why more people use python instead of c# to write crawlers, because .net httpclient is really lame. |
@scalablecory please, please add this feature! I remember building a website parser. It was very confusing to realize that there was no simple, straightforward way to do that. One of the requirements was to use a new proxy for each request. There was also a demand for adding proxies dynamically, which is an obvious need when we are talking about web parsers. If the proxies stop working, does it mean that I have to update the application config (that contains the list of proxies) and redeploy the app? My customer just asked me: "Give me an admin panel where I could add or remove proxies". Dotnet is perfect, but web parsing with proxies is still convoluted. |
Doing anything per-request is going to be expensive. This may force the creation of a new connection, which is basically similar to creating a new HttpClient. Would people mostly use multiple proxies for load balancing, or is it functional, e.g. different subdomains/destinations need different exit points? |
Excuse me, that's not the point. The point is that, when you have many proxies, you must prefer only one connection to one proxy, which means reusable. For now, if I have a proxy pool, before sending a request with some proxy, I need to create a If the proxy works, I hope that next time, when my app uses it again, I can re-use the same connection last time as possible. So now the problem becomes that I have to create a If I just create a transient instance for one proxy every time, will In the past time, I didn't know a lot about Suppose that
.NET works well in almost all areas, and web crawling is a relatively niche one. What I report is the most common dev scenario for web crawlers, but I think .NET does not provide enough best-practice guidance for it. |
I appreciate the problem, @LeaFrock. One possibility is that we update |
A workaround to this issue for my use case is potentially to make a new named client with different credentials set. |
I don't think 50 clients would create a problem. This is far below common OS limits and GC capabilities. It should be fairly easy IMHO to create a test setup and try it @Thomas-GH-CA. |
Thanks for the response, @wfurt. Looking back, my comment was vague with few details, so apologies; I just wanted to get an idea of whether 50 seemed wacky or not. It seems it isn't, so I will go ahead and try it out. |
Generally, setting the proxy, cookie container or client certs requires managing handler instances on your own. You can no longer follow the guidance of "use a single handler for your application" as you're now storing "per user" state on the handler instance itself. |
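As a rough illustration of "managing handler instances on your own", here is a minimal sketch that keeps one long-lived HttpClient per proxy URI, so connections to a working proxy can be reused. `ProxyClientCache` and its method names are hypothetical, not a .NET API, and the timeout values are arbitrary assumptions.

```csharp
using System;
using System.Collections.Concurrent;
using System.Net;
using System.Net.Http;

// Hypothetical helper (not part of .NET): one cached HttpClient per proxy URI.
public sealed class ProxyClientCache : IDisposable
{
    // Lazy<T> ensures only one client is ever built per key, even under contention.
    private readonly ConcurrentDictionary<Uri, Lazy<HttpClient>> _clients = new();

    // Returns the cached client for this proxy, creating it on first use.
    public HttpClient GetClient(Uri proxyUri) =>
        _clients.GetOrAdd(proxyUri, uri => new Lazy<HttpClient>(() => Create(uri))).Value;

    // Call when a proxy is judged dead; disposes its client and pooled connections.
    public bool Remove(Uri proxyUri)
    {
        if (!_clients.TryRemove(proxyUri, out var lazy)) return false;
        if (lazy.IsValueCreated) lazy.Value.Dispose();
        return true;
    }

    private static HttpClient Create(Uri proxyUri)
    {
        var handler = new SocketsHttpHandler
        {
            Proxy = new WebProxy(proxyUri),
            UseProxy = true,
            // Assumed values: recycle connections periodically, drop idle ones sooner.
            PooledConnectionLifetime = TimeSpan.FromMinutes(5),
            PooledConnectionIdleTimeout = TimeSpan.FromMinutes(1),
        };
        return new HttpClient(handler, disposeHandler: true);
    }

    public void Dispose()
    {
        foreach (var lazy in _clients.Values)
            if (lazy.IsValueCreated) lazy.Value.Dispose();
        _clients.Clear();
    }
}
```

A caller would ask for `cache.GetClient(proxyUri)` before each request and call `cache.Remove(proxyUri)` when a proxy fails. This sidesteps IHttpClientFactory entirely, so delegating handlers and Polly policies would have to be wired in by hand.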
This is true in essence. But I hope there will be some improvements here, to take the load off application developers and steer them toward best practices. |
Does any workaround for this issue exist? It seems like either re-create the client before making a new request or change I think if I add client per proxy to |
@bugproof I did a POC on using a single HttpClient instance with multiple proxy configurations in a round-robin fashion, and it works perfectly. But I haven't tested it in my actual application yet. |
@Vijay-Nirmal By changing |
@bugproof Implementing The below code is just a quick POC, so there might be issues in it. Let me know if anyone knows how to improve it.
using System;
using System.Collections.Generic;
using System.Net;

// Not shown in the original comment; assumed shape of the provider interface.
public interface IProxyProvider
{
    Uri? GetProxy();
}

public class ManagedProxy : IWebProxy
{
    private readonly IProxyProvider _proxyProvider;

    public ManagedProxy(IProxyProvider proxyProvider)
    {
        _proxyProvider = proxyProvider;
    }

    public ICredentials? Credentials { get; set; }

    // Ignores the destination and returns whatever proxy the provider picks next.
    public Uri? GetProxy(Uri destination)
    {
        return _proxyProvider.GetProxy();
    }

    public bool IsBypassed(Uri host)
    {
        return false;
    }
}

public class ProxyProvider : IProxyProvider
{
    private readonly object _proxyLock = new object();
    private int _currentProxyIndex = 0;
    private readonly List<Uri?> _proxies = new List<Uri?>();

    public Uri? GetProxy()
    {
        lock (_proxyLock) // May affect the performance
        {
            if (_proxies.Count == 0)
            {
                return null;
            }
            // Re-clamp the index first: the list may have shrunk since the last call.
            _currentProxyIndex %= _proxies.Count;
            var proxy = _proxies[_currentProxyIndex];
            _currentProxyIndex = (_currentProxyIndex + 1) % _proxies.Count;
            return proxy;
        }
    }
    // Code to add proxies to the list (under the same lock). I had code to scrape proxies from a website and add them to the list
} |
@Vijay-Nirmal what does this accomplish? How does one set the proxy? |
@davidfowl I had a background service that periodically added new proxies to the list, checked the list for bad proxies, and removed them.
I needed to scrape a website that has an IP-based rate limiter. So, using this method, I could overcome that :) |
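To make the "how does one set the proxy?" step concrete, here is a minimal self-contained sketch of wiring a rotating IWebProxy into a single client. `RoundRobinProxy` and `ProxiedClient` are hypothetical names standing in for the ManagedProxy idea above, and the proxy list is fixed rather than refreshed by a background service. It relies on the handler consulting `GetProxy` for each request, which is my understanding of SocketsHttpHandler and worth verifying on your target runtime.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;

// Hypothetical stand-in for ManagedProxy: rotates over a fixed list of proxies.
public sealed class RoundRobinProxy : IWebProxy
{
    private readonly Uri[] _proxies;
    private int _next = -1;

    public RoundRobinProxy(params Uri[] proxies) => _proxies = proxies;

    public ICredentials? Credentials { get; set; }

    public Uri? GetProxy(Uri destination)
    {
        if (_proxies.Length == 0) return null; // nothing configured
        // Interlocked keeps the rotation thread-safe without a lock;
        // the uint cast keeps the index non-negative after int overflow.
        uint n = unchecked((uint)Interlocked.Increment(ref _next));
        return _proxies[n % (uint)_proxies.Length];
    }

    // Bypass the proxy entirely only when the list is empty.
    public bool IsBypassed(Uri host) => _proxies.Length == 0;
}

public static class ProxiedClient
{
    // One shared client; the proxy object decides the route per request.
    public static HttpClient Create(IWebProxy proxy) =>
        new HttpClient(new SocketsHttpHandler { Proxy = proxy, UseProxy = true });
}
```

Note that this rotates proxies behind the caller's back: the code sending a request still cannot say which proxy that particular request should use, which is what the issue asks for.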
I appreciate the problem, can we add |
@gitlsl Do you have a concrete API proposal we can work through?
Do you have a direct comparison for this scenario? |
@davidfowl Here is an interesting phenomenon. You can observe that many people in this post are from the same eastern country (including me). If I remember correctly, the people who pr For me,I wish the api looks like
Maybe we can maintain the compatibility of httpclient and add new feature into |
With Python one can set/configure a proxy per-request at the time of request, for example:
import requests

url = "https://example.com"
proxy = "http://some-proxy.example.com"
proxies = {"http": proxy, "https": proxy}
response = requests.get(url, proxies=proxies, verify=False)

The issue I'm hitting is that the proxy information is NOT available at app startup. It may come from the database or some other way or it should be dynamically selected. As a general rule, if something cannot be configured dynamically (i.e. via a database query later) then it is too restrictive (i.e. not usable for a class of solutions). The other issue is an architectural one: If I have to configure the proxy in one project while the actual usage is in another one then I'm spreading the logic into multiple places. It would be a lot cleaner (for some use cases) to isolate that logic to a single place. |
yes |
hi, .net is going to have version 9 |
this is not going to happen in 9. |
I'm developing a web crawler framework. As usual, the web crawler needs a pool of proxies to send HTTP messages.
For example.
As we all know, 'new HttpClient()' is not the recommended way to create clients, and creating too many clients causes socket-exhaustion exceptions.
Therefore, I want to use IHttpClientFactory instead. But there are still some problems. As far as I know, DefaultHttpClientFactory reuses HttpMessageHandler instances, but one handler must be created per proxy. If I have thousands of proxies, I have to create thousands of handlers.
Also, I don't know how to use DI in this situation. I've searched Stack Overflow for solutions, like below,
That's too ugly, and I need to create clients dynamically because the number of proxies is unknown before running. I can't hard-code them.
What should we do to enjoy the benefits of HttpClient (with thousands of web proxies) and avoid the side effects? I'm looking forward to suggestions and guidance.
Thank you in advance.