Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API Proposal: Add IPEndPoint.Parse() & .TryParse() #26916

Closed
NickCraver opened this issue Jul 24, 2018 · 22 comments
Closed

API Proposal: Add IPEndPoint.Parse() & .TryParse() #26916

NickCraver opened this issue Jul 24, 2018 · 22 comments
Labels
api-approved API was approved in API review, it can be implemented area-System.Net
Milestone

Comments

@NickCraver
Copy link
Member

namespace System.Net
{
    public class IPEndPoint
    {
        public static IPEndPoint Parse(string s);
        public static IPEndPoint Parse(ReadOnlySpan<char> s);
        public static bool TryParse(string s, out IPEndPoint result);
        public static bool TryParse(ReadOnlySpan<char> s, out IPEndPoint result);
    }
}

Origin issue: #23289

Rationale

Parsing a user-provided endpoint is often needed for client libraries doing connections with Sockets as well as other usages like parsing for valid endpoints in things like HTTP headers. The ASP.NET already has the implementation in IPEndPointParser.TryParse for the HTTP headers use case.

User implementations are also often buggy, inefficient, outdated, or all 3 (lots of examples on GitHub as well). A lot of this has to do with legacy code. For example, a lot of code in the wild is splitting on :, because when IPv4 was our main protocol you're only talking about 1.1.1.1:1234 (<ip>:<port>), so a bunch of code does a .Split(':') or a .IndexOf(':'). It worked well enough. Now that we have IPv6 you have examples like ::1 or [::1]:123, or [2001:db7:85a3:8d2:1319:8a2e:370:7348]:1000...and all that code is broken.

A framework method for parsing would allow users to (with zero ongoing effort in some or many cases) keep up with any change in IP parsing as well as do it efficiency and correctly. For example, the common .Split(':') and its array allocations are inefficient and unnecessary. With the [...] delimiters and such it's also easy to get wrong.

cc @stephentoub @terrajobst @Tratcher @benaadams

@vcsjones
Copy link
Member

Item for discussion... is there a use case for overloads accepting ReadOnlySpan<char>?

@NickCraver
Copy link
Member Author

To @vcsjones's point, we do already have the case. For example, a connection string to StackExchange.Redis could be something like: "[::1]:6379,syncTimeout=10000"...so we would be parsing a part of the string for the most efficient case, as would others. To be fair, that's a trivial amount of cost compared to the off-the-box network IP.

HTTP headers has the same issue, for example X-Forwarded-For can be one of many comma-delimited IPs, or the Forwarded header would again be partials.

So...yeah, I can definitely see the use case. I'll go ahead and add them to the proposal.

@GrabYourPitchforks
Copy link
Member

What are the byte-based overloads for?

@NickCraver
Copy link
Member Author

@GrabYourPitchforks me switching between coding and GitHub too much - those should be ReadOnlySpan<char>...fixed.

@stephentoub
Copy link
Member

Seems reasonable; marking as ready for review.

@terrajobst
Copy link
Contributor

terrajobst commented Sep 25, 2018

Video

We should probably also expose this for DnsEndPoint, but that should be another issue. This one looks good as proposed.

@jbhensley
Copy link
Contributor

It seems the easiest implementation would be to start at the end of the string and extract the port, working backwards until you reach the address/port delimiter. Whatever is left can be passed off to the existing Parse routines under IPAddress. Question: is there any precedent as to what may constitute the delimiter? RFC 5952, section 6 lists several forms for IPv6:

  • [2001:db8::1]:80
  • 2001:db8::1:80
  • 2001:db8::1.80
  • 2001:db8::1 port 80
  • 2001:db8::1p80
  • 2001:db8::1#80

It's not clear that we would want to support all of these.

@Tratcher
Copy link
Member

[2001:db8::1]:80, [2001:db8::1], and 2001:db8::1 are the primary use cases. I've never seen any of those other formats used.

@stephentoub
Copy link
Member

If there's a port, we should only accept the []: syntax. The RFC you linked to had a similar sentiment just after the quoted list:

"The [] style as expressed in [RFC3986] SHOULD be
employed, and is the default unless otherwise specified. Other
styles are acceptable when there is exactly one style for the given
context and cross-platform portability does not become an issue. For
URIs containing IPv6 address literals, [RFC3986] MUST be followed, as
well as the rules defined in this document."

@NickCraver
Copy link
Member Author

Completely agree with the above 3 only. The other formats are ambiguous, I've never seen them in the wild (but don't claim to be an expert), and I bet they die off. If we haven't seen similar calls for support on the ASP.NET Core side, we shouldn't assume they're needed here. They could always be added later, if assumptions are wrong here.

@jbhensley
Copy link
Contributor

Easy enough. The existing IPAddress parser for IPv6 already handles that syntax, we just need to make sure the syntax is used.

Our advantage is in knowing that the string value must include a port since this is IPEndPoint parse. No port == invalid. Basically, I envision Parse/TryParse using something like this at its core.

if (epSpan != null)
{
	int port = 0;
	int multiplier = 1;
	for (int i = epSpan.Length - 1; i >= 0; i--)
	{
		char digit = epSpan[i];
		if(digit == AddressPortDelimiter)
		{
			// We've reached the end.
			Console.WriteLine($"Port: {port}");
			Console.WriteLine(epSpan.Slice(0, i).ToString());   // "Address" (pass to IPAddress Parse/TryParse)
			// If we get an IPv6 address back, ensure that epSpan[i-1] == ']'
		}
		else if ('0' <= digit && digit <= '9')
		{
			port += multiplier * (digit - '0');
			if(port > MaxPort)
			{
				// If we've already exceeded the max port value then bail. No point going until we (potentially) hit an overflow
				break;
			}
			multiplier *= 10;
		}
		else
		{
			break;  // Invalid digit
		}
	}

// If the loop runs out without hitting a delimiter then we don't have both an address and a port

If not wildly off track, I'd like to take a crack at this one.

@Tratcher
Copy link
Member

We already have an initial implementation here.

Also, I expect the port to be optional and default to 0.

@jbhensley
Copy link
Contributor

Ah, I see. Yes, the approach I outlined only works when it can be assumed that the port is included, which makes implementation fairly trivial.

@jbhensley
Copy link
Contributor

jbhensley commented Oct 28, 2018

@Tratcher You're implementation looks pretty good. After pondering a bit more though, I do think that the "backwards" way outlined above could be modified to handle optional port and avoid multiple IndexOf calls or any Substring calls (i.e. hopefully process faster). Will look at this a bit later (when not in the middle of carving pumpkins with my kids ;) ... sorry).

@jbhensley
Copy link
Contributor

Ok, I think we're probably looking at something more like below (minimally tested). There's a complication here if we need to support port numbers in non-decimal numeric base (e.g. "[Fe08::1]:0xFA") which neither this nor the implementation presented by @Tratcher does. I think I can do that in parallel with processing as decimal as I will be unsure (working backward) of the base until I hit the leading "0x" sequence, a colon or an alpha hex digit. Either the decimal or hex value can be discarded once the base is determined.

I'll continue work on this method unless there are any major objections.

public static bool TryParse(ReadOnlySpan<char> epSpan, out IPEndPoint endPoint)
{
	endPoint = null;
	if (epSpan != null)
	{
		// Determine whether to process as IPv4 or IPv6
		bool processAsIPv4 = false;
		if (epSpan.Length > 4)
		{
			// Check to see if this might be IPv4
			for (int i = 1; i < 4; i++) // Skip position zero. If a dot shows up there, this thing is invalid anyway
			{
				if (epSpan[i] == '.')
				{
					processAsIPv4 = true;
					break;
				}
			}
		}

		// Start at the end of the string and work backward, looking for a port delimiter
		int portDecimal = 0;
		int multiplier = 1;
		// TODO: we need to process cases where the port is expressed as Hex
		int sliceLength = epSpan.Length;
		for (int i = epSpan.Length - 1; i >= 0; i--)
		{
			char digit = epSpan[i];

			// Locating a delimiter ends the sequence. We either have IPv4 with a port or IPv6 (with or without) or we have garbage
			if (digit == ':')
			{
				// Determine how far to slice into the span (pre-set to entire length)
				if (processAsIPv4 || (i > 0 && epSpan[i - 1] == ']'))
				{
					sliceLength = i;
				}

				break;
			}
			else if ('0' <= digit && digit <= '9')
			{
				// We'll avoid overflow in cases where someone passes in garbage
				if (portDecimal < MaxPort)
				{
					portDecimal += multiplier * (digit - '0');
					multiplier *= 10;
				}
			}
		}

		// We've either hit the delimiter or ran out of characters. Let's see what the IP parser thinks.
		// TODO: this can likey be optimized in core since we already know what kind of address we have
		if (IPAddress.TryParse(epSpan.Slice(0, sliceLength), out IPAddress address))
		{
			// If we did not hit a delimiter then default to port 0
			if(sliceLength == epSpan.Length)
			{
				portDecimal = 0;
			}

			// Avoid tossing on invalid port
			if (portDecimal <= MaxPort)
			{
				endPoint = new IPEndPoint(address, portDecimal);
				return true;
			}
		}
	}

	return false;
}

@stephentoub
Copy link
Member

if we need to support port numbers in non-decimal numeric base

Why would we need to do that?

@jbhensley
Copy link
Contributor

I noticed that hex port numbers are already included in the unit tests for IPv6 parsing:

https://github.com/dotnet/corefx/blob/73bc1413112c0d60f295ccffb40e2cf85638f177/src/System.Net.Primitives/tests/FunctionalTests/IPAddressParsing.cs#L279

We should reach a consensus on whether to support that here (the IP parser just drops the port, it does not actually do anything with it).

There are some other minor issues as well. IPv4 encoded into IPv6 ("::192.168.0.010") fails my "is this IPv4" test in the code above, but that should be an easy fix. Few other small things as well. Just wanted to point that one out in case anyone notices it.

@Tratcher
Copy link
Member

@jbhensley note that test is only verifying IP parsing, not port parsing. The ports are ignored.

@jbhensley
Copy link
Contributor

Noted. The presence of a hex port, however, does raise the issue of whether it is an expected, valid input. Don't misunderstand, I'm all in favor of base-10 only support; it makes coding this much easier. Just so long as we all know that it's an intentional decision.

@jbhensley
Copy link
Contributor

@Tratcher FYI: There are some errors in the logic on your parser, mainly centered around relative position of ":" and "]". Try running this address through:

"[3731:54:65fe:2::1]IAmNotValid65535"

You'll get address 3731:54:65fe:2::1 on port 0 because the code assumes it's IPv6 without a port. This is due to the fact that the last position of "]" is greater than the last position of ":". The input is invalid anyway, so no harm no foul. I'll put stuff like this in our unit tests to make sure core returns false/tosses (for TryParse/Parse respectively).

@jbhensley
Copy link
Contributor

PR dotnet/corefx#33119 opened for feedback.

@davidsh
Copy link
Contributor

davidsh commented Nov 13, 2018

PR dotnet/corefx#33119 has been merged into master already.

@davidsh davidsh closed this as completed Nov 13, 2018
@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 3.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 16, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Net
Projects
None yet
Development

No branches or pull requests

9 participants