I'm trying to test if the value of HTTP_HOST exists as part of the URI. First, I capture the value into a backreference with:

RewriteCond "%{HTTP_HOST}" "(.+)"

Then I test for its presence with:

RewriteCond "%{REQUEST_URI}" "/foo/%1/bar"

If the 2nd RewriteCond succeeds, I then remove that value from the path in the rewrite:

RewriteRule "^foo/(.+)/(.+)$" "foo/$2" [L]

But, with a URI like:

http://foo/localhost/bar

The 2nd RewriteCond never matches, and the trace shows:

applying pattern '^foo/(.+)/(.+)$' to uri 'foo/localhost/bar'

RewriteCond: input='localhost' pattern='(.+)' => matched

RewriteCond: input='/foo/localhost/bar' pattern='/foo/%1/bar' => not-matched

So, why won't the %1 backreference match? And should rewrite:trace4 be expanding that %1 in the 2nd RewriteCond?

  • "with a URI like: http://foo/localhost/bar" - I assume you mean a URI like http://localhost/foo/localhost/bar (you seem to be missing the hostname)?
    – MrWhite
    Oct 8 at 23:28

1 Answer


You can't use Apache backreferences in the regex itself (for the same reason you cannot use %{VAR} syntax directly in the regex). In the regex /foo/%1/bar the characters % and 1 are matched literally. (If Apache performed some kind of variable expansion before applying the regex - which it doesn't - then it wouldn't strictly be a PCRE-regex.)

(This is also why you do not see %1 expanded in the log, regardless of the LogLevel.)
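
To illustrate where backreferences are (and are not) expanded, here is a minimal sketch. The /hosts/ and /sites/ paths are hypothetical and exist only for the illustration, and the sketch assumes the directives are in a .htaccess file in the document root.

```apache
RewriteEngine On

# Expanded: in the RewriteRule substitution (and likewise in a RewriteCond
# TestString), %1 is replaced by the value captured by the last matched
# RewriteCond. The /hosts/ and /sites/ paths are hypothetical.
RewriteCond %{HTTP_HOST} (.+)
RewriteRule ^hosts/(.+) /sites/%1/$1 [L]

# NOT expanded: the CondPattern is a regex, so "%1" here means the two
# literal characters "%" and "1". This condition only matches a URL-path
# that literally contains "/foo/%1/bar".
RewriteCond %{HTTP_HOST} (.+)
RewriteCond %{REQUEST_URI} /foo/%1/bar
RewriteRule ^foo/(.+)/(.+)$ foo/$2 [L]
```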

You can, however, use an internal backreference (regex-syntax) to match the requested hostname (HTTP_HOST) with the 2nd path-segment in the requested URL-path. For example:

RewriteCond %{HTTP_HOST}@$1 ^([^@]+)@\1$
RewriteRule ^foo/([^/]+)/(.+) foo/$2 [L]

Note that I changed the first capturing group from .+ to [^/]+ since this is only intended to match a single path-segment. Otherwise, if you had the URL /foo/localhost/bar/baz then localhost/bar would be captured (since + is greedy), which would fail to match the hostname (when it perhaps should).

(Aside: There is no need to surround the arguments in " unless an argument contains spaces. Too many " arguably make the directives harder to read, IMO. I also removed the unnecessary trailing $ on the RewriteRule pattern.)

In the TestString %{HTTP_HOST}@$1 (an "ordinary" string that supports variable expansion):

  • %{HTTP_HOST} is the requested Host header
  • @ is just an arbitrary character that is not expected to appear in either the hostname or the path-segment
  • $1 is the value of the first backreference (the second path-segment) as captured from the RewriteRule pattern.

The TestString is then matched against the regex ^([^@]+)@\1$ where:

  • ([^@]+) matches (and captures) the value of the HTTP_HOST server variable.
  • @ matches the literal @ in the TestString.
  • \1 is an internal backreference (in the regex itself) that matches whatever the first capturing group in the regex matched, i.e. the value captured by ([^@]+) (above).

So, a request of the form http://localhost/foo/localhost/bar (which is what I assume you meant in the question) would result in a condition that tests:

  • localhost@localhost against the regex ^([^@]+)@\1$ - successful

Whereas a request of the form http://localhost/foo/something/bar would result in a condition that tests:

  • localhost@something against the regex ^([^@]+)@\1$ - unsuccessful
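
Putting the two cases together, the following annotated sketch traces how mod_rewrite evaluates the rule. It assumes the directives are in a .htaccess file in the document root. Note the order of processing: the RewriteRule pattern is matched first, and only then are the conditions evaluated, which is why $1 is already available in the TestString.

```apache
# .htaccess in the document root (assumed)
RewriteEngine On

# Request: http://localhost/foo/localhost/bar
#   1. RewriteRule pattern matches "foo/localhost/bar":
#      $1 = "localhost", $2 = "bar"
#   2. TestString expands to "localhost@localhost"
#   3. ([^@]+) captures "localhost" and \1 must repeat it after the "@"
#      => matched, so the URL-path is rewritten to foo/bar
#
# Request: http://localhost/foo/something/bar
#   1. RewriteRule pattern matches: $1 = "something", $2 = "bar"
#   2. TestString expands to "localhost@something"
#   3. \1 ("localhost") does not equal "something"
#      => not matched, so no rewrite occurs
RewriteCond %{HTTP_HOST}@$1 ^([^@]+)@\1$
RewriteRule ^foo/([^/]+)/(.+) foo/$2 [L]
```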

Aside:

There could be other issues, depending on what you are doing (and why). If the resulting /foo/bar is a "virtual" URL-path then you presumably also have a front-controller pattern in the .htaccess file. However, depending on how that is implemented, the rewritten URL is unlikely to be picked up by the front-controller (front-controllers typically read the originally requested URL rather than the rewritten one).

If the resulting URL-path is intended to map to a physical file (e.g. /foo/bar.html) then "malformed" URLs of the form /foo/localhost/localhost/bar.html would also map to the same resource, potentially creating a duplicate content issue. (In .htaccess the rewriting process starts over after a successful rewrite, so the rule is applied again to the rewritten URL.) This could be resolved either by using the END flag (on Apache 2.4+) to prevent further rewrites, or by testing for the existence of the file before rewriting.
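
As a rough sketch of those two options (use one or the other, not both), again assuming the directives are in a .htaccess file in the document root, so that %{DOCUMENT_ROOT}/foo/$2 is the filesystem path of the rewrite target:

```apache
RewriteEngine On

# Option 1 (Apache 2.4+): END stops all further rewriting for this request,
# so /foo/localhost/localhost/bar.html is rewritten only once, to
# /foo/localhost/bar.html, which does not exist as a file.
RewriteCond %{HTTP_HOST}@$1 ^([^@]+)@\1$
RewriteRule ^foo/([^/]+)/(.+) foo/$2 [END]

# Option 2: only rewrite when the target already exists as a physical file.
# For /foo/localhost/localhost/bar.html the target would be
# /foo/localhost/bar.html, which is not a file, so no rewrite occurs.
RewriteCond %{HTTP_HOST}@$1 ^([^@]+)@\1$
RewriteCond %{DOCUMENT_ROOT}/foo/$2 -f
RewriteRule ^foo/([^/]+)/(.+) foo/$2 [L]
```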
