Muse® Proxy Source Profiling: A Complex Username/Password Authentication Scenario

Username/Password authentication is the second most popular authentication type after IP authentication. In many cases, content providers offer only Username/Password authentication for institutional subscriptions. Subscribers are also starting to prefer Username/Password authentication over IP authentication because of the IP address shortage, which makes obtaining new IP addresses increasingly difficult.

In this article we describe a new feature of Muse Proxy 4.1: Extract and Navigate. This new feature is useful for implementing a Username/Password authentication flow for complex websites that use dynamic state information in the authentication process.

In the simplest case, it is enough to send the Username/Password values in a single request, along with any other CGI parameters required by the native website.

However, some websites store and use state information; websites built with Microsoft’s ASP.NET technologies are a good example. For these websites, submitting the authentication form directly with static values does not work: the values of the hidden fields must be parsed dynamically from the current page and sent along with the static parameter values. Moreover, the dynamically parsed values must be URL encoded.
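A short Java sketch makes the URL-encoding requirement concrete. ASP.NET __VIEWSTATE values are Base64 text, which may contain '+', '/' and '='; in a form-encoded POST body an unencoded '+' is decoded as a space, corrupting the value. The class HiddenFieldEncoding and the sample value are hypothetical and not taken from Muse Proxy; the sketch only uses the standard java.net.URLEncoder (the Charset overload requires Java 10 or later).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating why extracted hidden-field values
// must be URL encoded before being placed in a POST body.
class HiddenFieldEncoding {
    // Encodes a raw value for an application/x-www-form-urlencoded body.
    static String encode(String rawValue) {
        return URLEncoder.encode(rawValue, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Made-up Base64-style value; real __VIEWSTATE values are much longer.
        String viewState = "dDwt+MTA/x==";
        System.out.println("raw:     __VIEWSTATE=" + viewState);
        System.out.println("encoded: __VIEWSTATE=" + encode(viewState));
    }
}
```

Running main prints the raw value next to its encoded form, in which '+', '/' and '=' become %2B, %2F and %3D.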

The Extract and Navigate feature, available starting with Muse Proxy 4.1, is based on sequences of regular expression EXTRACTOR, URL and POST_PARAMETERS elements, which must be specified in the order of execution. The URL and POST_PARAMETERS elements can use parameters extracted by the preceding EXTRACTORs, as well as static parameters defined in the Source Profile via PARAMETERS.
Parameters are referenced with the ${parameterName} construct.
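The placeholder mechanism can be pictured as a plain string substitution. The following Java sketch is hypothetical (TemplateSubstitution is not a Muse Proxy class): it replaces each ${name} with the value from a parameter map and, as Muse Proxy does for undefined variables, leaves unknown names literally in place. Matcher.quoteReplacement keeps '$' or '\' characters inside a value from being read as group references.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of ${parameterName} substitution (not Muse Proxy's code).
class TemplateSubstitution {
    // Matches ${name} and captures the name between the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    static String substitute(String template, Map<String, String> params) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = params.get(m.group(1));
            // Unknown names are kept literally, e.g. "${missing}".
            String replacement = (value != null) ? value : m.group(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = Map.of("userName", "YYY", "userPassword", "ZZZ");
        // prints UserName=YYY&Password=ZZZ&x=${missing}
        System.out.println(substitute("UserName=${userName}&Password=${userPassword}&x=${missing}", params));
    }
}
```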

The EXTRACTOR elements contain JDK regular expressions, including capturing groups, as described in the JDK documentation (for example, in “Summary of regular-expression constructs” [http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html]). Capturing groups are created with parentheses, “(groupPattern)”, and are normally referred to by number: $1, $2, and so on. Because the captured groups are saved in the global source parameters, the values coming from different extractors must be distinguished. This is done via the ref attribute of the EXTRACTOR element: each captured group becomes accessible as ${ref_group}, that is, the ref attribute value, followed by an underscore (_), followed by the group number (for example, ${viewstate_1}). If a variable that does not exist is referenced, it is left in place literally, so do not use a group number greater than the number of groups in the expression. Also make sure that each EXTRACTOR has a ref attribute and that its value is unique at the source profile level.
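The naming rule can be illustrated with java.util.regex, the JDK package referenced above. The class ExtractorSketch is hypothetical and only shows how groups 1..n of a match could be stored as ref_1..ref_n in a shared parameter map; skipping groups that did not participate in the match is an assumption of the sketch. Matcher.groupCount() reports how many groups a pattern defines, which is the upper bound for the numbers you may reference.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch (not Muse Proxy's code): store the capturing groups of
// one EXTRACTOR under the names ref_1, ref_2, ... in a shared parameter map.
class ExtractorSketch {
    // Returns false when the regex does not match; otherwise saves the groups.
    static boolean extract(String ref, String regex, String page, Map<String, String> params) {
        Matcher m = Pattern.compile(regex).matcher(page);
        if (!m.find()) {
            return false; // per the article, a non-matching extractor stops the flow
        }
        for (int i = 1; i <= m.groupCount(); i++) {
            String group = m.group(i);
            if (group != null) { // assumption: skip groups that did not participate
                params.put(ref + "_" + i, group);
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> params = new TreeMap<>();
        extract("pair", "(\\w+)=(\\w+)", "lang=tr", params);
        System.out.println(params); // {pair_1=lang, pair_2=tr}
    }
}
```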

As an example, we consider the https://legal.com.tr/ website, which uses Username/Password authentication.
The page https://legal.com.tr/uye-girisi contains the logon form. We use YYY and ZZZ as the Username and Password values.

In the page source we can see the logon HTML form with its parameters (hidden fields, view state, control state).

We start the profiling process by creating a new Muse Proxy source XML file from an existing one and setting the appropriate values for the following nodes:

  • URL
  • NAME
  • DESCRIPTION
  • REWRITING_PATTERNS

For example:

<URL>https://legal.com.tr/uye-girisi</URL>
<NAME>Legal Yay&#305;nevi - Hukuk Kitaplar&#305; ve Dergileri</NAME>
<DESCRIPTION>Legal Yay&#305;nevi - Hukuk Kitaplar&#305; ve Dergileri - E-Kitap / E-Dergi / Online Kitap /
Online Dergi / Bas&#305;l&#305; Yay&#305;n - Legal Yay&#305;nevi</DESCRIPTION>
<REWRITING_PATTERNS>*legal.com.tr;</REWRITING_PATTERNS>

We name the new Muse Proxy source Legal_tr_UP.xml. UP is our convention for labeling a Username/Password authentication type profile, and tr is the two-letter language code of the website; the language code is omitted when the website content is in English.
Next, we define and profile the steps of the authentication process:

  1. Get the page containing the logon form (https://legal.com.tr/uye-girisi) and parse the following dynamic parameters from it: __EVENTTARGET, __EVENTARGUMENT, __VIEWSTATE, __VIEWSTATEGENERATOR, __EVENTVALIDATION.
    <URL>https://legal.com.tr/uye-girisi</URL>
    <POST_PARAMETERS></POST_PARAMETERS>
    <EXTRACTOR ref="eventtarget" refProcess="urlEncode">type="hidden"\sname="__EVENTTARGET"\sid="__EVENTTARGET"\svalue="([^"]*?)"</EXTRACTOR>
    <EXTRACTOR ref="eventargument" refProcess="urlEncode">type="hidden"\sname="__EVENTARGUMENT"\sid="__EVENTARGUMENT"\svalue="([^"]*?)"</EXTRACTOR>
    <EXTRACTOR ref="viewstate" refProcess="urlEncode">type="hidden"\sname="__VIEWSTATE"\sid="__VIEWSTATE"\svalue="([^"]+?)"</EXTRACTOR>
    <EXTRACTOR ref="viewstategenerator" refProcess="urlEncode">type="hidden"\sname="__VIEWSTATEGENERATOR"\sid="__VIEWSTATEGENERATOR"\svalue="([^"]+?)"</EXTRACTOR>
    <EXTRACTOR ref="eventvalidation" refProcess="urlEncode">type="hidden"\sname="__EVENTVALIDATION"\sid="__EVENTVALIDATION"\svalue="([^"]+?)"</EXTRACTOR>

    Description: For each value to be parsed we define an EXTRACTOR element. The ref attribute specifies the name of the extractor rule, which is the name used later to refer to the extracted value. The refProcess attribute specifies a processing method; here we want the value to be URL encoded.
    The content of the EXTRACTOR element is the regular expression that matches the desired value.
  2. Submit the values extracted above, together with the Username/Password and the other static CGI parameters, to perform the authentication.
    <URL>https://legal.com.tr/uye-girisi</URL>
    <POST_PARAMETERS><![CDATA[__EVENTTARGET=${eventtarget_1}&__EVENTARGUMENT=${eventargument_1}&__VIEWSTATE=${viewstate_1}&__VIEWSTATEGENERATOR=${viewstategenerator_1}&__EVENTVALIDATION=${eventvalidation_1}&ctl00%24MainContent%24userlogin%24UserName=${userName}&ctl00%24MainContent%24userlogin%24Password=${userPassword}&ctl00%24MainContent%24userlogin%24Button1=Giri%C5%9F]]></POST_PARAMETERS>

    The ${userName} and ${userPassword} metavariables are declared as follows:
    <PARAMETERS>
    <PARAMETER>
    <NAME>userName</NAME>
    <VALUE>YYY</VALUE>
    </PARAMETER>
    <PARAMETER>
    <NAME>userPassword</NAME>
    <VALUE>ZZZ</VALUE>
    </PARAMETER>
    </PARAMETERS>

    The full content of the XML configuration file is below; it can also be downloaded from the EduLib website.
    <ICE-CONFIG>
    <URL>https://legal.com.tr/uye-girisi</URL>
    <POST_PARAMETERS></POST_PARAMETERS>
    <EXTRACTOR ref="eventtarget" refProcess="urlEncode">type="hidden"\sname="__EVENTTARGET"\sid="__EVENTTARGET"\svalue="([^"]*?)"</EXTRACTOR>
    <EXTRACTOR ref="eventargument" refProcess="urlEncode">type="hidden"\sname="__EVENTARGUMENT"\sid="__EVENTARGUMENT"\svalue="([^"]*?)"</EXTRACTOR>
    <EXTRACTOR ref="viewstate" refProcess="urlEncode">type="hidden"\sname="__VIEWSTATE"\sid="__VIEWSTATE"\svalue="([^"]+?)"</EXTRACTOR>
    <EXTRACTOR ref="viewstategenerator" refProcess="urlEncode">type="hidden"\sname="__VIEWSTATEGENERATOR"\sid="__VIEWSTATEGENERATOR"\svalue="([^"]+?)"</EXTRACTOR>
    <EXTRACTOR ref="eventvalidation" refProcess="urlEncode">type="hidden"\sname="__EVENTVALIDATION"\sid="__EVENTVALIDATION"\svalue="([^"]+?)"</EXTRACTOR>
    <URL>https://legal.com.tr/uye-girisi</URL>
    <POST_PARAMETERS><![CDATA[__EVENTTARGET=${eventtarget_1}&__EVENTARGUMENT=${eventargument_1}&__VIEWSTATE=${viewstate_1}&__VIEWSTATEGENERATOR=${viewstategenerator_1}&__EVENTVALIDATION=${eventvalidation_1}&ctl00%24MainContent%24userlogin%24UserName=${userName}&ctl00%24MainContent%24userlogin%24Password=${userPassword}&ctl00%24MainContent%24userlogin%24Button1=Giri%C5%9F]]></POST_PARAMETERS>
    <USER_AGENT>Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36</USER_AGENT>
    <CONNECT_TIMEOUT>180000</CONNECT_TIMEOUT>
    <READ_TIMEOUT>180000</READ_TIMEOUT>
    <NAME>Legal Yay&#305;nevi - Hukuk Kitaplar&#305; ve Dergileri</NAME>
    <DESCRIPTION>Legal Yay&#305;nevi - Hukuk Kitaplar&#305; ve Dergileri - E-Kitap / E-Dergi /
    Online Kitap / Online Dergi / Bas&#305;l&#305; Yay&#305;n - Legal Yay&#305;nevi</DESCRIPTION>
    <AUTHENTICATION_TYPE>User/Password</AUTHENTICATION_TYPE>
    <HTTP_AUTHORIZATION_USER_NAME/>
    <HTTP_AUTHORIZATION_USER_PASSWORD/>
    <HTTP_AUTHORIZATION_SCHEME/>
    <PROXY_USED>SOURCE_LEVEL</PROXY_USED>
    <PROXY_HOST></PROXY_HOST>
    <PROXY_PORT></PROXY_PORT>
    <PROXY_PAC></PROXY_PAC>
    <PROXY_AUTHORIZATION_USER_NAME/>
    <PROXY_AUTHORIZATION_USER_PASSWORD/>
    <PROXY_AUTHORIZATION_SCHEME/>
    <REWRITING_PATTERNS>*legal.com.tr;</REWRITING_PATTERNS>
    <TRANSPARENT_CONTENT_PATTERNS>*/*.js*;</TRANSPARENT_CONTENT_PATTERNS>
    <ENCODING>UTF-8</ENCODING>
    <REFERER></REFERER>
    <COOKIES></COOKIES>
    <CUSTOM_HTTP_HEADERS/>
    <REPLACE_HOST/>
    <REPLACE_PATH></REPLACE_PATH>
    <SSL_CERTIFICATES/>
    <SSL_ALIASES/>
    <FOLLOW_REDIRECTS>true</FOLLOW_REDIRECTS>
    <PARAMETERS>
    <PARAMETER>
    <NAME>userName</NAME>
    <VALUE>YYY</VALUE>
    </PARAMETER>
    <PARAMETER>
    <NAME>userPassword</NAME>
    <VALUE>ZZZ</VALUE>
    </PARAMETER>
    </PARAMETERS>
    <LAST_UPDATED>2015-09-08</LAST_UPDATED>
    </ICE-CONFIG>
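The logic of the two steps can be checked offline with a small Java simulation. This sketch is hypothetical: it performs no HTTP requests, uses a made-up HTML fragment shaped like the hidden fields that the step 1 regexes expect, handles only two of the five extractors, and assumes that refProcess="urlEncode" behaves like java.net.URLEncoder with UTF-8. It extracts the values, URL-encodes them, and substitutes them into a shortened copy of the step 2 POST_PARAMETERS template.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical offline simulation of the Legal_tr_UP.xml flow (no HTTP).
class LegalTrFlowSketch {
    // Two of the step 1 regexes, copied from the profile and escaped for Java.
    static final String EVENTTARGET_RE =
        "type=\"hidden\"\\sname=\"__EVENTTARGET\"\\sid=\"__EVENTTARGET\"\\svalue=\"([^\"]*?)\"";
    static final String VIEWSTATE_RE =
        "type=\"hidden\"\\sname=\"__VIEWSTATE\"\\sid=\"__VIEWSTATE\"\\svalue=\"([^\"]+?)\"";

    // EXTRACTOR with refProcess="urlEncode" (assumed to match URLEncoder/UTF-8).
    static boolean extract(String ref, String regex, String page, Map<String, String> params) {
        Matcher m = Pattern.compile(regex).matcher(page);
        if (!m.find()) {
            return false;
        }
        for (int i = 1; i <= m.groupCount(); i++) {
            if (m.group(i) != null) {
                params.put(ref + "_" + i, URLEncoder.encode(m.group(i), StandardCharsets.UTF_8));
            }
        }
        return true;
    }

    // Replaces ${name} placeholders; unknown names are left literally.
    static String substitute(String template, Map<String, String> params) {
        Matcher m = Pattern.compile("\\$\\{([^}]+)\\}").matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = params.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the step 2 POST body, or null if an extractor fails (flow stops).
    static String buildPostBody(String loginPage) {
        Map<String, String> params = new HashMap<>();
        params.put("userName", "YYY"); // static PARAMETERS from the profile
        params.put("userPassword", "ZZZ");
        if (!extract("eventtarget", EVENTTARGET_RE, loginPage, params)
                || !extract("viewstate", VIEWSTATE_RE, loginPage, params)) {
            return null;
        }
        // Shortened form of the step 2 POST_PARAMETERS template.
        String template = "__EVENTTARGET=${eventtarget_1}&__VIEWSTATE=${viewstate_1}"
                + "&ctl00%24MainContent%24userlogin%24UserName=${userName}"
                + "&ctl00%24MainContent%24userlogin%24Password=${userPassword}";
        return substitute(template, params);
    }

    public static void main(String[] args) {
        // Made-up fragment; the real page carries much longer values.
        String page =
                "<input type=\"hidden\" name=\"__EVENTTARGET\" id=\"__EVENTTARGET\" value=\"\" />\n"
              + "<input type=\"hidden\" name=\"__VIEWSTATE\" id=\"__VIEWSTATE\" value=\"/wEP+Dw==\" />";
        System.out.println(buildPostBody(page));
    }
}
```

Note that __EVENTTARGET uses the `*?` quantifier because that field is often empty, while __VIEWSTATE uses `+?`; an empty __VIEWSTATE would therefore make the second extractor fail and stop the flow.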

General observations:

  • Many online tools are available for testing regular expressions; use them to make sure that the defined extractor rules behave correctly;
  • If a defined extractor rule does not match, the flow stops and the last page is displayed;
  • Always make sure that the source profile is well-formed XML.
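The checks above can also be scripted locally. The hypothetical ProfileChecks class below is a convenience sketch, not a Muse Proxy tool, and uses only JDK APIs: javax.xml.parsers to verify that a profile is well-formed XML, and java.util.regex to test an extractor regex against a saved copy of the page and to count its capturing groups. Online testers may use a different regex flavor than the JDK, so testing with java.util.regex avoids that mismatch.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical local checks for a source profile (not part of Muse Proxy).
class ProfileChecks {
    // True if the profile text parses as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return false;
        }
    }

    // First captured group of the extractor regex, or null if it does not match.
    static String firstGroup(String regex, String page) {
        Matcher m = Pattern.compile(regex).matcher(page);
        return (m.find() && m.groupCount() >= 1) ? m.group(1) : null;
    }

    // Number of capturing groups; ${ref_N} with N above this stays unresolved.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // '&' inside CDATA is fine; a bare '&' outside CDATA breaks well-formedness.
        System.out.println(isWellFormed("<ICE-CONFIG><POST_PARAMETERS><![CDATA[a=1&b=2]]></POST_PARAMETERS></ICE-CONFIG>"));
        System.out.println(isWellFormed("<ICE-CONFIG><POST_PARAMETERS>a=1&b=2</POST_PARAMETERS></ICE-CONFIG>"));
        System.out.println(groupCount("value=\"([^\"]+?)\""));
    }
}
```

The first two calls in main illustrate why the profile wraps POST_PARAMETERS in a CDATA section: the same parameter string without CDATA is rejected by the parser.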