# Detection Rules - Admin Panel & Login Page Identification

This file contains the core detection rules for identifying admin panels and login pages.

## Login Page Indicators

These patterns indicate a page is likely a login page. They look for password input fields, login forms, and authentication-related content.

### Password Field Patterns

```
[type="password"]
[type='password']
[type:"password"]
[Type="password"]
[Type='password']
[input[type=password]]
["a-input-password"]
[type=password]
[type: \"password\"]
[Type:"password"]
[type="Password"]
```

### Login Configuration Patterns

```
[logIn:"登录",username:"账号",password:"密码"]
["input-password"]
[/_app.config.js?v=]
[auth.password]
```

### Product-Specific Markers

```
[金蝶云]  # Kingdee Cloud
```

**Detection Logic:** If ANY of these patterns are found in the HTML response (wrapped in brackets for context matching), the page is flagged as a potential login page.

---

## CAPTCHA Detection Rules

These patterns identify the presence of CAPTCHA/verification code fields on login pages.

### CAPTCHA Indicators

```
[请输入验证码]  # "Please enter verification code"
[placeholder="验证码"]  # Placeholder with "verification code"
[placeholder="captcha"]  # English captcha placeholder
[title="验证码"]  # Title with "verification code"
[title="captcha"]  # English captcha title
[alt="验证码"]  # Alt text with "verification code"
[alt="captcha"]  # English captcha alt
[name="captcha"]  # Input name "captcha"
[请输入手机验证码]  # "Please enter SMS verification code"
[login.VerificationCode"]  # JavaScript verification code variable
[validateEvent]  # Validation event
[validateState]  # Validation state
```

**Usage:** CAPTCHA detection increases the confidence that a page is a login page and provides additional context for the user.

---

## False Positive Exclusion Rules

These patterns identify pages that should be EXCLUDED from results (not admin/login pages).

### Exclusion Patterns

```
页面不存在  # "Page not found" in title
页面不存在.  # "Page not found" heading (with period)
页面不存在  # "Page not found" heading
页面不存在.  # "Page not found" subheading (with period)
页面不存在  # "Page not found" subheading
AIHelp Web Portal  # AIHelp portal (not admin)
```

**Detection Logic:** If ANY of these patterns are found, the page is IMMEDIATELY excluded from results, even if other login indicators are present.

---

## Confidence Scoring System

### High Confidence

- Matches admin condition regex patterns (product fingerprints)
- Contains both password field AND admin-related title
- Matches multiple product-specific patterns
- Known admin URL paths (e.g., /admin, /wp-admin)

### Medium Confidence

- Matches login CSS conditions (password fields)
- Contains login-related text but no admin indicators
- Generic login paths (e.g., /login, /signin)
- Does NOT match any exclusion patterns

### Low Confidence

- Partial matches or ambiguous indicators
- Generic paths that could be admin (e.g., /dashboard, /portal)
- Requires manual verification

---

## Detection Workflow

```
1. Fetch URL
   |
2. Check exclusion patterns (false positives)
   |--> If matched: EXCLUDE and stop
   |
3. Check login CSS conditions
   |
4. Check admin condition regex patterns
   |
5. Check CAPTCHA conditions
   |
6. Calculate confidence level
   |
7. Extract page title
   |
8. Record result with metadata
```

---

## HTTP Request Configuration

### Headers

```python
{
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, sdch",
    "Accept-Language": "en-US,en;q=0.8,zh-CN;q=0.6,zh;q=0.4,zh-TW;q=0.2",
    "Connection": "keep-alive",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 12_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.2357.130 Safari/537.36",
}
```

### Request Settings

- Timeout: 10 seconds
- Follow redirects: Yes
- SSL verification: No (ignore certificate errors)
- Accept status codes: 200, 401
- Max response size: 5MB

---

## URL Extraction & Following

### Redirect URL Patterns

Extract redirect URLs from:

- `location.href = "..."`
- ``
- `window.open("...")`

### JavaScript/CSS Link Extraction

- `