From 87b5a6a1c8bcab350e10fe609f83c6313c00134e Mon Sep 17 00:00:00 2001 From: Diego Heras Date: Sun, 13 Dec 2020 20:46:05 +0100 Subject: [PATCH] Clean up readme (#19) --- README.md | 127 +++++++++++++++++++++++------------------------------- 1 file changed, 54 insertions(+), 73 deletions(-) diff --git a/README.md b/README.md index 2d04159..ea1b054 100644 --- a/README.md +++ b/README.md @@ -1,22 +1,23 @@ # FlareSolverr -Proxy server to bypass Cloudflare protection +FlareSolverr is a proxy server to bypass Cloudflare protection :warning: This project is in beta state. Some things may not work and the API can change at any time. -See the known issues section. ## How it works FlareSolverr starts a proxy server and it waits for user requests in an idle state using few resources. When some request arrives, it uses [puppeteer](https://github.com/puppeteer/puppeteer) with the [stealth plugin](https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth) -to create a headless browser (Chrome). It opens the URL with user parameters and waits until the -Cloudflare challenge is solved (or timeout). The HTML code and the cookies are sent back to the -user and those cookies can be used to bypass Cloudflare using other HTTP clients. +to create a headless browser (Chrome). It opens the URL with user parameters and waits until the Cloudflare challenge +is solved (or timeout). The HTML code and the cookies are sent back to the user, and those cookies can be used to +bypass Cloudflare using other HTTP clients. -**NOTE**: Web browsers consume a lot of memory. If you are running FlareSolverr on a machine with few RAM, -do not make many requests at once. With each request a new browser is launched. -(It is possible to use a permanent session. However, if you use sessions, you should make sure to close them as soon as you are done using them.) +**NOTE**: Web browsers consume a lot of memory. 
If you are running FlareSolverr on a machine with little RAM, do not make +many requests at once. With each request a new browser is launched. + +It is also possible to use a permanent session. However, if you use sessions, you should make sure to close them as +soon as you are done using them. ## Installation @@ -35,7 +36,7 @@ curl -L -X POST 'http://localhost:8191/v1' \ --data-raw '{ "cmd": "request.get", "url":"http://www.google.com/", - "userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36", + "userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleW...", "maxTimeout": 60000, "headers": { "X-Test": "Testing 123..." @@ -47,23 +48,22 @@ curl -L -X POST 'http://localhost:8191/v1' \ #### + `sessions.create` -This will launch a new browser instance which will retain cookies until you destroy it -with `sessions.destroy`. This comes in handy so you don't have to keep solving challenges -over and over and you won't need to keep sending cookies for the browser to use. +This will launch a new browser instance which will retain cookies until you destroy it with `sessions.destroy`. +This comes in handy, so you don't have to keep solving challenges over and over and you won't need to keep sending +cookies for the browser to use. -This also speeds up the requests since it won't have to launch a new browser instance for -every request. +This also speeds up the requests since it won't have to launch a new browser instance for every request. Parameter | Notes |--|--| -session | Optional. The session ID that you want to be assinged to the instance. If one isn't set a random UUID will be assigned. +session | Optional. The session ID that you want to be assigned to the instance. If it isn't set, a random UUID will be assigned. userAgent | Optional. Will be used by the headless browser. #### + `sessions.list` -Returns a list of all the active sessions.
More for debuging if you are curious to see -how many sessions are running. You should always make sure to properly close each -session when you are done using them as too many may slow your computer down. +Returns a list of all the active sessions. More for debugging if you are curious to see how many sessions are running. +You should always make sure to properly close each session when you are done using them as too many may slow your +computer down. Example response: @@ -79,9 +79,8 @@ Example response: #### + `sessions.destroy` -This will properly shutdown a browser instance and remove all files associaded with it -to free up resources for a new session. Whenever you no longer need to use a session you -should make sure to close it. +This will properly shutdown a browser instance and remove all files associated with it to free up resources for a new +session. When you no longer need to use a session you should make sure to close it. Parameter | Notes |--|--| @@ -117,14 +116,13 @@ Example response from running the `curl` above: "content-length": "61587", "x-xss-protection": "0", "x-frame-options": "SAMEORIGIN", - "set-cookie": "1P_JAR=2020-07-16-04; expires=Sat, 15-Aug-2020 04:15:49 GMT; path=/; domain=.google.com; Secure; SameSite=none\nNID=204=QE3Ocq15XalczqjuDy52HeseG3zAZuJzID3R57g_oeQHyoV5DuvDhpWc4r9IcPoeIYmkr_ZTX_MNOU8IAbtXmVO7Bmq0adb-hpIHaTBIdBk3Ofifp4gO6vZleVuFYfj7ePkHeHdzGoX-en0FvKtd9iofX4O6RiAdEIAnpL7Wge4; expires=Fri, 15-Jan-2021 04:15:49 GMT; path=/; domain=.google.com; Secure; HttpOnly; SameSite=none", - "alt-svc": "h3-29=\":443\"; ma=2592000,h3-27=\":443\"; ma=2592000,h3-25=\":443\"; ma=2592000,h3-T050=\":443\"; ma=2592000,h3-Q050=\":443\"; ma=2592000,h3-Q046=\":443\"; ma=2592000,h3-Q043=\":443\"; ma=2592000,quic=\":443\"; ma=2592000; v=\"46,43\"" + "set-cookie": "1P_JAR=2020-07-16-04; expires=Sat..." 
}, "response":"...", "cookies": [ { "name": "NID", - "value": "204=QE3Ocq15XalczqjuDy52HeseG3zAZuJzID3R57g_oeQHyoV5DuvDhpWc4r9IcPoeIYmkr_ZTX_MNOU8IAbtXmVO7Bmq0adb-hpIHaTBIdBk3Ofifp4gO6vZleVuFYfj7ePkHeHdzGoX-en0FvKtd9iofX4O6RiAdEIAnpL7Wge4", + "value": "204=QE3Ocq15XalczqjuDy52HeseG3zAZuJzID3R57...", "domain": ".google.com", "path": "/", "expires": 1610684149.307722, @@ -147,7 +145,7 @@ Example response from running the `curl` above: "sameSite": "None" } ], - "userAgent": "Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36" + "userAgent": "Windows NT 10.0; Win64; x64) AppleWebKit/5..." }, "status": "ok", "message": "", @@ -167,13 +165,12 @@ postData | Must be a string. If you want to POST a form, don't forget to set the ## Downloading Images and PDFs (small files) -If you need to access an image/pdf or small file, you should pass the `download` parameter to -`request.get` setting it to `true`. Rather than access the html and return text it will -return a the buffer **base64** encoded which you will be able to decode and save the image/pdf. +If you need to access an image/pdf or small file, you should pass the `download` parameter to `request.get` setting it +to `true`. Rather than access the html and return text it will return the buffer **base64** encoded which you will be +able to decode and save the image/pdf. -This method isn't recommended for videos or anything larger. As that should be streamed back to -the client and at the moment there is nothing setup to do so. If this is something you need feel -free to create an issue and/or submit a PR. +This method isn't recommended for videos or anything larger. As that should be streamed back to the client and at the +moment there is nothing setup to do so. If this is something you need feel free to create an issue and/or submit a PR. 
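The `download` flow described above can be sketched from the client side as follows. This is a minimal illustration, not part of the patch: it assumes the base64 buffer comes back in `solution.response` (the field path is an assumption, since the README text only says the buffer is returned base64 encoded), and the helper names `build_download_request` and `decode_download` are hypothetical.

```python
import base64
import json


def build_download_request(url: str, max_timeout_ms: int = 60000) -> str:
    """Build the JSON body for a `request.get` call with `download` enabled."""
    payload = {
        "cmd": "request.get",
        "url": url,
        "download": True,  # ask for the raw buffer instead of HTML text
        "maxTimeout": max_timeout_ms,
    }
    return json.dumps(payload)


def decode_download(response_body: str) -> bytes:
    """Decode the base64 payload from a FlareSolverr-style JSON response.

    Assumption: the encoded buffer is stored under solution.response.
    """
    data = json.loads(response_body)
    if data.get("status") != "ok":
        raise RuntimeError(f"request failed: {data.get('message', '')}")
    return base64.b64decode(data["solution"]["response"])
```

The request body would be POSTed to the `/v1` endpoint in the same way as the `curl` example, and the bytes returned by `decode_download` can be written to disk in binary mode to save the image or PDF.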
## Environment variables @@ -182,24 +179,36 @@ To set the environment vars in Linux run `export LOG_LEVEL=debug` and then start Name | Default | Notes |--|--|--| LOG_LEVEL | info | Used to change the verbosity of the logging. -LOG_HTML | false | Used for debugging. If `true` all html that passes through the proxy will be logged to the console. +LOG_HTML | false | Used for debugging. If `true` all HTML that passes through the proxy will be logged to the console at `debug` level. PORT | 8191 | Change this if you already have a process running on port `8191`. HOST | 0.0.0.0 | This shouldn't need to be messed with but if you insist, it's here! -CAPTCHA_SOLVER | None | This is used to select which captcha solving method it used when a captcha is encounted. +CAPTCHA_SOLVER | None | This is used to select which captcha solving method is used when a captcha is encountered. HEADLESS | true | This is used to debug the browser by not running it in headless mode. ## Captcha Solvers -Sometimes CF not only gives mathmatical computations and browser tests, sometimes they also require -the user to solve a captcha. If this is the case, FlareSolverr will return the captcha page. But that's -not very helpful to you is it? +Sometimes CF not only gives mathematical computations and browser tests, sometimes they also require the user to solve +a captcha. If this is the case, FlareSolverr will return the captcha page. But that's not very helpful to you, is it? -FlareSolverr can be customized to solve the captcha's automatically by setting the environment variable -`CAPTCHA_SOLVER` to the file name of one of the adapters inside the [/captcha](src/captcha) directory. +FlareSolverr can be customized to solve the captchas automatically by setting the environment variable `CAPTCHA_SOLVER` +to the file name of one of the adapters inside the [/captcha](src/captcha) directory.
-### [CaptchaHarvester](https://github.com/NoahCardoza/CaptchaHarvester) +### hcaptcha-solver -This method makes use of the [CaptchaHarvester](https://github.com/NoahCardoza/CaptchaHarvester) project which allows users to collect thier own tokens from ReCaptcha V2/V3 and hCaptcha for free. +This method makes use of the [hcaptcha-solver](https://github.com/JimmyLaurent/hcaptcha-solver) project which attempts +to solve hCaptcha by randomly selecting images. + +To use this solver you must first install it and then set it as the `CAPTCHA_SOLVER`. + +```bash +npm i hcaptcha-solver +CAPTCHA_SOLVER=hcaptcha-solver +``` + +### CaptchaHarvester + +This method makes use of the [CaptchaHarvester](https://github.com/NoahCardoza/CaptchaHarvester) project which allows +users to collect their own tokens from ReCaptcha V2/V3 and hCaptcha for free. To use this method you must set these ENV variables: @@ -208,21 +217,9 @@ CAPTCHA_SOLVER=harvester HARVESTER_ENDPOINT=https://127.0.0.1:5000/token ``` -**Note**: above I set `HARVESTER_ENDPOINT` to the default configureation -of the captcha harvester's server, but that could change if -you customize the command line flags. Simply put, `HARVESTER_ENDPOINT` -should be set to the URI of the route that returns a token in plain text when called. - -### [hcaptcha-solver](https://github.com/JimmyLaurent/hcaptcha-solver) - -This method makes use of the [hcaptcha-solver](https://github.com/JimmyLaurent/hcaptcha-solver) project which attempts to solve hcaptcha by randomly selecting images. - -To use this solver you must first install it and then set it as the `CAPTCHA_SOLVER`. - -```bash -npm i hcaptcha-solver -CAPTCHA_SOLVER=hcaptcha-solver -``` +**Note**: above I set `HARVESTER_ENDPOINT` to the default configuration of the captcha harvester's server, but that +could change if you customize the command line flags. Simply put, `HARVESTER_ENDPOINT` should be set to the URI of the +route that returns a token in plain text when called.
## Docker @@ -233,22 +230,6 @@ docker build -t flaresolverr:latest . docker run --restart=always --name flaresolverr -p 8191:8191 -d flaresolverr:latest ``` -## TypeScript +## Related projects -I'm quite new to TypeScript. If you spot any funny business or anything that is or isn't being -used properly feel free to submit a PR or open an issue. - -## Known issues / Roadmap - -The current implementation seems to be working on the sites I have been testing them on. However, if you find it unable to access a site, open an issue and I'd be happy to investigate. - -That being said, the project uses the [puppeteer stealth plugin](https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth). If Cloudflare is able to detect the headless browser, it's more that projects domain to fix. - -TODO: - -* Fix remaining issues in the code (see TODOs in code) -* Make the maxTimeout more accurate (count the time to open the first page / maybe count the captcha solve time?) -* Hide sensitive information in logs -* Reduce Docker image size -* Docker image for ARM architecture -* Install instructions for Windows +* C# implementation => https://github.com/FlareSolverr/FlareSolverrSharp