[Bypass reCAPTCHA] Use selenium-webdriver to Build a MeWe Scraper

Clipversity
5 min read · Dec 28, 2020

Our Article: https://clipversity.com/article/-by-pass-recaptcha--Use-selenium-webdriver-to-make-MeWe-Scraper-MOL6Ledf35RHzZ6Pxqw/en

As more and more people use MeWe, many developers are wondering whether it can become the next Facebook. That is not the topic of this article, though: this time I want to share how to build a web scraper for it.

If you have used MeWe, you know that you must log in before you can view pages or group posts. In addition, MeWe emphasizes data security and an ad-free experience, so it does not provide an API for us to use. In other words, writing a web scraper is the only way to collect this data.

Final goal

To build a MeWe scraper, the scraper needs to meet the following two requirements:

Requirement 1: The scraper logs in automatically (the reCAPTCHA has to be handled)

Requirement 2: The scraper can automatically fill in text and scroll down

Existing Web Scraper

This time we will use the selenium-webdriver Node library for our web scraper. I will explain why I chose it and which other scraping tools I tried first.

Chrome extension scraper: webscraper.io

At first, I planned to use webscraper.io as the MeWe scraper. However, MeWe requires a login, and webscraper.io does not let users fill in text fields automatically. On some web pages you can set a text field's value through JavaScript and the search will still be triggered, but MeWe's search field is controlled by JavaScript, so keyboard input must be simulated.

Node library — puppeteer and chrome-aws-lambda

Puppeteer is a Node library developed by Google that controls Chrome or Chromium, by default in headless mode. Headless Chrome runs without a visible window, which makes it suitable for servers. We can drive it from a Node script to perform different automation tasks.

chrome-aws-lambda is closely related to puppeteer: it packages a Chromium binary for serverless environments such as AWS Lambda and is used together with puppeteer, mostly to run a browser in the backend.

Their APIs are rich. For example, in Node.js:

let page = await browser.newPage();
await page.goto('https://example.com');

This opens a tab and goes to https://example.com.

You can also choose how long to wait before moving on to the next step.

If you are interested, see their GitHub repositories:

puppeteer/puppeteer

alixaxel/chrome-aws-lambda

In the end I did not use puppeteer or chrome-aws-lambda because I found them less convenient than selenium for running the scraper, and in my experience selenium was faster than both.

selenium-webdriver

This time I will use the Node.js version of selenium-webdriver. There is also a Python version with similar syntax, so don't worry if you prefer Python.

The first step: create a new Node project

npm init

After answering the prompts, install the packages that the MeWe scraper needs:

npm i -g chromedriver
npm i selenium-webdriver

The second step: create start.js

We will write the whole program in this file. It needs to perform three actions:

  1. Open MeWe.com and automatically fill in the account and password
  2. Use the “manual” method to bypass reCAPTCHA and complete the login
  3. Fill in the search string in the MeWe search bar and get the search results

start.js

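const {Builder, By, Key, until} = require('selenium-webdriver');
const fs = require('fs');

// Login credentials for the MeWe account used by the scraper
const C = {
  username: "YOUR_MEWE_ACCOUNT",
  password: "YOUR_MEWE_PASSWORD"
};

const start = async (searchStr) => {
  // Launch a Chrome window controlled by chromedriver
  let driver = await new Builder().forBrowser('chrome').build();

  /*
  The code for the three actions goes here
  */
}

start("香港人") // search keyword ("Hongkongers")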
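Hard-coding the account and password in start.js makes it easy to commit them by accident. One possible alternative, sketched here, reads them from environment variables. The variable names MEWE_USERNAME and MEWE_PASSWORD and the helper loadCredentials are assumptions made for this example; they are not part of the original project.

```javascript
// Hypothetical helper: build the credentials object from environment
// variables instead of hard-coding it. The variable names are assumptions.
function loadCredentials(env = process.env) {
  const username = env.MEWE_USERNAME;
  const password = env.MEWE_PASSWORD;
  if (!username || !password) {
    throw new Error('Set MEWE_USERNAME and MEWE_PASSWORD before running the scraper');
  }
  return { username, password };
}
```

With this in place, `const C = loadCredentials();` could replace the literal object, and the script would be started with both variables set in the shell.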

The first action: open MeWe.com and automatically fill in the account and password

await driver.get('http://www.mewe.com/'); // open MeWe.com
await driver.wait(until.elementLocated(By.xpath('//*[@id="login-fake-btn"]')), 10000);
await driver.findElement(By.xpath('//*[@id="login-fake-btn"]')).click(); // open the login form
await driver.sleep(100);
await driver.findElement(By.xpath('//*[@id="email"]')).sendKeys(C.username); // fill in the email
await driver.sleep(100);
await driver.findElement(By.xpath('//*[@id="password"]')).sendKeys(C.password); // fill in the password
await driver.findElement(By.xpath('//*[@id="login-overlay"]/div/form/button')).click(); // submit the login form
// Wait up to 20 seconds for the search box (id "ember24"), which appears once login is complete
await driver.wait(until.elementLocated(By.xpath('//*[@id="ember24"]')), 20000);
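The fixed 100 ms sleeps between the steps above may be too short on a slow connection. A minimal sketch of an alternative is to wait for each field to exist before typing into it. The helper name waitAndType is hypothetical, and `until` is passed in as an argument only so that the sketch has no hidden dependencies; in start.js it would be the `until` object already imported from selenium-webdriver.

```javascript
// Hypothetical helper: wait until an element exists, then type into it.
// `driver`, `until` and `locator` are the selenium-webdriver objects
// already used in start.js; they are passed in as arguments here.
async function waitAndType(driver, until, locator, text, timeoutMs = 10000) {
  // driver.wait resolves to the located element, or rejects on timeout
  const element = await driver.wait(until.elementLocated(locator), timeoutMs);
  await element.sendKeys(text);
  return element;
}
```

Inside start, a call could look like `await waitAndType(driver, until, By.xpath('//*[@id="email"]'), C.username);`.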

The second action: use the “manual” method to bypass reCAPTCHA and complete the login

Since MeWe shows a reCAPTCHA on the login page, we need to complete it “manually” in the browser window before the last line of the code above (the wait for login to finish) times out.

Some third-party services and libraries offer to complete reCAPTCHA automatically, but I won't cover them here.
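The manual approach works because the final wait in the first action keeps checking for the search box until a timeout expires, which leaves time to solve the reCAPTCHA by hand. The polling idea can be sketched in plain Node without Selenium. The helper pollUntil is hypothetical and only illustrates the pattern; it is not how selenium-webdriver is implemented.

```javascript
// Hypothetical helper illustrating the polling idea behind a timed wait:
// call `check` repeatedly until it returns a truthy value or time runs out.
async function pollUntil(check, timeoutMs, intervalMs = 250) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const result = await check();
    if (result) return result;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    // Pause before checking again
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

If 20 seconds is not enough to solve the challenge, raising the timeout passed to driver.wait (for example to 120000) is a simple adjustment.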

The third action: fill in the search string in the MeWe search bar, and then get the search results.

start.js

...
await driver.sleep(1000);
// Type the search string into the search box and press Enter
await driver.findElement(By.xpath('//*[@id="ember24"]')).sendKeys(searchStr, Key.RETURN);
await driver.sleep(500);
// Switch to the "Groups" category of the search results
await driver.findElement(By.xpath("//div[text()='Groups']")).click();
await driver.sleep(1000);
let scrollHeight = 2407 // pixels to scroll on each step
let numberOfLoop = 5 // number of scroll steps
for (let i = 0; i < numberOfLoop; i++) {
  // Scroll the result list so that MeWe loads the next batch of groups
  await driver.executeScript("document.getElementsByClassName(\"smart-search_result smart-search_result--groups win-scrollbar\")[0].scrollBy(0, " + scrollHeight + ")")
  await driver.sleep(1500);
}
// Collect every group card inside the scrolled result list
let scrollResult = await driver.findElement(By.className('smart-search_result smart-search_result--groups win-scrollbar'))
let children = await scrollResult.findElements(By.className('smart-search_group c-mw-smart-search-group ember-view'))

In the above code, we automatically select the Groups category (support for pages will be added on GitHub later). Since MeWe only lists 30 results at a time, we need selenium-webdriver to scroll down at a fixed interval to load more results.

Each group in the search results exposes at least five pieces of data:
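The scrolling loop uses a fixed pixel offset and a fixed number of rounds. A possible variation, sketched here under assumptions, scrolls the container to its bottom and stops early once its height no longer grows, which suggests that no further results were loaded. The names SCROLL_TO_BOTTOM and scrollToLoadMore are hypothetical; only driver.executeScript and driver.sleep come from selenium-webdriver.

```javascript
// Runs in the browser: scroll the container to its bottom and report its
// scroll height so the caller can tell whether new results were appended.
const SCROLL_TO_BOTTOM = `
  const box = document.querySelector(arguments[0]);
  if (!box) return -1;
  box.scrollTo(0, box.scrollHeight);
  return box.scrollHeight;
`;

// Hypothetical helper: scroll up to maxRounds times, pausing between rounds,
// and stop as soon as the container height is unchanged.
// Returns the number of rounds that observed a new height.
async function scrollToLoadMore(driver, containerSelector, { maxRounds = 5, pauseMs = 1500 } = {}) {
  let lastHeight = -1; // also equals the "container not found" value, so we stop at once
  for (let round = 0; round < maxRounds; round++) {
    const height = await driver.executeScript(SCROLL_TO_BOTTOM, containerSelector);
    if (height === lastHeight) return round; // nothing new was loaded
    lastHeight = height;
    await driver.sleep(pauseMs); // give the page time to load the next batch
  }
  return maxRounds;
}
```

For the result list above, the selector would be '.smart-search_result.smart-search_result--groups.win-scrollbar', built from the same class names used in the article's code.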

  • Group avatar
  • Group name
  • Group link
  • Number of people in the group
  • Group type (public/non-public)

So we can loop over the results to collect this data:

// Read the five fields from one group card
const readGroup = async (webEle) => {
  let imgDom = await webEle.findElement(By.className("profile_img usr-avatar-small"))
  let aDom = await webEle.findElement(By.className("smart-search-group_img ember-view"))
  let titleDom = await webEle.findElement(By.className("h-trim ember-view"))
  let numberOfMemberDom = await webEle.findElement(By.className("smart-search-group_members"))
  let groupType = await webEle.findElement(By.className("h-flex_center_x_y"))
  return {
    url: await driver.executeScript("return arguments[0].attributes['href'].value", aDom), // group (join) link
    imageSrc: await driver.executeScript("return arguments[0].attributes['src'].value", imgDom), // avatar image src
    title: await driver.executeScript("return arguments[0].innerText", titleDom), // group title
    numberOfMember: parseInt(await driver.executeScript("return arguments[0].innerText.split(\"Members (\")[1].split(\")\")[0]", numberOfMemberDom), 10), // number of members
    public: await driver.executeScript("return arguments[0].innerText", groupType) === "Join Group", // true for a public group
    description: "",
    country: "",
    category: "",
    subCategory: "",
  }
}

try {
  // Promise.all keeps the results in page order and rejects if any card fails
  let jsonArr = await Promise.all(children.map(readGroup))
  await fs.promises.writeFile('example.json', JSON.stringify(jsonArr), 'utf8')
} catch (err) {
  console.log(err)
} finally {
  await driver.quit() // close the browser when the scraper is done
}
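The member count is extracted by splitting the text around "Members (" and ")", which throws if the text ever has a different shape. A slightly more defensive sketch does the same parsing in Node with a regular expression. The helper name parseMemberCount is hypothetical, and the handling of thousands separators such as "1,234" is an assumption, since the article's examples only show plain numbers.

```javascript
// Hypothetical helper: extract the member count from text such as
// "Members (816)". Returns null instead of throwing when the text
// does not match the expected shape.
function parseMemberCount(text) {
  const match = /Members \(([\d,]+)\)/.exec(String(text));
  if (!match) return null;
  // Drop thousands separators (an assumption) before converting to a number
  return parseInt(match[1].replace(/,/g, ''), 10);
}
```

To use it, the script would return the element's innerText from executeScript and pass that string to parseMemberCount.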

Once the data has been collected, it is serialized to JSON and written to example.json in our project folder.

Example results:

[
{
"url": "/group/5fc36bfa7f1d500f69a484be",
"imageSrc": "https://img.mewe.com/api/v2/group/5fc36bfa7f1d500f69a484be/public-image/5fcc51ebda6a0364ec119b82/400x400/img",
"title": "香港人里數/獎賞/旅遊分享區",
"numberOfMember": 816,
"public": false,
"description": "",
"country": "",
"category": "",
"subCategory": ""
},
{
"url": "/group/5fbe3d3ec057695a0a69610a",
"imageSrc": "https://img.mewe.com/api/v2/group/5fbe3d3ec057695a0a69610a/public-image/5fbe3d3e67b8dd74597c2ace/400x400/img",
"title": "香港人@英國💛互助圈",
"numberOfMember": 263,
"public": false,
"description": "",
"country": "",
"category": "",
"subCategory": ""
}
]
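Note that the url field in these results is a relative path, while imageSrc is already absolute. If absolute links are needed, a small sketch using Node's built-in URL class can resolve them. The base URL https://mewe.com and the helper name toAbsoluteUrl are assumptions for this example.

```javascript
// Assumed base URL for resolving relative group links
const MEWE_BASE_URL = 'https://mewe.com';

// Hypothetical helper: resolve a relative path against the base URL.
// Values that are already absolute are returned unchanged.
function toAbsoluteUrl(path, base = MEWE_BASE_URL) {
  return new URL(path, base).href;
}
```

For example, mapping over the parsed array with `group => ({ ...group, url: toAbsoluteUrl(group.url) })` would rewrite every group link.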

GitHub link

If you want to contribute, you can go to my GitHub project and share your thoughts😜.

MoMoWongHK/mewe-scraper

