
Developing a massive scraper and search engine that let users search more than 200 million people

Wolfy let users quickly find contact information for targeted prospects. Users just entered their ideal customer profile and received verified emails to feed their outbound email campaigns. JP and Santiago were working together at a prestigious software agency when they decided to set out on their own adventure. They started building Wolfy at the same time as Eagerworks, with the aim of walking the same path as other technology entrepreneurs. They believed that the experience of being product founders would make them more empathetic with their customers and, therefore, a better agency.
“With Wolfy, we really understood how creating a product is much more than just writing code. Thanks to having launched our own tech startup, we are a better agency today.”

Juan Pablo Balarini, CTO

Team

3 Developers

Services

Inception - Evolve

Architecture

Massive scraping with scalable search engine

Tech

Ruby on Rails & Python

Challenge

The idea behind Wolfy was to use all the public data that’s already available on the internet to create a search engine where people could find information about companies and their employees.
One of Wolfy’s main technical challenges was to create a massive scraper that could process all the information we needed. It also had to run periodically and cost-efficiently, in order to keep Wolfy’s information up to date.
Another important requirement was to find a person’s work email address, given their name and the company where they worked, while being polite to email service providers.
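The case study does not describe how addresses were actually found or verified, so the following is only a minimal sketch of one common approach: generate likely address patterns from a name and a company domain, and space out any checks per domain to stay polite. The function and class names, the pattern list, and its ordering are all assumptions made for illustration.

```python
from __future__ import annotations

import re
import unicodedata


def _slug(part: str) -> str:
    """Lowercase, strip accents, and keep only ASCII letters."""
    ascii_part = unicodedata.normalize("NFKD", part).encode("ascii", "ignore").decode()
    return re.sub(r"[^a-z]", "", ascii_part.lower())


def candidate_emails(first_name: str, last_name: str, domain: str) -> list[str]:
    """Return plausible work addresses (hypothetical pattern list and order)."""
    first, last = _slug(first_name), _slug(last_name)
    if not first or not last:
        return []
    local_parts = [
        f"{first}.{last}",
        f"{first}{last}",
        f"{first[0]}{last}",
        first,
        f"{first}_{last}",
        f"{last}.{first}",
    ]
    # dict.fromkeys removes duplicates while preserving order.
    return [f"{lp}@{domain.lower()}" for lp in dict.fromkeys(local_parts)]


class DomainThrottle:
    """Tracks when the next check against each domain is allowed."""

    def __init__(self, min_interval_s: float) -> None:
        self.min_interval_s = min_interval_s
        self._next_allowed: dict[str, float] = {}

    def wait_time(self, domain: str, now: float) -> float:
        """Seconds the caller should wait before checking `domain` again."""
        return max(0.0, self._next_allowed.get(domain, 0.0) - now)

    def record(self, domain: str, now: float) -> None:
        """Note that a check against `domain` happened at time `now`."""
        self._next_allowed[domain] = now + self.min_interval_s
```

A caller would walk the candidate list in order, ask the throttle how long to wait before each check, and stop at the first address that its verification step accepts.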

We also faced the challenge of finding product/market fit. Wolfy targeted the Latin American market, and we found that most companies had to change at least part of their sales pipeline before they could use a product like Wolfy. They relied on traditional methods, such as buying outdated databases and making phone calls, rather than more modern approaches like cold emailing. This made the sales process difficult, because we first had to show customers that they could get better results by introducing new tools and processes into their sales pipeline.

Solution

We designed and implemented a web application where companies could apply advanced filters to find their target prospects. This was powered by a massive scraper and search engine that could handle complex, real-time queries.
To obtain all the information we needed, we had to create a massively parallel scraping architecture. We built the scraper with Python and Scrapy, a combination that allows for quick iterations and has a great community behind it.
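As a rough illustration of how parallelism is usually tuned in a Scrapy project, here is a sketch of a settings fragment. The setting names are standard Scrapy settings, but every value is an assumption chosen for the example, not Wolfy's actual configuration.

```python
# settings.py for a Scrapy project (all values are illustrative assumptions)

CONCURRENT_REQUESTS = 64             # total in-flight requests per Scrapy process
CONCURRENT_REQUESTS_PER_DOMAIN = 8   # cap on parallel requests to one target site
DOWNLOAD_DELAY = 0.25                # base delay (seconds) between requests to a site

# AutoThrottle adapts the delay to how quickly each site responds.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 30.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 4.0

RETRY_TIMES = 3                      # retries for failed requests
ROBOTSTXT_OBEY = True                # respect robots.txt rules
```

Scaling beyond one machine would then typically mean running many such processes side by side, each working on its own slice of the URLs to crawl.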

The backend was implemented in Ruby on Rails, which served as an API for our frontend. For the frontend we used Angular, since a Single Page Application fit our use case perfectly: fewer than 10 screens, with a lot of expected user interaction on and between them. In retrospect, it was a great choice, since the web application performed almost as fast as a desktop application, with no lag between page changes. The code also turned out to be easy to maintain, since everything was divided into reasonably small components.
In order to handle the amount of data that Wolfy had to query and to improve speed, we had to design a sharded database using PostgreSQL. Each shard was responsible for handling data related to one specific country.
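The idea of one shard per country can be sketched as a small routing table that maps a country code to the database holding its data. The backend itself was written in Ruby on Rails; Python is used here only for illustration, and the country codes, host names, and function names are hypothetical.

```python
from __future__ import annotations

# Hypothetical mapping from ISO country code to that country's PostgreSQL shard.
SHARD_DSNS: dict[str, str] = {
    "AR": "postgresql://shard-ar.internal/wolfy",
    "BR": "postgresql://shard-br.internal/wolfy",
    "UY": "postgresql://shard-uy.internal/wolfy",
}


def shard_for(country_code: str) -> str:
    """Return the connection string of the shard that owns `country_code`."""
    code = country_code.strip().upper()
    try:
        return SHARD_DSNS[code]
    except KeyError:
        raise ValueError(f"no shard configured for country {code!r}") from None


def shards_for_query(country_codes: list[str]) -> list[str]:
    """Return the distinct shards a multi-country search must query, in order."""
    return list(dict.fromkeys(shard_for(code) for code in country_codes))
```

With this layout, a search restricted to a single country touches only one database, while a search across several countries fans out to the shards returned by `shards_for_query` and merges the results.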
In order to improve scraping speed, we used a pool of thousands of IPs that were rotated between scrapers using proxies.
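A minimal sketch of such a rotation, assuming simple round-robin selection and a way to skip proxies that have been blocked, could look like the following. The class name and the banning policy are assumptions; the text above does not say how the rotation was implemented.

```python
from __future__ import annotations

import itertools


class ProxyRotator:
    """Hands out proxies round-robin, skipping any that were marked as banned."""

    def __init__(self, proxies: list[str]) -> None:
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._proxies = list(proxies)
        self._cycle = itertools.cycle(self._proxies)
        self._banned: set[str] = set()

    def ban(self, proxy: str) -> None:
        """Stop handing out a proxy, for example after repeated blocks."""
        self._banned.add(proxy)

    def next_proxy(self) -> str:
        """Return the next usable proxy in rotation order."""
        # One full lap over the pool is enough to find a usable proxy, if any.
        for _ in range(len(self._proxies)):
            proxy = next(self._cycle)
            if proxy not in self._banned:
                return proxy
        raise RuntimeError("all proxies are banned")
```

In Scrapy, the chosen proxy is normally attached to each request through `request.meta["proxy"]`, usually from a downloader middleware, so a rotator like this one would be called there.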

Outcome

We launched a platform where users could search more than 200 million people and their work emails, using data publicly obtained from the internet.
Wolfy allowed users to apply simple filters to search over millions of data points and find their target audience. We designed and implemented a massive data pipeline that periodically scraped millions of web pages to keep Wolfy’s information up to date. Our marketing strategy was to offer users a free trial and, once they had tried the product, a recurring subscription plan paid by credit card.
It was our first time raising capital, and we learned when to raise private funding vs public capital, how to make a good and effective pitch, and what’s important to an investor. Now we can share that experience with entrepreneurs and startups.
Another enriching experience was defining our business model and choosing between B2C and B2B. We went for the B2B model after concluding it would be the most efficient way to scale. This experience is key to helping entrepreneurs decide which option is most suitable for their project.
Wolfy was one of the first products that we crafted at Eagerworks, where we took an idea and turned it into a product that customers were paying for. We learned many lessons that, to this day, we are happy and proud to apply to our customers’ products.
