Parse petabytes of data from CommonCrawl in seconds
CommonCrawl is a non-profit organization that crawls millions of websites every month and stores all of the data on Amazon S3. In this post we'll look at how to use Amazon Athena to retrieve the URLs of every website that CommonCrawl has crawled.
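Here is a minimal sketch of that workflow in Python. It assumes the Parquet URL index that CommonCrawl publishes under s3://commoncrawl/cc-index/table/cc-main/warc/, partitioned by crawl and subset. The database and table name `ccindex`, the results bucket `s3://my-athena-results/`, the helper names, and the two-column schema are all hypothetical choices for illustration; check CommonCrawl's index-table documentation for the full column list. The functions take an already-created boto3 Athena client, so the SQL helpers can be tried without AWS access.

```python
from __future__ import annotations

import re
import time

# Hypothetical names: choose your own database, table and results bucket.
DATABASE = "ccindex"
TABLE = "ccindex"
RESULTS_S3 = "s3://my-athena-results/"

CREATE_DB_SQL = f"CREATE DATABASE IF NOT EXISTS {DATABASE}"

# Trimmed-down schema for CommonCrawl's columnar URL index (Parquet files).
# By default Athena maps Parquet columns by name, so unlisted columns are not read.
CREATE_TABLE_SQL = f"""CREATE EXTERNAL TABLE IF NOT EXISTS {TABLE} (
  url STRING,
  url_host_name STRING
)
PARTITIONED BY (crawl STRING, subset STRING)
STORED AS PARQUET
LOCATION 's3://commoncrawl/cc-index/table/cc-main/warc/'"""

# Registers the crawl=.../subset=... partition folders found under LOCATION.
REPAIR_SQL = f"MSCK REPAIR TABLE {TABLE}"

# Crawl ids look like CC-MAIN-<year>-<week>.
CRAWL_ID = re.compile(r"CC-MAIN-\d{4}-\d{2}")


def urls_query(crawl: str, limit: int = 100) -> str:
    """Build a query for the URLs of a single crawl.

    Filtering on the partition columns (crawl, subset) lets Athena read only
    that crawl's files instead of scanning the whole index.
    """
    if not CRAWL_ID.fullmatch(crawl):  # also keeps stray quotes out of the SQL
        raise ValueError(f"unexpected crawl id: {crawl!r}")
    return (
        f"SELECT url FROM {TABLE} "
        f"WHERE crawl = '{crawl}' AND subset = 'warc' "
        f"LIMIT {int(limit)}"
    )


def execute(athena, sql: str, database: str = DATABASE) -> str:
    """Start a query, poll until it finishes, and return its execution id."""
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": RESULTS_S3},
    )["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=qid)[
            "QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)
    if status["State"] != "SUCCEEDED":
        raise RuntimeError(
            f"query {qid} {status['State']}: {status.get('StateChangeReason', '')}")
    return qid


def fetch_first_column(athena, qid: str) -> list[str]:
    """Collect the first column of a SELECT result, skipping the header row."""
    values = []
    pages = athena.get_paginator("get_query_results").paginate(QueryExecutionId=qid)
    for page in pages:
        for row in page["ResultSet"]["Rows"]:
            # NULL cells come back without a VarCharValue key.
            values.append(row["Data"][0].get("VarCharValue", ""))
    return values[1:]  # Athena returns the column names as the first row
```

With AWS credentials configured, you would create a client with `boto3.client("athena", region_name="us-east-1")` (the commoncrawl bucket is hosted in us-east-1), run `execute(athena, CREATE_DB_SQL, database="default")`, then `CREATE_TABLE_SQL` and `REPAIR_SQL` once, and finally `fetch_first_column(athena, execute(athena, urls_query("CC-MAIN-2018-05")))`, where the crawl id is just an example. Keeping the `LIMIT` and the partition filter is what makes the query return quickly; dropping them asks Athena to scan far more data, and Athena bills by bytes scanned.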