Tuesday 3 May 2016

Hadoop Datasets for Practising

To practise Hadoop, you can use the sources below to get or generate big data (several GB), so that you get a real feel for the power of Hadoop.

1. clearbits.net


From clearbits.net you can get the quarterly full data dump of Stack Exchange to use while practising Hadoop. It contains around 10 GB of data.

2. grouplens.org

grouplens.org has collected several rating data sets that you can use for practising Hadoop.

If you have Hadoop installed on your machine, you can use the following two commands to generate data.

3. hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar randomwriter /random-data


Generates 10 GB of data per node (with the default settings) under the folder /random-data in HDFS.
                 

4. hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar randomtextwriter /random-text-data

Generates 10 GB of textual data per node (with the default settings) under the folder /random-text-data in HDFS.
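Once either job finishes, it can help to confirm how much data actually landed in HDFS. Here is a minimal sketch, assuming the `hadoop` client is on your PATH and that your release supports the `-s` and `-h` flags of `fs -du` (older releases use `-dus` instead). The function name `check_generated_data` is made up for illustration.

```shell
# Print the total size of each HDFS folder passed as an argument.
# check_generated_data is a hypothetical helper name, not part of Hadoop.
check_generated_data() {
  if ! command -v hadoop >/dev/null 2>&1; then
    echo "hadoop client not found on PATH"
    return 1
  fi
  for dir in "$@"; do
    # -s summarises the whole folder, -h prints human-readable sizes
    hadoop fs -du -s -h "$dir"
  done
}

# The two folders used by the commands above
check_generated_data /random-data /random-text-data || true
```

If the reported size is far from what you expected, check how many nodes actually ran the job.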

The path of hadoop-examples.jar may differ depending on your Hadoop installation.
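If the jar is not at the path shown above, you can search for it. The sketch below assumes the jar file name starts with `hadoop` and contains `examples`, which is common but not guaranteed. It also assumes /usr/lib is a sensible place to start looking. The function name `find_examples_jar` is made up for illustration.

```shell
# Print the first examples jar found under the given directory.
# find_examples_jar is a hypothetical helper name; the default root
# /usr/lib is an assumption, so pass your own install directory if needed.
find_examples_jar() {
  root="${1:-/usr/lib}"
  matches="$(find "$root" -type f -name 'hadoop*examples*.jar' 2>/dev/null || true)"
  printf '%s\n' "$matches" | head -n 1
}

EXAMPLES_JAR="$(find_examples_jar /usr/lib)"
if [ -n "$EXAMPLES_JAR" ]; then
  echo "Found: $EXAMPLES_JAR"
  echo "Run:   hadoop jar $EXAMPLES_JAR randomwriter /random-data"
else
  echo "No examples jar found under /usr/lib; try your Hadoop install directory"
fi
```

The script only prints the command to run, so you can check the jar path before starting a job that writes several GB.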

5. Amazon provides many public data sets that you can use.

6. Check the answers to the same question on Stack Overflow.

7. The University of Waikato makes many data sets available for practising machine learning.

8. See the answers to a similar question on Quora.

I hope this is useful for you.
1. Airline Dataset Project
http://www.stat.purdue.edu/~sguha/rhipe/doc/html/airline.html
2. GBs of airline data
https://github.com/0xdata/h2o/wiki/Hacking-Airline-DataSet-with-H2O
3. SFO airline sample data
http://www.flysfo.com/web/page/about/news/pressres/airtrafficdata.html
4. Online data storage
http://datahub.io/en/
5. Lots of government data
http://data.gov.uk/
6. http://blog.gopivotal.com/news-2/20-examples-of-getting-results-with-big-data
7. US weather data, complete from 1990 to 2013
http://ftp3.ncdc.noaa.gov/pub/data/noaa/


If you know of any free data sets, please share them in the comments.
