Hive data types in Hadoop

In this article you will see the different data types that are used when creating Hive tables.
Hive data types are divided into numeric types, string types, date/time types, miscellaneous types, and complex types.
Numeric Types:
TINYINT : 1-byte signed integer, from -128 to 127
SMALLINT : 2-byte signed integer, from -32,768 to 32,767
INT : 4-byte signed integer, from -2,147,483,648 to 2,147,483,647
BIGINT : 8-byte signed integer, from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
FLOAT : 4-byte single-precision floating-point number
DOUBLE : 8-byte double-precision floating-point number
DECIMAL : exact fixed-point number with configurable precision and scale
String Types:
STRING
String literals can be specified with either single quotes or double quotes. Along with STRING, Hive provides two related string types, VARCHAR and CHAR:
VARCHAR : variable-length string with a declared maximum length
CHAR : fixed-length string, padded with trailing spaces
Misc Types
BOOLEAN
BINARY
Date/Time Types
TIMESTAMP
It supports the traditional UNIX timestamp with optional nanosecond precision, and accepts java.sql.Timestamp-style strings in the format "yyyy-mm-dd hh:mm:ss.fffffffff".
DATE
DATE values describe a particular year/month/day, in the form YYYY-MM-DD.
Complex Types
maps : MAP
arrays : ARRAY
structs : STRUCT
union : UNIONTYPE
A UNIONTYPE is a collection of heterogeneous data types; an instance can be created with the create_union UDF.
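As a quick illustration, here is a minimal sketch of a CREATE TABLE statement that exercises several of these types (the table and column names are invented for this example):

CREATE TABLE employee (
  id      INT,
  name    STRING,
  salary  DECIMAL(10,2),
  active  BOOLEAN,
  hired   DATE,
  skills  ARRAY<STRING>,
  phones  MAP<STRING,STRING>,
  address STRUCT<street:STRING, city:STRING, zip:INT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY ':';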

What is Hive in Hadoop

Hive is a data warehouse infrastructure tool to process structured data in Hadoop.
It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
Hive was initially developed by Facebook; later the Apache Software Foundation took it up and developed it further as open source under the name Apache Hive. It is used by many companies. For example, Amazon uses it in Amazon Elastic MapReduce.
Hive is not:
⦁ A relational database
⦁ A design for OnLine Transaction Processing (OLTP)
⦁ A language for real-time queries and row-level updates

Features of Hive:
⦁ It stores schema in a database and processed data in HDFS.
⦁ It is designed for OLAP.
⦁ It provides an SQL-type language for querying called HiveQL or HQL.
⦁ It is familiar, fast, scalable, and extensible.
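For a quick taste, a HiveQL query reads almost exactly like standard SQL; the table and columns below are hypothetical:

SELECT department, COUNT(*) AS emp_count
FROM employee
GROUP BY department;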

Important Hadoop commands

Hadoop commands
$hadoop version ==> print the version
$hadoop namenode -format ==> format the DFS filesystem
$hadoop secondarynamenode ==> run the DFS secondary namenode
$hadoop namenode ==> run the DFS namenode
$hadoop datanode ==> run a DFS datanode
$hadoop jobtracker ==> run the MapReduce job tracker node
$hadoop tasktracker ==> run a MapReduce task tracker node
$hadoop jar ==> run a jar file
$hadoop fs ==> run a generic filesystem user client
$hadoop fs -ls / ==> list files and directories
$hadoop fs -mkdir /myhadoop ==> create a directory
$hadoop fs -mkdir /test
$hadoop fs -mkdir /srujan
$hadoop fs -ls /
$hadoop fs -lsr / ==> list files and directories recursively
$hadoop fs -cp ==> copy; source and destination are both in HDFS
$hadoop fs -mv ==> move; source and destination are both in HDFS
$hadoop fs -rm [-skipTrash] ==> remove files
$hadoop fs -rmr [-skipTrash] ==> remove files, including subdirectories
$hadoop fs -expunge ==> empty the trash
$hadoop fs -put ==> copy from local to HDFS
$hadoop fs -copyFromLocal ==> copy from local to HDFS
$hadoop fs -moveFromLocal ==> move from local to HDFS
$hadoop fs -get ==> copy from HDFS to local
$hadoop fs -copyToLocal ==> copy from HDFS to local
$hadoop fs -moveToLocal ==> move from HDFS to local
$hadoop fs -cat ==> display a file's contents as-is
$hadoop fs -text ==> convert to text format and display
$hadoop fs -touchz ==> create an empty file
$hadoop fs -tail -f ==> read lines from the end of a file
$hadoop fs -test -[ezd] ==> test whether a path exists, is zero-length, or is a directory
$hadoop fs -getmerge [addnl] ==> merge files and copy the result to the local filesystem
$hadoop fs -chown OWNER:GROUP ==> change the owner
$hadoop fs -chown -R OWNER:GROUP ==> change the owner, including contents
$hadoop fs -chgrp GROUP ==> change the group
$hadoop fs -chgrp -R GROUP ==> change the group, including contents
$hadoop fs -help [cmd] ==> help on commands
$hadoop fs -du ==> disk usage
$hadoop fs -dus ==> disk usage summary (including subdirectories and files)
$hadoop fs -count [-q] ==> count the number of files and directories
$hadoop fs -setrep [-R] [-w] ==> set the replication factor for a file/directory
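As a quick worked example, a typical copy-in, inspect, copy-out session might look like this (the file and directory names are placeholders):
$hadoop fs -mkdir /myhadoop
$hadoop fs -put words.txt /myhadoop/
$hadoop fs -ls /myhadoop
$hadoop fs -cat /myhadoop/words.txt
$hadoop fs -get /myhadoop/words.txt /tmp/words-copy.txt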

Hadoop MapReduce program for word replace

Directory structure of WordReplace Program

WordReplaceDriver.java
package com.javatechnical.hadoop.WordReplace;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordReplaceDriver implements Tool{
 
 Configuration conf;

 @Override
 public Configuration getConf() {
  return conf;
 }

 @Override
 public void setConf(Configuration conf) {
  this.conf=conf;
 }

 @Override
 public int run(String[] args) throws Exception {
  
  Job wordReplaceJob = new Job(conf);
  
  wordReplaceJob.setJobName("Word Replace Test");
  
  wordReplaceJob.setJarByClass(this.getClass());
  
  wordReplaceJob.setMapperClass(WordReplaceMapper.class);  
  
  // no reducers: this is a map-only job
  wordReplaceJob.setNumReduceTasks(0);
  
  
  wordReplaceJob.setMapOutputKeyClass(Text.class);
  
  wordReplaceJob.setMapOutputValueClass(NullWritable.class);
  
  
  wordReplaceJob.setInputFormatClass(TextInputFormat.class);
  
  wordReplaceJob.setOutputFormatClass(TextOutputFormat.class);
  
  Path inputPath = new Path(args[0]);
  
  Path outputPath = new Path(args[1]);
  
  FileInputFormat.addInputPath(wordReplaceJob, inputPath);
  
  FileOutputFormat.setOutputPath(wordReplaceJob, outputPath);
  
  // remove any stale output directory from a previous run
  FileSystem fileSystem = outputPath.getFileSystem(conf);
  
  fileSystem.delete(outputPath, true);
  
  int result = wordReplaceJob.waitForCompletion(true)?0:-1; 
  
  return result;
 }
 
 public static void main(String[] args) throws Exception {
  
  Configuration conf = new Configuration();
  
  conf.set("old.word","java");
  
  conf.set("new.word","kava"); 
  
  
  int status = ToolRunner.run(conf, new WordReplaceDriver(), args);
  
  System.out.println("Status : "+status);
  
 }

}
WordReplaceMapper.java
package com.javatechnical.hadoop.WordReplace;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordReplaceMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
 
 private final String OLD_WORD="old.word";
 
 private final String NEW_WORD="new.word";
 
 String oldWord;
 String newWord;
 
 @Override
 protected void setup(Context context)
   throws IOException, InterruptedException {
  
  Configuration conf = context.getConfiguration();
  
   oldWord = conf.get(OLD_WORD);
  
   newWord = conf.get(NEW_WORD); 
  
 }

 @Override
 protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
  
  
  String line = value.toString();

  if (line.contains(oldWord)) {
   // note: replaceAll() treats oldWord as a regular expression
   line = line.replaceAll(oldWord, newWord);
  }

  context.write(new Text(line), NullWritable.get());

 }
}
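To launch the job, a typical invocation would look like the sketch below (the jar name and HDFS paths are placeholders). Because the driver runs through ToolRunner, the generic -D options can also override the old.word/new.word values set in main():

$hadoop jar wordreplace.jar com.javatechnical.hadoop.WordReplace.WordReplaceDriver /input/words.txt /output/wordreplace
$hadoop jar wordreplace.jar com.javatechnical.hadoop.WordReplace.WordReplaceDriver -Dold.word=java -Dnew.word=kava /input/words.txt /output/wordreplace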
Input file is:
linux java hadoop dba
unix sqoop123 sql
mysql hibernet string class
abstract interface implements
implements inheritance java c++
linux java hadoop dba
unix sqoop123 sql
mysql hibernet string class
abstract interface implements
implements inheritance java c++
linux java hadoop dba
unix sqoop123 sql
mysql hibernet string class
abstract interface implements
implements inheritance java c++
linux java hadoop dba
unix sqoop123 sql
mysql hibernet string class
abstract interface implements
implements inheritance java c++
linux java hadoop dba
unix sqoop123 sql
mysql hibernet string class
abstract interface implements
implements inheritance java c++
Output:
linux kava hadoop dba
unix sqoop123 sql
mysql hibernet string class
abstract interface implements
implements inheritance kava c++
linux kava hadoop dba
unix sqoop123 sql
mysql hibernet string class
abstract interface implements
implements inheritance kava c++
linux kava hadoop dba
unix sqoop123 sql
mysql hibernet string class
abstract interface implements
implements inheritance kava c++
linux kava hadoop dba
unix sqoop123 sql
mysql hibernet string class
abstract interface implements
implements inheritance kava c++
linux kava hadoop dba
unix sqoop123 sql
mysql hibernet string class
abstract interface implements
implements inheritance kava c++

MapReduce program to find word length count in Hadoop

Directory structure of Hadoop Word Length Count

WordLengthCountMapper.java
package com.javatechnical.hadoop.lengthcount;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordLengthCountMapper extends Mapper<LongWritable, Text, LongWritable, LongWritable> {

 @Override
 protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

  String line = value.toString();

  String[] words = line.split(" ");

  // emit (word length, 1) for every word in the line
  for (String word : words) {

   context.write(new LongWritable(word.length()), new LongWritable(1));

  }

 }

}
WordLengthCountReducer.java
package com.javatechnical.hadoop.lengthcount;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordLengthCountReducer extends Reducer<LongWritable, LongWritable, LongWritable, LongWritable> {

 @Override
 protected void reduce(LongWritable key, Iterable<LongWritable> values, Context context)
   throws IOException, InterruptedException {
  
  long sum=0;
  
  for (LongWritable value : values) {
   
   sum=sum+value.get();
   
  }
  context.write(key, new LongWritable(sum));
  
 }

}
WordLengthCountDriver.java
package com.javatechnical.hadoop.lengthcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;


public class WordLengthCountDriver implements Tool {
 
 Configuration  conf;

 @Override
 public Configuration getConf() {
  return conf;
 }

 @Override
 public void setConf(Configuration conf) {
  this.conf=conf;
  
 }

 @Override
 public int run(String[] args) throws Exception {
  
  Job wordLengthCountJob = new Job(conf);
  
  wordLengthCountJob.setJobName("Word Length CountTest");
  
  wordLengthCountJob.setJarByClass(this.getClass());
  
  wordLengthCountJob.setMapperClass(WordLengthCountMapper.class);
  
  wordLengthCountJob.setReducerClass(WordLengthCountReducer.class);
  
  wordLengthCountJob.setMapOutputKeyClass(LongWritable.class);
  
  wordLengthCountJob.setMapOutputValueClass(LongWritable.class);
  
  wordLengthCountJob.setOutputKeyClass(LongWritable.class);
  
  wordLengthCountJob.setOutputValueClass(LongWritable.class);
  
  wordLengthCountJob.setInputFormatClass(TextInputFormat.class);
  
  wordLengthCountJob.setOutputFormatClass(TextOutputFormat.class);
  
  Path inputPath = new Path(args[0]);
  
  Path outputPath = new Path(args[1]);
  
  
  FileInputFormat.addInputPath(wordLengthCountJob, inputPath);
  
  FileOutputFormat.setOutputPath(wordLengthCountJob, outputPath);
  
  FileSystem fs = outputPath.getFileSystem(conf);
  
  fs.delete(outputPath, true);
  
  
  int result = wordLengthCountJob.waitForCompletion(true)?0:-1;
  
  
  return result;
 }
 
 public static void main(String[] args) throws Exception {
  
  Configuration  conf = new Configuration();
  
 int status = ToolRunner.run(conf, new WordLengthCountDriver(), args);
 
 System.out.println("Gender Count Status "+status);
  
 }

}
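One optional tweak that is not in the listing above: since the reducer only sums longs, and its input and output types match, the same class can also be registered as a combiner to shrink the data shuffled between map and reduce. A single extra line in run() would do it:

  wordLengthCountJob.setCombinerClass(WordLengthCountReducer.class);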
Input file is:
linux java unix jsp servlet
hadoop pig sqoop hive hbase
java hadoop linux jsp html
linux sqoop html hive hbase
spring jsp hibernate linux java
sqoop hive servlet hadoop unix
linux java unix servlet
hadoop pig sqoop hive hbase
java hadoop linux jsp html
linux sqoop html hive hbase
Output:
3 6
4 17
5 16
6 6
7 3
9 1

Word count program with MapReduce in Hadoop

Directory structure for Word count program in Hadoop

WordCountMapper.java
package com.javatechnical.WordCount;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable>{
 @Override
 protected void map(LongWritable key, Text value,Context context)
   throws IOException, InterruptedException {
  String line=value.toString();
  String [] words=line.split(" ");
  for (String word : words) {
   context.write(new Text(word), new LongWritable(1));
  }
  
 }

}
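A small caveat about this mapper: line.split(" ") treats every single space as a separator, so runs of spaces or tabs produce empty "words" that get counted. A slightly more robust variant (my suggestion, not part of the original program) is:

  // split on any run of whitespace instead of a single space
  String[] words = line.trim().split("\\s+");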
WordCountReducer.java
package com.javatechnical.WordCount;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
 @Override
 protected void reduce(Text key, Iterable<LongWritable> values, Context context)
   throws IOException, InterruptedException {
  long sum=0;
  for (LongWritable value : values) {
   sum=sum+value.get();
  }
  context.write(key,new LongWritable(sum));
 }

}
WordCountDriver.java
package com.javatechnical.WordCount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountDriver implements Tool {
 Configuration conf;

 @Override
 public Configuration getConf() {
  return conf;
 }

 @Override
 public void setConf(Configuration conf) {
  this.conf=conf;
  
 }

 @Override
 public int run(String[] args) throws Exception {
  // pass the configuration so that settings applied via ToolRunner take effect
  Job wordCountJob=new Job(conf);
  wordCountJob.setJobName("Word Count Job");
  wordCountJob.setJarByClass(getClass());
  wordCountJob.setMapperClass(WordCountMapper.class);
  wordCountJob.setReducerClass(WordCountReducer.class);
  wordCountJob.setMapOutputKeyClass(Text.class);
  wordCountJob.setMapOutputValueClass(LongWritable.class);
  wordCountJob.setOutputKeyClass(Text.class);
  wordCountJob.setOutputValueClass(LongWritable.class);
  wordCountJob.setInputFormatClass(TextInputFormat.class);
  wordCountJob.setOutputFormatClass(TextOutputFormat.class);
  Path inputPath=new Path(args[0]);
  Path outputPath=new Path(args[1]);
  FileInputFormat.addInputPath(wordCountJob, inputPath);
  FileOutputFormat.setOutputPath(wordCountJob, outputPath);
  // remove any stale output directory from a previous run
  FileSystem fs =outputPath.getFileSystem(conf);
  fs.delete(outputPath, true);
  int result=wordCountJob.waitForCompletion(true)?0:-1;
  return result;
 }
public static void main(String[] args)throws Exception {
 Configuration conf=new Configuration();
 int status=ToolRunner.run(conf, new WordCountDriver(), args);
 System.out.println("Word Count Status : "+status);
}
}
Input file is:
linux java hadoop dba123 sravan gvaspi
unix sqoop123 sql
mysql hibernet string class
abstract interface implements
implements inheritance java c++
linux java hadoop dba
unix sqoop123 sql
mysql hibernet string class
abstract interface implements
implements inheritance java c++
linux java hadoop dba
unix sqoop123 sql
mysql hibernet string class
abstract interface implements
implements inheritance java c++
linux java hadoop dba
unix sqoop123 sql
mysql hibernet string class
abstract interface implements
implements inheritance java c++
linux java hadoop dba
unix sqoop123 sql
mysql hibernet string class
abstract interface implements
implements inheritance java c++
Output:
abstract 5
c++ 5
class 5
dba 4
dba123 1
gvaspi 1
hadoop 5
hibernet 5
implements 10
inheritance 5
interface 5
java 10
linux 5
mysql 5
sql 5
sqoop123 5
sravan 1
string 5
unix 5

MapReduce grep program in Hadoop

Project structure of the Hadoop grep program:

GrepMapper.java
package com.javatechnical.hadoop.grep;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class GrepMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

 @Override
 protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
  
  
 String line = value.toString();

 // keep only lines that contain the hardcoded search word
 if(line.contains("java"))
 {
  context.write(new Text(line), NullWritable.get());
 }
  
  
 }
}
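The search word "java" is hardcoded in this mapper. A hedged sketch of making it configurable, reusing the Configuration-driven setup() pattern from the WordReplace mapper above (the key name grep.word is invented for this sketch), would be:

 private String searchWord;

 @Override
 protected void setup(Context context) throws IOException, InterruptedException {
  // read the search word from the job configuration, defaulting to "java"
  searchWord = context.getConfiguration().get("grep.word", "java");
 }

The map() method would then test line.contains(searchWord) instead of the literal.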
GrepDriver.java
package com.javatechnical.hadoop.grep;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class GrepDriver implements Tool{
 
 Configuration conf;

 @Override
 public Configuration getConf() {
  return conf;
 }

 @Override
 public void setConf(Configuration conf) {
  this.conf=conf;
 }

 @Override
 public int run(String[] args) throws Exception {
  
  Job grepJob = new Job(conf);
  
  grepJob.setJobName("Grep Test");
  grepJob.setJarByClass(this.getClass());
  
  grepJob.setMapperClass(GrepMapper.class);
  
  // map-only job: with zero reducers, the mapper output is written directly
  grepJob.setNumReduceTasks(0);
  
  grepJob.setMapOutputKeyClass(Text.class);
  
  grepJob.setMapOutputValueClass(NullWritable.class);
  
  grepJob.setInputFormatClass(TextInputFormat.class);
  
  grepJob.setOutputFormatClass(TextOutputFormat.class);
  
  Path inputPath = new Path(args[0]);
  
  Path outputPath = new Path(args[1]);
  
  FileInputFormat.addInputPath(grepJob, inputPath);
  
  FileOutputFormat.setOutputPath(grepJob, outputPath);
  
  // remove any stale output directory from a previous run
  FileSystem fileSystem = outputPath.getFileSystem(conf);
  
  fileSystem.delete(outputPath, true);
  
  int result = grepJob.waitForCompletion(true)?0:-1;
  
  return result;
 }
 
 public static void main(String[] args) throws Exception {
  
  int status = ToolRunner.run(new Configuration(), new GrepDriver(), args);
  
  System.out.println("Status : "+status);
  
 }

}
Input file is:
linux java hadoop dba123 sravan gvaspi
unix sqoop123 sql
mysql hibernet string class
abstract interface implements
implements inheritance java c++
linux java hadoop dba
unix sqoop123 sql
mysql hibernet string class
abstract interface implements
implements inheritance java c++
linux java hadoop dba
unix sqoop123 sql
mysql hibernet string class
abstract interface implements
implements inheritance java c++
linux java hadoop dba
unix sqoop123 sql
mysql hibernet string class
abstract interface implements
implements inheritance java c++
linux java hadoop dba
unix sqoop123 sql
mysql hibernet string class
abstract interface implements
implements inheritance java c++
Output:
linux java hadoop dba123 sravan gvaspi
implements inheritance java c++
linux java hadoop dba
implements inheritance java c++
linux java hadoop dba
implements inheritance java c++
linux java hadoop dba
implements inheritance java c++
linux java hadoop dba
implements inheritance java c++