
Hadoop Classic Cases Implemented in Spark (7) - Log Analysis: Analyzing Unstructured Files

Hadoop Classic Cases Implemented in Spark (1) - Finding each year's maximum temperature from collected weather data
Hadoop Classic Cases Implemented in Spark (2) - Data deduplication
Hadoop Classic Cases Implemented in Spark (3) - Data sorting
Hadoop Classic Cases Implemented in Spark (4) - Average scores
Hadoop Classic Cases Implemented in Spark (5) - Finding the maximum and minimum values
Hadoop Classic Cases Implemented in Spark (6) - Finding the top K values and sorting them
Hadoop Classic Cases Implemented in Spark (7) - Log Analysis: Analyzing Unstructured Files



1. Requirement: compute URL access statistics from a Tomcat access log; sample log lines are shown below.
GET and POST accesses must be counted separately, even for the same URL.
Each result record consists of: request method, URL, access count.

Test dataset:

196.168.2.1 - - [03/Jul/2014:23:36:38 +0800] "GET /course/detail/3.htm HTTP/1.0" 200 38435 0.038
182.131.89.195 - - [03/Jul/2014:23:37:43 +0800] "GET /html/notes/20140617/888.html HTTP/1.0" 301 - 0.000
196.168.2.1 - - [03/Jul/2014:23:38:27 +0800] "POST /service/notes/addViewTimes_23.htm HTTP/1.0" 200 2 0.003
196.168.2.1 - - [03/Jul/2014:23:39:03 +0800] "GET /html/notes/20140617/779.html HTTP/1.0" 200 69539 0.046
196.168.2.1 - - [03/Jul/2014:23:43:00 +0800] "GET /html/notes/20140318/24.html HTTP/1.0" 200 67171 0.049
196.168.2.1 - - [03/Jul/2014:23:43:59 +0800] "POST /service/notes/addViewTimes_779.htm HTTP/1.0" 200 1 0.003
196.168.2.1 - - [03/Jul/2014:23:45:51 +0800] "GET /html/notes/20140617/888.html HTTP/1.0" 200 70044 0.060
196.168.2.1 - - [03/Jul/2014:23:46:17 +0800] "GET /course/list/73.htm HTTP/1.0" 200 12125 0.010
196.168.2.1 - - [03/Jul/2014:23:46:58 +0800] "GET /html/notes/20140609/542.html HTTP/1.0" 200 94971 0.077
196.168.2.1 - - [03/Jul/2014:23:48:31 +0800] "POST /service/notes/addViewTimes_24.htm HTTP/1.0" 200 2 0.003
196.168.2.1 - - [03/Jul/2014:23:48:34 +0800] "POST /service/notes/addViewTimes_542.htm HTTP/1.0" 200 2 0.003
196.168.2.1 - - [03/Jul/2014:23:49:31 +0800] "GET /notes/index-top-3.htm HTTP/1.0" 200 53494 0.041
196.168.2.1 - - [03/Jul/2014:23:50:55 +0800] "GET /html/notes/20140609/544.html HTTP/1.0" 200 183694 0.076
196.168.2.1 - - [03/Jul/2014:23:53:32 +0800] "POST /service/notes/addViewTimes_544.htm HTTP/1.0" 200 2 0.004
196.168.2.1 - - [03/Jul/2014:23:54:53 +0800] "GET /service/notes/addViewTimes_900.htm HTTP/1.0" 200 151770 0.054
196.168.2.1 - - [03/Jul/2014:23:57:42 +0800] "GET /html/notes/20140620/872.html HTTP/1.0" 200 52373 0.034
196.168.2.1 - - [03/Jul/2014:23:58:17 +0800] "POST /service/notes/addViewTimes_900.htm HTTP/1.0" 200 2 0.003
196.168.2.1 - - [03/Jul/2014:23:58:51 +0800] "GET /html/notes/20140617/888.html HTTP/1.0" 200 70044 0.057
186.76.76.76 - - [03/Jul/2014:23:48:34 +0800] "POST /service/notes/addViewTimes_542.htm HTTP/1.0" 200 2 0.003
186.76.76.76 - - [03/Jul/2014:23:46:17 +0800] "GET /course/list/73.htm HTTP/1.0" 200 12125 0.010
8.8.8.8 - - [03/Jul/2014:23:46:58 +0800] "GET /html/notes/20140609/542.html HTTP/1.0" 200 94971 0.077
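For example, in the test data above, GET /course/list/73.htm appears twice (once from 196.168.2.1 and once from 186.76.76.76), so the expected output should include a line like "GET /course/list/73.htm 2". Note also that /service/notes/addViewTimes_900.htm receives both a GET and a POST, which must be counted under separate keys.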


Because the Tomcat log lines are only semi-structured, the data must first be filtered and cleaned: each line is stripped down to just the request method and URL (for example, "GET /course/detail/3.htm") before counting.


2. Hadoop MapReduce implementation:

The Mapper class (the original listing carried an unused, accidental import of javax.naming.spi.DirStateFactory.Result, removed here):

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LogMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private IntWritable val = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString().trim();
        String tmp = handlerLog(line);
        if (tmp.length() > 0) {
            // Emit ("METHOD URL", 1) for each parseable log line.
            context.write(new Text(tmp), val);
        }
    }

    // Extracts "METHOD URL" from a raw access-log line, e.g.
    // 127.0.0.1 - - [03/Jul/2014:23:36:38 +0800] "GET /course/detail/3.htm HTTP/1.0" 200 38435 0.038
    // yields "GET /course/detail/3.htm". Lines that cannot be parsed return "".
    private String handlerLog(String line) {
        String result = "";
        try {
            if (line.length() > 20) {
                if (line.indexOf("GET") > 0) {
                    result = line.substring(line.indexOf("GET"), line.indexOf("HTTP/1.0")).trim();
                } else if (line.indexOf("POST") > 0) {
                    result = line.substring(line.indexOf("POST"), line.indexOf("HTTP/1.0")).trim();
                }
            }
        } catch (Exception e) {
            System.out.println(line);   // print the malformed line and skip it
        }
        return result;
    }

    // Quick local test of the cleaning logic.
    public static void main(String[] args) {
        String line = "127.0.0.1 - - [03/Jul/2014:23:36:38 +0800] \"GET /course/detail/3.htm HTTP/1.0\" 200 38435 0.038";
        System.out.println(new LogMapper().handlerLog(line));
    }
}
The Reducer class:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class LogReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the per-record counts for each "METHOD URL" key.
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
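An optional tweak not in the original code: because the reduce step is a pure sum, LogReducer can also be registered as a combiner in the driver (job.setCombinerClass(LogReducer.class);) so that counts are pre-aggregated on the map side and less data is shuffled.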
The driver class:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JobMain {

    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        // Note: on Hadoop 2.x+, Job.getInstance(configuration, "log_job") is preferred
        // over this deprecated constructor.
        Job job = new Job(configuration, "log_job");
        job.setJarByClass(JobMain.class);

        job.setMapperClass(LogMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        job.setReducerClass(LogReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));

        // Delete the output directory if it already exists, so reruns don't fail.
        Path path = new Path(args[1]);
        FileSystem fs = FileSystem.get(configuration);
        if (fs.exists(path)) {
            fs.delete(path, true);
        }
        FileOutputFormat.setOutputPath(job, path);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
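3. Spark implementation:

The scraped page cuts off before the article's own Spark code, so it is not reproduced here. As a rough sketch only, the equivalent clean-then-count logic in Spark's Java API might look like the following (the class name LogAnalysisSpark is an assumption of this sketch, and the cleaning rule is simply reused from LogMapper above):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class LogAnalysisSpark {

    // Same cleaning rule as LogMapper.handlerLog: keep "METHOD URL", drop the rest.
    private static String handlerLog(String line) {
        String result = "";
        try {
            if (line.length() > 20) {
                if (line.indexOf("GET") > 0) {
                    result = line.substring(line.indexOf("GET"), line.indexOf("HTTP/1.0")).trim();
                } else if (line.indexOf("POST") > 0) {
                    result = line.substring(line.indexOf("POST"), line.indexOf("HTTP/1.0")).trim();
                }
            }
        } catch (Exception e) {
            // Malformed line: fall through and return "".
        }
        return result;
    }

    public static void main(String[] args) {
        // Master is left to spark-submit; set it here only for local testing.
        SparkConf conf = new SparkConf().setAppName("log_analysis");
        JavaSparkContext sc = new JavaSparkContext(conf);

        sc.textFile(args[0])                              // raw access log
          .map(LogAnalysisSpark::handlerLog)              // clean: "METHOD URL" or ""
          .filter(s -> !s.isEmpty())                      // drop unparseable lines
          .mapToPair(s -> new Tuple2<>(s, 1))             // ("METHOD URL", 1)
          .reduceByKey(Integer::sum)                      // sum counts per key
          .saveAsTextFile(args[1]);

        sc.stop();
    }
}

Such a job would typically be launched with something like spark-submit --class LogAnalysisSpark log-analysis.jar <input> <output> (jar name hypothetical). Note that reduceByKey plays the role of both combiner and reducer here, aggregating counts locally on each partition before shuffling.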
