
Introduction to Basic Pig Operations

This article introduces basic Pig operations. Many people get stuck on these when working through real cases, so the sections below show how to handle them.

What is Pig?

My understanding: Pig is to Hadoop what a shell is to Linux, so I use Pig to work with files on Hadoop whenever I can.

1. Go to the HADOOP_HOME directory.
2. Run sh bin/hadoop
This prints a description of the available commands:
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  fsck                 run a DFS filesystem checking utility
  fs                   run a generic filesystem user client
  balancer             run a cluster balancing utility
  jobtracker           run the MapReduce job Tracker node
  pipes                run a Pipes job
  tasktracker          run a MapReduce task Tracker node
  job                  manipulate MapReduce jobs
  queue                get information regarding JobQueues
  version              print the version
  jar <jar>            run a jar file
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME <src>* <dest> create a hadoop archive
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
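
To make the listing above concrete, here is a minimal sketch of calling two of the listed subcommands (`version` and `fs`) from a script. It assumes `HADOOP_HOME` points at a Hadoop installation. The function name `hadoop_cmd` and the path `/` are illustrative and do not come from the original article.

```shell
#!/bin/sh
# Hypothetical helper: run a hadoop subcommand through $HADOOP_HOME/bin/hadoop.
# Returns 127 without running anything when the launcher is not found.
hadoop_cmd() {
    launcher="${HADOOP_HOME:-}/bin/hadoop"
    if [ ! -f "$launcher" ]; then
        echo "hadoop launcher not found at: $launcher" >&2
        return 127
    fi
    # Invoked with sh, as in step 2 above.
    sh "$launcher" "$@"
}

# Example calls (each uses a subcommand from the usage listing):
#   hadoop_cmd version      # print the version
#   hadoop_cmd fs -ls /     # list the root of the filesystem
```

Checking for the launcher first keeps the failure message readable when `HADOOP_HOME` is unset or wrong.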

Common Pig commands

ls / pwd / cd

For example, to check the size of a file:

grunt> fs -du -h -s filename
19.4 G  filename
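
The same commands can be collected in a file and run in batch instead of being typed at the `grunt>` prompt. The sketch below writes such a file and passes it to the `pig` launcher if one is on the PATH. The directory `/user/example` and the file name `fs_demo.pig` are placeholders, and it assumes these commands are accepted one per line in a script the same way they are at the prompt.

```shell
#!/bin/sh
# Write a few Grunt filesystem commands to a script file.
# /user/example is a hypothetical HDFS directory.
script="${TMPDIR:-/tmp}/fs_demo.pig"
cat > "$script" <<'EOF'
pwd
ls /user/example
cd /user/example
fs -du -h -s /user/example
EOF

# Run the script only when the pig launcher is available.
if command -v pig >/dev/null 2>&1; then
    pig "$script"
else
    echo "pig not found; commands written to $script"
fi
```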

grunt> help

Commands:
<pig latin statement>; - See the PigLatin manual for details: http://hadoop.apache.org/pig
File system commands:
    fs <fs arguments> - Equivalent to Hadoop dfs command: http://hadoop.apache.org/common/docs/current/hdfs_shell.html
Diagnostic commands:
    describe <alias>[::<alias] - Show the schema for the alias. Inner aliases can be described as A::B.
    explain [-script <pigscript>] [-out <path>] [-brief] [-dot|-xml] [-param <param_name>=<param_value>]
        [-param_file <file_name>] [<alias>] - Show the execution plan to compute the alias or for entire script.
        -script - Explain the entire script.
        -out - Store the output into directory rather than print to stdout.
        -brief - Don't expand nested plans (presenting a smaller graph for overview).
        -dot - Generate the output in .dot format. Default is text format.
        -xml - Generate the output in .xml format. Default is text format.
        -param <param_name - See parameter substitution for details.
        -param_file <file_name> - See parameter substitution for details.
        alias - Alias to explain.
    dump <alias> - Compute the alias and writes the results to stdout.
Utility Commands:
    exec [-param <param_name>=param_value] [-param_file <file_name>] <script> -
        Execute the script with access to grunt environment including aliases.
        -param <param_name - See parameter substitution for details.
        -param_file <file_name> - See parameter substitution for details.
        script - Script to be executed.
    run [-param <param_name>=param_value] [-param_file <file_name>] <script> -
        Execute the script with access to grunt environment.
        -param <param_name - See parameter substitution for details.
        -param_file <file_name> - See parameter substitution for details.
        script - Script to be executed.
    sh  <shell command> - Invoke a shell command.
    kill <job_id> - Kill the hadoop job specified by the hadoop job id.
    set <key> <value> - Provide execution parameters to Pig. Keys and values are case sensitive.
        The following keys are supported:
        default_parallel - Script-level reduce parallelism. Basic input size heuristics used by default.
        debug - Set debug on or off. Default is off.
        job.name - Single-quoted name for jobs. Default is PigLatin:<script name>
        job.priority - Priority for jobs. Values: very_low, low, normal, high, very_high. Default is normal
        stream.skippath - String that contains the path. This is used by streaming.
        any hadoop property.
    help - Display this message.
    history [-n] - Display the list statements in cache.
        -n Hide line numbers.
    quit - Quit the grunt shell.
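
Several of the commands in this help text (`set`, `describe`, `explain`, `dump`) are easier to follow in a small script. The sketch below writes a word-count script and runs it in Pig's local mode when the `pig` launcher is available. The input file `words.txt`, the aliases `words`, `grouped` and `counts`, the job name and the scratch directory are all made up for illustration.

```shell
#!/bin/sh
# Work in a scratch directory so the relative paths below resolve.
workdir="${TMPDIR:-/tmp}/pig_help_demo"
mkdir -p "$workdir" && cd "$workdir" || exit 1

# Hypothetical input: one word per line.
printf 'pig\nhadoop\npig\n' > words.txt

cat > demo.pig <<'EOF'
-- Execution parameters, using keys listed under "set" in the help text.
set job.name 'help-demo';
set default_parallel 2;

-- Count how often each word occurs.
words   = LOAD 'words.txt' AS (word:chararray);
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;

describe counts;  -- show the schema of the alias
explain counts;   -- show the execution plan for the alias
dump counts;      -- compute the alias and print the result to stdout
EOF

# Run in local mode only when the pig launcher is available.
if command -v pig >/dev/null 2>&1; then
    pig -x local demo.pig
else
    echo "pig not found; script written to $workdir/demo.pig"
fi
```

Local mode (`-x local`) reads `words.txt` from the local filesystem, so the sketch can be tried without a running cluster.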

That concludes this introduction to basic Pig operations. Thanks for reading.
