CSE MCQs :: Hadoop MCQs :: Hadoop Pig

  1. Which of the following functions is used to read data in Pig?
     A. WRITE
     B. READ
     C. LOAD
     D. None of the mentioned
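
     For context, a minimal Pig Latin sketch of reading data with LOAD; the file name, delimiter, and schema below are illustrative assumptions, not part of the question:

        -- load a comma-delimited file into a relation with an explicit schema
        -- ('students.txt' and its fields are hypothetical examples)
        A = LOAD 'students.txt' USING PigStorage(',') AS (name:chararray, age:int, gpa:float);
        DUMP A;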

  2. You can run Pig in interactive mode using the ______ shell.
     A. Grunt
     B. FS
     C. HDFS
     D. None of the mentioned
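
     As a sketch, the interactive shell is started with the pig command and accepts Pig Latin statements one at a time; the local-mode flag and the statements below are illustrative:

        $ pig -x local
        grunt> A = LOAD 'students.txt' USING PigStorage(',') AS (name:chararray, age:int);
        grunt> DUMP A;
        grunt> quit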

  3. __________ is a framework for collecting and storing script-level statistics for Pig Latin.
     A. Pig Stats
     B. PStatistics
     C. Pig Statistics
     D. None of the mentioned

  4. The ________ class mimics the behavior of the Main class but gives users a statistics object back.
     A. PigRun
     B. PigRunner
     C. RunnerPig
     D. None of the mentioned

  5. ___________ returns a list of HDFS files to ship to the distributed cache.
     A. relativeToAbsolutePath()
     B. setUdfContextSignature()
     C. getCacheFiles()
     D. getShipFiles()

  6. The loader should use the ______ method to communicate the load information to the underlying InputFormat.
     A. relativeToAbsolutePath()
     B. setUdfContextSignature()
     C. getCacheFiles()
     D. setLocation()

  7. Which of the following commands can be used for debugging?
     A. exec
     B. execute
     C. error
     D. throw
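
     For reference, a sketch of debugging from the Grunt shell; the script name debug_script.pig is a placeholder:

        grunt> exec debug_script.pig
        grunt> run debug_script.pig

     exec runs the script in a separate batch context, so aliases defined inside it stay isolated, while run executes it in the current session; describe, explain, illustrate, and dump are the usual companions for inspecting intermediate relations.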

  8. Which of the following files contains user-defined functions (UDFs)?
     A. script2-local.pig
     B. pig.jar
     C. tutorial.jar
     D. excite.log.bz2
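
     As a hedged sketch of how UDFs packaged in a jar are used from Pig Latin; the paths and the fully qualified class name follow the layout of the Pig tutorial and should be treated as assumptions:

        -- register the jar so its UDFs are visible, then invoke one by its full class name
        REGISTER ./tutorial.jar;
        raw   = LOAD 'excite-small.log' USING PigStorage('\t') AS (user:chararray, time:chararray, query:chararray);
        clean = FOREACH raw GENERATE user, time, org.apache.pig.tutorial.ToLower(query) AS query;
        DUMP clean;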

  9. Which of the following scripts is used to check for scripts that have failed jobs?
     A.
        a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
        b = foreach a generate (Chararray) j#'STATUS' as status, j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' as script_name, j#'JOBID' as job;
        c = filter b by status != 'SUCCESS';
        dump c;
     B.
        a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
        b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' as script_name, (Long) r#'NUMBER_REDUCES' as reduces;
        c = group b by (id, user, script_name) parallel 10;
        d = foreach c generate group.user, group.script_name, MAX(b.reduces) as max_reduces;
        e = filter d by max_reduces == 1;
        dump e;
     C.
        a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
        b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'QUEUE_NAME' as queue;
        c = group b by (id, user, queue) parallel 10;
        d = foreach c generate group.user, group.queue, COUNT(b);
        dump d;
     D. None of the mentioned

  10. Which of the following code samples is used to find scripts that use only the default parallelism?
     A.
        a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
        b = foreach a generate (Chararray) j#'STATUS' as status, j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' as script_name, j#'JOBID' as job;
        c = filter b by status != 'SUCCESS';
        dump c;
     B.
        a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
        b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' as script_name, (Long) r#'NUMBER_REDUCES' as reduces;
        c = group b by (id, user, script_name) parallel 10;
        d = foreach c generate group.user, group.script_name, MAX(b.reduces) as max_reduces;
        e = filter d by max_reduces == 1;
        dump e;
     C.
        a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
        b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'QUEUE_NAME' as queue;
        c = group b by (id, user, queue) parallel 10;
        d = foreach c generate group.user, group.queue, COUNT(b);
        dump d;
     D. None of the mentioned