Livy is an open-source REST interface for interacting with an Apache Spark cluster: it gives remote users access to the cluster and lets them submit jobs to it over HTTP. In simpler terms, Livy provides remote users with access to their Spark cluster. It also makes it easy to interact with and manage the SparkContext and SparkSession. Livy creates this entry point for the Spark application itself, so the user doesn't have to.
Livy is an Apache project, and it brings a number of features on top of plain job submission:
● Apache Livy can maintain long-running Spark contexts that can be reused across multiple Spark jobs, by multiple clients
● Supports sharing cached RDDs or DataFrames across multiple jobs and clients
● Multiple Spark contexts can be managed simultaneously, and the Spark contexts run on the cluster (YARN/Mesos) instead of on the Livy server, for good fault tolerance and concurrency
● Jobs can be submitted as precompiled jars, as snippets of code, or via the Java/Scala client API
● Ensures security via secure, authenticated communication
● Apache-licensed, 100% open source
Apache Livy provides two ways of interfacing with a Spark cluster.
● Interactive mode: provides a spark-shell-, pyspark-, or sparkR-like environment.
● Batch mode: provides a spark-submit-like environment for submitting a Spark application to the cluster without any interaction at run time.
Users can pick whichever mode suits their needs. Spark jobs can be launched through Livy with curl or with the Python Requests module, and if desired a Jupyter notebook can be set up as the working environment, letting users launch jobs with SQL, Scala, PySpark, and SparkR. In this write-up I will be launching jobs with curl commands.

Interactive Mode: In interactive mode, the user creates a context once and later uses it to run statements or tasks. This mode is similar to spark-shell or pyspark, where we get a development environment to write statements for different jobs. Before creating a session, the Livy server must be up and running. To start the Livy server, use the command given below:

$LIVY_HOME/bin/livy-server

Now launch a Spark interactive session with the curl command:

curl -X POST --data '{"kind": "pyspark"}' -H "Content-Type: application/json" localhost:8998/sessions

Here, the data payload can hold many parameters that specify the context type and its properties: the user can impersonate another user by passing the proxyUser parameter, set the executor memory, choose the number of cores, and name the session. The accepted fields are listed below, followed by a Python sketch of the same workflow. Request body for Livy interactive mode:
| Name | Description | Type |
| --- | --- | --- |
| kind | The session kind | session kind |
| proxyUser | User to impersonate when starting the session | string |
| jars | Jars to be used in this session | list of strings |
| pyFiles | Python files to be used in this session | list of strings |
| files | Files to be used in this session | list of strings |
| driverMemory | Amount of memory to use for the driver process | string |
| driverCores | Number of cores to use for the driver process | int |
| numExecutors | Number of executors to launch for this session | int |
| archives | Archives to be used in this session | list of strings |
| queue | Name of the YARN queue to which the session is submitted | string |
| name | Name of the session | string |
| conf | Spark configuration properties to be used | map of key=val |
| heartbeatTimeoutInSecond | Session timeout in seconds | int |
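The same interactive workflow can also be driven from Python with the Requests module mentioned earlier. Below is a minimal sketch, not an official client: it assumes a Livy server on the default localhost:8998 endpoint, and the session name and sizing values are illustrative placeholders drawn from the table above.

```python
import time

import requests

# Assumed deployment: Livy on its default port 8998; adjust for your cluster.
LIVY_URL = "http://localhost:8998"

# Create a PySpark session. The name and sizing fields are illustrative
# placeholders taken from the request-body table above, not required settings.
session = requests.post(
    f"{LIVY_URL}/sessions",
    json={"kind": "pyspark", "name": "livy-demo", "driverMemory": "1g", "numExecutors": 2},
).json()
session_url = f"{LIVY_URL}/sessions/{session['id']}"

# Wait for the Spark context to spin up before sending statements.
while session["state"] in ("not_started", "starting"):
    time.sleep(2)
    session = requests.get(session_url).json()

# Run a trivial statement, then poll until its result is ready.
stmt = requests.post(f"{session_url}/statements", json={"code": "1 + 1"}).json()
stmt_url = f"{session_url}/statements/{stmt['id']}"
while stmt["state"] in ("waiting", "running"):
    time.sleep(1)
    stmt = requests.get(stmt_url).json()

# Assumes the statement succeeded; the result sits under output > data.
print(stmt["output"]["data"]["text/plain"])  # prints: 2

# Delete the session once done, like the DELETE curl call shown below.
requests.delete(session_url)
```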
● To check running sessions, use the curl command below:

curl localhost:8998/sessions | python -m json.tool

● To run a statement in a session, use:

curl localhost:8998/sessions/0/statements -X POST -H 'Content-Type: application/json' -d '{"code":"1 + 1"}'

● To check the result of your statement, you can use the curl command below:

curl localhost:8998/sessions/0/statements/0 | python -m json.tool

Output:

{
    "id": 0,
    "code": "1 + 1",
    "state": "available",
    "output": {
        "status": "ok",
        "execution_count": 0,
        "data": {
            "text/plain": "2"
        }
    },
    "progress": 1.0
}

The statement result can be read from output > data in the JSON. Once the work is finished, the user can delete the session:

curl localhost:8998/sessions/0 -X DELETE

Interactive mode works like a normal shell: all defined variables remain available to you for as long as the session is alive.

Batch Mode: Batch mode works like spark-submit, letting us submit an application along with its configuration parameters and application files. In batch mode a user can submit a jar or a .py file to the Spark cluster through the Livy server.

● To submit a jar file:

curl -X POST --data '{"file": "pathToJar/spark-examples.jar", "className": "org.apache.spark.examples.SparkPi"}' -H "Content-Type: application/json" localhost:8998/batches

● To submit a .py file:

curl -X POST --data '{"file": "path/codeFile.py"}' -H "Content-Type: application/json" localhost:8998/batches

● To check the batch result:

curl localhost:8998/batches/0/log | python -m json.tool
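The batch workflow can be scripted the same way. Here is a hedged sketch with the Python Requests module, again assuming the default localhost:8998 endpoint; the application path is a placeholder and must point somewhere the cluster can actually read (e.g. local to the Livy server or on HDFS).

```python
import time

import requests

LIVY_URL = "http://localhost:8998"  # assumed default host/port

# Submit a Python application in batch mode; the file path is a placeholder.
batch = requests.post(
    f"{LIVY_URL}/batches", json={"file": "/path/to/codeFile.py"}
).json()
batch_url = f"{LIVY_URL}/batches/{batch['id']}"

# Poll the batch until it finishes, then fetch its log lines.
while batch["state"] in ("starting", "running"):
    time.sleep(5)
    batch = requests.get(batch_url).json()
print(batch["state"])  # e.g. "success" or "dead"
print("\n".join(requests.get(f"{batch_url}/log").json()["log"]))
```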
The data payload can hold many key-value pairs that specify the job: the number of cores, dependent jars, configuration properties, and more, as listed below. Request body for Livy batch mode:

| Name | Description | Type |
| --- | --- | --- |
| file | File containing the application to execute | path (required) |
| proxyUser | User to impersonate when running the job | string |
| className | Application Java/Spark main class | string |
| args | Command-line arguments for the application | list of strings |
| jars | Jars to be used in this session | list of strings |
| pyFiles | Python files to be used in this session | list of strings |
| files | Files to be used in this session | list of strings |
| driverMemory | Amount of memory to use for the driver process | string |
| driverCores | Number of cores to use for the driver process | int |
| executorMemory | Amount of memory to use per executor process | string |
| executorCores | Number of cores to use for each executor | int |
| numExecutors | Number of executors to launch for this session | int |
| archives | Archives to be used in this session | list of strings |
| queue | Name of the YARN queue to which the job is submitted | string |
| name | Name of this session | string |
| conf | Spark configuration properties | map of key=val |
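Putting several of the table's fields together, here is one more hedged sketch of a fuller batch payload; every path, argument, and size shown is an illustrative placeholder, not a recommended setting.

```python
import requests

LIVY_URL = "http://localhost:8998"  # assumed default host/port

# A fuller batch payload exercising several fields from the table above.
payload = {
    "file": "/path/to/spark-examples.jar",           # placeholder path
    "className": "org.apache.spark.examples.SparkPi",
    "args": ["100"],                                  # passed to the app's main()
    "executorMemory": "2g",
    "executorCores": 2,
    "numExecutors": 4,
    "conf": {"spark.speculation": "true"},
}
batch = requests.post(f"{LIVY_URL}/batches", json=payload).json()
print(batch["id"], batch["state"])
```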
Apache Livy comes with many features that set it apart from other REST interfaces to Spark:
● Livy supports interactive Scala, Python, and R shells.
● Jobs can be submitted in batch mode in Scala, Java, or Python.
● Multiple users can share the same server, and each can submit jobs and monitor them independently (impersonation support).
● Jobs can be submitted from anywhere over REST.
● No code modification is needed.
● Livy works across Spark versions (it supports both Spark 1.x and 2.x), so there is no version-mismatch problem as with some other REST APIs.
● A Jupyter notebook (or any other notebook) can be used as an IDE with Livy, with support for Scala, PySpark, and SparkR.
Livy also supports user impersonation and is compatible with Apache Ranger, which helps secure your cluster against unauthorized users accessing or stealing your data. There will always be many REST API servers in a big-data system, but only a few give you what you want and what you need. Livy is one of them!