Hands-On Big Data Modeling

Spark SQL

Spark SQL allows for querying structured and semi-structured data inside Spark programs, using either SQL or the DataFrame API. DataFrames are similar to tables in a relational database: a distributed collection of rows organized into named columns. Spark SQL can be embedded into general Spark programs and combined with MLlib, enabling interoperability between the different Spark modules.

Spark SQL provides DataFrame abstractions in several programming languages, such as Python, Java, and Scala, for working with structured datasets. It can also read and write data in various structured formats, including JSON, Hive tables, and Parquet. In addition, Spark SQL allows the data to be queried with SQL, either from inside a Spark program or from external tools that connect to Spark SQL through standard database connectors (JDBC/ODBC).