Getting Data into Modelshop

Often the first step after creating a new model is importing data. Modelshop supports many ways of getting data into models.

Local Files

Users can easily import files from a local file system.

Support files formats

  • Excel (xls, xlsx, xlsm, xlsb)

  • Delimited Text (csv, tsv, etc.)

  • JSON

  • XML

Data Sources

Modelshop additionally supports extracting data from data sources beyond the files system.

Supported Data Sources

  • Web Services

    • REST

    • SOAP

  • Databases

    • Microsoft SQL Server

    • MySQL

    • Postgres

    • Oracle

    • HBase

    • Hive

Imports using the Import Wizard

The import wizard enables the creation of a new DataList or several new DataLists based on a file import. To access the wizard, select Data > Import Wizard.

Select the file you wish to import, and click on the Open button. The wizard will automatically detect the type of file selected and display the applicable settings.

Text/csv

Importing a text file will create a single new DataList. The wizard provides several options for flexibly handling input data.

  • Datalist Name: The name of the new DataList to be created

  • Target Folder: An optional folder for the new DataList to be placed in

  • File Type: Delimited or Fixed Field Text

  • Delimiter tokens: The delimiter character for delimited files. ie commas for csv, tabs for tsv

  • Quote character: Quote character for delimited files

  • Header Row #: The row in the file where header names can be found. These header names will be used for field names in the newly created DataList.

  • Header Rows: Number of rows for header names including the Header Row #.

  • Footer Rows: Number rows at the end of the file to use as a footer (ignore)

  • Read Ahead Rows: The number of records to observe when attempting to detect data types

  • Read All Rows: Observe all records in the file when attempting to detect data types

Excel

Importing a Excel workbook allows users to create a new DataList for each sheet and/or named range in the workbook. Upon upload, the user can select which of these items to use as the basis of new DataLists. Subsequently, the wizard provides several options that can be applied to each of the selected items, similar to those provided with to CSV files.

  • Datalist Name: The name of the new DataList to be created

  • Target Folder: An optional folder for the new DataList to be placed in

  • File Type: Delimited or Fixed Field Text

  • Delimiter tokens: The delimiter character for delimited files. ie commas for csv, tabs for tsv

  • Quote character: Quote character for delimited files

  • Header Row #: The row in the file where header names can be found. These header names will be used for field names in the newly created DataList.

  • Header Rows: Number of rows for header names including the Header Row #.

  • Footer Rows: Number rows at the end of the file to use as a footer (ignore)

  • Read Ahead Rows: The number of records to observe when attempting to detect data types

  • Read All Rows: Observe all records in the file when attempting to detect data types

Hive


Importing data from a Hive database is an easy way to integrate Modelshop with the Hadoop Ecosystem. To create a new connector to have, take the following steps.

  • From the menu within your model, select Data >> Import Wizard.

  • Select the database option in the dialog box.

  • Select HIVE_2 in the Connection Type.

  • To the right of Connection Type, there is a pencil icon. Click on the icon if you wish to specify a custom driver to use with Hive, such as the following:

com.cloudera.hive.jdbc4.HS2Driver
  • In Server URL, enter the location of the server where the Hive database is running. If it’s running on the same computer as modelshop, enter “localhost”.

  • To the right of Server URL, there is a target icon. Click on the icon if you need to customize the URL that modelshop assembles from the settings in the wizard. For example:

jdbc:hive2:hostname:10000/default;principal=johnsmith@MODELSHOP.COM;auth=kerberos;service_name=hive;service_host=hostname
  • The port for Hive is 10000 by default. You may modify if necessary.

Basic Authentication

  • In Auth Type, select BASIC.

  • Enter a username and password.

Kerberos Authentication

  • In Auth Type, select KERBEROS.

  • Enter a username in Username.

  • Click “Upload Keytab” to upload a key table for the user. Modelshop will place the key file in the correct location on the server, and will configure the properties accordingly.

  • Enter the Kerberos realm in KrbRealm.

  • Enter the fully qualified domain name for the host in KrbHostFQDN.

  • Enter the service name in KrbServiceName.

  • Enter the database name in Database Name.

  • If the database server employs SSL when communicating with the Kerberos server, then select the check box labeled, “Use SSL”. The use of SSL will require setting up a client trustStore. See Security Properties below.