nys_parole_scraper.nys_parole_scraper

@author: khayes

Module Contents

Functions

parole_scraper(file_path, directory)

Scrapes NYS parole information from the NYS DOCCS parolee lookup website

nys_parole_scraper.nys_parole_scraper.parole_scraper(file_path, directory)

Scrapes NYS parole information from the NYS DOCCS parolee lookup website based on user inputs and returns a clean dataset and summary statistics.

Parameters
  • file_path (String) – Path to the CSV file containing the identifying information of the individuals to search for with the scraper. See the Readme for details on how to construct the input CSV file.

  • directory (String) – Path to the folder in which an output folder will be created. The final dataset and summary statistics are exported to that output folder.
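Because both parameters are plain path strings, it can help to check them before starting a long scrape. The following is a minimal sketch, not part of this module: check_inputs is a hypothetical helper, and it assumes that the input file has a .csv extension and that directory must already exist (the documentation above only says an output folder is created inside it).

```python
from pathlib import Path
from typing import Tuple


def check_inputs(file_path: str, directory: str) -> Tuple[Path, Path]:
    """Hypothetical pre-flight check; not part of nys_parole_scraper."""
    csv_path = Path(file_path)
    out_dir = Path(directory)

    # The input must be an existing file.
    if not csv_path.is_file():
        raise FileNotFoundError(f"Input CSV not found: {csv_path}")

    # Assumption: the input file uses a .csv extension.
    if csv_path.suffix.lower() != ".csv":
        raise ValueError(f"Expected a .csv file, got: {csv_path.name}")

    # Assumption: the parent directory for the output folder must exist.
    if not out_dir.is_dir():
        raise NotADirectoryError(f"Output directory not found: {out_dir}")

    return csv_path, out_dir
```

If the checks pass, the original strings can be handed to parole_scraper(file_path, directory) unchanged.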

Returns

  To return the following Python objects, assign the function call to two variables (e.g. df1, df2 = parole_scraper(file_path, directory)). To only export the results to the provided directory, you do not need to assign the function call to variables (e.g. parole_scraper(file_path, directory)).

  • full_output (pandas DataFrame, CSV) – The concatenated parole information of all provided individuals found in the DOCCS parole database. The returned DataFrame takes whatever name you assign first in the function call. Example: df1, df2 = parole_scraper(file_path, directory)

    full_output will be returned as df1.

    full_output will also be exported as a CSV to an output folder in the directory provided by the directory parameter.

  • stats_list (List, Excel) – A list of the following DataFrames, returned as the second variable assigned to the function call. All summary statistics and frequency tables are also exported as separate sheets of an Excel file in the same output folder.

    stats_numeric (pandas DataFrame)

    Summary statistics on age, time in months between release to parole and the current date, and number of convictions

    race_freq (pandas DataFrame)

    Frequency table of race/ethnicity

    age_freq (pandas DataFrame)

    Frequency table of age groups

    pstatus_freq (pandas DataFrame)

    Frequency table of current parole status

    county_unique_freq (pandas DataFrame)

    Frequency table of unique individuals with convictions in each county

    top_charge_freq (pandas DataFrame)

    Frequency table of top charges

    convictions_freq (pandas DataFrame)

    Frequency table of unique individuals per charge type (every conviction, not only top charge)
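Since stats_list is a plain list, individual tables have to be picked out by position. One way to make that less error-prone is to pair each element with its documented name. This is a minimal sketch under a stated assumption: the list order matches the order in which the tables are listed above, which this page does not confirm. label_stats and STATS_NAMES are hypothetical names, not part of the module.

```python
# Names taken from the stats_list description above.
# Assumption: the returned list uses this same order.
STATS_NAMES = [
    "stats_numeric",
    "race_freq",
    "age_freq",
    "pstatus_freq",
    "county_unique_freq",
    "top_charge_freq",
    "convictions_freq",
]


def label_stats(stats_list):
    """Hypothetical helper: map each table in stats_list to its documented name."""
    if len(stats_list) != len(STATS_NAMES):
        raise ValueError(
            f"Expected {len(STATS_NAMES)} tables, got {len(stats_list)}"
        )
    return dict(zip(STATS_NAMES, stats_list))
```

With the list returned by parole_scraper, label_stats(df_list)["race_freq"] would then give the race/ethnicity table, provided the ordering assumption holds.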

Example

Returning DataFrames as objects:

    >>> from parole_scrpaer_MDS import parole_scraper
    >>> file_path = "C:/Users/parole_scraping.csv"
    >>> directory = "C:/Users/Output_Folder"
    >>> df1, df_list = parole_scraper(file_path, directory)

    df1 = full_output
    df_list = stats_list
    full_output exported as CSV
    stats_list exported as Excel

Without returning DataFrames as objects:

    >>> from parole_scrpaer_MDS import parole_scraper
    >>> file_path = "C:/Users/parole_scraping.csv"
    >>> directory = "C:/Users/Output_Folder"
    >>> parole_scraper(file_path, directory)

    full_output exported as CSV
    stats_list exported as Excel