nys_parole_scraper.nys_parole_scraper
@author: khayes
Module Contents
Functions
parole_scraper(file_path, directory): Scrapes NYS parole information from the NYS DOCCS parolee lookup website
- nys_parole_scraper.nys_parole_scraper.parole_scraper(file_path, directory)
Scrapes NYS parole information from the NYS DOCCS parolee lookup website based on user inputs and returns a clean dataset and summary statistics.
- Parameters
file_path (String) – A string representing the path of the CSV file that contains the identifying information of the individuals to be searched for with the scraper. See the Readme for more information on how to construct the input CSV file.
directory (String) – A string representing the folder path where the user would like an output folder created, in which the final dataset and summary statistics will be exported.
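The two parameters are plain path strings, so it can help to check them before starting a long scrape. The following is a minimal sketch, not part of nys_parole_scraper: the helper name check_scraper_inputs is hypothetical, and it assumes the parent folder named by directory may need to be created first (the scraper itself may already handle this).

```python
from pathlib import Path


def check_scraper_inputs(file_path, directory):
    """Hypothetical pre-flight check; not part of nys_parole_scraper."""
    csv_path = Path(file_path)
    out_dir = Path(directory)

    # The input must be an existing CSV file.
    if not csv_path.is_file():
        raise FileNotFoundError(f"Input CSV not found: {csv_path}")
    if csv_path.suffix.lower() != ".csv":
        raise ValueError(f"Expected a .csv file, got: {csv_path.name}")

    # Assumption: create the folder that will hold the output folder
    # if it does not exist yet.
    out_dir.mkdir(parents=True, exist_ok=True)

    # Return plain strings, matching the documented parameter types.
    return str(csv_path), str(out_dir)
```

The returned strings can then be passed on as file_path and directory.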
- Returns
To get the following Python objects back as DataFrames, assign the function call to two variables (e.g. df1, df2 = parole_scraper(file_path, directory)). To only export the DataFrames to the provided directory, call the function without assigning it to variables (e.g. parole_scraper(file_path, directory)).
full_output (pandas DataFrame, CSV) – The concatenated parole information of all provided individuals found in the DOCCS parole database. The returned DataFrame is bound to the first variable in the assignment. Example: in df1, df2 = parole_scraper(file_path, directory), full_output is returned as df1.
full_output is also exported as a CSV to an output folder in the directory given by the directory parameter.
stats_list (List, Excel) – A list of the following DataFrames, returned as the second variable in the assignment. All summary statistics and frequency tables are also exported as separate sheets of an Excel file in the same output folder.
- stats_numeric (pandas DataFrame)
Summary statistics on age; length of time between release to parole and current date in months; and number of convictions
- race_freq (pandas DataFrame)
Frequency table of race/ethnicity
- age_freq (pandas DataFrame)
Frequency table of age groups
- pstatus_freq (pandas DataFrame)
Frequency table of current parole status
- county_unique_freq (pandas DataFrame)
Frequency table of unique individuals with convictions in each county
- top_charge_freq (pandas DataFrame)
Frequency table of top charges
- convictions_freq (pandas DataFrame)
Frequency table of unique individuals per charge type (every conviction, not only top charge)
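Because stats_list is a plain list, its tables have to be picked out by position. The sketch below pairs each element with a name so that a table can be looked up by key. It assumes the list order matches the order documented above, which is not confirmed here; the helper name label_stats and the constant STATS_NAMES are hypothetical and not part of nys_parole_scraper.

```python
# Assumed order: taken from the documented list above, not verified
# against the scraper's source.
STATS_NAMES = [
    "stats_numeric",
    "race_freq",
    "age_freq",
    "pstatus_freq",
    "county_unique_freq",
    "top_charge_freq",
    "convictions_freq",
]


def label_stats(stats_list):
    """Hypothetical helper: map documented names to stats_list items."""
    if len(stats_list) != len(STATS_NAMES):
        raise ValueError(
            f"Expected {len(STATS_NAMES)} tables, got {len(stats_list)}"
        )
    return dict(zip(STATS_NAMES, stats_list))


# Usage, assuming parole_scraper has already been imported:
# df1, df_list = parole_scraper(file_path, directory)
# stats = label_stats(df_list)
# stats["race_freq"]  # frequency table of race/ethnicity
```

The length check is there so that a change in the number of returned tables fails loudly instead of silently mislabeling them.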
Example
Returning DataFrames as objects:
>>> from nys_parole_scraper.nys_parole_scraper import parole_scraper
>>> file_path = "C:/Users/parole_scraping.csv"
>>> directory = "C:/Users/Output_Folder"
>>> df1, df_list = parole_scraper(file_path, directory)
df1 = full_output
df_list = stats_list
full_output exported as CSV
stats_list exported as Excel

Without returning DataFrames as objects:
>>> from nys_parole_scraper.nys_parole_scraper import parole_scraper
>>> file_path = "C:/Users/parole_scraping.csv"
>>> directory = "C:/Users/Output_Folder"
>>> parole_scraper(file_path, directory)
full_output exported as CSV
stats_list exported as Excel