## Explore the dataset to understand what you are working with

Before diving into your data, it's always best to first explore the dataset to understand what you are working with.

1. Considering that you will be combining your two tables of data together at a later stage, how many rows of data are available for your analysis?
Hint: do headers count?

2. How are unknown values represented in the dataset?
3. What is the salary range covered by this dataset?
\$254,000
\$200,000
\$235,500
\$238,500
4. How many unique industries are covered by the dataset?
Hint: disregard the unknown values.
59
60
58
61
5. Consider the following point from your boss's email:
"Where do data scientists work?"
In its current form, this question is ambiguous: it is unclear whether it asks about the location where data scientists work or the industry in which they work.
Think about how you could approach this question, how best to answer it, and how it could be phrased in data terms. Select the most correct option below.
What are the locations and industries with the highest number (count) of data jobs?
What are the locations with the highest proportion (%) of data scientists relative to all data jobs?
What are the industries with the highest salary total (sum) for data scientists?
What are the locations and industries with the highest number (count) of data scientists?
What are the locations and industries with the highest salary total (sum) for data scientists?

6. Now that you've explored your data and thought about how to define your questions, you can transform the data to make it ready for analysis.
In the job_salary table, the job title and job category are currently given together in the Job Title (Job Category) column, e.g. Healthcare Data Scientist (data scientist).

For further analysis, you will need to separate the job title and job category into separate columns (as shown in the image below).
To create a dedicated Job Title column in your job_salary table, which of the following combinations of text manipulation functions could you use?
Note: the X in the formulae below represents the unseparated Job Title (Job Category) column.

=RIGHT(X,FIND(")", X))
=RIGHT(X,FIND(")", X)-1)
=LEFT(X,LEN(X)-FIND("(", X))
=LEFT(X,FIND("(", X)-1)
=LEFT(X,FIND("(", X))
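The same kind of split can be sketched outside Excel. The Python snippet below is a minimal illustration, not the assignment's required method: it cuts the combined text at the last opening parenthesis and keeps what sits inside the brackets as the category. The function name `split_title_category` is hypothetical, and the sample value is the one quoted in the question above.

```python
def split_title_category(combined):
    """Split 'Title (category)' into a (title, category) pair.

    Hypothetical helper for illustration. Assumes the category is the
    text inside the last pair of parentheses in the cell.
    """
    open_pos = combined.rfind("(")
    close_pos = combined.rfind(")")
    if open_pos == -1 or close_pos < open_pos:
        # No recognizable category: treat the whole text as the title.
        return combined.strip(), ""
    title = combined[:open_pos].strip()
    category = combined[open_pos + 1:close_pos].strip()
    return title, category


title, category = split_title_category("Healthcare Data Scientist (data scientist)")
print(title)     # Healthcare Data Scientist
print(category)  # data scientist
```

The position of the opening parenthesis plays the same role here as the value returned by FIND in the spreadsheet formulas.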

7. Now create a dedicated Job Category column in your job_salary table and use text manipulation functions to split the job category from the job title.
Once you have done this, use your newly created Job Category column to answer the following and check that you have done this step correctly:
How many data scientist, data engineer, and data analyst jobs (in that given order) are there?
311, 123, 103
109, 315, 119
313, 119, 109
319, 113, 109
303, 119, 109
8. You will also need to join your two sheets of data together: the job_salary table and the company dataset.
Which method(s) below could you use to join the datasets together? Select all correct answers.
Use your method of choice to join the datasets together.
Hint 1: Consider whether your datasets have an equal number of rows.
Hint 2: We recommend joining the company dataset onto the salary dataset.
Manually copy the data from one table and insert it onto the side of the other table
Manually copy the data from one table and insert it below the other table
Use INDEX/MATCH
Transform the data in both sheets into Tables and then use Excel Power Query to append the tables together
Transform the data in both sheets into Tables and then use Excel Power Query to merge Queries
There's no need to join the datasets together as you'll analyse each table separately
Use VLOOKUP
9. Now that you have consolidated your two tables of data together, answer the following question to check that you have correctly joined your data:
What is the average salary of a Data Scientist (using your newly created Job Category column) in the Aerospace & Defense industry? Round your answer to the nearest whole number.
Hint: remember that filters hide rows of data, but don't remove them. Ensure that your calculation only includes filtered cells and not the hidden cells.
\$110,679
\$66,344
\$128,302
\$63,658
\$123,816
\$120,294
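
As a rough illustration of a lookup-style join followed by an average over filtered rows only, the Python sketch below attaches an industry to each salary row and then averages only the rows that match the chosen category and industry. All company names, column names, and figures are invented for illustration and are not taken from the assignment workbook; matching on a company name is an assumption about how the two tables relate.

```python
from statistics import mean

# Invented sample rows; the real workbook's columns and values may differ.
job_salary = [
    {"company": "Acme", "category": "data scientist", "salary": 100000},
    {"company": "Orbit", "category": "data scientist", "salary": 120000},
    {"company": "Orbit", "category": "data analyst", "salary": 70000},
    {"company": "Nowhere", "category": "data scientist", "salary": 90000},
]
company = [
    {"company": "Acme", "industry": "Retail"},
    {"company": "Orbit", "industry": "Aerospace & Defense"},
]


def left_join(rows, lookup_rows, key, field, default="Unknown"):
    """Copy `field` from lookup_rows onto each row, matched on `key`.

    Every row in `rows` is kept, which mirrors joining the company data
    onto the salary data rather than the other way round.
    """
    lookup = {r[key]: r[field] for r in lookup_rows}
    return [{**row, field: lookup.get(row[key], default)} for row in rows]


def filtered_mean(rows, value_field, **conditions):
    """Average value_field over the rows that match every condition."""
    values = [
        r[value_field]
        for r in rows
        if all(r.get(k) == v for k, v in conditions.items())
    ]
    return mean(values) if values else None


joined = left_join(job_salary, company, key="company", field="industry")
print(filtered_mean(joined, "salary",
                    category="data scientist",
                    industry="Aerospace & Defense"))
```

Rows that are filtered out never enter the average here. In Excel, a conditional function such as AVERAGEIFS serves a similar purpose.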

The Excel data for this assignment is attached here.
Please find the file: Dataset - Data Jobs_real_assgnt.xlsx

Note that we can also provide help for this assignment under the statistics assignment help customized services.

## Analysis with Correlation and Regression

Deliverable 6 - Analysis with Correlation and Regression
Assignment Content

Competency
Determine the linear correlation and regression equation between two variables to make predictions for the dependent variable.

Student Success Criteria
View the grading rubric for this deliverable by selecting the “This item is graded with a rubric” link, which is located in the Details & Information pane.

Scenario
According to the U.S. Geological Survey (USGS), the probability of a magnitude 6.7 or greater earthquake in the Greater Bay Area is 63%, about 2 out of 3, in the next 30 years. In April 2008, scientists and engineers released a new earthquake forecast for the State of California called the Uniform California Earthquake Rupture Forecast (UCERF).

As a junior analyst at the USGS, you are tasked with determining whether there is sufficient evidence to support the claim of a linear correlation between the magnitudes and depths of the earthquakes. Your deliverables will be a PowerPoint presentation summarizing your findings and an Excel document showing your work.

Concepts Being Studied
Correlation and regression
Creating scatterplots
Constructing and interpreting a Hypothesis Test for Correlation using r as the test statistic

You are given a spreadsheet that contains the following information:

Magnitude measured on the Richter scale
Depth in km

Deliverable 6 - Analysis with Correlation and Regression.xlsx

Using the spreadsheet, you will answer the problems below in a PowerPoint presentation.

What to Submit
The PowerPoint Assignment presentation should answer and explain the following questions based on the spreadsheet provided above.

Slide 1: Title slide

Slide 2: Introduce your scenario and data set including the variables provided.

Slide 3: Construct a scatterplot of the two variables provided in the spreadsheet. Include a description of what you see in the scatterplot.

Slide 4: Find the value of the linear correlation coefficient r and the critical value of r using α = 0.05. Include an explanation of how you found those values.

Slide 5: Determine whether there is sufficient evidence to support the claim of a linear correlation between the magnitudes and the depths from the earthquakes. Explain.

Slide 6: Find the regression equation. Let the predictor (x) variable be the magnitude. Identify the slope and the y-intercept within your regression equation.

Slide 7: Is the equation a good model? Explain. What would be the best predicted depth of an earthquake with a magnitude of 2.0? Include the correct units.

Slide 8: Conclude by recapping your ideas and summarizing the information presented in the context of the scenario.
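
The calculations behind slides 4 to 7 can be sketched in Python as a cross-check on the spreadsheet work. This is a minimal sketch under stated assumptions: the magnitude and depth values are made up rather than taken from the provided spreadsheet, and the function names are hypothetical. The critical value of r still has to be looked up in a table or obtained from software for the actual sample size and α = 0.05.

```python
import math


def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def regression_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up illustrative data, NOT the assignment's values.
magnitudes = [1.2, 2.5, 0.9, 3.1, 1.8, 2.2]   # predictor x
depths_km = [6.0, 9.5, 4.2, 12.1, 7.3, 8.8]   # response y, in km

r = pearson_r(magnitudes, depths_km)
slope, intercept = regression_line(magnitudes, depths_km)
predicted_depth = intercept + slope * 2.0  # depth in km at magnitude 2.0

print(f"r = {r:.3f}")
print(f"depth = {intercept:.3f} + {slope:.3f} * magnitude")
print(f"predicted depth at magnitude 2.0: {predicted_depth:.2f} km")
```

If |r| does not exceed the critical value, the regression line is not a good model, and the usual best prediction is the mean of the observed depths instead of the value from the equation.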

Along with your PowerPoint presentation, you should include your Excel document which shows all calculations.
This assignment falls under the inferential section of your online statistics class. My Course Tutor experts are competent at handling such assignments. The first step in handling a statistics assignment is to understand the nature of the data and to review the category of data analysis methods under which hypothesis testing falls. The Excel file for this assignment has been attached.

Deliverable 6 - Analysis with Correlation and Regression.xlsx

## Hypothesis testing for two sample proportions

Deliverable 5 - Hypothesis Tests for Two Samples
Assignment Content

Competency
Evaluate hypothesis tests for population parameters from two populations.

Dealing with Two Populations
Inferential statistics involves forming conclusions about a population parameter. We do so by constructing confidence intervals and testing claims about a population mean and other parameters. Typically, these methods deal with a sample from one population. We can extend the methods to situations involving two populations (and there are many such applications). This deliverable looks at two scenarios.

Concept being Studied
Your focus is on hypothesis tests and confidence intervals for two populations using two samples, some of which are independent and some of which are dependent. These concepts are an extension of hypothesis testing and confidence intervals which use statistics from one sample to make conclusions about population parameters.
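
For the two-proportion case named in the heading, a test of two independent samples can be sketched as follows. This is a minimal Python sketch of a pooled two-sided z-test, assuming independent samples that are large enough for the normal approximation. The counts are invented, the function name is hypothetical, and the spreadsheet for this deliverable may call for a different test, for example one on means or on dependent samples.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test for H0: p1 == p2.

    x1, x2 are success counts; n1, n2 are sample sizes.
    Returns the z statistic and the two-sided p-value.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Invented counts for illustration only.
z, p_value = two_proportion_z_test(x1=45, n1=200, x2=30, n2=210)
alpha = 0.05
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```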

Student Success Criteria
View the grading rubric for this deliverable by selecting the “This item is graded with a rubric” link, which is located in the Details & Information pane.

What to Submit
Your research and analysis should be presented on the spreadsheet provided.

Deliverable 5 - Hypothesis Tests for Two Samples.xlsx

To complete this problem successfully, you would need to use the seven hypothesis testing steps included in our other statistics-related assignment. The relevant files associated with this assignment have been attached to this post. A guide for completing this assignment has been prepared by our statistics assignment experts.

Hypothesis test for two samples example.docx

## Deliverable 4 on hypothesis testing

Deliverable 4 - Hypothesis Test
Assignment Content

Competency
Evaluate hypothesis tests for population parameters from one population.

Student Success Criteria
View the grading rubric for this deliverable by selecting the “This item is graded with a rubric” link, which is located in the Details & Information pane.

Instructions
Scenario (information repeated for deliverable 01, 03, and 04)

A major client of your company is interested in the salary distributions of jobs in the state of Minnesota that range from \$30,000 to \$200,000 per year. As a Business Analyst, your boss asks you to research and analyze the salary distributions. You are given a spreadsheet that contains the following information:

A listing of the jobs by title
The salary (in dollars) for each job

Deliverable 4 - Hypothesis Tests.xlsx

In prior engagements, you have already explained the basic statistics to your client and discussed the importance of constructing confidence intervals for the population mean. Your client says that he remembers a little about hypothesis testing, but he is a little fuzzy. He asks you to give him a full explanation of all the steps in hypothesis testing and wants your conclusion about two claims concerning the average salary for all jobs in the state of Minnesota.

Background information on the Data
The data set in the spreadsheet consists of 364 records from the Bureau of Labor Statistics that you will be analyzing. The data set contains a listing of several job titles with yearly salaries ranging from approximately \$30,000 to \$200,000 for the state of Minnesota.

What to Submit
Your boss wants you to submit the spreadsheet with the completed calculations, answers, and analysis.
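
A test of a claim about a population mean can be sketched in Python as a cross-check on the spreadsheet. This is a minimal sketch under stated assumptions: the salaries and the claimed mean are invented, the function name is hypothetical, and the normal distribution stands in for the t distribution. That substitution is reasonable only for large samples such as the 364 records described above, and the assignment's own instructions may require the t distribution.

```python
import math
from statistics import NormalDist, mean, stdev


def one_sample_z_test(sample, mu0):
    """Two-sided large-sample test of H0: population mean == mu0.

    Uses the sample standard deviation and a normal approximation.
    Returns the test statistic and the two-sided p-value.
    """
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    z = (mean(sample) - mu0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Invented salaries for illustration; a real analysis would use all records.
salaries = [42000, 58000, 61000, 75000, 88000, 93000, 104000, 120000]
claimed_mean = 75000  # hypothetical claim, not one of the assignment's claims

z, p_value = one_sample_z_test(salaries, claimed_mean)
alpha = 0.05
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```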

Hypothesis testing is a statistical procedure that requires one to test a claim or claims about a population systematically. My Course Tutor experts have mastered the art of conducting hypothesis testing using the seven hypothesis testing steps. We have attached the hypothesis testing steps below for you to review. The Excel file containing the instructions required for you to complete this assignment successfully has also been attached. Note that My Course Tutor offers help with Excel assignments under the SPSS and data analysis services category. You can email or WhatsApp us if you need help with this assignment.

## Inferential statistics on confidence intervals

Deliverable 3 - Confidence Intervals
Assignment Content

Competency
Develop a confidence interval for a population parameter.

Student Success Criteria
View the grading rubric for this deliverable by selecting the “This item is graded with a rubric” link, which is located in the Details & Information pane.

Instructions
Scenario (information repeated for deliverable 01, 03, and 04)

A major client of your company is interested in the salary distributions of jobs in the state of Minnesota that range from \$30,000 to \$200,000 per year. As a Business Analyst, your boss asks you to research and analyze the salary distributions. You are given a spreadsheet that contains the following information:

A listing of the jobs by title
The salary (in dollars) for each job

Deliverable 3 - Confidence Intervals.xlsx

You have already explained some of the basic statistics to your client, and he really liked your work. Now he wants you to analyze the confidence intervals.

Background information on the Data
The data set in the spreadsheet consists of 364 records from the Bureau of Labor Statistics that you will be analyzing. The data set contains a listing of several job titles with yearly salaries ranging from approximately \$30,000 to \$200,000 for the state of Minnesota.
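
A confidence interval for the population mean can be sketched in Python to cross-check the spreadsheet calculations. This is a minimal sketch, assuming a large sample so that a normal critical value approximates the t critical value. The salaries are invented and the function name is hypothetical. For small samples, a t critical value would be needed instead.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean.

    Uses the sample standard deviation and a normal critical value,
    which is a large-sample approximation.
    """
    n = len(sample)
    center = mean(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_star * standard_error
    return center - margin, center + margin


# Invented salaries for illustration; a real analysis would use all records.
salaries = [42000, 58000, 61000, 75000, 88000, 93000, 104000, 120000]
low, high = mean_confidence_interval(salaries, confidence=0.95)
print(f"95% CI for the mean salary: {low:,.0f} to {high:,.0f}")
```

Raising the confidence level widens the interval, while a larger sample narrows it through the smaller standard error.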

What to Submit
Your boss wants you to submit the spreadsheet with the completed calculations, answers, and analysis.

Note that the attached Excel file contains the necessary steps for completing the assignment. Your success in this assignment depends on how you address the assignment's requirements and whether you show the steps in the calculations. My Course Tutor experts are available to help you with the assignment under the statistics assignment help category. We ensure that you get the highest grade in all statistics assignments we help you with.

Deliverable 3 - Confidence Intervals.xlsx