# Explore the dataset to understand what you are working with

Before starting any analysis, explore the dataset so that you understand what you are working with.

- Given that you will combine your two tables of data at a later stage, how many rows of data are available for your analysis?

Hint: do headers count?
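
Headers label the columns rather than holding data, so they are not counted as rows. In Excel, `=COUNTA(A:A)-1` on a column with no blank cells gives the same figure. The Python sketch below assumes a sheet has been exported to CSV text; the column names and rows are hypothetical, not taken from the assignment file.

```python
import csv
import io


def count_data_rows(csv_text: str) -> int:
    """Return the number of data rows in CSV text, not counting the header."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row: it labels columns, it is not data
    return sum(1 for row in reader if row)  # csv.reader yields [] for blank lines


# Hypothetical sample for illustration only; not the assignment data.
sample = (
    "Job Title (Job Category),Salary\n"
    "Healthcare Data Scientist (data scientist),120000\n"
    "Data Engineer (data engineer),110000\n"
)
print(count_data_rows(sample))  # 2
```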


- How are unknown values represented in the dataset?
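
A placeholder for missing data usually shows up as the same unusual token repeated across many rows, so listing the most frequent values in each column is a quick way to spot it. The sketch below uses Python's `collections.Counter`; the column and the `-1` token are invented for illustration and say nothing about how the real dataset marks unknowns.

```python
from collections import Counter


def most_common_values(values: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Return the n most frequent values in a column, most frequent first."""
    return Counter(v.strip() for v in values).most_common(n)


# Hypothetical "Industry" column; "-1" is an assumed placeholder token.
industry = ["Biotech", "-1", "Insurance", "-1", "Biotech", "-1"]
print(most_common_values(industry, 2))  # [('-1', 3), ('Biotech', 2)]
```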

- What is the salary range covered by this dataset?
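
The range is the highest salary minus the lowest (in Excel, `=MAX(range)-MIN(range)`). Placeholder values have to be excluded first, because a numeric placeholder would distort the minimum. A minimal Python sketch, where the salaries and the placeholder set are hypothetical:

```python
def salary_range(values: list[str], unknown: set[str]) -> float:
    """Return max - min of the salaries, ignoring any placeholder values."""
    salaries = [float(v) for v in values if v.strip() not in unknown]
    if not salaries:
        raise ValueError("no known salaries to compare")
    return max(salaries) - min(salaries)


# Hypothetical salaries and placeholder set, for illustration only.
salaries = ["55000", "-1", "120000", "98500"]
print(salary_range(salaries, unknown={"-1", ""}))  # 65000.0
```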

$254,000

$200,000

$235,500

$238,500

- How many unique industries are covered by the dataset?

Hint: disregard the unknown values.
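
Counting distinct values means collapsing repeats and then dropping the placeholder; in recent Excel versions the `UNIQUE` function lists each distinct value once. The Python sketch below does the same with a set difference; the column values and the `-1` placeholder are hypothetical.

```python
def count_unique(values: list[str], unknown: set[str]) -> int:
    """Count distinct values in a column, excluding placeholder values."""
    return len({v.strip() for v in values} - unknown)


# Hypothetical "Industry" column; "-1" stands in for an unknown placeholder.
industry = ["Biotech", "-1", "Insurance", "Biotech ", "-1"]
print(count_unique(industry, unknown={"-1"}))  # 2
```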

59

60

58

61

- Consider the following point from your boss's email:

"Where do data scientists work?"

In its current form, this question is ambiguous: it is unclear whether it asks about the location where data scientists work or the industry in which they work.

Think about how you could approach this question, how best to answer it, and how it could be phrased in data terms. Select the best option below.

- What are the locations and industries with the highest number (count) of data jobs?
- What are the locations with the highest proportion (%) of data scientists relative to all data jobs?
- What are the industries with the highest salary total (sum) for data scientists?
- What are the locations and industries with the highest number (count) of data scientists?
- What are the locations and industries with the highest salary total (sum) for data scientists?
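
If the question is framed as a count, it comes down to filtering to one job category and tallying rows per location or per industry; in Excel, a PivotTable with the job category in the Filters area does the same. A minimal Python sketch, where the field names and rows are hypothetical:

```python
from collections import Counter


def top_counts(rows: list[dict], category: str, field: str, n: int = 3) -> list[tuple[str, int]]:
    """Count rows of one job category per value of `field`, most common first."""
    return Counter(
        row[field] for row in rows if row["category"] == category
    ).most_common(n)


# Hypothetical joined rows; field names and values are illustrative only.
rows = [
    {"category": "data scientist", "location": "NY", "industry": "Biotech"},
    {"category": "data scientist", "location": "NY", "industry": "Insurance"},
    {"category": "data analyst", "location": "CA", "industry": "Biotech"},
    {"category": "data scientist", "location": "CA", "industry": "Biotech"},
]
print(top_counts(rows, "data scientist", "location"))  # [('NY', 2), ('CA', 1)]
print(top_counts(rows, "data scientist", "industry"))  # [('Biotech', 2), ('Insurance', 1)]
```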

Now that you've explored your data and thought about how to define your questions, you can transform the data to make it ready for analysis.

In the job_salary table, the job title and job category are currently given together in the Job Title (Job Category) column, e.g. Healthcare Data Scientist (data scientist).

For further analysis, you will need to split the job title and job category into two separate columns (as shown in the image below).

To create a dedicated Job Title column in your job_salary table, which of the following combinations of text manipulation functions could you use?

Note: the X in the formulae below represents the unseparated Job Title (Job Category) column.

=RIGHT(X,FIND(")", X))

=RIGHT(X,FIND(")", X)-1)

=LEFT(X,LEN(X)-FIND("(", X))

=LEFT(X,FIND("(", X)-1)

=LEFT(X,FIND("(", X))
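
Tracing a candidate formula step by step shows what it returns: `FIND` gives the 1-based position of "(", and `LEFT(X, n)` keeps the first n characters. The Python sketch below mirrors the `=LEFT(X,FIND("(", X)-1)` option; note that the Excel result keeps the space before "(", which wrapping the formula in `TRIM` (or calling `str.strip` in Python) removes.

```python
def job_title(x: str) -> str:
    """Mirror Excel's LEFT(X, FIND("(", X) - 1), then trim the result."""
    find = x.find("(") + 1  # Excel's FIND is 1-based; str.find is 0-based
    if find == 0:
        raise ValueError(f"no '(' in {x!r}")  # Excel would return #VALUE!
    left = x[: find - 1]  # LEFT(X, n) keeps the first n characters
    return left.strip()  # drop the trailing space, as Excel's TRIM would


print(job_title("Healthcare Data Scientist (data scientist)"))
# Healthcare Data Scientist
```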

- Now create a dedicated Job Category column in your job_salary table, using text manipulation functions to split the job category from the job title.
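
In Excel, one possible formula for the category is `=MID(X, FIND("(", X)+1, FIND(")", X)-FIND("(", X)-1)`, and `COUNTIF` on the new column then gives the per-category counts. The Python sketch below does the same slicing and counting, assuming each value contains exactly one "(...)" group; the sample values are made up.

```python
from collections import Counter


def job_category(x: str) -> str:
    """Return the text between the first "(" and the following ")"."""
    start = x.index("(") + 1  # first character after "("
    end = x.index(")", start)  # position of the closing ")"
    return x[start:end].strip()


# Hypothetical column values, for illustration only.
column = [
    "Healthcare Data Scientist (data scientist)",
    "Senior Data Engineer (data engineer)",
    "Data Scientist (data scientist)",
]
counts = Counter(job_category(x) for x in column)
print(counts["data scientist"], counts["data engineer"], counts["data analyst"])
# 2 1 0
```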

Once you have done this, use your newly created Job Category column to answer the following question and check that you have completed this step correctly:

How many data scientist, data engineer, and data analyst jobs (in that given order) are there?

311, 123, 103

109, 315, 119

313, 119, 109

319, 113, 109

303, 119, 109

- You will also need to join your two sheets of data together: the job_salary table and the company dataset.

Which method(s) below could you use to join the datasets together? Select all correct answers.

Use your method of choice to join the datasets together.

Hint 1: Consider whether your datasets have an equal number of rows to each other.

Hint 2: We recommend joining the company dataset onto the salary dataset.

- Manually copy the data from one table and insert it alongside the other table
- Manually copy the data from one table and insert it below the other table
- Use INDEX/MATCH
- Transform the data in both sheets into Tables and then use Excel Power Query to append the tables together
- Transform the data in both sheets into Tables and then use Excel Power Query to merge Queries
- There's no need to join the datasets together as you'll analyse each table separately
- Use VLOOKUP
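
A lookup-based join matches each salary row to its company through a shared key, so the two tables need neither the same number of rows nor the same order. The sketch below mimics an exact-match `VLOOKUP(lookup_value, table_array, col_index_num, FALSE)` in Python; the key name `company` and the sample rows are hypothetical, since the real linking column is not named here.

```python
def lookup_join(salary_rows: list[dict], company_rows: list[dict], key: str) -> list[dict]:
    """Attach company columns to every salary row, like an exact-match VLOOKUP.

    All salary rows are kept (a left join). A salary row whose key has no
    match gets None for each company column. If a key appears more than once
    in company_rows, the first occurrence wins, as with VLOOKUP.
    """
    index: dict = {}
    for row in company_rows:
        index.setdefault(row[key], row)  # keep only the first row per key
    company_cols = sorted({c for row in company_rows for c in row if c != key})
    joined = []
    for row in salary_rows:
        match = index.get(row[key], {})
        joined.append({**row, **{c: match.get(c) for c in company_cols}})
    return joined


# Hypothetical tables joined on a hypothetical "company" column.
salary = [
    {"company": "Alpha Corp", "salary": 120000},
    {"company": "Alpha Corp", "salary": 90000},
    {"company": "Beta Ltd", "salary": 70000},
    {"company": "Gamma Inc", "salary": 60000},
]
company = [
    {"company": "Alpha Corp", "industry": "Biotech"},
    {"company": "Beta Ltd", "industry": "Insurance"},
]
joined = lookup_join(salary, company, key="company")
print([r["industry"] for r in joined])  # ['Biotech', 'Biotech', 'Insurance', None]
```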
- Now that you have consolidated your two tables of data together, answer the following question to check that you have correctly joined your data:

What is the average salary of a Data Scientist (using your newly created Job Category column) in the Aerospace & Defense industry? Round your answer to the nearest whole number.

Hint: remember that filters hide rows of data but do not remove them. Make sure your calculation includes only the visible (filtered) cells, not the hidden ones.

$110,679

$66,344

$128,302

$63,658

$123,816

$120,294
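
Applying `AVERAGE` to a filtered range still includes the hidden rows; `SUBTOTAL(101, range)` averages only the visible cells, and `AVERAGEIFS(salary_range, category_range, "data scientist", industry_range, "Aerospace & Defense")` avoids the filter altogether. The same criteria-based average is sketched in Python below, with hypothetical field names and rows; it rounds half away from zero to match Excel's `ROUND`.

```python
from decimal import Decimal, ROUND_HALF_UP


def average_salary(rows: list[dict], category: str, industry: str) -> int:
    """Mean salary of rows matching both criteria, rounded to a whole number."""
    matches = [
        r["salary"] for r in rows
        if r["category"] == category and r["industry"] == industry
    ]
    if not matches:
        raise ValueError("no rows match the given criteria")
    mean = Decimal(sum(matches)) / len(matches)
    # ROUND_HALF_UP sends an exact .5 away from zero, as Excel's ROUND does;
    # Python's built-in round() would send it to the nearest even number.
    return int(mean.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Hypothetical rows, for illustration only.
rows = [
    {"category": "data scientist", "industry": "Aerospace & Defense", "salary": 100000},
    {"category": "data scientist", "industry": "Aerospace & Defense", "salary": 100001},
    {"category": "data analyst", "industry": "Aerospace & Defense", "salary": 50000},
    {"category": "data scientist", "industry": "Biotech", "salary": 150000},
]
print(average_salary(rows, "data scientist", "Aerospace & Defense"))  # 100001
```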

The Excel data for this assignment is attached here.

Please find the file: Dataset - Data Jobs_real_assgnt.xlsx
