
Programming with Python

Lab VI

Subjects: Modules

Submission: FirstNameLastNameLab6.py file. This file should contain any two of Parts A, B, C, and D. Please uncomment all parts of your code before submitting it.

 

Install Modules

We want to install the needed modules. To do this, we will use Thonny’s Manage Packages tool.

  1. Open Thonny, select Tools, and click Manage Packages

 

  2. Search for and install:
    1. bs4
    2. matplotlib
    3. nltk
    4. pandas

 

Download Additional NLTK Components

You need to download additional “Collections” from NLTK by first importing NLTK, then using the download functionality:

import nltk

nltk.download()

Result:

[The NLTK Downloader window appears]

 

For now, navigate to “Models” and download “VADER Sentiment Lexicon”. We will be using this in our example.

 

SSL Errors

Occasionally, when retrieving internet data with Python, you’ll encounter SSL certificate errors. If this happens to you, add the following to the top of your script:

 

import ssl

try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context

 

We will discuss what this block of code does when we learn about error handling.

Part A – Pandas

  1. Add a comment containing your first and last name and Lab 6 on line 1:

 

#Michael Deamer Lab 6

 

  2. Save your script as FirstNameLastNameLab6.py. This is the file you will submit for grading; it should contain the two parts you choose to complete (any two of A, B, C, and D). Please uncomment all parts of your code before submitting it.

 

  3. We are going to read a CSV file and save its contents to a DataFrame. First, create a variable named fileName holding the path and file name of tickerInfo.csv:

 

fileName = 'C:/Users/Administrator/Desktop/Python IO/tickerInfo.csv'

 

Note:

Be sure to use the variable fileName with a lowercase f and an uppercase N.

  4. Import pandas and alias it as pd:

 

import pandas as pd

 

  5. Read the CSV using the variable created in Step 3:

 

df = pd.read_csv(fileName)

 

  6. We can find the maximum closing price:

 

maxClose = df['Close'].max()

print(maxClose)

 

  7. We can extract a column from the DataFrame and cast it as a list:

 

closingPrices = list(df['Close'])

 

Note:

We will use this list in Part B.
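The same pattern used for max() works for other pandas aggregations. Here is a minimal sketch with a small made-up CSV standing in for tickerInfo.csv (the dates and prices are hypothetical, chosen only for illustration):

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for tickerInfo.csv
csvData = io.StringIO("Date,Close\n2018-01-02,101.25\n2018-01-03,103.50\n2018-01-04,99.80\n")
df = pd.read_csv(csvData)

maxClose = df['Close'].max()    # highest closing price
minClose = df['Close'].min()    # lowest closing price
meanClose = df['Close'].mean()  # average closing price

print(maxClose, minClose, round(meanClose, 2))  # 103.5 99.8 101.52
```

With your actual tickerInfo.csv, keep using df = pd.read_csv(fileName) as in Step 5; only the aggregation calls change.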

 

Part B – Matplotlib

We want to graph the list created in Part A.

  1. Import the pyplot module from matplotlib and alias it as plt:

 

import matplotlib.pyplot as plt

 

  2. Using the range function, create a sequence of numbers from 0 up to the length of the closingPrices list created in Part A. This will become the x axis of the graph:

 

x = range(len(closingPrices))
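range(len(...)) produces the index positions 0 through n-1, one for each price. A quick sketch with hypothetical prices shows how each x value lines up with one entry in the list:

```python
# Hypothetical closing prices; the real list comes from Part A
closingPrices = [101.25, 103.50, 99.80]

x = range(len(closingPrices))
print(list(x))  # index positions: [0, 1, 2]

# Each x value pairs with one closing price
for i, price in zip(x, closingPrices):
    print(i, price)
```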

 

  3. Plot these two lists using the plot function from matplotlib:

 

plt.plot(x, closingPrices)

 

  4. Show the graph:

plt.show()

 

Result:

[A line graph of the closing prices is displayed]
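If you want to keep a copy of the graph, or you are running without a display, plt.savefig can write the figure to a file instead of opening a window. This is a minimal sketch using hypothetical prices and matplotlib's non-interactive Agg backend; the file name closingPrices.png is just an example:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend: draw without opening a window
import matplotlib.pyplot as plt

closingPrices = [101.25, 103.50, 99.80]  # hypothetical sample prices
x = range(len(closingPrices))

plt.plot(x, closingPrices)
plt.xlabel('Day')
plt.ylabel('Closing price')
plt.title('Closing prices over time')
plt.savefig('closingPrices.png')  # writes the graph to a file
```

In your lab script, keep plt.show() as written; savefig is simply an alternative output step.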

Part C – NLTK

We want the computer to determine if a sentence indicates positive, negative, or neutral sentiment using NLTK.

  1. Create a sentence and save it to a variable named comment:

 

comment = 'This is the best product I\'ve ever owned!'

 

  2. Import a portion of NLTK known as SentimentIntensityAnalyzer:

 

from nltk.sentiment.vader import SentimentIntensityAnalyzer

 

  3. We must ‘instantiate’ a SentimentIntensityAnalyzer object so we can use one of its functions. This is somewhat similar to previous labs when we’ve created empty lists just so we could use the append function:

 

sid = SentimentIntensityAnalyzer()

 

  4. We can pass the comment from Step 1 to NLTK’s polarity_scores function, which will return the sentiment scores of that sentence:

 

sentimentScores = sid.polarity_scores(comment)

 

  5. We can print the compound score:

 

print(sentimentScores['compound'])

 

Result:

0.6696

 

Note:

Positive scores fall between 0 and 1; negative scores fall between 0 and -1.
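One commonly cited convention for VADER's compound score (not part of this lab, just a widely used rule of thumb) treats scores of 0.05 or higher as positive, -0.05 or lower as negative, and everything in between as neutral. A sketch in plain Python, so it runs without NLTK:

```python
def classifySentiment(compound):
    """Map a VADER compound score to a label.

    Uses a commonly cited convention (cutoffs of +/-0.05);
    adjust the thresholds to suit your application.
    """
    if compound >= 0.05:
        return 'positive'
    elif compound <= -0.05:
        return 'negative'
    else:
        return 'neutral'

print(classifySentiment(0.6696))   # positive (the score from Step 5)
print(classifySentiment(-0.4404))  # negative (hypothetical score)
print(classifySentiment(0.0))      # neutral
```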

 

  6. Try changing the sentence in Step 1 to test NLTK’s ability to interpret sentiment.

 

Note:

Do not change the variable name, just the sentence.

 

Part D – Beautiful Soup

We want to scrape data from musicpriceguide.com. More specifically, we would like to get the list of vinyl records from that website. Not all websites are conducive to scraping. Some reasons we might not be able to scrape a site include:

  • The site requires a login or captcha test to view the relevant data
  • The data appears between <script> tags
  • The website discourages robots using other methods

We want to verify that our website permits scraping before moving forward.

  1. Navigate to musicpriceguide.com
  2. Find an example of the data we’d like to scrape. We’ll use the title of one of the records listed.

 

  3. View the page source HTML code. Depending on which browser you are using, this can be done by pressing Ctrl+U, or by right-clicking white space on the page and selecting ‘View Page Source’. Note, this is not the same thing as ‘Inspect Element’.

 

  4. Search for the data from Step 2 using Ctrl+F.

 

 

Note: At the time this lab was created, there were two matches for Paul McCartney’s record.

 

  5. If the search does not return any results, this is an indication that the data is generated using a script and this website will not be easy to scrape. This site, however, returned two matches. We need to check whether those matches appear between <script> tags.

    Result:
    <a style="font-size:1.1em" href="28951/Paul-McCartney-I-Don-t-Know-Come-On-To-Me-Rare-2018-white-label-7-035-200.html" title="Paul McCartney I Don t Know Come On To Me Rare 2018 white label 7 035 200">Paul McCartney I Don t Know   Come On To Me  Rare 2018 white label 7   035 200</a>

 

Note:

The title appears both as an attribute of an <a> tag and as text between the tags. Neither appears between <script> tags. This indicates that we can get this data using BeautifulSoup.
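The difference between the attribute and the tag text matters later in this part. A minimal sketch, assuming bs4 is installed and using a made-up HTML fragment (the record titles here are hypothetical, not from the site):

```python
from bs4 import BeautifulSoup

# Made-up HTML fragment standing in for the page source
html = '''
<p>No title attribute here</p>
<a href="record1.html" title="Abbey Road">Abbey Road LP</a>
<a href="record2.html" title="Let It Bleed">Let It Bleed LP</a>
'''

soup = BeautifulSoup(html, 'html.parser')
titleTags = soup.find_all(title=True)  # only tags that have a title attribute

for a in titleTags:
    # a['title'] is the attribute; a.get_text() is the text between the tags
    print(a['title'], '|', a.get_text())
```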

 

  6. Import the BeautifulSoup class from bs4 and the urlopen function from urllib.request:

 

from bs4 import BeautifulSoup

from urllib.request import urlopen

 

  7. Use the urlopen function to open the URL:

 

url = 'http://www.musicpriceguide.com'

page = urlopen(url)

Note:

Although nothing is displayed on the screen, this code causes Python to navigate to the site and retrieve its HTML code.

 

  8. Pass the page object to the BeautifulSoup class. This requires a second argument, ‘html.parser’, which tells BeautifulSoup what kind of markup the first argument contains:

 

soup = BeautifulSoup(page, 'html.parser')

 

  9. If the soup object prints HTML code similar to the HTML code we saw in Step 3, this indicates that we can scrape this website:

 

print(soup)

 

Note:

If we are not able to print the HTML code in this step, the site might block robots and will not be easy to scrape. We may want to find another site.

 

  10. Remove the print statement from the previous step. It was just a test.

 

  11. From the result in Step 5, we notice that the titles appear in tags that have a title attribute. Let’s search for all tags with a title attribute:

 

titleTags = soup.find_all(title=True)

 

  12. If we print the titleTags variable, we notice the find_all function returns a list object. We can loop over this list using a for loop:

 

for a in titleTags:

    print(a)

 

Result:

<a href="29098/Blossom-Toes-We-Are-Ever-So-Clean-Marmalade-UK-Original-Pressing-LP.html" style="font-size:1.1em" title="Blossom Toes We Are Ever So Clean Marmalade UK Original Pressing LP">Blossom Toes   We Are Ever So Clean   Marmalade   UK Original Pressing LP</a>

<a href="27241/KALEIDOSCOPE-TANGERINE-DREAM-LP-ORIG-UK-1967-FONTANA-1ST-PRESS-RARE-PSYCH.html" style="font-size:1.1em" title="KALEIDOSCOPE TANGERINE DREAM LP ORIG UK 1967 FONTANA 1ST PRESS RARE PSYCH">KALEIDOSCOPE   TANGERINE DREAM LP ORIG UK 1967 FONTANA 1ST PRESS RARE PSYCH</a>

 

[…etc]

 

  13. It appears the function returns the entire tag. To get only the string between the tags, we can use the get_text() function:

 

for a in titleTags:

    print(a.get_text())

 

Result:

QUEEN crazy little thing 45 RPM 12 RARE COLOMBIA PROMO ONLY PURPLE COLOR

The Rolling Stones Let It Bleed UK Vinyl LP 1969 1st Press Mono EX EX

XTC The Complete and Utter Dukes of Stratosphear Stratosphere 2 LP CD BOX SET
[…etc]

 
