
Programming with Python

Lab VI

Subjects: Modules

Submission: FirstNameLastNameLab6.py file. This file should contain any two of Parts A, B, C, and D. Please uncomment all parts of your code before submitting it.

 

Install Modules

We want to install the needed modules. To do this, we will use Thonny’s Manage Packages tool.

  1. Open Thonny, select Tools, and click Manage Packages

 

  2. Search for and install:
    1. bs4
    2. matplotlib
    3. nltk
    4. pandas

 

Download Additional NLTK Components

You need to download additional “Collections” from NLTK by first importing NLTK, then using the download functionality:

import nltk

nltk.download()

Result:

[The NLTK Downloader window appears]

 

For now, navigate to “Models” and download “VADER Sentiment Lexicon”. We will be using this in our example.

 

SSL Errors

Occasionally, when retrieving internet data with Python, you’ll encounter SSL certificate errors. If this happens to you, add the following to the top of your script:

 

import ssl

try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context

 

We will discuss what this block of code does when we learn about error handling.

Part A – Pandas

  1. Add a comment containing your first and last name and Lab 6 on line 1:

 

#Michael Deamer Lab 6

 

  2. Save your script as FirstNameLastNameLab6.py. This is the file you will submit for grading; it should contain the two parts you choose to complete (any two of A, B, C, and D). Please uncomment all parts of your code before submitting it.

 

  3. We are going to read a CSV file and save its contents to a DataFrame. First, create a variable named fileName holding the path and file name of tickerInfo.csv:

 

fileName = 'C:/Users/Administrator/Desktop/Python IO/tickerInfo.csv'

 

Note:

Be sure to use the variable fileName with a lowercase f and an uppercase N.

  4. Import pandas and alias it as pd:

 

import pandas as pd

 

  5. Read the CSV using the variable created in Step 3:

 

df = pd.read_csv(fileName)

 

  6. We can find the maximum closing price:

 

maxClose = df['Close'].max()

print(maxClose)

 

  7. We can extract a column from the DataFrame and cast it as a list:

 

closingPrices = list(df['Close'])

 

Note:

We will use this list in Part B.
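The same pattern used for max() works for other pandas aggregations. Here is a minimal sketch with a small made-up CSV standing in for tickerInfo.csv (the dates and prices are hypothetical, chosen only for illustration):

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for tickerInfo.csv
csvData = io.StringIO("Date,Close\n2018-01-02,101.25\n2018-01-03,103.50\n2018-01-04,99.80\n")
df = pd.read_csv(csvData)

maxClose = df['Close'].max()    # highest closing price
minClose = df['Close'].min()    # lowest closing price
meanClose = df['Close'].mean()  # average closing price

print(maxClose, minClose, round(meanClose, 2))  # 103.5 99.8 101.52
```

With your actual tickerInfo.csv, keep using df = pd.read_csv(fileName) as in Step 5; only the aggregation calls change.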

 

Part B – Matplotlib

We want to graph the list created in Part A.

  1. Import the pyplot module from matplotlib and alias it as plt:

 

import matplotlib.pyplot as plt

 

  2. Using the range function, create a sequence of numbers from 0 up to the length of the closingPrices list created in Part A. This will become the x axis of the graph:

 

x = range(len(closingPrices))
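range(len(...)) produces the index positions 0 through n-1, one for each price. A quick sketch with hypothetical prices shows how each x value lines up with one entry in the list:

```python
# Hypothetical closing prices; the real list comes from Part A
closingPrices = [101.25, 103.50, 99.80]

x = range(len(closingPrices))
print(list(x))  # index positions: [0, 1, 2]

# Each x value pairs with one closing price
for i, price in zip(x, closingPrices):
    print(i, price)
```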

 

  3. Plot these two lists using the plot function from matplotlib:

 

plt.plot(x, closingPrices)

 

  4. Show the graph:

plt.show()

 

Result:

[A line graph of the closing prices is displayed]
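If you want to keep a copy of the graph, or you are running without a display, plt.savefig can write the figure to a file instead of opening a window. This is a minimal sketch using hypothetical prices and matplotlib's non-interactive Agg backend; the file name closingPrices.png is just an example:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend: draw without opening a window
import matplotlib.pyplot as plt

closingPrices = [101.25, 103.50, 99.80]  # hypothetical sample prices
x = range(len(closingPrices))

plt.plot(x, closingPrices)
plt.xlabel('Day')
plt.ylabel('Closing price')
plt.title('Closing prices over time')
plt.savefig('closingPrices.png')  # writes the graph to a file
```

In your lab script, keep plt.show() as written; savefig is simply an alternative output step.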

Part C – NLTK

We want the computer to determine if a sentence indicates positive, negative, or neutral sentiment using NLTK.

  1. Create a sentence and save it to a variable named comment:

 

comment = 'This is the best product I\'ve ever owned!'

 

  2. Import a portion of NLTK known as SentimentIntensityAnalyzer:

 

from nltk.sentiment.vader import SentimentIntensityAnalyzer

 

  3. We must ‘instantiate’ a SentimentIntensityAnalyzer object so we can use one of its functions. This is somewhat similar to previous labs when we’ve created empty lists just so we could use the append function:

 

sid = SentimentIntensityAnalyzer()

 

  4. We can pass the comment from Step 1 to NLTK’s polarity_scores function, which will return the sentiment scores of that sentence:

 

sentimentScores = sid.polarity_scores(comment)

 

  5. We can print the compound score:

 

print(sentimentScores['compound'])

 

Result:

0.6696

 

Note:

Positive scores fall between 0 and 1; negative scores fall between 0 and -1.
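One commonly cited convention for VADER's compound score (not part of this lab, just a widely used rule of thumb) treats scores of 0.05 or higher as positive, -0.05 or lower as negative, and everything in between as neutral. A sketch in plain Python, so it runs without NLTK:

```python
def classifySentiment(compound):
    """Map a VADER compound score to a label.

    Uses a commonly cited convention (cutoffs of +/-0.05);
    adjust the thresholds to suit your application.
    """
    if compound >= 0.05:
        return 'positive'
    elif compound <= -0.05:
        return 'negative'
    else:
        return 'neutral'

print(classifySentiment(0.6696))   # positive (the score from Step 5)
print(classifySentiment(-0.4404))  # negative (hypothetical score)
print(classifySentiment(0.0))      # neutral
```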

 

  6. Try changing the sentence in Step 1 to test NLTK’s ability to interpret sentiment.

 

Note:

Do not change the variable name, just the sentence.

 

Part D – Beautiful Soup

We want to scrape data from musicpriceguide.com. More specifically, we would like to get the list of vinyl records from that website. Not all websites are conducive to scraping. Some reasons we might not be able to scrape a site include:

  • The site requires a login or captcha test to view the relevant data
  • The data appears between <script> tags
  • The website discourages robots using other methods

We want to verify that our website permits scraping before moving forward.

  1. Navigate to musicpriceguide.com
  2. Find an example of the data we’d like to scrape. We’ll use the title of one of the records listed.

 

  3. View the page source HTML code. Depending on which browser you are using, this can be done by pressing Ctrl+U, or by right-clicking white space on the page and selecting ‘View Page Source’. Note, this is not the same thing as ‘Inspect Element’.

 

  4. Search for the data from Step 2 using Ctrl+F.

 

 

Note: At the time this lab was created, there were two matches for Paul McCartney’s record.

 

  5. If the search does not return any results, this is an indication that the data is generated using a script and this website will not be easy to scrape. This site, however, returned two matches. We need to check whether those matches appear between <script> tags.

    Result:
    <a style="font-size:1.1em" href="28951/Paul-McCartney-I-Don-t-Know-Come-On-To-Me-Rare-2018-white-label-7-035-200.html" title="Paul McCartney I Don t Know Come On To Me Rare 2018 white label 7 035 200">Paul McCartney I Don t Know   Come On To Me  Rare 2018 white label 7   035 200</a>

 

Note:

The title appears both as an attribute of an <a> tag and as text between the tags. Neither appears between <script> tags. This indicates that we can get this data using BeautifulSoup.
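The difference between the attribute and the tag text matters later in this part. A minimal sketch, assuming bs4 is installed and using a made-up HTML fragment (the record titles here are hypothetical, not from the site):

```python
from bs4 import BeautifulSoup

# Made-up HTML fragment standing in for the page source
html = '''
<p>No title attribute here</p>
<a href="record1.html" title="Abbey Road">Abbey Road LP</a>
<a href="record2.html" title="Let It Bleed">Let It Bleed LP</a>
'''

soup = BeautifulSoup(html, 'html.parser')
titleTags = soup.find_all(title=True)  # only tags that have a title attribute

for a in titleTags:
    # a['title'] is the attribute; a.get_text() is the text between the tags
    print(a['title'], '|', a.get_text())
```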

 

  6. Import the BeautifulSoup class from bs4 and the urlopen function from urllib.request:

 

from bs4 import BeautifulSoup

from urllib.request import urlopen

 

  7. Use the urlopen function to open the URL:

 

url = 'http://www.musicpriceguide.com'

page = urlopen(url)

Note:

Although nothing is displayed on the screen, this code causes Python to navigate to the site and retrieve its HTML code.

 

  8. Pass the page object to the BeautifulSoup class. This requires a second argument, ‘html.parser’, which tells BeautifulSoup what kind of markup the first argument contains:

 

soup = BeautifulSoup(page, 'html.parser')

 

  9. If the soup object prints HTML code similar to the HTML code we saw in Step 3, this indicates that we can scrape this website:

 

print(soup)

 

Note:

If we are not able to print the HTML code in this step, the site might block robots and will not be easy to scrape. We may want to find another site.

 

  10. Remove the print statement from the previous step. It was just a test.

 

  11. From the result in Step 5, we notice that the titles appear in tags that have a title attribute. Let’s search for all tags with a title attribute:

 

titleTags = soup.find_all(title=True)

 

  12. If we print the titleTags variable, we notice the find_all function returns a list object. We can loop over this list using a for loop:

 

for a in titleTags:

    print(a)

 

Result:

<a href="29098/Blossom-Toes-We-Are-Ever-So-Clean-Marmalade-UK-Original-Pressing-LP.html" style="font-size:1.1em" title="Blossom Toes We Are Ever So Clean Marmalade UK Original Pressing LP">Blossom Toes   We Are Ever So Clean   Marmalade   UK Original Pressing LP</a>

<a href="27241/KALEIDOSCOPE-TANGERINE-DREAM-LP-ORIG-UK-1967-FONTANA-1ST-PRESS-RARE-PSYCH.html" style="font-size:1.1em" title="KALEIDOSCOPE TANGERINE DREAM LP ORIG UK 1967 FONTANA 1ST PRESS RARE PSYCH">KALEIDOSCOPE   TANGERINE DREAM LP ORIG UK 1967 FONTANA 1ST PRESS RARE PSYCH</a>

 

[…etc]

 

  13. It appears the function returns the entire tag. To get only the string between the tags, we can use the get_text() function:

 

for a in titleTags:

    print(a.get_text())

 

Result:

QUEEN crazy little thing 45 RPM 12 RARE COLOMBIA PROMO ONLY PURPLE COLOR

The Rolling Stones Let It Bleed UK Vinyl LP 1969 1st Press Mono EX EX

XTC The Complete and Utter Dukes of Stratosphear Stratosphere 2 LP CD BOX SET
[…etc]

 
