Stage 4: adding comments to your website
Introduction to networks
The Internet is a network of computers. Here are a few important words to remember:
- Network: a group of entities that can communicate even though they are not ALL directly connected. A network needs a way to encode and interpret messages, a way to route messages, and rules that decide who gets to use the resources
- Latency: the time it takes for the start of a message to get from its source to its destination, measured in seconds or milliseconds
- Bandwidth: the amount of information that can be transmitted per unit of time, measured in bits or megabits per second
- Bit: the smallest unit of information: yes or no (0 or 1)
- Protocol: the rules for how two entities talk to each other, here the client (web browser) and the server (web server). We use the HyperText Transfer Protocol (HTTP).
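To see how latency and bandwidth combine, here is a rough back-of-the-envelope sketch in Python 3. It assumes a very simplified model (total time = latency + size / bandwidth) and the numbers are made up for illustration; the function name is hypothetical.

```python
def transfer_time_seconds(size_bits, bandwidth_bits_per_s, latency_s):
    """Rough time for a whole message to arrive: latency plus size / bandwidth."""
    return latency_s + size_bits / bandwidth_bits_per_s


# Hypothetical numbers: a 1 megabit message over a 10 megabit/s link
# with 50 milliseconds of latency.
total = transfer_time_seconds(1_000_000, 10_000_000, 0.050)
print(round(total, 3))  # 0.15
```

With these numbers the latency accounts for a third of the total time, which is why small messages are dominated by latency and large ones by bandwidth.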
Making the Internet work for you
A few reminders and new concepts for URLs:
- HTML documents are made of markup
- URL stands for Uniform Resource Locator: it is divided into the protocol (http), the host (www.udacity.com) and the path (/...)
- Query parameters (GET parameters): added to the URL, they add extra functionality. The first one is separated from the path with a question mark and the others with an "&" sign: http://www.udacity.com/foo?p=1&q=great&z=2
- Fragments: placed after the GET parameters and delimited with "#". A fragment refers to a particular part of the page and has other uses in JavaScript. It is not sent to the server; it only lives in the browser.
- Ports: by default the port is 80 for HTTP. To use a different port, specify it after a colon between the host and the path: http://localhost:8000/
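The parts of a URL listed above can be pulled apart with Python's standard library. This is a minimal sketch in Python 3 (urllib.parse); the URL is an assumption that combines the port, query parameters and a made-up fragment in one address.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL combining every part described in the list above.
url = "http://localhost:8000/foo?p=1&q=great&z=2#comments"

parts = urlsplit(url)
params = parse_qs(parts.query)  # each value is a list, as a name can repeat

print(parts.scheme)    # http
print(parts.hostname)  # localhost
print(parts.port)      # 8000
print(parts.path)      # /foo
print(params)          # {'p': ['1'], 'q': ['great'], 'z': ['2']}
print(parts.fragment)  # comments
```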
A few reminders/new concepts for HTTP:
- GET request: when typing "http://www.udacity.com/foo", we send the following request line: "GET /foo HTTP/1.1". The request line does not need the host because we are already connected to it. See more about GET and POST requests below.
- Headers: they have the form "Name: value". Two important ones are "Host: www.udacity.com", useful because a server can have several names (depending on how many websites it hosts), and "User-Agent: Chrome", which lets you tailor how your site responds depending on who or what is visiting. Use it appropriately to handle traffic or do anything else you like.
- Response: it starts with a status line made of the HTTP version, a status code and a reason phrase (the English meaning of the status code). Example: "HTTP/1.1 200 OK". Common status codes are "200 OK", "302 Found", "404 Not Found" and "500 Internal Server Error"
- Responses contain headers too: "Date: Tues Mar 2012 04:33:33 GMT", "Server: Apache/2.2.3" (try to avoid sending this one for security reasons), "Content-Type: text/html", "Content-Length: 1539"
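To make the shape of a response concrete, here is a small Python 3 sketch that splits the top of a response into its status line and headers. The raw text and the parse_response_head function are hypothetical and only handle this simple, well-formed case; a real client would use a library.

```python
# A hand-written response head, using header values from the notes above.
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 1539\r\n"
    "\r\n"
)


def parse_response_head(raw):
    """Return (version, status_code, reason_phrase, headers) from a raw response."""
    head = raw.split("\r\n\r\n", 1)[0]       # everything before the blank line
    lines = head.split("\r\n")
    version, code, phrase = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, value = line.split(":", 1)
        headers[name.strip().lower()] = value.strip()  # header names ignore case
    return version, int(code), phrase, headers


version, code, phrase, headers = parse_response_head(raw_response)
print(version, code, phrase)      # HTTP/1.1 200 OK
print(headers["content-length"])  # 1539
```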
Servers: their purpose is to respond to HTTP requests. The response can be static (a pre-written file, an image...) or dynamic (generated on the fly by a program called a web application). When a user enters data through a form, the server first presents the form in response to a GET request; the browser then sends the data in a POST request, which the server can answer with a redirect.
Forms
There is a tag called <form>. It has a closing tag as well and is used to create forms through which users can enter data. Inside a form we can have:
- <input>, which creates a field on the page. The input can have a name as well: <input name="q">. If the user enters data and presses Enter, the name and the entered value are passed in the URL as a query parameter.
- type="hidden" for the input tag. Example: <input type="hidden" name="food" value="egg">
- Buttons are created with type="submit". Clicking a submit button is like pressing Enter: it passes the parameters to the URL.
- You can also add an action attribute to the form tag to change where the data is sent: <form action="http://www.google.com/search">
- Buttons are also created with the <button> tag.
- Input types can have different values and behaviours. The default input type is "text" if nothing is specified. "password" creates a field that shows dots but will NOT send the password securely. "checkbox": if checked, the value is "on"; if not, the parameter doesn't appear at all. "radio" behaves the same way, unless all the radio buttons share the same name attribute and have different value attributes
- The value attribute is used to distinguish radio buttons with the same name
- You can also use the <label> tag with some text to surround your input elements
- You can use the <select> tag with several <option> tags to create a dropdown menu. The value attribute is also available
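The way a browser turns form fields into query parameters can be imitated with the standard library. This is a sketch in Python 3 under stated assumptions: the build_query function and the "subscribe" checkbox are made up, while "q" and the hidden "food" field come from the examples above.

```python
from urllib.parse import urlencode


def build_query(q, subscribe_checked):
    """Encode form fields roughly the way a browser would for a GET form."""
    fields = [("q", q), ("food", "egg")]    # "food" mimics the hidden input
    if subscribe_checked:
        fields.append(("subscribe", "on"))  # unchecked boxes are simply left out
    return urlencode(fields)


print(build_query("udacity courses", True))
# q=udacity+courses&food=egg&subscribe=on
print(build_query("udacity courses", False))
# q=udacity+courses&food=egg
```

The resulting string is what ends up after the question mark in the URL named by the form's action.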
Break Modulus and Dictionaries
The modulus is the remainder of a division: 13 % 11 = 2.
Dictionaries: they use curly brackets and associate keys with values: elements = {"hydrogen": 1, "helium": 2}. In this case print elements["hydrogen"] would print 1. Note that we still use square brackets to access the elements. Like lists, dictionaries are mutable (strings aren't).
You can check whether a key is in a dictionary by using in: print 'lithium' in elements. In the case above this would evaluate to False.
You can also add elements by using the following form: dictionary['newKey'] = value. Note that when using this code, if the key already exists, it will just update the value.
What makes dictionaries interesting is that the value of a key can be anything: strings and numbers, but also other dictionaries.
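The operations described above fit in a few lines. The sketch below uses Python 3 syntax (print is a function there, unlike the print statement in these notes); the "symbol" and "number" keys are made up for illustration.

```python
elements = {"hydrogen": 1, "helium": 2}

print(elements["hydrogen"])   # 1
print("lithium" in elements)  # False

elements["lithium"] = 3       # new key: adds an entry
elements["helium"] = {"number": 2, "symbol": "He"}  # existing key: replaces the value

print(elements["helium"]["symbol"])  # He  (a dictionary nested in a dictionary)
print(13 % 11)                       # 2
```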
Google App Engine and GET/POST methods
From the Google developer console we can create and manage projects, each of which has a unique ID. This ID can be overridden as long as it stays unique. We can then download some starter code. The starter code imports webapp2, which was downloaded at the same time as App Engine.
In every starting folder there should be two files: a main.py file and an app.yaml file.
The YAML file points App Engine to the Google project and therefore tells it where to deploy the code. This can be overridden if necessary.
There is also a main.pyc file once the code has been compiled for the first time. The Python interpreter creates it so that it does not have to recompile .py files unless necessary.
To use the app engine properly, it is important to understand the differences between GET requests (default requests) and POST requests (specified method).
GET methods:
- Parameters are in the URL
- Used for fetching documents
- URLs have a maximum length
- OK to cache
- Shouldn't change the server
POST methods:
- Parameters are in the body
- Used for updating data
- No maximum length (servers can impose a limit, but it is usually a few megabytes)
- Not ok to cache
- OK to change the server ("destructive in nature"...)
It is good practice to redirect after a form submission: reloading the page then doesn't resubmit the form, and we can have distinct pages for the form and for the success message.
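The GET, POST and redirect cycle can be sketched without any framework. The code below is Python 3 and is NOT the webapp2 API: handle_request, the comments list and the "/thanks" path are all hypothetical names used only to show the pattern.

```python
comments = []  # stands in for the server-side data that a POST is allowed to change


def handle_request(method, params):
    """Return (status_code, headers, body) for a tiny comment form."""
    if method == "GET":
        # Fetching the form changes nothing on the server, so it is safe to cache.
        form = '<form method="post"><input name="comment"><input type="submit"></form>'
        return 200, {"Content-Type": "text/html"}, form
    if method == "POST":
        # The data arrives in the request body and updates the server.
        comments.append(params.get("comment", ""))
        # Redirect so that reloading the page does not resubmit the form.
        return 302, {"Location": "/thanks"}, ""
    return 405, {"Allow": "GET, POST"}, ""


print(handle_request("GET", {})[0])                     # 200
print(handle_request("POST", {"comment": "Nice!"})[:2])  # (302, {'Location': '/thanks'})
print(comments)                                          # ['Nice!']
```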
Validation
- Escaping: even though you could write your own escaping function, you should probably use the built-in one by first importing the cgi module and then calling cgi.escape(string, quote = True) (in Python 3 this function lives in the html module as html.escape)
- The importance of validation and escaping: validating user input is paramount for usability and security reasons. We wouldn't want a user to accidentally enter data that our code can't handle. Similarly, we wouldn't want a user to maliciously inject SQL or JavaScript code that could harm our application.
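Here is a minimal sketch of both ideas in Python 3, where the escaping function is html.escape (the counterpart of the cgi.escape call mentioned above). The valid_day function and the sample input are assumptions made up for illustration.

```python
import html


def valid_day(text):
    """Return the day as an int if text is a whole number from 1 to 31, else None."""
    try:
        day = int(text)
    except (TypeError, ValueError):
        return None
    return day if 1 <= day <= 31 else None


print(valid_day("15"))   # 15
print(valid_day("42"))   # None
print(valid_day("abc"))  # None

# Escaping turns markup characters into harmless HTML entities.
user_input = '<script>alert("hi")</script>'
print(html.escape(user_input, quote=True))
# &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;
```

Validation rejects data the application cannot use, while escaping makes sure whatever is echoed back to the page is displayed as text instead of being run as code.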
String formatting
String formatting will become very handy when implementing templates. There are three types of substitution:
- Simple substitution, where a single %s is replaced by a string or a variable: "Hello %s" % "John" or "Hello %s" % name
- Multiple substitution, where several %s are replaced in order by several strings or variables: "Hello %s %s" % ("John", "Smith") or "Hello %s %s" % (name, surname)
- Named substitution using a dictionary, where a value can be used more than once: '%(last)s, %(first)s %(last)s' % {'first': "James", 'last': "Bond"}
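The three forms above can be tried directly. This short sketch uses Python 3 syntax; the variable names are only there to hold the results.

```python
name, surname = "John", "Smith"

simple = "Hello %s" % name                   # one placeholder, one value
multiple = "Hello %s %s" % (name, surname)   # several placeholders, a tuple of values
by_name = "%(last)s, %(first)s %(last)s" % {"first": "James", "last": "Bond"}

print(simple)    # Hello John
print(multiple)  # Hello John Smith
print(by_name)   # Bond, James Bond
```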
HTML Templates
Templates are extremely useful for various reasons. One of the main ones is that, by offering inheritance and code handling, they limit how much has to be written more than once, i.e. code duplication, which is an important source of errors and bugs. They also offer escaping, which is a significant security feature.
Variable substitution in Jinja2: use double curly braces:
{{variable}}. These double curly braces basically mean "print", and "variable" can be a variable name or a Python-like expression.
For example: <h2>Hello, {{name}}</h2>
Statement syntax in Jinja2:
{%statement%}
output
{%end statement%}
example:
{% if n == 1 %}
n equals 1
{% else %}
n does NOT equal 1
{% endif %}
For loop syntax:
{% for statement %}
body
{% endfor %}
example:
{% for x in range(1,n+1) %}
<li>{{x ** 2}}</li>
{% endfor %}
In Jinja2 (and other template languages), there are two ways of escaping fields. One is at the template level, when setting up the environment: jinja_env = jinja2.Environment(loader = jinja2.FileSystemLoader(template_dir), autoescape = True). The other is at the field level, with the pipe symbol applying a filter: <li>{{ item | escape }}</li>. If you turned on the autoescape option, you can use the safe filter to opt a field out: <li>{{ item | safe }}</li>
A few tips:
- Always automatically escape variables when possible
- Minimise the amount of code in your HTML templates
- Minimise the amount of HTML in your code - keep to zero, really...
Template inheritance: Jinja makes it really easy to maintain HTML pages by using inheritance. Between the <body> tags of the base template, you can use the following syntax to mark a block of content:
{% block content%}
{% endblock %}
You then use the same two tags in your actual template, at the top of which you have to include the "extends" line:
{% extends "file.html" %}
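To see inheritance, loops and autoescaping working together, here is a small sketch that needs the jinja2 package installed. It assumes in-memory templates loaded with jinja2.DictLoader, swapped in for the FileSystemLoader used in these notes so that the example is self-contained; the template names and contents are made up.

```python
import jinja2

# Hypothetical templates kept in a dictionary instead of files on disk.
templates = {
    "base.html": "<body>{% block content %}{% endblock %}</body>",
    "page.html": (
        '{% extends "base.html" %}'
        "{% block content %}"
        "<h2>Hello, {{ name }}</h2>"
        "<ul>{% for x in range(1, n + 1) %}<li>{{ x ** 2 }}</li>{% endfor %}</ul>"
        "{% endblock %}"
    ),
}

jinja_env = jinja2.Environment(loader=jinja2.DictLoader(templates), autoescape=True)
html_out = jinja_env.get_template("page.html").render(name="<b>Ann</b>", n=3)

print(html_out)
# <body><h2>Hello, &lt;b&gt;Ann&lt;/b&gt;</h2><ul><li>1</li><li>4</li><li>9</li></ul></body>
```

The child template only supplies the content block; the surrounding <body> tags come from the base template, and the markup in name is escaped because autoescape is on.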
In a nutshell, templates have several advantages:
- Separate different types of code
- Make more readable code
- Make more secure websites
- Make HTML easier to modify
You can find out more about Jinja in its documentation.
Databases
Databases are used to store and retrieve large amounts of organised data. The word database can refer to the software used, the machine that actually runs the software or the group of machines that performs the tasks. Databases store the data in tables. Querying such tables by hand in Python is hard and tedious (named tuples, lambda, .sort(), dictionaries).
There are different types of databases:
- Relational databases: usually using some kind of SQL, like PostgreSQL (Reddit, Hipmunk), MySQL (Facebook and everybody...), SQLite, Oracle
- Google App Engine Datastore
- Dynamo (Amazon)
- NoSQL databases: Mongo, Couch
SQL stands for Structured Query Language and was invented in the 1970s. A query looks like this: SELECT * FROM links WHERE id = 5
Python has a built-in module called sqlite3 that can be imported.
See query_exercise_1.py to query_exercise_5.py
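As a quick illustration of sqlite3, the sketch below runs the query shown above against an in-memory database. The columns of the links table and its rows are assumptions made up for this example; they are not taken from the exercise files.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database that lives in RAM
conn.execute("CREATE TABLE links (id INTEGER PRIMARY KEY, title TEXT, votes INTEGER)")
conn.executemany(
    "INSERT INTO links VALUES (?, ?, ?)",
    [(4, "first", 10), (5, "second", 3), (6, "third", 7)],
)

# The ? placeholder lets sqlite3 insert the value safely into the query.
row = conn.execute("SELECT * FROM links WHERE id = ?", (5,)).fetchone()
print(row)  # (5, 'second', 3)

conn.close()
```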
Joins: we don't really use them
Indexes: they increase the speed of database reads (but not inserts...) as there is no need to scan all the data. If we have a function that builds the index once, the other functions don't need to scan through all the data every time:
def build_link_index():
    # Map each link's id to the link so that lookups by id avoid a full scan.
    my_index = {}
    for l in links:
        my_index[l.id] = l
    return my_index
Hash tables are not sorted, but tree data structures are. Tree lookups are also slower, and they slow down further as the size of the tree increases.
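The difference between scanning and using an index can be sketched as follows in Python 3. The Link tuple, its fields and the sample data are hypothetical, and the index here is a plain dictionary (a hash table), so it is fast to look up but not sorted.

```python
from collections import namedtuple

Link = namedtuple("Link", ["id", "title"])  # hypothetical record type
links = [Link(1, "first"), Link(5, "second"), Link(9, "third")]


def link_by_id_scan(link_id):
    """Look at every link until the id matches: work grows with the list."""
    for l in links:
        if l.id == link_id:
            return l
    return None


def build_link_index():
    """Build the id -> link mapping once, up front."""
    return {l.id: l for l in links}


link_index = build_link_index()


def link_by_id_indexed(link_id):
    """A single dictionary lookup, however many links there are."""
    return link_index.get(link_id)


print(link_by_id_scan(5))     # Link(id=5, title='second')
print(link_by_id_indexed(5))  # Link(id=5, title='second')
```

The price is that every insert into links must also update link_index, which is why indexes speed up reads but not inserts.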
ACID: Atomicity, Consistency, Isolation and Durability:
- Atomicity: all parts of a transaction (for example simultaneous updates of several tables) succeed or fail together
- Consistency: the database will always be consistent
- Isolation: no transaction can interfere with another: if two transactions affect the same row at the same time (an upvote and a downvote, for example), the row gets locked so that only one transaction can proceed at a time
- Durability: committed transactions don't get lost (during a computer crash for example)
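Atomicity is the easiest of these to observe. The sketch below uses Python 3's sqlite3 module with an in-memory database; the links table and its votes column are assumptions made up for the example. The second statement of the transaction fails on purpose, so the first one is rolled back too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (id INTEGER PRIMARY KEY, votes INTEGER NOT NULL)")
conn.execute("INSERT INTO links VALUES (1, 10)")
conn.commit()

try:
    with conn:  # commits if the block succeeds, rolls back if an exception escapes
        conn.execute("UPDATE links SET votes = votes + 1 WHERE id = 1")
        conn.execute("INSERT INTO links VALUES (1, 0)")  # fails: id 1 already exists
except sqlite3.IntegrityError:
    pass  # the whole transaction was undone, including the UPDATE

votes = conn.execute("SELECT votes FROM links WHERE id = 1").fetchone()[0]
print(votes)  # 10
```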
The Google App Engine Datastore:
- Tables = entities. Columns are not fixed. They all have an ID (provided or chosen). They have parents and ancestors