Serialization with pickle and json


Serialization

Serialization is the process of converting a data structure or object state into a format that can be stored (for example, in a file or memory buffer) or transmitted across a network connection, and reconstructed later in the same or another computer environment.


When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object.

This process of serializing an object is also called deflating or marshalling an object. The opposite operation, extracting a data structure from a series of bytes, is deserialization (also called inflating or unmarshalling) (Wikipedia).

In Python, we have the pickle module. The bulk of the pickle module is written in C, like the Python interpreter itself. It can store arbitrarily complex Python data structures. Its format is customizable and works across Python versions, but it is unsafe: it is not secure against erroneous or maliciously constructed data.
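
To see why this matters, consider that unpickling can execute arbitrary callables. Here is a minimal sketch of our own (pickle.dumps()/pickle.loads() are covered later in this chapter):

import pickle
import os

class Malicious:
    # __reduce__ tells pickle how to rebuild an object; a hostile
    # pickle can abuse it to run an arbitrary callable at load time.
    def __reduce__(self):
        return (os.system, ('echo pwned',))

payload = pickle.dumps(Malicious())
pickle.loads(payload)   # runs "echo pwned" - never unpickle untrusted data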

The standard library also includes modules serializing to standard data formats:
  1. json, with built-in support for basic scalar and collection types, and able to support arbitrary types via encoding and decoding hooks.
  2. plistlib for XML-encoded property lists, limited to plist-supported types (numbers, strings, booleans, tuples, lists, dictionaries, datetime, and binary blobs).

Finally, it is recommended that an object's __repr__ be evaluable in the right environment, making it a rough match for Common Lisp's print-object (Wikipedia).
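
As a quick illustration (our own sketch, not from the wiki text), a __repr__ is "evaluable" when eval(repr(obj)) reconstructs an equal object:

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        # evaluable repr: eval(repr(p)) rebuilds an equal Point
        return 'Point(%r, %r)' % (self.x, self.y)

p = Point(3, 4)
clone = eval(repr(p))
print(clone.x == p.x and clone.y == p.y)   # True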




Pickle

What data types can pickle store?

Here are the things that the pickle module can store:

  1. All the native datatypes that Python supports: booleans, integers, floating point numbers, complex numbers, strings, bytes objects, byte arrays, and None.

  2. Lists, tuples, dictionaries, and sets containing any combination of native datatypes.

  3. Lists, tuples, dictionaries, and sets containing any combination of lists, tuples, dictionaries, and sets containing any combination of native datatypes (and so on, to the maximum nesting level that Python supports).

  4. Functions, classes, and instances of classes (with caveats; see the sketch after this list).
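
The main caveat: pickle stores functions and classes by reference (module name plus qualified name), not by value, so they must be importable under the same name when the data is unpickled. A minimal sketch (the Book class is our own example):

import pickle

class Book:
    def __init__(self, title):
        self.title = title

b = pickle.loads(pickle.dumps(Book('Light Science and Magic')))
print(b.title)   # the instance round-trips within this process...

# ...only because the Book class is importable here; unpickling in a
# process that cannot find Book raises an error (e.g. AttributeError).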



Constructing Pickle data

We will use two Python Shells, 'A' & 'B':

>>> shell = 'A'

Open another Shell:

>>> shell = 'B'

Here is the dictionary type data for Shell 'A':

>>> shell
'A'
>>> book = {}
>>> book['title'] = 'Light Science and Magic: An Introduction to Photographic Lighting, Kindle Edition'
>>> book['page_link'] = 'http://www.amazon.com/Fil-Hunter/e/B001ITTV7A'
>>> book['comment_link'] = None
>>> book['id'] = b'\xAC\xE2\xC1\xD7'
>>> book['tags'] = ('Photography', 'Kindle', 'Light')
>>> book['published'] = True
>>> import time
>>> book['published_time'] = time.strptime('Mon Sep 10 23:18:32 2012')
>>> book['published_time']
time.struct_time(tm_year=2012, tm_mon=9, tm_mday=10, tm_hour=23,
 tm_min=18, tm_sec=32, tm_wday=0, tm_yday=254, tm_isdst=-1)
>>> 

Here, we're trying to use as many data types as possible.
The time module contains a data structure, struct_time, to represent a point in time, and functions to manipulate time structs. The strptime() function takes a formatted string and converts it to a struct_time. The string above is in the default format, but we can control that with format codes; for more details, see the time module documentation.
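
For example, the same moment can be parsed from a differently formatted string by passing explicit format codes:

>>> time.strptime('2012-09-10 23:18:32', '%Y-%m-%d %H:%M:%S')
time.struct_time(tm_year=2012, tm_mon=9, tm_mday=10, tm_hour=23,
 tm_min=18, tm_sec=32, tm_wday=0, tm_yday=254, tm_isdst=-1)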




Saving data as a pickle file

Now we have a dictionary that holds all the information about the book. Let's save it as a pickle file:

>>> import pickle
>>> with open('book.pickle', 'wb') as f:
	pickle.dump(book, f)

We set the file mode to wb to open the file for writing in binary mode. Wrap it in a with statement to ensure the file is closed automatically when we're done with it. The dump() function in the pickle module takes a serializable Python data structure, serializes it into a binary, Python-specific format using the latest version of the pickle protocol, and saves it to an open file.

  1. The pickle module takes a Python data structure and saves it to a file.
  2. Serializes the data structure using a data format called the pickle protocol.
  3. The pickle protocol is Python-specific; there is no guarantee of cross-language compatibility.
  4. Not every Python data structure can be serialized by the pickle module. The pickle protocol has changed several times as new data types have been added to the Python language, but there are still limitations.
  5. So, there is no guarantee of compatibility between different versions of Python itself.
  6. Unless we specify otherwise, the functions in the pickle module use the default version of the pickle protocol (pickle.DEFAULT_PROTOCOL); the protocol argument of dump() selects a different version.
  7. Every recent version of the pickle protocol is a binary format. Be sure to open pickle files in binary mode, or the data will get corrupted during writing.
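
A short sketch of pinning the protocol explicitly, reusing the book dictionary from Shell A (DEFAULT_PROTOCOL and HIGHEST_PROTOCOL are real pickle module constants; their exact values depend on the Python version):

import pickle

print(pickle.DEFAULT_PROTOCOL)   # protocol used when none is specified
print(pickle.HIGHEST_PROTOCOL)   # newest protocol this interpreter supports

with open('book.pickle', 'wb') as f:
    # pin an older protocol for compatibility; protocol 2 is the
    # highest version that Python 2 can still read
    pickle.dump(book, f, protocol=2)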



Loading data from a pickle file

Let's load the saved data from the pickle file in the other Python Shell, B.

>>> shell
'B'
>>> import pickle
>>> with open('book.pickle', 'rb') as f:
	b = pickle.load(f)

>>> b
{'published_time': time.struct_time(tm_year=2012, tm_mon=9, 
tm_mday=10, tm_hour=23, tm_min=18, tm_sec=32, tm_wday=0, tm_yday=254, tm_isdst=-1), 
'title': 'Light Science and Magic: An Introduction to Photographic Lighting, Kindle Edition', 
'tags': ('Photography', 'Kindle', 'Light'), 
'page_link': 'http://www.amazon.com/Fil-Hunter/e/B001ITTV7A', 
'published': True, 'id': b'\xac\xe2\xc1\xd7', 'comment_link': None}
  1. There is no book variable defined here; we defined book only in Python Shell A.
  2. We opened the book.pickle file we created in Python Shell A. The pickle module uses a binary data format, so we should always open pickle files in binary mode.
  3. The pickle.load() function takes a stream object, reads the serialized data from the stream, creates a new Python object, recreates the serialized data in the new Python object, and returns the new Python object.
  4. The pickle.dump()/pickle.load() cycle creates a new data structure that is equal to the original data structure.
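
If we're curious what the pickle protocol actually writes, the standard library's pickletools module can disassemble serialized bytes (a quick sketch):

import pickle
import pickletools

# prints one line per pickle opcode, plus the protocol version used
pickletools.dis(pickle.dumps({'published': True}))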

Let's switch back to Python Shell A.

>>> shell
'A'
>>> with open('book.pickle', 'rb') as f:
	book2 = pickle.load(f)
	
>>> book2 == book
True
>>> book2 is book
False
  1. We opened the book.pickle file, and loaded the serialized data into a new variable, book2.
  2. The two dictionaries, book and book2, are equal.
  3. We serialized this dictionary to the book.pickle file, then read the serialized data back from that file and created a perfect replica of the original data structure.
  4. Equality is not the same as identity. We've created a perfect replica of the original data structure, but it's still a copy.



Serializing data in memory with pickle

If we don't want to use a file, we can still serialize an object in memory.

>>> shell
'A'
>>> m = pickle.dumps(book)
>>> type(m)
<class 'bytes'>
>>> book3 = pickle.loads(m)
>>> book3 == book
True
  1. The pickle.dumps() function (note the s at the end of the function name) performs the same serialization as the pickle.dump() function. Instead of taking a stream object and writing the serialized data to a file on disk, it simply returns the serialized data.
  2. Since the pickle protocol uses a binary data format, the pickle.dumps() function returns a bytes object.
  3. The pickle.loads() function (again, note the s at the end of the function name) performs the same deserialization as the pickle.load() function. Instead of taking a stream object and reading the serialized data from a file, it takes a bytes object containing serialized data, such as the one returned by the pickle.dumps() function.
  4. The end result is the same: a perfect replica of the original dictionary.



Python serialized object and JSON

The data format used by the pickle module is Python-specific. It makes no attempt to be compatible with other programming languages. If cross-language compatibility is one of our requirements, we need to look at other serialization formats. One such format is json.

JSON (JavaScript Object Notation) is a text-based open standard designed for human-readable data interchange. It is derived from the JavaScript scripting language for representing simple data structures and associative arrays, called objects. Despite its relationship with JavaScript, it is language-independent, with parsers available for many languages. json is explicitly designed to be usable across multiple programming languages. The JSON format is often used for serializing and transmitting structured data over a network connection. It is used primarily to transmit data between a server and a web application, serving as an alternative to XML (Wikipedia).

Python 3 includes a json module in the standard library. Like the pickle module, the json module has functions for serializing data structures, storing the serialized data on disk, loading serialized data from disk, and unserializing the data back into a new Python object. But there are some important differences, too.

  1. The json data format is text-based, not binary. All json values are case-sensitive.
  2. As with any text-based format, there is the issue of whitespace. json allows arbitrary amounts of whitespace (spaces, tabs, carriage returns, and line feeds) between values. This whitespace is insignificant, which means that json encoders can add as much or as little whitespace as they like, and json decoders are required to ignore the whitespace between values. This allows us to pretty-print our json data, nicely nesting values within values at different indentation levels so we can read it in a standard browser or text editor. Python's json module has options for pretty-printing during encoding.
  3. There's the perennial problem of character encoding. json encodes values as plain text, but as we know, there is no such thing as plain text. JSON must be stored in a Unicode encoding (UTF-32, UTF-16, or the default, UTF-8); see the sketch after this list. For the details of encodings in JSON, see RFC 4627.
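
A quick illustration of the encoding point (a sketch; ensure_ascii is a real option of Python's json module):

import json

data = {'title': 'Köln, 서울'}

# by default the json module escapes all non-ASCII characters...
print(json.dumps(data))                      # {"title": "K\u00f6ln, \uc11c\uc6b8"}

# ...but it can emit real UTF-8 text instead
print(json.dumps(data, ensure_ascii=False))  # {"title": "Köln, 서울"}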



Saving data to JSON

We're going to create a new data structure instead of re-using the existing book data structure. json is a text-based format, which means we need to open the file in text mode and specify a character encoding. We can never go wrong with UTF-8.

try:
    import simplejson as json
except ImportError:
    import json

book = {}
book['title'] = 'Light Science and Magic: An Introduction to Photographic Lighting, Kindle Edition'
book['tags'] = ('Photography', 'Kindle', 'Light')
book['published'] = True
book['comment_link'] = None
book['id'] = 1024

with open('ebook.json',  'w') as f:
	json.dump(book, f)

Like the pickle module, the json module defines a dump() function which takes a Python data structure and a writable stream object. The dump() function serializes the Python data structure and writes it to the stream object. Doing this inside a with statement will ensure that the file is closed properly when we're done.

Let's see what's in ebook.json file:

$ cat ebook.json
{"published": true, "tags": ["Photography", "Kindle", "Light"], "id": 1024, "com
ment_link": null, "title": "Light Science and Magic: An Introduction to Photographic Lighting, Kindle Edition"}

It's clearly more readable than a pickle file. But json can contain arbitrary whitespace between values, and the json module provides an easy way to take advantage of this to create even more readable json files:

>>> import codecs
>>> with codecs.open('book_more_friendly.json', mode='w', encoding='utf-8') as f:
	json.dump(book, f, indent=3)

We passed an indent parameter to the json.dump() function, and it made the resulting json file more readable, at the expense of a larger file size. The indent parameter is an integer specifying the number of spaces used for each indentation level.

$ cat book_more_friendly.json
{
   "published": true,
   "tags": [
      "Photography",
      "Kindle",
      "Light"
   ],
   "id": 1024,
   "comment_link": null,
   "title": "Light Science and Magic: An Introduction to Photographic Lighting,
Kindle Edition"
}
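
Going the other direction, the separators parameter produces the most compact encoding (a sketch reusing the book dictionary from above):

# drop the spaces after ',' and ':' for the smallest possible output
compact = json.dumps(book, separators=(',', ':'))
print(compact)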

Here is another example for json:

#!/usr/bin/python
import psutil
import subprocess
import time
import json
import codecs

try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

procs_id = 0
procs = {}
procs_data = []

ismvInfo = {
   'baseName':' ',
   'video': {
      'src':[],
      'TrackIDvalue':[],
      'Duration': 0,
      'QualityLevels': 1,
      'Chunks': 0,
      'Url': '',
      'index':[],
      'bitrate':[],
      'fourCC':[],
      'width': [],
      'height':[],
      'codecPrivateData': [],
      'fragDurations':[]
   },
   'audio': {
      'src':[],
      'TrackIDvalue':[],
      'QualityLevels': 1,
      'index':[],
      'bitrate':[],
      'fourCC':[],
      'samplingRate':[],
      'channels':[],
      'bitsPerSample':[],
      'packetSize':[],
      'audioTag': [],
      'codecPrivateData': [],
      'fragDurations': [],
   }
}

def runCommand(cmd, use_shell = False, return_stdout = False, busy_wait = True, poll_duration = 0.5):
    # sanitize cmd to a list of strings
    cmd = ['%s' % x for x in cmd]
    if use_shell:
        command = ' '.join(cmd)
    else:
        command = cmd

    if return_stdout:
        proc = psutil.Popen(command, shell = use_shell, stdout = subprocess.PIPE, stderr = subprocess.PIPE)
    else:
        proc = psutil.Popen(command, shell = use_shell,
                                stdout = open('/dev/null', 'w'),
                                stderr = open('/dev/null', 'w'))

    global procs_id
    global procs
    global procs_data
    proc_id = procs_id
    procs[proc_id] = proc
    procs_id += 1
    data = { }

    while busy_wait:
        returncode = proc.poll()
        if returncode is None:
            try:
                # attribute names here depend on the installed psutil version
                data = proc.as_dict(attrs = ['get_io_counters', 'get_cpu_times'])
            except Exception:
                pass
            time.sleep(poll_duration)
        else:
            break

    (stdout, stderr) = proc.communicate()
    returncode = proc.returncode
    del procs[proc_id]

    if returncode != 0:
        raise Exception(stderr)
    else:
        if data:
            procs_data.append(data)
        return stdout

# server manifest
def ismParse(data):
    # need to remove the string below to make xml parse work
    data = data.replace(' xmlns="http://www.w3.org/2001/SMIL20/Language"','')
    root = ET.fromstring(data)

    # head 
    for m in root.iter('head'):
        for p in m.iter('meta'):
            ismvInfo['baseName'] = (p.attrib['content']).split('.')[0]

    # videoAttributes
    for v in root.iter('video'):
        ismvInfo['video']['src'].append(v.attrib['src'])
        for p in v.iter('param'):
            ismvInfo['video']['TrackIDvalue'].append(p.attrib['value'])

    # audioAttributes
    for a in root.iter('audio'):
        ismvInfo['audio']['src'].append(a.attrib['src'])
        for p in a.iter('param'):
            ismvInfo['audio']['TrackIDvalue'].append(p.attrib['value'])

# client manifest
def ismcParse(data):
    root = ET.fromstring(data)

    # duration
    # streamDuration = root.attrib['Duration']
    ismvInfo['video']['Duration'] = root.attrib['Duration']

    for s in root.iter('StreamIndex'):
        if(s.attrib['Type'] == 'video'):
            ismvInfo['video']['QualityLevels'] = s.attrib['QualityLevels']
            ismvInfo['video']['Chunks'] = s.attrib['Chunks']
            ismvInfo['video']['Url'] = s.attrib['Url']
            for q in s.iter('QualityLevel'):
                ismvInfo['video']['index'].append(q.attrib['Index'])
                ismvInfo['video']['bitrate'].append(q.attrib['Bitrate'])
                ismvInfo['video']['fourCC'].append(q.attrib['FourCC'])
                ismvInfo['video']['width'].append(q.attrib['MaxWidth'])
                ismvInfo['video']['height'].append(q.attrib['MaxHeight'])
                ismvInfo['video']['codecPrivateData'].append(q.attrib['CodecPrivateData'])

            # video frag duration
            for c in s.iter('c'):
                ismvInfo['video']['fragDurations'].append(c.attrib['d'])

        elif(s.attrib['Type'] == 'audio'):
            ismvInfo['audio']['QualityLevels'] = s.attrib['QualityLevels']
            ismvInfo['audio']['Url'] = s.attrib['Url']
            for q in s.iter('QualityLevel'):
                #ismvInfo['audio']['index'] = q.attrib['Index'] 
                ismvInfo['audio']['index'].append(q.attrib['Index'])
                ismvInfo['audio']['bitrate'].append(q.attrib['Bitrate'])
                ismvInfo['audio']['fourCC'].append(q.attrib['FourCC'])
                ismvInfo['audio']['samplingRate'].append(q.attrib['SamplingRate'])
                ismvInfo['audio']['channels'].append(q.attrib['Channels'])
                ismvInfo['audio']['bitsPerSample'].append(q.attrib['BitsPerSample'])
                ismvInfo['audio']['packetSize'].append(q.attrib['PacketSize'])
                ismvInfo['audio']['audioTag'].append(q.attrib['AudioTag'])
                ismvInfo['audio']['codecPrivateData'].append(q.attrib['CodecPrivateData'])
            # audio frag duration
            for c in s.iter('c'):
                #audioFragDuration.append(c.attrib['d'])
                ismvInfo['audio']['fragDurations'].append(c.attrib['d'])

def populateManifestMetadata(base):
    try:
        # parse server manifest and populate ismv info data
        with open(base+'.ism', 'r') as manifest:
            ismData = manifest.read()
            ismParse(ismData)

        # parse client manifest and populate ismv info data
        with open(base+'.ismc', 'r') as manifest:
            ismcData = manifest.read()
            ismcParse(ismcData)

    except Exception:
        raise RuntimeError("issue opening ismv manifest file")

# input 
# ismvFIles - list of ismv files
# base      - basename of ismv files
def setManifestMetadata(ismvFiles, base):
    #cmd = ['ismindex','-n', ismTmpName,'bunny_400.ismv','bunny_894.ismv','bunny_2000.ismv' ] 
    cmd = ['ismindex','-n', base]
    for ism in ismvFiles:
        cmd.append(ism)
    stdout = runCommand(cmd, return_stdout = True, busy_wait = False)
    populateManifestMetadata(base)

if __name__ == '__main__':

   ismvFiles = ['bunny_400.ismv','bunny_894.ismv','bunny_2000.ismv']
   base = 'bunny'

   setManifestMetadata(ismvFiles, base)

   # save to json file
   with codecs.open('ismvInfo.json', 'w', encoding='utf-8') as f:
        json.dump(ismvInfo, f)

The output is ismvInfo.json.



Data type mapping

There are some mismatches in JSON's coverage of Python datatypes. Some of them are simply naming differences, but there are two important Python datatypes that are completely missing: tuples and bytes.

Python 3        JSON
----------      -----------
dictionary      object
list            array
tuple           N/A
bytes           N/A
float           real number
True            true
False           false
None            null
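
We can see both gaps directly (a sketch; default= is a real json.dump()/json.dumps() parameter and a common workaround for unsupported types):

import json

# tuples silently become JSON arrays...
print(json.dumps(('Photography', 'Kindle')))   # ["Photography", "Kindle"]

# ...while bytes are not serializable at all
try:
    json.dumps(b'\xAC\xE2\xC1\xD7')
except TypeError as e:
    print(e)

# one workaround: a default hook that converts bytes to a hex string
print(json.dumps({'id': b'\xAC\xE2\xC1\xD7'}, default=lambda o: o.hex()))
# {"id": "ace2c1d7"}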



Loading data from a JSON file
>>> import json
>>> import codecs
>>> with codecs.open('ebook.json', 'r', encoding='utf-8') as f:
	data_from_json = json.load(f)

	
>>> data_from_json
{'title': 'Light Science and Magic: An Introduction to Photographic Lighting, Kindle Edition', 
'tags': ['Photography', 'Kindle', 'Light'], 'id': 1024, 'comment_link': None, 'published': True}
>>> 
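
Note that the round trip through JSON changed a type along the way: the tags tuple we stored came back as a list. Continuing in the same shell:

>>> type(data_from_json['tags'])
<class 'list'>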




List to JSON file

The following code makes a list of dictionary items and then saves it to a JSON file. The input used in the code is semicolon-separated with three columns, like this:

protocol;service;plugin

Before making it a list of dictionary items, we add an additional field, 'value':

try:
    import simplejson as json
except ImportError:
    import json

def get_data(dat):
    # open in text mode so that split(';') operates on strings, not bytes
    with open('input.txt', 'r') as f:
        for l in f:
            d = {}
            line = (l.rstrip()).split(';')
            line.append(0)
            d['protocol'] = line[0]
            d['service'] = line[1]
            d['plugin'] = line[2]
            d['value'] = line[3]
            dat.append(d)
    return dat

def convert_to_json(data):
    with open('data.json', 'w') as f:
        json.dump(data, f)

if __name__ == '__main__':
    data = []
    data = get_data(data)
    convert_to_json(data)

The output json file looks like this:

[{"protocol": "pro1", "value": 0, "service": "service1", "plugin": "check_wmi_plus.pl -H 10.6.88.72 -m checkfolderfilecount -u administrator -p c0c1c -w 1000 -c 2000 -a 's:' -o 'error/' --nodatamode"}, {"protocol": "proto2", "value": 1, "service": "service2", "plugin": "check_wmi_plus.pl -H 10.6.88.72 -m checkdirage -u administrator -p a23aa8 --nodatamode -c :1 -a s -o input/ -3 `date --utc --date '-30 mins' +\"%Y%m%d%H%M%S.000000+000\" `"},...]






Some of the sections (pickle) of this chapter are largely based on http://getpython3.com/diveintopython3/serializing.html






