Objective-C: ARC

How to correctly use Automatic Reference Counting in your applications

Objective-C is a simple, elegant, and easy-to-use language that lets developers write great applications. Starting with iOS 5, a powerful new technology was introduced: Automatic Reference Counting. Before that, we managed the life cycle of objects with manual reference counting, which meant retaining and releasing objects ourselves and avoiding code that could lead to memory leaks or crashes.

Doing everything manually was always a demanding task, especially for programmers new to Apple’s platforms, and memory management was a common source of problems for beginners. Fortunately, Automatic Reference Counting, or ARC for short, eliminates the manual and error-prone steps involved in managing reference-counted objects. This way we can concentrate more on designing object relationships and the functionality of our application.

Whether you are new to Objective-C or already familiar with it, there is plenty to learn about this powerful technology, and the more you learn about it, the better you’ll be able to write simpler, more robust and more maintainable code.

What exactly is ARC?

Automatic Reference Counting implements automatic memory management for Objective-C objects and blocks, freeing the programmer from the need to explicitly insert retains and releases.

If I don’t take care of deallocating memory explicitly, who does it for me?

LLVM Compiler 3 does the memory management for us. It inserts the necessary retain and release calls at the corresponding points in our programs and, more importantly, it does all of this at compile time, not at run time.

Simply put, with ARC you can forget about retain, release and autorelease. Moreover, the compiler reports an error, not just a warning, whenever you send any of these messages to your objects.

If you add a Stack class to your Project (which is using ARC), and write the code as below:

 

[Listing 1: code image not preserved]

[Listing 2: code image not preserved]

After sending the release message to the _array object, you will notice a red exclamation mark to the left of the last line inside the pushint method:

[Listing 3: code image not preserved]

 

The compiler adds the line above itself at compile time, which is why we are not allowed to write it by hand when using ARC.

We can also use convenience class methods:

[Listing 4: code image not preserved]

 

In the example above, the code is the same whether or not we are using ARC.

But things change if we use convenience methods with instance variables, as in the following case:

[Listing 5: code image not preserved]

 

In this example, our application will crash, perhaps to our surprise, if we compile it under manual memory management. The problem is that the array created inside the if statement is autoreleased and, because nothing retains it, it is deallocated when the current autorelease pool drains, shortly after the init method finishes executing. Any code we execute afterwards that touches that instance variable will make our app crash. To fix that, we can replace the line:

[Listing 6: code image not preserved]

with the following line:

[Listing 7: code image not preserved]

Now we own the array and it won’t be deallocated. The app will no longer crash (at least not because of this line of code), but we have introduced a memory leak into our program. To make everything work as intended, we need to override the dealloc method:

[Listing 8: code image not preserved]

 

The example above illustrates how easily we can cause memory leaks when managing memory manually, simply by forgetting to release some objects inside the dealloc method. We can also get into trouble by using objects that have already been deallocated without being aware of it.
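The ownership rules behind this example can be modelled with a small toy in Python. This is only a sketch, not Objective-C and not how the runtime is implemented: the names ManagedObject, AutoreleasePool and convenience_array are made up for illustration, and the pool is drained by hand where Cocoa would drain it at the end of the run loop iteration.

```python
class ManagedObject:
    """Toy stand-in for an object with a manual retain count (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.retain_count = 1      # a freshly created object is owned once
        self.deallocated = False

    def retain(self):
        self.retain_count += 1
        return self

    def release(self):
        self.retain_count -= 1
        if self.retain_count == 0:
            self.deallocated = True   # using the object after this would crash


class AutoreleasePool:
    """Toy pool: remembers objects and releases each once when drained."""

    def __init__(self):
        self._pending = []

    def autorelease(self, obj):
        self._pending.append(obj)
        return obj

    def drain(self):
        for obj in self._pending:
            obj.release()
        self._pending.clear()


def convenience_array(pool):
    # Models a convenience class method: the result is created, then autoreleased.
    return pool.autorelease(ManagedObject("array"))


pool = AutoreleasePool()
unretained = convenience_array(pool)          # stored in an ivar without retain
retained = convenience_array(pool).retain()   # stored in an ivar with retain
pool.drain()                                  # happens some time after init returns

crashes = unretained.deallocated        # True: the ivar now points at a dead object
survives = not retained.deallocated     # True: we own it, but must release it later

retained.release()                      # the job of a hand-written dealloc
freed_in_dealloc = retained.deallocated # True: no leak once dealloc releases it
```

Skipping the final release in this model leaves retain_count at 1 forever, which is the leak described above.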

Under ARC, we don’t have the headache of releasing the objects we allocate, because the compiler does everything on our behalf: it keeps track of all the instance variables and releases them accordingly. We no longer have to override the dealloc method either, because that is done for us automatically. Basically, this code is inserted into our program at compile time:

[Listing 9: code image not preserved]

 

Notice that the LLVM Compiler also calls the dealloc method of the superclass, and it sends release messages to all the other instance variables, if there are any. When using ARC we can still override the dealloc method, but we are not allowed to invoke [super dealloc]; or [_someObject release];. The code simply won’t compile.

With ARC, we may even use the convenience class method (unlike under manual reference counting, where it made the program crash):

[Listing 10: code image not preserved]

 

And it works perfectly, just as if we had written:

[Listing 11: code image not preserved]

 

Suppose we add a pop method to the Stack class like this:

[Listing 12: code image not preserved]

 

Under manual reference counting, this method will not work well and may actually crash. The lastObject that we want to return may be retained only by the array itself. In that case, when the object is removed from the array, its retain count is decremented and the object is deallocated. So this method can return a deallocated object, which will make us unhappy at some point.

We can partly fix the problem by writing this code:

[Listing 13: code image not preserved]

 

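But now another problem appears: the method returns an object with an extra retain that nobody is expected to balance, which causes a memory leak and also violates the naming conventions used in Objective-C (only methods whose names begin with alloc, new, copy or mutableCopy return objects the caller owns). To make things work properly, we have to autorelease the x object when we return it, as follows:

[Listing 14: code image not preserved]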

 

This is how the problem is solved without ARC. When using ARC, we write robust code like this:

[Listing 15: code image not preserved]

 

And it works perfectly and error-free. The variable x holds a strong reference to the lastObject of the array, and the removeLastObject message sent to the array simply removes the array’s own strong reference to that object; the object itself is still in memory and safe to use. Whatever variable receives the result of the pop message also holds a strong reference to that object, and once the last strong reference goes away, the object is destroyed as well.
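CPython also manages objects by reference counting, so the same idea can be observed there. The following is a rough analogy rather than ARC itself: the Stack and Item classes are hypothetical, and the immediate destruction after the last reference disappears is a CPython implementation detail, assumed here for illustration.

```python
import weakref


class Item:
    """Placeholder payload; any ordinary object would do."""


class Stack:
    def __init__(self):
        self._array = []

    def push(self, x):
        self._array.append(x)

    def pop(self):
        x = self._array[-1]   # x is a second strong reference to the last object
        del self._array[-1]   # the list gives up its own reference
        return x              # the object is still alive because x refers to it


stack = Stack()
item = Item()
probe = weakref.ref(item)     # weak reference: observes the object without owning it
stack.push(item)
del item                      # the stack's list is now the only owner

x = stack.pop()
alive_after_pop = probe() is not None       # True: x keeps the object alive

del x                                       # the last strong reference goes away
alive_after_release = probe() is not None   # False on CPython: the object is gone
```

The weak reference plays the role of an observer only; it never extends the lifetime of the popped object.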

Posted in Programming

Speech Tagging Using Python

Introduction

Natural Language Processing (NLP) is one of the fastest-growing fields in scientific research. Natural language is the language we use in day-to-day life. Several things make the analysis of natural language complex, such as context and polysemy. Various techniques are being developed that let computer programs understand and interpret natural language, and such programs are already used in industry in a number of ways. For example:

  1. Automatic analysis of customer reviews
  2. Automatic categorizing of web pages
  3. Recommendation systems for various things

Why Python?

Python is a language that offers simple paradigms for creating powerful programs. It has an excellent library for processing natural language: NLTK.
NLTK offers a simple, consistent, extensible and modular way to create programs for natural language processing. In addition, it comes with text corpora that learners and researchers can use for their own purposes.

Required Software

The following software is needed for this tutorial:

Python: at the time of writing, NLTK supported Python versions 2.4 to 2.7. Current NLTK releases run on Python 3.
NLTK: installation instructions are available on the NLTK website.
NLTK-Data: this includes the text corpora used below. Download instructions are also available on the NLTK website.

What is POS tagging?

Every word in any language belongs to a particular class or lexical category, such as noun, verb or adjective. This class is called its part of speech (POS). The process of identifying a word’s class and labeling it accordingly is called POS tagging.

Some text corpora in NLTK come already tagged, and you can use them for testing your program. Some classes are also available which will help you in tagging words. We will be discussing them here in detail.
Every text corpus has its own set of tags. This set of tags is known as a tagset. The simplified tagset below gives an idea of the categories involved; the Brown corpus used later in this tutorial has its own, more detailed tags (NN, JJ, AT and so on):

Tag   Meaning        Examples

N     Noun           house, pen, mouse
NP    Proper noun    Anne, London, December
V     Verb           is, has, get, put
P     Preposition    on, of, with, in, into
PRO   Pronoun        he, she, them, they
CNJ   Conjunction    while, but, if, and
UH    Interjection   oops, bang, whee
ADJ   Adjective      good, bad, ugly, careful, reddish
ADV   Adverb         truly, falsely, mildly, swiftly, carefully
DET   Determiner     a, an, the, every, no

Automatically Tagging Words

Default Tagger

The concept of this tagger is pretty simple: you define a tagger which assigns the same tag to every word. For example, consider the following code:

>>> import nltk
>>> raw_text='I am a little teapot, short and stout'
>>> tokens=nltk.word_tokenize(raw_text)
>>> default_tagger=nltk.DefaultTagger('NN')
>>> default_tagger.tag(tokens)
[('I', 'NN'), ('am', 'NN'), ('a', 'NN'), ('little', 'NN'), ('teapot', 'NN'), (',', 'NN'), ('short', 'NN'), ('and', 'NN'), ('stout', 'NN')]

This doesn’t really give an accurate answer, does it? After all, little, short and stout are not nouns. However, as we progress you will see that this tagger has its uses.

Tagging using Regular Expressions

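You can tag words based on regular expressions. For example, words ending with “ing”, like running, playing and boxing, are usually verbs (tagged V below), and words ending with “er” are often comparative adjectives (tagged ADJR below). The patterns are tried in order, and the final catch-all pattern tags everything else as a noun. Consider the following: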

>>> import nltk
>>> text='I am running faster than light, I am lighter than light'
>>> text_tokens=nltk.word_tokenize(text)
>>> patterns=[
... (r'.*ing$', 'V'),
... (r'.*er$', 'ADJR'),
... (r'.*est$', 'ADJS'),
... (r'.*','N')
... ]
>>> reg_tagger=nltk.RegexpTagger(patterns)
>>> reg_tagger.tag(text_tokens)
[('I', 'N'), ('am', 'N'), ('running', 'V'), ('faster', 'ADJR'), ('than', 'N'), ('light', 'N'), (',', 'N'), ('I', 'N'), ('am', 'N'), ('lighter', 'ADJR'), ('than', 'N'), ('light', 'N')]

However, writing regular expressions that cover every word is difficult in most natural languages, and simple suffix rules misfire on many words. The regular expression tagger is therefore of limited use on its own.
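To see why suffix rules are fragile, here is a minimal pure-Python sketch of the same idea, using only the standard re module. It is not NLTK's implementation; the function name regexp_tag is made up, and the first-match-wins behaviour is an assumption modelled on the example above.

```python
import re

# Same patterns as in the example above; order matters, the catch-all comes last.
PATTERNS = [
    (r".*ing$", "V"),
    (r".*er$", "ADJR"),
    (r".*est$", "ADJS"),
    (r".*", "N"),
]


def regexp_tag(tokens, patterns=PATTERNS):
    """Tag each token with the tag of the first pattern that matches it."""
    tagged = []
    for token in tokens:
        for pattern, tag in patterns:
            if re.match(pattern, token):
                tagged.append((token, tag))
                break
        else:
            tagged.append((token, None))   # only reached without a catch-all pattern
    return tagged


sensible = regexp_tag(["running", "faster", "light"])
# [('running', 'V'), ('faster', 'ADJR'), ('light', 'N')]

mistakes = regexp_tag(["king", "water"])
# [('king', 'V'), ('water', 'ADJR')]  both are nouns, but the suffix rules fire anyway
```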

Unigram Tagging

In most natural languages one word can behave in different ways in a sentence, and many words can be used as two or more parts of speech. For example, the word ‘free’ behaves as an adjective in sentence 1 and as a verb in sentence 2.

Sentence 1: 'After the civil war he was a free man.'
Sentence 2: 'He could finally free the legs of man buried under the car'

The unigram tagger works in a very simple manner. It assigns to each word its most likely tag, that is, the tag the word carried most often in the training data. To find the “most likely” tag, the unigram tagger must be trained first. This is where we will be using the tagged corpora.

For training we are using the tagged Brown corpus. This corpus has text in various categories. We use one category to train our tagger:

>>> from nltk.corpus import brown
>>> tagged_sents=brown.tagged_sents(categories='lore')
>>> untagged_sents=brown.sents(categories='lore')
>>> unigram_tagger=nltk.UnigramTagger(tagged_sents)

Once the tagger has been trained we can tag different words using it.

>>> text='After the civil war he was a free man'
>>> tokens=nltk.word_tokenize(text)
>>> unigram_tagger.tag(tokens)
[('After', 'IN'), ('the', 'AT'), ('civil', 'JJ'), ('war', 'NN'), ('he', 'PPS'), ('was', 'BEDZ'), ('a', 'AT'), ('free', 'JJ'), ('man', 'NN')]
>>> text2='He could finally free the legs of man buried under the car'
>>> tokens2=nltk.word_tokenize(text2)
>>> unigram_tagger.tag(tokens2)
[('He', 'PPS'), ('could', 'MD'), ('finally', 'RB'), ('free', 'JJ'), ('the', 'AT'), ('legs', 'NNS'), ('of', 'IN'), ('man', 'NN'), ('buried', 'VBN'), ('under', 'IN'), ('the', 'AT'), ('car', 'NN')]

As you can see, in the second example ‘free’ was again tagged as JJ, i.e. adjective, although here it is actually a verb. However, the unigram tagger is more accurate than the taggers we have seen before.
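The “most likely tag” rule is easy to reproduce without NLTK. The sketch below is an illustration only: the helper names and the three tiny training sentences are invented, and a real tagger would be trained on a corpus such as Brown.

```python
from collections import Counter, defaultdict


def train_unigram(tagged_sents):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def unigram_tag(tokens, model):
    """Look every token up on its own; unseen words get None."""
    return [(word, model.get(word)) for word in tokens]


# Hypothetical training data: 'free' appears twice as JJ and once as VB.
training = [
    [("a", "AT"), ("free", "JJ"), ("man", "NN")],
    [("the", "AT"), ("free", "JJ"), ("press", "NN")],
    [("finally", "RB"), ("free", "VB"), ("him", "PPO")],
]
model = train_unigram(training)

as_adjective = unigram_tag(["a", "free", "man"], model)
# [('a', 'AT'), ('free', 'JJ'), ('man', 'NN')]

as_verb = unigram_tag(["finally", "free", "him"], model)
# [('finally', 'RB'), ('free', 'JJ'), ('him', 'PPO')]  'free' is still JJ
```

Because each word is looked up in isolation, the majority tag wins even in a sentence where the minority reading is the right one.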

N-gram Tagging

The weakness of the unigram tagger is that while tagging a particular word it doesn’t consider the words surrounding it. In the second sentence of the example above, if the tagger had seen that ‘finally’, an adverb, comes before ‘free’, it could have tagged ‘free’ as a verb. N-gram tagging works on this principle. An n-gram tagger is a generalization of a unigram tagger whose context is the current word together with the part-of-speech tags of the n-1 preceding tokens.
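The notion of context can be made concrete with a few lines of plain Python. This is a sketch of the definition above, not NLTK code; the function name ngram_contexts is made up.

```python
def ngram_contexts(tagged_sent, n):
    """Yield (context, tag) pairs for one tagged sentence.

    The context is a pair: the tags of the n-1 preceding tokens (fewer at the
    start of the sentence) and the current word.
    """
    history = []
    for word, tag in tagged_sent:
        # For n == 1 there is no tag history at all, only the word itself.
        prev_tags = tuple(history[-(n - 1):]) if n > 1 else ()
        yield (prev_tags, word), tag
        history.append(tag)


sentence = [("he", "PPS"), ("was", "BEDZ"), ("free", "JJ")]

unigram_view = list(ngram_contexts(sentence, 1))
# [(((), 'he'), 'PPS'), (((), 'was'), 'BEDZ'), (((), 'free'), 'JJ')]

bigram_view = list(ngram_contexts(sentence, 2))
# [(((), 'he'), 'PPS'), ((('PPS',), 'was'), 'BEDZ'), ((('BEDZ',), 'free'), 'JJ')]
```

With n=1 the context collapses to the word alone, which is exactly the unigram case.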

Here I have used a bigram tagger:

>>> text='After the civil war he was a free man'
>>> tokens=nltk.word_tokenize(text)
>>> bigram_tagger=nltk.BigramTagger(tagged_sents)
>>> bigram_tagger.tag(tokens)
[('After', 'IN'), ('the', 'AT'), ('civil', 'JJ'), ('war', 'NN'), ('he', 'PPS'), ('was', 'BEDZ'), ('a', 'AT'), ('free', 'JJ'), ('man', 'NN')]

Conclusion

More often than not, a combination of these taggers is used to identify the part of speech of a given word. Of the taggers covered here, n-gram tagging gives the most accurate results, but the time it takes may make it unsuitable for real-time applications.
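One common way to combine taggers is backoff: try the most specific tagger first and fall back to simpler ones when it has no answer. The pure-Python sketch below illustrates the idea under stated assumptions; the function names and the tiny training set are invented, and NLTK's own taggers expose a similar idea through their backoff argument.

```python
from collections import Counter, defaultdict


def most_common_tag(counts):
    """Reduce {context: Counter of tags} to {context: most frequent tag}."""
    return {ctx: tags.most_common(1)[0][0] for ctx, tags in counts.items()}


def train(tagged_sents):
    """Build a bigram table keyed by (previous tag, word) and a unigram table."""
    bigram_counts = defaultdict(Counter)
    unigram_counts = defaultdict(Counter)
    for sent in tagged_sents:
        prev_tag = None                      # None marks the start of a sentence
        for word, tag in sent:
            bigram_counts[(prev_tag, word)][tag] += 1
            unigram_counts[word][tag] += 1
            prev_tag = tag
    return most_common_tag(bigram_counts), most_common_tag(unigram_counts)


def tag_with_backoff(tokens, bigram, unigram, default_tag="NN"):
    """Bigram first, then unigram, then the default tag."""
    tagged = []
    prev_tag = None
    for word in tokens:
        tag = bigram.get((prev_tag, word)) or unigram.get(word) or default_tag
        tagged.append((word, tag))
        prev_tag = tag
    return tagged


# Hypothetical training data, not taken from the Brown corpus.
training = [
    [("a", "AT"), ("free", "JJ"), ("man", "NN")],
    [("the", "AT"), ("free", "JJ"), ("press", "NN")],
    [("finally", "RB"), ("free", "VB"), ("him", "PPO")],
]
bigram, unigram = train(training)

tagged = tag_with_backoff(["finally", "free", "the", "man"], bigram, unigram)
# [('finally', 'RB'), ('free', 'VB'), ('the', 'AT'), ('man', 'NN')]
```

Here the bigram table resolves ‘free’ after an adverb as a verb, the unigram table covers ‘the’ and ‘man’ in contexts never seen in training, and the default tag would catch words that are entirely unknown. This is also where the default tagger from the beginning of the tutorial earns its keep.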


MySQL Trigger

Below is a primer on MySQL triggers I wrote for the MySQL course I taught.

As the name suggests, triggers are MySQL objects that fire when a specific event occurs on a table (INSERT, UPDATE, DELETE). For example, suppose I have an employee table and I’d like to track all updates to it:

Create the table (employee):

CREATE TABLE `employee` (
  `employee_id` int(11) NOT NULL,
  `name` varchar(50) NOT NULL,
  `salary` decimal(10,2) NOT NULL,
  PRIMARY KEY (`employee_id`)
);

 

Create my log table (update_logs):

CREATE TABLE `update_logs` (
  `update_id` int(11) NOT NULL AUTO_INCREMENT,
  `employee_id` int(11) NOT NULL,
  `salary` decimal(10,2) NOT NULL,
  `name` varchar(50) NOT NULL,
  `updated_date` datetime DEFAULT NULL,
  PRIMARY KEY (`update_id`)
);

 

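Now create a trigger that logs every update to the table:

-- Switch the delimiter so the semicolon inside the trigger body does not end the statement
DELIMITER $$

CREATE TRIGGER before_update_employee
BEFORE UPDATE ON employee
FOR EACH ROW
BEGIN
  -- OLD refers to the row as it was before the update
  INSERT INTO update_logs SET
    employee_id = OLD.employee_id,
    salary = OLD.salary,
    name = OLD.name,
    updated_date = NOW();
END$$

DELIMITER ;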

 

To check the trigger, update any row in the employee table. For example, assuming a row with employee_id 100 exists:

UPDATE `employee` SET `name` = 'test-updated' WHERE `employee_id` = 100;

You should see a new row inserted in the update_logs table.

 

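Note: The trigger above uses the BEFORE UPDATE event, so it is executed before the row in the employee table is actually changed. It therefore fires on every attempt to modify a row, whether or not the update subsequently succeeds. Keep in mind that with a transactional storage engine such as InnoDB, a statement that fails is rolled back together with the log row the trigger inserted, so failed attempts are only retained when update_logs is a non-transactional table.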
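If no MySQL server is at hand, the behaviour can be tried out with Python's built-in sqlite3 module. This is a stand-in under stated assumptions, not MySQL: SQLite has no INSERT ... SET syntax, no DELIMITER command and different column types, so the schema and trigger below are adapted to SQLite while keeping the table and trigger names from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
conn.executescript("""
CREATE TABLE employee (
  employee_id INTEGER PRIMARY KEY,
  name        TEXT    NOT NULL,
  salary      NUMERIC NOT NULL
);

CREATE TABLE update_logs (
  update_id    INTEGER PRIMARY KEY AUTOINCREMENT,
  employee_id  INTEGER NOT NULL,
  salary       NUMERIC NOT NULL,
  name         TEXT    NOT NULL,
  updated_date TEXT
);

-- SQLite version of the trigger: OLD holds the row as it was before the update
CREATE TRIGGER before_update_employee
BEFORE UPDATE ON employee
FOR EACH ROW
BEGIN
  INSERT INTO update_logs (employee_id, salary, name, updated_date)
  VALUES (OLD.employee_id, OLD.salary, OLD.name, datetime('now'));
END;
""")

conn.execute("INSERT INTO employee (employee_id, name, salary) VALUES (100, 'test', 1000.00)")
conn.execute("UPDATE employee SET name = 'test-updated' WHERE employee_id = 100")

logged = conn.execute("SELECT employee_id, name FROM update_logs").fetchall()
current = conn.execute("SELECT name FROM employee WHERE employee_id = 100").fetchone()

print(logged)    # [(100, 'test')]  the old values were captured by the trigger
print(current)   # ('test-updated',)
```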
